I was wondering: when a binary calls a function inside a shared lib, how does the linker know where the code resides in the library?
The second question was: can we change the name of a function in both the binary and the library and still have everything work afterwards?
And third: can we use some fancy characters in function names? Like changing the color of the xterm when running objdump or gdb on the binary? You know ANSI escape codes? What if we put an ANSI escape code in a function name?
1/ Start smoothly
My computer is currently a Raspberry Pi. Everything here has been tested on this architecture. It should work everywhere, but your mileage may vary.
Let's take an example:
We can compile and run this binary:
The dynamic linker searches for the lib in the current path (which is not really secure, but that is out of the scope of this blogpost).
This binary runs fine, as expected.
2/ Symbol resolution
The question is: how does the binary know where to look for the this_is_an_external_func_in_a_lib() call? It's obviously related to string comparison:
Well, if we have the string this_is_an_external_func_in_a_lib in both the binary and the library, maybe it's because they are associated?
Proof: if you alter one of these strings, the program doesn't work anymore:
The same happens if you change the function in the library:
Seems logical: it searches for a function by its name.
But wait, what if we change the names in BOTH? Would it work? The binary calls AAAA_is_an_external_func_in_a_lib(), the linker steps through all linked libraries, finds libpoem.so, opens it, reads the function names, finds the right one and calls it. Does it work?
Still a failure, although we have the same name in the library and the binary:
3/ Read The Freaky Manual (If it exists...)
When you search for something, you can read the manual. But in this case, it won't help because there is no manual.
When you google for symbol resolution, you'll end up with a lot of blog posts talking about PLT/GOT stuff. Very interesting (yes, read them, they're very valuable), but there is still magic in those blogposts. (In French: https://www.segmentationfault.fr/linux/role-plt-got-ld-so/ ).
And how do those blog posts explain how the resolution is done?
The previous blogpost just says: "it's a long and complicated code, but in the end, you get the address". I don't like magic in computing.
4/ No magic. Just show me.
Here are the main links which could help you:
The .gnu.hash section contains a header, a bloom filter, and hashes. Libc developers want binaries to run fast. When you resolve symbols the naive way, you have to step through every symbol and do a strcmp. This is slow, so developers added lots of improvements.
I wrote a parser for .gnu.hash sections (values are displayed both in little and big endian):
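To give an idea of what such a parser involves, here is a minimal Python sketch (not the original tool). It assumes a 32-bit little-endian ELF, as on the Raspberry Pi, so bloom words are 4 bytes. The function name `parse_gnu_hash` and the sample bytes are made up for illustration.

```python
import struct

def parse_gnu_hash(data: bytes, word_size: int = 4) -> dict:
    """Decode the raw bytes of a .gnu.hash section (little-endian assumed)."""
    # Header: nbuckets, symoffset, bloom_size, bloom_shift (four 32-bit words).
    nbuckets, symoffset, bloom_size, bloom_shift = struct.unpack_from("<4I", data, 0)
    off = 16
    # Bloom filter: bloom_size words (4 bytes each on ELF32, 8 on ELF64).
    code = "I" if word_size == 4 else "Q"
    bloom = list(struct.unpack_from("<%d%s" % (bloom_size, code), data, off))
    off += bloom_size * word_size
    # Buckets: one 32-bit symbol index per bucket (0 means empty bucket).
    buckets = list(struct.unpack_from("<%dI" % nbuckets, data, off))
    off += nbuckets * 4
    # Chains: one 32-bit hash per hashed symbol; the low bit ends a chain.
    nchains = (len(data) - off) // 4
    chains = list(struct.unpack_from("<%dI" % nchains, data, off))
    return {
        "nbuckets": nbuckets,
        "symoffset": symoffset,
        "bloom_shift": bloom_shift,
        "bloom": bloom,
        "buckets": buckets,
        "chains": chains,
    }

# Hypothetical section contents, only here to exercise the parser.
sample = struct.pack("<4I", 3, 5, 1, 5)                 # header
sample += struct.pack("<I", 0xFFFFFFFF)                 # bloom filter word
sample += struct.pack("<3I", 5, 0, 0)                   # buckets
sample += struct.pack("<2I", 0xCCEAA1DE, 0x12345679)    # chains

for key, value in parse_gnu_hash(sample).items():
    print(key, [hex(v) for v in value] if isinstance(value, list) else value)
```

On a real library, `readelf -x .gnu.hash libpoem.so` shows the same bytes as a hex dump, which is handy for cross-checking.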
4/1/ First speedup: Hash table.
To quickly find an object in a list, use a hash table. Hash tables are a convenient way to sort and find items in a list. The hash function used in the resolver is the djbx33a one:
We can easily calculate the hash of our function name:
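A minimal Python sketch of that calculation, assuming the usual GNU-style variant of the hash (start at 5381, multiply by 33, add each byte, keep 32 bits). The name `gnu_hash` is just a label for this example.

```python
def gnu_hash(name: bytes) -> int:
    """DJB 'times 33, add' hash, truncated to 32 bits."""
    h = 5381
    for c in name:                      # iterating over bytes yields ints
        h = (h * 33 + c) & 0xFFFFFFFF   # stay within an unsigned 32-bit value
    return h

# The post reports 0xCCEAA1DF for this name.
print(hex(gnu_hash(b"this_is_an_external_func_in_a_lib")))
```

The hash only depends on the bytes of the name, so any change to the name changes the value the resolver looks for.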
We can find our hash in the .gnu.hash section: 0xcceaa1de (minus the lowest bit: chain entries use that bit as an end-of-chain marker, so it is not significant when the resolver compares hashes, although I spent too much time on this detail).
So, if you change the name of the function and its associated hash, it should work? No, not so easily. This is a hash table, so you have to land in the same bucket. Long story short, your (new_hash % nbuckets) must be equal to (old_hash % nbuckets). nbuckets equals 3 in this library. Let's work with this number:
- this_is_an_external_func_in_a_lib : hash(func)%3 = 0xCCEAA1DF%3 = 0
- AAAA_is_an_external_func_in_a_lib : hash(func)%3 = 0xEEA9C6CB%3 = 1 -> Not the same bucket, won't work
- BAAA_is_an_external_func_in_a_lib : hash(func)%3 = 0xFFE18ACC%3 = 0 -> Good.
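The bucket check in the list above can be sketched in Python. This is only an illustration: `gnu_hash` and `same_bucket` are made-up helper names, and nbuckets = 3 is the value read from this particular library.

```python
def gnu_hash(name: bytes) -> int:
    """DJB 'times 33, add' hash, truncated to 32 bits."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

def same_bucket(old: bytes, new: bytes, nbuckets: int) -> bool:
    """True when both names fall into the same hash bucket."""
    return gnu_hash(old) % nbuckets == gnu_hash(new) % nbuckets

NBUCKETS = 3
original = b"this_is_an_external_func_in_a_lib"
candidates = (
    b"AAAA_is_an_external_func_in_a_lib",
    b"BAAA_is_an_external_func_in_a_lib",
)
for name in candidates:
    h = gnu_hash(name)
    print(name.decode(), hex(h), "bucket", h % NBUCKETS,
          "ok" if same_bucket(original, name, NBUCKETS) else "wrong bucket")
```

With only 3 buckets, roughly one candidate name in three lands in the right bucket, so trying a few first letters is enough.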
So, we change the name of the function and replace the hash with 0xFFE18ACC. Will it work? Still not: there is one last change to make.
4/2/ Second speedup added: Bloom filter
Using hashes is a big speedup, but libc maintainers added another big boost: a bloom filter. Its goal is to quickly reject unknown symbols. This bloom filter is built from bits derived from the symbol hash and is used as a fast rejection step. If the bloom filter check fails, the symbol is not in the file. If it passes, the symbol may or may not be in the file. Apparently, this gives a huge speedup in symbol resolution. That's clever, but I have to change my function name.
If you want to bypass this bloom filter, you can recalculate it. Or you can set all bits to 1, which means: always pass. I'm not a programmer, I want things to work the way I want. So let's set all bits to 1 and not try to recalculate anything.
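Here is a small Python sketch of why the all-ones trick works. It assumes the GNU-style filter layout: 32-bit words on this target, and two bits per symbol, one taken from the hash and one from the hash shifted right by bloom_shift. The helper names and the value bloom_shift = 5 are arbitrary choices for the example, and the one-word filter is a toy, not the real one from the library.

```python
def bloom_may_contain(h, bloom, bloom_shift, word_bits=32):
    """False means 'definitely absent', True means 'maybe present'."""
    word = bloom[(h // word_bits) % len(bloom)]
    bit1 = h % word_bits
    bit2 = (h >> bloom_shift) % word_bits
    return bool((word >> bit1) & (word >> bit2) & 1)

def bloom_add(h, bloom, bloom_shift, word_bits=32):
    """Set the two bits belonging to hash h."""
    i = (h // word_bits) % len(bloom)
    bloom[i] |= (1 << (h % word_bits)) | (1 << ((h >> bloom_shift) % word_bits))

H_OLD = 0xCCEAA1DF  # hash the post gives for this_is_an_external_func_in_a_lib
H_NEW = 0xFFE18ACC  # hash the post gives for BAAA_is_an_external_func_in_a_lib
SHIFT = 5           # arbitrary example value

toy = [0]                                   # toy filter with a single word
bloom_add(H_OLD, toy, SHIFT)
print(bloom_may_contain(H_OLD, toy, SHIFT))            # True
print(bloom_may_contain(H_NEW, toy, SHIFT))            # False: rejected early
print(bloom_may_contain(H_NEW, [0xFFFFFFFF], SHIFT))   # True: all ones always passes
```

The cost of the all-ones filter is only speed: nothing is rejected early, so every lookup falls through to the hash table.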
And after the bloom filter change, it will work, because the linker will say:
- does it pass the bloom filter? Yes
- does it have a hash? Yes
- in the hash bucket, does a function with the same name exist? Yes
- --> Symbol resolution is done, the code is here, work your way.
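Those steps can be put together in a toy Python model. This is a sketch of the general GNU-style lookup, not glibc's code: `build_table` and `lookup` are made-up names, the table is built in memory rather than read from a library, and the bloom filter is all ones as in the post.

```python
def gnu_hash(name: bytes) -> int:
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

def build_table(names, nbuckets=3, symoffset=1, bloom_shift=5):
    """Build a toy hash table: symbols sorted by bucket, chains of hashes."""
    ordered = sorted(names, key=lambda n: gnu_hash(n) % nbuckets)
    buckets, chains = [0] * nbuckets, []
    for i, name in enumerate(ordered):
        h = gnu_hash(name)
        b = h % nbuckets
        if buckets[b] == 0:
            buckets[b] = symoffset + i            # first symbol of the bucket
        is_last = (i + 1 == len(ordered)
                   or gnu_hash(ordered[i + 1]) % nbuckets != b)
        chains.append((h & ~1) | int(is_last))    # low bit marks end of chain
    return {"nbuckets": nbuckets, "symoffset": symoffset,
            "bloom": [0xFFFFFFFF], "bloom_shift": bloom_shift,
            "buckets": buckets, "chains": chains,
            "symnames": [b""] * symoffset + ordered}

def lookup(name, t, word_bits=32):
    """Return the symbol index of name, or None if it is not found."""
    h = gnu_hash(name)
    # 1. Bloom filter: fast rejection.
    word = t["bloom"][(h // word_bits) % len(t["bloom"])]
    bit1, bit2 = h % word_bits, (h >> t["bloom_shift"]) % word_bits
    if not (word >> bit1) & (word >> bit2) & 1:
        return None
    # 2. Bucket: where does the chain for this hash start?
    idx = t["buckets"][h % t["nbuckets"]]
    if idx == 0:
        return None
    # 3. Chain: compare hashes (ignoring the low bit), then the name itself.
    while True:
        stored = t["chains"][idx - t["symoffset"]]
        if (stored | 1) == (h | 1) and t["symnames"][idx] == name:
            return idx
        if stored & 1:                            # end of chain, no match
            return None
        idx += 1

table = build_table([b"exit", b"printf", b"BAAA_is_an_external_func_in_a_lib"])
print(lookup(b"BAAA_is_an_external_func_in_a_lib", table))
print(lookup(b"AAAA_is_an_external_func_in_a_lib", table))
```

The second lookup fails even though the bloom filter lets it through, because no entry in its bucket has a matching hash and name.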
4/3/ First win:
We have to change the function name: easy, we use BAAA_is_an_external_func_in_a_lib
We have to break the bloom filter: easy, set all bits to 1
We have to change the hash value: easy, just take care of the bucket.
After some hex editing (all bytes have been changed by hand):
Look at the bloom filter (all bits are 1) and the hash change.
And now, it works like a charm!
As you can see, I'm calling the function BAAA_is_an_external_func_in_a_lib(), and it works.
We know how to change a function name inside a binary and its lib without breaking anything!
5/ Now the fun part!
OK, let's write a quick Python patcher, called patch.py.
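The renaming step of such a patcher could look like the following Python sketch. This is a hypothetical illustration, not the patch.py from this post: `rename_symbol` is a made-up name, it works on bytes already read into memory, and it only handles the string itself.

```python
def rename_symbol(blob: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the NUL-terminated string old with new, byte for byte."""
    if len(old) != len(new):
        # A different length would shift every offset after the string.
        raise ValueError("names must have the same length")
    if b"\x00" in new:
        raise ValueError("a symbol name cannot contain a NUL byte")
    needle = old + b"\x00"
    if needle not in blob:
        raise ValueError("symbol name not found")
    return blob.replace(needle, new + b"\x00")

# Tiny stand-in for a string table, just to show the effect.
strtab = b"\x00libpoem.so\x00this_is_an_external_func_in_a_lib\x00"
patched = rename_symbol(strtab,
                        b"this_is_an_external_func_in_a_lib",
                        b"BAAA_is_an_external_func_in_a_lib")
print(patched)
```

The same call would be applied to both the binary and libpoem.so. In the library, the hash entry and the bloom filter still have to be patched as described in section 4.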
You can use anything in the range \x01-\xff for a function name. Changing a character in a function name is not fun. We can be good boyz (or girlz) and use internationalization. Write UTF-8, and be happy with it. But do you know that your xterm interprets escape sequences? \e]34; will print everything in black. Let's write black on black and confuse reversers.
5/1/ Fun with ANSI escape code
We can use a function name containing ANSI escape codes. ANSI escape codes can be used to send a BEEP, blink characters, change the xterm title, change colors, and so on. Here is the fun part, where we change the xterm title when printing the function name:
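As a sketch of what such a name can look like, the Python snippet below builds one with the standard xterm "set title" sequence (ESC ] 0 ; text BEL). The title "pwned" and the visible part "calling" are placeholders, not the strings used in this post.

```python
ESC, BEL = "\x1b", "\x07"

def title_changing_name(title: str, visible: str) -> str:
    """Prefix a visible name with the xterm 'set window title' sequence."""
    return f"{ESC}]0;{title}{BEL}{visible}"

name = title_changing_name("pwned", "calling")
print(repr(name))        # safe to look at: repr() escapes the control bytes
# print(name)            # uncomment in an xterm: the window title changes

# Every byte is in the \x01-\xff range, so it is usable as a symbol name.
print(all(b != 0 for b in name.encode()))
```

Any tool that prints the raw symbol name to a terminal sends these bytes straight to the terminal emulator.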
"Little known fact: An evil binary can rename your xterm while being debugged. Blogpost incoming." (Mitsurugi Heishiro, @0xmitsurugi, 16 February 2018)
Hours of fun: disassembly printed black on black, different functions with same name, and so on. pic.twitter.com/xVTV1ojuWS
Fun, but can we do better? ANSI escape codes can move the cursor backward.
So, we can overwrite the function name:
Three functions with the same name?!?! Which one is the right one? You can spend a lot of time on this crackme with static analysis only.
Those functions are different. Their names are A\x1b[1Dcalling, B\x1b[1Dcalling and C\x1b[1Dcalling. The \x1b[1D sequence moves the cursor back by 1 char, so the next char overwrites the first one.
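To see the effect without a terminal, here is a tiny Python model of what the terminal does with these names. It is a deliberately minimal sketch: `render` is a made-up helper that only understands printable characters and the cursor-backward sequence ESC [ n D.

```python
import re

CURSOR_BACK = re.compile(r"\x1b\[(\d*)D")

def render(name: str) -> str:
    """Return what a terminal would show for name (ESC[nD only)."""
    line, col, i = [], 0, 0
    while i < len(name):
        m = CURSOR_BACK.match(name, i)
        if m:
            # Move the cursor left by n columns (default 1), not past column 0.
            col = max(0, col - int(m.group(1) or 1))
            i = m.end()
            continue
        if col < len(line):
            line[col] = name[i]        # overwrite what is already there
        else:
            line.append(name[i])
        col += 1
        i += 1
    return "".join(line)

for raw in ("A\x1b[1Dcalling", "B\x1b[1Dcalling", "C\x1b[1Dcalling"):
    print(repr(raw), "->", render(raw))    # all three show up as 'calling'
```

Three distinct byte strings, one identical name on screen.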
5/2/ Fun with IDA
You can play with IDA. IDA doesn't recognize these characters and replaces them with _. How in the world would you debug a binary calling functions ____() and ____() and ____() which are different?
I think there is a lot of room for improvement here, and I'll try to write another blogpost with funny sequences.
6/ The End
I think this blogpost is waaaaay too long, so I'll finish it here. The code will be posted to GitHub; it's just a Python script which patches addresses in a binary.
Today not possible. Tomorrow possible.