Last night, I was looking through C code and ARM assembly.
I was wondering: when a binary calls a function inside a shared lib, how does the linker know where the code resides in the library?
The second question was: can we change the name of a function in both a binary and a library and still have everything work afterwards?
And third: can we use some fancy characters in function names? Like changing the color of the xterm when running objdump or gdb on the binary? You know ANSI escape codes? What if we put ANSI escape codes in a function name?
1/ Start smoothly
My computer is currently a Raspberry Pi. Everything here has been tested on this architecture. It should work everywhere, but your mileage may vary.
Let's take an example:
mitsurugi@raspi:~/resolv_func/$ cat libpoem.h
int this_is_an_external_func_in_a_lib();
mitsurugi@raspi:~/resolv_func/$ cat libpoem.c
/* Compile with gcc -shared -o libpoem.so libpoem.c */
#include <stdio.h>
int this_is_an_external_func_in_a_lib() {
puts("ARM disassembly"); //5
puts("Reading symbol resolving"); //7
puts("In the cold of night"); //5
return 42;
}
mitsurugi@raspi:~/resolv_func/$ cat proj.c
/* gcc -o proj -Wl,-rpath=. -L. -I. -l poem proj.c */
#include "libpoem.h"
int main() {
int ret;
ret=this_is_an_external_func_in_a_lib();
return ret;
}
mitsurugi@raspi:~/resolv_func/$
We can compile and run this binary:
mitsurugi@raspi:~/resolv_func/blog$ gcc -shared -o libpoem.so libpoem.c
mitsurugi@raspi:~/resolv_func/blog$ gcc -o proj -Wl,-rpath=. -L. -I. -l poem proj.c
mitsurugi@raspi:~/resolv_func/blog$ ldd proj
linux-vdso.so.1 (0x7efd9000)
libpoem.so => ./libpoem.so (0x76f33000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76e30000)
/lib/ld-linux-armhf.so.3 (0x76f57000)
mitsurugi@raspi:~/resolv_func/blog$ ./proj
ARM disassembly
Reading symbol resolving
In the cold of night
mitsurugi@raspi:~/resolv_func/blog$
The dynamic linker searches for the lib in the current directory because of the -rpath=. option (which is not really secure, but out of the scope of this blogpost).
This binary runs fine, as expected.
2/ Symbol resolution
The question is, how does the binary know where to look for the this_is_an_external_func_in_a_lib() call? It's obviously related to string comparison:
mitsurugi@raspi:~/resolv_func/blog$ strings proj libpoem.so | grep external
this_is_an_external_func_in_a_lib
this_is_an_external_func_in_a_lib
this_is_an_external_func_in_a_lib
this_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$
Well, if we have the string this_is_an_external_func_in_a_lib in both the binary and the library, maybe it's because they are associated?
Proof: if you alter one of these strings, the program doesn't work anymore:
mitsurugi@raspi:~/resolv_func/blog$ sed s/this_is_an_external_func_in_a_lib/AAAA_is_an_external_func_in_a_lib/g proj > proj2
mitsurugi@raspi:~/resolv_func/blog$ chmod +x proj2
mitsurugi@raspi:~/resolv_func/blog$ ldd proj2
linux-vdso.so.1 (0x7ed45000)
libpoem.so => ./libpoem.so (0x76eed000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76dea000)
/lib/ld-linux-armhf.so.3 (0x76f11000)
mitsurugi@raspi:~/resolv_func/blog$ ./proj2
./proj2: symbol lookup error: ./proj2: undefined symbol: AAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$
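The sed command above can rewrite the name in place because both names have exactly the same length, so no offset inside the ELF file moves. Here is a minimal Python sketch of the same byte-for-byte rename, with that length check made explicit. The function name rename_symbol is an assumption made up for this example, not an existing tool.

```python
def rename_symbol(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of old with new without shifting any byte."""
    if len(old) != len(new):
        raise ValueError("names must have the same length to keep file offsets intact")
    if b"\x00" in new:
        raise ValueError("a NUL byte would terminate the name early")
    return data.replace(old, new)


if __name__ == "__main__":
    # A tiny fake string table, only to exercise the function.
    blob = b"\x00this_is_an_external_func_in_a_lib\x00puts\x00"
    patched = rename_symbol(
        blob,
        b"this_is_an_external_func_in_a_lib",
        b"AAAA_is_an_external_func_in_a_lib",
    )
    print(len(blob) == len(patched))
```

Like the sed version, this only changes the strings. It does nothing about the hash data discussed further down.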
The same happens if you change the function in the library:
mitsurugi@raspi:~/resolv_func/blog$ mv libpoem.so libpoem.so.ori
mitsurugi@raspi:~/resolv_func/blog$ sed s/this_is_an_external_func_in_a_lib/AAAA_is_an_external_func_in_a_lib/g libpoem.so.ori > libpoem.so
mitsurugi@raspi:~/resolv_func/blog$ chmod +x libpoem.so
mitsurugi@raspi:~/resolv_func/blog$ ldd proj   # this is the unaltered binary
linux-vdso.so.1 (0x7ea81000)
libpoem.so => ./libpoem.so (0x76ede000)
libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76ddb000)
/lib/ld-linux-armhf.so.3 (0x76f02000)
mitsurugi@raspi:~/resolv_func/blog$ ./proj
./proj: symbol lookup error: ./proj: undefined symbol: this_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$
Seems logical: it searches for a function by its name.
But wait, what if we change the name in BOTH? Would it work? The binary asks for AAAA_is_an_external_func_in_a_lib(), the linker steps through all linked libraries, finds libpoem.so, opens it, reads the function names, finds the right one and calls it. Does it work?
mitsurugi@raspi:~/resolv_func/blog$ ./proj2
./proj2: symbol lookup error: ./proj2: undefined symbol: AAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$
Still a fail, although we have the same name in the library and the binary:
mitsurugi@raspi:~/resolv_func/blog$ strings proj2 libpoem.so | grep external_func
AAAA_is_an_external_func_in_a_lib
AAAA_is_an_external_func_in_a_lib
AAAA_is_an_external_func_in_a_lib
AAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$
3/ Read The Freaky Manual (If it exists...)
When you are searching for something, you can read the manual. But in this case it won't help, because there is no manual.
When you google for symbol resolution, you end up with a lot of blog posts talking about PLT/GOT stuff. Very interesting (yes, read them, they are very valuable), but there is still magic in those blog posts. (In French:
https://www.segmentationfault.fr/linux/role-plt-got-ld-so/ ).
And how do those blog posts explain how the resolution is done?
The post linked above just says: "it's a long and complicated code, but in the end, you get the address". I don't like magic in computing.
4/ No magic. Just show me.
Here are the main links which could help you:
I'll try to summarize things. First, we have a hash section in ELF files:
mitsurugi@raspi:~/resolv_func/blog$ readelf -x .gnu.hash libpoem.so
Hex dump of section '.gnu.hash':
0x00000118 03000000 08000000 02000000 06000000 ................
0x00000128 890020b1 00c44289 08000000 0c000000 .. ...B.........
0x00000138 0f000000 00af34e8 4245d5ec dea1eacc ......4.BE......
0x00000148 bbe3927c beda571b d871581c b98df10e ...|..W..qX.....
0x00000158 76543c94 ead3ef0e 59ef9779 vT<.....Y..y
mitsurugi@raspi:~/resolv_func/blog$
This section contains a header, a bloom filter, hash buckets and hash values. Libc developers want binaries to start fast. A naive resolver would have to step through every symbol and do a strcmp on each one, which is slow. So the developers added several improvements.
I wrote a parser for .gnu.hash sections (each value is displayed as little endian first, then as big endian):
mitsurugi@raspi:~/resolv_func/blog$ ./hashparse.py libpoem.so
*** Get GNU HASH section for libpoem.so
[+] Ok, one line. Good
[+] GNU HASH mapping fits perfectly disk and memory layout
starting at 0x00000118
and size is 0x00004c long
*** Extracting .gnu.hash
*** Parsing...
[+] Header
3 hash buckets //we'll use this number later
8 symndx
2 bloom masks
6 bloomshift (minimum 6)
[+] Part 2 - bloom masks
Mask 0 : 0xb1200089L | 0x890020b1L
Mask 1 : 0x8942c400L | 0xc44289
[+] Part 3 - N Buckets of hash
Bucket 0 : 0x8 | 0x8000000
Bucket 1 : 0xc | 0xc000000
Bucket 2 : 0xf | 0xf000000
[+] Part 4 - Hashes
Hash 0 : 0xe834af00L | 0xaf34e8
Hash 1 : 0xecd54542L | 0x4245d5ec
Hash 2 : 0xcceaa1deL | 0xdea1eaccL //pay attention to this hash
Hash 3 : 0x7c92e3bb | 0xbbe3927cL
Hash 4 : 0x1b57dabe | 0xbeda571bL
Hash 5 : 0x1c5871d8 | 0xd871581cL
Hash 6 : 0xef18db9 | 0xb98df10eL
Hash 7 : 0x943c5476L | 0x76543c94
Hash 8 : 0xeefd3ea | 0xead3ef0eL
Hash 9 : 0x7997ef59 | 0x59ef9779
mitsurugi@raspi:~/resolv_func/blog$
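The hashparse.py script itself is not shown here, so what follows is only a rough sketch of how such a parser could look, not the author's code. It assumes the raw section bytes have already been extracted, that the file is little endian, and that bloom words are 4 bytes wide as on 32-bit ARM (they are 8 bytes on 64-bit targets). The names parse_gnu_hash and DUMP are invented for this example. DUMP holds the bytes from the readelf output above.

```python
import struct

# Bytes of the .gnu.hash section, copied from the readelf hex dump above.
DUMP = bytes.fromhex(
    "03000000 08000000 02000000 06000000"
    "890020b1 00c44289 08000000 0c000000"
    "0f000000 00af34e8 4245d5ec dea1eacc"
    "bbe3927c beda571b d871581c b98df10e"
    "76543c94 ead3ef0e 59ef9779"
)


def parse_gnu_hash(section: bytes, bloom_word_size: int = 4) -> dict:
    """Split a raw little-endian .gnu.hash section into its four parts."""
    # Part 1: header, four 32-bit words.
    nbuckets, symoffset, bloom_size, bloom_shift = struct.unpack_from("<4I", section, 0)
    pos = 16
    # Part 2: bloom filter words (native word size of the ELF class).
    word_fmt = "I" if bloom_word_size == 4 else "Q"
    bloom = list(struct.unpack_from(f"<{bloom_size}{word_fmt}", section, pos))
    pos += bloom_size * bloom_word_size
    # Part 3: one 32-bit symbol index per bucket.
    buckets = list(struct.unpack_from(f"<{nbuckets}I", section, pos))
    pos += nbuckets * 4
    # Part 4: everything left is the chain of 32-bit hash values.
    nchain = (len(section) - pos) // 4
    chain = list(struct.unpack_from(f"<{nchain}I", section, pos))
    return {
        "nbuckets": nbuckets,
        "symoffset": symoffset,
        "bloom_shift": bloom_shift,
        "bloom": bloom,
        "buckets": buckets,
        "chain": chain,
    }


if __name__ == "__main__":
    table = parse_gnu_hash(DUMP)
    print(table["nbuckets"], "buckets,", len(table["chain"]), "hashes")
    for i, value in enumerate(table["chain"]):
        print(f"Hash {i} : 0x{value:08x}")
```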
4/1/ First speedup: Hash table.
To quickly find an object in a list, use a hash table. Hash tables are a convenient way to organize and find items. The hash function used by the resolver is the djbx33a one:
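#include <stdint.h>

/* GNU symbol hash: start from 5381, then h = h * 33 + c for each byte. */
static uint_fast32_t
dl_new_hash (const char *s)
{
  uint_fast32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  /* uint_fast32_t may be wider than 32 bits, so truncate the result. */
  return h & 0xffffffff;
}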
We can easily calculate the hash of our function:
mitsurugi@raspi:~/resolv_func/blog$ ./dl_new_hash.py this_is_an_external_func_in_a_lib
[+] Calculating hash for this_is_an_external_func_in_a_lib
Output is 0xCCEAA1DF
mitsurugi@raspi:~/resolv_func/blog$
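The dl_new_hash.py script is not listed in this post. A straightforward Python port of the C function above could look like the sketch below. The command-line handling is an assumption about how such a script might be called, not the original code.

```python
import sys


def dl_new_hash(name: bytes) -> int:
    """Python port of dl_new_hash: h = h * 33 + c, truncated to 32 bits."""
    h = 5381
    for byte in name:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(f"[+] Calculating hash for {arg}")
        print(f"Output is 0x{dl_new_hash(arg.encode()):08X}")
```

The function takes bytes rather than str on purpose: symbol names are plain byte strings, which matters later when the names contain escape characters.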
We can find our hash in the .gnu.hash section: 0xcceaa1de. The lowest bit differs because the table uses it as an end-of-chain marker, so the resolver ignores it when comparing hashes (I spent too much time on this detail).
So, if you change the name of the function and its associated hash, it should work? No, not so easily. This is a hash table, so the new name has to land in the same bucket. Long story short, (new_hash % nbuckets) must be equal to (old_hash % nbuckets). nbuckets equals 3 in this library. Let's work with this number:
- this_is_an_external_func_in_a_lib : hash(func)%3 = 0xCCEAA1DF%3 = 0
- AAAA_is_an_external_func_in_a_lib : hash(func)%3 = 0xEEA9C6CB%3 = 1 -> Not the same bucket, won't work
- BAAA_is_an_external_func_in_a_lib : hash(func)%3 = 0xFFE18ACC%3 = 0 -> Good.
So, we change the name of the function, and replace the hash with 0xFFE18ACC. Will it work? Still not, there is one last change to make.
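The bucket check above is easy to automate. The sketch below filters a list of candidate names and keeps those that fall into the same bucket as the original one. The helper names (bucket_of, same_bucket_candidates) and the way candidates are generated (one leading letter followed by AAA) are choices made for this illustration only.

```python
def dl_new_hash(name: bytes) -> int:
    """GNU symbol hash: h = h * 33 + c, truncated to 32 bits."""
    h = 5381
    for byte in name:
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h


def bucket_of(hash_value: int, nbuckets: int) -> int:
    """Bucket index used by the .gnu.hash table."""
    return hash_value % nbuckets


def same_bucket_candidates(old_name: bytes, candidates, nbuckets: int) -> list:
    """Keep only the candidates that land in the same bucket as old_name."""
    target = bucket_of(dl_new_hash(old_name), nbuckets)
    return [c for c in candidates if bucket_of(dl_new_hash(c), nbuckets) == target]


if __name__ == "__main__":
    old = b"this_is_an_external_func_in_a_lib"
    # Same length as the original name: one letter, then "AAA", then the tail.
    names = [bytes([letter]) + b"AAA" + old[4:] for letter in range(ord("A"), ord("Z") + 1)]
    for name in same_bucket_candidates(old, names, nbuckets=3):
        print(name.decode(), f"0x{dl_new_hash(name):08X}")
```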
4/2/ Second speedup added: Bloom filter
Using hashes is a big speedup, but the libc maintainers added another big boost: a bloom filter. Its goal is to quickly reject unknown symbols. The filter is a small bit array indexed by bits derived from the symbol hash. If the bloom filter check fails, the symbol is definitely not in the file. If it passes, the symbol may or may not be in the file. Apparently, this gives a huge speedup in symbol resolution. That's clever, but I want to change my function name.
If you want to get past this bloom filter, you can recalculate it. Or you can set all bits to 1, which means: always pass. I'm not a programmer, I want things to work the way I want. So let's set all bits to 1, and not try to recalculate anything.
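For the curious, here is roughly what the "recalculate it" option would involve. This is a sketch based on my understanding of the GNU hash layout, not code taken from glibc: each symbol sets two bits in one filter word, one chosen by the low bits of the hash and one by the hash shifted right by bloomshift. The function names are invented, and 32-bit filter words are assumed, as on this Raspberry Pi.

```python
def bloom_bits(h: int, bloom_size: int, bloom_shift: int, word_bits: int = 32):
    """Return (word index, bit mask) for the two bits a hash occupies."""
    word = (h // word_bits) % bloom_size
    mask = (1 << (h % word_bits)) | (1 << ((h >> bloom_shift) % word_bits))
    return word, mask


def bloom_may_contain(bloom, h: int, bloom_shift: int, word_bits: int = 32) -> bool:
    """False means 'definitely absent', True means 'maybe present'."""
    word, mask = bloom_bits(h, len(bloom), bloom_shift, word_bits)
    return (bloom[word] & mask) == mask


def bloom_add(bloom, h: int, bloom_shift: int, word_bits: int = 32) -> list:
    """Return a copy of the filter with the two bits for h set."""
    word, mask = bloom_bits(h, len(bloom), bloom_shift, word_bits)
    updated = list(bloom)
    updated[word] |= mask
    return updated


if __name__ == "__main__":
    masks = [0xB1200089, 0x8942C400]  # bloom masks from the parser output above
    print(bloom_may_contain(masks, 0xCCEAA1DF, 6))  # original function hash
    print(bloom_may_contain(masks, 0xFFE18ACC, 6))  # hash of the renamed function
    print([hex(m) for m in bloom_add(masks, 0xFFE18ACC, 6)])
```

Setting every bit to 1 is the lazy version of bloom_add: the check can then never reject anything, at the cost of losing the speedup for this library.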
And after the bloom filter change, it will work, because the linker will ask:
- does it pass the bloom filter? Yes
- does it have a hash? Yes
- in the hash bucket, does a function with the same name exist? Yes
--> Symbol resolution is done, the code is here, work your way.
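The steps above can be sketched as a small lookup function. This is a simplified model of what the dynamic linker does, written for this post and not copied from glibc: it stops at the first matching hash, whereas the real resolver then compares the symbol name with strcmp and keeps walking the chain if the names differ. TABLE holds the values parsed from libpoem.so earlier. The names gnu_hash_lookup and TABLE are made up for the example.

```python
# Values of the .gnu.hash section of libpoem.so, as parsed earlier in this post.
TABLE = {
    "symoffset": 8,
    "bloom_shift": 6,
    "bloom": [0xB1200089, 0x8942C400],
    "buckets": [8, 12, 15],
    "chain": [0xE834AF00, 0xECD54542, 0xCCEAA1DE, 0x7C92E3BB, 0x1B57DABE,
              0x1C5871D8, 0x0EF18DB9, 0x943C5476, 0x0EEFD3EA, 0x7997EF59],
}


def gnu_hash_lookup(table: dict, h: int, word_bits: int = 32):
    """Return the index of the first symbol whose stored hash matches h, or None."""
    # Step 1: bloom filter, two bits must be set in the selected word.
    bloom = table["bloom"]
    word = bloom[(h // word_bits) % len(bloom)]
    bit1 = h % word_bits
    bit2 = (h >> table["bloom_shift"]) % word_bits
    if not (word >> bit1) & (word >> bit2) & 1:
        return None
    # Step 2: the bucket gives the first symbol index of the chain.
    sym = table["buckets"][h % len(table["buckets"])]
    if sym == 0:
        return None  # empty bucket
    # Step 3: walk the chain, ignoring the low bit (end-of-chain marker).
    while True:
        entry = table["chain"][sym - table["symoffset"]]
        if (entry | 1) == (h | 1):
            return sym  # the real linker would now strcmp the names
        if entry & 1:
            return None  # last entry of this bucket, no match
        sym += 1


if __name__ == "__main__":
    print(gnu_hash_lookup(TABLE, 0xCCEAA1DF))  # original name
    print(gnu_hash_lookup(TABLE, 0xEEA9C6CB))  # AAAA_... name
```

With this model you can replay each failed attempt from the previous sections: change only the hash, only the bloom words, or pick a hash from the wrong bucket, and see at which step the lookup gives up.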
4/3/ First win:
We have to change the function name: Easy, we use BAAA_is_an_external_func_in_a_lib
We have to break the bloom filter: Easy, set all bits to 1
We have to change the hash value: Easy, just take care of the bucket.
After some hex editing (all bytes have been changed by hand):
mitsurugi@raspi:~/resolv_func/blog$ readelf -x .gnu.hash libpoem.so
Hex dump of section '.gnu.hash':
0x00000118 03000000 08000000 02000000 06000000 ................
0x00000128 ffffffff ffffffff 08000000 0c000000 ................
0x00000138 0f000000 00af34e8 4245d5ec cc8ae1ff ......4.BE......
0x00000148 bbe3927c beda571b d871581c b98df10e ...|..W..qX.....
0x00000158 76543c94 ead3ef0e 59ef9779 vT<.....Y..y
mitsurugi@raspi:~/resolv_func/blog$
Look at the bloom filter (all bits are set to 1) and at the changed hash.
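The bytes above were changed by hand, but the same edit is easy to script. The sketch below works on the raw bytes of the section and is only an illustration: patch_gnu_hash is a made-up name, 32-bit little-endian bloom words are assumed, and for simplicity the whole chain array is searched for the old hash instead of only its bucket. ORIGINAL holds the bytes of the first hex dump in this post.

```python
import struct

# .gnu.hash section of the unmodified libpoem.so, from the first hex dump.
ORIGINAL = bytes.fromhex(
    "03000000 08000000 02000000 06000000"
    "890020b1 00c44289 08000000 0c000000"
    "0f000000 00af34e8 4245d5ec dea1eacc"
    "bbe3927c beda571b d871581c b98df10e"
    "76543c94 ead3ef0e 59ef9779"
)


def patch_gnu_hash(section: bytes, old_hash: int, new_hash: int,
                   bloom_word_size: int = 4) -> bytes:
    """Set every bloom bit to 1 and swap old_hash for new_hash in the chain."""
    nbuckets, _symoffset, bloom_size, _shift = struct.unpack_from("<4I", section, 0)
    if old_hash % nbuckets != new_hash % nbuckets:
        raise ValueError("new hash falls into a different bucket")
    out = bytearray(section)
    # The bloom words start right after the 16-byte header.
    bloom_end = 16 + bloom_size * bloom_word_size
    out[16:bloom_end] = b"\xff" * (bloom_end - 16)
    # The chain array starts after the buckets.
    chain_start = bloom_end + nbuckets * 4
    for pos in range(chain_start, len(out) - 3, 4):
        (entry,) = struct.unpack_from("<I", out, pos)
        if (entry | 1) == (old_hash | 1):
            # Keep the stored low bit: it marks the end of a chain.
            patched = (new_hash & 0xFFFFFFFE) | (entry & 1)
            struct.pack_into("<I", out, pos, patched)
            return bytes(out)
    raise ValueError("old hash not found in the chain array")


if __name__ == "__main__":
    print(patch_gnu_hash(ORIGINAL, 0xCCEAA1DF, 0xFFE18ACC).hex(" ", 4))
```

Writing the result back into libpoem.so at the section's file offset (0x118 here) is left out. The function names in the string tables still have to be renamed separately.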
And now, it works like a charm!
mitsurugi@raspi:~/resolv_func/blog$ ./proj2
ARM disassembly
Reading symbol resolving
In the cold of night
mitsurugi@raspi:~/resolv_func/blog$ gdb -q proj2
Reading symbols from proj2...(no debugging symbols found)...done.
gdb$ disass main
Dump of assembler code for function main:
0x000006ac <+0>: push {r7, lr}
0x000006ae <+2>: sub sp, #8
0x000006b0 <+4>: add r7, sp, #0
0x000006b2 <+6>: blx 0x584 <BAAA_is_an_external_func_in_a_lib@plt>
0x000006b6 <+10>: str r0, [r7, #4]
0x000006b8 <+12>: ldr r3, [r7, #4]
0x000006ba <+14>: mov r0, r3
0x000006bc <+16>: adds r7, #8
0x000006be <+18>: mov sp, r7
0x000006c0 <+20>: pop {r7, pc}
End of assembler dump.
gdb$
As you can see, I'm calling the function BAAA_is_an_external_func_in_a_lib(), and it works.
mitsurugi@raspi:~/resolv_func/blog$ strings proj2 libpoem.so | grep BAAA
BAAA_is_an_external_func_in_a_lib
BAAA_is_an_external_func_in_a_lib
BAAA_is_an_external_func_in_a_lib
BAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$
We know how to change a function name inside a binary and its lib without breaking anything!
5/ Now the fun part!
Ok, let's write a quick Python patcher, called patch.py.
You can use any byte in the range \x01-\xff in a function name. Changing a single character in a function name is not fun. We can be good boyz (or girlz) and use internationalization: write UTF-8, and be happy with it. But did you know that your xterm interprets escape sequences? \e[30m switches the text color to black. Let's write black on black and confuse reversers.
5/1/ Fun with ANSI escape code
We can use a function name containing ANSI escape codes. ANSI escape codes can be used to send a BEEP, blink characters, change the xterm title, change colors, and so on. Here is the fun part, where we change the xterm title when printing the function:
Fun, but can we do better? ANSI escape codes can also move the cursor backward.
So, we can overwrite the function name:
Reading symbols from crack...(no debugging symbols found)...done.
(gdb) disass main
Dump of assembler code for function main:
0x00000688 <+0>: push {r7, lr}
0x0000068a <+2>: add r7, sp, #0
0x0000068c <+4>: blx 0x53c <calling@plt>
0x00000690 <+8>: movs r3, #0
0x00000692 <+10>: mov r0, r3
0x00000694 <+12>: pop {r7, pc}
End of assembler dump.
(gdb) q
mitsurugi@raspi:~/resolv_func/blog$
and, the library says:
mitsurugi@raspi:~/resolv_func/blog$ nm libcrack.so | grep ' T '
000004fc T calling
0000050a T calling
00000518 T calling
00000528 T _fini
000003fc T _init
mitsurugi@raspi:~/resolv_func/blog$
Three functions with the same name?!?! Which one is the right one? You can spend a lot of time on this crackme with static analysis only.
Those functions are different. Their names are A\x1b[1Dcalling, B\x1b[1Dcalling and C\x1b[1Dcalling. The \x1b[1D sequence moves the cursor back by one character, so the text that follows overwrites the first character.
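To see why the three names look identical, here is a toy model of what the terminal does with them. It is a deliberately tiny sketch that only understands the "cursor back" sequence (ESC [ n D) and treats everything else as a printable character. The render function is invented for this example and is not part of any tool used above.

```python
import re

# ESC [ <n> D moves the cursor n columns to the left (default 1).
CURSOR_BACK = re.compile(r"\x1b\[(\d*)D")


def render(name: str) -> str:
    """Simulate how a terminal line looks after printing name."""
    screen = []
    cursor = 0
    pos = 0
    while pos < len(name):
        match = CURSOR_BACK.match(name, pos)
        if match:
            steps = max(1, int(match.group(1) or "1"))
            cursor = max(0, cursor - steps)
            pos = match.end()
            continue
        # Ordinary character: overwrite the cell under the cursor or append.
        if cursor < len(screen):
            screen[cursor] = name[pos]
        else:
            screen.append(name[pos])
        cursor += 1
        pos += 1
    return "".join(screen)


if __name__ == "__main__":
    for name in ("A\x1b[1Dcalling", "B\x1b[1Dcalling", "C\x1b[1Dcalling"):
        # repr() shows the raw bytes, render() shows what the xterm displays.
        print(repr(name), "->", render(name))
```

Printing names with repr(), or piping tool output through something that escapes control characters, is the obvious countermeasure on the reverser's side.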
5/2/ Fun with IDA
You can play with IDA. IDA doesn't recognize these characters and replaces them with _. How in the world would you debug a binary calling functions ____() and ____() and ____() which are all different?
I think there is a lot of room for improvement here, I'll try to write another blogpost with funny sequences.
6/ The End
I think this blogpost is waaaaay too long, so I'll finish it here. The code will be posted to GitHub, it's just a Python script which patches addresses in the binary.
Today not possible. Tomorrow possible.
0xMitsurugi