- http://0x90909090.blogspot.fr/2018/01/solving-ctf-chall-easylazy-way-fic2018.html
- http://0x90909090.blogspot.fr/2018/01/solving-ctf-chall-hardgood-way-fic2018.html
- http://0x90909090.blogspot.fr/2018/01/solving-ctf-chall-crazyomg-way-fic2018.html
This is a more academic blogpost where I'll try to explain how to understand the logic behind the VM and the crackme.
1/ A bit of technique first.
Basically, when you implement a VM, you have to create a virtual CPU. This virtual CPU has its own registers, memory and CPU flags. It fetches, decodes and executes instructions. Instructions are sequences of bits (for simplicity, imagine a byte), and an instruction can take 0 to N arguments.
If, in pseudo code, we want to compute 13 xor 37, we can imagine this sequence of instructions:
- PUT 13 in register (say, R1)
- PUT 37 in another register (say, R2)
- XOR R1 with R2
After that, it's just encoding. If PUT is encoded as 0x42, registers by their numbers, and XOR as 0xff, the logical sequence becomes:
- 0x42 0x13 0x1
- 0x42 0x37 0x2
- 0xff 0x1 0x2
Easy. It's just a matter of conventions. The program is: 0x421301423702ff0102
And the CPU works through it like this. The instruction pointer starts at offset 0x00.
- Fetch: 0x42
- Decode: that's a PUT. It takes 2 arguments: a value, then a register.
- Execute: move the value at (instruction pointer + 1) into the register numbered at (instruction pointer + 2), then increase the instruction pointer by 3.
Next:
- Fetch 0x42
- and so on..
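This fetch-decode-execute cycle fits in a few lines of Python. What follows is a toy sketch of the imaginary CPU from the example, not of the crackme's VM. The opcodes 0x42 (PUT) and 0xff (XOR) come from the text. Storing the XOR result in the first register is an assumption, because the example leaves it open.

```python
PUT, XOR = 0x42, 0xFF  # opcodes from the example encoding

def run(program: bytes) -> dict:
    regs = {}   # register number -> value
    ip = 0      # instruction pointer, starts at offset 0x00
    while ip < len(program):
        opcode = program[ip]                          # fetch
        if opcode == PUT:                             # decode: PUT value, register
            value, reg = program[ip + 1], program[ip + 2]
            regs[reg] = value                         # execute
            ip += 3
        elif opcode == XOR:                           # decode: XOR reg_a, reg_b
            reg_a, reg_b = program[ip + 1], program[ip + 2]
            regs[reg_a] ^= regs[reg_b]                # assumption: result lands in reg_a
            ip += 3
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at offset {ip:#x}")
    return regs

regs = run(bytes.fromhex("421301423702ff0102"))
print({reg: hex(val) for reg, val in regs.items()})   # {1: '0x24', 2: '0x37'}
```

R1 ends up holding 0x13 ^ 0x37 = 0x24. Breaking a real VM means recovering the equivalent of this function from the binary, without the comments.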
So if you want to break a VM, you have to learn where the instruction pointer is, where the registers are stored, and how to decode the instructions. In the previous example, you have to figure out that 0x42 is a PUT. How? That's the difficulty.
Now, back to our crackme. It is a VM: the program emulates a CPU, so we have to find a big loop, the fetch-decode-execute loop.
Once it's found, you'll know where the instruction pointer is and where the instructions are.
Next, you'll have to understand the instructions. Once that's done, it gets even easier: understand the program logic, break it, solve the chall, gain points.
2/ Find where things take place
Take time to read the assembly, and follow the numbered steps 1, 2, 3...
mitsurugi@dojo:~/chall/FIC2018/v24$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb$ disass main
Dump of assembler code for function main:
   0x0000000000400530 <+0>: push %rbx
   0x0000000000400531 <+1>: mov $0x1000,%edi
   0x0000000000400536 <+6>: callq 0x400510 <malloc@plt>
   0x000000000040053b <+11>: or $0xffffffff,%edx
   0x000000000040053e <+14>: test %rax,%rax
   0x0000000000400541 <+17>: mov %rax,0x2023d0(%rip) # 0x602918 <stack>
   0x0000000000400548 <+24>: je 0x40059a <main+106>
//Here is a big loop. The fetch-decode-execute one, probably.
//We read something at 0x602914 (regs+20)
1. 0x000000000040054a <+26>: movslq 0x2023c3(%rip),%rax # 0x602914 <regs+20>
   0x0000000000400551 <+33>: cmpb $0xee,0x601440(%rax) //if equal to 0xee, goto end
   0x0000000000400558 <+40>: je 0x40058c <main+92>
3. 0x000000000040055a <+42>: mov $0x602800,%ebx //ebx is incremented by 0x10 at each iteration
   0x000000000040055f <+47>: mov (%rbx),%edx
   0x0000000000400561 <+49>: movslq 0x2023ac(%rip),%rax # 0x602914 <regs+20>
   0x0000000000400568 <+56>: test %edx,%edx
   0x000000000040056a <+58>: je 0x400582 <main+82>
4. 0x000000000040056c <+60>: movzbl 0x601440(%rax),%eax //we fetch the byte @0x601440+rax
   0x0000000000400573 <+67>: cmp %edx,%eax //if eax==edx we call something. That's the decode part.
   0x0000000000400575 <+69>: jne 0x40057c <main+76>
   0x0000000000400577 <+71>: xor %eax,%eax
5. 0x0000000000400579 <+73>: callq *0x8(%rbx) //the call. Probably the execute part.
   0x000000000040057c <+76>: add $0x10,%rbx
   0x0000000000400580 <+80>: jmp 0x40055f <main+47>
   0x0000000000400582 <+82>: inc %eax //regs+20 gets incremented one by one -> so we probably step through the VM code.
2. 0x0000000000400584 <+84>: mov %eax,0x20238a(%rip) # 0x602914 <regs+20>
   0x000000000040058a <+90>: jmp 0x40054a <main+26>
//from here, this is the end of the program
   0x000000000040058c <+92>: mov 0x202385(%rip),%rdi # 0x602918 <stack>
   0x0000000000400593 <+99>: callq 0x4004c0 <free@plt>
   0x0000000000400598 <+104>: xor %edx,%edx
   0x000000000040059a <+106>: mov %edx,%eax
   0x000000000040059c <+108>: pop %rbx
   0x000000000040059d <+109>: retq
End of assembler dump.
gdb$
We now almost understand how this VM works.
- The instruction pointer is at regs+20 (0x602914): we fetch the instruction at 0x601440 + the value stored in regs+20.
- The byte is read, then compared to the values stored at 0x602800, 0x602810, 0x602820 and so on. We say this is the decode part.
- Then, the callq *0x8(%rbx) is the execute part.
Fetch, decode, execute.
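Put back into a high-level language, the loop of main() reads roughly like the sketch below. It is my reading of the disassembly, not decompiler output. g_data, regs+20 and vm_func become a bytes object, a local ip and a list of (opcode, handler) pairs. The handlers here return the new instruction pointer instead of writing regs+20. Two details from the listing are kept. The byte is re-read at the current instruction pointer for every table entry (+49/+60). The instruction pointer is always incremented once the table walk is over (+82/+84). The toy opcodes and handlers in the usage part are hypothetical.

```python
from typing import Callable, List, Tuple

Handler = Callable[[bytes, int], int]   # (code, ip) -> new ip

def run(code: bytes, table: List[Tuple[int, Handler]], ip: int = 0) -> Tuple[int, List[int]]:
    """Model of main(): return the final ip and the opcodes that were executed."""
    trace = []
    while code[ip] != 0xEE:                 # +33: stop on the END byte
        for opcode, handler in table:       # +42..+80: walk vm_func, 0x10 bytes per entry
            if code[ip] == opcode:          # +60/+67: byte re-read at the *current* ip
                trace.append(opcode)
                ip = handler(code, ip)      # +73: callq *0x8(%rbx)
        ip += 1                             # +82/+84: regs+20 += 1
    return ip, trace

# Toy handlers (hypothetical): 0x11 is one byte long, 0x22 takes one argument and
# leaves ip on its last byte, so that the final "ip += 1" moves past it.
table = [(0x11, lambda code, ip: ip), (0x22, lambda code, ip: ip + 1)]
print(run(bytes([0x00, 0x22, 0x07, 0x11, 0xEE]), table))   # (4, [34, 17])
```

A 0x00 byte matches no entry and only gets the final increment, which fits the runs of zeros behaving like NOPs. If this reading is right, the re-read byte has a side effect. When a handler leaves the instruction pointer on an opcode that comes later in the table, that instruction also runs in the same pass. An opcode that comes earlier in the table is stepped over instead.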
We know how the virtual CPU works. Let's dive into the details. First, what do we have around the instruction pointer?
gdb$ x/16wx 0x601440
0x601440 <g_data>: 0x00000000 0x00000000 0x00000000 0x00000000
0x601450 <g_data+16>: 0x00000000 0x00000000 0x00000000 0x00000000
0x601460 <g_data+32>: 0x00000000 0x00000000 0x00000000 0x00000000
0x601470 <g_data+48>: 0x00000000 0x00000000 0x00000000 0x00000000
gdb$
We have a lot of 00 (a NOP maybe?). What next?
gdb$ x/160wx 0x601440
0x601440 <g_data>: 0x00000000 0x00000000 0x00000000 0x00000000
(snip ... snip ... snip)
0x601570 <g_data+304>: 0x00000000 0x0001e155 0x0c0d0300 0x0000e255
0x601580 <g_data+320>: 0x0cf20000 0x0000bb33 0xddf20000 0xcc1300bb
0x601590 <g_data+336>: 0x000000bb 0xbbdd0100 0xbbcc3700 0x00000000
0x6015a0 <g_data+352>: 0x00bbdd01 0x00bbccd3 0x01000000 0x3d00bbdd
0x6015b0 <g_data+368>: 0x0000bbcc 0xdd010000 0xccc000bb 0x000000bb
0x6015c0 <g_data+384>: 0xbbdd0100 0xbbccde00 0x00000000 0x00bbdd01
0x6015d0 <g_data+400>: 0x00bbccab 0x01000000 0xad00bbdd 0x0000bbcc
0x6015e0 <g_data+416>: 0xdd010000 0xcc1d00bb 0x000000bb 0xbbdd0100
0x6015f0 <g_data+432>: 0xbbccea00 0x00000000 0x00bbdd01 0x00bbcc13
0x601600 <g_data+448>: 0x01000000 0x3700bbdd 0x0000bbcc 0xaa010000
0x601610 <g_data+464>: 0x000000bb 0xbb33f200 0x00000001 0x02bbaa7a
0x601620 <g_data+480>: 0xf3000000 0x0003bb33 0x66600000 0x990100ab
0x601630 <g_data+496>: 0x020000bb 0x02ab66f9 0x00bb9903 0xaaf90200
0x601640 <g_data+512>: 0x000000bb 0xbb33f400 0x00000001 0x02bbaab2
0x601650 <g_data+528>: 0xf5000000 0x0003bb33 0x664e0000 0x990100ab
0x601660 <g_data+544>: 0x020000bb 0x02ab66f9 0x00bb9903 0xaaf90200
0x601670 <g_data+560>: 0x000000bb 0xbb33f600 0x00000001 0x02bbaab4
0x601680 <g_data+576>: 0xf7000000 0x0003bb33 0x66bb0000 0x990100ab
0x601690 <g_data+592>: 0x020000bb 0x02ab66f9 0x00bb9903 0xaaf90200
0x6016a0 <g_data+608>: 0x000000bb 0xbb33f800 0x00000001 0x02bbaae6
0x6016b0 <g_data+624>: 0xf9000000 0x0003bb33 0x66d40000 0x990100ab
The first non-zero byte is 0x55. This is probably the beginning of the code.
Now the decode part. Let's look at what is stored at 0x602800:
gdb$ x/20wx 0x602800
0x602800 <vm_func>: 0x00000011 0x00000000 0x00400696 0x00000000
0x602810 <vm_func+16>: 0x00000099 0x00000000 0x00400a40 0x00000000
0x602820 <vm_func+32>: 0x00000022 0x00000000 0x00400798 0x00000000
0x602830 <vm_func+48>: 0x00000033 0x00000000 0x00400697 0x00000000
0x602840 <vm_func+64>: 0x00000044 0x00000000 0x00400799 0x00000000
gdb$ x/x 0x00400696 //What is there?
0x400696 <vm_ret>: 0x77058bc3 // oooh, the beginning of vm_ret :)
gdb$ x/x 0x00400a40
0x400a40 <vm_jnz>: 0x1ece058b //and vm_jnz, and the others ^_^
gdb$
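The x/20wx dump also suggests how one vm_func entry is laid out. Each entry takes 16 bytes: a 32-bit opcode, 4 bytes of padding and a 64-bit handler address. That matches main(), which reads the opcode with mov (%rbx),%edx, calls *0x8(%rbx) and adds 0x10 to %rbx. The sketch below turns the dumped words back into (opcode, handler) pairs. It assumes the little-endian x86-64 layout, and the field layout is inferred from the code, not read from symbols.

```python
import struct

# The 20 words printed by "x/20wx 0x602800", in the order gdb shows them.
words = [
    0x00000011, 0x00000000, 0x00400696, 0x00000000,
    0x00000099, 0x00000000, 0x00400a40, 0x00000000,
    0x00000022, 0x00000000, 0x00400798, 0x00000000,
    0x00000033, 0x00000000, 0x00400697, 0x00000000,
    0x00000044, 0x00000000, 0x00400799, 0x00000000,
]
raw = struct.pack(f"<{len(words)}I", *words)   # back to the bytes in memory

# Inferred entry layout: uint32 opcode, 4 padding bytes, uint64 handler pointer.
ENTRY = struct.Struct("<I4xQ")                 # 16 bytes, no implicit alignment

entries = [ENTRY.unpack_from(raw, off) for off in range(0, len(raw), ENTRY.size)]
for opcode, handler in entries:
    print(f"{opcode:#04x} -> {handler:#x}")    # e.g. 0x11 -> 0x400696
```

Matching those addresses against the function symbols (vm_ret, vm_jnz, ...) gives the opcode table.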
Ok. So the program reads a byte in the g_data part. Then it calls a function depending on this byte.
That's a really, really good point: we have a byte, and a function. It doesn't take long to understand that this is the assembly of the VM:
- 0x11 is vm_ret RETURN
- 0x99 is vm_jnz JUMP if NON ZERO
- 0x22 is vm_cll CALL
- 0x33 is vm_mov MOVE
- 0x44 is vm_push PUSH
- 0x55 is vm_ecl ??
- 0x66 is vm_cmp COMPARE
- 0x77 is vm_jmp JUMP
- 0x88 is vm_jzz JUMP if ZERO
- 0xaa is vm_mvp ?? move? pointer maybe?
- 0xbb is vm_and AND
- 0xcc is vm_add ADD
- 0xdd is vm_xor XOR
- 0x00 is NOP (we guessed it)
- 0xee is END (we guessed it also)
Ladies and gentlemen, the asm of the VM.
3/ Let's see what happens
So, the first byte is vm_ecl. To run the binary quickly up to that point, we set a conditional breakpoint on the decode comparison at 0x400573 that only triggers when $rax != 0, so the NOP bytes are skipped:
gdb$ b * 0x0000000000400573 if $rax!=0
Breakpoint 3 at 0x400573
gdb$ c
Continuing.
gdb$ info reg rax
rax 0x55 0x55
gdb$ disass vm_ecl
Dump of assembler code for function vm_ecl:
   0x0000000000400d28 <+0>: mov eax,DWORD PTR [rip+0x201be6] # 0x602914 <regs+20>
   0x0000000000400d2e <+6>: lea edx,[rax+0x1]
   0x0000000000400d31 <+9>: mov DWORD PTR [rip+0x201bdd],edx # 0x602914 <regs+20>
   0x0000000000400d37 <+15>: movsxd rdx,edx
   0x0000000000400d3a <+18>: mov dl,BYTE PTR [rdx+0x601440]
   0x0000000000400d40 <+24>: cmp dl,0xe1
   0x0000000000400d43 <+27>: je 0x400d6f <vm_ecl+71>
   0x0000000000400d45 <+29>: cmp dl,0xe2
   0x0000000000400d48 <+32>: je 0x400de2 <vm_ecl+186>
   0x0000000000400d4e <+38>: cmp dl,0xe0
   0x0000000000400d51 <+41>: jne 0x400e55 <vm_ecl+301>
(...)
   0x0000000000400d6a <+66>: call 0x400520 <exit@plt>
(...)
   0x0000000000400ddd <+181>: jmp 0x4004d0 <write@plt>
(...)
   0x0000000000400e50 <+296>: jmp 0x4004e0 <read@plt>
Well, a switch case. If the next byte is 0xe1, 0xe2 or 0xe0, this function behaves differently, and we find calls to read, write and exit in it. That should be for input/output. Let's step over it for the moment and see what's happening:
gdb$ stepo
Temporary breakpoint 4 at 0x40057c
ENTER PASS :
gdb$
That's it. Let's go back to the VM disassembly a bit. We had:
0x601570 <g_data+304>: 0x00000000 0x0001e155 0x0c0d0300 0x0000e255
0x601580 <g_data+320>: 0x0cf20000 0x0000bb33 0xddf20000 0xcc1300bb
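gdb prints these as 32-bit words, and x86 is little-endian. Each word therefore has to be read from its lowest byte to its highest to get the bytes in memory order. Python's struct module is a quick way to put the two rows above in the right order:

```python
import struct

# The two rows dumped at 0x601570 (g_data+304), as gdb prints them.
words = [0x00000000, 0x0001e155, 0x0c0d0300, 0x0000e255,
         0x0cf20000, 0x0000bb33, 0xddf20000, 0xcc1300bb]

# "<" means little-endian: each word is stored lowest byte first.
mem = struct.pack(f"<{len(words)}I", *words)
print(mem.hex(" "))
# 00 00 00 00 55 e1 01 00 00 03 0d 0c 55 e2 00 00 00 00 f2 0c 33 bb 00 00 00 00 f2 dd bb 00 13 cc
```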
Put in the right order: 55 e1 01 00 00 03 0d 0c 55 e2 00 00 00 00 f2 0c 33 bb... 55 is I/O, e1 seems to be output, and the numbers after it are unknown (the address of the string, probably). The next instruction should be 55 e2 (waiting for input). Let's see the next instruction:
gdb$ info reg rax
rax 0x0c 0x0c
gdb$
The next instruction is 0x0c?! As if the instruction pointer had missed a step.
In our case, that's not really important: 0x0c is not a valid instruction, so the decode loop walks through all the vm_func entries without a match, the instruction pointer is incremented, and the next fetch reads 0x55. Let's continue and step over the vm_ecl function:
gdb$ stepo
Temporary breakpoint 5 at 0x40057c
ABCDEFABCDEF //entered myself
0x0000000000400580 in main ()
gdb$
gdb$ x/x 0x602914
0x602914 <regs+20>: 0x00000143 //offset of the instruction pointer
gdb$ x/wx 0x601440+0x143
0x601583 <g_data+323>: 0x00bb330c //the instruction pointer
gdb$
And once again we land on the invalid 0x0c instruction: vm_ecl doesn't advance the instruction pointer to the next instruction. The VM is built in such a way that it doesn't matter, as long as the byte it lands on is not a valid instruction... This is kind of a bug!
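The value at regs+20 is an offset into g_data. With regs+20 = 0x143, the byte fetched by main() is therefore at 0x601440 + 0x143 = 0x601583. The sketch below redoes that x/wx read using only the two g_data rows dumped earlier, assuming little-endian 32-bit words. The names G_DATA, ROW_BASE and peek are just for illustration.

```python
import struct

G_DATA = 0x601440    # start of g_data, where main() fetches opcodes
ROW_BASE = 0x601570  # address of the first dumped row (g_data+304)
words = [0x00000000, 0x0001e155, 0x0c0d0300, 0x0000e255,
         0x0cf20000, 0x0000bb33, 0xddf20000, 0xcc1300bb]
mem = struct.pack(f"<{len(words)}I", *words)

def peek(ip: int, size: int = 4) -> bytes:
    """Return `size` bytes at g_data + ip, taken from the dumped rows only."""
    off = G_DATA + ip - ROW_BASE
    if off < 0 or off + size > len(mem):
        raise IndexError("address not covered by the dumped rows")
    return mem[off:off + size]

ip = 0x143                                   # value read from regs+20
chunk = peek(ip)
print(hex(G_DATA + ip), chunk.hex(" "))      # 0x601583 0c 33 bb 00
print(hex(struct.unpack("<I", chunk)[0]))    # 0xbb330c, gdb's 0x00bb330c
print(hex(chunk[0]))                         # 0xc: the opcode main() fetches next
```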
Let's fast forward a bit, until we reach a 0xdd instruction (XOR).
After a bit of refactoring, here is the VM assembly extracted from g_data:
0xdd 0xbb 0x00 0x13 //vm_xor
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01 //vm_add
0xdd 0xbb 0x00 0x37 //vm_xor
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01 //vm_add
0xdd 0xbb 0x00 0xd3
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01
0xdd 0xbb 0x00 0x3d
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01
0xdd 0xbb 0x00 0xc0
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01
See the pattern? 0xbb should be an offset to somewhere, the last byte of each XOR is the key, and we slide this offset one by one. In pseudo code, it becomes:
xor(pass[i], 0x13)
i = i+1
xor(pass[i], 0x37)
i = i+1
etc...
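We extract the key just by reading the VM assembly: 0x13, 0x37, 0xd3, 0x3d, 0xc0 (0x1337d33dc0) for the excerpt above. The g_data dump shows the same dd bb 00 XX pattern continuing with 0xde, 0xab, 0xad, 0x1d, 0xea, 0x13 and 0x37.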
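The refactoring and the key extraction above can be automated with a tiny linear-sweep disassembler. This sketch only knows the two instruction lengths visible in the listing: 4 bytes for 0xdd (vm_xor) and 7 bytes for 0xcc (vm_add). The meaning of the 0xbb and 0x00 operands is still a guess. The sketch simply treats the last byte of each vm_xor as a key byte and then follows the pseudo code, XORing a password one byte at a time. The input b"ABCDE" is made up for the example.

```python
# Lengths taken from the listing above; anything else stops the sweep.
LENGTHS = {0xDD: 4, 0xCC: 7}
NAMES = {0xDD: "vm_xor", 0xCC: "vm_add"}

def sweep(code: bytes):
    """Yield (offset, name, raw bytes) per instruction, stopping at an unknown opcode."""
    ip = 0
    while ip < len(code) and code[ip] in LENGTHS:
        size = LENGTHS[code[ip]]
        yield ip, NAMES[code[ip]], code[ip:ip + size]
        ip += size

def xor_key(code: bytes) -> bytes:
    """Last byte of every vm_xor: one key byte per password character (assumption)."""
    return bytes(insn[-1] for _, name, insn in sweep(code) if name == "vm_xor")

def apply_key(password: bytes, key: bytes) -> bytes:
    """xor(pass[i], key[i]); i = i + 1. zip() stops at the shorter input."""
    return bytes(p ^ k for p, k in zip(password, key))

listing = bytes.fromhex(
    "ddbb0013" "ccbb0000000001"
    "ddbb0037" "ccbb0000000001"
    "ddbb00d3" "ccbb0000000001"
    "ddbb003d" "ccbb0000000001"
    "ddbb00c0" "ccbb0000000001"
)
for off, name, insn in sweep(listing):
    print(f"{off:2d}  {name}  {insn.hex(' ')}")
key = xor_key(listing)
print(key.hex())                    # 1337d33dc0
print(apply_key(b"ABCDE", key))     # b'Ru\x90y\x85'
```

XOR is its own inverse, so applying the same key twice gives the input back. That is why knowing the key and the bytes compared by vm_cmp is enough to rebuild the expected input.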
And what about the instruction pointer? Does it point to the right instruction after a vm_xor?
gdb$ info reg rax
rax 0xcc 0xcc
gdb$
Yep, it works: the vm_xor instruction advances the instruction pointer correctly.
The next steps are to understand other vm_XXX instructions, where data is stored, what is done with it, and so on.
Just step through each function and mark all known addresses (base address, offsets, registers, CPU flags), and you'll quickly be able to reverse any VM code. Follow the vm_cmp instruction, learn where the offsets are, and compare the bytes yourself.
4/ Why does the crackme accept more than one solution?
As we saw, the instruction pointer is sometimes not advanced to the next instruction. If the byte it lands on is an illegal instruction, nothing happens. But if the instruction pointer lands on a valid opcode, that instruction gets executed and the program behaves differently. 0xF4b found that vm_mov is also buggy: a 0xbb instruction (vm_and) is executed instead of the vm_cmp, and the JNZ is never executed afterwards.
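Here is a toy model of why a stray instruction pointer matters. The opcodes, lengths and "one byte short" handler are all hypothetical, chosen to mimic what we saw with vm_ecl. A correct handler leaves the instruction pointer on its last byte, and main() adds 1. A buggy handler stops one byte earlier, so main() lands on the last argument byte of the instruction. If that byte is not an opcode, it is skipped like the 0x0c above. If it happens to be one, it gets executed.

```python
def run(code: bytes, lengths: dict, buggy: frozenset = frozenset()) -> list:
    """Return the offsets of the executed instructions.

    lengths: opcode -> instruction length (hypothetical values, all >= 2 here).
    buggy:   opcodes whose handler stops one byte short.
    """
    ip, executed = 0, []
    while code[ip] != 0xEE:                 # END byte, as in main()
        op = code[ip]
        if op in lengths:
            executed.append(ip)
            ip += lengths[op] - 1           # correct handler: stay on the last byte
            if op in buggy:
                ip -= 1                     # buggy handler: one byte short
        ip += 1                             # main() always adds 1
    return executed

LENGTHS = {0x55: 3, 0x33: 2}                # toy lengths, not the real ones
BUGGY = frozenset({0x55})
# Last argument byte of 0x55 is 0x0c, not an opcode: the bug is harmless.
print(run(bytes([0x55, 0xE1, 0x0C, 0x33, 0x07, 0xEE]), LENGTHS, BUGGY))  # [0, 3]
# Same layout, but the last argument byte is now 0x33: it gets executed.
print(run(bytes([0x55, 0xE1, 0x33, 0x07, 0xEE]), LENGTHS, BUGGY))        # [0, 2]
# Without the bug, that 0x33 is only an argument and never runs.
print(run(bytes([0x55, 0xE1, 0x33, 0x07, 0xEE]), LENGTHS))               # [0]
```

In the toy, the same bytes run differently depending on a single argument value. A password that steers the VM down such a path can be accepted without being the intended one.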
5/ Conclusion
Thank you for scrolling this far ;-) Learn to pwn crackmes. If you want the a.out file to play with, drop me a DM or an email.
Those who want to will find a way.
Those who don't will look for an excuse.
0xMitsurugi