I like to play crackmes and one of my favorite tool is (of course) gdb. The title of this post is a tribute to an
0vercl0k's blogpost. Few days ago, I was trying to break a VM crackme (no spoil, no link).
Once you figure you are facing a VM, you are confronted with two CPU: the real one, where every gdb instructions make sense (it was x86), and the emulated one, where you have to deal with it to get the juicy part of the crackme.
At first, I spent a lot of time to step through all the x86 instructions and memory to read the instruction pointer, the opcode and the internal registers of the VM.
And I was thinking: Is there a way to emulate commands in gdb where a stepi could step efficiently in the VM, where info reg would dump the registers of the VM, and so on. This blogpost explain how to do it in a very simple and convenient manner. This is not a generic way to break VM, this is more how python can help you to break VM.
1/ Hi gdb this is python
Since a long time, you can
call python script from gdb:
or from a source script:
(gdb) source /path/to/script.py
You can also execute any gdb command within python and grab the result for analysis:
(gdb) python gdb.execute("info reg %s"%("eax"))
eax 0x1 1
(gdb)
But you can do a lot more. Python ca define new command in gdb, as defined in documentation (and excellent paper from 0vercl0k):
mitsurugi@mitsu:~/crackmes/VM$ cat TheName.py
import gdb
class HelloWorld(gdb.Command):
"""Greet the world"""
def __init__(self):
super(HelloWorld, self).__init__("mitsurugi", gdb.COMMAND_SUPPORT)
def invoke(self, args, from_tty):
print "The name's 0xMitsurugi\nRemember it!"
HelloWorld()
mitsurugi@mitsu:~/crackmes/VM$ gdb -q -nx ./VM.bin
Reading symbols from ./VM.bin...(no debugging symbols found)...done.
(gdb) source ./TheName.py
(gdb) mitsurugi
The name's 0xMitsurugi
Remember it!
(gdb)
And you can use anything in the invoke function, like:
- gdb.execute("gdb command")
- python evaluation
2/ Go VM, go!
When I was breaking this VM, I've quickly found a pattern:
- adress 0x0804843b
- read a byte at [esi]
- explode it in 5 bytes
- do stuff
- return to 0x0804843b
Another finding was that 8 bytes where used as temporary storage, at adresse 0x08049a84 (think registers).
What I need now is to have new gdb command that will do:
- stepi ==> named walk
- info reg ==> named registry
- print opcode ==> named opcode
- x/5i $eip ==> named lookahead
with this superset of command, I would navigate through the VM code like if it was native.
3/ python with a touch of class
For the clarity of the blogpost, all code has been moved at the end.
3/1/ Walk
Ok, the first one is very easy to do, named Walk. We just have to create aclasse doing a :
gdb.execute("c")
and check if we stop at the right breakpoint. We can add a lot of debug code, verify if the breakpoint is present, if there are another ones, and so on, but for simplicity, a single breakpoint and a single line do the trick.
3/2/ Register
We have 8 registers of one byte long. The goal here is to print them on the screen. I've choose to do it really simply:
The invoke function within the class Registry:
registry(gdb.execute("x/8bx 0x08049a84",to_string=True))
and the registry function doing the pretty printing. The eight registers appears as R1, R2,... R8
3/3/ Opcodes
Opcode is an interesting one. We learned that the VM read one byte at a time, splitted in 5 byte. First version of opcode just explode the byte:
print "Opcode is: "+ opc + " ",
a,b,c,d,e = exploding(opc)
print "value 0%d-0%d-0%d-0%d-0%d" % (a,b,c,d,e)
this saved me a lot of time. Everything depends of these 5 bytes, and patterns can be revealed easily.
The next work was to disass these opcodes. I filled slowly this function with what I've found:
if opc=='00': print 'NOP'
if opc=='c3':
print 'JUMP $esi+ %s - %s" %(get_esi(addr+"+1"),get_esi(addr+"+2"))
if c==6:
print 'MOV %s to R%d' % (get_esi(addr+"+1"),b)
and so on. It's really convenient, because you can hot reload the python script while working:
(gdb) opcode
Opcode is 00
value 00-00-00-00-00
(gdb) source ./vm.py #now telling that 00 is a NOP:
(gdb) opcode
Opcode is 00 NOP
value 00-00-00-00-00
(gdb)
There are two way to disassemble opcodes: reversing the x86 part, or just watching register before/after the operation.
3/4/ lookahead
The lookahead is equivalent to x/5i $eip. This is achieved easily, by calling opcode() function, with a cursor telling if we have to walk to next byte, or skip some. That's just a reuse of function opcode() and disassemble().
4/ Breaking VM with class
The challenge quickly resume with using walk, opcode, registry to find what opcode do, and run after run, we get a better understanding of what's going on. This is live debugging session, just after the first opcodes where discovered. Note how the VM code reveals itself, and disassembled opcodes are printed:
mitsurugi@mitsu:~/crackmes/VM$ gdb -q -nx ./VM.bin
Reading symbols from ./VM.bin...(no debugging symbols found)...done.
(gdb) b * 0x0804843b
Breakpoint 1 at 0x804843b
(gdb) r
Starting program: /home/mitsurugi/crackmes/VM/VM.bin
Please crack Me :toto
Breakpoint 1, 0x0804843b in ?? ()
(gdb) source ./vm.py
(gdb) opcode
Opcode is: 0xc3 JUMP $esi+ 0x01 - 0x00
value 03-00-03-00-00
(gdb) walk
(gdb) lookahead
Opcode is 0x00
== > NOP
Opcode is 0x00
== > NOP
Opcode is 0x26
== > MOV 0x20 to R4
Opcode is 0x3e
== > MOV 0x00 to R7
Opcode is 0x01
== > Mov 0x01-0x42 to R1 and R0
(gdb) registry
R0 R1 R2 R3 R4 R5 R6 R7
0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
(gdb) walk
(gdb) walk
(gdb) walk
(gdb) walk
(gdb) registry
R0 R1 R2 R3 R4 R5 R6 R7
0x00 0x00 0x00 0x00 0x20 0x00 0x00 0x00
(gdb) lookahead
Opcode is 0x01
== > Mov 0x01-0x42 to R1 and R0
Opcode is 0x87
== > Unknown Opcode
Opcode is 0x3c
== > Unknown Opcode
Opcode is 0x02
== > Unknown Opcode
Opcode is 0x03
== > Unknown Opcode
(gdb) walk
(gdb) registry
R0 R1 R2 R3 R4 R5 R6 R7
0x42 0x01 0x00 0x00 0x20 0x00 0x00 0x00
(gdb)
And that's waaaaay simpler than trying to use gdb commands. Understanding the VM was a piece of cake after that: we just focus on the important things, and we totally abstract the underlying CPU and instructions.
5/ Conclusion
After winning and reading other solutions, I figure that there were a lot of other ways to break it (Next in my TODO list: learn pin). It wasn't very complicated, but I've had a lot of fun with this VM crackme; Writing python was overkill for this one, but it will help me for beating next VM crackmes.
gdb + python can be really, really powerfull. For those who wondering, the VM was emulating a Z80 CPU, but I didn't recognizing the instructions ^_^
6/ Show me the code or GTFO
This code can not be read as a school example of coding. "It just works" (tm)
import gdb
#Main BP of the VM: 0x0804843b
def lookahead():
cursor=0
for i in range(0,5):
print "cursor is at:",
opcode=get_esi("$esi+"+str(cursor))
val,advance=disassemble("$esi+"+str(cursor))
cursor=cursor+1+advance
print "Opcode is %s" % (opcode)
print "\t == > %s" % (val)
def disassemble(addr):
opc=get_esi(addr)
o=""
cursor=0
a,b,c,d,e=exploding(opc)
if opc=='0x00':
o= 'NOP'
if opc=='0xc3':
o= 'JUMP $esi+ %s - %s' % (get_esi(addr+"+2"),get_esi(addr+"+1"))
cursor+=2
if c==6:
o= 'MOV %s to R%d' % (get_esi(addr+"+1"),b)
cursor+=1
if c==1:
o= 'Mov %s-%s to R1 and R0' % (get_esi(addr+"+2"),get_esi(addr+"+1"))
cursor+=2
if o=="": o="Unknown Opcode"
return o,cursor
def exploding(opc):
"""Found by reversing"""
a = int(opc, 16) >> 6
b = int(opc, 16) >> 3 & int ("0x7",0)
c = int(opc, 16) & int("0x7",0)
d = b & 1
e = b >> 1
return a,b,c,d,e
def get_esi(n):
"""esi can be seen as a global var, so we don't care passing it"""
byte=gdb.execute("x/bx "+n,to_string=True)
return byte.split()[1].rstrip()
def registry(regs):
#One of them is Z flags, should print it
for i in range(0,8):
print '\tR%d' % (i),
print
print regs.split(':')[1].rstrip()
def walk():
gdb.execute('c', to_string=True)
vv = gdb.execute("info reg $eip",to_string=True).split('x')[2]
if vv.rstrip() != '804843b':
print "Walk another time, we didn't reach the BP"
def opcode():
opc=get_esi("$esi")
print "Opcode is: "+ opc + " ",
print disassemble("$esi")[0]
print "value 0%d-0%d-0%d-0%d-0%d" % (exploding(opc))
#And now, the classes
class Opcode(gdb.Command):
"""Dump the Opcode"""
def __init__(self):
super(Opcode, self).__init__("opcode", gdb.COMMAND_SUPPORT)
def invoke(self, args, from_tty):
opcode()
class Walk(gdb.Command):
"""Walk one step"""
def __init__(self):
super(Walk, self).__init__("walk", gdb.COMMAND_SUPPORT)
def invoke(self, args, from_tty):
walk()
class Registry(gdb.Command):
"""Pretty print the VM registry"""
def __init__(self):
super(Registry, self).__init__("registry", gdb.COMMAND_SUPPORT)
def invoke(self, args, from_tty):
registry(gdb.execute("x/8bx 0x08049a84",to_string=True))
class Lookahead(gdb.Command):
"""Pretty print the VM registry"""
def __init__(self):
super(Lookahead, self).__init__("lookahead", gdb.COMMAND_SUPPORT)
def invoke(self, args, from_tty):
lookahead()
Opcode()
Walk()
Registry()
Lookahead()