Saving Time and Effort with IDAPython
For those familiar with CTF contests, time is an important constraint that needs to be managed effectively to get as many points as possible. Most task write-ups that you will encounter use some kind of script to automate long tasks or those hard to do manually. This saves not only time but also effort since the participant only needs to figure out what's being repeated once then automate the rest. This analogy is made clearer by the challenge we are going examine.
The task we are dealing with is from Nuit Du Hack Quals 2017 . It is a reverse engineering challenge named "Matrioshka: Step 4 (I did it again)" . We are going to see that this task not only does repetitive work that would take the reverser hours to complete but also makes it hard to reach the code that checks if the supplied input is correct or not.
What should you learn next?
The goal of this article is to write a Python solver script, which will automate everything from start to finish then print the flag at the end. 
We start first by seeing what kind of binary we are dealing with.
What we need to do now is setup the remote GDB debugger then connect to it from IDA64.
The reason behind why we are providing to the executable 334 characters as an argument is because that is what it expects.
After checking the length, the crackme allocates a memory aligned chunk of memory with memalign. It is going to subsequently use mprotect to modify the pages' permissions to RWX.
The data copied to this chunk starts from sub_40089D and goes to, but doesn't include sub_422C0C. Looking at some of what resides between these two addresses indicates that there will probably be some on-the-fly code decryption.
Next, the copy of "sub_40089D" from the allocated pages is called with the following arguments.
The return value must be 0x1 if the supplied argument represents the correct input.
So far, so good. Now let's look at what sub_40089D does.
We see that it iterates through each character of the supplied input and does the following:
- Read the character twice
- Save it in a local variable
- Read the characters again and add 1 to it. For example, A becomes B and so on
- Replace the original character in memory with the incremented value
- Restore it back from the local variable
One can interpret this as junk code since all it does is modify a character then restore it back to what it was before. However, this is not junk code. What this is, is a "protection" against detecting the flag checking algorithm by using hardware breakpoints. This "protection" would be weak if it is used only once, but that is not the case. It is used throughout the whole execution of the program, tens of times. This would take a LOT of time to reach the flag checking algorithm manually.
Now to the code decryption. Without delving into much detail, each function does the following:
This same stuff is done from 0x40089D to 0x40C305 (equivalent addresses in the allocated chunk). So, we have approximately 47 Kbytes of code that does nothing but decrypting itself and accessing (reading/writing) the user input.
Well, you may wonder now about how I did know that this was done from 0x40089D to 0x40C305. Certainly, I did not do it by pressing F9 and continually hitting HW breakpoints for hundreds of times. This is where IDAPython comes into play.
However, before starting to script, we need to make a few assumptions:
- When debugging the binary, I have only stepped through 4 functions that perform all these actions. So, my first assumption was that all the subsequent functions do the same except for the ones performing checks on the input.
- The second assumption, and the most important one, concerns the "protection" against HW breakpoints. I assumed that the functions that check whether the input is correct come after all
the routines we have seen. In other words, they are invoked last. If these checks were somewhere in the middle, meaning that we are going to be hitting more useless hardware breakpoints after the character was checked, the approach that I will take wouldn't work, and I am going to need to apply more criteria.
Hopefully, all my assumptions were correct, and the following script helped me to find exactly where the first character of the flag is verified.
We start by creating an empty list that will hold all the addresses (RIP) where the debugger will suspend the process because of the read/write hardware breakpoint. Next, in the prepare() function you can see that I modify the address parameter supplied to mprotect and the RAX register in the CALL instruction (Figure 4). I do this because IDA crashes at a later stage (second script) when trying to deal with hardware breakpoints. I have no idea why that is the case.
When returning from the function, the debugger is still suspended at CALL RAX, and we can extract the pointer to our string by dereferencing the pointer at RDX.
Subsequently, we set a Read/Write hardware breakpoint on the first character of our string. Moreover, right after that, we enter a loop that keeps appending the addresses (RIP) to the list. The loop continues until the process exits (event != BREAKPOINT). Finally, we delete the Read/Write hardware breakpoint and print the last address where the hardware breakpoint was hit. In other words, the code that checks the validity of the input supplied.
This is the output we get from IDA:
So, what we need to do now is see what the function containing the instruction at 0x40c336 does. Moreover, to do that we will need to set an execution hardware breakpoint on that address (software breakpoints are out of the question).
Once we hit that breakpoint, all we have to do is make IDA interpret the surrounding bytes as instructions then see what the function does.
Below is the python equivalent to the algorithm that checks whether two characters match the flag:
As you can see, the algorithm checks if each two characters are valid by using two hard coded double-words I named dword1 and dword2. Each function checks for ten characters, except the last one which does it for only 4. In other words, each function uses the latter code five times, except the last one which uses it for 2. This entails that, if you want to do this manually, you will need to repeat the same task of copying and pasting the different double-words into your brute forcing script for 167 times; This is very bad in the middle of a CTF contest.
What we have to do now is modify the python function so it can brute force the result variable using i and j; Which are the first and second characters respectively.
Then after that, we need to execute a second python script that will write the flag to a file:
What the script does is pretty obvious. The only part that needs clarifying is the two offsets added to both RIP and RBP:
- The first DWORD is always stored at a local variable [RBP-3Ch]
- The second DWORD is hard coded into an instruction operand. However, it is always located at RIP+0x54. RIP being where the debugger suspends the process after the HW breakpoint is triggered
As a result of executing this script we get a file containing the flag:
We can easily concatenate both scripts into one and run it only once under IDA .
Links & References:
: Binary file download: https://goo.gl/MhVl0g
: Full script: https://goo.gl/i4wSWl
FREE role-guided training plans
IDAPython docs: https://www.hex-rays.com/products/ida/support/idapython_docs/