Introduction

In the previous tutorial we gave a basic introduction to the Defender program. We also looked at the sub_402EA8 function, which returns the base address of the ntdll.dll library in memory. We noted that the Defender program calls only a small number of functions, far too few to provide all the functionality the program has. This is why the program must find the base address of the ntdll.dll library: it can then call other functions from that library at runtime.

Let’s rename the sub_402EA8 function to get_base_ntdll, so the function will now look like this:

We’ve also added a repeatable comment by selecting Edit – Comments – Enter repeatable comment. This is useful because we don’t have to remember what the function does; the comment tells us. The advantage of repeatable comments over normal comments is that they are also displayed at every location that references the function. The picture below presents the start of the Defender program, where we can also see the added repeatable comment.

The base address of the ntdll.dll library is returned in the eax register, which we’re pushing to the stack right after the function call. Since we’re calling the function loc_4033D1 next, the newly pushed base address of the ntdll.dll library is the argument we’re passing to that function. That function is presented on the picture below:

What follows is another function named loc_40341E that is presented below (this function immediately follows the previous function loc_4033D1).

There’s also the rest of the code that’s presented on the pictures below:

After that, we can see something very interesting. The locations 0x004034D5 and 0x004034D6, and all the locations from 0x004034DD onward, are categorized as data and not as code.

The addresses between 0x004034DD and 0x004041FC all contain data and not code instructions. What follows after that address are the loc_4041FC instructions and the sub_404202 function, which Ida has already categorized as code. What follows that function is our start routine, where the program begins executing when we run it.

Why is there a data section in the middle of the code section? At first glance it doesn’t make any sense. The most probable reason is that the program has to deobfuscate this code when we run it; while the code is still obfuscated, Ida can’t correctly categorize it.

Analyzing Deobfuscation Routine

The only way to be sure what happens when the loc_4033D1 function is called is to actually analyze the function. Let’s present the instructions so we can analyze them in more detail. The instructions at the beginning of loc_4033D1 first set up the new stack frame and subtract 0x22C from the stack pointer, which tells us how much room the routine reserves for local variables: 0x22C / 0x4 = 0x8B (139 in decimal) dword-sized slots. After that, we’re saving the registers ebx, esi and edi on the stack to preserve their values.
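The local-variable arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes every local is a 4-byte dword, which the disassembly does not guarantee (buffers or byte-sized locals would change the count), and the constant names are made up for illustration.

```python
# Number of bytes subtracted from the stack pointer (value taken from the text).
FRAME_SIZE = 0x22C
# Assumption: each local variable occupies one 4-byte dword.
DWORD_SIZE = 4

local_count = FRAME_SIZE // DWORD_SIZE
print(hex(local_count), local_count)  # 0x8b 139
```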

What we’re doing next is storing the constant address 0x004034DD in the register eax and saving that constant in the [ebp-20] offset. Then we’re storing the constant 0x004041FD into the offset [ebp-18]. After that, we’re storing the constant 0x004034E5 into the address 0x004034D6.

Then we’re storing the value 1 on the stack at offset [ebp-8] and comparing it to 0. Since we just stored a value of 1 in there, it will never be equal to 0, which means that the jz will not evaluate to true and the jump will not be taken. This usually happens when we would like to do something in a loop where we’re taking the loop the first time, but next time we’ll probably jump over the loop body.

Next, we’re loading the value from stack offset [ebp-18] into eax; previously we saved the value 0x004041FD at that offset. After that, we’re subtracting 0x004034DD from it, 0x004041FD – 0x004034DD = 0xD20, and storing the result at stack offset [ebp-30]. We’re also copying the value 0x004034DD into the stack offset [ebp-34] and zeroing the values at stack offsets [ebp-24] and [ebp-28].
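The subtraction gives the size in bytes of the region the routine is about to process. A quick check, using made-up names for the two constants from the text:

```python
REGION_START = 0x004034DD  # constant saved at [ebp-20]
REGION_END = 0x004041FD    # constant saved at [ebp-18]

# Length of the region that the routine stores at [ebp-30].
region_length = REGION_END - REGION_START
print(hex(region_length), region_length)  # 0xd20 3360
```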

Next, we’re comparing the value 0xD20 that we previously calculated (stored at offset [ebp-30]) with the constant 3 and jumping to loc_40346B when the value at stack offset [ebp-30] is lower than or equal to 3. On the left side, we can see a loop which is repeated as long as the value at stack offset [ebp-30] is larger than 3. This is why we must look for the instructions that lower that value in each iteration: those instructions are located at addresses 0x00403460, 0x00403463 and 0x00403466.

The counter is decreased by 4 in each iteration, which means that the loop will be repeated 0xD20 / 0x4 = 0x348 times (840 in decimal). The rest of the instructions perform XOR operations on the values stored at addresses 0x004034DD – 0x004041FD. This means that the program is probably using the loop below to deobfuscate the instructions at those locations. We don’t actually need to go into the details of how exactly the XOR operations happen, because we can add a breakpoint at the end of the loop below and let the executable deobfuscate itself.
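To make the shape of such a loop concrete, here is a rough Python model of a dword-wise XOR pass over a region of 0xD20 bytes. It is a sketch under stated assumptions, not a reconstruction of Defender’s actual code: the key value, the use of one fixed key for every dword, and the placeholder input bytes are all hypothetical, since the text does not show how the XOR operands are derived.

```python
import struct

REGION_LENGTH = 0xD20          # byte count computed by the routine (from the text)
HYPOTHETICAL_KEY = 0x12345678  # made-up key; the real operand is not shown in the text


def xor_region(data, key):
    """XOR every complete little-endian dword of `data` with `key`.

    Returns the transformed bytes and the number of loop iterations.
    """
    buf = bytearray(data)
    remaining = len(buf)  # plays the role of the counter at [ebp-30]
    offset = 0
    iterations = 0
    while remaining > 3:  # the loop runs while the counter is larger than 3
        (value,) = struct.unpack_from("<I", buf, offset)
        struct.pack_into("<I", buf, offset, value ^ key)
        offset += 4
        remaining -= 4    # the counter drops by 4 per iteration
        iterations += 1
    return bytes(buf), iterations


# Placeholder bytes standing in for the obfuscated region.
obfuscated = bytes(i % 256 for i in range(REGION_LENGTH))
deobfuscated, iterations = xor_region(obfuscated, HYPOTHETICAL_KEY)
print(iterations, hex(iterations))  # 840 0x348
```

The iteration count depends only on the region length and the step of 4, so it matches the 0x348 figure regardless of which key the real routine uses.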

We can add a breakpoint at the end of the loop presented on the picture above and execute the program until that breakpoint is hit. Let’s again present the addresses that ought to be XORed by the deobfuscation routine. The hexadecimal bytes at the location that is supposed to be XORed are presented below:

After we have run the program and the breakpoint has been hit, those hexadecimal bytes will look like the ones presented on the picture below:

We can see that the bytes are different, which is the result of the deobfuscation routine. But the disassembly has not changed as it should have. The listing still shows data at those memory addresses; this can be seen in the picture below:

Why does this happen? It’s because we must tell Ida to reanalyze the changed bytes and try to present the right disassembly instructions.

Let’s try to do the same in Olly debugger. Before running the deobfuscation routine, the bytes at address 0x004034DD are as presented on the picture below. We can see that the bytes are the same as were already presented on one of the previous images.

If we place a breakpoint on the 0x0040346B and run the program, the bytes will change into the bytes presented on the picture below. Again, the deobfuscated bytes are the same as already discovered by Ida.

So why are we doing this once again in Olly debugger if we’re getting the same result as with Ida? Because Olly has a nice feature to analyze the code once more. If we right-click in the CPUview in Olly, we can click on the Analysis – Analyze code to analyze the assembly code again. After that, we can quickly discover that the assembly instructions have changed as we can see on the picture below:

Notice that the first 8 bytes are still data bytes, but what follows are real instructions that were data before. This proves that Olly can actually analyze the code again, which is a nice feature to have when working with deobfuscation routines. So we’ve deobfuscated the code that we need, but we still haven’t entirely figured out what the deobfuscation routine does. Let’s take a look at the jump instruction where we’ve set the breakpoint before. When the deobfuscation routine is done, the jump instruction is always taken, which transfers execution to the 0x004034D5 address.

Once the breakpoint has been hit, we can step through the code. The first instruction executed will jump to the address 0x004034D5. Before executing this instruction, the code on that location will look like the picture below:

There are data bytes located at that address, so how can it be jumping to it if we can’t really execute the data? Of course we can’t execute the data. When we press the button to execute the jump instruction, we’ll be presented with the dialog box shown on the picture below:

The dialog box is saying that Ida has detected data instructions and not code instructions and is asking us if we would like to transform those instructions into code instructions. Since we’ve run the deobfuscation routine, the code instructions should be present at that location, which is why it’s safe to assume that we can press the ‘Yes’ button. When confirming the dialog above, we’ll jump to the 0x004034D5 location as intended, but this time the instructions will be changed into the code instructions (instead of data). We can see those code instructions on the picture below:

The instructions above are loading the address 0x004034E5 into the register ebx and jumping to it. Let’s see what happens just before the jump is taken. On the picture below we can see that we’re jumping right into the data section, which doesn’t hold instructions. But if we step into the code again, another dialog box will appear asking us to change the data to code instructions.

Upon confirming that action, we’ll be taken to the 0x004034E5 address and the data will be translated into code instructions. This can be seen on the picture below. We can conclude that the jump is required to jump over the data instructions that are left at the address 0x004034DD and following until the 0x004034E4 address.
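The size of the skipped block follows directly from the two addresses, and it agrees with the 8 leading data bytes that Olly left untouched after reanalysis. A small check, with illustrative names for the addresses:

```python
DATA_START = 0x004034DD   # first byte of the region that stays data
JUMP_TARGET = 0x004034E5  # address loaded into ebx and jumped to

skipped_bytes = JUMP_TARGET - DATA_START
last_data_byte = JUMP_TARGET - 1
print(skipped_bytes, hex(last_data_byte))  # 8 0x4034e4
```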

If you remember correctly, we jumped over some code in the deobfuscation routine, but are those instructions ever evaluated or not? We can add a breakpoint on address 0x0040346D and run the program; if the breakpoint is hit, then that function is called sometime in the program execution. Once we’ve run the program, we can see that the function is called. We can see that breakpoint being hit and all the executed instructions below:

It seems that we’re taking the value stored at [ebp-18], subtracting the value stored in [ebp-20] and storing the result into [ebp-40]. Then we’re overwriting the value at [ebp-44] with the value stored at [ebp-20]. The value stored at [ebp-18] is 0x004041FD and the value in [ebp-20] is 0x004034DD, which means that we’re repeating the calculation: 0x004041FD – 0x004034DD = 0xD20. After that there’s a XORing routine that XORs the same instructions as before. Let’s take another look at the values at address 0x004034DD before XORing:

After the first XORing:


After the second XORing:

We can see that the deobfuscation routine is also an obfuscation routine: it first deobfuscates the instructions and executes them, and after all that it obfuscates the instructions back to their original values (notice that the first and the third image show the same values at the presented addresses).
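This round trip is a general property of XOR: applying the same key twice restores the input, so one routine can serve as both the obfuscator and the deobfuscator. The sketch below illustrates the property with placeholder bytes and a hypothetical repeating key; neither is taken from the Defender binary.

```python
def xor_bytes(data, key):
    """XOR `data` with `key`, repeating the key as often as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


original = bytes.fromhex("0011223344556677")  # placeholder for the obfuscated bytes
key = bytes.fromhex("a5a5a5a5")               # hypothetical key

first_pass = xor_bytes(original, key)     # corresponds to "after the first XORing"
second_pass = xor_bytes(first_pass, key)  # corresponds to "after the second XORing"

print(first_pass != original, second_pass == original)  # True True
```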

This just gave us the total overview of the deobfuscation routine, which is also an obfuscation routine.

Conclusion

We’ve taken a look at the deobfuscation routine, which later proved to also be an obfuscation routine. The code that we analyzed first XORs the instructions to deobfuscate them, executes them and then obfuscates them back to their original form.

In the next tutorial we’ll take a look at what the deobfuscated instructions actually do and try to understand the function call in whole.