Before we begin, we must mention that it's impossible to completely prevent reverse engineering. What we can do is place enough obstacles in the way to make the process so slow that reverse engineers give up. There are hardware solutions, where a black box attached to the computer performs the encryption and decryption for us, but these are far from being used in everyday life.
Techniques to Harden Reverse Engineering
The most basic approaches to hardening programs against reverse engineering are the following:
- Eliminating Symbolic Information
- Obfuscating the Program
- Embedding Antidebugger Code
When eliminating symbolic information, we strip the textual information from the program executable. In bytecode programs, the executable often contains large amounts of internal symbolic information, such as class names, class member names, and the names of instantiated global objects. Symbol names alone often reveal enough about what a function does to simplify reverse engineering considerably, so removing or renaming every symbol leaves the reverser with a harder problem than usual.
This is easily done for C/C++ programs, where we only have to add a few flags to the command line that compiles the program into the executable. It's much harder with Java and .NET, where symbols are used internally to reference variables, functions, etc. This is also why Java and .NET programs can easily be decompiled into a fairly good approximation of the original source code. We can still get the effect of stripping by renaming all the symbols from their meaningful names to meaningless ones.
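The renaming idea can be sketched in a few lines of C++. This is only an illustration, not how any particular obfuscator works: the helper name build_rename_table and the "s0", "s1", ... naming scheme are assumptions made for this example, and a real tool would also have to rewrite every reference to each symbol inside the bytecode.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of any real obfuscator): maps each distinct
// symbol to a meaningless name "s0", "s1", ... in order of first appearance.
std::map<std::string, std::string> build_rename_table(
        const std::vector<std::string>& symbols) {
    std::map<std::string, std::string> table;
    for (const std::string& name : symbols) {
        if (table.find(name) == table.end()) {
            // Compute the new name before inserting, so the index is the
            // number of symbols renamed so far.
            std::string obfuscated = "s" + std::to_string(table.size());
            table.emplace(name, obfuscated);
        }
    }
    return table;
}
```

Given the symbols LicenseManager and checkSerial, the table maps them to s0 and s1, names that tell the reverser nothing about what the code does.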
Besides stripping the executable's symbols, we can also obfuscate the program. Obfuscation changes the code of the program without changing the logic behind it, so the program does the same as before but its code is far less readable. Two techniques can achieve that:
- Encoding: With encoding, we must add decoding instructions that decode the whole program before it runs. This can be done by appending the decoding instructions at the end of the program and changing the entry point to point to them. When the program is run, the decoding instructions execute first and decode the whole program into its original form. They then jump to the start of the original program, which runs as if the encoding had never happened.
- Packing: When packing the executable, we reduce its size by compressing it and, often, encrypt it as well. When such a program is run, it must first be unpacked in memory and only then executed.
By obfuscating the program with nonstandard encoders/packers, we can greatly complicate the task of reverse engineering the executable. In the end, though, a persistent reverse engineer will still be able to bypass the protection and recover the non-obfuscated version of the executable, which can then be reversed as usual.
Last but not least, we can embed antidebugger code in the executable that detects whether the program is currently being debugged. If it is, the program terminates itself prematurely without executing the functions that would normally run outside a debugger.
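To make the encoding idea concrete, here is a minimal sketch that uses a single-byte XOR as the encoding. It operates on a plain byte buffer rather than on a real executable's code section, and the function name xor_transform is an assumption made for this example. A real encoder would additionally have to append the decoder stub to the file and redirect the entry point to it.

```cpp
#include <cstdint>
#include <vector>

// XOR every byte with a single-byte key. Applying the transform twice with
// the same key restores the input, so the same routine can act as the
// encoder (at build time) and as the decoder stub (at run time).
void xor_transform(std::vector<std::uint8_t>& bytes, std::uint8_t key) {
    for (std::uint8_t& b : bytes) {
        b ^= key;
    }
}
```

Calling xor_transform once on the program bytes yields the encoded form that would be stored on disk; calling it again with the same key yields the original instructions, which is what the decoding stub does before jumping to the start of the program.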
Before discussing how anti-debugging tricks do their magic, we must first talk about how the debugger is able to debug the program. We know that we can stop and resume the program with the use of either software or hardware breakpoints.
When using software breakpoints, the debugger replaces the first byte of the instruction on which we've set the breakpoint with the INT 3 instruction (at least on the x86 architecture), which is a special software interrupt. Passing the value 3 to the INT instruction means that we're generating software interrupt 3, which causes the handler registered at vector 3 of the interrupt descriptor table (IDT) to be executed. We're probably all familiar with INT 0x80, the interrupt that makes a system call on 32-bit x86 Linux systems.
The INT 3 instruction temporarily replaces the original instruction in the running program. When it executes, the debugger is notified that a software breakpoint has been hit and that program execution should stop. After that, the debugger restores the original instruction so the program can continue without losing an instruction, which would otherwise cause abnormal program behavior.
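The patch-and-restore cycle can be sketched on an ordinary byte buffer that stands in for the debuggee's code. This is a simplified model, not a working debugger: the names SoftwareBreakpoint, set_breakpoint and remove_breakpoint are made up for this example, and a real debugger writes into another process's memory through operating system debugging APIs. The one hard fact used here is that INT 3 has the one-byte encoding 0xCC on x86.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

const std::uint8_t kInt3Opcode = 0xCC;  // one-byte encoding of INT 3 on x86

// Hypothetical record of one breakpoint: where it is and what it overwrote.
struct SoftwareBreakpoint {
    std::size_t offset;     // position of the patched byte in the code buffer
    std::uint8_t original;  // byte that was there before patching
};

// Save the original byte and overwrite it with INT 3.
SoftwareBreakpoint set_breakpoint(std::vector<std::uint8_t>& code,
                                  std::size_t offset) {
    SoftwareBreakpoint bp{offset, code.at(offset)};
    code.at(offset) = kInt3Opcode;
    return bp;
}

// Put the original byte back so the instruction can execute normally.
void remove_breakpoint(std::vector<std::uint8_t>& code,
                       const SoftwareBreakpoint& bp) {
    code.at(bp.offset) = bp.original;
}
```

Because the code in memory really is modified while the breakpoint is set, a program that inspects its own instructions can notice the 0xCC byte, which is not the case for hardware breakpoints.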
When we use a hardware breakpoint, it's the processor's job to know when the breakpoint has been hit and the program has to be stopped. This is why the program is not modified when a hardware breakpoint is set.
When the breakpoint is hit, the program is stopped and we can safely execute instructions in our favorite debugger. At that point, we can run instructions step by step, either stepping into functions or stepping over them so that the whole function executes in one go. If we're interested in what a function does, we need to step into it; otherwise we can safely ignore the function and step over it. When stepping through the code, each instruction is executed on its own and then the program is stopped again, so we're able to analyze what the instruction has just done.
When stepping through the code with a debugger, the Trap Flag (TF) in the EFLAGS register is used. When the TF is enabled, the processor generates a single-step exception after every executed instruction, which gives us the feeling of stepping through the program instruction by instruction.
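The Trap Flag is bit 8 of EFLAGS, so enabling or disabling single-stepping comes down to setting or clearing one bit in the saved flags value. The sketch below models only that bit manipulation on a plain 32-bit integer; the function names are assumptions made for this example, and a real debugger would apply the change to the thread context it obtains from the operating system.

```cpp
#include <cstdint>

const std::uint32_t kTrapFlagMask = 1u << 8;  // TF is bit 8 of EFLAGS (0x100)

// Return a copy of the flags value with TF set (single-stepping on).
std::uint32_t enable_single_step(std::uint32_t eflags) {
    return eflags | kTrapFlagMask;
}

// Return a copy of the flags value with TF cleared (single-stepping off).
std::uint32_t disable_single_step(std::uint32_t eflags) {
    return eflags & ~kTrapFlagMask;
}

// Report whether TF is set in the given flags value.
bool single_step_enabled(std::uint32_t eflags) {
    return (eflags & kTrapFlagMask) != 0;
}
```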
IsDebuggerPresent is a Windows API function, which we can see in the picture below:
The function doesn't take any arguments and returns a Boolean value telling us whether the program is running under a debugger or not. This function can be used to trivially detect whether a debugger is being used to run the program. The function uses the Process Environment Block (PEB) to find out whether a user-mode debugger is attached.
Let's create a simple program that prints the number 0 if a debugger is present and the number 1 if it is not. We can do that by first creating an empty console project in Visual Studio C++ and then changing the code of the main cpp file to the following:
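// isdebuggerpresent.cpp : Defines the entry point for the console application.
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
int _tmain(int argc, _TCHAR* argv[]) {
    int num;
    if (IsDebuggerPresent()) num = 0;   // a debugger is attached
    else num = 1;                       // no debugger
    printf("Number: %d\n", num);
    getchar(); /* wait */
    return 0;
}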
The program prints "Number: 0" if the debugger is present and "Number: 1" if it is not. If we run the application from Visual Studio, the program will display the number 0 because it's being run under a debugger. This can be seen in the picture below:
Let's also run the program under OllyDbg to be sure that the number 0 is displayed. This can be quickly confirmed by loading the executable and running it. In the picture below, we can see that the number 0 was printed when the program was run under the OllyDbg debugger:
But if we run the same program from a normal cmd.exe, it will display the number 1. This can be seen in the picture below:
We can see that the IsDebuggerPresent API function call works as expected, but it is also easy to detect and bypass, because we can quickly find the call in the executable and delete it or skip over it. To do this, we can simply open the executable in IDA and check the Imports table to verify whether the function is listed there. It is: IsDebuggerPresent appears among the imported functions, as we can see in the picture below:
This is a clear indication that the executable uses the function to do something different when a debugger is attached. We can also locate the exact instructions that are used to call that function. The whole IDA graph of the main function, which does exactly the same as the main function from the C++ source code above, is presented in the picture below:
We can see that we first set up the stack frame for the function and call the IsDebuggerPresent function. After that, the returned value in eax is tested against itself to determine whether a true or false value was returned. If eax holds a value different from 0 (1 in our case), the zero flag is cleared and the first box, which sets [ebp+num] to 0, is executed. This is exactly what happens now, because we're running the program under a debugger; otherwise the zero flag is set and the block that sets [ebp+num] to 1 is executed. After that, the value of [ebp+num] is moved into the register eax and printed with the printf function.
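The flag logic behind that branch can be modeled in a few lines. The test instruction computes the bitwise AND of its operands and sets the zero flag only when the result is 0, so `test eax, eax` leaves the zero flag set exactly when eax is 0. The function names below are assumptions made for this illustration; they mirror the disassembly rather than reproduce it.

```cpp
#include <cstdint>

// Models the zero flag after `test eax, eax`: ZF is set only when the
// bitwise AND of eax with itself is 0, that is, when eax is 0.
bool zero_flag_after_test(std::uint32_t eax) {
    return (eax & eax) == 0;
}

// Mirrors the branch in main: num is 0 when IsDebuggerPresent returned a
// nonzero value (debugger attached) and 1 when it returned 0.
int num_for_return_value(std::uint32_t eax) {
    return zero_flag_after_test(eax) ? 1 : 0;
}
```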
If we now set a breakpoint on the call to the IsDebuggerPresent function and rerun the program, the execution will stop right where we want it. After the breakpoint has been hit, we can step into the function to see what it actually does. In the picture below, we can see the function in question:
We can see that the function is pretty simple: it loads the address of the current thread's Thread Information Block (TIB) into the register eax and then reads the structure member located at offset 0x30, which holds a pointer to the PEB data structure. With the address of the PEB in eax, it then reads the byte at offset 0x2, which is the data member named BeingDebugged. Thus, we've successfully taken a look at what the IsDebuggerPresent function actually does and how it does it. We can see that it's very simple and not really hard to bypass.
We can suspect that IsDebuggerPresent is being used when we try to reverse engineer an executable and the program terminates prematurely, takes a different execution path, or does something else unexpected. In such cases, we must first check the Imports table to see whether the IsDebuggerPresent function is called anywhere in the executable. If it is, we can simply remove the instructions that call IsDebuggerPresent, so it won't bother us when reversing the executable.
On the other hand, if we're developing a program and would like to use the IsDebuggerPresent check, we can copy the above instructions directly into our code, so that we're not calling the IsDebuggerPresent function itself but using the instructions from its body to figure out whether a debugger is being used to run the executable. This is just another trick: reverse engineers won't immediately notice the use of IsDebuggerPresent, which makes the debugging slightly more complicated.
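A hedged sketch of such an inlined check follows. The portable part, being_debugged_from_peb, only reads the byte at offset 0x2 of a memory block laid out like a PEB, so it can be tried on a mock buffer on any platform. The Windows part is compiled only for 32-bit x86 builds with MSVC, where the `__readfsdword` intrinsic from intrin.h reads fs:[0x30], which holds the PEB address directly (the same pointer the original function reaches through the TIB). The function names are assumptions made for this example, and 64-bit processes use a different segment register and offset that this sketch does not cover.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER) && defined(_M_IX86)
#include <intrin.h>
#endif

const std::size_t kBeingDebuggedOffsetInPeb = 0x2;  // PEB.BeingDebugged byte

// Portable core: read BeingDebugged from memory laid out like a PEB.
bool being_debugged_from_peb(const std::uint8_t* peb) {
    return peb[kBeingDebuggedOffsetInPeb] != 0;
}

#if defined(_MSC_VER) && defined(_M_IX86)
// 32-bit Windows with MSVC only: fs:[0x30] holds the address of the PEB,
// so no call to IsDebuggerPresent appears in the Imports table.
bool inline_is_debugger_present() {
    const std::uint8_t* peb =
        reinterpret_cast<const std::uint8_t*>(__readfsdword(0x30));
    return being_debugged_from_peb(peb);
}
#endif
```

A reverser who knows the pattern will still recognize the fs:[0x30] access followed by a read at offset 0x2, so this only raises the bar slightly.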
We've seen a few techniques to harden a program against reverse engineering. The technique easiest to bypass is symbol elimination, where we delete all the symbols present in the executable. This effectively makes the names of the functions unavailable when debugging, which leaves it up to the reverse engineer to properly name the functions. Another technique is program obfuscation, which can be a pretty simple operation like XORing the whole executable and decoding it before it runs, but it can also be pretty complicated. Things get further complicated if we combine obfuscation with anti-debugging techniques, which detect whether the program is being debugged and terminate it prematurely if so, making the executable much harder to reverse engineer.
Eldad Eilam, Reversing: Secrets of Reverse Engineering.
Chris Eagle, The IDA Pro Book: The Unofficial Guide to the World's Most Popular Disassembler.