Malware Analysis and Reverse Engineering
This article provides a high-level overview of malware analysis and reverse engineering. If you are planning to get started in this field, it can be a good starting point, as it covers what you need to know before you download a debugger and get your hands dirty reversing a malware sample. We will cover the following fundamentals:
- Introduction to malware analysis and reverse engineering
- What you need to know to get started with malware analysis and reverse engineering
- Common malware behavior
- Common anti-analysis techniques used by malware
So let’s begin.
While some malware relies on common patterns, such as stealing files and connecting to a command-and-control (C&C) server using Windows APIs, other malware exploits zero-days or other specific vulnerabilities. Regardless of the techniques the malware uses, reverse engineering is one of the most common approaches to analyzing it.
It should be noted that reverse engineering is time-consuming and has a reputation for being complex, at least until you have mastered it.
Things to note before we start:
- Malware analysis should be done only on an isolated computer that is intended ONLY for analysis, to avoid infecting other systems. Virtual machines can be used for this purpose.
- Disconnect the machine from the internet if network access is not required.
What you need to know to get started?
The majority of malware families target the Microsoft Windows operating systems, for obvious reasons. At the time of writing, Windows holds 78.43% of the worldwide desktop market share, and thus it is a major target for malware authors. As a reverse engineer, it is important to understand Windows internals and commonly used Windows APIs in order to analyze malware targeting Windows effectively.
As an example, let’s assume that we came across a call to the function IsDebuggerPresent while analyzing a malware sample. The existence of this call gives us a hint that the application may be attempting to detect whether it is being debugged. While this can be found by searching on the internet, a person with Win32 API knowledge can quickly confirm that the application is checking whether a debugger is present. A quick look at the Win32 API documentation shows that the function takes no parameters and returns a BOOL.
As per the Win32 API documentation, this function “Determines whether the calling process is being debugged by a user-mode debugger.” On the return value, the documentation states: “If the current process is running in the context of a debugger, the return value is nonzero. If the current process is not running in the context of a debugger, the return value is zero.”
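To make this concrete, here is a minimal sketch of the same check made from Python through the standard ctypes module. The helper name debugger_attached is our own and exists only for illustration. The sketch assumes that returning None on non-Windows systems is acceptable, since IsDebuggerPresent is exported only by the Windows kernel32.dll.

```python
import ctypes
import sys


def debugger_attached():
    """Return True or False on Windows, or None where the Win32 API is unavailable."""
    if sys.platform != "win32":
        # IsDebuggerPresent is exported only by the Windows kernel32.dll.
        return None
    # The function takes no arguments and returns a BOOL (nonzero means debugged).
    return bool(ctypes.windll.kernel32.IsDebuggerPresent())


if __name__ == "__main__":
    print("debugger attached:", debugger_attached())
```

A sample that calls this API typically branches on the result, so the call is a natural place to set a breakpoint during analysis.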
When analyzing malware, all we have is an executable file. Obviously, having access to the source code makes our life much easier, but that is far from reality. How, then, do we analyze the binaries? We will have to obtain the decompiled or disassembled version of the program to be able to understand the logic.
Generating an executable file involves several phases, each of which produces intermediate files, as shown below.
Source code -> Preprocessing -> Compiling -> Assembling -> Linking -> Binary
Using a decompiler, it is possible to obtain high-level source code from an executable file, although this is not possible with all languages. The difficulty of decompiling varies with the programming language used and with how heavily the code was obfuscated by its authors.
Using a disassembler, it is possible to obtain a low-level representation (assembly code) of an executable file. Most debuggers also come with a disassembler. They disassemble the executable for us and allow us to step through the program during our analysis.
Using a debugger, it is possible to actually execute the code step by step and observe which branches it takes.
Choose Your Tools
Malware analysis is broadly categorized into two types: static analysis and dynamic analysis.
Depending on which type of analysis we are doing and what artifacts we are specifically looking for, the tool set may change.
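As a small illustration of static analysis, the sketch below pulls printable strings out of raw bytes without executing anything, which is often one of the first steps when triaging a sample. The function name extract_strings, the minimum length of 4 and the sample bytes are all assumptions made for this example.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII characters that are at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Hypothetical bytes standing in for part of an executable file.
    sample = b"\x00\x01IsDebuggerPresent\x00\xffkernel32.dll\x00ab\x00"
    for text in extract_strings(sample):
        print(text)
```

With a real file, the bytes would come from reading the file in binary mode. API names and DLL names that show up in the output can hint at what the program does before any dynamic analysis is attempted.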
While there are several tools available for both static and dynamic analysis, the following are some of the most commonly used debuggers and disassemblers for reverse engineering.
- Immunity Debugger
- IDA Pro
The behavior of malware depends on what it is designed for. However, at a high level, some of the most commonly seen behaviors are modifying the file system, changing registry entries, communicating over the network, creating new processes and encrypting files.
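File system changes are among the easier behaviors to observe. One simple approach, sketched below under our own assumptions, is to hash every file in a directory before and after running the sample in the analysis VM and then compare the two snapshots. The function names snapshot and diff_snapshots are hypothetical, and dedicated monitoring tools do this far more thoroughly.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path under root to the SHA-256 digest of its contents."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                state[path] = hashlib.sha256(handle.read()).hexdigest()
    return state


def diff_snapshots(before, after):
    """Return sorted lists of created, deleted and modified paths."""
    created = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(p for p in before if p in after and before[p] != after[p])
    return created, deleted, modified
```

In use, we would call snapshot on a directory of interest, run the sample, call snapshot again and pass both results to diff_snapshots. A burst of modified files with unchanged names, for example, would be consistent with file encryption.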
While it is easy to detect and collect artifacts of these activities, modern malware is often evasive and usually attempts to detect whether it is being analyzed. If it detects analysis, it exits straight away or jumps into meaningless code that wastes the analyst’s time. Malware achieves this with a variety of techniques, such as anti-debugging, anti-disassembly, anti-VM checks, packers, obfuscation, encoding and encryption.
Let’s take anti-VM as an example to understand how this is done. Assume that we are analyzing malware in a virtual machine running inside VirtualBox. The VM usually contains artifacts of the hypervisor being used, which malware authors can use to detect that the malware is running inside a virtual machine.
For instance, Guest Additions processes such as VBoxService.exe and VBoxTray.exe usually run inside a VirtualBox virtual machine when the Guest Additions are installed. Malware can look for these running processes and, if they are detected, terminate itself or redirect execution to some useless code.
Similarly, if the virtual machine is run using VMware, VMware Tools processes such as vmtoolsd.exe can be found.
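The logic of such a process-based check can be sketched in a few lines. The table of process names below is an assumption based on names commonly associated with VirtualBox Guest Additions and VMware Tools, and it is not exhaustive. The function name detect_vm_processes and the sample listing are hypothetical. Real malware would typically enumerate live processes through Windows APIs rather than receive a list.

```python
# Process names commonly associated with guest tools (assumed, not exhaustive).
VM_PROCESS_NAMES = {
    "vboxservice.exe": "VirtualBox",
    "vboxtray.exe": "VirtualBox",
    "vmtoolsd.exe": "VMware",
    "vmacthlp.exe": "VMware",
}


def detect_vm_processes(running):
    """Return {process name: hypervisor} for each running name found in the table."""
    hits = {}
    for name in running:
        hypervisor = VM_PROCESS_NAMES.get(name.lower())
        if hypervisor:
            hits[name] = hypervisor
    return hits


if __name__ == "__main__":
    # Hypothetical process listing standing in for a live enumeration.
    listing = ["explorer.exe", "VBoxService.exe", "svchost.exe", "VBoxTray.exe"]
    print(detect_vm_processes(listing))
```

Knowing what the check looks for also tells the analyst how to defeat it, for instance by renaming or removing the guest tools processes in the analysis VM.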
What might you see in the process?
Let’s look at some kinds of modern malware that you may encounter during your malware analysis journey. It’s important not to be surprised when you see something different from traditional malware written in the C programming language.
One important class of malware that we should discuss is rootkits. Rootkits come in two types: user-mode rootkits and kernel-mode rootkits. User-mode rootkits run in user space (Ring 3) and use techniques such as inline API hooking. Kernel-mode rootkits, on the other hand, run in kernel space (Ring 0) and usually make their modifications there. Understanding common rootkit behaviors is essential to analyzing them.
Reverse engineering C++ malware is another essential skill to have, as some forms of malware are written using C++. As C++ applications can use object-oriented programming concepts, these applications are relatively complex to analyze using reverse engineering. An analyst should be able to identify the classes and the relationships among them to be able to effectively analyze them.
Being able to analyze 64-bit malware is another important skill. Malware targeting 64-bit processors is becoming more common as those processors grow in popularity. To analyze 64-bit malware effectively, it is important to know the 64-bit architecture and its instruction set.
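Before choosing a debugger, it helps to know whether a sample is a 32-bit or a 64-bit executable. The sketch below reads the Machine field from the COFF header of a Windows PE file, where 0x014C denotes x86 and 0x8664 denotes x64. The function names pe_machine and fake_header are our own, and fake_header builds a minimal, hypothetical header purely for demonstration; it is not a valid executable.

```python
import struct

# Machine field values from the PE/COFF format for the two common cases.
MACHINE_TYPES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def pe_machine(data):
    """Return the architecture named in a PE file's COFF header."""
    if data[:2] != b"MZ":
        raise ValueError("not a DOS/PE executable")
    # e_lfanew, the offset of the PE signature, is stored at offset 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The 2-byte Machine field follows the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return MACHINE_TYPES.get(machine, "other (0x%04x)" % machine)


def fake_header(machine):
    """Build a minimal, hypothetical header for demonstration only."""
    dos = b"MZ" + b"\x00" * 58 + struct.pack("<I", 64)  # e_lfanew = 64
    return dos + b"PE\x00\x00" + struct.pack("<H", machine)


if __name__ == "__main__":
    print(pe_machine(fake_header(0x8664)))
    print(pe_machine(fake_header(0x014C)))
```

With a real sample, the bytes would come from reading the first part of the file in binary mode instead of from fake_header.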
This article has provided a brief overview of how malware behaves and what tools and techniques can be used by reverse engineers to dissect and analyze it. We discussed how knowledge of Windows internals can help in reverse engineering. We also discussed how various skills, such as the ability to understand 64-bit architecture, can add value in malware analysis.