This article provides a brief but comprehensive survey of digital forensics, covering its technical aspects, different analysis approaches, and common tools. It can serve as a starting point for understanding the major forensic principles, methodologies, and core concepts.
Digital forensics, also called computer forensic science, involves the seizure, acquisition, analysis, and reporting of evidence from digital media, such as volatile memory and hard disks, for use in a court of law. A forensic investigation starts with the seizure and marking of the items that will be used later in the process. Next, a copy of the data is acquired for later analysis, without altering or damaging the source. In the analysis phase, evidence is extracted by interpreting the acquired information; when there are multiple sources, their data is aggregated and correlated. Finally, the results are reported in a document, along with a detailed description of the steps conducted during the investigation.
The next section tackles the different domains of digital forensics, covering malware forensics, email forensics, and more.
Malware is a type of software intentionally designed with malicious functionalities. The goal of malware forensics is to find out:
What the malware can do (and what it does in a particular situation)
Which family it belongs to (for example, ransomware, keyloggers, or remote administration tools)
How it can be detected and blocked, and
How it can be cleanly removed from an infected system
To achieve these goals, there are two approaches: static analysis and dynamic analysis, each with its own advantages and pitfalls. Static analysis examines the binary without running it. It is the only option when the malware cannot be run, for example when it was taken from a partial memory dump, has missing pieces, or targets an architecture for which no execution environment is available. In principle, it can reveal everything the program can do, but it is less precise because the analyst must reason about the program's behavior without actually executing the code. On the other hand, it achieves greater coverage, since one can reason about all possible executions at the same time. Dynamic analysis runs the program and observes its behavior. It tells the analyst exactly what the program does when executed in a given environment with a particular input. It is more precise because it can observe the instructions executed and the values of registers and memory; however, it achieves less coverage because it observes only one execution path at a time.
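Static analysis often begins with simple inspection of the raw bytes. As a minimal sketch, the Python functions below extract printable ASCII and UTF-16LE strings from a sample without running it, similar to the Unix strings utility; embedded URLs, file paths, or DLL names can hint at the program's capabilities. The function names and the four-character minimum are illustrative choices.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least `min_len` printable ASCII characters (0x20-0x7E)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def extract_wide_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Same idea for UTF-16LE text (printable byte followed by 0x00),
    which is common in Windows binaries."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [m.group().decode("utf-16-le") for m in pattern.finditer(data)]
```

On a real sample, read the file in binary mode and pass its contents. Packed samples usually expose few meaningful strings, which is itself a hint that unpacking is needed.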
A general approach to malware analysis would be:
1. Set up a controlled, isolated laboratory in which to examine the malware sample
2. Perform behavioral analysis to examine the sample's interactions with its environment
3. Perform static code analysis to further understand the sample's inner workings
4. Perform dynamic code analysis to understand the more difficult aspects of the code
5. If necessary, unpack the sample
6. Repeat steps 2, 3, and 4 (order may vary) until the analysis objectives are met
7. Document findings and clean up the laboratory for future analysis
The following sections describe each step along with common tools used to carry it out.
Examining malicious software involves infecting a system with the malware sample and then using the appropriate behavior analysis tools to observe its interaction with the system. This requires an isolated laboratory environment that you can infect without affecting your production environment. The most common and flexible way is to use virtualization software (e.g., VMware or VirtualBox).
To understand the threat posed by the sample, the analyst needs to examine its behavior in the controlled environment set up in the previous step. Process Monitor (from Microsoft Sysinternals) can be used to study the process, network, file, and registry interactions between the malware and the operating system.
Process Monitor is a common tool for capturing the following events:
Registry: Captures registry key query, read, and creation operations.
File system: Captures file creation, writing, and deletion on local hard drives and network drives.
Network: Shows the source and destination of TCP/UDP traffic, but not the data itself. To capture the packet contents, analysts use Wireshark, where packets can be filtered on the source and destination IP addresses and ports reported by Process Monitor.
Process: Shows process and thread creation and exit, among other events.
Profiling: Measures the CPU time and memory used by each process, including the malware being studied.
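Process Monitor can save captured events to a CSV file for offline review. The Python sketch below counts one process's events per class, assuming the export uses the default "Process Name" and "Operation" column headers. Classifying by operation-name prefix is a simplification, and the function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

# Assumed column headers from Process Monitor's CSV export (default column set);
# adjust these if your export was configured with different columns.
PROCESS_COL = "Process Name"
OPERATION_COL = "Operation"


def categorize(operation: str) -> str:
    """Map a Process Monitor operation name to one of the event classes above."""
    if operation.startswith("Reg"):                  # RegOpenKey, RegSetValue, ...
        return "Registry"
    if operation.startswith(("TCP", "UDP")):         # TCP Send, UDP Receive, ...
        return "Network"
    if operation.startswith(("Process", "Thread", "Load Image")):
        return "Process"
    return "File system"                             # CreateFile, WriteFile, ...


def summarize(csv_lines: Iterable[str], process_name: str) -> Dict[str, int]:
    """Count one process's events (e.g. the sample's) per event class."""
    counts: Counter = Counter()
    for row in csv.DictReader(csv_lines):
        if row[PROCESS_COL].lower() == process_name.lower():
            counts[categorize(row[OPERATION_COL])] += 1
    return dict(counts)
```

Usage might look like opening a hypothetical procmon_export.csv with newline="" and encoding="utf-8-sig" (to tolerate a byte-order mark) and passing the file object to summarize together with the sample's image name.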
Is the malware a known binary?
To check whether the sample is a known binary (based on its hash) or similar to something already known (based on its signature), the analyst could submit it to VirusTotal. VirusTotal is an online malware identification service, owned by Google, that scans files with dozens of antivirus engines and other analysis tools. It maintains one of the largest repositories of malware and known file types.
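Looking up a sample by hash is the cheapest first check, and unlike uploading, it does not disclose a possibly sensitive file. The Python sketch below computes the MD5, SHA-1, and SHA-256 digests and queries VirusTotal's API v3 file-report endpoint. The function names are illustrative, an API key is required, and the response fields should be checked against the current API documentation.

```python
import hashlib
import json
import urllib.request


def file_hashes(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute MD5, SHA-1 and SHA-256 of a file, reading it in chunks."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def virustotal_report(sha256: str, api_key: str) -> dict:
    """Fetch an existing VirusTotal report by hash (API v3 'files' endpoint).

    Only the hash is sent, so the sample itself is not uploaded.
    Raises urllib.error.HTTPError (404) if VirusTotal has never seen the file.
    """
    req = urllib.request.Request(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

In the returned JSON, the per-engine verdict counts are typically found under data, attributes, last_analysis_stats.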
Malicious binaries are often stripped of all symbols, obfuscated, and packed. In addition, they frequently implement anti-debugging and anti-analysis tricks and check whether they are running in an analysis environment. Packing a program means compressing or encrypting its instructions and data. Its original purpose is to save disk space, but malware writers use it widely to hide code from static analysis and signature-based detection. Many packers automatically include anti-disassembly, anti-debugging, and anti-VM techniques to further complicate the analysis.
A packer can be identified by its signature or by using heuristics. PEiD is a popular tool that can identify the most common packers, cryptors, and compilers for PE files. It ships with more than 600 different signatures for PE files, a large set that helps it detect a wide range of packers.
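Signature matching of this kind can be sketched in a few lines of Python. The example below parses PEiD-style hexadecimal patterns, where ?? matches any byte, and checks them either at a given entry-point offset or at every offset in the data. The signature shown is a made-up illustration, not an entry from PEiD's database, and converting a PE entry point (an RVA) into a file offset is left out.

```python
from typing import Dict, List, Optional


def parse_signature(sig: str) -> List[Optional[int]]:
    """Turn 'E8 ?? 5D' into [0xE8, None, 0x5D]; None is a wildcard byte."""
    return [None if tok == "??" else int(tok, 16) for tok in sig.split()]


def matches(data: bytes, pattern: List[Optional[int]], offset: int = 0) -> bool:
    """True if the pattern matches `data` starting at `offset`."""
    if offset < 0 or offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def scan(data: bytes, signatures: Dict[str, str],
         entry_point: Optional[int] = None) -> List[str]:
    """Return the names of matching signatures.

    With `entry_point` set, only that offset is checked (like an
    entry-point-only rule); otherwise every offset is tried.
    """
    hits = []
    for name, sig in signatures.items():
        pattern = parse_signature(sig)
        offsets = [entry_point] if entry_point is not None else range(len(data))
        if any(matches(data, pattern, off) for off in offsets):
            hits.append(name)
    return hits


# Hypothetical signature for illustration only.
SIGNATURES = {"Example packer stub": "60 BE ?? ?? ?? ?? 8D BE"}
```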
There are several heuristic techniques to determine whether a program is packed, such as sections with high entropy, unusual section names, and few entries in the import table. Mandiant's Red Curtain tool computes the entropy of each section; high entropy suggests that the program is packed or encrypted. The tool also scans for packing signatures and computes a threat score.
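The entropy heuristic is easy to reproduce. The Python sketch below computes the Shannon entropy of a byte string, H = -sum over byte values b of p(b) * log2 p(b), which ranges from 0 to 8 bits per byte, and flags sections above a threshold. The 7.0 threshold and the function names are assumptions; tools such as Red Curtain use their own scoring.

```python
import math
from collections import Counter
from typing import Dict, List


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


# Assumed heuristic threshold: compressed or encrypted data usually scores
# close to 8, while ordinary code and data tend to score lower.
PACKED_THRESHOLD = 7.0


def flag_sections(sections: Dict[str, bytes]) -> List[str]:
    """Return the names of sections whose entropy exceeds the threshold."""
    return [name for name, raw in sections.items()
            if shannon_entropy(raw) > PACKED_THRESHOLD]
```

With the third-party pefile package, the section bytes could be collected from pefile.PE(path).sections, each of which exposes a Name attribute and a get_data() method.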
There are several approaches to unpacking a program. One is to manually reverse-engineer the packing stub and write a corresponding unpacking tool, but this is complex and time-consuming. A dynamic alternative is to let the packed program run until its stub has unpacked the code in memory and then dump the unpacked image. In some cases, the program can be unpacked automatically with a tool (e.g., the UPX tool with its -d option). PEiD comes with a set of plugins, including a UPX unpacker.
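The Python sketch below combines a simple heuristic (UPX usually names PE sections UPX0, UPX1, and so on) with a call to the UPX command-line tool's -d (decompress) option. It writes the result to a separate file with -o so that the original sample stays intact. The helper names are hypothetical, and the upx executable is assumed to be installed.

```python
import shutil
import subprocess
from typing import Iterable, List


def looks_upx_packed(section_names: Iterable[str]) -> bool:
    """Heuristic: UPX normally renames PE sections to UPX0, UPX1, ..."""
    return any(name.upper().startswith("UPX") for name in section_names)


def upx_unpack_command(packed: str, output: str) -> List[str]:
    """Build 'upx -d -o OUTPUT PACKED' (decompress into a separate file)."""
    return ["upx", "-d", "-o", output, packed]


def unpack_with_upx(packed: str, output: str) -> subprocess.CompletedProcess:
    """Run UPX if it is on PATH; check returncode/stderr for failures."""
    if shutil.which("upx") is None:
        raise FileNotFoundError("upx executable not found on PATH")
    return subprocess.run(upx_unpack_command(packed, output),
                          capture_output=True, text=True, check=False)
```

Malware authors sometimes tamper with the UPX headers so that upx -d refuses to unpack the file; in that case, the manual or dynamic approaches remain the fallback.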
Disassemblers are among the tools that can be used to statically analyze binary programs and further understand the malware's inner workings. They do not require the analyzed module to be executed, which makes static analysis the safer option when the module is known to be malicious. A disassembler converts machine code into assembly language; IDA Pro is a popular tool for this job.
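To show what a disassembler does at its core, the toy Python sketch below decodes a handful of 32-bit x86 opcodes with a linear sweep and emits raw db bytes for anything it does not recognize. It is illustrative only: real disassemblers such as IDA Pro decode the full instruction set and typically follow control flow (recursive traversal) rather than sweeping linearly.

```python
from typing import List, Tuple

# A deliberately tiny subset of 32-bit x86 encodings.
ONE_BYTE = {0x55: "push ebp", 0x5D: "pop ebp", 0x90: "nop",
            0xC3: "ret", 0xC9: "leave", 0xCC: "int3"}
TWO_BYTE = {b"\x89\xe5": "mov ebp, esp", b"\x31\xc0": "xor eax, eax"}


def disassemble(code: bytes, base: int = 0) -> List[Tuple[int, str]]:
    """Decode instructions one after another from the start of `code`."""
    out: List[Tuple[int, str]] = []
    i = 0
    while i < len(code):
        pair = code[i:i + 2]
        if pair in TWO_BYTE:
            out.append((base + i, TWO_BYTE[pair]))
            i += 2
        elif code[i] == 0x6A and i + 1 < len(code):   # push imm8
            out.append((base + i, f"push 0x{code[i + 1]:02x}"))
            i += 2
        elif code[i] in ONE_BYTE:
            out.append((base + i, ONE_BYTE[code[i]]))
            i += 1
        else:                                          # unknown: emit the raw byte
            out.append((base + i, f"db 0x{code[i]:02x}"))
            i += 1
    return out
```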
To determine the higher-level logic of a function, such as loops, switches, and conditions, the malware analyst can use a decompiler. A decompiler converts machine code into code in a higher-level language, typically C-like pseudocode. Commercial editions of IDA Pro can be paired with a C/C++ decompiler called Hex-Rays Decompiler. An alternative is the open-source Snowman decompiler.
The last step is to document the findings and analysis results in a report that answers the predefined questions. The report includes, but is not limited to, screenshots, notes, and observations.
Memory forensics is the process of investigating a memory dump to locate malicious behavior. The dump is a snapshot of RAM at a specific point in time; it can be a full physical memory dump, a crash dump, or a hibernation file.
The investigator extracts useful artifacts from memory, including running processes, URLs, passwords, encryption keys, kernel modules, shared libraries, open sockets, active connections, and open registry keys. That information can be accessed by obtaining and analyzing the target computer’s physical memory dump.
A general approach to memory forensics has two phases: acquiring physical memory and analyzing it.
Memory dump acquisition: This can be performed with software run on the system, such as win32dd, win64dd, DumpIt, or dd, or with dedicated hardware, such as an internal acquisition card (PCI card), a device that captures direct memory access (DMA) transfers, or a FireWire port. The main difference is that software acquisition may alter the system, whereas hardware acquisition does not. However, hardware acquisition may crash the system or, in the case of FireWire, lose information. In addition, an acquisition card must be installed on the machine before an incident occurs.
Memory dump analysis: Many tools provide artifact extraction and analysis facilities. Volatility is one of the most popular memory forensics frameworks. It can extract digital artifacts from multiple types of memory images (crash dumps, core dumps, hibernation files, etc.) and provides in-depth visibility into the runtime state of the system. Rekall is an advanced memory analysis framework that started as a fork of Volatility and was developed by Google's incident response team.
To start the analysis, the analyst can view summary information about the dump, including the operating system version and target architecture (32- or 64-bit). The most common next step is to list the processes that were running on the system, the loaded kernel modules, and the shared libraries in order to locate malicious modules. The analysis can also cover other data, such as registry keys.
In addition to active processes, the analyst should look for terminated and hidden processes, since they might also have loaded malicious modules.
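With Volatility 3, this workflow maps to a sequence of plugins. The Python sketch below builds those command lines, assuming the vol entry point installed by the volatility3 package. It also implements a simple cross-view check: processes found by pool scanning (windows.psscan) but missing from the active process list (windows.pslist) are either terminated or deliberately unlinked, and both deserve attention. Parsing the plugins' output into (PID, name) pairs is left out, and the helper names are hypothetical.

```python
from typing import List, Set, Tuple

# Volatility 3 plugins for the triage steps described above.
TRIAGE_PLUGINS = [
    "windows.info",               # OS version and architecture summary
    "windows.pslist",             # processes linked in the active process list
    "windows.psscan",             # processes found by scanning memory, incl. terminated ones
    "windows.modules",            # loaded kernel modules
    "windows.dlllist",            # shared libraries (DLLs) per process
    "windows.registry.hivelist",  # registry hives present in memory
]


def triage_commands(dump_path: str) -> List[List[str]]:
    """Build one command line per plugin, e.g. ['vol', '-f', dump, 'windows.pslist']."""
    return [["vol", "-f", dump_path, plugin] for plugin in TRIAGE_PLUGINS]


Process = Tuple[int, str]  # (PID, image name)


def cross_view(pslist: Set[Process], psscan: Set[Process]) -> Set[Process]:
    """Processes seen by pool scanning but absent from the active list:
    either terminated or unlinked to hide them."""
    return psscan - pslist
```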
The memory analysis may end once the malicious files have been dumped from memory. At that point, malware analysis comes into play, as described in the previous section.
Email is a major channel for worms, phishing, and spam. Email forensics involves investigating email content and sources to reveal key information, such as the identities of the sender and recipients, the path the message traversed, the application used to compose it, the time it was generated, and its unique message ID.
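Much of this information lives in the message headers. As a minimal sketch using only Python's standard email package, the function below extracts the sender, recipients, date, Message-ID, and X-Mailer header (a common client fingerprint, when present), along with the Received chain in travel order. The function name and the returned keys are illustrative.

```python
from email import message_from_string
from email.policy import default
from typing import Dict, List


def header_summary(raw: str) -> Dict[str, object]:
    """Pull forensically useful header fields out of a raw RFC 5322 message."""
    msg = message_from_string(raw, policy=default)
    # Each relay prepends its own Received header, so reversing the list
    # gives the hops in the order the message actually travelled.
    hops: List[str] = [str(h) for h in reversed(msg.get_all("Received", []))]
    return {
        "from": str(msg.get("From", "")),
        "to": str(msg.get("To", "")),
        "date": str(msg.get("Date", "")),
        "message_id": str(msg.get("Message-ID", "")),
        "mailer": str(msg.get("X-Mailer", "")),  # client fingerprint, if present
        "hops": hops,
    }
```

Received headers added before the message reached a server you trust can be forged, so the chain should be checked hop by hop, against server logs where possible.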
Typically, email forensics involves the investigation of metadata, port scanning, and keyword searching.
There are several approaches to email forensics such as header analysis, server investigation, client-side mailer fingerprint, network devices investigation, and bait tactics.
Many tools can assist in studying the source and content of email messages, so that an attack or the malicious intent behind an intrusion can be investigated. The following is a non-exhaustive list of email forensics tools:
Paraben E-Mail Examiner
Smartphones contain sensitive personal information such as contact lists, SMS messages, call logs, and pictures. Attackers can use this information to impersonate the owner, so a lost or stolen device poses a serious risk. That is why smartphones have become an essential source of digital evidence. There are three primary approaches to smartphone forensics, all focused on extracting data in a way that can withstand challenges in a court of law.
General approaches to smartphone forensics
Manual Acquisition: The investigator browses the smartphone and takes pictures of each screen that contains important information. This technique does not alter the device and no tools are required to perform data acquisition. However, only data visible to the investigator can be recovered since only the user interface is used.
Physical Acquisition: The investigator clones the smartphone storage device and then normal disk forensic techniques are used (see Disk Forensics section).
Logical Acquisition: This technique requires little manual intervention and no cloning. Data available on the smartphone is acquired by automated tools that synchronize the device with a PC. With this technique, the investigator cannot acquire deleted data or unallocated space.
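Data obtained through logical acquisition often includes SQLite databases used by the device's apps. As a minimal sketch, the Python function below lists text messages from an SMS database. It assumes an Android-style sms table with address, date (milliseconds since the Unix epoch), and body columns. Schemas vary across Android versions and vendors, so the query is illustrative only. The database is opened read-only, and the function should be run on a working copy, not on the original evidence.

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Tuple


def read_sms(db_path: str) -> List[Tuple[str, str, str]]:
    """Return (UTC ISO timestamp, address, body) rows, oldest first.

    Assumes a table `sms(address, date, body)` with `date` in milliseconds.
    """
    # mode=ro opens the database read-only so the copy is not modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT date, address, body FROM sms ORDER BY date").fetchall()
    return [
        (datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat(), addr, body)
        for ms, addr, body in rows
    ]
```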
The following is a list of the popular tools available for smartphone forensics:
The goal of disk forensics is to acquire a copy of the data residing on hard drives and USB memory sticks and to analyze it to extract digital evidence. Acquisition can be performed at the file level or the sector level. At the file level, the investigator cannot acquire deleted files or unallocated space. At the sector level, however, the investigator can acquire an exact copy of the storage device. If the storage is corrupt or damaged, the investigator relies on file carving, which can recover files from raw data even when their file system metadata is lost. The most popular tools are The Sleuth Kit, Digital Forensics Framework, FTK, and EnCase.
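The Python sketch below illustrates both ideas. The first part is a sector-level copy that hashes the data as it is read, so the image can later be verified against the source. The second is a naive header/footer carver for JPEG files, which begin with the bytes FF D8 FF and end with FF D9. The function names are hypothetical. On a real system, the source would be a device node opened read-only, ideally behind a write blocker, and tools such as The Sleuth Kit handle far more formats and edge cases.

```python
import hashlib
from typing import BinaryIO, List

BLOCK = 512 * 1024  # read size; a multiple of the 512-byte sector size


def image_and_hash(src: BinaryIO, dst: BinaryIO, block_size: int = BLOCK) -> str:
    """Copy every byte of `src` to `dst`; return the SHA-256 of the data read."""
    h = hashlib.sha256()
    for block in iter(lambda: src.read(block_size), b""):
        dst.write(block)
        h.update(block)
    return h.hexdigest()


def sha256_of(f: BinaryIO, block_size: int = BLOCK) -> str:
    """Hash an image (or the source) again to verify the copy."""
    h = hashlib.sha256()
    for block in iter(lambda: f.read(block_size), b""):
        h.update(block)
    return h.hexdigest()


JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(image: bytes) -> List[bytes]:
    """Header/footer carving: ignore file system metadata and cut out every
    byte run that starts with a JPEG header and ends at the next footer."""
    found: List[bytes] = []
    pos = 0
    while (start := image.find(JPEG_HEADER, pos)) != -1:
        end = image.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break
        found.append(image[start:end + len(JPEG_FOOTER)])
        pos = end + len(JPEG_FOOTER)
    return found
```

Carving this way can produce truncated or merged files (for example, when a JPEG embeds a thumbnail with its own footer), so carved output should be validated.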
Cloud forensics involves inspecting cloud components, which include logs, virtual machine disk images, volatile memory dumps, console logs, and network captures. Cloud forensic tools collect data from the cloud, image the instances, and recover data from cloud instances. FROST is a forensics tool for OpenStack.
Logs generated by operating systems and applications are collected, separated by source, and parsed to extract useful information. Correlation mechanisms are then applied to find relationships between log entries and external or internal events.
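As an illustration of correlation, the Python sketch below parses authentication log lines and flags successful logins preceded by several failures from the same IP address within a short window, a typical brute-force indicator. The log format, the threshold of three failures, and the five-minute window are assumptions chosen for the example.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical log format used for illustration:
#   2023-01-02T10:00:00 sshd: Failed password for root from 203.0.113.5
LINE = re.compile(
    r"^(?P<ts>\S+) sshd: (?P<result>Failed|Accepted) password "
    r"for (?P<user>\S+) from (?P<ip>\S+)$"
)
Event = Tuple[datetime, str, str, str]  # (timestamp, result, user, ip)


def parse(lines: Iterable[str]) -> List[Event]:
    """Parse matching lines into events sorted by time; skip everything else."""
    events = []
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            events.append((datetime.fromisoformat(m["ts"]),
                           m["result"], m["user"], m["ip"]))
    return sorted(events)


def brute_force_successes(events: List[Event], threshold: int = 3,
                          window: timedelta = timedelta(minutes=5)
                          ) -> List[Tuple[datetime, str, str]]:
    """Report successful logins preceded by at least `threshold` failures
    from the same IP within `window`."""
    hits = []
    for ts, result, user, ip in events:
        if result != "Accepted":
            continue
        failures = sum(1 for t, r, _, i in events
                       if r == "Failed" and i == ip and ts - window <= t < ts)
        if failures >= threshold:
            hits.append((ts, user, ip))
    return hits
```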
Simson Garfinkel, in his paper "Digital Forensics Research: The Next 10 Years," summarized current forensic research directions, called the current era the "Golden Age of Digital Forensics," and warned that it was quickly coming to an end. This age has been marked by the rapid growth of digital forensics in industry and research and characterized by the widespread use of digital forensics tools, solutions, courses, research papers, and books. However, digital forensics faces a crisis for many reasons, including growing device storage sizes, the proliferation of operating systems and file formats, increasingly complex malware that no longer persists in storage, storage encryption, and the use of cloud storage. Together, these factors threaten to make current forensic approaches obsolete, so new analysis models should be considered; these include, but are not limited to, stream-based disk forensics, stochastic analysis, and prioritized analysis. In addition, research must address new issues such as scalability, and attention must be given to cooperation, standardization, and shared development between research and industry to overcome potential crises in digital forensics.