Analysis of Malicious Documents 3
In the last part of this article series, we looked at some handy OfficeMalScanner options such as debug, scan, and brute, and learned about structures, streams, and so on. In this part, we will look at how to extract the shellcode from a malicious document and run the extracted binaries through the regular analysis process.
In the last part, the OfficeMalScanner scan command identified three offsets: 0x90fca, 0x90c53, and 0xf51. Now we have to carve an executable from each of these offsets and check which one produces a binary that actually executes the shellcode. For that, we will use MalHost-Setup as shown below.
Note that when MalHost-Setup builds the executable, it adds code to set up the runtime environment. To find this added code, use the wait option of MalHost-Setup as shown below.
This command creates sample.exe for the given offset. The interesting part, however, is the patch that MalHost-Setup applies. Below we can see that the original bytes at the offset were 0xe8 0x00 and that MalHost-Setup has patched them to 0xeb 0xfe.
The analyst must keep this behavior in mind when analyzing the extracted binary (sample.exe in this case) with regular debugging tools such as OllyDbg. Why? We will see below.
Loading sample.exe into OllyDbg, we can see the newly patched instructions.
As we can see, the patched bytes EB FE form a JMP instruction that points to its own address, so execution loops there. We need to replace them with the pre-patch bytes E8 00 as shown below.
Once this is done, the executable contains the original instructions and is ready for analysis.
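The restore step can also be scripted instead of being done by hand in the debugger. The following is a minimal Python sketch and is not part of OfficeMalScanner; the function name `restore_bytes` and the demo buffer are hypothetical. It relies on the fact that `EB FE` encodes a short jump to itself, which is why the patched process spins in place until a debugger is attached. The offset of the patched bytes inside the generated executable is an assumption the analyst has to supply, for example from the debugger view.

```python
def restore_bytes(data: bytes, offset: int,
                  patched: bytes = b"\xeb\xfe",
                  original: bytes = b"\xe8\x00") -> bytes:
    """Return a copy of data with the patched bytes at offset put back to the original ones."""
    if len(patched) != len(original):
        raise ValueError("patched and original byte sequences must have the same length")
    end = offset + len(patched)
    found = data[offset:end]
    # Refuse to touch the buffer unless the expected patch is really there.
    if found != patched:
        raise ValueError(f"expected {patched.hex()} at offset {offset:#x}, found {found.hex()}")
    return data[:offset] + original + data[end:]


# Illustrative buffer only: two NOPs, the EB FE loop, then three zero bytes.
demo = b"\x90\x90\xeb\xfe\x00\x00\x00"
fixed = restore_bytes(demo, 2)
print(fixed.hex())
```

Checking the bytes before overwriting them is a deliberate choice: if the offset is wrong, the function fails loudly instead of silently corrupting the sample.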
Now let’s analyze another sample by running OfficeMalScanner on it as shown below.
We can see that it detects the file to be in Rich Text Format (RTF).
As suggested, we run the RTFScan utility included with OfficeMalScanner on anothersample.doc as shown below.
This produces the output below.
As we can see, RTFScan has detected a shellcode pattern at 0x39 and dumped an unusual object to SV_anotherutility_1.bin. Now let’s extract an executable from this .bin file at offset 0x39 as shown below.
The above command produces the following output.
As we saw earlier, the original bytes at 0x39 (E8 00) are patched with EB FE. We can now follow the procedure above to carve the executable out of the document.
This is all we are going to discuss about MS Office documents. Hopefully the various structures, tools, and their syntax are now easier for analysts to understand and apply when analyzing such malicious documents. Now let’s move on to another type of document: the Portable Document Format (PDF).
Portable Document Format(PDF)
PDF documents are used heavily by both enterprises and home users because of their broad support and rich features. These documents were once touted as secure by researchers; however, as document-based threats matured, they became a vehicle of choice for malware authors. PDF is well supported on Windows, which makes it easy to infect users. For example, Adobe Acrobat integrates with Windows as a shell extension, which means that an exploit can be triggered even if the user merely selects a malicious PDF file for preview. Before we jump into the various vulnerabilities that affect PDF files and the tools for analysis, the analyst must know the structure of a PDF file.
A PDF file is a collection of elements that together determine how the document is handled and rendered. A PDF file has the following components:
- Header: Contains information about the PDF version.
- Objects: After the header come the objects, which represent the text, graphics, etc. needed to render the PDF document. Each object starts with a line of the form
A B obj
where A is the object number and B is the generation (version) number of the object in the PDF file. Between obj and endobj there are sometimes references to other objects in the PDF file; these always end with the ‘R’ character. We will see such examples later in this series. Also note that data within objects is stored in the form of streams. These streams are often compressed or encoded with various algorithms, which are named by the /Filter keyword in the PDF file. Popular filters supported by PDF include FlateDecode, ASCIIHexDecode, etc.
- Xref: The cross-reference table lists the offset of each object in the file.
- Trailer: The trailer is the appendix section of the PDF file. It contains
  - The offset to the location of the xref table
  - The number of objects
  - Other metadata such as the location of the root object, creation date, etc.
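To make the layout above more concrete, here is a minimal Python sketch that walks the objects, references, /Filter entries, and trailer of a PDF-like buffer. Everything in it is an assumption for illustration: the buffer is hand-built, it omits a real xref table (so the startxref value is a placeholder and the buffer is not a valid PDF), and the helper names `parse_objects` and `parse_trailer` are hypothetical. Regular expressions are a shortcut here, not a substitute for a real PDF parser.

```python
import re
import zlib

# Hand-built, simplified buffer that mimics the layout described above.
payload = zlib.compress(b"app.alert('hello');")
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Length " + str(len(payload)).encode() + b" /Filter /FlateDecode >>\n"
    b"stream\n" + payload + b"\nendstream\nendobj\n"
    b"trailer\n<< /Size 3 /Root 1 0 R >>\n"
    b"startxref\n0\n%%EOF\n"
)

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.DOTALL)
REF_RE = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
FILTER_RE = re.compile(rb"/Filter\s*/(\w+)")
STREAM_RE = re.compile(rb"stream\n(.*?)\nendstream", re.DOTALL)
TRAILER_RE = re.compile(rb"trailer\s*<<(.*?)>>\s*startxref\s+(\d+)", re.DOTALL)


def parse_objects(pdf: bytes) -> dict:
    """Map (object number, generation number) to its references, filters, and stream data."""
    objects = {}
    for match in OBJ_RE.finditer(pdf):
        key = (int(match.group(1)), int(match.group(2)))
        body = match.group(3)
        # Only inspect the dictionary part so stream bytes cannot produce false hits.
        dictionary = body.split(b"stream", 1)[0]
        refs = [(int(a), int(b)) for a, b in REF_RE.findall(dictionary)]
        filters = [name.decode() for name in FILTER_RE.findall(dictionary)]
        data = None
        stream = STREAM_RE.search(body)
        if stream:
            raw = stream.group(1)
            # This sketch handles FlateDecode only; other filters are left untouched.
            data = zlib.decompress(raw) if "FlateDecode" in filters else raw
        objects[key] = {"refs": refs, "filters": filters, "data": data}
    return objects


def parse_trailer(pdf: bytes) -> dict:
    """Extract the object count, the root reference, and the xref offset from the trailer."""
    match = TRAILER_RE.search(pdf)
    size = re.search(rb"/Size\s+(\d+)", match.group(1))
    root = re.search(rb"/Root\s+(\d+)\s+(\d+)\s+R", match.group(1))
    return {
        "size": int(size.group(1)),
        "root": (int(root.group(1)), int(root.group(2))),
        "xref_offset": int(match.group(2)),
    }


objects = parse_objects(sample)
trailer = parse_trailer(sample)
for key, info in objects.items():
    print(key, info)
print(trailer)
```

On a real sample, the decoded stream data is usually the first place to look, since that is where script or shellcode tends to sit behind one or more filters.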
We have already mentioned keywords such as /Filter above, so let’s now look at the relevant keywords that can give analysts instant hints while analyzing a PDF file.
- /AA: Defines the automatic (additional) actions embedded in the document, which run on events such as the user opening the document. Note that other events, such as cursor movement, can also be declared here to trigger a certain action.
- /ObjStm: Defines an object stream, which can hide other objects inside it. We will see this in a later part of the series.
- /GoTo*: Redirects to the specified destination in the PDF file.
- /URI: Accesses the resource pointed to by a URL.
- /SubmitForm and /GoToR: Indicate that data can be sent to a URL.
- /Launch: Launches a program.
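A quick triage pass over these keywords can be sketched in a few lines of Python. This is only an illustration: the function name `count_keywords` and the sample buffer are hypothetical, and the sketch counts raw byte matches, so it would miss keywords hidden inside compressed streams or written with hex-escaped characters in the name.

```python
import re

# One pattern per keyword from the list above. The negative lookahead keeps
# "/AA" from matching a longer name, and "/GoTo*" is treated as a prefix.
KEYWORDS = {
    "/AA": rb"/AA(?![A-Za-z0-9])",
    "/ObjStm": rb"/ObjStm(?![A-Za-z0-9])",
    "/GoTo*": rb"/GoTo[A-Za-z]*",
    "/URI": rb"/URI(?![A-Za-z0-9])",
    "/SubmitForm": rb"/SubmitForm(?![A-Za-z0-9])",
    "/Launch": rb"/Launch(?![A-Za-z0-9])",
}


def count_keywords(pdf: bytes) -> dict:
    """Count how often each keyword of interest appears in the raw bytes."""
    return {name: len(re.findall(pattern, pdf)) for name, pattern in KEYWORDS.items()}


# Hypothetical fragment used only to exercise the function.
sample = (
    b"1 0 obj << /AA << /O 2 0 R >> >> endobj\n"
    b"2 0 obj << /S /Launch /F (cmd.exe) >> endobj\n"
    b"3 0 obj << /S /GoToR /F (other.pdf) >> endobj\n"
)

counts = count_keywords(sample)
for name, hits in counts.items():
    print(f"{name:12} {hits}")
```

For a file on disk, the same function can be fed the bytes read in binary mode, for example `count_keywords(open("suspect.pdf", "rb").read())`, where the file name is a placeholder.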
For this part of the article series, we will stop here with the PDF format. In the next part, we will look at various tools that can be used to analyze PDF documents, as well as the different challenges these malicious documents pose to analysts.