Portable Malware Lab for Beginners – Part 2

October 9, 2013 by Aparajit i

In the previous article, “Portable Malware Lab for Beginners,” I spoke about nested virtual machines, i.e., deploying a virtual machine with QEMU and Cuckoo. This acts as a base system for our portable malware analysis lab.

However, malware analysis is not limited to executing a Windows binary; various other aspects are also involved. The main goal of malware is to gain privileged access to the system it intends to infect. To do so, it is delivered by various methods, e.g.:

  1. Transmission via email.
  2. Infection via web pages or hacked web servers.
  3. Infection via removable media.

One may come across many email attachments containing a malicious file. The attachment can be a ZIP archive that contains an EXE, a PDF, or, in some cases, a Word document or spreadsheet. It should also be noted that malware authors often mask the icons of the files to make them look like they belong to a specific application, e.g., a PDF icon for a binary. They may also use the right-to-left override Unicode character to spoof the file extension.
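The right-to-left override trick can be checked for mechanically. The following is a minimal Python sketch, not part of the lab tooling itself; the function names and the sample filename are hypothetical. It reports any bidirectional control characters in a filename and returns the extension as it is actually stored, regardless of how the name is rendered.

```python
import unicodedata

# U+202E RIGHT-TO-LEFT OVERRIDE plus the other bidirectional control characters.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}

def find_bidi_controls(filename):
    """Return (index, character name) for each bidi control character found."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(filename)
        if ch in BIDI_CONTROLS
    ]

def real_extension(filename):
    """Return the stored extension, ignoring any bidi control characters."""
    cleaned = "".join(ch for ch in filename if ch not in BIDI_CONTROLS)
    return cleaned.rsplit(".", 1)[-1].lower() if "." in cleaned else ""

# Hypothetical attachment name: displayed with a ".pdf"-looking tail,
# but the bytes on disk end in ".exe".
suspicious = "invoice\u202efdp.exe"
print(find_bidi_controls(suspicious), real_extension(suspicious))
```

Any attachment for which `find_bidi_controls` returns a non-empty list deserves a closer look before it is opened.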

However, the most interesting of these is infection via the web browser. It is also the most common occurrence, and JavaScript, applets, JARs, and Flash objects are used extensively to carry it out.

However, we will first go through the deployment of ssdeep and YARA. These tools will be helpful as you complete the integration of your portable malware lab. As said in the earlier article, the portable malware lab is not just an amalgamation of different tools; it is intended to help you build a system that contains all the tools necessary for analyzing malware. The output of several tools can also be fed into other tools to provide a better overview of the behavior of malware.

YARA is a tool aimed at helping malware researchers to identify and classify malware samples. With YARA, you can create descriptions of malware families based on textual or binary patterns contained in samples of those families. Each description consists of a set of strings and a Boolean expression that determines its logic.
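To make the idea of "a set of strings and a Boolean expression" concrete, here is a small sketch in plain Python. It does not use YARA or its rule syntax; it only imitates the concept with regular expressions, and the rule name, patterns, and sample data are all hypothetical.

```python
import re

# Hypothetical rule: identifiers and patterns are illustrative only.
RULE = {
    "name": "example_dropper",
    "strings": {
        "$mz": re.compile(rb"^MZ"),                             # binary pattern at file start
        "$url": re.compile(rb"http://[a-z0-9.\-]+/gate\.php"),  # textual pattern
        "$mutex": re.compile(rb"ExampleMutex"),                 # textual pattern
    },
    # Boolean expression over the string identifiers.
    "condition": lambda hit: hit["$mz"] and (hit["$url"] or hit["$mutex"]),
}

def rule_matches(rule, data):
    """Evaluate every pattern against data, then apply the rule's condition."""
    hits = {ident: bool(rx.search(data)) for ident, rx in rule["strings"].items()}
    return bool(rule["condition"](hits))

sample = b"MZ\x90\x00 ... http://evil.example/gate.php"
print(RULE["name"], rule_matches(RULE, sample))
```

The point of the split is that the patterns describe what to look for, while the condition decides which combination of findings is enough to classify a sample.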

SSDEEP is a program for computing context-triggered piecewise hashes (CTPH). Also called fuzzy hashes, CTPH can match inputs that have homologies. Such inputs have sequences of identical bytes in the same order, although bytes in between these sequences may be different in both content and length.
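The following toy sketch illustrates why a context-triggered piecewise hash tolerates local changes. It is not ssdeep's algorithm and does not produce ssdeep-compatible output; the window size, block size, trigger function, and helper names are all assumptions chosen for brevity. The input is cut wherever the last few bytes satisfy a trigger condition, each piece contributes one character to the signature, and signatures are then compared as strings.

```python
import difflib
import hashlib

WINDOW = 7        # bytes of context examined at each position (assumed value)
BLOCK_SIZE = 32   # roughly one piece boundary per BLOCK_SIZE bytes (assumed value)
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def split_pieces(data):
    """Cut data wherever the sum of the last WINDOW bytes hits the trigger value."""
    pieces, start = [], 0
    for i in range(len(data)):
        context = data[max(0, i - WINDOW + 1): i + 1]
        if sum(context) % BLOCK_SIZE == BLOCK_SIZE - 1:
            pieces.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        pieces.append(data[start:])  # trailing piece without a trigger
    return pieces

def toy_fuzzy_hash(data):
    """One signature character per piece, taken from that piece's SHA-256 digest."""
    return "".join(
        ALPHABET[hashlib.sha256(piece).digest()[0] % 64]
        for piece in split_pieces(data)
    )

def similarity(sig_a, sig_b):
    """Ratio in [0, 1] describing how much two signatures have in common."""
    return difflib.SequenceMatcher(None, sig_a, sig_b).ratio()
```

Because a boundary depends only on the bytes just before it, an edit in the middle of a file changes the pieces around the edit while the pieces before it, and usually those well after it, keep the same signature characters.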

Note: execute as root user.

$ sudo su
Note: After executing this command, go to the directory where you downloaded and extracted the files used in the previous article.

# apt-get install ssdeep
# apt-get install python-pyrex python-all python-all-dev
Note: these are required for pyssdeep installation
# apt-get install subversion libapr1 libaprutil1 libdb4.8 libsvn1
# apt-get install libfuzzy-dev libfuzzy2
# svn checkout pyssdeep
# cd pyssdeep
# python build
# python install

$ sudo apt-get install libpcre3 libpcre3-dev
$ sudo apt-get install python-dev
$ wget
$ wget
Untar and configure YARA.
$ tar xvfz yara-1.7.tar.gz
$ cd yara-1.7
$ ./configure
If there are no errors, make the executables.
$ make
$ make check
$ sudo make install
$ cd ..
$ tar xvfz yara-python-1.7.tar.gz
$ cd yara-python-1.7
$ python build
$ sudo python install
You should now be able to call YARA from a shell prompt.

$ yara
usage: yara [OPTION]… [RULEFILE]… FILE
options:
-t <tag> print rules tagged as <tag> and ignore the rest. Can be used more than once.
-i <identifier> print rules named <identifier> and ignore the rest. Can be used more than once.
-n print only not satisfied rules (negate).
-g print tags.
-m print metadata.
-s print matching strings.
-d <identifier>=<value> define external variable.
-r recursively search directories.
-f fast matching mode.
-v show version information.

JavaScript, Applets, and Jar

To analyze JavaScript, we can use Google’s V8 engine or Rhino. In this tutorial, we will deploy both Rhino and V8 on our portable lab. However, as Rhino is based on Java, we also need to deploy the Java libraries to ensure its smooth functioning.

When speaking about web-based threats, it is imperative to deploy a honey-client that emulates a browser and provides better insight into those threats. They mostly take the form of embedded iframes or obfuscated JavaScript embedded within the web page or an applet.
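Before handing a page to a honey-client, it can help to see which iframes and scripts it embeds. The sketch below uses only Python's standard `html.parser`; the class name and the "looks hidden" heuristic (zero or one pixel dimensions, `display:none`, `visibility:hidden`) are assumptions for illustration, not a complete detection method.

```python
from html.parser import HTMLParser

class EmbeddedContentFinder(HTMLParser):
    """Collect iframes, external script URLs, and inline script bodies."""

    def __init__(self):
        super().__init__()
        self.iframes = []         # (src, looks_hidden)
        self.script_srcs = []     # external script URLs
        self.inline_scripts = []  # inline script bodies
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe":
            style = (attrs.get("style") or "").replace(" ", "").lower()
            hidden = (
                attrs.get("width") in ("0", "1")
                or attrs.get("height") in ("0", "1")
                or "display:none" in style
                or "visibility:hidden" in style
            )
            self.iframes.append((attrs.get("src"), hidden))
        elif tag == "script":
            if attrs.get("src"):
                self.script_srcs.append(attrs["src"])
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text between <script> and </script> is of interest here.
        if self._in_script and data.strip():
            self.inline_scripts.append(data.strip())
```

Feed it the saved HTML with `finder.feed(html_text)` and inspect `finder.iframes` and `finder.inline_scripts`; hidden iframes and long, unreadable inline scripts are the usual starting points for deeper analysis.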

The best emulator for various browsers is Thug, which is based on Google’s V8 JavaScript engine.

Thug is a Python low-interaction honey-client based on a hybrid static/dynamic analysis approach.

Thug provides a DOM implementation which is (almost) compliant with W3C DOM Core, HTML, Events, Views, and Style specifications (Level 1, 2 and partially 3).

Thug makes use of the Google V8 JavaScript engine wrapped through PyV8 in order to analyze malicious JavaScript code and the Libemu library wrapped through Pylibemu in order to detect and emulate shellcode.
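Shellcode carried inside malicious JavaScript is commonly stored as `%uXXXX` escaped strings that the script passes to `unescape()`. As a rough illustration of the decoding step that happens before shellcode detection, the sketch below converts such a string into raw bytes in Python. The function name is hypothetical, and the little-endian byte order is an assumption that matches x86 targets.

```python
import re

_ESCAPE = re.compile(r"%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})")

def unescape_to_bytes(text):
    """Decode JavaScript-style %uXXXX and %XX escapes into raw bytes.

    Each %uXXXX is a 16-bit value emitted low byte first (little-endian).
    Characters that are not part of an escape sequence are skipped.
    """
    out = bytearray()
    for wide, narrow in _ESCAPE.findall(text):
        if wide:
            value = int(wide, 16)
            out += bytes((value & 0xFF, value >> 8))
        else:
            out.append(int(narrow, 16))
    return bytes(out)

# Two NOP-sled words followed by an arbitrary word and a single-byte escape.
print(unescape_to_bytes("%u9090%u9090%u4142%43").hex())
```

The resulting byte string is what a shellcode emulator would then be asked to examine.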

How to Install Rhino

# apt-get install ca-certificates-java default-jre-headless icedtea-6-jre-cacao icedtea-6-jre-jamvm java-common libjline-java libnss3-1d librhino-java openjdk-6-jre-headless openjdk-6-jre-lib rhino tzdata-java

# rhino -help
Usage: java [options…] [files]
Valid options are:
-?, -help Displays help messages.
-w Enable warnings.
-version 100|110|120|130|140|150|160|170
Set a specific language version.
-opt [-1|0-9] Set optimization level.
-f script-filename Execute script file, or "-" for interactive.
-e script-source Evaluate inline script.
-modules [path-or-url]
Add path or URL to the CommonJS modules search path.
-main [module] Set CommonJS main module id or file name.
-sandbox Enable CommonJS sandbox mode.
-debug Generate debug code.
-strict Enable strict mode warnings.
-fatal-warnings Treat warnings as errors.
-encoding charset Use specified character encoding as default when reading scripts.

How to Install V8 and Thug along with the Various Tools

# mkdir tools
# cd tools
# svn checkout v8
# git clone git://
# cp /tools/thug/patches/V8-patch* .
# patch -p0 < V8-patch1.diff
patching file v8/src/log.h
Hunk #1 succeeded at 82 (offset 1 line).

# apt-get install graphviz libcdt4 libcgraph5 libgraph4 libgvc5 libgvpr1 libpathplan4

# apt-get install build-essential libboost-python-dev
# apt-get install libboost-all-dev

# svn checkout pyv8
# export V8_HOME=/path/tools/v8
# cd pyv8
# python build #Ignore warnings
# python install
# cd ..

# easy_install beautifulsoup4
# easy_install html5lib

# git clone git://
# apt-get install autoconf2.13 libltdl-dev libtool
# cd libemu
# autoreconf -v -i
# ./configure --prefix=/opt/libemu
# make install
# cd ..

# git clone git://
# cd pylibemu
# python build
# python install

# easy_install pefile
# easy_install chardet
# easy_install httplib2
# easy_install cssutils
# easy_install zope.interface
# easy_install pyparsing==1.5.7

Note: This setting is for Python 2.x. In case you are using Python 3.x, then execute the command: # easy_install pyparsing

# easy_install pydot
# easy_install magic

Now test to see if it’s working. If you get the “ImportError: cannot open shared object file: No such file or directory” error, follow the solution below:

# touch /etc/
# echo "/opt/libemu/lib/" > /etc/
# ldconfig
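One way to confirm that the Python modules installed above can actually be imported is a short check script. This is a sketch only: the import names in `THUG_MODULES` are my assumption of how the packages above expose themselves (for example, beautifulsoup4 is imported as `bs4`), so adjust the list to your installation.

```python
import importlib

# Import names assumed for the packages installed above; adjust as needed.
THUG_MODULES = [
    "PyV8", "pylibemu", "bs4", "html5lib", "pefile", "chardet",
    "httplib2", "cssutils", "zope.interface", "pyparsing", "pydot", "magic",
]

def check_modules(names):
    """Return {module name: None on success, or an error description}."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError, or a shared-library loading error
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results

if __name__ == "__main__":
    for name, error in sorted(check_modules(THUG_MODULES).items()):
        print("%-16s %s" % (name, "ok" if error is None else error))
```

A module that reports a shared-object error here is a candidate for the ldconfig fix described above.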

Note: To execute thug
# cd thug/src
# python

Thug: Pure Python honeyclient implementation

python [ options ] url

-h, --help Display this help information
-V, --version Display Thug version
-u, --useragent= Select a user agent (see below for values, default: winxpie60)
-e, --events= Enable comma-separated specified DOM events handling

This screen grab shows the output from Thug along with the various sources that were downloaded:

Moving ahead, we now work with the tools for deobfuscating and reversing JAR files. The only decompiler that has worked for me so far is JD-GUI.

# wget
# mkdir jd-gui
# tar xvfz jd-gui-0.3.5.linux.i686.tar.gz -C /Path/to/your/jd-gui/
# cd jd-gui
# ./jd-gui
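Since a JAR file is a ZIP archive, its contents can be listed before it is opened in a decompiler. The sketch below uses Python's standard `zipfile` module; the function name is hypothetical. It returns the manifest text and the names of the compiled classes.

```python
import zipfile

def summarize_jar(path_or_file):
    """Return (manifest text, list of .class entries) for a JAR file."""
    with zipfile.ZipFile(path_or_file) as jar:
        names = jar.namelist()
        manifest = ""
        if "META-INF/MANIFEST.MF" in names:
            manifest = jar.read("META-INF/MANIFEST.MF").decode("utf-8", "replace")
        classes = [name for name in names if name.endswith(".class")]
    return manifest, classes
```

The manifest often names the entry point (`Main-Class`), which is a sensible first class to open in JD-GUI.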

PDF Analysis

For analyzing PDF documents, peepdf is the tool of choice. peepdf is a Python-based tool that helps the researcher examine a PDF without the need for any additional tools. Since peepdf uses V8 and Pylibemu, it also provides wrappers for JavaScript and shellcode analysis.

Features of peeppdf have been outlined below and more information can be found at

  • Decodings: hexadecimal, octal, name objects
  • Most used filters
  • References in objects and where an object is referenced
  • Strings search (including streams)
  • Physical structure (offsets)
  • Logical tree structure
  • Metadata
  • Modifications between versions (changelog)
  • Compressed objects (object streams)
  • Analysis and modification of JavaScript (Spidermonkey): unescape, replace, join
  • Shell code analysis (Libemu python wrapper, pylibemu)
  • Variables (set command)
  • Extraction of old versions of the document
  • Easy extraction of objects, JavaScript code, shell codes (>, >>, $>, $>>)
  • Checking hashes on VirusTotal
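To show what the name-object and hexadecimal decodings in the list above deal with, here is a small Python sketch. It is not peepdf code; the function names are hypothetical. In a PDF, a name such as /JavaScript may be written with `#xx` hex escapes to evade simple string searches, and strings may be written as hex digits between angle brackets.

```python
import re

_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

def decode_pdf_name(raw_name):
    """Replace each #xx escape in a PDF name object with the byte it encodes."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw_name)

def decode_pdf_hex_string(raw):
    """Decode a <...> hex string; whitespace is ignored, an odd digit is padded with 0."""
    digits = re.sub(rb"\s+", b"", raw.strip(b"<>"))
    if len(digits) % 2:
        digits += b"0"
    return bytes.fromhex(digits.decode("ascii"))

print(decode_pdf_name(b"/J#61vaScr#69pt"))
print(decode_pdf_hex_string(b"<48 65 6C6c6F>"))
```

Normalizing names this way before searching for keys such as /JavaScript or /OpenAction avoids missing obfuscated variants.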

During the installation of Thug, we already deployed V8 and pylibemu, so we need not go through the entire process again. However, for peepdf to provide all of the functionality mentioned, the “lxml” package must also be installed.

# pip install lxml

For more information about installation of “lxml,” refer to its installation guide at:

Now we proceed with the installation of “peepdf”.

# wget
# tar xvfz peepdf_0.2-BlackHatVegas.tar.gz
# cd peepdf_0.2-BlackHatVegas
# python ./

Usage: ./ [options] PDF_file

Version: peepdf 0.2 r158

-h, --help Shows this help message and exit.
-i, --interactive Sets console mode.
Loads the commands stored in the specified file and
execute them.
-f, --force-mode Sets force parsing mode to ignore errors.
-l, --loose-mode Sets loose parsing mode to catch malformed objects.
-u, --update Updates peepdf with the latest files from the
-g, --grinch-mode Avoids colorized output in the interactive console.
-v, --version Shows program’s version number.
-x, --xml Shows the document information in XML format.

Deploy Radare, Pyew, and Bokken

While researching, it is quite possible that researchers will come across a variety of samples, and they need not be of the same file type. Static analysis is as important as dynamic analysis, and this is where Bokken, Radare, and Pyew help us. Bokken is basically a GUI front end for the Pyew and Radare projects.
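Because samples arrive in many formats and their extensions cannot be trusted, a quick first step is to look at the leading bytes of the file. The sketch below is a minimal Python illustration with a deliberately short signature table; the function names and labels are my own, and a real lab would rely on a fuller database such as the one behind the `file` command.

```python
# (magic bytes, description) pairs; a deliberately short, illustrative table.
SIGNATURES = [
    (b"MZ", "Windows PE/DOS executable"),
    (b"\x7fELF", "ELF executable"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (also JAR, DOCX, XLSX)"),
    (b"\xca\xfe\xba\xbe", "Java class file (or Mach-O universal binary)"),
    (b"FWS", "Flash SWF (uncompressed)"),
    (b"CWS", "Flash SWF (zlib-compressed)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (legacy Word/Excel)"),
]

def identify(data):
    """Return a description based on the first bytes of data."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"

def identify_file(path):
    """Read only the first 8 bytes of the file and identify it."""
    with open(path, "rb") as handle:
        return identify(handle.read(8))
```

The result tells you which of the tools in this lab to reach for: a disassembler for PE and ELF files, a PDF analyzer for PDFs, or a decompiler for JARs and class files.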


Pyew is a malware analysis tool developed in Python that provides a variety of features, including viewing HEX, disassembly, PE and ELF file formats, and code analysis. It also allows you to write scripts.


Radare, on the other hand, is used for disassembling, debugging, and a variety of tasks.

# apt-get install bokken libdistorm64-1 libgtksourceview2.0-0 libgtksourceview2.0-common libradare2-0.9 pyew python-gtksourceview2 python-radare2
# apt-get install libtidy-0.99-0 tidy python-utidylib
# apt-get install radare radare-common
# apt-get install radare-gtk

Windows-Based Malware Analysis Applications

Since we are using a Linux system and numerous Windows programs are actively used for analyzing malware, let’s deploy WINE, a compatibility layer for running Windows applications. With WINE, we will be able to use a few of the Windows tools that researchers rely on. However, there are certain limitations to their use, depending on the packages you have selected to use with WINE.

# apt-get install wine
# wine --version

Now that we have deployed WINE, the first Windows application that we download and deploy is Malzilla. According to the author, it’s a malware hunting tool. However, it isn’t possible to summarize the usefulness of Malzilla in a single sentence. Since most present-day malware and exploits are browser-based, Malzilla offers an excellent platform to analyze and reverse-engineer this type of malware.

Download Malzilla from the location mentioned below and extract the contents of the archive. No installation is required.

# cd Path_To_Malzilla
# wine ./malzilla.exe


Another Windows-based tool that is excellent for deobfuscating JavaScripts is “ReveloJS,” written by Kahu Security. Extensive tutorials and examples have been made available for this tool. To read more about it and to download it, visit this link. The download link is at the end of the article.

The Article:

# unzip -d /Path/to/reveloJS/
# cd reveloJS
# unzip

SWF Investigator

SWF Investigator is a tool developed by Adobe and is extensively used for analyzing SWF files. It also allows you to conduct static as well as dynamic analysis.

# wget
# wine ./swfinvestigator_p5_win_update_052213.exe

WINE will proceed with the further execution of the executable and the rest of the installation is just like any other Windows application installation.
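Before loading a Flash sample into a GUI tool, its header can be inspected directly. An SWF file starts with a three-byte signature, a version byte, and a 32-bit little-endian length, and the body of a `CWS` file is zlib-compressed. The Python sketch below reads these fields; the function name is hypothetical, and LZMA-compressed `ZWS` files are deliberately not handled.

```python
import struct
import zlib

def read_swf(data):
    """Return (signature, version, declared_length, body) for SWF data."""
    if len(data) < 8:
        raise ValueError("too short to be an SWF file")
    signature = data[:3].decode("ascii", "replace")
    version = data[3]
    (declared_length,) = struct.unpack("<I", data[4:8])
    if signature == "FWS":
        body = data[8:]                   # stored uncompressed
    elif signature == "CWS":
        body = zlib.decompress(data[8:])  # everything after the header is zlib data
    else:
        raise ValueError("unsupported SWF signature: %r" % signature)
    return signature, version, declared_length, body
```

The decompressed body is what string searches and signature scans should be run against, since embedded URLs and scripts are not visible in the compressed form.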


These two articles were written to help you create your own portable malware analysis lab. Since the lab depends heavily on virtual machines, it is recommended that you maintain proper backups of all the virtual hard disks.

Also, numerous tools available for *nix/Windows have not been included, but they can always be used within this environment, either by utilizing the power of WINE or by using the nested-VM method described earlier to deploy an MS Windows OS and the Windows-specific tools.

Note: IDA Pro and Ollydbg function best within the MS Windows environment.



Aparajit i has worked in IT Security for more than 10 years with varied experience. Finding newer methods for detection of malware is a passion. Spare time is reserved for tracking botnets, CnC servers and writing articles for Infosec. Contact via Twitter : @iaparajit