Overview of Automated Malware Analysis in the Cloud
Attackers are constantly looking for new and more advanced attacks that they can use to spread malware around the world. A vast number of malware samples spread across the Internet through different attack vectors: email attachments, drive-by downloads, watering hole attacks, etc. Because so many malicious samples are in circulation, automated malware analysis techniques are a necessity. In this article, we’ll take a look at different automated malware analysis tools available online and evaluate whether malware can detect that it’s being executed in such an environment.
Millions of malware samples are distributed around the world every day. Although so many samples constantly target different business sectors, only a handful of them are actually new malware that does things differently. The majority are simple derivatives of known malware samples that have already been analyzed. Therefore, they can almost always be thoroughly analyzed by one of the cloud automated malware analysis platforms; there are many such services to choose from, most of them free to use, and they are presented later in the article. New malware samples, however, can be too complex for cloud automated malware analysis services, since they can use different techniques to detect an automated analysis environment and then behave like a legitimate program instead.
Malware samples can use the following techniques to detect whether they are being executed in an automated malware analysis environment:
- Detecting a sandbox: a sandbox provides a virtual environment where a malware sample can be executed to determine whether the sample is malicious or not.
- Detecting a debugger: malware can use different functions and techniques to detect that it’s being analyzed in a debugger. A debugger is usually used when analyzing malware samples manually rather than by automated cloud malware analysis services, but these checks still put up barriers that a malware analyst must overcome to be able to analyze malicious samples.
- Detecting a virtual environment: almost all cloud automated malware analysis services analyze malware samples in a virtualized environment, because virtual machines provide many advantages that are quite useful for malware analysis. An especially useful feature is snapshots, which can be used to revert the virtual machine to its state prior to malware infection. We can set up a virtual machine, usually running the Windows operating system, install all the tools we need for malware analysis, and create a snapshot of that virtual machine. Then we can run the malware inside the virtual machine and gather all the interesting pieces of information we need to determine whether the sample is malicious and what it does. After the analysis is complete, we can revert to the snapshot we created earlier and start with a clean system, ready to analyze another malware sample.
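The snapshot workflow described in the last item can be scripted. The following is only a minimal sketch that assumes VirtualBox and its VBoxManage command-line tool; the article doesn’t say which hypervisor the cloud services use, and the VM and snapshot names are hypothetical. By default the function only prints the commands instead of running them.

```python
import subprocess


def analysis_cycle_commands(vm_name, snapshot_name):
    """Return the VBoxManage commands for one revert-and-restart cycle."""
    return [
        # Power the VM off first; this fails harmlessly if it isn't running.
        ["VBoxManage", "controlvm", vm_name, "poweroff"],
        # Roll the disk and memory state back to the clean snapshot.
        ["VBoxManage", "snapshot", vm_name, "restore", snapshot_name],
        # Boot the clean VM without a GUI window, ready for the next sample.
        ["VBoxManage", "startvm", vm_name, "--type", "headless"],
    ]


def run_cycle(vm_name, snapshot_name, dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    for cmd in analysis_cycle_commands(vm_name, snapshot_name):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: "poweroff" returns an error when the VM is already off.
            subprocess.run(cmd, check=False)
```

Calling run_cycle("win7-analysis", "clean") prints the three commands for a hypothetical VM named win7-analysis; passing dry_run=False would execute them on a host where VirtualBox is installed.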
1. Cloud malware analysis services
There are plenty of automated malware analysis services on the Internet, most of which are free and can be used by anyone. Although these services automate most parts of malware analysis, the analyst still needs a deep understanding of what to look for in order to interpret a service’s output. In this article, we’ll take a look at the output provided by services that support analysis of the PE file format, i.e. Windows executables, which are the following:
- Anubis : http://anubis.iseclab.org
- Comodo : http://camas.comodo.com
- Malwr : https://malwr.com/submission
- Threat Expert : http://www.threatexpert.com/submit.aspx
- Threat Track : http://www.threattracksecurity.com/resources/sandbox-malware-analysis.aspx
- Vicheck : https://www.vicheck.ca
- Hybrid Analysis : https://www.hybrid-analysis.com/
2. Determine if binary sample is malicious
Prior to using any of the services above, we might want to analyze a binary sample on VirusTotal, which will give us an indication of whether the sample is malicious or not. If the binary sample is quite new, there’s a good chance that it won’t be detected as malicious, because the anti-virus companies haven’t yet had time to update their signatures. Since anti-virus solutions don’t rely on signatures alone, the sample must also be advanced enough to evade the other detection mechanisms they use.
Let’s take a look at the malware known as “CRDF.Malware-Generic.1124918328”, which was first analyzed at VirusTotal on 17.1.2015, which is the current date. The analysis results are available at the following link. Right at the top of the page, we can see that on the current date only 5 out of 57 antivirus solutions detected the file as malicious. We can see the results on the picture below, where different information regarding this malware sample can be obtained. The SHA256, which is normally used to identify malware samples, can be seen at the top of the page. All five antivirus solutions that detected the binary sample as malicious are presented at the bottom of the page, together with the name they assigned to the malware and the last time their signatures were updated.
On the picture above, we can also see that there are different tabs we can look at to obtain more information about the analyzed sample. So far we have determined that the file is probably malicious, but getting more information from the malware sample is useful, so we can eliminate false positives and determine what the malware does. We want to find out whether the malware sample writes files to the filesystem, whether it connects back to a C&C server to fetch and execute commands, whether it modifies certain registry keys to achieve persistence on the infected machine, etc.
On the “File Detail” tab, we can see the compilation timestamp, which tells us when the malware sample was compiled by the attacker. The Windows PE executable contains 5 PE sections: .text, .rdata, .data, .rsrc and .reloc, which can be seen on the picture below. For each section, its virtual address and virtual size are given, together with the MD5 of the entire contents of the section. This information can help us determine whether different malicious samples share the same sections, because malware authors often don’t change the section containing the malware resources (the .rsrc section), but only change the actual code that will be executed on the system (the .text section).
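The same fields can be read directly from a PE file with a few lines of code. The following is a minimal sketch using only the Python standard library; it is not how VirusTotal computes its report, and it assumes the per-section MD5 is taken over the section’s raw data on disk. The function name and the dictionary keys are chosen for illustration.

```python
import hashlib
import struct
from datetime import datetime, timezone


def parse_pe_sections(data):
    """Return (compile_time, sections) for the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # The DOS header field e_lfanew (offset 0x3C) points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # The 20-byte COFF file header follows the signature.
    (_machine, num_sections, timestamp, _symtab, _nsyms,
     opt_size, _flags) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    compile_time = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    # The section table starts right after the optional header.
    table = pe_offset + 4 + 20 + opt_size
    sections = []
    for i in range(num_sections):
        # Each section header is 40 bytes; only the first 24 are needed here.
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        raw = data[raw_ptr:raw_ptr + raw_size]
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_address": vaddr,
            "virtual_size": vsize,
            "md5": hashlib.md5(raw).hexdigest(),
        })
    return compile_time, sections
```

Comparing the per-section MD5 values of two samples then shows quickly whether they share, for example, the same resource section. Keep in mind that the compilation timestamp is just a header field and can be set to any value by whoever built the file.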
On the same tab we can also see all the DLLs imported by the malware sample, which can give us insight into what the sample is doing. Each imported DLL can be expanded to display the functions from that DLL that the sample uses. If malware is trying to detect whether it’s being debugged, it most often uses the IsDebuggerPresent function, which is part of kernel32.dll. If we expand the KERNEL32.DLL module, we can see that this sample actually imports this function, which is a clear indication that the sample doesn’t want to be debugged. We therefore have good reason to suspect that this sample is malicious, because legitimate programs rarely have a reason to use the IsDebuggerPresent function.
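A quick way to triage a sample for such functions is to search its bytes for the API names, since imported function names are normally stored as plain ASCII in the file. This is a crude heuristic sketch, not a real import table parser: packed or obfuscated samples can hide the names, and a match only means the string is present somewhere in the file. The watchlist below is an assumption made for illustration; only IsDebuggerPresent is mentioned in the report discussed here.

```python
# Windows API names commonly associated with debugger detection.
# The selection is illustrative, not exhaustive.
ANTI_DEBUG_APIS = (
    b"IsDebuggerPresent",
    b"CheckRemoteDebuggerPresent",
    b"OutputDebugStringA",
)


def find_anti_debug_names(data):
    """Return the watchlist names that occur anywhere in the file bytes."""
    return [name.decode("ascii") for name in ANTI_DEBUG_APIS if name in data]
```

For a file on disk, the call would look like find_anti_debug_names(open(path, "rb").read()), where path is the location of the sample.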
The “Additional information” tab lists further details about the sample, such as the MD5/SHA1/SHA256 hashes, the ssdeep signature, the size of the file, the file type, the TrID statistics, etc. The “Behavioural information” tab contains a note that the sample uses the IsDebuggerPresent API function, but nothing else of interest.
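The hashes and the file size are easy to compute locally, which is handy for checking that a report really belongs to the file we have on disk. The following is a small sketch using Python’s hashlib; the function name and the returned dictionary layout are chosen for illustration. The ssdeep signature and the TrID statistics need third-party tools and are left out.

```python
import hashlib


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the MD5, SHA1 and SHA256 hex digests and the size of a file."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    size = 0
    with open(path, "rb") as fh:
        # Read in chunks so large samples don't have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            size += len(chunk)
            for hasher in hashers.values():
                hasher.update(chunk)
    result = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    result["size"] = size
    return result
```

The values under the md5 and sha256 keys can then be compared with the hashes each analysis service displays for the submitted file.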
3. Determine what the malware does
So far we’ve looked at VirusTotal, whose goal is to tell us whether the binary sample is malicious or not, but which doesn’t present much additional information about what the malware does. To determine the true purpose of the malware, we can turn to the cloud automated malware analysis solutions we’ve already presented.
Let’s first take a look at Anubis. The file submission form can be seen below: besides choosing a file from our local disk, we also have to solve a simple CAPTCHA to prove that we’re human. This keeps scripts from submitting files automatically; the authors of the analysis engine are clearly trying to prevent spam or invalid entries from being submitted to their solution.
After pressing the “Submit for Analysis” button, the malware will get analyzed and a pretty status bar will be shown presenting the time needed to complete the analysis. An alert reader might have observed that the MD5, which identifies each malware sample, is shown on the picture below. If we go back to our VirusTotal analysis, the same MD5 should be shown under the “Additional information” tab, which is enough for us to be confident that we’re working on the same file.
After the analysis has completed, the results will be available in HTML, XML, PDF or TXT form as presented below. I usually use the HTML version of the report, since it’s the easiest to use, but other formats might come in handy depending on what you’re doing. If you want to keep the details of the report for later inspection, you might open the report as a PDF, which allows you to save it to the hard drive and inspect it at any later time. The XML version of the report would certainly be best when writing a program that automatically parses the results of analyzed malware.
An interesting thing about Anubis analysis is that it will show you a screenshot of a popup window if such a popup is detected. In the picture below, we can see that Anubis presented a dialog box showing the text “This is a test.”
The Anubis report also shows additional load-time DLLs that are used by the malware sample, like ws2_32.dll, which contains network-related functions that can be used to call back to a C&C server. The report also contains a list of run-time DLLs, which were loaded after the executable had started running. The picture below shows that MSCTF.dll was loaded at runtime. Currently, we can’t say for sure why the library was loaded, but by Googling we can determine that it can record keyboard and mouse inputs.
The ViCheck cloud analysis service doesn’t provide many details that we haven’t yet gathered. All of the information gathered from the uploaded binary is presented below.
Malwr is based on Cuckoo Sandbox and provides considerably more information than the other analysis services. The “Quick overview” tab contains basic information that we already have, but also a screenshot of the entire desktop at the time of malware analysis. The picture below presents the popup box opened by the submitted binary.
The summary of files presents the files and directories accessed by the binary sample. Most of them are standard temporary files created when the binary is run, so they are of no importance to us. The interesting part is msctftime.ime, which, after Googling a bit, appears to be a valid Microsoft file.
The interaction of the malware sample with the Windows registry is presented below. The interesting part is the second entry, the AutoIt registry key, which suggests that AutoIt, a scripting language for automating the Windows GUI, is being used in the binary.
If we look at the “Static analysis” tab, we can see a list of all strings in the binary. An interesting part of the list is shown below, where it clearly states that this is a third-party compiled AutoIt script.
If we scroll to the bottom of the list, we can also see what’s presented on the picture below, which appears to be strings displayed to the user when something goes wrong in an AutoIt script.
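A string listing like the one in the “Static analysis” tab can be approximated locally. The following is a minimal sketch that extracts runs of printable ASCII characters, similar in spirit to the Unix strings utility; it is not Malwr’s implementation, and it ignores UTF-16 strings, which Windows binaries often contain as well. The minimum length of 4 is an arbitrary default.

```python
import re


def extract_strings(data, min_len=4):
    """Return all runs of at least min_len printable ASCII bytes."""
    # 0x20-0x7e covers the space character through the tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Filtering the result, for example with [s for s in extract_strings(data) if "AutoIt" in s], quickly surfaces markers such as the AutoIt strings discussed above.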
The analyzed program appears to be an AutoIt script that displays a text message in a popup box. Although a few antivirus solutions say that the file is malicious, it might not be, and those detections might be false positives. We can’t claim that with 100% certainty, since we’d probably have to analyze the file manually to determine what the binary does. It might just be a simple AutoIt script created by some programmer trying to learn the AutoIt language, but on the other hand, the answer might not be so obvious. This is why cloud malware analysis tools keep developing new and improved features, which might be able to resolve such uncertainties.