Malware Analysis - Follow along reversing the German government's "Bundestrojaner"
I'm reasonably sure that anyone reading this article has heard about viruses, worms, trojans and malware, as well as antivirus products like Symantec, McAfee, AVG and many others. Whenever our computer is infected by a virus, most of us hope that the antivirus software we're running detects and removes it. Often, though, this isn't the case, and you have to work out for yourself how exactly the virus is behaving. You then Google a lot for 'Manual removal instructions' and follow various 'How Tos' available online. Sometimes you're lucky... sometimes you're not. If you do succeed in removing the virus somehow, that's great. What if you're not able to though?
That's where an article like this comes in handy. In this article, we will reverse engineer the dropper for a relatively infamous trojan, the Bundestrojaner, developed by the German government for spying purposes. Slate says the trojan “is sent disguised as a legitimate software update and was capable of recording Skype calls, monitoring Internet use, and logging messenger chats and keystrokes. It could also activate computer hardware such as microphones or webcams and secretly take snapshots or record audio before sending it back to the authorities.” We will adopt a step by step approach to reverse the trojan dropper; in a subsequent article we will reverse the driver and other user-mode components of the trojan.
I'd like to let you know here, right at the start, that I will be stepping through disassembled code in a debugger later in the article. So, if you're not that familiar with the subject, I strongly recommend that you stop reading this, go through the references I have listed at the end and then come back here.
Please ensure that the machine on which you are running this malware is not connected to the network. Using a virtual machine is a good idea. For a sample guide on how to set up a malware lab you can go here.
That's a long enough introduction. Let's get started :)
A high level view...
The malware sample that we are running can be downloaded from here (note: password on the file is 'infosec' and you will need to rename this .rar to scuinst.exe). I ran this inside my VirtualBox virtual machine running WinXP SP2 to understand what the malware actually does when it runs.
Before running the sample, I captured my 'Startup' state using Sysinternals Autoruns, took a snapshot of my filesystem and registry using Regshot, and listed all running processes using Process Explorer [and then paused it; we'll restart it after the malware runs and see if any new processes have started]. I also started Wireshark to record any network activity, Process Monitor to get a low level, detailed record of 'ALL' activity, and CaptureBat to keep track of files that the malware deletes.
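To make the before/after idea concrete, here is a minimal sketch of a snapshot-and-compare routine in Python. It only illustrates the principle behind a tool like Regshot and is not how Regshot itself is implemented. The function names and the file names in the demo are made up for illustration, and the "dropper" is simulated with ordinary file operations in a scratch directory.

```python
import os
import tempfile


def take_snapshot(root):
    """Map every file under root to its (size, modification time)."""
    snapshot = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            snapshot[path] = (info.st_size, info.st_mtime_ns)
    return snapshot


def diff_snapshots(before, after):
    """Return sorted lists of added, deleted and modified paths."""
    added = sorted(set(after) - set(before))
    deleted = sorted(set(before) - set(after))
    modified = sorted(
        path for path in before.keys() & after.keys()
        if before[path] != after[path]
    )
    return added, deleted, modified


def demo():
    """Simulate an installer in a scratch directory and diff the snapshots."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "installer.bin")  # hypothetical name
        with open(original, "w") as handle:
            handle.write("stand-in for the dropper")
        before = take_snapshot(root)

        dropped = os.path.join(root, "dropped.bin")  # hypothetical name
        with open(dropped, "w") as handle:
            handle.write("stand-in for a dropped file")
        os.remove(original)  # the stand-in "deletes itself"
        after = take_snapshot(root)

        added, deleted, modified = diff_snapshots(before, after)
        return (
            [os.path.basename(path) for path in added],
            [os.path.basename(path) for path in deleted],
            modified,
        )
```

Calling demo() reports one added file and one deleted file. The real tools also cover the registry, which this sketch ignores.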
In case you're not familiar with any of these tools for some reason, it's best to quickly read up on all of these. They're quite common and very easy to use as well. Here are a couple of blog articles I wrote a while back to get you up to speed. Moving on...
I started the malware, let it run for a while and then stopped the various captures on all of my tools. Here is what I found:
--- This seems to be some kind of installer package which drops other files [DLL and SYS] onto the disk. The exact paths of these files are C:\WINDOWS\system32\mfc42ul.dll
--- No visible network activity recorded
--- It looks like it copies another file to the Temp directory. The name of the file in Temp is C:\Documents and Settings\arvind\Local Settings\Temp\~acrd~tmp~.exe. What this file is, is still unclear; we'll need to look at it separately.
--- It creates a blank file C:\WINDOWS\system32\~pgp~.tmp for some reason. Why? Still unclear
--- It then deletes the original executable from its original location. This can be confirmed by the fact that the EXE vanishes automatically.
--- Adds a registry key that looks like a driver
So to sum up, from just dynamic analysis, it looks like this malware behaves like a classic installer and does nothing else. It just dumps files on to the disk and then deletes itself. That's quite a lot of valuable information. Let's now confirm what we found and see what else we can learn about this malware by looking at its disassembly listing and running it inside a debugger.
And now a deeper view....
Let me reiterate: to follow this section you need to understand the basics of debugging and how to use a debugger. If you don't, I strongly suggest you go back and read a little about them and then come back here.
Right, the focus here is to understand what the malware is doing and not to do a complete disassembly of every single function in the code. You'll see why in just a minute. Open IDA Pro (I have the free version) and load the malware into it. Once it's loaded, navigate over to the Functions tab. There are a total of 306 functions :)... it doesn't make sense to look at each and every one of them. Right?
So now we need to pick and choose which functions we want to look at. Open up Olly 2.01 and load the malware into it. The moment you load the binary in Olly, some analysis is done and we halt at the address 404EDD.
Hitting F8 (Step Over) will take you to the next instruction at 404EDE. Any time there is a CALL instruction that you are interested in, you 'Step Into' it by hitting F7. This enables you to study that particular function better. Moving on, the first familiar Windows function call made [at 404F03] is Kernel32.GetVersion. So the malware first gets the operating system version of the target machine. Makes sense. What's the next function call? Scrolling down a little, we find that the next function call is at 404F63: Kernel32.GetCommandLineA.
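A quick aside on GetVersion: the value it returns (in EAX) packs the major version, minor version and build number into a single DWORD. The sketch below decodes such a value the way the MSDN example does. The sample value is one I constructed for Windows XP (5.1, build 2600), not one I captured from the debugger, and the function name is my own.

```python
def decode_getversion(value):
    """Split a GetVersion() DWORD into (major, minor, build)."""
    low_word = value & 0xFFFF
    major = low_word & 0xFF         # LOBYTE(LOWORD(value))
    minor = (low_word >> 8) & 0xFF  # HIBYTE(LOWORD(value))
    # The build number sits in the high word only when the top bit is clear.
    build = (value >> 16) & 0xFFFF if value < 0x80000000 else None
    return major, minor, build


# Constructed sample (an assumption, not a captured value): 5.1 build 2600.
SAMPLE = (2600 << 16) | (1 << 8) | 5
print(hex(SAMPLE), decode_getversion(SAMPLE))
```

If you want to check this against the real thing, look at EAX right after stepping over the GetVersion call in Olly.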
But wait... do you see that there are 5 CALL instructions before the GetCommandLineA call?
This effectively means that there were 5 other functions that the malware called before calling GetCommandLineA. Internal to these 5 functions; there might be further functions that were called. So the point really is, there could be a huge number of functions before the next "known" API is called.
You have a decision to make at this point: do you want to go into any of these functions or not? Remember there are a total of 306 such functions? One method that I use to decide is to look at the names of these functions in IDA's list and see if IDA has given each one a meaningful name [sub_ADDRESS is what I say is NOT a meaningful name]. If it has, then I skip that function for now. Yes, maybe at a later point I will return to this function to drill down deeper... but for now, on the first pass, I will skip it.
So let's look at these 5 calls in IDA. The starting addresses for each of these functions are as follows:- 4061C1, 40500A, 405EAC, 40500A again and finally 408BC6. Do these functions have names in IDA?
The first function 4061C1 doesn't but the remaining 4 do. The best way to search for these is to sort by Start Address in IDA (click on the top row) and then search for these addresses. When you find them, look at the name column to get the name. Since 4061C1 didn't have a name, we drilled down a little into it... and found out that it primarily allocates heap memory. That's a reasonably common function that you will find while analyzing most executables, so it's nothing specific to this malware.
So we just rename this function in IDA as "heap_alloc" and go on. It's good practice to name any functions and variables that you have already looked at. It gives you greater clarity at a later stage.
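The triage rule I described (skip anything IDA has named, drill into anything still called sub_ADDRESS) is simple enough to sketch in a few lines of Python. The addresses below are the five call targets from this walkthrough. The names for the four recognised functions are placeholders I made up for illustration, NOT the names IDA actually showed.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def is_placeholder_name(name):
    """True when the name is an auto-generated sub_ADDRESS placeholder."""
    if not name.startswith("sub_"):
        return False
    suffix = name[len("sub_"):]
    return bool(suffix) and all(char in HEX_DIGITS for char in suffix)


def functions_to_review(calls):
    """Return addresses whose names are placeholders, in order, without repeats."""
    review = []
    for address, name in calls:
        if is_placeholder_name(name) and address not in review:
            review.append(address)
    return review


# Call targets in the order they appear before GetCommandLineA.
# The "recognised_*" names are hypothetical stand-ins.
CALLS = [
    (0x4061C1, "sub_4061C1"),
    (0x40500A, "recognised_a"),
    (0x405EAC, "recognised_b"),
    (0x40500A, "recognised_a"),
    (0x408BC6, "recognised_c"),
]

print([hex(address) for address in functions_to_review(CALLS)])
```

With this input only 4061C1 is flagged for a closer look, which matches what we did by hand.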
So here are the other names that I got from IDA.
This is how we will continue to proceed while analyzing the entire executable. So, I won't break it down so slowly as I move forward..and will draw conclusions much faster.
So we ignored all 5 CALL's and GetCommandLineA and went on. The next 2 functions that we notice are GetStartupInfoA at 404F8E and GetModuleHandle at 404FB1. Seeing these 2 functions should tell you that you're pretty close to the "real" start of the program :)
The next 2 CALL functions are at 403D50 and 4080AE. Let's step over them..and continue analyzing. Oops....
Look at the left bottom corner. Process Terminated???? Well...it just means that there was something in one of those 2 calls which caused the program to exit. Which one? And what did the malware do? We never found THAT out at all...
Patience :). Let's restart the program and put a breakpoint on the first call at 403D50.
Now let's explore a feature called the Hit Trace in Olly. It's a really nice feature which will put a little red dot by the side of every single instruction that was executed at least once. So once you've put a breakpoint, click on Trace and then Run Hit Trace.
Saw the red dot? Start of the 3rd column? Yeah... so every instruction that was hit at least once has a little dot there. This is really helpful in understanding jump flow and which loop was executed when. You'll notice gaps in between... that usually means there was a JMP somewhere. Now let's Step Over once (F8)... not yet terminated... reach the second call... step over... terminated.
Cool. So it's the last call which is terminating the program. We can add a comment in Olly if we want at this point, although it's quite clear in this case, so let's go on. Let's look at the function names that IDA has decided on. Logically, these shouldn't have any meaningful names. The names should just be sub_403D50 and sub_4080AE.
Oops. Not really. They're called WinMain and __exit respectively. Turns out IDA is really smart. Somehow it has internally decided that the function at 403D50 is the "real" start and has called it WinMain, which is the entry point of a Windows GUI program. Also, we found out above that the program exits when we step past the 2nd call. Right? IDA has cleverly called that __exit. The moral of the story is: do make an effort to understand IDA better. I know I'm still learning :). So now logically, if the last CALL is called __exit and the program does actually exit after it, it means that the heart of the program is in the CALL at 403D50. Yeah, makes sense... so let's "Step Into" (F7) the call at 403D50.
Now I'm going to go a little faster and point out all the interesting areas directly. I'll assume that you now know enough to stop and start a debugger and follow through with me. Inside the CALL at 403D50 there is another CALL to 403A20. In this function, the malware checks if it can write to the System32 directory and exits the process if it can't. It does so by checking the return value of the CreateFileA function, which in this case is 00000030: a valid file handle rather than INVALID_HANDLE_VALUE, so the check passes.
It then goes ahead and deletes the file it just wrote.
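The create-then-delete probe described above is a common way of testing whether a directory is writable. Here is a rough Python equivalent of the idea, assuming nothing about the probe file name the malware really uses (mkstemp picks a random one); the function name is mine.

```python
import os
import tempfile


def can_write_to(directory):
    """Probe a directory by creating a throwaway file and deleting it again."""
    try:
        # Fails with OSError if the directory is missing or not writable,
        # much like CreateFileA returning INVALID_HANDLE_VALUE.
        descriptor, probe_path = tempfile.mkstemp(dir=directory)
    except OSError:
        return False
    os.close(descriptor)
    os.remove(probe_path)  # clean up, just as the dropper deletes its test file
    return True
```

For example, can_write_to(tempfile.gettempdir()) should normally return True, while a path that doesn't exist returns False.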
It then gets the System Directory, into which it most likely (we know this from dynamic analysis) plans to dump more files.
Eventually, a call is made to a function at 4010F0 with the complete path for the DLL file we saw earlier, in this case C:\Windows\System32\mfc42ul.dll. A blank file by the name mfc42ul.dll is created here.
Later on this DLL is filled up with content. This is done at 4010CF when a call to the function 401090 is made.
At 403EC4 there is a call made to 401E60 with 2 arguments; one is a file called mfc42.dll (originally present in System32) and the other is the new DLL that was created here (mfc42ul.dll). From inside 401E60 a call is made to 401DC0 where the "File Times" for mfc42.dll (which was already there) are grabbed.
The file mfc42ul.dll's time is now modified and set to this time. This is to make the file really hard to find in case there is any forensics done on the machine to find newly created files. This is done in the function 401E10 which is called at 401E97.
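If you'd like to see how little it takes to copy timestamps from one file to another, here is a small Python sketch you can run in your lab. It is only an approximation of what the dropper does: os.utime copies the access and modification times, whereas the Windows file time APIs also cover the creation time. The file names in the demo are made up.

```python
import os
import tempfile


def copy_file_times(reference_path, target_path):
    """Give target_path the access and modification times of reference_path."""
    reference = os.stat(reference_path)
    os.utime(target_path, ns=(reference.st_atime_ns, reference.st_mtime_ns))


def demo():
    """Backdate a fresh file to match an older one; file names are made up."""
    with tempfile.TemporaryDirectory() as folder:
        old_file = os.path.join(folder, "old_library.bin")
        new_file = os.path.join(folder, "new_library.bin")
        for path in (old_file, new_file):
            with open(path, "w") as handle:
                handle.write("placeholder")
        # Pretend the reference file was last touched in September 2001.
        os.utime(old_file, ns=(10**18, 10**18))

        copy_file_times(old_file, new_file)
        return os.stat(new_file).st_mtime_ns == os.stat(old_file).st_mtime_ns
```

The takeaway for an analyst is that a "last modified" date on its own proves very little about when a file really landed on disk.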
Once you're done with all this you'll exit from all the functions that were called and come to the address 403F0E. Here you can see a comment in Olly which says "winsys32.sys". That, while not conclusive, potentially points to the malware being done with mfc42ul.dll and now turning its attention to winsys32.sys. Note though, that malware authors might plant misleading strings just to lead analysts down the wrong track.
The first useful thing with respect to winsys32.sys that we see is at 403F37. A CALL to 4011C0 is made where the malware checks whether the process is running on a 32 bit or a 64 bit system using the IsWow64Process function.
You then step into a call at 403F49 and then into the calls at 40107B, 4010A9 and then 401109. This will lead you to a function starting at 401160 where File System Redirection is disabled and reverted. All of this is done (I think) because winsys32.sys may be some driver which will work only on a 32 bit or a 64 bit system. For now we won't go too deep into that and will go on.
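For reference, here is roughly what an IsWow64Process check looks like when called from Python via ctypes. This is a sketch of the API usage only, not a reconstruction of the code at 4011C0. It returns None on anything that isn't Windows, so you can paste it anywhere without it blowing up; the function name is my own.

```python
import sys


def running_under_wow64():
    """Return True/False on Windows, or None where the API is unavailable."""
    if sys.platform != "win32":
        return None
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.GetCurrentProcess.argtypes = []
    kernel32.IsWow64Process.restype = wintypes.BOOL
    kernel32.IsWow64Process.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(wintypes.BOOL),
    ]

    is_wow64 = wintypes.BOOL(False)
    succeeded = kernel32.IsWow64Process(
        kernel32.GetCurrentProcess(), ctypes.byref(is_wow64)
    )
    if not succeeded:
        raise ctypes.WinError(ctypes.get_last_error())
    # True means a 32 bit process running on 64 bit Windows.
    return bool(is_wow64.value)
```

The result matters because, for a 32 bit process on 64 bit Windows, writes to System32 get redirected elsewhere unless redirection is switched off. Kernel32 offers Wow64DisableWow64FsRedirection and Wow64RevertWow64FsRedirection for that; my assumption is that these are what the function at 401160 uses, but I haven't confirmed it here.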
Exiting from 401160 brings us back to 40110E. After a few F8's we come to another CreateFileA call which creates a blank winsys32.sys file in System32.
Similar to how content was written into mfc42ul.dll, there is content written into winsys32.sys using a WriteFile call.
Then we step through a CALL at 401A00 and come to 401BC0 where the malware tries to get the address of the function CreateServiceA from the DLL advapi32.dll. If you didn't already know, a DLL exports a lot of functions; advapi32.dll exports CreateServiceA in this case, which is what the malware wants.
The function is then called and a service created.
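Looking up an exported function at run time, instead of importing it normally, is usually done with the LoadLibrary and GetProcAddress pair. Whether the dropper uses exactly this pair is my assumption; the sketch below just shows the general technique through ctypes, and returns None when it isn't running on Windows. The helper name is mine.

```python
import sys


def resolve_export(dll_name, function_name):
    """Return the address of an exported function, or None if unavailable."""
    if sys.platform != "win32":
        return None
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.LoadLibraryA.restype = wintypes.HMODULE
    kernel32.LoadLibraryA.argtypes = [wintypes.LPCSTR]
    kernel32.GetProcAddress.restype = ctypes.c_void_p
    kernel32.GetProcAddress.argtypes = [wintypes.HMODULE, wintypes.LPCSTR]

    module = kernel32.LoadLibraryA(dll_name.encode("ascii"))
    if not module:
        raise ctypes.WinError(ctypes.get_last_error())
    # GetProcAddress returns NULL (None here) when the export does not exist.
    return kernel32.GetProcAddress(module, function_name.encode("ascii"))


print(resolve_export("advapi32.dll", "CreateServiceA"))
```

On Windows this prints the address of CreateServiceA inside the loaded advapi32.dll. Resolving APIs this way keeps them out of the import table, which is one reason you see it so often in malware.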
We exit from the function to then find out that another call to 401E60 is made to change the time of winsys32.sys also to that of the original mfc42.dll. Again, this is to obfuscate things mildly.
A call to 4017C0 is made to get the Temp Path of the system. The same routines that were used earlier to create and write the earlier 2 files are called again. Only this time the file that is written into is in the Temp directory and is called ~acrd~tmp.exe. What this is we really don't know. We'll look at that some other time...
And finally the original file is set to be deleted using the MoveFileExW call..
At last!! We've only now come out of 403D50. Remember WinMain? Ages ago? Yes... it's only now that we have come out of it :). All that remains is to look at how the program exits. Step into the CALL at 4080AE and after some more F7's and F8's you will come to an ExitProcess call at 40816D which terminates the process.
Well, I tried to make that as simple as I could to encourage as many people to read it as possible. I know how difficult it is for someone who is starting out in reversing but unsure on how to proceed. Hopefully this article has something for the newbie reverser and is also an enjoyable read for an experienced reverser.
In another article InfoSec Institute will take a look at the other components of this malware – the EXE file in Temp, the DLL as well as the kernel driver. Until then.. enjoy this article :)
- A series of articles for starting off in reversing – http://ardsec.blogspot.in/2011/07/reverse-engineering-6-malware-lab-setup.html [My blog :D]
- The IDA Pro book - http://www.amazon.com/The-IDA-Pro-Book-Disassembler/dp/1593272898/ref=sr_1_1?ie=UTF8&qid=1333485647&sr=8-1
- Reversing: Secrets of Reverse Engineering – Eldad Eilam - http://www.amazon.com/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817/ref=sr_1_2?ie=UTF8&qid=1333485647&sr=8-2
- MSDN Windows Api reference - http://msdn.microsoft.com/en-us/library/windows/desktop/hh447209%28v=vs.85%29.aspx
- The most awesome reverse engineering forum – http://www.woodmann.com/forum/ [Do register; it's a really nice bunch who've helped me many a time]