In the last installment, we examined the PEB Loader Data Structure. We take up the discussion here.
Locate and Isolate the Embedded Decrypted Executable
Once the VAs of the necessary APIs are stored, execution returns to the instruction that follows the CALL at address 004085CD mentioned earlier.
The piece of code that follows is also of great interest:
004085D2 64A1 30000000 MOV EAX, DWORD PTR FS:[30] ← get address of PEB
004085D8 8B40 08 MOV EAX, DWORD PTR DS:[EAX+8] ← get self image base from PEB (main module)
004085DB 8983 38153A00 MOV DWORD PTR DS:[EBX+3A1538], EAX ← store self image base
004085E1 8BBB 38153A00 MOV EDI, DWORD PTR DS:[EBX+3A1538] ← move self image base to EDI
004085E7 03BB 60153A00 ADD EDI, DWORD PTR DS:[EBX+3A1560] ← add a constant (AE000) to EDI
The following five instructions are another example of obfuscation. In this sample the final result is always 10000, so a plain MOV ESI, 10000 would have the same effect. However, this could also be a dynamic calculation of the size of the area to allocate, based on the characteristics of the file wrapped with this loader.
004085ED BE 61010000 MOV ESI, 161
004085F2 03B3 5C153A00 ADD ESI, DWORD PTR DS:[EBX+3A155C]
004085F8 03B3 6C153A00 ADD ESI, DWORD PTR DS:[EBX+3A156C]
004085FE 81C6 00000100 ADD ESI, 10000
00408604 81E6 0000FFFF AND ESI, FFFF0000
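To make the arithmetic easier to follow, here is a minimal Python sketch of the five instructions above. The two values read from [EBX+3A155C] and [EBX+3A156C] are not shown in the listing, so `extra_a` and `extra_b` are hypothetical parameters; the function name is also mine.

```python
def allocation_size(extra_a: int, extra_b: int, stub_len: int = 0x161) -> int:
    """Mirror the ESI calculation with 32-bit wraparound.

    extra_a / extra_b stand in for the DWORDs at [EBX+3A155C] and
    [EBX+3A156C]; their real values are not visible in the listing.
    """
    total = (stub_len + extra_a + extra_b) & 0xFFFFFFFF   # MOV ESI, 161 + two ADDs
    total = (total + 0x10000) & 0xFFFFFFFF                # ADD ESI, 10000
    return total & 0xFFFF0000                             # AND ESI, FFFF0000
```

As long as the three terms add up to less than 10000 (hex), the result is 10000, which matches what we observe here. A larger wrapped file would simply push the result to the next 64 KB boundary.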
The loader then uses the VirtualAlloc API to allocate some extra memory with PAGE_EXECUTE_READWRITE access rights.
0040860A 6A 40 PUSH 40
0040860C 68 00300000 PUSH 3000
00408611 56 PUSH ESI
00408612 6A 00 PUSH 0
00408614 FF93 25153A00 CALL NEAR DWORD PTR DS:[EBX+3A1525] ← points into the Imports Table above, at the stored address of the VirtualAlloc API.
Once the new memory area is allocated, the loader starts writing code into it. The first code transfer takes place a few instructions later.
00408631 F3A4 REP MOVS BYTE PTR ES:[EDI], BYTE PTR DS:[ESI] ← ESI points to 00408993, EDI is whatever address was returned by the VirtualAlloc API, and ECX, the counter, is 161.
The next code transfer to the allocated memory area:
0040865C F3A4 REP MOVS BYTE PTR ES:[EDI], BYTE PTR DS:[ESI] ← ESI points to 00402488, EDI is whatever address was returned by the VirtualAlloc API + 4349, and ECX this time is BE.
Next code transfer:
0040866A F3A4 REP MOVS BYTE PTR ES:[EDI], BYTE PTR DS:[ESI] ← ESI points to 00403140, EDI is whatever address was returned by the VirtualAlloc API + 4407, and ECX this time is 17B7.
Next code transfer:
00408678 F3A4 REP MOVS BYTE PTR ES:[EDI], BYTE PTR DS:[ESI] ← ESI points to 00406028, EDI is whatever address was returned by the VirtualAlloc API + 5BBE, and ECX this time is 255C.
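The four transfers are easier to reason about once they are laid out side by side. The following Python sketch only uses the source addresses, destination offsets and lengths quoted above; the helper name `contiguous_runs` is mine.

```python
# (source VA, destination offset from the VirtualAlloc base, length)
COPIES = [
    (0x00408993, 0x0000, 0x161),
    (0x00402488, 0x4349, 0xBE),
    (0x00403140, 0x4407, 0x17B7),
    (0x00406028, 0x5BBE, 0x255C),
]

def contiguous_runs(copies):
    """Merge destination ranges that touch each other.

    Returns a list of (start offset, total length) tuples.
    """
    runs = []
    for _src, dst, length in sorted(copies, key=lambda c: c[1]):
        if runs and runs[-1][0] + runs[-1][1] == dst:
            # This copy starts exactly where the previous run ends.
            runs[-1] = (runs[-1][0], runs[-1][1] + length)
        else:
            runs.append((dst, length))
    return runs

for start, length in contiguous_runs(COPIES):
    print(f"offset {start:#06x}, length {length:#06x}")
```

The last three pieces land back to back starting at offset 4349 and add up to 3DD1 bytes, so three scattered blobs from the loader image are stitched into one buffer. The same two numbers show up again in the decryption routine.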
The next interesting part of the loader’s code is at address 0040868C, where it calls a function that decrypts a portion of the code transferred to the previously allocated memory area.
0040896E 89CE MOV ESI, ECX
00408970 83E6 03 AND ESI, 3
00408973 75 12 JNZ SHORT 00408987
00408975 8B5D 10 MOV EBX, DWORD PTR SS:[EBP+10]
00408978 6601DA ADD DX, BX
0040897B 6BD2 03 IMUL EDX, EDX, 3
0040897E 66F7D2 NOT DX
00408981 C1CA 07 ROR EDX, 7
00408984 8955 10 MOV DWORD PTR SS:[EBP+10], EDX
00408987 3010 XOR BYTE PTR DS:[EAX], DL ← after performing a few calculations, it XORs the byte at the memory location pointed to by EAX with the value stored in DL. The starting address is 002D4349.
00408989 40 INC EAX
0040898A C1CA 08 ROR EDX, 8
0040898D E2 DF LOOPD SHORT 0040896E ← loops 3DD1 times
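For those who prefer to reproduce the decryption outside the debugger, here is a rough Python emulation of the loop above. The initial values of EDX and of the DWORD at [EBP+10] are not visible in the listing, so `edx` and `state` are assumptions that you would have to read from the debugger; the function names are mine.

```python
MASK32 = 0xFFFFFFFF

def ror32(value: int, count: int) -> int:
    """32-bit rotate right, as performed by ROR EDX, n."""
    count &= 31
    return ((value >> count) | (value << (32 - count))) & MASK32

def xor_stream(data: bytes, edx: int, state: int) -> bytes:
    """Emulate the loop at 0040896E over `data`.

    edx   -- assumed initial EDX on entry to the loop
    state -- assumed initial DWORD at [EBP+10]
    """
    out = bytearray(data)
    ecx = len(out)                      # LOOPD counts ECX down to zero
    pos = 0
    while ecx:
        if ecx & 3 == 0:                # JNZ skips the key update otherwise
            dx = ((edx & 0xFFFF) + (state & 0xFFFF)) & 0xFFFF
            edx = (edx & 0xFFFF0000) | dx                  # ADD DX, BX
            edx = (edx * 3) & MASK32                       # IMUL EDX, EDX, 3
            edx = (edx & 0xFFFF0000) | (~edx & 0xFFFF)     # NOT DX
            edx = ror32(edx, 7)                            # ROR EDX, 7
            state = edx                                    # MOV [EBP+10], EDX
        out[pos] ^= edx & 0xFF          # XOR BYTE PTR [EAX], DL
        pos += 1                        # INC EAX
        edx = ror32(edx, 8)             # ROR EDX, 8
        ecx -= 1
    return bytes(out)
```

The key stream depends only on the two seeds and the counter, never on the data, so running the function twice with the same seeds returns the original bytes. One difference from the real code: an empty buffer returns immediately here, whereas LOOPD with ECX = 0 would wrap around.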
Once we exit this function, we meet another code transfer.
004087CA A4 MOVS BYTE PTR ES:[EDI], BYTE PTR DS:[ESI] ← ESI points to the memory area that the previous function decrypted, and EDI is whatever address was returned by the VirtualAlloc API + 161. In this case, ECX is the counter (which here is equal to 3DD1).
Before continuing, let’s take a look at what the decrypted code that is going to be transferred looks like. I am only showing a small code block from the beginning of it.
Well, it looks like the decrypted code is an executable module, but obviously it is not yet properly reconstructed in memory. The code that follows aims to reconstruct the decrypted executable in memory, and the following code block shows the beginning of the executable after this has been done.
002D01A1 º.´.Í!¸LÍ!This program cannot be run in DOS mode…$……..
It is quite obvious that the loader just decrypted a UPX-packed executable module. When this happens, there are two common scenarios: the loader dumps this memory area to a new file and launches it as a child process, or it transfers execution directly to the entry point of the decrypted executable in memory.
In any case, we are going to need this executable in order to analyse the next stage, so I am going to isolate it from memory in two simple steps. First, I am going to dump the whole allocated memory area, because I know the executable is there. Then I will cut off all the prepended code that I don’t need anymore and save the file.
The following figure demonstrates the second step:
Figure 2 – Selecting prepended bytes to cut off
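The same cut can be scripted instead of done by hand in a hex editor. The sketch below is only one way to do it, under the assumption that the dump holds a regular MZ/PE image somewhere after the prepended code; `find_pe_offset` and `carve_pe` are names I made up for illustration.

```python
import struct

def find_pe_offset(dump: bytes) -> int:
    """Return the offset of the first 'MZ' whose e_lfanew leads to 'PE\\0\\0', or -1."""
    pos = dump.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(dump):
            # e_lfanew lives at offset 0x3C of the DOS header.
            (e_lfanew,) = struct.unpack_from("<I", dump, pos + 0x3C)
            sig = pos + e_lfanew
            if dump[sig:sig + 4] == b"PE\x00\x00":
                return pos
        pos = dump.find(b"MZ", pos + 1)
    return -1

def carve_pe(dump: bytes) -> bytes:
    """Drop everything before the embedded executable."""
    offset = find_pe_offset(dump)
    if offset < 0:
        raise ValueError("no PE image found in dump")
    return dump[offset:]
```

In this sample I would expect the offset to come back as 161, since that is where the last transfer placed the decrypted data. Note that the sketch keeps everything up to the end of the dump, so trailing bytes after the executable are not trimmed.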
At this point we can start working directly on the UPX-packed executable we just saved. The loader is going to jump to its entry point in memory anyway, after copying its code from the allocated memory into the loader’s own PE image address space.
002D006A FFE1 JMP NEAR ECX ← jump to entry point of the UPX packed file.
Going Through the Third Stage of the Loader
We can now start working on the UPX-packed file we extracted from the loader’s memory during the previous part of the analysis. Manually unpacking a UPX-packed file is quite trivial, so I am not going to spend any more lines on UPX. Instead, I will continue with the analysis of the code of the malware’s loader.
0040E5D4 55 PUSH EBP ← OEP
0040E5D5 89E5 MOV EBP, ESP
0040E5D7 81EC 34020000 SUB ESP, 234
0040E5DD C745 CC 0000000 MOV DWORD PTR SS:[EBP-34], 0
0040E5E4 C745 D0 0000000 MOV DWORD PTR SS:[EBP-30], 0
0040E5EB C745 D4 0000000 MOV DWORD PTR SS:[EBP-2C], 0
A few instructions later, we observe an attempt to detect whether the malware is currently running inside a sandbox. I can’t tell which sandbox the author tested the following trick against, but here is how it is implemented.
It pushes onto the stack a pointer to the string “sand-box” and a pointer to its own absolute path. It then uses the strstr function to check whether the absolute path contains this substring.
0040E666 51 PUSH ECX
0040E667 50 PUSH EAX
0040E668 FF15 AAF84000 CALL ntdll.strstr
0006FD4C 0006FD54 s1=”c:\users\r.c.e\desktop\matsui\upx_packed_decrypted.pe”
0006FD50 0006FF70 s2 = “sand-box”
If the check described above succeeds, that is, if strstr finds the substring, the process will terminate.
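In higher-level terms the check boils down to a single substring test. The Python sketch below assumes a plain, case-sensitive match like strstr; whether the loader normalises the path first is not visible in the listing, and the function name is hypothetical.

```python
def looks_like_sandbox(module_path: str, marker: str = "sand-box") -> bool:
    """Case-sensitive substring test, equivalent to strstr(path, marker) != NULL."""
    return marker in module_path

# The path from the stack dump above does not trigger the check.
print(looks_like_sandbox(r"c:\users\r.c.e\desktop\matsui\upx_packed_decrypted.pe"))
```

This also shows how weak the trick is: running the sample from any path that lacks the marker is enough to get past it.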
During this stage, the loader will first copy the imports table from one location to another, and then it will attempt to create a child process and inject a thread into it. If you take a look a few instructions later, you will notice a few calls to the memcpy function, through which the copies of the imports table are created.
Once the imports table is copied, there is a CALL to a function at address 0040E6F8. This function is dedicated to the creation of the child process and also calls another function dedicated to the injection of the malicious thread. In the next part, I am going to demonstrate two ways to keep control of the execution of the injected code on the new thread, which is set to run immediately after creation.
Keep Control on Injected Threads
By entering the function from the CALL mentioned above, we can see the piece of code that launches the child process in suspended mode.
0040E808 56 PUSH ESI
0040E809 57 PUSH EDI
0040E80A 6A 00 PUSH 0
0040E80C 6A 00 PUSH 0
0040E80E 6A 04 PUSH 4
0040E810 6A 00 PUSH 0
0040E812 6A 00 PUSH 0
0040E814 6A 00 PUSH 0
0040E816 50 PUSH EAX
0040E817 6A 00 PUSH 0
0040E819 FF15 E4F14000 CALL kernel32.CreateProcessA
0006FC6C 00000000 |ModuleFileName = NULL
0006FC70 00403DB9 |CommandLine = “svchost.exe”
0006FC74 00000000 |pProcessSecurity = NULL
0006FC78 00000000 |pThreadSecurity = NULL
0006FC7C 00000000 |InheritHandles = FALSE
0006FC80 00000004 |CreationFlags = CREATE_SUSPENDED
0006FC84 00000000 |pEnvironment = NULL
0006FC88 00000000 |CurrentDir = NULL
0006FC8C 0006FCA0 |pStartupInfo = 0006FCA0
0006FC90 0006FCE4 \pProcessInfo = 0006FCE4
As you can see, the author chooses to launch svchost.exe as the child process, a name that is unlikely to make a user suspicious when it shows up in the task manager (or any other process enumeration tool).
At this point we need to know the PID of the child process, which we can retrieve from the PROCESS_INFORMATION structure once the child process has been created. Because Windows normally runs more than one process with this name, we need to know which one was created by the malware’s loader in order to attach to that one later.
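If you dump the 16 bytes at the pProcessInfo address (0006FCE4 above) after CreateProcessA returns, the PID can be read off directly. Here is a small Python sketch that decodes them, assuming the 32-bit layout of PROCESS_INFORMATION (two handles followed by two DWORD identifiers); the helper names are mine.

```python
import struct
from typing import NamedTuple

class ProcessInformation(NamedTuple):
    hProcess: int
    hThread: int
    dwProcessId: int
    dwThreadId: int

def parse_process_information(raw: bytes) -> ProcessInformation:
    """Decode a 32-bit PROCESS_INFORMATION structure from 16 raw bytes."""
    return ProcessInformation(*struct.unpack_from("<IIII", raw))
```

The `dwProcessId` field is the value to look for in the debugger’s attach dialog. The `hProcess` value should match the handle that is later passed to VirtualAllocEx and WriteProcessMemory.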
A few lines later, at address 0040E828, the loader will CALL the function dedicated to the injection of the malicious thread. It will first allocate some extra memory in the child process, which is still in suspended mode.
0040E847 C745 F9 0000000 MOV DWORD PTR SS:[EBP-7], 0
0040E84E 6A 40 PUSH 40
0040E850 68 00301000 PUSH 103000
0040E855 68 D4D50000 PUSH 0D5D4
0040E85A 6A 00 PUSH 0
0040E85C FF75 08 PUSH DWORD PTR SS:[EBP+8]
0040E85F FF15 A8F14000 CALL kernel32.VirtualAllocEx
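The constants pushed before the two allocation calls are easier to read once decoded. The Python sketch below maps the values seen in this article (3000 and 40 for VirtualAlloc, 103000 and 40 for VirtualAllocEx) to their Windows SDK names; the tables are deliberately incomplete and the function names are mine.

```python
ALLOCATION_TYPES = {
    0x00001000: "MEM_COMMIT",
    0x00002000: "MEM_RESERVE",
    0x00100000: "MEM_TOP_DOWN",
}

PROTECTIONS = {
    0x04: "PAGE_READWRITE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
}

def decode_allocation_type(value: int) -> list:
    """Split an flAllocationType value into the flag names known to this table."""
    names = [name for bit, name in sorted(ALLOCATION_TYPES.items()) if value & bit]
    leftover = value & ~sum(ALLOCATION_TYPES)
    if leftover:
        names.append(hex(leftover))     # bits this sketch does not know about
    return names

def decode_protection(value: int) -> str:
    """Look up an flProtect value, falling back to its hex form."""
    return PROTECTIONS.get(value, hex(value))

print(decode_allocation_type(0x103000), decode_protection(0x40))
```

The extra MEM_TOP_DOWN bit in the VirtualAllocEx call asks for the highest available address, which would be consistent with the 7FFA0000 address that appears in the WriteProcessMemory call.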
Then it will use the WriteProcessMemory API to inject the code into the allocated memory area inside the child process.
0040E874 51 PUSH ECX
0040E875 68 D4D50000 PUSH 0D5D4
0040E87A 56 PUSH ESI
0040E87B FF75 E8 PUSH DWORD PTR SS:[EBP-18]
0040E87E FF75 08 PUSH DWORD PTR SS:[EBP+8]
0040E881 FF15 B4F14000 CALL kernel32.WriteProcessMemory
0006FC54 00000038 |hProcess = 00000038 (window)
0006FC58 7FFA0000 |Address = 7FFA0000
0006FC5C 00401000 |Buffer = UPX_pack.00401000
0006FC60 0000D5D4 |BytesToWrite = D5D4 (54740.)
0006FC64 0006FC68 \pBytesWritten = 0006FC68
Method 1 – Injecting an Infinite Loop
As you can see above, the buffer that is going to be copied to the child process starts at address 00401000. Since we don’t want to miss the execution of the thread, we can go to the buffer at this point and change (in this case) its first 2 bytes from 55 89 to EB FE, which encodes a short jump that jumps back to itself. In this way, we create an infinite loop.
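To see why EB FE loops forever, and to keep track of the bytes we overwrite, consider the following Python sketch. It works on a local copy of the buffer rather than on live process memory, and all helper names are hypothetical.

```python
LOOP_TO_SELF = bytes([0xEB, 0xFE])      # JMP SHORT with rel8 = -2

def short_jmp_target(address: int, rel8: int) -> int:
    """Target of 'EB rel8' at `address`: next instruction (address + 2) plus signed rel8."""
    if rel8 >= 0x80:
        rel8 -= 0x100                   # sign-extend the 8-bit displacement
    return (address + 2 + rel8) & 0xFFFFFFFF

def patch_entry(buffer: bytearray, offset: int = 0) -> bytes:
    """Overwrite two bytes with the self-jump and return the original bytes."""
    original = bytes(buffer[offset:offset + 2])
    buffer[offset:offset + 2] = LOOP_TO_SELF
    return original

def restore_entry(buffer: bytearray, original: bytes, offset: int = 0) -> None:
    """Put the saved bytes back once the debugger is attached."""
    buffer[offset:offset + len(original)] = original
```

The displacement FE is -2 as a signed byte, so the jump lands on its own first byte. Remember to write down the two original bytes (55 89 here), because they have to be restored in the child process after attaching.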
Finally, the loader will start the injected thread, but remember that we have placed an infinite loop at the beginning of it.
0040E88A 50 PUSH EAX
0040E88B 6A 00 PUSH 0
0040E88D 6A 00 PUSH 0
0040E88F FF75 E8 PUSH DWORD PTR SS:[EBP-18]
0040E892 6A 00 PUSH 0
0040E894 6A 00 PUSH 0
0040E896 FF75 08 PUSH DWORD PTR SS:[EBP+8]
0040E899 FF15 B8F14000 CALL kernel32.CreateRemoteThread
Once this step is done, we can attach to the child process and analyse the injected thread, which keeps looping over the first instruction. Then we can go there, take control of it, and restore the two original bytes.
Method 2 – Modify EP & Memory Dump
Another trick that we can use in this case is to wait for the loader to copy the imports table, as seen previously. But instead of letting the loader copy the code starting from address 00401000 to the child process, we can set the entry point there, then dump and fix the imports, as we normally do when unpacking manually.
In this case, the technique is safe primarily because the code of the injected thread needs to be stand-alone in the context of the process address space in which it runs. In other words, since this piece of code is injected into the address space of another process, it cannot rely on the memory layout of the other modules, their image base, etc.
So it is safe to set the entry point at address 00401000 once the imports are copied, dump from there, and save the result as a new executable file.
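Most dumping tools let you set the new entry point in a dialog, but the change itself is just one DWORD in the PE header. The Python sketch below shows it on a dumped file held in memory, assuming a 32-bit image whose image base is 00400000 (an assumption based on the addresses seen in this article); the function name is mine.

```python
import struct

def set_entry_point(pe: bytearray, entry_va: int, image_base: int = 0x400000) -> int:
    """Rewrite AddressOfEntryPoint in a dumped PE and return the old RVA.

    image_base is an assumed default; check the real ImageBase of your dump.
    """
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # 4-byte signature + 20-byte file header + 16 bytes into the optional header
    ep_field = e_lfanew + 4 + 20 + 16
    (old_rva,) = struct.unpack_from("<I", pe, ep_field)
    struct.pack_into("<I", pe, ep_field, entry_va - image_base)
    return old_rva
```

Calling `set_entry_point(dump, 0x401000)` stores the RVA 1000 in the header. The imports still have to be fixed separately, as described above.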
The behaviour of the loader examined in this article is very similar to that of the most common loaders used nowadays by various types of malware, such as ransomware, fake AVs, etc. Keep in mind that in most cases the loader will at some point make use of at least one of the following three APIs: VirtualAlloc, VirtualAllocEx, or ZwAllocateVirtualMemory. So it is good practice to keep an eye on them and on the memory area(s) allocated through them.