1. Introduction

In reverse engineering, “memory dumping” is the process of taking a snapshot of a running executable.

Taking a snapshot means capturing the state of the executable at a particular point in time, usually once it has fully decrypted its original code and reached its OEP (original entry point). The snapshot is saved to disk after the import table, API redirects, and so on have been fixed. Because the original code has been recovered, the dumped executable can then be reverse engineered.

This article looks at some of the anti-dumping techniques that packers commonly use to prevent the reverser from capturing the decrypted executable.

2. Anti-Dumping Techniques

2.1 Nanomites

Nanomites are a powerful anti-dumping technique.

This technique works by replacing certain branch instructions with int 3h. The removed jump instructions are recorded in a table that is encrypted many times over. Each table entry holds data such as the address, the jump type (JMP, JLE, or another conditional jump), the offset, and whether the entry is a Nanomite or a regular debug exception. The problem is amplified by the presence of “false Nanomites”: the generated table mixes legitimate int 3h breakpoints that will actually be executed with entries that point at 0xCC bytes inside other instructions, which are never executed as int 3h under normal conditions.

A3 CC884000 MOV DWORD PTR DS:[4088CC],EAX

The instruction above contains the byte 0xCC. The packer generates a table entry whose address points to this byte. Because the byte is part of the instruction’s operand, it is never executed as an int 3h under normal conditions.
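To see why false Nanomites defeat naive tooling, consider a scanner that treats every 0xCC byte as a breakpoint. The following is a minimal C sketch, not taken from any packer; the function names count_cc_bytes and first_cc_offset are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts every 0xCC byte in a code buffer, the way a naive
 * "find all int 3h" scanner would. It cannot tell an opcode
 * from an operand byte. */
size_t count_cc_bytes(const uint8_t *code, size_t len)
{
    assert(code != NULL || len == 0);
    size_t hits = 0;
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0xCC)
            hits++;
    }
    return hits;
}

/* Returns the offset of the first 0xCC byte, or len if there is none. */
size_t first_cc_offset(const uint8_t *code, size_t len)
{
    assert(code != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (code[i] == 0xCC)
            return i;
    }
    return len;
}
```

For the byte sequence A3 CC 88 40 00 CC (the MOV above followed by one genuine int 3h), the scanner reports two hits, and the first of them, at offset 1, lies inside the MOV operand. Only the packer’s table can tell the two apart.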

When an injected debug breakpoint fires, it can be caught in any of the following ways:

  1. By self-debugging the process. The parent process spawns a child process, attaches to it as a debugger, and then waits for debug events using WaitForDebugEvent.
  2. By hooking KiUserExceptionDispatcher.
  3. By using SEH (structured exception handling), which is risky, since the top-level handler can be replaced later on in the code.
  4. By registering a vectored exception handler (VEH) with RtlAddVectoredExceptionHandler, which is also risky, since the application can later install its own vectored exception handler that takes precedence over ours.

Here’s a fairly simple implementation of Nanomites taken from Nanomites.w32 by Deroko, a sample polymorphic virus. It does not have all the complexity of Armadillo’s Nanomites, but it is good enough to explain the concept.

The code consists of three parts:

  1. The first part is the declaration of the Nanomite as a macro.
  2. The second part shows how the Nanomite is actually used in the virus code.
  3. The third part is the handler that takes care of the nano jumps.

nanojmp macro  jmp_t, __xxx
       local nano
nano:  int 3h
       db jmp_t
       dd offset __xxx - offset nano
endm

The above code snippet is the first part. Each use of the macro expands to an instruction sequence containing an interrupt (int 3h), a one-byte jump type (JMP, JNZ, etc.) given by jmp_t, and a four-byte displacement relative to the int 3h instruction.

Unlike Armadillo, the jump details are stored right after the exception-causing instruction (int 3h) and are not encrypted. A separate encrypted Nanomite jump table and false Nanomites could be added on top of the code Deroko has written.
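The record that the macro emits can be pictured as six bytes. The following C sketch builds such a record by hand; it is only an illustration of the layout, and the function name encode_nanomite and the numeric jump type passed to it are assumptions (the source does not list the values behind jmp_jz, jmp_jnz, and so on).

```c
#include <assert.h>
#include <stdint.h>

enum { NANO_RECORD_SIZE = 6 };  /* int 3h + type byte + 32-bit displacement */

/* Writes one Nanomite record: 0xCC, the jump type, and the displacement
 * from the int 3h to the jump target (offset __xxx - offset nano),
 * stored little-endian as on x86. */
void encode_nanomite(uint8_t out[NANO_RECORD_SIZE], uint8_t jump_type,
                     uint32_t nano_addr, uint32_t target_addr)
{
    assert(out != 0);
    uint32_t disp = target_addr - nano_addr;  /* wraps for backward jumps */

    out[0] = 0xCC;                            /* int 3h */
    out[1] = jump_type;
    out[2] = (uint8_t)(disp & 0xFF);
    out[3] = (uint8_t)((disp >> 8) & 0xFF);
    out[4] = (uint8_t)((disp >> 16) & 0xFF);
    out[5] = (uint8_t)((disp >> 24) & 0xFF);
}
```

A forward jump from 0x00401000 to 0x00401020 is stored as CC tt 20 00 00 00, while a backward jump from 0x00401020 to 0x00401000 is stored as CC tt E0 FF FF FF, where tt is the type byte.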

       xor edx, edx
       lea eax, [ebp+sehhandle]
       push eax
       push dword ptr FS:[edx]
       mov dword ptr FS:[edx], esp
       call IsDebuggerPresent
       test eax, eax

       nanojmp jmp_jnz, getdelta

The above code shows how the nanojmp macro is used. sehhandle is the structured exception handler (SEH) that catches the breakpoint exception; the first five instructions install it at the head of the exception handler chain at FS:[0]. The code then checks whether a debugger is attached to the process. The last line invokes nanojmp: the first argument is the jump type (jump if not zero), and the second argument is the label getdelta, which is the actual starting point of the virus.

The procedure given below is the exception handler, which is invoked whenever execution hits a Nanomite.

sehhandle proc    C pException:dword, pFrame:dword, pContext:dword, param:dword
       mov edx, pException
       mov eax, 1
       cmp [edx.ER_ExceptionCode], EXCEPTION_BREAKPOINT
       jne __exit_handle

The handler first checks whether the exception is a breakpoint exception, which is the only kind it needs to handle. For any other exception it returns 1 (ExceptionContinueSearch), passing the exception on to the next handler.

       mov edi, pContext
       xor esi, esi
       mov [edi.CONTEXT_Dr0], esi
       mov [edi.CONTEXT_Dr1], esi
       mov [edi.CONTEXT_Dr2], esi
       mov [edi.CONTEXT_Dr3], esi

Here the address of the context record is copied to edi and used to clear Dr0 through Dr3, wiping any hardware breakpoints set by the debugger. This is another anti-debug measure.

       mov ebx, [edi.CONTEXT_EFlags]
       mov esi, [edi.CONTEXT_Eip]
       inc esi

Here the EFlags register and the instruction pointer, which points to the exception-causing instruction, are copied to ebx and esi respectively. esi is then incremented so that it points to the jump type stored for the Nanomite that raised the exception, which, as we saw in the macro, is the byte immediately following the int 3h instruction.

       xor eax, eax
       lodsb
       cmp eax, jmp_jz
       jne __skip0
       lodsd
       test ebx, 40h
       jnz __follow
       jmp __notfollow
__skip0:
       cmp eax, jmp_jnz
       jne __skip1
       lodsd
       test ebx, 40h
       jz __follow
       jmp __notfollow

__skip1:
       cmp eax, jmp_jmp
       jne __skip2
       lodsd
       jmp __follow
__skip2:
       cmp eax, jmp_jc
       jne __skip3
       lodsd
       test ebx, 1h
       jnz __follow
       jmp __notfollow
__skip3:
       cmp eax, jmp_jnc
       jne __skip4
       lodsd
       test ebx, 1h
       jz __follow
       jmp __notfollow
__skip4:
__notfollow:
       add [edi.CONTEXT_Eip], 6h
       xor eax, eax
       jmp __exit_handle

__follow:
       add [edi.CONTEXT_Eip], eax
       xor  eax, eax
       jmp  __exit_handle
__exit_handle:
       ret
sehhandle  endp

In the rest of the code, the jump type is loaded and compared against each supported kind of jump (JZ, JNZ, JMP, JC, JNC), much like a switch statement. Once the jump type is identified, the handler tests the relevant bit of the saved EFlags value (ZF, mask 40h, or CF, mask 1h) to decide whether the jump should be followed. If the jump is followed, the relative displacement stored after the jump type is added to the address of the exception-causing instruction, which is taken from the context record. If it is not followed, the length of the Nanomite record, 6h, is added to that address instead. This length covers the int 3h instruction (one byte), the jump type (one byte), and the relative displacement (four bytes). In both cases the handler returns 0 (ExceptionContinueExecution), so execution resumes at the updated Eip.
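The decision the handler makes can be restated in C. This is a sketch of the same logic rather than a translation of Deroko’s code: the enum values are hypothetical (the source names jmp_jz and friends but does not give their values), and resolve_nanomite is an invented name. It takes the Eip at the int 3h, the six-byte record, and the saved EFlags, and returns the Eip at which execution should resume.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical numbering of the jump types. */
enum nano_jump { NANO_JZ = 0, NANO_JNZ, NANO_JMP, NANO_JC, NANO_JNC };

#define EFLAGS_CF        0x001u  /* carry flag, bit 0 */
#define EFLAGS_ZF        0x040u  /* zero flag, bit 6  */
#define NANO_RECORD_LEN  6u      /* int 3h + type + displacement */

uint32_t resolve_nanomite(uint32_t eip, const uint8_t *record, uint32_t eflags)
{
    assert(record != 0 && record[0] == 0xCC);

    uint8_t  type = record[1];
    uint32_t disp = (uint32_t)record[2]
                  | ((uint32_t)record[3] << 8)
                  | ((uint32_t)record[4] << 16)
                  | ((uint32_t)record[5] << 24);
    int follow;

    switch (type) {
    case NANO_JZ:  follow = (eflags & EFLAGS_ZF) != 0; break;
    case NANO_JNZ: follow = (eflags & EFLAGS_ZF) == 0; break;
    case NANO_JMP: follow = 1;                         break;
    case NANO_JC:  follow = (eflags & EFLAGS_CF) != 0; break;
    case NANO_JNC: follow = (eflags & EFLAGS_CF) == 0; break;
    default:       follow = 0;                         break; /* unknown: skip the record */
    }

    /* Taken: target is relative to the int 3h. Not taken: step over the record. */
    return follow ? eip + disp : eip + NANO_RECORD_LEN;
}
```

With a JNZ record at 0x00401000 whose displacement is 20h, the result is 0x00401020 when ZF is clear and 0x00401006 when ZF is set.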

This code essentially embodies the concept of Nanomites, and much can be built on top of it, such as:

  1. Separating the table from the int 3h instruction and storing it in a different section.
  2. Adding false Nanomites entries in the table to prevent automatic parsing and replacement of int 3h in the code by scanning for 0xcc byte.
  3. Encrypting the table for storing Nanomite entries, obfuscating the decryption routine, and adding anti-debug checks to it.

2.2 Stolen Bytes (Code Splicing)

The point at which the program starts execution is called the original entry point (OEP). Packers generally unpack the whole program in memory and then jump to its OEP. Code splicing (stolen bytes) works by copying a few bytes from the OEP and moving them to a separate memory location for execution. This is used to hide the actual OEP.

More correctly, the technique should be called stolen instructions rather than stolen bytes. The packer generally moves the first three or four instructions from the OEP into heavily obfuscated routines riddled with anti-debug checks, most probably within the packer’s own code section. After executing these few instructions, the packer jumps to the original decrypted code that follows the stolen instructions. The location that originally held the stolen instructions is filled with zeroes or junk code.

The problem with code splicing is that a packer can’t steal instructions from just anywhere. It has to know the size of each instruction before it can move it to another location, since the instructions will be executed from the new location and a half-copied instruction will cause an exception. Packers can therefore steal only those instructions that are sure to be present at the OEP, such as

PUSH EBP
MOV EBP, ESP
PUSH -1
PUSH Shellcod.004050A0
PUSH Shellcod.00401D5C
MOV EAX, DWORD PTR FS:[0]
PUSH EAX
MOV DWORD PTR FS:[0],ESP

The instruction PUSH Shellcod.004050A0 and the PUSH instruction following it have a fixed size of 5 bytes each, so they can also be moved. It doesn’t matter what data they carry. To move instructions we need to know their sizes and their starting addresses. Knowing an instruction’s starting address requires knowing the size of every instruction that precedes it, which is why moving instructions from the OEP is the easier choice.
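The size constraint can be made concrete with a small C sketch. Given the lengths of the instructions at the OEP, it returns how many bytes must be moved so that the stolen region ends on an instruction boundary. The helper name steal_boundary is invented for this example; the lengths used below (1, 2, 2, 5, 5, 6, 1, 7) are the standard 32-bit encodings of the prologue listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the smallest byte count >= min_bytes that ends exactly on an
 * instruction boundary, or 0 if the listed instructions do not cover
 * min_bytes. lengths[i] is the size in bytes of the i-th instruction. */
size_t steal_boundary(const unsigned char *lengths, size_t count, size_t min_bytes)
{
    assert(lengths != NULL);
    size_t total = 0;

    for (size_t i = 0; i < count; i++) {
        if (total >= min_bytes)
            break;               /* already on a boundary past the minimum */
        total += lengths[i];     /* always take the whole instruction */
    }
    return total >= min_bytes ? total : 0;
}
```

Asking for at least 8 bytes of this prologue yields 10, which is exactly the first four instructions (PUSH EBP, MOV EBP,ESP, PUSH -1, and the first 5-byte PUSH). Asking for 11 bytes yields 15, because the second 5-byte PUSH must be taken whole.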

Now we are going to see a very simple way to implement the code splicing technique. A new section is created, and the stolen instructions are injected into it. Most of the code snippets are taken from “Inject your code to a Portable Executable” by Ashkbiz Danehkar; some were added by the author as the example required.

PIMAGE_DOS_HEADER      image_dos_header;
PCHAR                  pDosStub;
DWORD                  dwDosStubSize, dwDosStubOffset;
PIMAGE_NT_HEADERS      image_nt_headers;
PIMAGE_SECTION_HEADER  image_section_header[MAX_SECTION_NUM];
PCHAR                  image_section[MAX_SECTION_NUM];

Before we get to the code below, the file is opened and read into a buffer, and image_dos_header and image_nt_headers are extracted from it. After reading image_nt_headers, we can recover the number of sections present in the executable.

SectionNum=image_nt_headers->FileHeader.NumberOfSections;

Here, the section header is recovered for all the sections in the executable and is saved in image_section_header.

for (i = 0; i < SectionNum; i++)
{
	CopyMemory(image_section_header[i],
			pMem + dwRO_first_section + i * sizeof(IMAGE_SECTION_HEADER),
			sizeof(IMAGE_SECTION_HEADER));
}

The following loop is used to recover the section data using the information from the section header extracted earlier.

for (i = 0; i < SectionNum; i++)
{
	image_section[i] = (char*)GlobalAlloc(GMEM_FIXED | GMEM_ZEROINIT,
			PEAlign(image_section_header[i]->SizeOfRawData,
				image_nt_headers->OptionalHeader.FileAlignment));
	CopyMemory(image_section[i],
			pMem + image_section_header[i]->PointerToRawData,
			image_section_header[i]->SizeOfRawData);
}

The code snippet below is responsible for adding a new section to the executable. dwSize is the size of the new section and szName is the name given to the new section.

DWORD PEAlign(DWORD dwTarNum,DWORD dwAlignTo)
{
	return(((dwTarNum+dwAlignTo-1)/dwAlignTo)*dwAlignTo);
}
.
.
.
DWORD roffset, rsize, voffset, vsize;
int i = image_nt_headers->FileHeader.NumberOfSections;	// index of the new section
rsize = PEAlign(dwSize, image_nt_headers->OptionalHeader.FileAlignment);
vsize = PEAlign(rsize, image_nt_headers->OptionalHeader.SectionAlignment);
// the new section starts right after the last existing one, on disk and in memory
roffset = PEAlign(image_section_header[i-1]->PointerToRawData + image_section_header[i-1]->SizeOfRawData,
		image_nt_headers->OptionalHeader.FileAlignment);
voffset = PEAlign(image_section_header[i-1]->VirtualAddress + image_section_header[i-1]->Misc.VirtualSize,
		image_nt_headers->OptionalHeader.SectionAlignment);
memset(image_section_header[i], 0, sizeof(IMAGE_SECTION_HEADER));
image_section_header[i]->PointerToRawData = roffset;
image_section_header[i]->VirtualAddress = voffset;
image_section_header[i]->SizeOfRawData = rsize;
image_section_header[i]->Misc.VirtualSize = vsize;
// code + initialized data; readable, writable, and executable, since the
// entry point will live here (the value 0xC0000040 lacks the execute flag)
image_section_header[i]->Characteristics = 0xE0000060;
// section names are at most 8 bytes; strncpy never writes past that limit
strncpy((char*)image_section_header[i]->Name, szName, IMAGE_SIZEOF_SHORT_NAME);
image_section[i] = (char*)GlobalAlloc(GMEM_FIXED | GMEM_ZEROINIT, rsize);
image_nt_headers->FileHeader.NumberOfSections++;
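The alignment arithmetic is easier to follow with numbers. The sketch below repeats the PEAlign formula in portable C and computes the layout of a new section from the previous one. The struct section_layout and the function next_section are simplifications invented for this example (the real code works on IMAGE_SECTION_HEADER), and the sample values are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Same rounding as PEAlign: round value up to a multiple of align. */
uint32_t pe_align(uint32_t value, uint32_t align)
{
    assert(align != 0);
    return ((value + align - 1) / align) * align;
}

/* Simplified stand-in for the four layout fields of a section header. */
typedef struct {
    uint32_t raw_offset;   /* PointerToRawData */
    uint32_t raw_size;     /* SizeOfRawData    */
    uint32_t virt_addr;    /* VirtualAddress   */
    uint32_t virt_size;    /* Misc.VirtualSize */
} section_layout;

/* Places a new section of new_size bytes directly after prev,
 * following the same steps as the snippet above. */
section_layout next_section(section_layout prev, uint32_t new_size,
                            uint32_t file_align, uint32_t section_align)
{
    section_layout s;
    s.raw_size   = pe_align(new_size, file_align);
    s.virt_size  = pe_align(s.raw_size, section_align);
    s.raw_offset = pe_align(prev.raw_offset + prev.raw_size, file_align);
    s.virt_addr  = pe_align(prev.virt_addr + prev.virt_size, section_align);
    return s;
}
```

For a last section at file offset 0x1000 with raw size 0x600, RVA 0x3000 and virtual size 0x5A4, a 0x1E-byte section with FileAlignment 0x200 and SectionAlignment 0x1000 lands at file offset 0x1600 and RVA 0x4000, with raw size 0x200 and virtual size 0x1000.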

The new OEP can be set using the following code.

DWORD OEP_RVA = image_nt_headers->OptionalHeader.AddressOfEntryPoint;
OEP_RVA += 10;		// OEP_RVA now points to the instruction after the stolen ones

// the entry point now refers to the start of the new section
image_nt_headers->OptionalHeader.AddressOfEntryPoint = image_section_header[i]->VirtualAddress;
DWORD newOEP = (DWORD)image_section[i];	// buffer that receives the stolen bytes (32-bit build)

Now we have to locate the old OEP, which is stored as an RVA. From it we need to find the offset within the copied file data where the OEP resides and extract the first ten bytes from there. This can be done as follows.

We iterate through the section table. For each section header, image_section_header[j]->VirtualAddress holds the starting RVA of the section and image_section_header[j]->Misc.VirtualSize holds its size. A section is guaranteed to be loaded contiguously in memory, whether the file is memory mapped by us or loaded by the operating system. We check that our RVA is greater than or equal to the section’s VirtualAddress and less than VirtualAddress + VirtualSize. If both conditions hold, our RVA lies inside this section. Subtracting the section’s VirtualAddress from our RVA then gives the offset within the section where our data or instruction is stored.

That is, the desired location will be:

oldOEP = starting_address_of_image_section_where_the_OEP_lies + (Offset recovered with the above method)
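The section lookup described above can be sketched as follows. The struct section_range is a stripped-down stand-in for IMAGE_SECTION_HEADER and find_section is a name chosen for this example; neither comes from the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two fields the lookup needs. */
typedef struct {
    uint32_t virtual_address;  /* starting RVA of the section */
    uint32_t virtual_size;     /* size of the section in memory */
} section_range;

/* Returns the index of the section containing rva and stores the offset
 * of rva within that section in *offset_out. Returns -1 if no section
 * contains the RVA. */
int find_section(const section_range *sections, size_t count,
                 uint32_t rva, uint32_t *offset_out)
{
    assert(sections != NULL && offset_out != NULL);

    for (size_t j = 0; j < count; j++) {
        uint32_t start = sections[j].virtual_address;
        uint32_t end   = start + sections[j].virtual_size;  /* one past the last byte */

        if (rva >= start && rva < end) {
            *offset_out = rva - start;
            return (int)j;
        }
    }
    return -1;
}
```

With the index j and the offset in hand, oldOEP is image_section[j] plus that offset, as in the formula above.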

__asm
{
	mov eax, oldOEP			// eax contains the OEP
	mov ecx, newOEP			// ecx contains the location where the stolen bytes are stored
	mov ebx, [eax]			// steal the first 4 bytes from the OEP
	mov dword ptr [eax], 0		// replace the stolen bytes with zeroes
	mov [ecx], ebx
	add eax, 4
	add ecx, 4
	mov ebx, [eax]			// next 4 bytes
	mov dword ptr [eax], 0
	mov [ecx], ebx
	add eax, 4
	add ecx, 4
	mov bx, word ptr [eax]		// last 2 bytes, 10 in total
	mov word ptr [eax], 0
	mov word ptr [ecx], bx
	add eax, 2
	add ecx, 2
	// Other code being injected into the section. x86 is little-endian,
	// so each immediate lists the bytes of the stub in reverse order.
	mov dword ptr [ecx], 0x835817EB	// EB 17 58 83
	add ecx, 4
	mov dword ptr [ecx], 0x088B09E8	// E8 09 8B 08
	add ecx, 4
	mov dword ptr [ecx], 0x0030A164	// 64 A1 30 00
	add ecx, 4
	mov dword ptr [ecx], 0x588B0000	// 00 00 8B 58
	add ecx, 4
	mov dword ptr [ecx], 0xFFD90308	// 08 03 D9 FF
	add ecx, 4
	mov byte ptr [ecx], 0xE3
	add ecx, 1
	mov edx, OEP_RVA		// OEP_RVA fills the slot marked by four 0xcc bytes
	mov [ecx], edx
	add ecx, 4
	mov dword ptr [ecx], 0xFFFFE4E8	// E8 E4 FF FF
	add ecx, 4
	mov byte ptr [ecx], 0xFF
}

The code injected into the newly created section has the following format:

		.
		.
		STOLEN INSTRUCTION (10 bytes)
		.
		.
		jmp tick1
tick2:
		pop eax
		sub eax, 9
		mov ecx, [eax]		// ecx contains the OEP_RVA
		mov eax, fs:[0x30]	// eax contains the address of the PEB
		mov ebx, [eax + 8]	// ebx contains the image base address
		add ebx, ecx		// address of the instruction after the stolen instructions
		jmp ebx			// control transferred to that instruction
		__emit 0xcc		// four placeholder bytes that receive OEP_RVA
		__emit 0xcc
		__emit 0xcc
		__emit 0xcc
tick1:
		call tick2

The injected code consists of the stolen instructions followed by a stub that transfers control back to the original code. The call at tick1 pushes the address that follows it, and pop eax retrieves that address; subtracting 9 (the 5-byte call plus the 4-byte slot) makes eax point to the stored RVA, which is loaded into ECX. This RVA is OEP+10, the location of the first instruction after the stolen ones. The address of the PEB (Process Environment Block) is then fetched from FS:[30h], and the ImageBaseAddress field at offset 8 gives the base address of the executable loaded in memory, which is placed in EBX. Adding ECX to EBX yields the absolute address of the instruction to which control has to be transferred.
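As a cross-check of the byte layout, the stub can also be written out as a plain byte array. The C sketch below is an alternative to the inline assembly, not the author’s method; build_stub, stub_call_target and the two constants are names chosen for this example. The bytes are the 32-bit encodings of the instructions in the listing above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STUB_SIZE = 30, STUB_RVA_SLOT = 21 };

/* Copies the 30-byte stub to out and stores resume_rva (OEP RVA + 10)
 * little-endian in the four placeholder bytes. */
void build_stub(uint8_t out[STUB_SIZE], uint32_t resume_rva)
{
    static const uint8_t tmpl[STUB_SIZE] = {
        0xEB, 0x17,                          /* jmp tick1              */
        0x58,                                /* tick2: pop eax         */
        0x83, 0xE8, 0x09,                    /* sub eax, 9             */
        0x8B, 0x08,                          /* mov ecx, [eax]         */
        0x64, 0xA1, 0x30, 0x00, 0x00, 0x00,  /* mov eax, fs:[30h]      */
        0x8B, 0x58, 0x08,                    /* mov ebx, [eax+8]       */
        0x03, 0xD9,                          /* add ebx, ecx           */
        0xFF, 0xE3,                          /* jmp ebx                */
        0xCC, 0xCC, 0xCC, 0xCC,              /* slot for resume_rva    */
        0xE8, 0xE4, 0xFF, 0xFF, 0xFF         /* tick1: call tick2      */
    };

    assert(out != 0);
    memcpy(out, tmpl, STUB_SIZE);
    out[STUB_RVA_SLOT + 0] = (uint8_t)(resume_rva & 0xFF);
    out[STUB_RVA_SLOT + 1] = (uint8_t)((resume_rva >> 8) & 0xFF);
    out[STUB_RVA_SLOT + 2] = (uint8_t)((resume_rva >> 16) & 0xFF);
    out[STUB_RVA_SLOT + 3] = (uint8_t)((resume_rva >> 24) & 0xFF);
}

/* Decodes the call at the end of the stub and returns the stub offset
 * it jumps to (end of call + rel32, modulo 2^32). */
uint32_t stub_call_target(const uint8_t stub[STUB_SIZE])
{
    uint32_t rel = (uint32_t)stub[26]
                 | ((uint32_t)stub[27] << 8)
                 | ((uint32_t)stub[28] << 16)
                 | ((uint32_t)stub[29] << 24);
    return (uint32_t)STUB_SIZE + rel;
}
```

The offsets confirm the arithmetic in the listing: the jmp skips 17h bytes to reach tick1 at offset 25, the call returns to tick2 at offset 2, and the return address (offset 30) minus 9 is offset 21, the start of the slot.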

At this point we have created a new section, stored the stolen instructions in it, and added the extra code that lets execution continue without an exception. All that remains is to save the changes to a file. We write out everything: the DOS header and stub, the PE file header and optional header, all the section headers, and the section data, including what we have changed.

The technique above can be improved further in the following ways:

  1. Placing a lot of junk code between the stolen instructions and the extra code that we have injected. This will make reversing the application a lot more difficult.
  2. Adding a lot of anti-debug checks to the stolen code to prevent analysis with a debugger.

2.3 Self-Unmapping

When the executable is loaded for execution, all the data within its sections is mapped into the address space. That is, the loaded image is simply a mapped view of a file. This view can be unmapped using UnmapViewOfFile(), like any ordinary mapped file. But before we unmap the loaded executable, we must copy all of its data to a separate location, because once the file is unmapped the address ranges occupied by the various sections of the image become invalid.

After we have relocated the image, we have to adjust all the absolute references to match the new base address. This is done using the relocation table. Once all the relocations are fixed, we can unmap the previous view of the image.

Here’s an example that does all that is explained above:

BYTE *baseAddress = (BYTE *)GetModuleHandleA(0);
DWORD location_File_header = *(DWORD *)(baseAddress + 0x3c);	// e_lfanew: offset of the PE header
DWORD size_of_image = *(DWORD *)(baseAddress + location_File_header + 0x50);	// OptionalHeader.SizeOfImage
PVOID new_base = VirtualAlloc(NULL, size_of_image, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
DWORD new_execution_point;

In the above code, the size of the image is extracted from the PE header, and a block of memory of that size is then allocated with read, write, and execute permissions.

__asm
{
	mov eax, new_base
	mov esi, baseAddress
	mov ecx, size_of_image
	lea edi, [eax + offset l1]
	sub edi, esi
	mov new_execution_point, edi	// address of l1 inside the copy
	mov edi, eax
	rep movsb			// copy the whole image to new_base
}

Suppose eax contains the address of the newly allocated block, say eax = 0x003f0000, and the offset of l1 is 0x00401121 (l1 is the location from which the relocated code will resume execution). Then edi = eax + offset l1 is 0x007f1121. esi contains 0x00400000, the image base of the executable, so edi = edi - esi is 0x003f1121. This is the address l1 has once the whole executable image has been copied to the newly allocated block.

In short, we copy the executable image to the newly allocated memory, then calculate an address within the relocated image where execution should resume and store it in new_execution_point.
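The same calculation fits in one line of C. The function name rebase_address is an assumption made for this sketch; it simply performs the add and subtract that the lea and sub instructions perform above, using 32-bit wraparound arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Translates an address inside the image loaded at old_base into the
 * matching address inside a copy of the image placed at new_base. */
uint32_t rebase_address(uint32_t addr, uint32_t old_base, uint32_t new_base)
{
    return (new_base + addr) - old_base;   /* edi = eax + offset l1 - esi */
}
```

With the numbers from the text, rebase_address(0x00401121, 0x00400000, 0x003f0000) gives 0x003f1121.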

BYTE *old_image = (BYTE *)baseAddress;
BYTE *new_image = (BYTE *)new_base;
DWORD image_directory_basereloc_rva = *(DWORD *)(old_image + location_File_header + 0xa0);
DWORD size_of_section = *(DWORD *)(old_image + location_File_header + 0xa4);
DWORD relocation_diff = (DWORD)(new_image - old_image);
long int size_of_section_temp = size_of_section;
while (size_of_section_temp > 0)
{
	DWORD page_rva = *(DWORD *)(old_image + image_directory_basereloc_rva);
	long int size_relocation_block = *(DWORD *)(old_image + image_directory_basereloc_rva + 4);
	if (size_relocation_block < 8)
		break;				// malformed block; avoid looping forever
	image_directory_basereloc_rva += 8;
	size_of_section_temp -= size_relocation_block;
	size_relocation_block -= 8;		// the block size includes the 8-byte header
	while (size_relocation_block > 0)
	{
		WORD relocation_value_type = *(WORD *)(old_image + image_directory_basereloc_rva);

		if (((relocation_value_type >> 12) & 0x000F) == IMAGE_REL_BASED_HIGHLOW)
		{
			WORD offset = relocation_value_type & 0x0FFF;
			DWORD *address_to_be_patched = (DWORD *)(new_image + page_rva + offset);
			*address_to_be_patched += relocation_diff;
		}
		size_relocation_block -= 2;
		image_directory_basereloc_rva += 2;
	}
}

The code above applies the relocation table to the copied image. First the RVA and the size of the relocation table are fetched from the optional header. Then the relocation difference is calculated; it will be added to each location that the relocation entries point to.

The relocation table consists of blocks, each starting with the following header:

Virtual Address (4 bytes): RVA of the page that is to be fixed.

Size of Block (4 bytes): The size of the relocation block, including the size of this header.

The header is followed by the relocation entries. They are 16-bit words, where the upper 4 bits indicate the type of relocation. For example:

IMAGE_REL_BASED_ABSOLUTE: The relocation is skipped. This type can be used to pad a relocation block so that the next block starts at a 4-byte boundary.

IMAGE_REL_BASED_HIGHLOW: The relocation adds the base-address difference to the 32-bit double word at the location denoted by the 12-bit offset.

The lower 12 bits are the offset within the 4K page. Hence the address to be patched is calculated by adding the base address at which the image is loaded, the RVA of the page, and the offset within the page.
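The block format can be exercised without Windows headers. The following portable C sketch walks a relocation table held in a byte buffer and patches an image buffer; it is an illustration under simplifying assumptions (little-endian fields read byte by byte, only the ABSOLUTE and HIGHLOW types), and apply_relocations and the helper names are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REL_BASED_ABSOLUTE 0u   /* padding entry, skipped       */
#define REL_BASED_HIGHLOW  3u   /* add delta to a 32-bit value  */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;         p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16); p[3] = (uint8_t)(v >> 24);
}

/* Applies every HIGHLOW entry in reloc to image, adding delta
 * (new base minus old base). Returns the number of patched locations. */
size_t apply_relocations(uint8_t *image, size_t image_size,
                         const uint8_t *reloc, size_t reloc_size,
                         uint32_t delta)
{
    assert(image != NULL && reloc != NULL);
    size_t patched = 0, pos = 0;

    while (pos + 8 <= reloc_size) {
        uint32_t page_rva   = rd32(reloc + pos);       /* Virtual Address */
        uint32_t block_size = rd32(reloc + pos + 4);   /* includes header */
        if (block_size < 8 || block_size > reloc_size - pos)
            break;                                     /* malformed block */

        for (size_t e = pos + 8; e + 2 <= pos + block_size; e += 2) {
            uint32_t entry  = (uint32_t)reloc[e] | ((uint32_t)reloc[e + 1] << 8);
            uint32_t type   = entry >> 12;             /* upper 4 bits  */
            uint32_t target = page_rva + (entry & 0x0FFFu); /* lower 12 */

            if (type == REL_BASED_HIGHLOW && (size_t)target + 4 <= image_size) {
                wr32(image + target, rd32(image + target) + delta);
                patched++;
            }
        }
        pos += block_size;
    }
    return patched;
}
```

For an image moved from 0x00400000 to 0x003f0000, a block with page RVA 0x1000 and the entry 0x3004 turns the pointer 0x00401234 stored at RVA 0x1004 into 0x003f1234; a trailing 0x0000 entry is padding and changes nothing.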

__asm
{
	push baseAddress		// argument for UnmapViewOfFile
	push new_execution_point	// return address: l1 inside the copy
	jmp UnmapViewOfFile
}
l1:
//execution continues from this location

Finally, we unmap the view of the previously loaded executable image. The values on the stack are arranged so that the return address seen by UnmapViewOfFile is the copy of the point labeled l1 in the relocated image.

This method is effective in preventing memory dumping if the user relies on automated tools that dump only static memory, such as sections and headers, and not memory created dynamically by calls such as VirtualAlloc().

This can be improved further by:

  1. Placing these routines in the TLS callback routines, which are executed before the program reaches OEP.

3. Conclusion

This paper elaborates on some of the anti-dumping techniques that people use to protect their applications against memory dumping. The techniques outlined here are by no means exhaustive; the intention is to give a more detailed view of some of the commonly used ones.

4. References

  1. “ANTI-UNPACKER TRICKS – PART ONE” – Peter Ferrie, Microsoft, USA
  2. Nanomites.w32 by Deroko – A virus written by Deroko