Complete Tour of PE and ELF: An Introduction
I have decided to put together an end-to-end malware analysis course and to extend it to memory forensics and detecting APTs. Though this might sound great, not everyone has the skills to deal with malware in general, and it requires a fair understanding of how malware works behind the scenes. Two of the most important prerequisites for analyzing malware are an understanding of the PE and ELF file structures and a good knowledge of assembly language. This series starts with PE and ELF, which will span a few articles, followed by assembly language. After that, I will move on to static malware analysis and dynamic malware analysis, followed by memory forensics and dealing with APTs. So without wasting any time, let's start with the PE structure.
If you want to know what you will be dealing with here, take a look at this link.
Do not worry, it is not as bad as it looks, and we will cover only the portions that matter. I will also try to demonstrate each section with the help of an example. I will be using PEview and CFF Explorer to dig into PE files.
Portable Executable (PE) is an executable format for Windows. Common Windows PE file extensions are:
- .exe, .dll, .sys, .ocx, .cpl
Before we examine the first structure, it is important to note that numbers are stored in little-endian format, with the least significant byte first. For example, the 16-bit value 0x0123 appears in the file as the bytes 23 01.
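The byte order is easy to check for yourself. The snippet below is a small sketch using Python's standard `struct` module; the variable names are only for illustration.

```python
import struct

# Pack the 16-bit value 0x0123 as little-endian ("<H"): low byte first.
raw = struct.pack("<H", 0x0123)
print(" ".join(f"{b:02x}" for b in raw))  # 23 01

# Reading the two bytes back as little-endian recovers the original value.
value = int.from_bytes(raw, "little")
print(hex(value))  # 0x123
```

Every multi-byte field in the structures that follow is read the same way.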
The very first structure in a PE file is the 64-byte IMAGE_DOS_HEADER
Here two fields are of most importance:
- e_magic: This is set to MZ, the initials of Mark Zbikowski, one of the developers of MS-DOS. It marks the file as a DOS-compatible executable, which is how every PE file (.exe, .dll, and so on) begins.
- e_lfanew: This field contains the file offset where the PE header can be found.
Between the DOS header and the PE header sits a DOS stub program which prints “This program cannot be run in DOS mode.”
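As a rough sketch of how these two fields are read, the snippet below checks the MZ signature and pulls e_lfanew out of offset 0x3C, the last four bytes of the 64-byte header. The function name `parse_dos_header` and the hand-built sample bytes are assumptions made for illustration, not part of any tool used in this series.

```python
import struct

def parse_dos_header(data: bytes) -> int:
    """Return e_lfanew (file offset of the PE header) from a PE file's bytes."""
    if len(data) < 64:
        raise ValueError("too short for IMAGE_DOS_HEADER (64 bytes)")
    # e_magic is the first WORD of the header; on disk it is the bytes "MZ".
    if data[0:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew is the last field of the header: 4 bytes at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    return e_lfanew

# Hypothetical 64-byte header built by hand for illustration; not a real file.
sample = bytearray(64)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
print(hex(parse_dos_header(bytes(sample))))  # 0x80
```

On a real binary you would pass in the file contents, for example the result of `open(path, "rb").read()`.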
Taking the file offset stored in e_lfanew of IMAGE_DOS_HEADER, we can locate the next structure, known as IMAGE_NT_HEADERS
This structure contains:
- Signature: A constant value of 0x00004550, which is “PE” followed by two null bytes in ASCII when stored in little-endian order
FileHeader: This is an embedded structure (IMAGE_FILE_HEADER) and looks like this
Important fields in this are:
Machine: This field describes the architecture this binary is supposed to run on
- For 32-bit (x86): 0x014C
- For 64-bit (x64): 0x8664
- NumberOfSections: Binary code is split among various sections like .text, .data, etc. This field tells us how many sections are present in the binary.
- TimeDateStamp: This tells us when the binary was compiled. It is sometimes useful for attributing malware, but note that the author can change this value.
- Characteristics: The important flag to note here is IMAGE_FILE_DLL, which tells us whether the file is a DLL.
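To make the layout concrete, here is a minimal sketch that checks the signature and decodes the 20-byte IMAGE_FILE_HEADER that follows it. The helper name `parse_file_header`, the dictionary keys, and the sample bytes are all assumptions for illustration; the field order and the IMAGE_FILE_DLL value (0x2000) follow the documented structure.

```python
import struct
from datetime import datetime, timezone

IMAGE_FILE_DLL = 0x2000  # Characteristics bit that marks a DLL
MACHINES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}

def parse_file_header(data: bytes, e_lfanew: int) -> dict:
    """Decode the Signature and IMAGE_FILE_HEADER located at e_lfanew."""
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER is 20 bytes and follows the 4-byte signature.
    (machine, num_sections, timestamp, _sym_ptr, _sym_count,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": MACHINES.get(machine, hex(machine)),
        "sections": num_sections,
        # TimeDateStamp counts seconds since 1970-01-01 UTC.
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "size_of_optional_header": opt_size,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
    }

# Hypothetical header bytes built by hand for illustration; not a real file.
sample = b"PE\x00\x00" + struct.pack(
    "<HHIIIHH", 0x8664, 3, 1_000_000_000, 0, 0, 0xF0, 0x2022)
info = parse_file_header(sample, 0)
print(info["machine"], info["sections"], info["is_dll"])
```

The SizeOfOptionalHeader value is kept in the result because it is needed to find the section headers.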
OptionalHeader: This structure looks like below:
Important Fields are:
Magic: This is the field that determines whether the binary is 32 bit or 64 bit
- 0x10B: 32 bit
- 0x20B: 64 bit
- AddressOfEntryPoint: This is the relative virtual address (RVA) where execution starts once the loader has mapped the binary into memory. Debuggers commonly break here, which makes it a handy starting point for analysis.
- ImageBase: This field tells us the binary's preferred starting virtual address. If that address is not available, the loader picks another one, and every absolute address that assumed ImageBase must then be fixed up (relocated) against the new base. Note that this field is a DWORD for 32-bit binaries and a ULONGLONG for 64-bit ones.
- SectionAlignment: As mentioned earlier, a PE file contains many sections. This field gives the alignment of sections in memory; each section starts at a multiple of this value.
- FileAlignment: This gives the alignment of section data on disk; raw data is written to the file in chunks of this size. A common value is 0x200 (512 bytes), the traditional size of a hard-disk sector.
- SizeOfImage: This field tells us the size of the contiguous chunk of memory that must be reserved for the loaded binary.
DllCharacteristics: This contains important flags such as:
- DYNAMIC_BASE: This tells the loader that the binary supports ASLR and can be relocated to a different base address at load time.
- FORCE_INTEGRITY: This tells the loader to enforce code integrity checks, so the binary must be digitally signed.
- NX_COMPAT: This tells the loader that the binary is compatible with Data Execution Prevention (DEP).
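The fields above can be read with a few fixed offsets. The sketch below assumes `opt` holds the optional header bytes on their own, starting at Magic; the function name, dictionary keys, and sample values are made up for illustration. The offsets follow the documented PE32 and PE32+ layouts, where only ImageBase moves (offset 28 as a DWORD in PE32, offset 24 as a ULONGLONG in PE32+).

```python
import struct

# DllCharacteristics bits discussed above.
DLL_FLAGS = {0x0040: "DYNAMIC_BASE", 0x0080: "FORCE_INTEGRITY", 0x0100: "NX_COMPAT"}

def parse_optional_header(opt: bytes) -> dict:
    """Pick out the fields discussed above from the optional header bytes."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == 0x10B:    # PE32: ImageBase is a DWORD at offset 28
        bits = 32
        (image_base,) = struct.unpack_from("<I", opt, 28)
    elif magic == 0x20B:  # PE32+: ImageBase is a ULONGLONG at offset 24
        bits = 64
        (image_base,) = struct.unpack_from("<Q", opt, 24)
    else:
        raise ValueError(f"unexpected Magic {magic:#x}")
    (entry_rva,) = struct.unpack_from("<I", opt, 16)
    section_align, file_align = struct.unpack_from("<II", opt, 32)
    (size_of_image,) = struct.unpack_from("<I", opt, 56)
    (dll_chars,) = struct.unpack_from("<H", opt, 70)
    return {
        "bits": bits,
        "entry_point_rva": entry_rva,
        "image_base": image_base,
        "section_alignment": section_align,
        "file_alignment": file_align,
        "size_of_image": size_of_image,
        "dll_flags": [name for bit, name in DLL_FLAGS.items() if dll_chars & bit],
    }

# Hypothetical 64-bit optional header built by hand; all values are made up.
sample = bytearray(240)
struct.pack_into("<H", sample, 0, 0x20B)
struct.pack_into("<I", sample, 16, 0x1000)
struct.pack_into("<Q", sample, 24, 0x140000000)
struct.pack_into("<II", sample, 32, 0x1000, 0x200)
struct.pack_into("<I", sample, 56, 0x5000)
struct.pack_into("<H", sample, 70, 0x0140)
fields = parse_optional_header(bytes(sample))
print(fields["bits"], hex(fields["image_base"]), fields["dll_flags"])
```

The entry point's virtual address in a loaded process is ImageBase plus the AddressOfEntryPoint RVA, assuming the image was loaded at its preferred base.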
DataDirectory[16]: This is an embedded array of 16 entries which point to other structures holding information about imports, exports, exceptions, etc. Every entry has the same structure, whichever structure it points to
Here VirtualAddress is the relative virtual address of the structure being pointed to (import table, export table, etc.), and Size is its size in bytes. We will talk about this in great detail later on.
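Walking the array is a simple loop over 8-byte entries. This is a sketch under the same assumption as before, namely that `opt` holds the optional header bytes starting at Magic; the function name and the short directory labels are my own. The array starts at offset 96 in a 32-bit header and 112 in a 64-bit one, right after the NumberOfRvaAndSizes field.

```python
import struct

# Conventional order of the 16 data directory slots (labels shortened here).
DIRECTORY_NAMES = [
    "EXPORT", "IMPORT", "RESOURCE", "EXCEPTION", "SECURITY", "BASERELOC",
    "DEBUG", "ARCHITECTURE", "GLOBALPTR", "TLS", "LOAD_CONFIG",
    "BOUND_IMPORT", "IAT", "DELAY_IMPORT", "COM_DESCRIPTOR", "RESERVED",
]

def parse_data_directories(opt: bytes) -> dict:
    """Return {name: (VirtualAddress, Size)} for every non-empty entry."""
    (magic,) = struct.unpack_from("<H", opt, 0)
    if magic == 0x10B:
        base = 96    # PE32: DataDirectory starts at offset 96
    elif magic == 0x20B:
        base = 112   # PE32+: DataDirectory starts at offset 112
    else:
        raise ValueError(f"unexpected Magic {magic:#x}")
    # NumberOfRvaAndSizes is the DWORD just before the array.
    (count,) = struct.unpack_from("<I", opt, base - 4)
    entries = {}
    for i in range(min(count, 16)):
        # Each entry is two DWORDs: VirtualAddress (an RVA) and Size.
        rva, size = struct.unpack_from("<II", opt, base + 8 * i)
        if rva or size:
            entries[DIRECTORY_NAMES[i]] = (rva, size)
    return entries

# Hypothetical 64-bit optional header with only the import entry filled in.
sample = bytearray(240)
struct.pack_into("<H", sample, 0, 0x20B)
struct.pack_into("<I", sample, 108, 16)
struct.pack_into("<II", sample, 112 + 8 * 1, 0x2000, 0x50)
directories = parse_data_directories(bytes(sample))
print(directories)
```

Entries with both fields set to zero are skipped, since that is how an unused slot appears.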
As mentioned earlier, a PE file consists of sections, which are a way to organize data by kind. For example, code is placed in the .text section, read-only data goes to the .rdata section, global data goes to the .data section, etc. Below is the structure that describes each such section. Notice that it has a union embedded in it
Important fields we care about in this are:
- Name: Name of the section, stored in an 8-byte array of ASCII characters.
- VirtualSize: This is referred to as Misc.VirtualSize since it sits inside a union. It tells us the size of this section in memory.
- VirtualAddress: This is the RVA of the section, relative to OptionalHeader.ImageBase (remember the ImageBase field in the optional header explained earlier). This is the offset in memory.
- SizeOfRawData: This gives the size of the raw data on disk, whose beginning is pointed to by PointerToRawData.
- PointerToRawData: This is the offset of the section's data from the beginning of the file. Remember, this is the offset on disk.
- Characteristics: This tells us whether the section is readable, writable, or executable. For example, .rdata has only the READ flag set by default. It also tells us whether the section contains initialized data. The IMAGE_SCN_MEM_NOT_CACHED flag tells us that the section cannot be cached, and the IMAGE_SCN_MEM_NOT_PAGED flag tells us that the section cannot be paged out.
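The sketch below decodes one 40-byte section header and turns an RVA into a file offset using VirtualAddress and PointerToRawData. The function names, dictionary keys, and the sample .rdata header are assumptions for illustration; the flag values are the documented IMAGE_SCN_* constants with the prefix dropped. The RVA conversion is deliberately simple and only accepts addresses backed by raw data on disk.

```python
import struct

SECTION_FLAGS = {
    0x00000020: "CNT_CODE",
    0x00000040: "CNT_INITIALIZED_DATA",
    0x00000080: "CNT_UNINITIALIZED_DATA",
    0x04000000: "MEM_NOT_CACHED",
    0x08000000: "MEM_NOT_PAGED",
    0x20000000: "MEM_EXECUTE",
    0x40000000: "MEM_READ",
    0x80000000: "MEM_WRITE",
}

def parse_section_header(data: bytes, offset: int = 0) -> dict:
    """Decode one 40-byte IMAGE_SECTION_HEADER starting at offset."""
    (name, virtual_size, virtual_address, raw_size, raw_ptr,
     _reloc_ptr, _line_ptr, _reloc_count, _line_count,
     characteristics) = struct.unpack_from("<8sIIIIIIHHI", data, offset)
    return {
        # Name is padded with null bytes up to 8 characters.
        "name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
        "virtual_size": virtual_size,
        "virtual_address": virtual_address,
        "size_of_raw_data": raw_size,
        "pointer_to_raw_data": raw_ptr,
        "flags": [label for bit, label in SECTION_FLAGS.items()
                  if characteristics & bit],
    }

def rva_to_file_offset(section: dict, rva: int) -> int:
    """Translate an in-memory RVA to an on-disk offset within one section."""
    delta = rva - section["virtual_address"]
    if not 0 <= delta < section["size_of_raw_data"]:
        raise ValueError("RVA is not backed by this section's raw data")
    return section["pointer_to_raw_data"] + delta

# Hypothetical .rdata header built by hand for illustration.
sample = struct.pack("<8sIIIIIIHHI", b".rdata",
                     0x1A0, 0x2000, 0x200, 0x800, 0, 0, 0, 0, 0x40000040)
section = parse_section_header(sample)
print(section["name"], section["flags"])
print(hex(rva_to_file_offset(section, 0x2010)))  # 0x810
```

The same subtraction and addition is what tools do behind the scenes when they show you the file offset for an RVA.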
Important Note: If you examine files, you will sometimes see VirtualSize > SizeOfRawData. What? How is that possible? Code often contains uninitialized variables, which take up no space on disk but are mapped into the .bss section in memory. This .bss section gets merged into other sections such as .data, which increases Misc.VirtualSize. To add to the confusion, you will sometimes also see VirtualSize < SizeOfRawData. Remember the FileAlignment field we discussed earlier: raw data on disk is padded up to a multiple of FileAlignment (for example 0x200), so SizeOfRawData includes padding that VirtualSize does not count.
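A little arithmetic shows both cases. The numbers below are made up for illustration, and `align_up` is a hypothetical helper that assumes the alignment is a power of two, as FileAlignment is required to be.

```python
FILE_ALIGNMENT = 0x200

def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)

# Case 1: a code section with 0x1234 bytes of actual content.
text_virtual_size = 0x1234
text_raw_size = align_up(0x1234, FILE_ALIGNMENT)  # padded on disk
print(hex(text_virtual_size), hex(text_raw_size))  # 0x1234 0x1400

# Case 2: a data section with 0x100 bytes of initialized data plus
# 0x1000 bytes of uninitialized (.bss-style) data merged into it.
data_virtual_size = 0x100 + 0x1000
data_raw_size = align_up(0x100, FILE_ALIGNMENT)  # only initialized bytes are stored
print(hex(data_virtual_size), hex(data_raw_size))  # 0x1100 0x200
```

In the first case padding makes the on-disk size larger; in the second the uninitialized data makes the in-memory size larger.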
Also note that to find where the section headers start, add SizeOfOptionalHeader (from the FileHeader) to the starting offset of the OptionalHeader.
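Spelled out as code, the calculation looks like the sketch below. The function names and example values are assumptions; the sizes are the fixed ones from the format (4-byte signature, 20-byte file header, 40-byte section header).

```python
SIGNATURE_SIZE = 4        # "PE\0\0"
FILE_HEADER_SIZE = 20     # IMAGE_FILE_HEADER
SECTION_HEADER_SIZE = 40  # IMAGE_SECTION_HEADER

def section_table_offset(e_lfanew: int, size_of_optional_header: int) -> int:
    """File offset of the first section header."""
    optional_header_offset = e_lfanew + SIGNATURE_SIZE + FILE_HEADER_SIZE
    return optional_header_offset + size_of_optional_header

def section_header_offsets(e_lfanew: int, size_of_optional_header: int,
                           number_of_sections: int) -> list:
    """File offsets of every section header, which are laid out back to back."""
    start = section_table_offset(e_lfanew, size_of_optional_header)
    return [start + i * SECTION_HEADER_SIZE for i in range(number_of_sections)]

# Hypothetical values: PE header at 0x80, a 0xF0-byte optional header, 3 sections.
offsets = section_header_offsets(0x80, 0xF0, 3)
print([hex(o) for o in offsets])  # ['0x188', '0x1b0', '0x1d8']
```

Using SizeOfOptionalHeader instead of a hard-coded size keeps this correct for both 32-bit and 64-bit binaries.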
Here is a list of Common Section Names:
- .text: This is the main section and contains the actual machine instructions that make up the program. In kernel drivers, it is not paged out of memory by default.
- .data: This section holds read/write data, for example globals.
- .rdata: This contains read-only data, for example string constants like “Hello World”.
- .bss: This section contains uninitialized data. It is usually merged into the .data section, which makes VirtualSize > SizeOfRawData.
- .idata: It contains the import information that the binary needs in order to run. Very useful when analyzing malware. More on this later.
- .edata: It contains information about what the binary exports. More on this later.
So we have covered a good portion of the PE format, as you can see below
In the next article, we will look at the remaining sections.