Presenting the PE Header

May 8, 2013 by Dejan Lukan

Let's present the whole PE file structure with the picture below (taken from [5]):

The file begins with a DOS header (IMAGE_DOS_HEADER), a 64-byte structure that starts with the "MZ" signature and makes the file look like an MS-DOS executable. It is followed by a small 16-bit DOS stub program whose only job is to print an error message such as "This program cannot be run in DOS mode." if we try to run the program on a DOS system. The last field of the DOS header, e_lfanew at offset 0x3C, holds the file offset of the PE header that follows the stub.
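
As a rough sketch, the DOS header can be located and followed in Python. The layout assumed here (the "MZ" magic at offset 0, the 4-byte e_lfanew field at offset 0x3C, and the "PE\0\0" signature it points to) comes from the winnt.h definition of IMAGE_DOS_HEADER; find_pe_header is a hypothetical helper name, not part of any library.

```python
import struct

def find_pe_header(data: bytes) -> int:
    """Return the file offset of the PE signature, or raise ValueError."""
    # IMAGE_DOS_HEADER is 64 bytes long and starts with the "MZ" magic.
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew is the last field of the DOS header (offset 0x3C); it holds
    # the file offset of the "PE\0\0" signature that follows the DOS stub.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return e_lfanew
```

For a real file, the function would be called on the file's contents, for example find_pe_header(open(path, "rb").read()), where path is whatever executable we want to inspect.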

Then there's the PE file header, which follows the 4-byte "PE\0\0" signature. It is the structure IMAGE_FILE_HEADER and has the following members:

  • Machine [16 bits]: indicates the system the binary is intended to run on
  • NumberOfSections [16 bits]: number of sections that follow the headers
  • TimeDateStamp [32 bits]: the time the file was created
  • PointerToSymbolTable [32 bits]: used for debugging (usually 0)
  • NumberOfSymbols [32 bits]: used for debugging (usually 0)
  • SizeOfOptionalHeader [16 bits]: the size of the optional header (IMAGE_OPTIONAL_HEADER) in bytes
  • Characteristics [16 bits]: a collection of flags:
    • IMAGE_FILE_RELOCS_STRIPPED: set if there is no relocation information in the file (in sections themselves)
    • IMAGE_FILE_EXECUTABLE_IMAGE: set if the file is an executable image (it is not an object file or a library)
    • IMAGE_FILE_LINE_NUMS_STRIPPED: set if the line number information is stripped – not used for executable files
    • IMAGE_FILE_LOCAL_SYMS_STRIPPED: set if there is no information about local symbols in the file – not used for executable files
    • IMAGE_FILE_AGGRESIVE_WS_TRIM: set if the OS is supposed to trim the working set of the running process (the amount of memory the process uses) aggressively by paging it out
    • IMAGE_FILE_BYTES_REVERSED_LO and IMAGE_FILE_BYTES_REVERSED_HI: set if the endianness of the file is not what the machine would expect, so bytes must be swapped before reading
    • IMAGE_FILE_32BIT_MACHINE: set if the machine is expected to be a 32-bit machine
    • IMAGE_FILE_DEBUG_STRIPPED: set if there is no debugging information
    • IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP: set if the application may not run from a removable medium such as a floppy disk or a CD-ROM (in this case, the OS is advised to copy the file to the swapfile and execute it from there)
    • IMAGE_FILE_NET_RUN_FROM_SWAP: set if the application may not run from the network (in this case, the OS is advised to copy the file to the swapfile and execute it from there)
    • IMAGE_FILE_SYSTEM: set if the file is a system file such as a driver
    • IMAGE_FILE_DLL: set if the file is a DLL, otherwise it's an EXE
    • IMAGE_FILE_UP_SYSTEM_ONLY: set if the file is not designed to run on multiprocessor systems
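
The member list above maps directly onto a 20-byte little-endian structure, so it can be unpacked with Python's struct module. The following is a minimal sketch: the helper names parse_file_header and decode_characteristics are made up for illustration, the flag values are the ones defined in winnt.h, and offset is assumed to point at IMAGE_FILE_HEADER itself (4 bytes past the start of the PE signature).

```python
import struct
from collections import namedtuple

FileHeader = namedtuple(
    "FileHeader",
    "Machine NumberOfSections TimeDateStamp PointerToSymbolTable "
    "NumberOfSymbols SizeOfOptionalHeader Characteristics",
)

# Flag values as defined in winnt.h (only the flags discussed in the text).
CHARACTERISTICS = {
    0x0001: "IMAGE_FILE_RELOCS_STRIPPED",
    0x0002: "IMAGE_FILE_EXECUTABLE_IMAGE",
    0x0004: "IMAGE_FILE_LINE_NUMS_STRIPPED",
    0x0008: "IMAGE_FILE_LOCAL_SYMS_STRIPPED",
    0x0010: "IMAGE_FILE_AGGRESIVE_WS_TRIM",
    0x0080: "IMAGE_FILE_BYTES_REVERSED_LO",
    0x0100: "IMAGE_FILE_32BIT_MACHINE",
    0x0200: "IMAGE_FILE_DEBUG_STRIPPED",
    0x0400: "IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP",
    0x0800: "IMAGE_FILE_NET_RUN_FROM_SWAP",
    0x1000: "IMAGE_FILE_SYSTEM",
    0x2000: "IMAGE_FILE_DLL",
    0x4000: "IMAGE_FILE_UP_SYSTEM_ONLY",
    0x8000: "IMAGE_FILE_BYTES_REVERSED_HI",
}

def parse_file_header(data: bytes, offset: int) -> FileHeader:
    """Unpack the 20-byte IMAGE_FILE_HEADER found at the given offset."""
    # H = 16-bit field, I = 32-bit field, "<" = little-endian, no padding.
    return FileHeader._make(struct.unpack_from("<HHIIIHH", data, offset))

def decode_characteristics(value: int) -> list:
    """Return the names of all known flags set in Characteristics."""
    return [name for bit, name in CHARACTERISTICS.items() if value & bit]
```

Given a header hdr returned by parse_file_header, decode_characteristics(hdr.Characteristics) lists the flag names that are set, which is usually easier to read than the raw 16-bit value.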

Many of the fields in the PE headers that refer to other locations in the image do so with an RVA, a relative virtual address: an offset from the address at which the image is loaded. This is useful because we don't have to know the exact address of an item in memory, only its offset within the current executable/library.
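
To make the idea concrete, here is a small sketch of the arithmetic involved. The function names and the example addresses are illustrative assumptions, not values taken from a particular binary.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Convert an RVA to a virtual address for an image loaded at image_base."""
    return image_base + rva

def va_to_rva(image_base: int, va: int) -> int:
    """Convert a virtual address inside a loaded image back to an RVA."""
    if va < image_base:
        raise ValueError("address lies below the image base")
    return va - image_base

# The same RVA resolves to different virtual addresses depending on
# where the loader placed the image.
entry_rva = 0x1000
at_preferred_base = rva_to_va(0x00400000, entry_rva)   # 0x00401000
at_relocated_base = rva_to_va(0x10000000, entry_rva)   # 0x10001000
```

The RVA stays the same in both cases; only the base address chosen by the loader changes.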

Let's now take a look at the optional header, which contains the following elements:

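  • Magic [16 bits]: identifies the optional header format: 0x010b for a 32-bit (PE32) image, which is the layout described here, and 0x020b for a 64-bit (PE32+) image
  • MajorLinkerVersion [8 bits]: set by linker
  • MinorLinkerVersion [8 bits]: set by linker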
  • SizeOfCode [32 bits]: size of executable code
  • SizeOfInitializedData [32 bits]: size of initialized data
  • SizeOfUninitializedData [32 bits]: size of the uninitialized data
  • AddressOfEntryPoint [32 bits]: an RVA giving the entry point, where execution starts (the address of a DLL's LibMain or a program's startup code)
  • BaseOfCode [32 bits]: RVA of the start of the executable code
  • BaseOfData [32 bits]: RVA of the start of the initialized data
  • ImageBase [32 bits]: preferred linear load address of the entire binary, including all the headers. This is the address (always a multiple of 64 KB) the file has been relocated to by the linker; if the binary can in fact be loaded to this address, the loader doesn't need to relocate the file again. The preferred load address cannot be used if another image has already been loaded to that address (which can happen quite often if a linker's default address is used). In this case, the image must be loaded to some other address and it needs to be relocated.
  • SectionAlignment [32 bits]: alignment of the PE file's sections in memory
  • FileAlignment [32 bits]: alignment of the PE file's sections in the file
  • MajorOperatingSystemVersion [16 bits]: major version
  • MinorOperatingSystemVersion [16 bits]: minor version
  • MajorImageVersion [16 bits]: binary major version
  • MinorImageVersion [16 bits]: binary minor version. Many linkers don't set this information correctly and many programmers don't bother to supply it, so it's better to rely on the version resource if one exists.
  • MajorSubsystemVersion [16 bits]: major subsystem version
  • MinorSubsystemVersion [16 bits]: minor subsystem version. This should be supplied correctly, because it is checked and used.
  • Win32VersionValue [32 bits]: unknown (usually 0)
  • SizeOfImage [32 bits]: the amount of memory the image will need in bytes. It is the sum of all headers and section lengths if aligned to SectionAlignment. It is a hint to the loader how many pages it will need in order to load the image.
  • SizeOfHeaders [32 bits]: the length of all headers including the data directories and the section headers. It is at the same time the offset from the beginning of the file to the first section's raw data.
  • CheckSum [32 bits]: a checksum that is only verified if the image is an NT driver, which will fail to load if the checksum isn't correct. For other binary types, the checksum isn't used and may be 0.
  • Subsystem [16 bits]: tells you in which of the NT-subsystems the image runs:
    • IMAGE_SUBSYSTEM_NATIVE: the image doesn't need a subsystem (drivers)
    • IMAGE_SUBSYSTEM_WINDOWS_GUI: the image is a Win32 graphical binary
    • IMAGE_SUBSYSTEM_WINDOWS_CUI: the image is a Win32 console binary
    • IMAGE_SUBSYSTEM_OS2_CUI: the image is an OS/2 console binary (OS/2 binaries will be in OS/2 format, so this is rarely used)
    • IMAGE_SUBSYSTEM_POSIX_CUI: the image is a POSIX console binary
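  • DllCharacteristics [16 bits]: originally told the loader when to call a DLL's entry point; on current Windows versions it holds flags such as IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (the image may be relocated by ASLR) and IMAGE_DLLCHARACTERISTICS_NX_COMPAT (the image is compatible with DEP)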
  • SizeOfStackReserve [32 bits]: size of reserved stack
  • SizeOfStackCommit [32 bits]: size of initially committed stack
  • SizeOfHeapReserve [32 bits]: size of reserved heap
  • SizeOfHeapCommit [32 bits]: size of initially committed heap. The reserved amounts are address space (not real RAM) that is set aside for a specific purpose. At program startup, the committed amount is actually allocated in RAM.
  • LoaderFlags [32 bits]: unknown (usually 0)
  • NumberOfRvaAndSizes [32 bits]: number of valid entries in the directories that follow immediately (unreliable - rather use the constant IMAGE_NUMBEROF_DIRECTORY_ENTRIES)
  • DataDirectory: an array of additional data structures stored inside the PE header. Each entry is a directory that gives the location and size of the data it describes. The directories include the following [20]:
    • Export Table: Lists the names and RVAs of all exported functions in the current module.
    • Import Table: Lists the names of modules and functions that are imported by the current module. For each function, the list contains a name string (or an ordinal) and the RVA that points to the current function's import address table entry. This is the entry that receives the actual pointer to the imported function at runtime, when the module is loaded.
    • Resource Table: Points to the executable's resource directory, which is a static definition of various user-interface elements such as strings, dialog box layouts and menus.
    • Base Relocation Table: Contains a list of addresses within the module that must be recalculated in case the module gets loaded in any address other than the one it was built for.
    • Debugging Information: Contains debugging information for the executable. This is usually presented in the form of a link to an external symbol file that contains the actual debugging information.
    • Thread Local Storage Table: Points to a special thread-local section in the executable that can contain thread-local variables. This is managed by the loader when the executable is loaded.
    • Load Configuration Table: Contains a variety of image configuration elements, such as a special lock prefix table, which can modify an image in load time to accommodate for uniprocessor or multiprocessor systems. This table also contains information for a special security feature that lists the legitimate exception handlers in the module (to prevent malicious code from installing an illegal exception handler).
    • Bound Import Table: Contains an additional import-related table that contains information on bound import entries. A bound import means that the importing executable contains actual addresses into the exporting module. This directory is used for confirming that such addresses are still valid.
    • Import Address Table (IAT): Contains a list of entries for each function imported from modules. These entries are initialized at load time and hold the names of the functions as well as actual addresses to them.
    • Delay Import Table: Contains special information that can be used for implementing a delayed-load importing mechanism whereby an imported function is only resolved when it is first called. This mechanism is not supported by the OS and is implemented in the C runtime library.
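
The optional header fields listed above can be unpacked in the same way as the file header. The sketch below assumes the 32-bit IMAGE_OPTIONAL_HEADER32 layout from winnt.h (Magic 0x010b, 8-bit linker version fields, 96 bytes of fixed fields followed by up to 16 data directory entries of 8 bytes each). The function name parse_optional_header32 is hypothetical, and capping the directory count at 16 is a design choice made here rather than documented loader behaviour.

```python
import struct

# H = 16 bits, B = 8 bits, I = 32 bits; "<" = little-endian, no padding.
OPTIONAL_HEADER32_FORMAT = "<HBB9I6H4I2H6I"   # 96 bytes of fixed fields

OPTIONAL_HEADER32_FIELDS = (
    "Magic", "MajorLinkerVersion", "MinorLinkerVersion",
    "SizeOfCode", "SizeOfInitializedData", "SizeOfUninitializedData",
    "AddressOfEntryPoint", "BaseOfCode", "BaseOfData", "ImageBase",
    "SectionAlignment", "FileAlignment",
    "MajorOperatingSystemVersion", "MinorOperatingSystemVersion",
    "MajorImageVersion", "MinorImageVersion",
    "MajorSubsystemVersion", "MinorSubsystemVersion",
    "Win32VersionValue", "SizeOfImage", "SizeOfHeaders", "CheckSum",
    "Subsystem", "DllCharacteristics",
    "SizeOfStackReserve", "SizeOfStackCommit",
    "SizeOfHeapReserve", "SizeOfHeapCommit",
    "LoaderFlags", "NumberOfRvaAndSizes",
)

IMAGE_NUMBEROF_DIRECTORY_ENTRIES = 16

def parse_optional_header32(data: bytes, offset: int = 0) -> dict:
    """Unpack a PE32 optional header and its data directories."""
    values = struct.unpack_from(OPTIONAL_HEADER32_FORMAT, data, offset)
    header = dict(zip(OPTIONAL_HEADER32_FIELDS, values))
    if header["Magic"] != 0x10B:
        raise ValueError("not a PE32 optional header")
    # Never read more than the fixed number of directory slots.
    count = min(header["NumberOfRvaAndSizes"], IMAGE_NUMBEROF_DIRECTORY_ENTRIES)
    fixed_size = struct.calcsize(OPTIONAL_HEADER32_FORMAT)
    # Each directory entry is a pair (VirtualAddress, Size) of 32-bit values.
    header["DataDirectory"] = [
        struct.unpack_from("<II", data, offset + fixed_size + 8 * i)
        for i in range(count)
    ]
    return header
```

In a real file, offset would be the position right after IMAGE_FILE_HEADER, that is, the PE signature offset plus 4 bytes for the signature plus 20 bytes for the file header.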

We didn't actually list all of the data directories here. All of them are defined in the winnt.h header file and are shown in the picture below:
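
For reference, the directory indices can also be written down as a small lookup table. The names below are the IMAGE_DIRECTORY_ENTRY_* constants from winnt.h in index order (index 15 is reserved and has no constant); the helper present_directories is a hypothetical name, and it assumes each entry is a (VirtualAddress, Size) pair.

```python
# Index in the DataDirectory array -> winnt.h constant name.
DATA_DIRECTORY_NAMES = [
    "IMAGE_DIRECTORY_ENTRY_EXPORT",          # 0
    "IMAGE_DIRECTORY_ENTRY_IMPORT",          # 1
    "IMAGE_DIRECTORY_ENTRY_RESOURCE",        # 2
    "IMAGE_DIRECTORY_ENTRY_EXCEPTION",       # 3
    "IMAGE_DIRECTORY_ENTRY_SECURITY",        # 4
    "IMAGE_DIRECTORY_ENTRY_BASERELOC",       # 5
    "IMAGE_DIRECTORY_ENTRY_DEBUG",           # 6
    "IMAGE_DIRECTORY_ENTRY_ARCHITECTURE",    # 7
    "IMAGE_DIRECTORY_ENTRY_GLOBALPTR",       # 8
    "IMAGE_DIRECTORY_ENTRY_TLS",             # 9
    "IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG",     # 10
    "IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT",    # 11
    "IMAGE_DIRECTORY_ENTRY_IAT",             # 12
    "IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT",    # 13
    "IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR",  # 14
]

def present_directories(entries):
    """Map directory names to (VirtualAddress, Size) for non-empty entries."""
    return {
        name: entry
        for name, entry in zip(DATA_DIRECTORY_NAMES, entries)
        if entry != (0, 0)
    }
```

An entry whose address and size are both zero means the image has no such directory, so the helper simply leaves it out of the result.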

After that, there are also various sections like .data and .text that are an important part of the executable, because they hold the data of the program and the instructions that will be executed once the executable is loaded into memory. There are also a lot of other structures, but we will not look at them in this article.
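
The SizeOfImage description given earlier (the sum of the headers and section lengths, each aligned to SectionAlignment) can be sketched as a rounding calculation. This is only an illustration under the assumption that the sections are laid out back to back in memory; the function names and the example sizes are invented for the purpose.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

def size_of_image(size_of_headers, section_sizes, section_alignment):
    """Sum the headers and sections, each rounded up to SectionAlignment."""
    total = align_up(size_of_headers, section_alignment)
    for size in section_sizes:
        total += align_up(size, section_alignment)
    return total

# Hypothetical image: 0x400 bytes of headers, a 0x1234-byte .text section
# and a 0x200-byte .data section, with a SectionAlignment of 0x1000.
example = size_of_image(0x400, [0x1234, 0x200], 0x1000)
```

With these numbers the headers occupy one page, .text occupies two and .data occupies one, so the image needs four pages of 0x1000 bytes.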

Conclusion

We've looked at the various fields of the PE file header. At the end, we determined that data directories are an important part of the executable/library, because they contain useful information like the RVAs of imported/exported functions, resources, debugging information, etc. After the data directories there are also different sections that comprise the executable: the .idata, .data, .text and other sections. The .data section holds the program's data, while the .text section holds the instructions that will be executed when the executable is loaded in memory and started.

References:

[1] IMAGE_OPTIONAL_HEADER structure, https://msdn.microsoft.com/en-us/library/windows/desktop/ms680339(v=vs.85).aspx

[2] Major / MinorSubsystemVersion, http://waleedassar.blogspot.com/2012/08/major-minorsubsystemversion.html

[3] Reverse Engineering III: PE Format, Gergely Erdélyi – Senior Manager, Anti-malware Research, http://www.cse.tkk.fi/fi/opinnot/T-110.6220/2010_Spring_Malware_Analysis_and_Antivirus_Tchnologies/luennot-files/Erdelyi-Reverse_engineering_2.pdf

Dejan Lukan

Dejan Lukan is a security researcher for InfoSec Institute and a penetration tester from Slovenia. He is very interested in finding new bugs in real-world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security-related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in a couple of dozen of them. His other passions are antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog, available here: http://www.proteansec.com/.