Reverse engineering tools

April 5, 2018 by Dejan Lukan

First, we'll describe how source code is compiled and assembled into an executable file, since understanding this process is essential when reverse engineering. All source code must eventually be translated into machine code that the processor can execute. This can happen at compile time, as with C, or at runtime, which is typical for languages that use intermediate bytecode, such as Java.

Let's look at an example: a simple "Hello World" program in C.

[plain]
#include <stdio.h>

int main(void) {
    /* Print the greeting followed by a newline. */
    printf("Hello World!\n");
    return 0;
}
[/plain]

The program prints "Hello World!" and exits. Next, we compile and run it:

[plain]
# gcc main.c -o main
# ./main
Hello World!
[/plain]

We can see that it indeed printed "Hello World!", but what interests us here is the compilation process. To create an executable file from a C program, the toolchain goes through four phases:

a. Preprocessing:

The preprocessor handles include files, conditional compilation directives and macros.

b. Compilation:

The compiler takes the preprocessed source code and generates assembly source code (C code is converted to assembly).

c. Assembly:

The assembler takes the assembly source code, translates it into machine code that uses offsets in place of final addresses, and stores the result in an object file (assembly is converted to binary).

d. Linking:

The linker takes one or more object files or libraries as input and combines them into a single executable file. It resolves references to external symbols, assigns final addresses to procedures/functions and variables, and revises the code and data sections to reflect the new addresses (binary is converted into an executable).

The linker can resolve library calls in two ways: static linking and dynamic linking. When the executable is statically linked, a copy of the referenced library code is included in the resulting executable, which makes the executable quite large. With dynamic linking, the external library calls are only referenced and the libraries are not copied into the resulting executable. The executables are therefore much smaller, but they require the libraries to be present on the system: when the program is run, all the referenced libraries must be located and loaded into memory.
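As a rough illustration of the four phases, the variant of the hello world source below marks which construct is handled at which point. The file name phases.c, the macros GREETING and SHOUT, and the functions greeting() and say_hello() are made up for this sketch. The gcc options -E, -S and -c stop the driver after preprocessing, compilation and assembly respectively.

```c
/* phases.c: a sketch of what each build phase deals with.
 *
 *   gcc -E phases.c -o phases.i    (preprocessing only)
 *   gcc -S phases.i -o phases.s    (compilation to assembly)
 *   gcc -c phases.s -o phases.o    (assembly to an object file)
 *
 * The final link step additionally needs a file that defines main().
 */
#include <stdio.h>               /* preprocessing: the header text is pasted in */

#define GREETING "Hello World!"  /* preprocessing: the macro is expanded */

/* Returns the greeting that was selected at preprocessing time. */
const char *greeting(void)
{
#ifdef SHOUT                     /* preprocessing: conditional compilation */
    return GREETING "!!";
#else
    return GREETING;
#endif
}

/* Compilation turns this into a call instruction that references the
 * symbol puts; its final address is only known after linking. */
int say_hello(void)
{
    return puts(greeting());
}
```

Comparing phases.i, phases.s and phases.o after each command shows how the same function moves from expanded C text to assembly to binary.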

The whole process is shown in the picture below:

We already presented the source code above. The assembly code looks like the following:

[plain]
    .file "main.c"
    .section .rodata
.LC0:
    .string "Hello World!"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    movq %rsp, %rbp
    .cfi_offset 6, -16
    .cfi_def_cfa_register 6
    movl $.LC0, %edi
    call puts
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4"
    .section .note.GNU-stack,"",@progbits
[/plain]

The object file is already a binary file, but not a program that we could run yet. It contains the following sections:

  • .text: Contains the executable instructions and is shared among all processes running the same binary. This section usually has rx permissions only.
  • .bss: Holds uninitialized global and static variables. In the object file, the .bss section occupies no space; only its size is recorded, and the space is allocated at runtime.
  • .data: Contains initialized global and static variables and their values. It can take up a sizable part of the executable and has rw permissions.
  • .rodata (.rdata in PE files): Contains constants and string literals and usually has only r permission.
  • Symbol table: A symbol is a name and an address. A symbol table holds information needed to locate and relocate a program's symbolic definitions and references.
  • Relocation table: Relocation is a process of connecting symbolic references with symbolic definitions. When a program calls a function, the associated call instruction must transfer control to the proper destination address at execution.
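The sections listed above can be made concrete with a small source file. This is only a sketch: all names are hypothetical, and the placement in the comments is what GCC typically does for ELF targets, not something the C language guarantees.

```c
/* sections.c: where a typical ELF toolchain places each object. */

int initialized_counter = 42;          /* .data: the initial value is stored in the file */
int zeroed_counter;                    /* .bss: only the size is recorded */
static int private_total;              /* .bss as well, but with a local symbol */
const char banner[] = "Hello World!";  /* .rodata (.rdata in PE files) */

/* The machine code of this function ends up in .text. */
int bump(int step)
{
    private_total += step;
    zeroed_counter++;
    return initialized_counter + private_total;
}
```

Building the file with gcc -c and inspecting the result with objdump -h should show how many bytes each section received.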

Symbols are generally carried over from the object file into the executable, unless we tell the toolchain to strip them (for example with gcc's -s option or with the strip command).

Tools

2.1. Debuggers

The debugger is the most important tool when reverse engineering an executable. There are various debuggers we can choose from; some of the most widely used are the following:

  • IDA Pro (primarily a disassembler, with an integrated debugger)
  • OllyDbg
  • gdb
  • Immunity Debugger
  • WinDbg

2.2. Assemblers

The assembler is essentially as important as the debugger. A popular assembler is NASM.

2.3. PE Tools

PE Tools provides a handful of useful utilities for working with Windows PE executables. The picture below is a basic PE Tools view and shows running processes and the loaded modules of each process.

There are various options we can choose from if we right-click on the process. The picture below shows all the available options:

We can dump the process's memory image to a file, as a full, partial or region dump. There's also a "PE Sniffer" option, which we can use to determine the compiler, and the compiler options, used to build the executable.

2.4. PEiD

The PEiD tool is used to determine whether a packer or obfuscator was applied to the executable file. An open source packer that is often used is UPX.

In the picture below we can see that the program IECollection1721.exe was probably written in the Delphi programming language.

2.5. Sysinternals

Sysinternals is a suite of Windows system utilities that contains various useful programs:

  • TCPView can show us a detailed listing of all TCP/UDP sockets. It can also report the name of the program that uses the displayed socket. An example image is presented below:

  • Process Explorer can show us detailed information about the opened and loaded programs. An example image is presented below:

File

The file command identifies a file's type by looking at specific fields of the file format. In many cases it searches for characteristic byte sequences (magic numbers) that correspond to a particular file format.

An example can be seen below:

[plain]
# file temp.exe
temp.exe: PE32 executable (GUI) Intel 80386, for MS Windows, PECompact2 compressed
[/plain]
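The idea of recognizing a format by fixed byte sequences can be sketched in a few lines of C. The function identify_format() below is a hypothetical helper, not part of the file utility, which relies on a much larger database of rules. It only distinguishes ELF files from DOS/PE files and does not try to detect packers such as PECompact.

```c
#include <stddef.h>
#include <string.h>

/* Returns a short description of the format, judged only from the
 * bytes in buf. A simplified sketch, not a replacement for file. */
const char *identify_format(const unsigned char *buf, size_t len)
{
    /* ELF files start with the bytes 0x7f 'E' 'L' 'F'. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "ELF";

    /* DOS/PE files start with "MZ"; the 32-bit little-endian value at
     * offset 0x3c holds the offset of the "PE\0\0" signature. */
    if (len >= 0x40 && buf[0] == 'M' && buf[1] == 'Z') {
        size_t pe = (size_t)buf[0x3c] |
                    ((size_t)buf[0x3d] << 8) |
                    ((size_t)buf[0x3e] << 16) |
                    ((size_t)buf[0x3f] << 24);

        /* Make sure the four signature bytes lie inside the buffer. */
        if (pe <= len - 4 && memcmp(buf + pe, "PE\0\0", 4) == 0)
            return "PE";
        return "MZ (DOS executable)";
    }

    return "unknown";
}
```

A caller would read the first few hundred bytes of a file into a buffer and pass them in, together with the number of bytes actually read.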

Strip

The strip command removes symbols from the binary file; by default, GNU strip removes all symbols, including the debugging symbols. The resulting binary has the same functionality, but the file is smaller because the symbols are gone.

[plain]# strip temp.exe[/plain]

2.6. nm

The nm command will output the names of any functions and global variables from the object file. If we run the nm command on the above hello world object file, we get the following:

[plain]
# nm main.o
0000000000000000 T main
                 U puts
[/plain]

The object file contains two symbols: main, the main function of the program, and puts. But why is there a puts symbol instead of printf, which we used in the program? The compiler replaced the printf call with the cheaper puts as an optimization, which it can do because the string contains no format specifiers and ends with a newline. What are the one-letter codes beside the symbol names? The letter T identifies a symbol defined in the text section, which is usually a function. The letter U marks an undefined symbol, which usually lives in an external library and must still be linked with the current object file to make an executable.
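To see a few more symbol types, a slightly larger object file helps. The sketch below uses hypothetical function names. The letters in the comments are what nm usually reports for an object built with gcc -c, and they can change with optimization settings.

```c
#include <stdio.h>

/* External linkage and defined here: nm usually prints "T greet". */
int greet(void)
{
    /* puts is defined in libc, so the object file lists it as "U puts". */
    return puts("Hello World!");
}

/* Internal linkage: shown with a lowercase "t", or not at all if the
 * compiler decides to inline the function. */
static int twice(int x)
{
    return 2 * x;
}

/* Another global text symbol ("T compute"). */
int compute(int x)
{
    return twice(x) + 1;
}
```

Uppercase letters mark global symbols and lowercase letters mark local ones, which is why only the static function appears as "t".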

2.7. ldd

If the program is dynamically linked, the ldd program will tell us what libraries it depends upon. Let's take a look at the program ls (list directory contents) and see what libraries it depends upon:

[plain]
# ldd /bin/ls
linux-vdso.so.1 (0x00007fff55b38000)
librt.so.1 => /lib64/librt.so.1 (0x00007f366fef5000)
libacl.so.1 => /lib64/libacl.so.1 (0x00007f366fcec000)
libc.so.6 => /lib64/libc.so.6 (0x00007f366f942000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f366f725000)
/lib64/ld-linux-x86-64.so.2 (0x00007f36700fe000)
libattr.so.1 => /lib64/libattr.so.1 (0x00007f366f520000)
[/plain]

On other platforms, similar information is available from otool (macOS) and dumpbin (Windows):

  • otool -L ./main (macOS, for Mach-O binaries)
  • dumpbin /dependents temp.exe (Windows)

2.8. Objdump and Readelf

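The objdump and readelf programs display information from object files. For example, objdump -d disassembles the code sections, readelf -h prints the ELF header and readelf -S lists the section headers.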

2.9. Strings

We can search for all printable strings in a file with the strings command. If we run strings on the hello world executable, the output also includes the "Hello World!" string:

[plain]
# strings ./main
/lib64/ld-linux-x86-64.so.2
__gmon_start__
libc.so.6
puts
__libc_start_main
GLIBC_2.2.5
fff.
=
l$ L
t$(L
|$0H
Hello World!
;*3$"
[/plain]
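The core of what strings does can be approximated with a short scan for runs of printable bytes. The function print_strings() below is a hypothetical, simplified sketch: it works on a buffer in memory and uses isprint() as its only test, whereas the real tool reads files and has more options. GNU strings uses a minimum run length of 4 by default, which is why short fragments such as "fff." still show up.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Prints every run of at least min_len printable characters found in
 * buf and returns the number of runs printed. */
size_t print_strings(const unsigned char *buf, size_t len, size_t min_len)
{
    size_t count = 0;
    size_t start = 0;   /* index where the current run began */

    for (size_t i = 0; i <= len; i++) {
        /* Printable bytes extend the current run. */
        if (i < len && isprint(buf[i]))
            continue;

        /* The run ended at a non-printable byte or at the end of the
         * buffer; print it if it is long enough. */
        if (i - start >= min_len) {
            printf("%.*s\n", (int)(i - start), (const char *)(buf + start));
            count++;
        }
        start = i + 1;
    }
    return count;
}
```

Calling it with min_len set to 4 on the contents of an executable mimics the default behaviour shown above, including the short runs that are really machine code bytes that happen to be printable.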

 

Conclusion

There are various tools that can help us reverse engineer a program. They save us from repeating work that has already been done by someone else; we just have to pick the appropriate program. By using the programs mentioned above, we'll get to know them better and learn which tool is appropriate for a given job.

Dejan Lukan

Dejan Lukan is a security researcher for InfoSec Institute and penetration tester from Slovenia. He is very interested in finding new bugs in real world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in couple of dozen of them. His passion is also Antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.