Introduction

Summary: In this article we’ll take a look at a C program that prints “Hello World!” to the screen, which we’ll compile, assemble and link. Along the way we’ll compare the results and try to show what’s happening behind the curtain. Specifically, we will look at which sections are present at each step of the transformation chain: from C code, to assembly code, to object file, to executable.

Hello World Program: The Assembly

First we need to write the hello world C program, which can be seen below:

#include <stdio.h>

int main() {
  printf("Hello World!");
  return 0;
}

It’s a very simple program that does nothing beyond printing a single string; we intentionally kept it this simple so that we can focus on the bigger picture rather than on lots of code. We first compile the program only as far as assembly code, and nothing more. To do that we pass the -S option to gcc, which takes the source code of the program and stops after generating the assembly instructions. We also want the assembly in Intel syntax rather than gcc’s default AT&T syntax, which we get by passing -masm=intel. If we’re on a 64-bit operating system, we also want to compile the program as 32-bit, which we get by passing -m32. The whole gcc command that we’re using can be seen below:

# gcc -m32 -masm=intel -S hello.c -o hello.s

This command takes the hello.c program and compiles it as a 32-bit program into assembly instructions, which are saved in the hello.s file.

The hello.s file now looks as follows:

        .file   "hello.c"
        .intel_syntax noprefix
        .section        .rodata
.LC0:
        .string "Hello World!"
        .text
.globl main
        .type   main, @function
main:
        push    ebp
        mov     ebp, esp
        and     esp, -16
        sub     esp, 16
        mov     eax, OFFSET FLAT:.LC0
        mov     DWORD PTR [esp], eax
        call    printf
        mov     eax, 0
        leave
        ret
        .size   main, .-main
        .ident  "GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4"
        .section        .note.GNU-stack,"",@progbits

The .file directive records the original source file name, which is normally used by debuggers. The .intel_syntax line specifies that we’re using Intel syntax assembly and not AT&T syntax. Afterwards we’re defining the .rodata section, which is used for read-only data. In our case the .rodata section contains only the zero-terminated string “Hello World!”, which can be referenced through the .LC0 label. Then we’re defining the .text section, which is used for the code of the program.

First we must define the main function (notice the .type main, @function directive), which is globally visible (notice the .globl main directive). Everything from the main: label to the ret instruction is the actual code of the program. That code first sets up the stack frame by pushing the value of the register EBP to the stack and moving the value of register ESP to EBP. The “and esp, -16” instruction aligns the stack pointer to a 16-byte boundary by clearing its lowest four bits. This is not the result of an optimization flag (gcc compiles without optimization, -O0, unless told otherwise); gcc emits it in main because on x86 it keeps the stack 16-byte aligned by default, which some instructions, such as aligned SSE loads and stores, rely on. Then we’re subtracting 16 bytes from the ESP stack pointer register to reserve stack space, which in this program holds the outgoing argument to printf. Next, the address of .LC0 (our “Hello World!” string) is loaded into the register EAX and stored at the top of the stack, where it becomes the first and only parameter of the printf function that is called right after. The printf function prints that string on the screen and returns to main, which sets its return value to 0 in EAX, tears down the stack frame with leave and returns with ret.

The .size directive sets the size of the main function. The expression .-main is the current location minus the address of the main label, which at that point is exactly the size of the function; this value is written to the symbol table of the object file. The .ident directive saves the “GCC: (Gentoo 4.5.4 p1.0, pie-0.4.7) 4.5.4” string to the .comment section of the object file, recording which compiler was used to build it.

Hello World Program: The Object File

We’ve seen the assembly code that gcc generated directly from the corresponding C source code. But until the code has been assembled and linked, we have nothing to run. To assemble the code into an object file, we use the -c option of gcc, which compiles or assembles the input file but does not link it. To obtain the object file from the assembly code we need to run the command below:

# gcc -m32 -masm=intel -c hello.s -o hello.o
# file hello.o
hello.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

We can see that hello.o is an ELF 32-bit relocatable object file, which means it has not been linked yet and is not an executable. If we try to run it anyway, it will fail as shown below:

# chmod +x hello.o
# ./hello.o
bash: ./hello.o: cannot execute binary file

We can read the contents of the object file with the readelf program as follows:

# readelf -a hello.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          224 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           40 (bytes)
  Number of section headers:         11
  Section header string table index: 8
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        00000000 000034 00001d 00  AX  0   0  4
  [ 2] .rel.text         REL             00000000 000350 000010 08      9   1  4
  [ 3] .data             PROGBITS        00000000 000054 000000 00  WA  0   0  4
  [ 4] .bss              NOBITS          00000000 000054 000000 00  WA  0   0  4
  [ 5] .rodata           PROGBITS        00000000 000054 00000d 00   A  0   0  1
  [ 6] .comment          PROGBITS        00000000 000061 00002b 01  MS  0   0  1
  [ 7] .note.GNU-stack   PROGBITS        00000000 00008c 000000 00      0   0  1
  [ 8] .shstrtab         STRTAB          00000000 00008c 000051 00      0   0  1
  [ 9] .symtab           SYMTAB          00000000 000298 0000a0 10     10   8  4
  [10] .strtab           STRTAB          00000000 000338 000015 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
There are no section groups in this file.
There are no program headers in this file.
Relocation section '.rel.text' at offset 0x350 contains 2 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0000000a  00000501 R_386_32          00000000   .rodata
00000012  00000902 R_386_PC32        00000000   printf
There are no unwind sections in this file.
Symbol table '.symtab' contains 10 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
     2: 00000000     0 SECTION LOCAL  DEFAULT    1
     3: 00000000     0 SECTION LOCAL  DEFAULT    3
     4: 00000000     0 SECTION LOCAL  DEFAULT    4
     5: 00000000     0 SECTION LOCAL  DEFAULT    5
     6: 00000000     0 SECTION LOCAL  DEFAULT    7
     7: 00000000     0 SECTION LOCAL  DEFAULT    6
     8: 00000000    29 FUNC    GLOBAL DEFAULT    1 main
     9: 00000000     0 NOTYPE  GLOBAL DEFAULT  UND printf
No version information found in this file.

We can see that the file is an ELF object file that has 11 section headers. The first section header is null. The second section header is .text, which contains the executable instructions of the program. The .rel.text section holds the relocation information for the .text section. The relocation entries must be present because our code refers to addresses that are not known yet: the address of the string in .rodata and the address of the external printf function. The linker uses these entries to patch the correct values into the .text section.

In the output above, we can see that .rel.text holds two relocation entries: one for .rodata and one for printf. The .data section holds initialized data, while the .bss section holds uninitialized data that the program uses; both are empty here. The .rodata section holds read-only data that can be used by the program; this is where our “Hello World!” string is stored. The .comment section holds the compiler identification string written by the .ident directive, and .note.GNU-stack is an empty marker section that tells the linker whether the program needs an executable stack. The .shstrtab section holds the section names, the .symtab section holds the symbol table and the .strtab section holds the symbol names that the symbol table refers to.

We can see that the assembly code placed our data and code in only two sections, .rodata and .text, but when we translated the assembly code into the object file, several more sections were added to the file. Those sections are needed to successfully link the executable and properly execute the program.

Hello World Program: The Executable

The last step is to actually link the object file to make an executable. To do that, we must execute the command below:

# gcc -m32 hello.o -o hello
# ./hello
Hello World!

We’ve linked the object file hello.o into the executable ./hello and executed it. When run, the program printed the “Hello World!” string as it should. If we take a look at the ELF file again, we can see that a lot of other information and many more sections have been added to the executable, as can be seen below:

$ readelf -a hello
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8048330
  Start of program headers:          52 (bytes into file)
  Start of section headers:          4392 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         10
  Size of section headers:           40 (bytes)
  Number of section headers:         30
  Section header string table index: 27

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048174 000174 000013 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            08048188 000188 000020 00   A  0   0  4
  [ 3] .hash             HASH            080481a8 0001a8 000028 04   A  5   0  4
  [ 4] .gnu.hash         GNU_HASH        080481d0 0001d0 000020 04   A  5   0  4
  [ 5] .dynsym           DYNSYM          080481f0 0001f0 000050 10   A  6   1  4
  [ 6] .dynstr           STRTAB          08048240 000240 00004c 00   A  0   0  1
  [ 7] .gnu.version      VERSYM          0804828c 00028c 00000a 02   A  5   0  2
  [ 8] .gnu.version_r    VERNEED         08048298 000298 000020 00   A  6   1  4
  [ 9] .rel.dyn          REL             080482b8 0002b8 000008 08   A  5   0  4
  [10] .rel.plt          REL             080482c0 0002c0 000018 08   A  5  12  4
  [11] .init             PROGBITS        080482d8 0002d8 000017 00  AX  0   0  4
  [12] .plt              PROGBITS        080482f0 0002f0 000040 04  AX  0   0 16
  [13] .text             PROGBITS        08048330 000330 00019c 00  AX  0   0 16
  [14] .fini             PROGBITS        080484cc 0004cc 00001c 00  AX  0   0  4
  [15] .rodata           PROGBITS        080484e8 0004e8 000015 00   A  0   0  4
  [16] .eh_frame_hdr     PROGBITS        08048500 000500 000014 00   A  0   0  4
  [17] .eh_frame         PROGBITS        08048514 000514 000040 00   A  0   0  4
  [18] .ctors            PROGBITS        08049f0c 000f0c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        08049f14 000f14 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        08049f1c 000f1c 000004 00  WA  0   0  4
  [21] .dynamic          DYNAMIC         08049f20 000f20 0000d0 08  WA  6   0  4
  [22] .got              PROGBITS        08049ff0 000ff0 000004 04  WA  0   0  4
  [23] .got.plt          PROGBITS        08049ff4 000ff4 000018 04  WA  0   0  4
  [24] .data             PROGBITS        0804a00c 00100c 000008 00  WA  0   0  4
  [25] .bss              NOBITS          0804a014 001014 000008 00  WA  0   0  4
  [26] .comment          PROGBITS        00000000 001014 00002a 01  MS  0   0  1
  [27] .shstrtab         STRTAB          00000000 00103e 0000e9 00      0   0  1
  [28] .symtab           SYMTAB          00000000 0015d8 000340 10     29  32  4
  [29] .strtab           STRTAB          00000000 001918 00014d 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x08048034 0x08048034 0x00140 0x00140 R E 0x4
  INTERP         0x000174 0x08048174 0x08048174 0x00013 0x00013 R   0x1
      [Requesting program interpreter: /lib/ld-linux.so.2]
  LOAD           0x000000 0x08048000 0x08048000 0x00554 0x00554 R E 0x1000
  LOAD           0x000f0c 0x08049f0c 0x08049f0c 0x00108 0x00110 RW  0x1000
  DYNAMIC        0x000f20 0x08049f20 0x08049f20 0x000d0 0x000d0 RW  0x4
  NOTE           0x000188 0x08048188 0x08048188 0x00020 0x00020 R   0x4
  GNU_EH_FRAME   0x000500 0x08048500 0x08048500 0x00014 0x00014 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4
  GNU_RELRO      0x000f0c 0x08049f0c 0x08049f0c 0x000f4 0x000f4 R   0x1
  PAX_FLAGS      0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr
   07
   08     .ctors .dtors .jcr .dynamic .got
   09

Dynamic section at offset 0xf20 contains 21 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x0000000c (INIT)                       0x80482d8
 0x0000000d (FINI)                       0x80484cc
 0x00000004 (HASH)                       0x80481a8
 0x6ffffef5 (GNU_HASH)                   0x80481d0
 0x00000005 (STRTAB)                     0x8048240
 0x00000006 (SYMTAB)                     0x80481f0
 0x0000000a (STRSZ)                      76 (bytes)
 0x0000000b (SYMENT)                     16 (bytes)
 0x00000015 (DEBUG)                      0x0
 0x00000003 (PLTGOT)                     0x8049ff4
 0x00000002 (PLTRELSZ)                   24 (bytes)
 0x00000014 (PLTREL)                     REL
 0x00000017 (JMPREL)                     0x80482c0
 0x00000011 (REL)                        0x80482b8
 0x00000012 (RELSZ)                      8 (bytes)
 0x00000013 (RELENT)                     8 (bytes)
 0x6ffffffe (VERNEED)                    0x8048298
 0x6fffffff (VERNEEDNUM)                 1
 0x6ffffff0 (VERSYM)                     0x804828c
 0x00000000 (NULL)                       0x0

Relocation section '.rel.dyn' at offset 0x2b8 contains 1 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
08049ff0  00000206 R_386_GLOB_DAT    00000000   __gmon_start__

Relocation section '.rel.plt' at offset 0x2c0 contains 3 entries:
 Offset     Info    Type            Sym.Value  Sym. Name
0804a000  00000107 R_386_JUMP_SLOT   00000000   printf
0804a004  00000207 R_386_JUMP_SLOT   00000000   __gmon_start__
0804a008  00000307 R_386_JUMP_SLOT   00000000   __libc_start_main

There are no unwind sections in this file.

Symbol table '.dynsym' contains 5 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0 (2)
     2: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     3: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.0 (2)
     4: 080484ec     4 OBJECT  GLOBAL DEFAULT   15 _IO_stdin_used

Symbol table '.symtab' contains 52 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
     0: 00000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 08048174     0 SECTION LOCAL  DEFAULT    1
     2: 08048188     0 SECTION LOCAL  DEFAULT    2
     3: 080481a8     0 SECTION LOCAL  DEFAULT    3
     4: 080481d0     0 SECTION LOCAL  DEFAULT    4
     5: 080481f0     0 SECTION LOCAL  DEFAULT    5
     6: 08048240     0 SECTION LOCAL  DEFAULT    6
     7: 0804828c     0 SECTION LOCAL  DEFAULT    7
     8: 08048298     0 SECTION LOCAL  DEFAULT    8
     9: 080482b8     0 SECTION LOCAL  DEFAULT    9
    10: 080482c0     0 SECTION LOCAL  DEFAULT   10
    11: 080482d8     0 SECTION LOCAL  DEFAULT   11
    12: 080482f0     0 SECTION LOCAL  DEFAULT   12
    13: 08048330     0 SECTION LOCAL  DEFAULT   13
    14: 080484cc     0 SECTION LOCAL  DEFAULT   14
    15: 080484e8     0 SECTION LOCAL  DEFAULT   15
    16: 08048500     0 SECTION LOCAL  DEFAULT   16
    17: 08048514     0 SECTION LOCAL  DEFAULT   17
    18: 08049f0c     0 SECTION LOCAL  DEFAULT   18
    19: 08049f14     0 SECTION LOCAL  DEFAULT   19
    20: 08049f1c     0 SECTION LOCAL  DEFAULT   20
    21: 08049f20     0 SECTION LOCAL  DEFAULT   21
    22: 08049ff0     0 SECTION LOCAL  DEFAULT   22
    23: 08049ff4     0 SECTION LOCAL  DEFAULT   23
    24: 0804a00c     0 SECTION LOCAL  DEFAULT   24
    25: 0804a014     0 SECTION LOCAL  DEFAULT   25
    26: 00000000     0 SECTION LOCAL  DEFAULT   26
    27: 00000000     0 FILE    LOCAL  DEFAULT  ABS hello.c
    28: 08049f0c     0 NOTYPE  LOCAL  DEFAULT   18 __init_array_end
    29: 08049f20     0 OBJECT  LOCAL  DEFAULT   21 _DYNAMIC
    30: 08049f0c     0 NOTYPE  LOCAL  DEFAULT   18 __init_array_start
    31: 08049ff4     0 OBJECT  LOCAL  DEFAULT   23 _GLOBAL_OFFSET_TABLE_
    32: 08048490     5 FUNC    GLOBAL DEFAULT   13 __libc_csu_fini
    33: 08048495     0 FUNC    GLOBAL HIDDEN    13 __i686.get_pc_thunk.bx
    34: 0804a00c     0 NOTYPE  WEAK   DEFAULT   24 data_start
    35: 00000000     0 FUNC    GLOBAL DEFAULT  UND printf@@GLIBC_2.0
    36: 0804a014     0 NOTYPE  GLOBAL DEFAULT  ABS _edata
    37: 080484cc     0 FUNC    GLOBAL DEFAULT   14 _fini
    38: 08049f18     0 OBJECT  GLOBAL HIDDEN    19 __DTOR_END__
    39: 0804a00c     0 NOTYPE  GLOBAL DEFAULT   24 __data_start
    40: 00000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
    41: 0804a010     0 OBJECT  GLOBAL HIDDEN    24 __dso_handle
    42: 080484ec     4 OBJECT  GLOBAL DEFAULT   15 _IO_stdin_used
    43: 00000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@@GLIBC_
    44: 08048430    90 FUNC    GLOBAL DEFAULT   13 __libc_csu_init
    45: 0804a01c     0 NOTYPE  GLOBAL DEFAULT  ABS _end
    46: 08048330     0 FUNC    GLOBAL DEFAULT   13 _start
    47: 080484e8     4 OBJECT  GLOBAL DEFAULT   15 _fp_hw
    48: 0804a014     0 NOTYPE  GLOBAL DEFAULT  ABS __bss_start
    49: 08048404    29 FUNC    GLOBAL DEFAULT   13 main
    50: 00000000     0 NOTYPE  WEAK   DEFAULT  UND _Jv_RegisterClasses
    51: 080482d8     0 FUNC    GLOBAL DEFAULT   11 _init

Histogram for bucket list length (total of 3 buckets):
 Length  Number     % of total  Coverage
      0  0          (  0.0%)
      1  2          ( 66.7%)     50.0%
      2  1          ( 33.3%)    100.0%

Histogram for `.gnu.hash' bucket list length (total of 2 buckets):
 Length  Number     % of total  Coverage
      0  1          ( 50.0%)
      1  1          ( 50.0%)    100.0%

Version symbols section '.gnu.version' contains 5 entries:
 Addr: 000000000804828c  Offset: 0x00028c  Link: 5 (.dynsym)
  000:   0 (*local*)       2 (GLIBC_2.0)     0 (*local*)       2 (GLIBC_2.0)
  004:   1 (*global*)

Version needs section '.gnu.version_r' contains 1 entries:
 Addr: 0x0000000008048298  Offset: 0x000298  Link: 6 (.dynstr)
  000000: Version: 1  File: libc.so.6  Cnt: 1
  0x0010:   Name: GLIBC_2.0  Flags: none  Version: 2

Notes at offset 0x00000188 with length 0x00000020:
  Owner                 Data size       Description
  GNU                  0x00000010       NT_GNU_ABI_TAG (ABI version tag)
    OS: Linux, ABI: 2.6.9

Conclusion

We’ve now seen how a simple program written in C is converted into assembly code, then into an object file and finally into an executable file. The C source has no notion of sections, whereas the generated assembly placed our data and code in two sections: .rodata and .text. When we assembled it into an object file and finally linked it into the executable, the file gained more and more sections, which are needed for the program to be linked, loaded and executed successfully.