
How to control the flow of a program in x86 assembly

February 17, 2021 by Srinivas

x86 assembly language, just like most other programming languages, lets us control the flow of a program using various instructions.

This article provides an overview of those instructions that can be used to control the flow of a program.

See the previous article in this series, How to diagnose and locate segmentation faults in x86 assembly.

Using comparison instructions to control applications at the x86 level

The x86 instruction set provides two commonly used comparison instructions: CMP and TEST. Let us explore the following program to understand how these two instructions work.

section .text

global _start

_start:

mov eax, 101

mov ebx, 100

mov ecx, 100

cmp eax, ebx

cmp ebx, ecx

xor eax, eax

test eax, eax

First, let us assemble and link this program using the following commands.

$ nasm comparison.nasm -o comparison.o -f elf32

$ ld comparison.o -o comparison -m elf_i386

Now, let us load the program in GDB as shown below.

$ gdb ./comparison

Set up a breakpoint at the entry point of the program and run the program as shown in the following excerpt.

gef➤  b _start

Breakpoint 1 at 0x8049000

gef➤ run

The following instructions move the values into the respective registers specified in the instructions.

→  0x8049000 <_start+0>       mov    eax, 0x65

0x8049005 <_start+5>       mov    ebx, 0x64

0x804900a <_start+10>      mov    ecx, 0x64

Following is the state of registers after executing the first 3 instructions shown above.

$eax   : 0x65

$ebx   : 0x64

$ecx   : 0x64

$edx   : 0x0

$esp   : 0xffffd210  →  0x00000001

$ebp   : 0x0

$esi   : 0x0

$edi   : 0x0

$eip   : 0x0804900f  →  <_start+15> cmp eax, ebx

The eflags are as shown below.

$eflags: [zero carry parity adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]

Now, let us run the first cmp instruction by typing si and observe the changes to the eflags register.

$eflags: [zero carry parity adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]

As we can notice, the flags are unchanged after executing the first CMP instruction. CMP subtracts the second operand from the first, discards the result and updates the flags: the Zero Flag (ZF) is set if the difference is 0, and the Sign Flag (SF), Parity Flag (PF), Carry Flag (CF), Overflow Flag (OF) and Adjust Flag (AF) are set or cleared according to the result. In this case, EAX (0x65) and EBX (0x64) are compared; the difference is 1, which sets none of these flags.

However, after executing the next cmp instruction, which compares EBX and ECX (both 0x64), the ZERO and PARITY flags are set as shown below.

$eflags: [ZERO carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]

When a specific flag is set, GEF shows it in upper case letters as shown in the preceding output.
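The flag behaviour observed for the two cmp instructions can be reproduced with a small model. The following Python sketch is only an illustration of the documented CMP semantics for 32-bit operands; the function name cmp_flags is hypothetical and is not part of GDB or GEF.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Model the status flags that `cmp a, b` produces for 32-bit operands."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32            # CMP subtracts, then discards the result

    def sign(v):
        return (v >> 31) & 1             # most significant bit

    return {
        "ZF": int(result == 0),                                      # difference is zero
        "SF": sign(result),                                          # result looks negative
        "CF": int(a < b),                                            # unsigned borrow
        "OF": int(sign(a) != sign(b) and sign(result) != sign(a)),   # signed overflow
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),           # low byte only
        "AF": int((a & 0xF) < (b & 0xF)),                            # borrow out of bit 3
    }

# The two comparisons from the program above.
for a, b in [(101, 100), (100, 100)]:
    set_flags = [name for name, bit in cmp_flags(a, b).items() if bit]
    print(f"cmp {a}, {b} -> {set_flags}")
# cmp 101, 100 -> []
# cmp 100, 100 -> ['ZF', 'PF']
```

The output matches the GEF excerpts: the first comparison sets none of the status flags, and the second sets ZERO and PARITY.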

The next instruction xor eax, eax sets eax to 0. Following is the status of registers after executing this instruction.

$eax   : 0x0

$ebx   : 0x64

$ecx   : 0x64

$edx   : 0x0

$esp   : 0xffffd210  →  0x00000001

$ebp   : 0x0

$esi   : 0x0

$edi   : 0x0

$eip   : 0x08049015  →  <_start+21> test eax, eax

The next instruction, test eax, eax, performs a bitwise AND of eax with itself, discards the result and updates the flags. The zero flag is therefore set only if eax contains the value 0. Following is the status of eflags after executing this instruction.

$eflags: [ZERO carry PARITY adjust sign trap INTERRUPT direction overflow resume virtualx86 identification]

The parity flag is set if the least significant byte of the result has an even number of set bits. Here the result is 0, which has no set bits, so the flag is set.

These instructions can be used to control the flow of the program. For example, we can execute a block of code only if a specific register holds the value 0, or only if a comparison using the CMP instruction finds the two operands equal.

Following is a sample use case of cmp instruction.

cmp eax, ebx

jz _testlabel

Following is a sample use case of test instruction.

test eax, eax

jnz _testlabel

Using jump instructions to control applications at the x86 level

The next set of instructions are jump instructions, which come in two types: unconditional jumps and conditional jumps. JMP is an unconditional jump, as it does not rely on any condition being met. All other jump instructions are conditional: whether they are taken depends on flags that are typically set by earlier instructions in the program. Following is an example with both unconditional and conditional jump instructions.

section .data

equal  db "eax and ebx are equal"

notequal db "eax and ebx are not equal"

section .text

global _start

_start:

mov eax, 100

mov ebx, 101

cmp eax, ebx

jz  _printequal

jmp _printnotequal

_exitprogram:

mov eax, 1

mov ebx, 0

int 0x80

_printequal:

mov eax, 4

mov ebx, 1

mov ecx, equal

mov edx, 21

int 0x80

jmp _exitprogram

_printnotequal:

mov eax, 4

mov ebx, 1

mov ecx, notequal

mov edx, 25

int 0x80

jmp _exitprogram

As we can notice in the preceding program, the entry point of the program is labeled _start. When the program starts its execution, the registers eax and ebx are loaded with some values. Next, a comparison is done using the CMP instruction. Since the values in eax and ebx are not equal, the ZERO flag will not be set. Then the jz _printequal instruction is executed. This instruction checks whether the ZERO flag is set and jumps to the label _printequal if it is. Clearly, this instruction relies on the flags produced by earlier instructions such as CMP. In this case, the jump will not be taken. Following is an excerpt taken from GDB at this instruction. Note that GDB displays the instruction as je, because JZ and JE are two mnemonics for the same instruction.

0x8049005 <_start+5>       mov    ebx, 0x65

0x804900a <_start+10>      cmp    eax, ebx

→  0x804900c <_start+12>      je     0x804901c <_printequal> NOT taken [Reason: !(Z)]

0x804900e <_start+14>      jmp    0x8049034 <_printnotequal>

GEF clearly shows that the jump is not taken because the ZERO flag is not set. Since the jump is not taken, control passes to the next instruction, which is an unconditional jump to _printnotequal. Once the code within _printnotequal is executed, another unconditional jump transfers control to the label _exitprogram, which exits the program gracefully.

Following is a list of conditional jump instructions.

JE (Jump if Equal): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if operands of the previous cmp instruction are equal.

Example:

mov eax, 10

mov ebx, 10

cmp eax, ebx

je _loc

_loc:

JNE (Jump if Not Equal): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if operands of the previous cmp instruction are not equal.

Example:

mov eax, 10

mov ebx, 11

cmp eax, ebx

jne _loc

_loc:

 

JG (Jump if Greater): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if the first operand is greater than the second operand in the previous cmp instruction. A signed comparison is performed.

Example:

mov eax, 11

mov ebx, 10

cmp eax, ebx

jg _loc

_loc:

JGE (Jump if Greater or Equal): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if the first operand is greater than or equal to the second operand in the previous cmp instruction. A signed comparison is performed.

Example:

mov eax, 11

mov ebx, 10

cmp eax, ebx

jge _loc

_loc:

JA (Jump if Above): This instruction is the same as JG except that it performs an unsigned comparison.

JAE (Jump if Above or Equal): This instruction is the same as JGE except that it performs an unsigned comparison.

JO (Jump if Overflow): This instruction loads the EIP register with the specified address if the overflow flag is set.

JNO (Jump if Not Overflow): This instruction loads the EIP register with the specified address if the overflow flag is not set.

JZ (Jump if Zero): This instruction loads the EIP register with the specified address if a previous arithmetic or comparison instruction resulted in the zero flag being set. It is the same instruction as JE.

JNZ (Jump if Not Zero): This instruction loads the EIP register with the specified address if the zero flag is not set. It is the same instruction as JNE.

JS (Jump if Sign): This instruction loads the EIP register with the specified address if a previous arithmetic or comparison instruction resulted in the sign flag being set.

JNS (Jump if Not Sign): This instruction loads the EIP register with the specified address if the sign flag is not set.
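Each conditional jump in the list above tests a fixed combination of flags. The following Python sketch models those conditions on top of a simplified cmp; the names cmp_flags, CONDITIONS and taken are hypothetical helpers for illustration, while the flag conditions themselves are the documented ones for each mnemonic.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags produced by `cmp a, b` (only those the jumps below read)."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    sa, sb, sr = a >> 31, b >> 31, result >> 31   # sign bits
    return {
        "ZF": int(result == 0),
        "SF": sr,
        "CF": int(a < b),                          # unsigned borrow
        "OF": int(sa != sb and sr != sa),          # signed overflow
    }

# Condition tested by each conditional jump.
CONDITIONS = {
    "je":  lambda f: f["ZF"] == 1,                        # same as jz
    "jne": lambda f: f["ZF"] == 0,                        # same as jnz
    "jg":  lambda f: f["ZF"] == 0 and f["SF"] == f["OF"], # signed >
    "jge": lambda f: f["SF"] == f["OF"],                  # signed >=
    "ja":  lambda f: f["CF"] == 0 and f["ZF"] == 0,       # unsigned >
    "jae": lambda f: f["CF"] == 0,                        # unsigned >=
    "jo":  lambda f: f["OF"] == 1,
    "jno": lambda f: f["OF"] == 0,
    "js":  lambda f: f["SF"] == 1,
    "jns": lambda f: f["SF"] == 0,
}

def taken(mnemonic, a, b):
    """Would `cmp a, b` followed by the given jump take the branch?"""
    return CONDITIONS[mnemonic](cmp_flags(a, b))

# 0xFFFFFFFF is -1 when read as signed, 4294967295 when read as unsigned.
print(taken("jg", 0xFFFFFFFF, 1))   # False: -1 is not greater than 1
print(taken("ja", 0xFFFFFFFF, 1))   # True: 4294967295 is above 1
```

The last two lines show the practical difference between JG and JA: the same register contents lead to opposite branch decisions depending on whether the comparison is interpreted as signed or unsigned.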

Using function calls to control applications at the x86 level

In x86, the call instruction is used to call a function, and the function returns using the ret instruction. When a function is called using the call instruction, the return address (the address of the instruction immediately after the call) is pushed onto the stack, which decrements esp by 4 in 32-bit code, and execution continues at the function's address. The call instruction itself does not create a stack frame; that is the job of the function's prologue, if it has one. When the function finishes, the ret instruction pops the saved address from the stack into EIP, returning control to the caller. Let us consider the following example.

section .text

global _start

_start:

call _print

mov eax, 1

mov ebx, 0

int 0x80

_print:

mov edx,len

mov ecx,msg

mov ebx,1

mov eax,4

int 0x80

ret

section .rodata

msg db  'Hello, world!',0xa

len equ $ - msg

The first instruction after the _start label is a call to _print. After the body of the _print function is executed, its ret instruction returns control to the exit code written immediately after the call _print instruction. Let us see what this looks like in GDB. First, let us assemble and link the program using the following commands.

$ nasm functions.nasm -o functions.o -f elf32

$ ld functions.o -o functions -m elf_i386

Load the binary in GDB using the following command.

$ gdb ./functions

Set up a breakpoint at the entry point and run the program.

gef➤  b _start

Breakpoint 1 at 0x8049000

gef➤  run

Following are the instructions to be executed.

→  0x8049000 <_start+0>       call   0x8049011 <_print>

↳   0x8049011 <_print+0>       mov    edx, 0xe

0x8049016 <_print+5>       mov    ecx, 0x804a000

0x804901b <_print+10>      mov    ebx, 0x1

0x8049020 <_print+15>      mov    eax, 0x4

0x8049025 <_print+20>      int    0x80

0x8049027 <_print+22>      ret

Following is the stack before running the first instruction.

0xffffd210│+0x0000: 0x00000001 ← $esp

0xffffd214│+0x0004: 0xffffd3c7  →  "/home/dev/x86/functions"

0xffffd218│+0x0008: 0x00000000

0xffffd21c│+0x000c: 0xffffd3df  →  "SHELL=/bin/bash"

0xffffd220│+0x0010: 0xffffd3ef  →  "SESSION_MANAGER=local/x86-64:@/tmp/.ICE-unix/1760,[…]"

0xffffd224│+0x0014: 0xffffd441  →  "QT_ACCESSIBILITY=1"

0xffffd228│+0x0018: 0xffffd454  →  "COLORTERM=truecolor"

0xffffd22c│+0x001c: 0xffffd468  →  "XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg"

Now, run the call instruction by typing si and observe the top of the stack.

0xffffd20c│+0x0000: 0x08049005  →  <_start+5> mov eax, 0x1 ← $esp

0xffffd210│+0x0004: 0x00000001

0xffffd214│+0x0008: 0xffffd3c7  →  "/home/dev/x86/functions"

0xffffd218│+0x000c: 0x00000000

0xffffd21c│+0x0010: 0xffffd3df  →  "SHELL=/bin/bash"

0xffffd220│+0x0014: 0xffffd3ef  →  "SESSION_MANAGER=local/x86-64:@/tmp/.ICE-unix/1760,[…]"

0xffffd224│+0x0018: 0xffffd441  →  "QT_ACCESSIBILITY=1"

0xffffd228│+0x001c: 0xffffd454  →  "COLORTERM=truecolor"

Notice the address placed on the top of the stack after executing the call instruction. What address is this? Let us view the disassembly of _start, which looks as shown below.

gef➤  disass _start

Dump of assembler code for function _start:

0x08049000 <+0>: call   0x8049011 <_print>

0x08049005 <+5>: mov    eax,0x1

0x0804900a <+10>: mov    ebx,0x0

0x0804900f <+15>: int    0x80

End of assembler dump.

gef➤

As we can see in the preceding excerpt, the address placed on the stack is the address of the instruction immediately following the call instruction. Let us continue execution until the ret instruction and observe what happens when we are about to execute it.

→  0x8049027 <_print+22>      ret

↳   0x8049005 <_start+5>       mov    eax, 0x1

0x804900a <_start+10>      mov    ebx, 0x0

0x804900f <_start+15>      int    0x80

As we can notice in the preceding excerpt, the address of the next instruction to be executed after the ret instruction is the same address that was placed on the stack earlier. So, when the ret instruction is executed, the address will be popped from the stack and placed in the EIP register.
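The push and pop performed by call and ret can be modelled in a few lines. The following Python sketch is a simplified illustration, not an emulator: the Cpu class and its methods are hypothetical, and the 5-byte instruction length assumes the near relative form of call, which is consistent with the addresses in the GDB trace (0x8049000 followed by 0x8049005).

```python
class Cpu:
    """Tiny model of EIP, ESP and a 32-bit stack (hypothetical helper)."""

    def __init__(self, eip, esp):
        self.eip = eip
        self.esp = esp
        self.stack = {}                      # address -> 32-bit value

    def call(self, target, insn_len=5):
        return_addr = self.eip + insn_len    # instruction right after the call
        self.esp -= 4                        # the stack grows downwards
        self.stack[self.esp] = return_addr   # push the return address
        self.eip = target                    # continue at the function

    def ret(self):
        self.eip = self.stack[self.esp]      # pop the return address into EIP
        self.esp += 4

# Values taken from the GDB session above.
cpu = Cpu(eip=0x8049000, esp=0xFFFFD210)
cpu.call(0x8049011)                          # call _print
print(hex(cpu.esp), hex(cpu.stack[cpu.esp])) # 0xffffd20c 0x8049005

cpu.eip = 0x8049027                          # body of _print ran; now at ret
cpu.ret()
print(hex(cpu.eip), hex(cpu.esp))            # 0x8049005 0xffffd210
```

After ret, ESP is back at its original value and EIP points at mov eax, 0x1, the instruction that follows the call in _start.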

Using loop instructions to control applications at the x86 level

The x86 instruction set provides the loop instruction, which decrements ECX and jumps to the address given by its operand unless the decrement makes ECX zero. So, the loop continues to run until the value of ECX becomes zero. Let us examine the following program.

section .text

global _start

_start:

mov eax, 0

mov ecx, 5

_addtoeax:

inc eax

loop _addtoeax

The preceding program loads the registers eax and ecx with the values 0 and 5 respectively. When control first reaches _addtoeax, eax is incremented and the loop _addtoeax instruction is executed. This instruction decrements ECX by 1 and, because ECX is not yet 0, jumps back to _addtoeax, where eax is incremented once again. The loop continues until ECX becomes 0. On the final pass ECX is 1 and the increment brings EAX to 5; when the loop instruction then executes, ECX becomes 0, the jump is not taken and the loop terminates.
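The iteration count can be checked with a short model of the loop instruction. The following Python sketch is illustrative only; loop_step and run_add_loop are hypothetical names, and the model covers just the decrement-then-test behaviour described above.

```python
MASK32 = 0xFFFFFFFF

def loop_step(ecx):
    """Model one `loop` instruction: decrement ECX, jump if it is not zero."""
    ecx = (ecx - 1) & MASK32
    return ecx, ecx != 0

def run_add_loop(eax, ecx):
    """Model the _addtoeax loop from the program above."""
    while True:
        eax = (eax + 1) & MASK32          # inc eax
        ecx, jump = loop_step(ecx)        # loop _addtoeax
        if not jump:
            return eax, ecx

print(run_add_loop(0, 5))                 # (5, 0)

# Caveat: the decrement happens before the test, so starting with ECX = 0
# wraps around to 0xFFFFFFFF and the jump is taken.
print(loop_step(0))                       # (4294967295, True)
```

The second call shows a common pitfall: if ECX is 0 when the loop is entered, the body does not run zero times; ECX wraps around and the loop runs 2^32 times.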

Conclusion:

As discussed in this article, the x86 instruction set provides several different instructions to control the flow of a program: comparisons, jumps, function calls and loops. Depending on the requirement, we can choose whichever of these instructions is appropriate.

See the next article in this series, How to implement common logic constructs such as if/else/loops in x86 assembly.

 


Srinivas is an Information Security professional with 4 years of industry experience in Web, Mobile and Infrastructure Penetration Testing. He is currently a security researcher at Infosec Institute Inc. He holds the Offensive Security Certified Professional (OSCP) certification. He blogs at www.androidpentesting.com. Email: srini0x00@gmail.com