
x86 Assembly Language Applicable To Reverse Engineering: The Basics – Part 1

Overview

The x86 Assembly language, or ASM, is the lowest-level programming language that humans can reasonably read and one of the most primitive; it is a symbolic form of machine language. If we can understand and handle assembly, we can understand how a computer actually executes a program, and that understanding carries over to coding in any other programming language.


Programs coded in assembly are generally small and run with very little overhead. Assembly is often loosely called machine language because each Central Processing Unit (CPU) understands only its own set of instructions (the instruction set, which defines the architecture). For 32-bit x86 processors this base instruction set is the same across vendors and models, which is required for compatibility with the software and devices already on the market.

Each assembly instruction corresponds to a machine opcode (operation code) whose encoding is always the same, and assembly gives each opcode a mnemonic that is easier to read. This article is not designed to teach you how to code in assembly language; the aim is to introduce the most common instructions you will meet when practicing reverse code engineering and handling disassemblers / debuggers, and to provide only a very basic introduction.

Here we go:

Registers

To store information (of different values and sizes), each processor contains a set of small storage locations, a kind of "boxes", called registers. They are among the most important parts of the CPU, and because they sit inside the processor, working on data held in registers is much faster than working on data held in memory. We can consider three kinds of registers:

General Registers: used to manipulate data, to pass parameters when calling a DOS function, and to store intermediate results.
Index and Pointer Registers (labelled "Status Registers" in some tutorials): used to hold addresses and offsets into strings, the stack, and code.
Segment Registers: used to store the starting address of a segment. It may be the address of the beginning of a program's instructions, the beginning of data, or the beginning of the stack.

General registers are named after the letters A, B, C and D and are the most used registers. Each exists in a 16-bit form, and each 16-bit form can be further divided into two 8-bit halves.

AX – Accumulator Register: used to perform arithmetic operations or to send a parameter to an interrupt.

BX – Base Register: used to perform arithmetic operations or as the base address of an array.

CX – Counter Register: used generally as a counter on loops.

DX – Data Register: used to store data for functions, and as a port number in input / output operations.

AX, BX, CX and DX are 16-bit registers. Each of them can be broken down into two 8-bit registers, L and H (Low / High), for example AX (AL, AH). To get 32-bit registers we add an "E" to the 16-bit names, which gives EAX, EBX, ECX and EDX. The low 16 bits of EAX are still reachable as AX, but there is no EAH or EAL, and the high 16 bits of a 32-bit register have no name of their own and are not directly accessible.

These registers can only hold values that fit their capacity, which is set by their number of bits (8, 16 or 32). The maximum unsigned values are: 8 bits = 255d, 16 bits = 65535d, 32 bits = 4 294 967 295d ("d" stands for decimal).
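The capacities and the AX / AH / AL breakdown described above can be checked with a small Python sketch. It is only a model: plain integers stand in for registers, and the helper names `max_unsigned` and `split_eax` are hypothetical, chosen for this illustration.

```python
# Illustrative model only: plain integers stand in for x86 registers.

def max_unsigned(bits):
    """Largest unsigned value that fits in a register of the given width."""
    return (1 << bits) - 1


def split_eax(eax):
    """Return the AX, AH and AL views of a 32-bit EAX value."""
    ax = eax & 0xFFFF        # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF    # high byte of AX
    al = ax & 0xFF           # low byte of AX
    return ax, ah, al


if __name__ == "__main__":
    for bits in (8, 16, 32):
        print(bits, "bits ->", max_unsigned(bits))
    ax, ah, al = split_eax(0x12345678)
    print(hex(ax), hex(ah), hex(al))
```

Note that the upper half of the example value (0x1234) never appears in any of the three views, which mirrors the fact that the high 16 bits of EAX have no register name.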

Regarding the second group of registers (the index and pointer registers), they do not have 8-bit parts, so they have neither an H nor an L half. These registers are:

DI – Destination Index: mainly used when handling string instructions, and is generally associated with Segment Registers DS or ES.

SI – Source Index: used as source data address when it comes to manipulating strings, and is generally associated with Segment Register DS.

BP – Base Pointer: when a subroutine is called by a "CALL", this register works with the SS Segment Register to access data on the stack and is generally used for indirect addressing.

IP – Instruction Pointer: associated with the Segment Register CS to indicate the next instruction to execute, and indirectly modified by jump instructions, subroutine calls and interrupts.

SP – Stack Pointer: used with Segment Register SS (SS:SP) to indicate the top of the stack, that is, the last element pushed.

All of these are 16-bit registers, and can be extended to 32-bit by adding an "E" as well (EDI, ESI, EBP, EIP, and ESP). Segment Registers are in turn used to store and / or retrieve memory data.

To access memory, the CPU needs an address, and this address is expressed in two parts. The first is called the "segment" and the second the "offset", so addresses are written as segment:offset. The segment part is always 16 bits wide, while the offset is 16 or 32 bits wide depending on the operating mode.
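As a rough illustration of how a segment:offset pair designates one memory location, the sketch below assumes 16-bit real mode, where the segment value is shifted left by 4 bits and added to the offset. This is an assumption made for the example: 32-bit protected mode resolves the segment through descriptor tables instead. The function name `real_mode_linear` is hypothetical.

```python
def real_mode_linear(segment, offset):
    """Linear address of segment:offset, assuming 16-bit real mode."""
    segment &= 0xFFFF   # segment registers hold 16-bit values
    offset &= 0xFFFF    # 16-bit offset, so one segment spans 64 KB
    return (segment << 4) + offset


if __name__ == "__main__":
    print(hex(real_mode_linear(0x1234, 0x0010)))
```

Because the offset is limited to 16 bits here, a single segment value can reach at most 64 KB of memory.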

Segment Registers are read and written only as 16-bit values. In 16-bit code each one designates a segment of up to 64 KB; 32-bit x86 code uses 32-bit offsets. The Segment Registers are:

CS – Code Segment: contains the address of the segment holding the CPU instructions referenced by the Instruction Pointer register (IP), and is updated by far jump, far call, and far return instructions.

SS – Stack Segment: contains the address of the segment holding the data referenced by the Stack Pointer and Base Pointer.

ES – Extra Segment: referenced by Destination Index (DI) in string manipulation.

DS – Data Segment: contains the address of the segment holding the data referenced by default through the Accumulator Register, Base Register, Counter Register, Data Register, Source Index, and Destination Index.

The Stack

The stack is a memory area that can hold temporary data (function parameters, variables, etc.) and is designed to behave in a "Last In, First Out" way, which means the first value stored on the stack (or pile) will be the last one out. The example usually given to explain how the stack works is a pile of plates waiting to be washed: the last plate stacked will be the first to be washed.

Figure: Simple representation of a stack (wikipedia)

To be able to "push" data onto the stack and "pop" data from it, x86 assembly uses the instructions PUSH and POP.

Push Instruction

PUSH decrements the Stack Pointer (SP, or ESP in 32-bit code) and then stores a value at the new top of the stack.

PUSH AX
PUSH BX
PUSH 1986

This first pushes AX onto the stack, then BX, then the value 1986; 1986 is the value that will be "popped" first.

Pop Instruction

POP loads the value stored at the location pointed to by SP into its operand and then increments the Stack Pointer.

POP AX
POP BX
POP CX

Assuming AX = 1 and BX = 2 before the PUSH example above, the topmost element, the value 1986, is stored in AX. Then BX receives 2 (the value BX had when it was pushed), and CX receives 1, the original value of AX. Now the stack is empty.
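The Last In, First Out behavior of PUSH and POP can be modeled in a few lines of Python. This is a toy sketch, not how a CPU is implemented: the class `Stack16`, the starting SP value 0x0100 and the function `run_example` are all assumptions made for the illustration. It pushes AX = 1, BX = 2 and the value 1986, then pops three times.

```python
class Stack16:
    """Toy model of a 16-bit stack: SP falls on PUSH and rises on POP."""

    def __init__(self, sp=0x0100):   # starting SP is an arbitrary assumption
        self.sp = sp
        self.memory = {}             # address -> 16-bit word

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF       # decrement SP first
        self.memory[self.sp] = value & 0xFFFF  # then store at the new top

    def pop(self):
        value = self.memory.pop(self.sp)       # read the word at the top
        self.sp = (self.sp + 2) & 0xFFFF       # then increment SP
        return value


def run_example():
    """Push AX, BX and 1986, then pop three times; return values and SPs."""
    ax, bx = 1, 2
    stack = Stack16()
    start_sp = stack.sp
    for value in (ax, bx, 1986):
        stack.push(value)
    popped = [stack.pop() for _ in range(3)]
    return popped, start_sp, stack.sp


if __name__ == "__main__":
    print(run_example())
```

The values come back in the reverse of the order in which they were pushed, and SP ends where it started, which is what "the stack is empty" means in this model.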

Flags, Conditional jumps, and Comparisons

Flags

Flags are indicators that many instructions can alter; they describe the result of logical, arithmetic, and comparison instructions.
Flags are grouped into the FLAGS register, a 16-bit register; its 32-bit extension, EFLAGS, adds higher bits such as RF and VM.

Bit 0: CF
Bit 1: Reserved (always 1)
Bit 2: PF
Bit 3: Reserved (always 0)
Bit 4: AF
Bit 5: Reserved (always 0)
Bit 6: ZF
Bit 7: SF
Bit 8: TF
Bit 9: IF
Bit 10: DF
Bit 11: OF
Bits 12-13: IOPL
Bit 14: NT
Bit 15: Reserved (always 0)
Bit 16: RF
Bit 17: VM

The most widely used flags are described below:

CF – Carry Flag: affected by the result of arithmetic instructions, "used to indicate when an arithmetic carry or borrow has been generated out of the most significant ALU bit position." (Wikipedia)

PF – Parity Flag: takes the value 1 if the low byte of the result contains an even number of bits set to 1.

AF – Auxiliary Flag (or Adjust Flag): "indicates when an arithmetic carry or borrow has been generated out of the 4 least significant bits." (Wikipedia)

ZF – Zero Flag: used to check the result of arithmetic operations. If the result of an operation is equal to 0, ZF takes the value 1; it is frequently used to test the outcome of a subtraction or comparison.

SF – Sign Flag: takes the value 1 if the result of the last arithmetic operation is negative, that is, if its most significant bit is 1.

IF – Interrupt Flag: when set to 1, IF lets the CPU handle maskable hardware interrupts; when set to 0, the CPU ignores such interrupts.

DF – Direction Flag: controls the direction in which string instructions move their pointers: with DF = 0 the index registers are incremented (left to right), with DF = 1 they are decremented (right to left).

OF – Overflow Flag: takes the value 1 if a signed overflow occurred during an operation, that is, if the result does not fit in the destination when the operands are treated as signed numbers; a program can test it to detect and handle such errors.
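To make the flag descriptions more concrete, here is a minimal Python sketch that computes CF, PF, AF, ZF, SF and OF for an 8-bit addition. It is a model under stated assumptions, not CPU code: the function name `add8_flags` is hypothetical, and only the ADD case on 8-bit operands is covered.

```python
def add8_flags(a, b):
    """Model an 8-bit ADD: return the result and the arithmetic flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                          # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),       # even number of 1 bits
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),           # carry out of bit 3
        "ZF": int(result == 0),                           # result is zero
        "SF": result >> 7,                                # copy of the top bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags


if __name__ == "__main__":
    print(add8_flags(0xFF, 0x01))   # wraps around to 0
    print(add8_flags(0x7F, 0x01))   # 127 + 1 no longer fits in a signed byte
```

The two sample calls show the difference between CF and OF: 0xFF + 0x01 produces a carry but no signed overflow, while 0x7F + 0x01 produces a signed overflow but no carry.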

Flags are directly related to conditional statements, which leads us to introduce conditional jumps before talking about comparisons.

Conditional jumps

This part is interesting because it helps to understand how the program reacts to the result of most operations, as recorded in the flags (1 or 0).

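Flags | Value | Jump | Signification
CF | CF = 1 | JB | Jump if Below
CF | CF = 1 | JC | Jump if Carry
CF | CF = 1 | JNAE | Jump if Not Above or Equal
CF | CF = 0 | JAE | Jump if Above or Equal
CF | CF = 0 | JNB | Jump if Not Below
CF | CF = 0 | JNC | Jump if Not Carry
CF and ZF | CF = 1 or ZF = 1 | JBE | Jump if Below or Equal
CF and ZF | CF = 1 or ZF = 1 | JNA | Jump if Not Above
CF and ZF | CF = 0 and ZF = 0 | JA | Jump if Above
CF and ZF | CF = 0 and ZF = 0 | JNBE | Jump if Not Below or Equal
ZF | ZF = 1 | JE | Jump if Equal
ZF | ZF = 1 | JZ | Jump if Zero
ZF | ZF = 0 | JNE | Jump if Not Equal
ZF | ZF = 0 | JNZ | Jump if Not Zero
PF | PF = 1 | JP | Jump if Parity
PF | PF = 1 | JPE | Jump if Parity Even
PF | PF = 0 | JNP | Jump if Not Parity
PF | PF = 0 | JPO | Jump if Parity Odd
OF | OF = 1 | JO | Jump if Overflow
OF | OF = 0 | JNO | Jump if Not Overflow
SF | SF = 1 | JS | Jump if Signed
SF | SF = 0 | JNS | Jump if Not Signed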

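The jumps used after comparing signed numbers are also worth adding:

Flags | Value | Jump | Signification
ZF, SF and OF | ZF = 0 and SF = OF | JG | Jump if Greater
ZF, SF and OF | ZF = 0 and SF = OF | JNLE | Jump if Not Less or Equal
SF and OF | SF = OF | JGE | Jump if Greater or Equal
SF and OF | SF = OF | JNL | Jump if Not Less
SF and OF | SF ≠ OF | JL | Jump if Less
SF and OF | SF ≠ OF | JNGE | Jump if Not Greater or Equal
ZF, SF and OF | ZF = 1 or SF ≠ OF | JLE | Jump if Less or Equal
ZF, SF and OF | ZF = 1 or SF ≠ OF | JNG | Jump if Not Greater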

For a jump to "decide" whether it is taken or not, the program first needs to perform a test or comparison using instructions like:

CMP instruction

CMP compares two operands but does not store a result. The instruction tests two values by subtracting the second operand from the first and, depending on the result, updates the flags (the flags affected are OF, SF, ZF, AF, PF, and CF). For instance, if the two values are equal, the Zero Flag holds the value 1; otherwise it holds 0. CMP behaves like SUB, the subtraction instruction, except that the result is discarded.

CMP AX, BX

Here CMP computes AX - BX. If the result of this subtraction is zero, then AX is equal to BX, and ZF is set to 1.

To make it easier, after a CMP the jumps are TAKEN when:

First operand is greater than the second (unsigned numbers) -> JA
First operand is lower than the second (unsigned numbers) -> JB
First operand is greater than the second (signed numbers) -> JG
First operand is lower than the second (signed numbers) -> JL
Equality (signed and unsigned numbers) -> JE or JZ

Just add "N" after "J" to get the negative / opposite instruction (JA / JNA, JB / JNB…): for example, JNA is taken when the first operand is NOT greater than the second (unsigned numbers), and JNB is taken when it is NOT lower.
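The link between CMP, the flags and the jump that follows can be sketched in Python. This is a simplified model that assumes 16-bit operands and computes only the four flags needed here; the names `cmp16`, `JUMPS` and `taken` are hypothetical and exist only for this illustration.

```python
def cmp16(a, b):
    """Model CMP a, b on 16-bit operands: compute a - b and keep only flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "CF": int(a < b),                                   # unsigned borrow
        "ZF": int(result == 0),                             # operands are equal
        "SF": result >> 15,                                 # top bit of result
        "OF": int(((a ^ b) & (a ^ result) & 0x8000) != 0),  # signed overflow
    }


# Each jump is taken when its flag condition holds.
JUMPS = {
    "JE": lambda f: f["ZF"] == 1,
    "JNE": lambda f: f["ZF"] == 0,
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "JB": lambda f: f["CF"] == 1,
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],
    "JL": lambda f: f["SF"] != f["OF"],
}


def taken(jump, a, b):
    """Would `jump` be taken right after CMP a, b?"""
    return JUMPS[jump](cmp16(a, b))


if __name__ == "__main__":
    # 0xFFFF is 65535 when read as unsigned and -1 when read as signed.
    for jump in ("JA", "JG", "JB", "JL"):
        print(jump, taken(jump, 0xFFFF, 1))
```

The same bit pattern gives different answers for JA and JG, which is why a disassembly listing tells you whether the original code compared signed or unsigned values.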

We have reached the end of this first part, in which we talked very basically about registers, the stack, flags, conditional jumps and the comparison instruction CMP. In the next part we will talk essentially about mathematical and logical instructions of memory.


Posted: October 15, 2012

Soufiane Tahiri


Soufiane Tahiri is an InfoSec Institute contributor and computer security researcher, specializing in reverse code engineering and software security. He is also the founder of www.itsecurity.ma and has practiced reversing for more than 8 years. Dynamic and very involved, Soufiane is ready to catch any serious opportunity to be part of a workgroup.

Contact Soufiane in whatever way works for you:

Email: soufianetahiri@gmail.com

Twitter: https://twitter.com/i7s3curi7y

LinkedIn: http://ma.linkedin.com/in/soufianetahiri

Website: http://www.itsecurity.ma
