Overview

x86 assembly language, or ASM, is the lowest-level programming language that is still readable by humans and one of the most primitive ones; it is a symbolic, nearly one-to-one representation of machine language. If we can understand and handle assembly, then we can understand much more precisely how a computer works, which gives us the logic behind, and makes it easier to work with, any other programming language.

Programs coded in assembly are generally small, and very little stands between them and the machine. Assembly language is often loosely called machine language because each Central Processing Unit (CPU) family has its own set of instructions (the instruction set, which defines the architecture), and that is the only thing the CPU understands. For x86, the 32-bit instruction set is the same across compatible processors, which is what lets the same program run on the various machines present in the market.

That said, each assembly instruction is associated with a numeric code (the opcode, or operation code) which is always the same, and assembly gives each low-level machine opcode a mnemonic. This article is not designed to teach you how to code in assembly language; the aim is to introduce the most common instructions you will meet when practicing reverse code engineering and handling disassemblers / debuggers, and to provide only a very basic introduction.

Here we go:

Registers

To store information (values of different kinds and sizes), each processor contains a set of small, fast storage locations, kind of “boxes”, called registers. They are among the most important parts of the CPU, and because a register can be accessed much faster than memory, keeping a value in a register whenever its size allows speeds up execution. We can consider three kinds of registers:

  1. General Registers: Used to manipulate data, to pass parameters when calling a DOS function, and to store intermediate results.
  2. Status Registers (more accurately, the index and pointer registers; the flags themselves live in the separate Flags Register).
  3. Segment Registers: Used to store the starting address of a segment. It may be the address of the beginning of a program’s instructions, the beginning of data, or the beginning of the stack.

Most 32-bit registers expose their low 16 bits under a separate name, and the four general registers can be divided further into 8-bit halves. General registers begin with the letters A, B, C and D, and are the most used registers.

  • AX – Accumulator Register: used to perform arithmetic operations or to pass a parameter to an interrupt.
  • BX – Base Register: used to perform arithmetic operations or as the base address of an array.
  • CX – Counter Register: used generally as a counter on loops.
  • DX – Data Register: used to store data for functions, and as a port number in input / output operations.

AX, BX, CX and DX are 16-bit registers. Each of them can be broken down into two 8-bit registers, L and H (Low / High), for example AX (AL, AH). Prefixing the 16-bit name with an “E” gives the 32-bit registers: EAX, EBX, ECX and EDX. (Please note that there is no EAH or EAL: the low 16 bits of a 32-bit register are accessible as AX, BX, CX or DX, but the high 16 bits have no name of their own and are not directly accessible.)

Logically, a register can only hold values up to its capacity, which is set by its number of bits (8, 16 or 32). The maximum unsigned value of an n-bit register is 2^n - 1, that is to say: 8 bits = 255d, 16 bits = 65 535d, 32 bits = 4 294 967 295d (“d” for decimal).
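The overlap between a 32-bit register and its smaller parts can be sketched in a few lines of Python. This is only an illustrative model: the register names are real, but the helper functions max_unsigned and split_eax are hypothetical names chosen for this example.

```python
def max_unsigned(bits: int) -> int:
    """Largest unsigned value that fits in a register of the given width."""
    return (1 << bits) - 1


def split_eax(eax: int) -> dict:
    """Return the named sub-registers that alias the low part of EAX."""
    eax &= 0xFFFFFFFF            # keep 32 bits
    ax = eax & 0xFFFF            # AX is the low 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX
        "AL": ax & 0xFF,         # low byte of AX
    }


for bits in (8, 16, 32):
    print(bits, max_unsigned(bits))

parts = split_eax(0x12345678)
print({name: hex(value) for name, value in parts.items()})
```

Note that the upper half of EAX (0x1234 in this example) appears only inside the full EAX value, since it has no register name of its own.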

Regarding the registers listed above as Status Registers (more precisely, the index and pointer registers), they do not have 8-bit parts, so they have neither an H nor an L half. These registers are:

  • DI – Destination Index: mainly used when handling string instructions, and is generally associated with Segment Registers DS or ES.
  • SI – Source Index: used as source data address when it comes to manipulating strings, and is generally associated with Segment Register DS.
  • BP – Base Pointer: when a subroutine is called with “CALL”, this register is paired with the SS Segment Register to access data on the stack, and is generally used for indirect addressing.
  • IP – Instruction Pointer: associated with the Segment Register CS to indicate the next instruction to execute, and modified indirectly by jump instructions, subroutine calls and interrupts.
  • SP – Stack Pointer: used with Segment Register SS (SS: SP) to indicate the last element of the stack.

All of these are 16-bit registers, and can be extended to 32-bit by adding an “E” as well (EDI, ESI, EBP, EIP, and ESP). Segment Registers, in turn, are used to locate the memory areas where data is stored and retrieved.

To access memory, the CPU needs an address; on x86 this address is made of two parts. The first is called the “segment” and the second the “offset”, which is why addresses are written segment:offset. The segment part is always 16 bits wide, while the offset is 16 bits in 16-bit code and 32 bits in 32-bit code.

Segment Registers are read and written only as 16 bits. In 16-bit code each one designates a segment of up to 64 KB, the range a 16-bit offset can cover; 32-bit x86 code uses 32-bit offsets. The various Segment Registers are:

  • CS – Code Segment: holds the address of the segment containing the CPU instructions referenced by the Instruction Pointer register (IP), and is updated by far jump, far call, and far return instructions.
  • SS – Stack Segment: holds the address of the segment containing the data referenced by the Stack Pointer and Base Pointer.
  • ES – Extra Segment: referenced by the Destination Index (DI) in string manipulation.
  • DS – Data Segment: the default segment for data referenced through the Accumulator Register, Base Register, Counter Register, Data Register, Source Index, and Destination Index.
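As a small illustration of the segment:offset idea, the sketch below assumes 16-bit real mode, where the processor forms a 20-bit physical address as segment × 16 + offset. In 32-bit protected mode the segment registers hold selectors into descriptor tables instead, which this sketch does not model. The function name real_mode_address is hypothetical.

```python
def real_mode_address(segment: int, offset: int) -> int:
    """Physical address of a 16-bit segment:offset pair in real mode."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and wrap to 20 bits as the original 8086 address bus does.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0x1234, 0x0010)))  # 0x12350
print(hex(real_mode_address(0x1235, 0x0000)))  # 0x12350, same location
```

The two calls show that different segment:offset pairs can designate the same physical location, because segments start every 16 bytes and overlap.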

The Stack

The stack is a memory area that holds temporary data (function parameters, variables, etc.) and is designed to behave in a “Last In, First Out” manner, which means the first value stored on the stack (or pile) will be the last one out. The example usually given to explain how the stack works is “plates stacked up to be washed”; the last to be stacked will be the first to be washed.


Figure: Simple representation of a stack (wikipedia)

To be able to “push” data onto the stack and “pop” data from it, x86 assembly uses the instructions PUSH and POP.

Push Instruction

PUSH decrements the Stack Pointer (SP, or ESP in 32-bit code) and then stores a value at the new top of the stack.

  • PUSH AX
  • PUSH BX
  • PUSH 1986

This first pushes AX onto the stack, then BX, then the value 1986; it is 1986 that will be “popped” first.

Pop Instruction

POP loads the value stored at the location pointed to by SP into its operand and then increments the Stack Pointer.

  • POP AX
  • POP BX
  • POP CX

Assuming AX = 1 and BX = 2 before the pushes, and following the Push example, the topmost element, which is the value 1986, is stored in AX. Then BX receives 2, the old value of BX, and CX receives 1, the old value of AX. Now the stack is empty.
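To see how the three pushes from the Push example unwind, here is a toy model of a 16-bit stack. It is a sketch under simplifying assumptions: memory is a Python dict, the starting value of SP (0x0100) is arbitrary, and the class name Stack16 is hypothetical.

```python
class Stack16:
    """Toy model of a 16-bit x86 stack that grows towards lower addresses."""

    def __init__(self, sp: int = 0x0100):
        self.sp = sp
        self.memory = {}

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF       # PUSH first decrements SP by 2
        self.memory[self.sp] = value & 0xFFFF  # then stores the word at SS:SP

    def pop(self) -> int:
        value = self.memory.pop(self.sp)       # POP first reads the word at SS:SP
        self.sp = (self.sp + 2) & 0xFFFF       # then increments SP by 2
        return value


stack = Stack16()
ax, bx = 1, 2
stack.push(ax)      # PUSH AX
stack.push(bx)      # PUSH BX
stack.push(1986)    # PUSH 1986
print(hex(stack.sp))                       # 0xfa: three words below 0x100

popped = [stack.pop(), stack.pop(), stack.pop()]
print(popped)                              # [1986, 2, 1]: last in, first out
print(hex(stack.sp))                       # 0x100: the stack is empty again
```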

Flags, Conditional jumps, and Comparisons

Flags

Flags are indicators that many instructions can alter; they describe the result of logical, arithmetic, and comparison instructions.

Flags are grouped into the Flags Register, which is a 16-bit register (extended to 32 bits as EFLAGS).

Bits are numbered from 0, the least significant bit:

  • Bit 0: CF
  • Bit 1: Reserved (always 1)
  • Bit 2: PF
  • Bit 3: Reserved (always 0)
  • Bit 4: AF
  • Bit 5: Reserved (always 0)
  • Bit 6: ZF
  • Bit 7: SF
  • Bit 8: TF
  • Bit 9: IF
  • Bit 10: DF
  • Bit 11: OF
  • Bits 12-13: IOPL
  • Bit 14: NT
  • Bit 15: Reserved (always 0)
  • Bit 16: RF (EFLAGS only)
  • Bit 17: VM (EFLAGS only)

The most widely used flags are the following:

  • CF – Carry Flag: affected by the result of arithmetic instructions, “used to indicate when an arithmetic carry or borrow has been generated out of the most significant ALU bit position.” (Wikipedia)
  • PF – Parity Flag: takes the value 1 if the low byte of the result contains an even number of 1 bits.
  • AF – Auxiliary Flag (or Adjust Flag): “indicates when an arithmetic carry or borrow has been generated out of the 4 least significant bits.” (Wikipedia)
  • ZF – Zero Flag: used to check the result of arithmetic operations. If the result of an operation is equal to 0, ZF takes the value 1; it is frequently used to test the result of a subtraction or comparison.
  • SF – Sign Flag: takes the value 1 if the result of the last operation is negative, that is, if its most significant bit is 1.
  • IF – Interrupt Flag: when set to 1, IF lets the CPU handle maskable hardware interrupts; when set to 0, the CPU ignores such interrupts.
  • DF – Direction Flag: controls the direction in which pointers move during string processing (0: towards higher addresses, 1: towards lower addresses).
  • OF – Overflow Flag: indicates that a signed overflow occurred during an operation, meaning the result does not fit in the destination as a signed number, and lets the program detect and correct such errors (if overflow, OF takes the value 1).
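The flag definitions above can be made concrete with a short sketch that adds two 8-bit values and derives the arithmetic flags from the result. It models only what an ADD on 8-bit operands would leave in CF, PF, AF, ZF, SF and OF; the function name add8_flags is hypothetical.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and return the result with the arithmetic flags."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    return {
        "result": result,
        "CF": int(total > 0xFF),                      # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),    # carry out of bit 3
        "ZF": int(result == 0),
        "SF": (result >> 7) & 1,                      # copy of the top bit
        # Signed overflow: both operands have a sign the result does not share.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }


print(add8_flags(0x7F, 0x01))  # 127 + 1 overflows as a signed byte: OF = 1
print(add8_flags(0xFF, 0x01))  # 255 + 1 wraps to 0: CF = 1 and ZF = 1
```

The two calls show why CF and OF are separate flags: the first addition is fine for unsigned numbers but overflows for signed ones, and the second is the other way round.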

Flags are directly related to conditional statements, which leads us to introduce conditional jumps before talking about comparisons.

Conditional jumps

We are about to discuss an interesting part, insofar as it helps to understand how the program reacts to the result of most operations (a flag set to 1 or 0).

Flags   Condition            Jump   Signification
CF      CF = 1               JB     Jump if Below
CF      CF = 1 or ZF = 1     JBE    Jump if Below or Equal
CF      CF = 1               JC     Jump if Carry
CF      CF = 1               JNAE   Jump if Not Above or Equal
CF      CF = 0 and ZF = 0    JA     Jump if Above
CF      CF = 0               JAE    Jump if Above or Equal
CF      CF = 0               JNB    Jump if Not Below
CF      CF = 0               JNC    Jump if Not Carry
ZF      ZF = 1               JE     Jump if Equal
ZF      ZF = 1 or CF = 1     JNA    Jump if Not Above
ZF      ZF = 1               JZ     Jump if Zero
ZF      ZF = 0 and CF = 0    JNBE   Jump if Not Below or Equal
ZF      ZF = 0               JNE    Jump if Not Equal
ZF      ZF = 0               JNZ    Jump if Not Zero
PF      PF = 1               JP     Jump if Parity
PF      PF = 1               JPE    Jump if Parity Even
PF      PF = 0               JNP    Jump if Not Parity
PF      PF = 0               JPO    Jump if Parity Odd
OF      OF = 1               JO     Jump if Overflow
OF      OF = 0               JNO    Jump if Not Overflow
SF      SF = 1               JS     Jump if Signed
SF      SF = 0               JNS    Jump if Not Signed

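It is also worth adding the jumps used after signed comparisons, which depend on more than one flag:

Flags        Condition             Jump   Signification
ZF, SF, OF   ZF = 0 and SF = OF    JG     Jump if Greater
ZF, SF, OF   ZF = 0 and SF = OF    JNLE   Jump if Not Less or Equal
SF, OF       SF = OF               JGE    Jump if Greater or Equal
SF, OF       SF = OF               JNL    Jump if Not Less
SF, OF       SF ≠ OF               JL     Jump if Less
SF, OF       SF ≠ OF               JNGE   Jump if Not Greater or Equal
ZF, SF, OF   ZF = 1 or SF ≠ OF     JLE    Jump if Less or Equal
ZF, SF, OF   ZF = 1 or SF ≠ OF     JNG    Jump if Not Greater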

For a jump to “decide” whether it is taken, the flags it reads must first be set by a test or comparison, using instructions like:

CMP instruction

CMP compares two operands but does not store a result. With this instruction, the program tests two values by subtracting them (it subtracts the second operand from the first) and, depending on the result, updates the flags (the flags affected are OF, SF, ZF, AF, PF, and CF). For instance, if the two given values are equal, the Zero Flag holds the value 1; otherwise it holds 0. CMP behaves like SUB, another mathematical instruction, except that the result is discarded and the operands are left unchanged.

  • CMP AX, BX

Here CMP computes AX - BX. If the result of this subtraction is equal to zero, then AX is equal to BX, and ZF is set to 1.
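A minimal sketch of what CMP does with two 16-bit operands is given below. For brevity it models only four of the six affected flags (CF, ZF, SF and OF), and the function name cmp16 is hypothetical.

```python
def cmp16(first: int, second: int) -> dict:
    """Model CMP first, second on 16-bit operands: subtract, keep only flags."""
    first &= 0xFFFF
    second &= 0xFFFF
    diff = (first - second) & 0xFFFF   # the difference itself is discarded
    return {
        "CF": int(first < second),     # an unsigned borrow was needed
        "ZF": int(diff == 0),          # the operands are equal
        "SF": (diff >> 15) & 1,        # top bit of the difference
        # Signed overflow: the operands differ in sign and the sign of the
        # difference is not the sign of the first operand.
        "OF": int(((first ^ second) & (first ^ diff) & 0x8000) != 0),
    }


ax, bx = 5, 5
print(cmp16(ax, bx))   # ZF is 1 because AX - BX == 0
ax, bx = 3, 5
print(cmp16(ax, bx))   # CF and SF are 1 because 3 - 5 borrows and is negative
```

Only the flags come back from the function, mirroring the fact that CMP leaves both operands unchanged.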

To make it easier, after CMP first, second, jumps are TAKEN when:

  • First operand is greater than the second (unsigned numbers) -> JA
  • First operand is lower than the second (unsigned numbers) -> JB
  • First operand is greater than the second (signed numbers) -> JG
  • First operand is lower than the second (signed numbers) -> JL
  • Equality (signed and unsigned numbers) -> JE or JZ

Just add “N” after “J” to get the negative / opposite instruction (JA / JNA, JB / JNB…): for example, JNA is taken when the first operand is NOT greater than the second (unsigned numbers), and JNB when it is NOT lower.
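The difference between the unsigned jumps (JA, JB) and the signed ones (JG, JL) shows up when the same bit pattern is read both ways. The sketch below is an illustrative model only: cmp_flags, JUMPS and taken are hypothetical names, and each jump is written as a condition over the flags a CMP would leave.

```python
def cmp_flags(first: int, second: int, bits: int = 16) -> dict:
    """Flags left by CMP first, second (CF, ZF, SF and OF only)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    first &= mask
    second &= mask
    diff = (first - second) & mask
    return {
        "CF": first < second,
        "ZF": diff == 0,
        "SF": bool(diff & sign),
        "OF": bool((first ^ second) & (first ^ diff) & sign),
    }


# Each jump is a condition over the flags.
JUMPS = {
    "JA": lambda f: not f["CF"] and not f["ZF"],
    "JB": lambda f: f["CF"],
    "JG": lambda f: not f["ZF"] and f["SF"] == f["OF"],
    "JL": lambda f: f["SF"] != f["OF"],
    "JE": lambda f: f["ZF"],
}
# The "N" forms are the plain negations of the conditions above.
for name in list(JUMPS):
    JUMPS["JN" + name[1:]] = (lambda cond: lambda f: not cond(f))(JUMPS[name])


def taken(first: int, second: int) -> list:
    """Names of the modelled jumps that would be taken after CMP first, second."""
    flags = cmp_flags(first, second)
    return sorted(name for name, cond in JUMPS.items() if cond(flags))


print(taken(0xFFFF, 0x0001))   # ['JA', 'JL', 'JNB', 'JNE', 'JNG']
print(taken(7, 7))             # ['JE', 'JNA', 'JNB', 'JNG', 'JNL']
```

In the first call 0xFFFF is 65535 when read as unsigned but -1 when read as signed, so JA and JL are both taken after the same comparison.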

We have reached the end of this first part, in which we covered the basics of registers, the stack, flags, conditional jumps and the comparison instruction CMP. In the next part we will talk mainly about mathematical, logical, and memory instructions.

References