Introduction to x86 assembly and syntax

February 10, 2021 by Srinivas

In this second installment of a series of articles on x86 assembly, we will discuss what programs written in x86 assembly look like, which syntaxes programmers can use, and some of the key differences between those syntaxes.

Understanding these details is important because we may encounter assembly language written in either of the available syntaxes, depending on the operating system and tools we use.

See the first article in the series: What is x86 assembly?

What does x86 assembly look like?

Programmers with experience in high-level languages like Java may find writing programs in assembly language a completely different experience. Assembly programs consist mainly of instructions, written as mnemonics followed by their operands. A simple example looks as follows.

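global _start

section .data
    message db "Hello, world!", 0x0a    ; the string followed by a newline
    len equ $ - message                 ; length of the string in bytes

section .text
_start:
    mov eax, 4          ; Linux system call number 4: write
    mov ebx, 1          ; file descriptor 1: standard output
    mov ecx, message    ; address of the string
    mov edx, len        ; number of bytes to write
    int 0x80            ; invoke the kernel
    mov eax, 1          ; Linux system call number 1: exit
    mov ebx, 0          ; exit status 0
    int 0x80            ; invoke the kernel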

The preceding excerpt shows a simple hello world program in x86 assembly language. The program prints the string "Hello, world!" and exits cleanly. As we can see, it consists of two groups of mov instructions, each followed by int 0x80. We will discuss the technical details of this program later; for now, the idea is to give a picture of what assembly programs look like.
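For readers more comfortable in a high-level language, the two system calls in the listing can be sketched in Python. This is only a rough analogy, assuming the 32-bit Linux int 0x80 convention, in which eax selects the system call and ebx, ecx and edx carry its arguments. The function names hello and main are made up for this illustration.

```python
import os

# The same bytes as `message db "Hello, world!", 0x0a` in the listing.
MESSAGE = b"Hello, world!\x0a"
# Counterpart of `len equ $ - message`: the byte length of the string.
LENGTH = len(MESSAGE)


def hello(fd: int = 1) -> int:
    """Write MESSAGE to fd (1 is standard output) and return the byte count."""
    # eax=4, ebx=fd, ecx=message, edx=len, int 0x80  ->  write(fd, message, len)
    return os.write(fd, MESSAGE)


def main() -> int:
    """Mirror the whole program: one write, then an exit status of 0."""
    hello()
    # eax=1, ebx=0, int 0x80  ->  exit(0); the caller passes this to sys.exit.
    return 0
```

Calling sys.exit(main()) from a script completes the analogy with the second int 0x80 in the assembly version.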

Examples of x86 assembly programs

Low-level programs such as drivers and bootloaders may be written in assembly. A bootloader is a small piece of software that is executed when a system boots. Once the bootloader is running, it loads an operating system and prepares it for execution. Depending on the computer design, this process may vary slightly, as boot loading can involve one or more additional stages.

The following example shows a bootloader written in assembly language. 

org 7C00h

         jmp short Start ;Jump over the data (the 'short' keyword makes the jmp instruction smaller)

 Msg:    db "Hello World! "
 EndMsg:

 Start:  mov bx, 000Fh   ;Page 0, colour attribute 15 (white) for the int 10 calls below
         mov cx, 1       ;We will want to write 1 character
         xor dx, dx      ;Start at top left corner
         mov ds, dx      ;Ensure ds = 0 (to let us load the message)
         cld             ;Ensure direction flag is cleared (for LODSB)

 Print:  mov si, Msg     ;Loads the address of the first byte of the message, 7C02h in this case

                         ;PC BIOS Interrupt 10 Subfunction 2 - Set cursor position
                         ;AH = 2
 Char:   mov ah, 2       ;BH = page, DH = row, DL = column
         int 10h
         lodsb           ;Load a byte of the message into AL.
                         ;Remember that DS is 0 and SI holds the
                         ;offset of one of the bytes of the message.

                         ;PC BIOS Interrupt 10 Subfunction 9 - Write character and colour
                         ;AH = 9
         mov ah, 9       ;BH = page, AL = character, BL = attribute, CX = character count
         int 10h

         inc dl          ;Advance cursor

         cmp dl, 80      ;Wrap around edge of screen if necessary
         jne Skip
         xor dl, dl
         inc dh

         cmp dh, 25      ;Wrap around bottom of screen if necessary
         jne Skip
         xor dh, dh

 Skip:   cmp si, EndMsg  ;If we're not at end of message,
         jne Char        ;continue loading characters
         jmp Print       ;otherwise restart from the beginning of the message

 times 0200h - 2 - ($ - $$)  db 0    ;Zerofill up to 510 bytes

         dw 0AA55h       ;Boot Sector signature

 ;OPTIONAL:
 ;To zerofill up to the size of a standard 1.44MB, 3.5" floppy disk
 ;times 1474560 - ($ - $$) db 0

The preceding excerpt is taken from https://en.wikibooks.org/wiki/X86_Assembly/Bootloaders and is a good example of what real-world software written in assembly may look like. The same page has additional examples and a detailed explanation of the program shown here.
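The last two directives of the bootloader, the times line and dw 0AA55h, fix the layout of the 512-byte boot sector: zero padding up to byte 510, then a two-byte signature. A minimal Python sketch of that arithmetic follows. The helper names build_boot_sector and has_boot_signature are invented for this illustration, and the code argument stands in for whatever bytes the assembler produced.

```python
SECTOR_SIZE = 0x200   # 512 bytes, the `0200h` in the listing
SIGNATURE = 0xAA55    # the value emitted by `dw 0AA55h`


def build_boot_sector(code: bytes) -> bytes:
    """Pad assembled code with zeros to 510 bytes and append the signature."""
    # Same arithmetic as `times 0200h - 2 - ($ - $$) db 0`, where ($ - $$)
    # is the number of bytes assembled so far.
    padding = SECTOR_SIZE - 2 - len(code)
    if padding < 0:
        raise ValueError("code does not fit in a 512-byte boot sector")
    # `dw` stores the 16-bit word in little-endian order: bytes 55 AA.
    return code + bytes(padding) + SIGNATURE.to_bytes(2, "little")


def has_boot_signature(sector: bytes) -> bool:
    """Check that a 512-byte sector ends with the bytes 55 AA."""
    return len(sector) == SECTOR_SIZE and sector[510:] == b"\x55\xaa"
```

For example, build_boot_sector(b"\xeb\xfe") wraps a two-byte placeholder (a short jump to itself) into a full sector that passes has_boot_signature.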

Types of syntax used to write x86 assembly

x86 assembly language comes in two syntax flavors: Intel and AT&T. Intel syntax is predominantly used in the Windows family, while AT&T syntax is commonly seen in the UNIX family. We will stick to Intel syntax throughout this series of articles. First, however, let us look at the details of both syntaxes, beginning with the following two examples.

Sample Code 1:

.globl _start

.section .text

_start:
  mov    $0x2,%eax
  add    $0x8,%eax
  add    %eax,%eax
  sub    $0x5,%eax
  inc    %eax
  inc    %eax
  dec    %eax
  dec    %eax

Sample Code 2:

global _start

section .text

_start:
  mov    eax,0x2
  add    eax,0x8
  add    eax,eax
  sub    eax,0x5
  inc    eax
  inc    eax
  dec    eax
  dec    eax

If you look closely at the two examples above, they achieve the same outcome but look different. The first program is written in AT&T syntax and the second in Intel syntax.
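To see what that shared outcome is, the eight instructions can be traced by hand or with a small model. The following Python sketch is a toy interpreter for just these instructions, not an emulator. PROGRAM and run are names chosen for this illustration, and eax is assumed to start at 0, which does not matter because the first mov overwrites it.

```python
MASK32 = 0xFFFFFFFF  # eax is a 32-bit register, so results wrap at 2**32

# (mnemonic, source) pairs in program order; the destination is always eax.
# The source is an immediate value, the string "eax", or None for inc/dec.
PROGRAM = [
    ("mov", 0x2), ("add", 0x8), ("add", "eax"), ("sub", 0x5),
    ("inc", None), ("inc", None), ("dec", None), ("dec", None),
]


def run(program):
    """Return the value of eax after each instruction."""
    eax = 0  # assumed starting value
    trace = []
    for op, src in program:
        value = eax if src == "eax" else src
        if op == "mov":
            eax = value
        elif op == "add":
            eax = (eax + value) & MASK32
        elif op == "sub":
            eax = (eax - value) & MASK32
        elif op == "inc":
            eax = (eax + 1) & MASK32
        elif op == "dec":
            eax = (eax - 1) & MASK32
        else:
            raise ValueError(f"unsupported mnemonic: {op}")
        trace.append(eax)
    return trace
```

Calling run(PROGRAM) returns [2, 10, 20, 15, 16, 17, 16, 15], so both samples leave the value 15 in eax.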

Let us go through some of the notable differences in these two syntaxes.

  1. In AT&T syntax, the first operand of an instruction is the source and the second operand is the destination. In Intel syntax the order is reversed: the first operand is the destination and the second operand is the source. To move the value 2 into the register EAX, the instruction looks as follows in AT&T syntax: mov $0x2,%eax. The same instruction written using Intel syntax looks as follows: mov eax,0x2.
  2. In AT&T syntax, register names carry the prefix % and immediate operands carry the prefix $. Intel syntax uses no prefix for either. The 0x in front of a hexadecimal value is written the same way in both syntaxes. The earlier example illustrates this as well: mov $0x2,%eax in AT&T syntax versus mov eax,0x2 in Intel syntax.
  3. In AT&T syntax, the mnemonic can carry a suffix that specifies the operand size: b for 8 bits, w for 16 bits, l for 32 bits and q for 64 bits. For example, moving an 8-bit value from the register bl to al requires the following instruction in Intel syntax: mov al, bl. The same operation in AT&T syntax can be written with the size as a suffix to the mnemonic, which looks as follows: movb %bl, %al. Notice the mnemonic movb. The GNU assembler allows the suffix to be omitted when a register operand makes the size unambiguous, which is why Sample Code 1 uses a plain mov.
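The three differences above can be put together in a short Python sketch that rewrites a simple AT&T instruction in Intel syntax. It is deliberately limited: it handles only register and immediate operands, not memory operands, and the KNOWN set and the att_to_intel name are assumptions made for this illustration rather than part of any real tool.

```python
SIZE_SUFFIXES = "bwlq"  # byte, word, long (32-bit), quad (64-bit)
# Hypothetical short list of base mnemonics this sketch recognises.
KNOWN = {"mov", "add", "sub", "inc", "dec"}


def att_to_intel(line: str) -> str:
    """Convert one register/immediate-only AT&T instruction to Intel syntax."""
    mnemonic, _, rest = line.strip().partition(" ")
    # Drop a size suffix such as the "b" in movb. The KNOWN check comes first
    # so that a mnemonic like "sub", which ends in "b", is left alone.
    if (mnemonic not in KNOWN and mnemonic[:-1] in KNOWN
            and mnemonic[-1] in SIZE_SUFFIXES):
        mnemonic = mnemonic[:-1]
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    # Strip the % register prefix and the $ immediate prefix.
    operands = [op.lstrip("%$") for op in operands]
    # AT&T lists the source first; Intel lists the destination first.
    operands.reverse()
    return f"{mnemonic} {', '.join(operands)}".strip()
```

With this sketch, att_to_intel("movb %bl, %al") gives "mov al, bl" and att_to_intel("mov $0x2,%eax") gives "mov eax, 0x2".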

It should be noted that we have only scratched the surface, keeping beginner-level readers in mind, and there are more differences between these two syntaxes. If an assembly program written in one of these syntaxes is confusing to read, it is easy to display it in the other. For example, let us assume that a program was written in AT&T syntax. Running objdump on the resulting binary shows the assembly instructions as follows.

$ objdump -S syntax-att

syntax-att:     file format elf32-i386

Disassembly of section .text:

08049000 <_start>:
 8049000: b8 02 00 00 00       mov    $0x2,%eax
 8049005: 83 c0 08             add    $0x8,%eax
 8049008: 01 c0                add    %eax,%eax
 804900a: 83 e8 05             sub    $0x5,%eax
 804900d: 40                   inc    %eax
 804900e: 40                   inc    %eax
 804900f: 48                   dec    %eax
 8049010: 48                   dec    %eax

The program was written in AT&T syntax, and objdump, which uses AT&T syntax by default for x86, displays it the same way. We can display the instructions in Intel syntax by passing the -M intel option to objdump, as shown below.

$ objdump -S syntax-att -M intel

syntax-att:     file format elf32-i386

Disassembly of section .text:

08049000 <_start>:
 8049000: b8 02 00 00 00       mov    eax,0x2
 8049005: 83 c0 08             add    eax,0x8
 8049008: 01 c0                add    eax,eax
 804900a: 83 e8 05             sub    eax,0x5
 804900d: 40                   inc    eax
 804900e: 40                   inc    eax
 804900f: 48                   dec    eax
 8049010: 48                   dec    eax

As we can see, the objdump output now shows the instructions in Intel syntax even though the program was written in AT&T syntax.
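The byte columns in the two objdump listings are identical, which is the underlying reason the conversion is easy: the syntax is only a way of printing the same machine code. The following Python sketch decodes just the handful of 32-bit encodings that appear in the listing and prints them either way. It is a teaching aid under that narrow assumption, not a disassembler, and decode and render are names invented here.

```python
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
GROUP1 = {0: "add", 5: "sub"}  # ModRM /digit values for opcode 83 in the listing


def decode(code: bytes):
    """Yield (mnemonic, dest, src) for the encodings used in the listing."""
    i = 0
    while i < len(code):
        op = code[i]
        modrm = code[i + 1] if i + 1 < len(code) else 0
        reg_direct = modrm >> 6 == 3            # mod = 11: register operands
        digit, rm = (modrm >> 3) & 7, modrm & 7
        if 0xB8 <= op <= 0xBF:                  # mov r32, imm32
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            yield "mov", REGS32[op - 0xB8], hex(imm)
            i += 5
        elif op == 0x83 and reg_direct and digit in GROUP1:
            # add/sub r32, imm8. The CPU sign-extends imm8; values below
            # 0x80, as in the listing, are unaffected by that.
            yield GROUP1[digit], REGS32[rm], hex(code[i + 2])
            i += 3
        elif op == 0x01 and reg_direct:         # add r32, r32
            yield "add", REGS32[rm], REGS32[digit]
            i += 2
        elif 0x40 <= op <= 0x47:                # inc r32 (32-bit mode only)
            yield "inc", REGS32[op - 0x40], None
            i += 1
        elif 0x48 <= op <= 0x4F:                # dec r32 (32-bit mode only)
            yield "dec", REGS32[op - 0x48], None
            i += 1
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {i}")


def render(insn, syntax="att"):
    """Format a decoded instruction in AT&T or Intel syntax."""
    mnemonic, dest, src = insn
    if syntax == "intel":
        ops = [dest] if src is None else [dest, src]
    else:
        def att(operand):
            return ("%" if operand in REGS32 else "$") + operand
        ops = [att(dest)] if src is None else [att(src), att(dest)]
    return f"{mnemonic} " + ",".join(ops)
```

Feeding it the bytes from the listing, for instance bytes.fromhex("b80200000083c008"), and rendering each decoded instruction with syntax="att" and then syntax="intel" reproduces the operand text of the first two lines of each objdump output.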

Assembling and linking

When a program is written in AT&T syntax, it can be assembled and linked using the GNU assembler (GAS, invoked as as) and the ld linker. The following commands build a 32-bit executable on a 64-bit system.

as <file>.s -o <file>.o --32
ld <file>.o -o <file> -m elf_i386

Similarly, programs written in Intel syntax can be assembled with NASM and linked with ld, as shown below. Again, the commands build a 32-bit executable on a 64-bit system.

nasm <file>.nasm -o <file>.o -f elf32
ld <file>.o -o <file> -m elf_i386
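When switching between the two toolchains often, it may help to keep the commands in one place. The Python sketch below simply assembles the argument lists shown above for a given source file. The flags are the ones from this article, while build_commands, build and the "att"/"intel" labels are names assumed for this illustration.

```python
import os
import subprocess


def build_commands(source: str, syntax: str):
    """Return the assemble and link commands for a 32-bit ELF executable."""
    stem = os.path.splitext(source)[0]
    obj = stem + ".o"
    if syntax == "att":
        assemble = ["as", source, "-o", obj, "--32"]
    elif syntax == "intel":
        assemble = ["nasm", source, "-o", obj, "-f", "elf32"]
    else:
        raise ValueError(f"unknown syntax: {syntax}")
    link = ["ld", obj, "-o", stem, "-m", "elf_i386"]
    return [assemble, link]


def build(source: str, syntax: str) -> None:
    """Run both commands in order; requires as or nasm, and ld, on the PATH."""
    for command in build_commands(source, syntax):
        subprocess.run(command, check=True)
```

For example, build_commands("hello.nasm", "intel") returns the nasm and ld invocations for hello.nasm without running anything, which makes it easy to inspect them first.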

See the next article in the series, x86 basics: Data representation, memory and information storage.

Srinivas

Srinivas is an Information Security professional with 4 years of industry experience in Web, Mobile and Infrastructure Penetration Testing. He is currently a security researcher at Infosec Institute Inc. He holds the Offensive Security Certified Professional (OSCP) certification. He blogs at www.androidpentesting.com. Email: srini0x00@gmail.com