
Introduction to x86 assembly and syntax

In this second installment of a series of articles on x86 assembly, we will discuss what programs written in x86 assembly look like, which syntaxes programmers can use, and some of the key differences between those syntaxes.

Understanding these details is important because the assembly we encounter may be written in either syntax, depending on the operating system and tools in use.

See the first article in the series: What is x86 assembly?

Learn Secure Coding

Build your secure coding skills in C/C++, iOS, Java, .NET, Node.js, PHP and other languages.

Learn More

What does x86 assembly look like?

Programmers with experience in high-level languages like Java may find writing programs in assembly language a completely different experience. Assembly programs typically consist of instructions, written as mnemonics, which look as follows.

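global _start

section .data
    message db "Hello, world!", 0x0a    ; the string followed by a newline
    len     equ $ - message             ; length of the string in bytes

section .text
_start:
    mov eax, 4          ; system call number 4: write
    mov ebx, 1          ; file descriptor 1: standard output
    mov ecx, message    ; address of the string
    mov edx, len        ; number of bytes to write
    int 0x80            ; invoke the kernel

    mov eax, 1          ; system call number 1: exit
    mov ebx, 0          ; exit status 0
    int 0x80            ; invoke the kernel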

The preceding excerpt shows a simple hello world program in x86 assembly language. This program prints the string "Hello, world!" and exits gracefully. As we can see, the program consists of several mov instructions, each group followed by int 0x80. We will discuss the technical details of this program later; for now, the idea is to give a picture of what assembly programs look like.
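
For readers coming from a high-level language, it may help to see roughly what the two int 0x80 calls ask the kernel to do. On 32-bit Linux, system call number 4 is write and number 1 is exit. The following Python lines are only an analogy, not what the assembler produces, and the names MESSAGE, LENGTH and hello are made up for illustration.

```python
import os

MESSAGE = b"Hello, world!\n"   # message db "Hello, world!", 0x0a
LENGTH = len(MESSAGE)          # len equ $ - message


def hello() -> int:
    """Write MESSAGE to standard output, as the first int 0x80 does."""
    # eax = 4 (write), ebx = 1 (stdout), ecx = message, edx = len
    return os.write(1, MESSAGE)


if __name__ == "__main__":
    hello()
    # The second int 0x80 (eax = 1, ebx = 0) corresponds to exiting with
    # status 0, which Python does on its own when the script ends.
```

The assembly version has to place every argument in a specific register by hand, which is why a two-line task takes eight instructions.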

Examples of x86 assembly programming language

Low-level programs such as drivers and bootloaders may be written in assembly. A bootloader is a small piece of software that is executed when a system boots. Once the bootloader is running, it loads an operating system and prepares it for execution. Depending on the computer design, this process may vary slightly, as there can be one or more additional stages in the boot process.

The following example shows a bootloader written in assembly language.

        org 7C00h

        jmp short Start         ;Jump over the data (the 'short' keyword makes the jmp instruction smaller)

Msg:    db "Hello World! "
EndMsg:

Start:  mov bx, 000Fh           ;Page 0, colour attribute 15 (white) for the int 10 calls below
        mov cx, 1               ;We will want to write 1 character
        xor dx, dx              ;Start at top left corner
        mov ds, dx              ;Ensure ds = 0 (to let us load the message)
        cld                     ;Ensure direction flag is cleared (for LODSB)

Print:  mov si, Msg             ;Loads the address of the first byte of the message, 7C02h in this case

                                ;PC BIOS Interrupt 10 Subfunction 2 - Set cursor position
                                ;AH = 2
Char:   mov ah, 2               ;BH = page, DH = row, DL = column
        int 10h
        lodsb                   ;Load a byte of the message into AL.
                                ;Remember that DS is 0 and SI holds the
                                ;offset of one of the bytes of the message.

                                ;PC BIOS Interrupt 10 Subfunction 9 - Write character and colour
                                ;AH = 9
        mov ah, 9               ;BH = page, AL = character, BL = attribute, CX = character count
        int 10h

        inc dl                  ;Advance cursor

        cmp dl, 80              ;Wrap around edge of screen if necessary
        jne Skip
        xor dl, dl
        inc dh

        cmp dh, 25              ;Wrap around bottom of screen if necessary
        jne Skip
        xor dh, dh

Skip:   cmp si, EndMsg          ;If we're not at end of message,
        jne Char                ;continue loading characters
        jmp Print               ;otherwise restart from the beginning of the message

        times 0200h - 2 - ($ - $$) db 0    ;Zerofill up to 510 bytes

        dw 0AA55h               ;Boot Sector signature

;OPTIONAL:
;To zerofill up to the size of a standard 1.44MB, 3.5" floppy disk
;times 1474560 - ($ - $$) db 0

The preceding excerpt is taken from https://en.wikibooks.org/wiki/X86_Assembly/Bootloaders and is a good example of what real-world software written in assembly may look like. The same page has additional examples and a detailed explanation of the program shown here.
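
The last two lines of the listing are what bring the output to exactly 512 bytes: zero padding up to offset 510, followed by the two-byte boot sector signature. The arithmetic can be sketched in Python as follows. This is an illustration under assumptions: the function name pad_boot_sector is hypothetical, and the two placeholder bytes stand in for whatever the assembler would actually emit.

```python
import struct

SECTOR_SIZE = 0x200   # 512 bytes, the 0200h in the times expression
SIGNATURE = 0xAA55    # dw 0AA55h


def pad_boot_sector(code: bytes) -> bytes:
    """Zero-fill code up to 510 bytes and append the boot signature."""
    padding = SECTOR_SIZE - 2 - len(code)   # 0200h - 2 - ($ - $$)
    if padding < 0:
        raise ValueError("code does not fit in one boot sector")
    # "<H" packs a 16-bit little-endian word, so 0xAA55 is stored as 55 AA.
    return code + b"\x00" * padding + struct.pack("<H", SIGNATURE)


if __name__ == "__main__":
    # Placeholder code: EB FE is a two-byte jump to itself (an endless loop).
    sector = pad_boot_sector(b"\xeb\xfe")
    print(len(sector), sector[-2:].hex())   # prints: 512 55aa
```

Because x86 is little-endian, the word 0AA55h appears on disk as the byte 55 followed by AA.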

Types of syntax used to write x86 assembly

x86 assembly language comes in two syntax flavors: Intel and AT&T. Intel syntax is predominantly used in the Windows family, while AT&T syntax is commonly seen in the UNIX family. We will stick to Intel syntax throughout this series of articles. Before that, however, let us look at both syntaxes in detail, beginning with the following two examples.

Sample Code 1:

.globl _start

.section .text
_start:
    mov $0x2,%eax
    add $0x8,%eax
    add %eax,%eax
    sub $0x5,%eax
    inc %eax
    dec %eax

Sample Code 2:

global _start

section .text
_start:
    mov eax,0x2
    add eax,0x8
    add eax,eax
    sub eax,0x5
    inc eax
    dec eax

If you look closely at the two examples shown above, they achieve the same outcome but look different. The first program is written using AT&T syntax and the second using Intel syntax.
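
To see what that shared outcome is, we can trace Sample Code 2 by hand: 2 plus 8 is 10, doubled is 20, minus 5 is 15, and the final inc and dec cancel out, leaving 15 in eax. The toy interpreter below does the same trace. It is only a sketch: it understands just the five mnemonics used in the sample, models a single 32-bit register and ignores the CPU flags. The names run and SAMPLE_2 are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # eax is a 32-bit register, so results wrap at 2**32

SAMPLE_2 = [
    "mov eax,0x2",
    "add eax,0x8",
    "add eax,eax",
    "sub eax,0x5",
    "inc eax",
    "dec eax",
]


def run(lines):
    """Execute a tiny subset of Intel-syntax instructions; return the registers."""
    regs = {"eax": 0}

    def value(operand):
        # An operand is either a known register name or an immediate like 0x8.
        return regs[operand] if operand in regs else int(operand, 0)

    for line in lines:
        mnemonic, _, rest = line.strip().partition(" ")
        operands = [op.strip() for op in rest.split(",")]
        dest = operands[0]   # Intel syntax: the destination comes first
        if mnemonic == "mov":
            result = value(operands[1])
        elif mnemonic == "add":
            result = regs[dest] + value(operands[1])
        elif mnemonic == "sub":
            result = regs[dest] - value(operands[1])
        elif mnemonic == "inc":
            result = regs[dest] + 1
        elif mnemonic == "dec":
            result = regs[dest] - 1
        else:
            raise ValueError(f"unsupported instruction: {line!r}")
        regs[dest] = result & MASK32
    return regs


if __name__ == "__main__":
    print(run(SAMPLE_2)["eax"])   # prints: 15
```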

Let us go through some of the notable differences between these two syntaxes.

In AT&T syntax, the first operand of an instruction is the source and the second is the destination. In Intel syntax, the order is reversed: the first operand is the destination and the second is the source. To move the value 2 into the register EAX, the instruction in AT&T syntax is mov $0x2,%eax. The same instruction in Intel syntax is mov eax,0x2.

In AT&T syntax, register names carry the prefix % and immediate operands carry the prefix $, while Intel syntax uses no prefix for either. The 0x that appears in both forms simply marks the value as hexadecimal and is not part of the AT&T prefix. The same example shows these differences: mov $0x2,%eax in AT&T syntax versus mov eax,0x2 in Intel syntax.

In AT&T syntax, the operand size is specified with a suffix on the mnemonic (b, w, l or q). The assembler allows the suffix to be omitted when the size can be inferred from a register operand, as in Sample Code 1. For example, moving an 8-bit value from the register bl to al needs the following instruction in Intel syntax: mov al, bl. The same operation in AT&T syntax, with the size given as a suffix, looks as follows: movb %bl, %al. Notice the mnemonic movb.
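
The three differences above can be captured in a few lines of Python. The converter below is a sketch for illustration only: it handles just the five mnemonics used in this article with register and immediate operands, and it does not support memory operands. The function name att_to_intel is hypothetical.

```python
import re

BASE_MNEMONICS = {"mov", "add", "sub", "inc", "dec"}
SIZE_SUFFIXES = "bwlq"   # byte, word, long (32-bit), quad (64-bit)
OPERAND = re.compile(r"[%$][0-9A-Za-z]+")   # %register or $immediate


def att_to_intel(instruction: str) -> str:
    """Convert a simple register/immediate AT&T instruction to Intel syntax."""
    mnemonic, _, rest = instruction.strip().partition(" ")

    # Difference 3: drop a size suffix such as the "b" in movb.
    base = mnemonic[:-1]
    if (mnemonic not in BASE_MNEMONICS and base in BASE_MNEMONICS
            and mnemonic[-1] in SIZE_SUFFIXES):
        mnemonic = base
    if mnemonic not in BASE_MNEMONICS:
        raise ValueError(f"unsupported mnemonic in {instruction!r}")

    operands = [op.strip() for op in rest.split(",")]
    for op in operands:
        if not OPERAND.fullmatch(op):
            raise ValueError(f"unsupported operand {op!r}")

    # Difference 2: remove the % and $ prefixes.
    operands = [op[1:] for op in operands]
    # Difference 1: AT&T is source-first, Intel is destination-first.
    operands.reverse()
    return mnemonic + " " + ",".join(operands)


if __name__ == "__main__":
    print(att_to_intel("mov $0x2,%eax"))   # prints: mov eax,0x2
    print(att_to_intel("movb %bl, %al"))   # prints: mov al,bl
```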

It should be noted that we have only scratched the surface here, keeping beginner-level readers in mind, and there are more differences between these two syntaxes. If an assembly program written in one of these syntaxes is confusing to read, it is easy to display it in the other. For example, let us assume that a program was written in AT&T syntax. Running objdump on the assembled binary shows the instructions as follows.

$ objdump -S syntax-att

syntax-att:     file format elf32-i386

Disassembly of section .text:

08049000 <_start>:
 8049000:   b8 02 00 00 00    mov    $0x2,%eax
 8049005:   83 c0 08          add    $0x8,%eax
 8049008:   01 c0             add    %eax,%eax
 804900a:   83 e8 05          sub    $0x5,%eax
 804900d:   40                inc    %eax
 804900e:   40                inc    %eax
 804900f:   48                dec    %eax
 8049010:   48                dec    %eax

The listing is in AT&T syntax, which is what objdump prints by default on x86. We can display the same instructions in Intel syntax by passing -M intel to objdump, as shown below.

$ objdump -S syntax-att -M intel

syntax-att:     file format elf32-i386

Disassembly of section .text:

08049000 <_start>:
 8049000:   b8 02 00 00 00    mov    eax,0x2
 8049005:   83 c0 08          add    eax,0x8
 8049008:   01 c0             add    eax,eax
 804900a:   83 e8 05          sub    eax,0x5
 804900d:   40                inc    eax
 804900e:   40                inc    eax
 804900f:   48                dec    eax
 8049010:   48                dec    eax

As we can see, the objdump output now shows the instructions in Intel syntax even though the program was written in AT&T syntax.
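
This works because the syntax is only a way of presenting the instructions: the machine code bytes in the middle column are identical in both listings. The sketch below decodes those exact bytes and renders each instruction in either syntax. It is far from a real disassembler. It assumes 32-bit code, handles only the handful of opcodes that appear in the listing, and the names decode and render are hypothetical.

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
GROUP1 = {0: "add", 5: "sub"}  # reg field of opcode 0x83; only these two handled

# The bytes from the objdump listing above.
CODE = bytes.fromhex("b802000000 83c008 01c0 83e805 40 40 48 48")


def decode(code: bytes):
    """Decode a few 32-bit opcodes into (mnemonic, dest, src) tuples."""
    insns, i = [], 0
    while i < len(code):
        op = code[i]
        if 0xB8 <= op <= 0xBF:                    # mov r32, imm32
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            insns.append(("mov", REG32[op - 0xB8], imm))
            i += 5
        elif op in (0x01, 0x83) and code[i + 1] >> 6 == 3:
            modrm = code[i + 1]
            reg, rm = (modrm >> 3) & 7, modrm & 7
            if op == 0x01:                        # add r32, r32
                insns.append(("add", REG32[rm], REG32[reg]))
                i += 2
            else:                                 # add/sub r32, imm8
                if reg not in GROUP1:
                    raise ValueError(f"unsupported 0x83 operation /{reg}")
                imm = code[i + 2]
                if imm & 0x80:                    # imm8 is sign-extended
                    imm |= 0xFFFFFF00
                insns.append((GROUP1[reg], REG32[rm], imm))
                i += 3
        elif 0x40 <= op <= 0x47:                  # inc r32 (32-bit mode only)
            insns.append(("inc", REG32[op - 0x40], None))
            i += 1
        elif 0x48 <= op <= 0x4F:                  # dec r32 (32-bit mode only)
            insns.append(("dec", REG32[op - 0x48], None))
            i += 1
        else:
            raise ValueError(f"unsupported opcode {op:#04x}")
    return insns


def render(insn, syntax: str) -> str:
    """Format one decoded instruction in 'att' or 'intel' syntax."""
    mnemonic, dest, src = insn
    att = syntax == "att"

    def fmt(operand):
        if isinstance(operand, str):              # register name
            return ("%" if att else "") + operand
        return ("$" if att else "") + hex(operand)   # immediate value

    operands = [fmt(o) for o in (dest, src) if o is not None]
    if att:
        operands.reverse()                        # AT&T puts the source first
    return mnemonic + " " + ",".join(operands)


if __name__ == "__main__":
    for insn in decode(CODE):
        print(render(insn, "att").ljust(16), render(insn, "intel"))
```

The decoding step never consults the syntax; only render does, which mirrors what the -M intel option changes in objdump.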

Assembling and linking

A program written in AT&T syntax can be assembled and linked using the GNU assembler (GAS, invoked as as) and the ld linker. The following commands build a 32-bit executable on a 64-bit system.

as <file>.s -o <file>.o --32
ld <file>.o -o <file> -m elf_i386

Similarly, programs written in Intel syntax can be assembled with NASM and linked with ld, as shown below. Again, the commands build a 32-bit executable on a 64-bit system.

nasm <file>.nasm -o <file>.o -f elf32
ld <file>.o -o <file> -m elf_i386
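
Each build is two separate steps: assemble the source into an object file, then link the object file into an executable. The small helper below spells out both steps for either toolchain, using only the flags shown above. It is a convenience sketch: the function name build_commands and the file name hello are hypothetical, and the helper only prints the commands rather than running them.

```python
import shlex


def build_commands(name: str, assembler: str):
    """Return the assemble and link command lines for a 32-bit ELF build."""
    if assembler == "gas":
        # AT&T syntax source, GNU assembler
        assemble = ["as", f"{name}.s", "-o", f"{name}.o", "--32"]
    elif assembler == "nasm":
        # Intel syntax source, NASM
        assemble = ["nasm", f"{name}.nasm", "-o", f"{name}.o", "-f", "elf32"]
    else:
        raise ValueError(f"unknown assembler: {assembler!r}")
    # The link step is the same for both toolchains.
    link = ["ld", f"{name}.o", "-o", name, "-m", "elf_i386"]
    return [assemble, link]


if __name__ == "__main__":
    for argv in build_commands("hello", "nasm"):
        # To actually run a step, pass argv to subprocess.run(argv, check=True).
        print(shlex.join(argv))
```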

See the next article in the series, x86 basics: Data representation, memory and information storage.


Sources:

X86 Assembly Bootloaders, Wikibooks
Assembly Language for x86 Processors, Kip Irvine
Modern X86 Assembly Language Programming, Daniel Kusswurm
Linux Assembly Language Programming, Bob Neveln

Posted: February 10, 2021

Srinivas


Srinivas is an Information Security professional with 4 years of industry experience in Web, Mobile and Infrastructure Penetration Testing. He is currently a security researcher at Infosec Institute Inc. He holds the Offensive Security Certified Professional (OSCP) certification. He blogs at www.androidpentesting.com. Email: srini0x00@gmail.com
