How to build a program and execute an application entirely built in x86 assembly

February 15, 2021 by Srinivas

In the previous article, we discussed an overview of common x86 instructions which can be used in writing assembly programs. With the knowledge we have gained so far, we are in a good position to begin writing our first assembly program.

This article provides an overview of how to build a program and execute an application entirely built in x86 assembly.

Input and output: x86 system calls

Operating systems contain routines to perform various low-level operations. If we want to invoke these operating system routines from our program, we need to invoke system calls. A system call is a bridge between the user program and the operating system routine. If we want to write a string to the output console, instead of writing the routine from scratch every time, we can make use of a routine that already exists in the operating system. This can be achieved using a system call.

According to Wikipedia, "A system call is how a program requests a service from an operating system's kernel. This may include hardware-related services (e.g., accessing the hard disk), creating and executing new processes, and communicating with integral kernel services (like scheduling). System calls provide an essential interface between a process and the operating system."

On an Ubuntu 20.04 Desktop x64 build, the following file lists the available x86 system calls and their associated system call numbers.

/usr/include/x86_64-linux-gnu/asm/unistd_32.h

Now, let us extract some system calls and their associated numbers.

The following excerpt shows the READ system call, which can be used to get user input.

$ cat /usr/include/x86_64-linux-gnu/asm/unistd_32.h | grep 'read'
#define __NR_read 3
#define __NR_readlink 85
#define __NR_readdir 89
#define __NR_readv 145
#define __NR_pread64 180
#define __NR_readahead 225
#define __NR_set_thread_area 243
#define __NR_get_thread_area 244
#define __NR_readlinkat 305
#define __NR_preadv 333
#define __NR_process_vm_readv 347
#define __NR_preadv2 378

As the first line of the output shows, the read system call has the system call number 3. Similarly, the following excerpt shows the write system call, which can be used to write output to the console.

$ cat /usr/include/x86_64-linux-gnu/asm/unistd_32.h | grep 'write'
#define __NR_write 4
#define __NR_writev 146
#define __NR_pwrite64 181
#define __NR_pwritev 334
#define __NR_process_vm_writev 348
#define __NR_pwritev2 379

As the first line of the output shows, the write system call has the system call number 4. When we want to use these system calls in our x86 assembly programs, we refer to them by their respective numbers.

Similarly, the following excerpt shows the exit system call.

$ cat /usr/include/x86_64-linux-gnu/asm/unistd_32.h | grep 'exit'
#define __NR_exit 1
#define __NR_exit_group 252

As we can see, the exit system call has the syscall number 1.

Hello World! Creating the usual Hello World in x86

Now that we understand what system calls are and how their numbers can be found, let us write a simple program in x86 assembly to print the string Hello, world!

We will use the write syscall to print the string. To better understand the arguments this syscall requires, we can read its man page using the following command.

man 2 write

The following is an excerpt from the output of the preceding command.

ssize_t write(int fd, const void *buf, size_t count);

The write function requires three arguments. The first is the file descriptor, which is stdout in this case and takes the value 1. The second is a pointer to the buffer holding the message we want to print. The third is the number of bytes to write, which here is the length of the string. When invoking system calls, we need to provide these values in the appropriate registers. The following is the convention for system calls made with int 0x80 on 32-bit x86 Linux.

The first argument goes into EBX; the second argument goes into ECX; the third argument goes into the EDX register. The syscall number goes into the EAX register.
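As a small illustration of this convention, the sketch below invokes the exit system call, whose number (1) appeared in the earlier excerpt. It is not one of the article's programs, and the exit status 7 is an arbitrary value chosen for the example.

```nasm
section .text
global _start

_start:
    mov eax, 1      ; syscall number for exit (from unistd_32.h)
    mov ebx, 7      ; first argument: exit status (arbitrary value for this sketch)
    int 0x80        ; trap into the kernel; exit does not return
```

Assembled with nasm -f elf32 and linked with ld -m elf_i386, this program should print nothing, and running echo $? in the shell afterwards should report 7.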

With all these details considered, the following is the program that prints the string Hello, world! to the output console.

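    section .text
    global _start

    _start:
        mov ebx,1        ; first argument: file descriptor 1 (stdout)
        mov ecx,msg      ; second argument: pointer to the message
        mov edx,len      ; third argument: number of bytes to write
        mov eax,4        ; syscall number for write
        int 0x80         ; invoke the system call

    section .rodata
    msg db  'Hello, world!',0xa    ; the string followed by a newline
    len equ $ - msg                ; length computed at assembly time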

The program has two sections: .text and .rodata. The .rodata section defines a string under the label msg. The write routine also requires the length of the string, so rather than hardcoding it we let the assembler compute it: len is defined as the difference between the current position ($) and msg.
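To illustrate how this length computation behaves with more than one string, here is a sketch that prints two messages. The labels bye and bye_len and the closing exit call are additions made for this example and are not part of the article's program. Each length constant is defined immediately after its own string, since otherwise the difference would also count the bytes of later definitions.

```nasm
section .text
global _start

_start:
    mov eax, 4          ; syscall number for write
    mov ebx, 1          ; stdout
    mov ecx, msg
    mov edx, len        ; 14: 13 characters plus the newline
    int 0x80

    mov eax, 4
    mov ebx, 1
    mov ecx, bye
    mov edx, bye_len    ; 9: 8 characters plus the newline
    int 0x80

    mov eax, 1          ; exit, added so the sketch terminates cleanly
    mov ebx, 0
    int 0x80

section .rodata
msg     db 'Hello, world!', 0xa
len     equ $ - msg     ; bytes emitted since msg
bye     db 'Goodbye!', 0xa
bye_len equ $ - bye     ; bytes emitted since bye
```

Because equ defines an assembly-time constant, len and bye_len occupy no space in the binary; the assembler substitutes the numbers directly into the mov instructions.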

In the .text section, we used the global directive to export _start, which marks the entry point of the program. At the entry point, we place the value 1 for stdout into EBX, a pointer to the string into ECX, and the length of the string into EDX. Lastly, we place the syscall number 4 into the EAX register. To invoke the syscall, we execute the instruction int 0x80.

We can then assemble and link the program using the following commands:

$ nasm helloworld.nasm -o helloworld.o -f elf32
$ ld helloworld.o -o helloworld -m elf_i386

Once done, we can run the program as shown below.

$ ./helloworld
Hello, world!
Segmentation fault (core dumped)
$

As the preceding output shows, the string Hello, world! is printed. However, the program then crashes with a segmentation fault. We will discuss segmentation faults and how to identify their causes in a later article.
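One thing the listing does not do is invoke exit, so after int 0x80 returns, the processor simply carries on past the last instruction of the program. The following is a hedged sketch of the same program with the exit system call (number 1, as found earlier) appended. Whether this matches the fix presented in the later article is an assumption.

```nasm
section .text
global _start

_start:
    mov ebx, 1      ; stdout
    mov ecx, msg    ; pointer to the message
    mov edx, len    ; number of bytes to write
    mov eax, 4      ; syscall number for write
    int 0x80

    mov eax, 1      ; syscall number for exit
    mov ebx, 0      ; exit status 0
    int 0x80        ; the process ends here instead of running on

section .rodata
msg db 'Hello, world!', 0xa
len equ $ - msg
```

Built with the same nasm and ld commands, this version is expected to print the string and return to the shell without the fault.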

Strings/ASCII: How to work with strings and ASCII in x86

In this section, let us extend our previous Hello, world! program to ask for input from the user and then print the entered text back to the screen.

The following program, written in x86 assembly, achieves this.

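    section .data
    question db "What is your name? "    ; 19 bytes
    greeting db "Hello, "                ; 7 bytes

    section .bss
    input resb 24                        ; 24-byte buffer for the user's input

    section .text
    global _start

    _start:
        call _printQuestion
        call _getInput
        call _printGreeting
        call _printInput

    _getInput:
        mov eax, 3          ; syscall number for read
        mov ebx, 0          ; file descriptor 0 (stdin)
        mov ecx, input      ; buffer that receives the input
        mov edx, 24         ; maximum number of bytes to read
        int 0x80
        ret

    _printQuestion:
        mov eax, 4          ; syscall number for write
        mov ebx, 1          ; file descriptor 1 (stdout)
        mov ecx, question
        mov edx, 19         ; length of question
        int 0x80
        ret

    _printGreeting:
        mov eax, 4
        mov ebx, 1
        mov ecx, greeting
        mov edx, 7          ; length of greeting
        int 0x80
        ret

    _printInput:
        mov eax, 4
        mov ebx, 1
        mov ecx, input
        mov edx, 24         ; writes the whole buffer
        int 0x80
        ret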

First, we defined the labels question and greeting with the strings that we want to print on the screen within the .data section, as shown below.

    section .data
    question db "What is your name? "
    greeting db "Hello, "

Next, the .bss section is used to reserve 24 bytes as shown below.

    section .bss
    input resb 24

Note that .bss (historically "block started by symbol") is the portion of an object file, executable, or assembly language program that holds statically allocated variables that are declared but have not been assigned a value yet.

Next is the .text section with code to read input and write the output. The entry point calls four subroutines, as shown below.

    section .text
    global _start

    _start:
        call _printQuestion
        call _getInput
        call _printGreeting
        call _printInput

All the subroutines except for _getInput are similar to the Hello, world! program we wrote earlier, as they only write output to the screen. So, let us focus on the _getInput subroutine in this section. Its assembly code is as follows.

    _getInput:
        mov eax, 3
        mov ebx, 0
        mov ecx, input
        mov edx, 24
        int 0x80
        ret

As we can see, the EAX register holds the value 3, which is the syscall number for read. The registers EBX, ECX and EDX hold the arguments required by the read system call. EBX holds 0, which stands for stdin. ECX holds a pointer to the buffer, labeled input, that receives the user-supplied input. EDX holds the maximum number of bytes to read, which is 24 in this case. Finally, we use int 0x80 to invoke the read syscall. After the syscall completes, the ret instruction returns control to the instruction following the call, which is the next call instruction in _start.
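The program always writes all 24 bytes of the buffer, regardless of how much the user typed. On Linux, read returns the number of bytes actually read in EAX, so a variant can pass that count on to write. The following is a sketch under that assumption. The label buf, the constant BUF_SIZE, the local label .done and the trailing exit call are names and additions made up for this example.

```nasm
BUF_SIZE equ 24

section .bss
buf resb BUF_SIZE           ; uninitialized input buffer

section .text
global _start

_start:
    mov eax, 3              ; syscall number for read
    mov ebx, 0              ; stdin
    mov ecx, buf
    mov edx, BUF_SIZE       ; read at most BUF_SIZE bytes
    int 0x80                ; EAX = bytes read, 0 at end of input, negative on error

    cmp eax, 0
    jle .done               ; nothing to echo

    mov edx, eax            ; length = bytes actually read
    mov eax, 4              ; syscall number for write
    mov ebx, 1              ; stdout
    mov ecx, buf
    int 0x80

.done:
    mov eax, 1              ; syscall number for exit
    mov ebx, 0              ; exit status 0
    int 0x80
```

When run from a terminal, typing infosec and pressing Enter should echo infosec back with its newline, since the newline is part of the bytes that read returns. The count has to be copied into EDX before EAX is overwritten with the write syscall number.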

The following commands can be used to assemble and link this program.

$ nasm io.nasm -o io.o -f elf32
$ ld io.o -o io -m elf_i386

Once the executable file is produced, we can run it and the output looks as follows.

$ ./io
What is your name? infosec
Hello, infosec
Segmentation fault (core dumped)
$

Once again, there is a segmentation fault after the program completed its execution. We will discuss the reasons and the solutions in a later article.

Conclusion

This article has provided foundational knowledge of how to build a program and execute an application entirely built in x86 assembly. Along the way, it has covered related concepts such as system calls, handling strings, declaring data, reading user input and writing data to the terminal. These concepts are the fundamental building blocks of x86 assembly programs, and they will come in handy when writing more complex ones.

See the next article in the series, Debugging your first x86 program.

Srinivas

Srinivas is an Information Security professional with 4 years of industry experience in Web, Mobile and Infrastructure Penetration Testing. He is currently a security researcher at Infosec Institute Inc. He holds the Offensive Security Certified Professional (OSCP) certification. He blogs at www.androidpentesting.com. Email: srini0x00@gmail.com