Malware analysis

Disassembly 101

August 13, 2019 by Richard Azu

Introduction

This article briefly explores topics connected to assembly basics, registers, operands, instructions, arithmetic instructions, logical instructions, stack instructions, conditionals and jump instructions. We’ll conclude with a reason why assembly language is still relevant despite the evolution of high-level languages.

This article is designed for professionals, students and self-learners who want to learn the key aspects of assembly programming. It aims to give you enough background to read and follow simple assembly code.

Computer programming language

A computer programming language is any one of various languages used for expressing a set of detailed instructions for a digital computer. Programming languages are broadly classified into three categories: machine, assembly and high-level languages.

Machine language

Machine language is also known as machine code. It is a sequence of bit patterns that’s used for providing instructions to the processor of a computer. These sequences of binary digits are not human-readable.

Assembly language 

Assembly language expresses instructions to the computer using short mnemonic abbreviations. Since assembly code is not directly understood by the computer, a translator is required to convert the instructions into machine language.

The utility program that converts source code programs from assembly language into machine language, so the Central Processing Unit (CPU) can understand it, is known as an assembler. The reverse conversion of machine language into assembly language is executed by a translator called a disassembler. 

High-level language

High-level languages express instructions to the computer using simple English words and mathematical symbols. They are sometimes described as being closer to human language because they are further removed from machine language. The translators which convert high-level language into machine language are called compilers and interpreters.

Assembly Basics

Structure of a computer system

The basic structure of a computer system is made up of the CPU, main memory and the input/output peripherals. The CPU is itself made up of registers, a control unit and an arithmetic and logic unit (ALU).

Registers

The CPU carries out all of a computer's tasks and operations. To do this effectively, it needs fast internal storage to hold the instructions it receives and the data it is working on. This storage is called a register. A register may hold an instruction, the address of another storage location or any kind of data, such as the binary code of a character.
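As a rough illustration of a register holding the binary code of a character, the Python sketch below models a 16-bit register whose high and low bytes can be read separately, in the style of the x86 register AX and its halves AH and AL. The helper names split_16bit and join_bytes and the high-byte value 12h are hypothetical, chosen only for this example.

```python
# Model of a 16-bit register whose two bytes can be viewed separately,
# in the style of x86 AX = AH:AL. Helper names are hypothetical.

def split_16bit(value):
    """Return (high_byte, low_byte) of a 16-bit register value."""
    value &= 0xFFFF                      # a 16-bit register holds only 16 bits
    return (value >> 8) & 0xFF, value & 0xFF


def join_bytes(high, low):
    """Combine two bytes into one 16-bit register value."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


# The character 'M' is stored as its numeric code, 77 decimal (4D hex).
low_byte = ord("M")
ax = join_bytes(0x12, low_byte)          # 12h is an arbitrary high byte
ah, al = split_16bit(ax)
print(f"AX={ax:04X}h AH={ah:02X}h AL={al:02X}h")  # AX=124Dh AH=12h AL=4Dh
```

The masking with 0xFF and 0xFFFF stands in for the fixed width of a real register, which Python integers do not have.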

Instructions 

Assembly language instructions come in two parts: the operation code (opcode) and the data to be operated on, the operands. A typical instruction in assembly language has two operands, the target operand and the source operand. The target operand is normally a register or memory location that receives the result, while the source operand supplies the value to use and may be a constant, a register or a memory location.

Assembly code example

MOV AL, 4Dh ; load register AL with 77 decimal (4D hex)

Equivalent binary code

10110000 01001101

  • 1011 is the binary code (opcode) of the instruction 'MOV' when it loads a constant into a register
  • 0 specifies whether the data is a byte ('0') or full size, 16/32 bits ('1')
  • 000 is the binary identifier for the register 'AL'
  • 01001101 is the binary representation of the decimal 77 (4D hex)
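The bit fields above can be put together in a few lines of Python. This is a minimal sketch of what an assembler and a disassembler do for this one instruction form only (an 8-bit register loaded with an 8-bit constant); the function names are hypothetical, and the register numbers follow the standard x86 numbering of the 8-bit registers.

```python
# Register numbers of the 8-bit registers in the x86 instruction encoding.
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}


def encode_mov_r8_imm8(reg, imm):
    """Assemble 'MOV reg, imm' for an 8-bit register and a one-byte constant."""
    if not 0 <= imm <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    opcode = (0b1011 << 4) | (0 << 3) | REG8[reg]   # 1011, size bit 0, register
    return bytes([opcode, imm])


def decode_mov_r8_imm8(code):
    """Disassemble the two bytes produced above back into text."""
    opcode, imm = code[0], code[1]
    if opcode >> 4 != 0b1011 or (opcode >> 3) & 1 != 0:
        raise ValueError("not a MOV of a byte constant into a register")
    names = {number: name for name, number in REG8.items()}
    return f"MOV {names[opcode & 0b111]}, {imm:02X}h"


machine_code = encode_mov_r8_imm8("AL", 0x4D)
print(" ".join(f"{byte:08b}" for byte in machine_code))  # 10110000 01001101
print(decode_mov_r8_imm8(machine_code))                  # MOV AL, 4Dh
```

A real assembler handles many more instruction forms and operand sizes; the point here is only that each field of the mnemonic maps to a fixed group of bits.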

Arithmetic instructions 

Below are sample arithmetic instructions available in assembly language.

  • INC AL       ; Increments the value in AL, the low byte of the primary accumulator, by 1
  • DEC AL       ; Decrements the value in AL, the low byte of the primary accumulator, by 1
  • ADD AX, BX   ; Adds the values stored in the primary accumulator AX and the base register BX, then stores the sum in AX
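Because registers have a fixed width, these instructions wrap around when a result no longer fits. The Python sketch below models that behavior for the three instructions listed above; the function names are hypothetical, and the carry value returned by add16 is a simplified stand-in for the processor's carry flag.

```python
def inc8(value):
    """Model INC on an 8-bit register such as AL: add 1, keep 8 bits."""
    return (value + 1) & 0xFF


def dec8(value):
    """Model DEC on an 8-bit register such as AL: subtract 1, keep 8 bits."""
    return (value - 1) & 0xFF


def add16(dest, src):
    """Model ADD on 16-bit registers: return (sum kept to 16 bits, carry)."""
    total = (dest & 0xFFFF) + (src & 0xFFFF)
    return total & 0xFFFF, total > 0xFFFF


print(hex(inc8(0x4D)))        # 0x4e
print(hex(inc8(0xFF)))        # 0x0  (wraps around)
print(hex(dec8(0x00)))        # 0xff (wraps around)
print(add16(0x0002, 0x0003))  # (5, False)
print(add16(0xFFFF, 0x0001))  # (0, True)
```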

Logical instructions 

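Assembly language logical instructions operate on a bit-by-bit basis: each bit of the result depends only on the corresponding bits of the operands, so no overflow or carry is generated. Typical logical operations include logical and (AND), logical or (OR), logical complement (NOT) and logical exclusive or (XOR). The AND operation can be used for clearing one or more bits in a register; for example, AND AL, 0Fh clears the upper four bits of AL and leaves the lower four unchanged.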
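Python's bitwise operators behave the same way as these instructions on individual bits, so they can be used to try the operations out. In the sketch below the starting value 10110110 and the mask values are arbitrary choices for illustration, and the function names are hypothetical.

```python
def and8(value, mask):
    """AND: keep only the bits that are 1 in the mask (clears the others)."""
    return value & mask & 0xFF


def or8(value, mask):
    """OR: force the bits that are 1 in the mask to 1."""
    return (value | mask) & 0xFF


def xor8(value, mask):
    """XOR: flip the bits that are 1 in the mask."""
    return (value ^ mask) & 0xFF


def not8(value):
    """NOT: flip every bit of an 8-bit value."""
    return ~value & 0xFF


al = 0b10110110
print(f"{and8(al, 0x0F):08b}")  # 00000110  upper four bits cleared
print(f"{or8(al, 0x01):08b}")   # 10110111  lowest bit set
print(f"{xor8(al, 0xFF):08b}")  # 01001001  every bit flipped
print(f"{not8(al):08b}")        # 01001001  same result as XOR with FFh
```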

Stack instructions 

The stack is a region of memory that stores data in such a way that the last item stored is the first to be retrieved (last in, first out). On x86 it grows downward, toward lower memory addresses. Data can only be added to or removed from the top of the stack. The most common stack instructions are PUSH and POP. PUSH puts new data at the top of the stack, while POP removes the data currently at the top of the stack.
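The last-in, first-out behavior can be sketched with a small Python class. It assumes 16-bit x86 behavior, where PUSH lowers the stack pointer by two bytes before storing and POP reads the value and then raises the pointer again; the class name, the dictionary standing in for memory and the starting address 0100h are all hypothetical.

```python
class Stack16:
    """Toy model of a 16-bit stack; a dict stands in for memory."""

    def __init__(self, top_address=0x0100):
        self.sp = top_address            # stack pointer
        self.memory = {}

    def push(self, value):
        """PUSH: move the stack pointer down, then store the value there."""
        self.sp -= 2
        self.memory[self.sp] = value & 0xFFFF

    def pop(self):
        """POP: read the value at the stack pointer, then move it back up."""
        value = self.memory.pop(self.sp)
        self.sp += 2
        return value


stack = Stack16()
for item in (0x1111, 0x2222, 0x3333):
    stack.push(item)

print(hex(stack.sp))     # 0xfa
print(hex(stack.pop()))  # 0x3333  last value pushed comes out first
print(hex(stack.pop()))  # 0x2222
print(hex(stack.pop()))  # 0x1111
print(hex(stack.sp))     # 0x100
```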

Conditionals

Assembly language jump instructions control the flow of execution of the program. They come in two kinds: unconditional jumps and conditional jumps.

An unconditional jump is performed by the JMP instruction. For conditional jumps, the CMP instruction first compares two operands and sets the appropriate flags, depending on the outcome. The conditional jump instruction then reads those flags and jumps only if its condition is met.

Unconditional jump

  • MOV  AX, 10    ; Initializing AX to 10
  • MOV  BX, 11    ; Initializing BX to 11
  • MOV  CX, 00    ; Initializing CX to 0
  • L17:
  • ADD  AX, 01    ; Increment AX
  • ADD  BX, AX    ; Add AX to BX
  • JMP  L17       ; Jump back to L17, repeating the two statements above
  • ADD  BX, AX    ; Add AX to BX (this line never runs because of the unconditional jump instruction JMP L17)
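To see how the register values evolve, the loop can be traced in Python. This sketch assumes the operands 10 and 11 are decimal values, and it stops after a chosen number of passes only so that the trace terminates; the assembly version has no exit and would repeat indefinitely. The function name run_loop is hypothetical.

```python
def run_loop(iterations):
    """Trace the JMP loop for a fixed number of passes (the real loop never ends)."""
    ax, bx, cx = 10, 11, 0               # MOV AX, 10 / MOV BX, 11 / MOV CX, 00
    for _ in range(iterations):          # L17: ... JMP L17
        ax = (ax + 1) & 0xFFFF           # ADD AX, 01
        bx = (bx + ax) & 0xFFFF          # ADD BX, AX
    return ax, bx, cx


for passes in range(4):
    print(passes, run_loop(passes))
# 0 (10, 11, 0)
# 1 (11, 22, 0)
# 2 (12, 34, 0)
# 3 (13, 47, 0)
```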

Conditional jump

  • CMP  DX, 01    ; Compare the DX value with one
  • JE   L17       ; If they are equal, jump to label L17 (a conditional jump that skips the next two instructions only if the value in register DX is one)
  • ADD  AX, 01    ; Increment AX
  • ADD  BX, AX    ; Add AX to BX
  • L17:
  • ADD  AX, 11    ; Add 11 to AX
  • ADD  BX, AX    ; Add AX to BX
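The two possible paths through this listing can be traced with a short Python function. It is a sketch under stated assumptions: AX and BX start at zero (the listing does not say), the operand 11 is decimal, and the zero flag is modeled as a plain boolean. The function name run_conditional is hypothetical.

```python
def run_conditional(dx, ax=0, bx=0):
    """Trace the CMP/JE listing for a given DX; AX and BX start values are assumed."""
    zero_flag = ((dx - 1) & 0xFFFF) == 0  # CMP DX, 01 sets the zero flag if equal
    if not zero_flag:                     # JE L17 is not taken, so fall through
        ax = (ax + 1) & 0xFFFF            # ADD AX, 01
        bx = (bx + ax) & 0xFFFF           # ADD BX, AX
    # L17: reached on both paths
    ax = (ax + 11) & 0xFFFF               # ADD AX, 11
    bx = (bx + ax) & 0xFFFF               # ADD BX, AX
    return ax, bx


print(run_conditional(dx=1))  # (11, 11)  jump taken, two instructions skipped
print(run_conditional(dx=0))  # (12, 13)  jump not taken
```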

Conclusion

The ability to read and write low-level assembly language remains a valuable skill despite the evolution of high-level languages. Assembly language is used in writing device drivers, real-time systems and low-level embedded systems. It is also central to reverse engineering, where it is used to establish the vulnerabilities or logical flow of computer programs in a real-world running environment.
