Malware analysis

Important Code Constructs in Assembly Language: Advanced

We covered some basic operations and conditional statements in Part 1. In this article, we will look at how to recognize some more complex data structures in assembly: arrays, structs, and linked lists. So without further ado, let's start.

Arrays

Arrays store data items of the same type. Like other variables, arrays can be defined globally or locally, and the same global/local distinction covered in Part 1 applies. Consider the code below:


int i;
int x[10];   /* ten elements, one per loop iteration */

for (i = 0; i < 10; i++)
{
    x[i] = i;
}

The code is self-explanatory: on each of the ten passes through the loop (i<10), it assigns the current value of i to the element of array x at index i. The main thing is to recognize how the assembly code traverses the array. To do that, we need to find the size of each element and the base address from which every element of the array is reached. Let's look at the assembly code.

    mov     [ebp+var_4], 0          ; i = 0
    jmp     short loc_123456
loc_987654:
    mov     eax, [ebp+var_4]
    add     eax, 1                  ; i++
    mov     [ebp+var_4], eax
loc_123456:
    cmp     [ebp+var_4], 10
    jge     short loc_234567        ; leave the loop once i >= 10
    mov     ecx, [ebp+var_4]        ; ecx = i, used as the index
    mov     [ebp+ecx*4+var_8], ecx  ; x[i] = i
    jmp     short loc_987654

As we can see, first the normal 'for' loop initialization and comparison happen. Then the current value of i is moved into ecx. Why? It is used as the index into array x. In the operand [ebp+ecx*4+var_8], var_8 is the base address of array x, and ecx is multiplied by 4 because each int element is 4 bytes wide. Try substituting values of ecx such as 0, 1, 2, 3 and you will see how each memory location of the array is assigned its value. So when recognizing arrays, look out for these constructs: a base address combined with a scaled index variable.
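To make the base-plus-scaled-index pattern concrete, here is a minimal C sketch of the same address arithmetic written out by hand. The helper names element_offset and fill_by_address are hypothetical and not part of the original example, and a 4-byte element is assumed by using int32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of element `index` from the array base.
   With element_size == 4 this is the ecx*4 term of the operand. */
static size_t element_offset(size_t index, size_t element_size)
{
    return index * element_size;
}

/* Hypothetical helper: stores i into element i for i = 0..count-1,
   using explicit base + offset arithmetic instead of the x[i] shorthand. */
static void fill_by_address(int32_t *x, size_t count)
{
    assert(x != NULL);
    unsigned char *base = (unsigned char *)x;   /* plays the role of the array base */
    for (size_t i = 0; i < count; i++) {
        int32_t *slot = (int32_t *)(base + element_offset(i, sizeof(int32_t)));
        *slot = (int32_t)i;                     /* same effect as x[i] = i */
    }
}
```

Calling fill_by_address on a ten-element int32_t array gives the same result as the loop in the C example; element 3, for instance, lives 12 bytes past the base.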

Structs

When you need to store heterogeneous data, which arrays cannot hold, another data structure called a struct helps. Structs can hold items of different data types. If you remember my PE series, there are a lot of structs in the data directory. Consider the following code, which shows a struct definition, a variable declaration, and memory allocation:

#include <stdlib.h>

struct Test_struct              // struct definition
{
    int x[3];
    char b;
};

struct Test_struct *Test;       // variable declaration

void testing(struct Test_struct *a)
{
    int i;
    a->b = 'l';
    for (i = 0; i < 3; i++)
    {
        a->x[i] = i;
    }
}

int main(void)
{
    Test = (struct Test_struct *) malloc(sizeof(struct Test_struct)); // memory allocation for struct
    testing(Test);
    return 0;
}

Note that I have declared a pointer to a struct and therefore use -> to access its members. With a plain struct variable, the dot operator (.) is used instead. Let's look at the assembly code. Since the code has several constructs, we will look at its assembly in pieces as well.
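As a quick illustration of the two access forms, the sketch below reads the same member through a struct variable and through a pointer. The function names are hypothetical and exist only for this illustration; the struct mirrors the one in the example.

```c
#include <assert.h>

struct Test_struct {
    int  x[3];
    char b;
};

/* Access through a struct variable: dot operator. */
static char read_with_dot(struct Test_struct s)
{
    return s.b;
}

/* Access through a pointer: arrow operator. */
static char read_with_arrow(const struct Test_struct *p)
{
    return p->b;
}

/* p->b is shorthand for (*p).b, so all three forms read the same member. */
static void check_equivalence(const struct Test_struct *p)
{
    assert(p->b == (*p).b);
    assert(read_with_dot(*p) == read_with_arrow(p));
}
```

Both forms name the same member. The difference is only whether you start from the struct itself or from its address.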

main:
    push    ebp
    mov     ebp, esp
    push    10h                     ; sizeof(struct Test_struct) = 16 bytes
    call    malloc
    add     esp, 4                  ; clean up the malloc argument
    mov     dword_126785, eax       ; Test = pointer returned by malloc
    mov     eax, dword_126785
    push    eax                     ; pass Test as the argument
    call    loc_234567              ; testing(Test)
    add     esp, 4

………

loc_234567:
    push    ebp
    mov     ebp, esp
    push    ecx                     ; reserve 4 bytes for local i (var_4)
    mov     eax, [ebp+arg_0]        ; eax = a, base address of the struct
    mov     byte ptr [eax+0Ch], 6Ch ; a->b = 'l'
    mov     [ebp+var_4], 0          ; i = 0
    jmp     short loc_123456
loc_987654:
    mov     eax, [ebp+var_4]
    add     eax, 1                  ; i++
    mov     [ebp+var_4], eax
loc_123456:
    cmp     [ebp+var_4], 3
    jge     short loc_12345         ; leave the loop once i >= 3
    mov     eax, [ebp+var_4]        ; eax = i (index)
    mov     edx, [ebp+var_4]        ; edx = i (value to store)
    mov     ecx, [ebp+arg_0]        ; ecx = base address of the struct
    mov     [ecx+eax*4], edx        ; a->x[i] = i
    jmp     short loc_987654

This looks a bit complex at first, but taken instruction by instruction it is simple. First, the result of sizeof, 10h (16 bytes on 32-bit x86: three 4-byte integers, one char, and 3 bytes of padding), is pushed onto the stack as the argument to malloc. Then malloc is called and the stack is cleaned up with add esp, 4. After that, dword_126785 holds the base address of the structure, and it is passed to the testing function via push eax. In the testing function, the base address is read from arg_0 into eax. The byte at offset 0Ch from that address is member b, which receives the character 'l' (6Ch); it sits right after the 12 bytes of the integer array. The loop counter i lives in var_4. The integer array starts at offset 0 of the struct, so its elements are referenced as [ecx+eax*4], where ecx holds the base address of the struct and eax is the index that traverses the array. The key observation is that members of different types, a char and an integer array, live inside the same data structure and are reached from the same base address. So this structure cannot be an array.
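If you want to check where a compiler places each member, the standard offsetof macro and sizeof report the layout directly. The sketch below is only an illustration: the wrapper function names are hypothetical, and the concrete numbers in the comments assume a 4-byte int, which is typical for x86 compilers but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stddef.h>

struct Test_struct {
    int  x[3];   /* 3 * sizeof(int) bytes, starting at offset 0 */
    char b;      /* placed directly after the array */
};

/* Hypothetical wrappers around offsetof/sizeof for the struct above. */
static size_t offset_of_x(void)    { return offsetof(struct Test_struct, x); }
static size_t offset_of_b(void)    { return offsetof(struct Test_struct, b); }
static size_t size_of_struct(void) { return sizeof(struct Test_struct); }

/* With a 4-byte int: x at offset 0, b at offset 12 (0Ch),
   and the whole struct padded to 16 bytes (10h). */
static void check_layout(void)
{
    assert(offset_of_x() == 0);
    assert(offset_of_b() == 3 * sizeof(int));
    assert(size_of_struct() > offset_of_b());
}
```

The member offsets computed this way are the displacements to look for next to the struct's base register in a disassembly, and the struct size is the value passed to malloc.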

Linked List

Data items in a linked list are linked to each other, i.e. every record points to the next data item. There is no guarantee, however, that the records are stored contiguously or in list order in memory. To identify this data structure, look for an object that contains a pointer to another object of the same type. Keep in mind that in this article we are only talking about singly linked lists. Consider the following linked list implementation with a Node structure.

#include <stdlib.h>

typedef struct node
{
    int a;
    struct node *next;
} Node;

int main(void)
{
    Node *start, *temp;
    int i;
    start = NULL;
    for (i = 0; i < 5; i++)
    {
        temp = (Node *) malloc(sizeof(Node)); // pointer to base address of structure
        temp->a = i;        // storing i to a
        temp->next = start; // point to next node
        start = temp;       // saving the value of temp in start
    }
    return 0;
}

The corresponding assembly code looks like this:

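    push    ebp
    mov     ebp, esp
    mov     [ebp+var_4], 0          ; start = NULL
    mov     [ebp+var_8], 0          ; i = 0
loc_987654:
    cmp     [ebp+var_8], 5
    jge     short loc_12345         ; leave the loop once i >= 5
    push    8                       ; sizeof(Node) = 8 bytes
    call    malloc
    add     esp, 4                  ; clean up the malloc argument
    mov     [ebp+var_C], eax        ; temp = pointer returned by malloc
    mov     edx, [ebp+var_C]
    mov     eax, [ebp+var_8]
    mov     [edx], eax              ; temp->a = i
    mov     eax, [ebp+var_4]
    mov     edx, [ebp+var_C]
    mov     [edx+4], eax            ; temp->next = start
    mov     eax, [ebp+var_C]
    mov     [ebp+var_4], eax        ; start = temp
    jmp     short loc_123456
loc_123456:
    mov     eax, [ebp+var_8]
    add     eax, 1                  ; i++
    mov     [ebp+var_8], eax
    jmp     short loc_987654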

So what happens in this assembly is that malloc first returns the address of the first node, which is stored in var_C (temp). Then the value of i is stored at the start of that node ([edx]), and NULL, the current value of start, is stored in its next field at [edx+4]. Finally, the node's address is saved in start (var_4). On the second pass, a second node is allocated, its address is stored in var_C, and the value of i is again stored in it. This time [edx+4] receives the address of the first node, and at the end of the pass the address of the second node is saved in start, so that the third node will point to the second node, and so on, thus forming a list. There can be many variations of this assembly code. The pattern to recognize is a next field that points to another struct of the same type, again and again, thus forming a linked list.
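The pointer-chasing pattern is easiest to see when the list is also walked, not just built. The sketch below is a minimal illustration under the same Node definition; the function names build_list, list_length, and free_list are hypothetical additions, and the allocation check and cleanup are extras that the original loop does not have.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node {
    int a;
    struct node *next;
} Node;

/* Releases every node by following next until NULL. */
static void free_list(Node *start)
{
    while (start != NULL) {
        Node *next = start->next;
        free(start);
        start = next;
    }
}

/* Pushes 0..count-1 at the head, like the loop in the example.
   Returns NULL if an allocation fails. */
static Node *build_list(int count)
{
    assert(count >= 0);
    Node *start = NULL;
    for (int i = 0; i < count; i++) {
        Node *temp = malloc(sizeof(Node));
        if (temp == NULL) {
            free_list(start);
            return NULL;
        }
        temp->a = i;          /* value stored at the start of the node */
        temp->next = start;   /* link to the previously created node */
        start = temp;         /* the new node becomes the head */
    }
    return start;
}

/* Counts nodes by repeatedly loading the next pointer. */
static int list_length(const Node *start)
{
    int n = 0;
    for (const Node *p = start; p != NULL; p = p->next)
        n++;
    return n;
}
```

Because each node is pushed at the head, the list built from i = 0..4 is visited in the order 4, 3, 2, 1, 0. In a disassembly, the traversal loop shows up as a register that is repeatedly reloaded from a fixed offset of itself until it becomes zero.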

That is all we will discuss about data structures. As the goal of this series and the PE article series is to develop malware analysis skills, I think we can now start looking at actual malware analysis, which I will begin in the next article.

Posted: April 13, 2016

Security Ninja
