Buffer overflow is a very common and widely known software security vulnerability. It is a condition which occurs when more data is written into a block of memory (buffer) than it is allocated to hold. As buffers are created to hold a definite amount of data, the excess information stored gets overflowed to the adjacent buffers, causing overwriting or damaging the valid data which is already stored. In order to exploit buffer overflow, one should have basic knowledge about topics like stacks, CPU registers, memory allocation, etc.

As it is a very vast topic in itself, in this article we will try to understand the basics of the Stack Based Buffer Overflow.

First of all, we will create a simple C program and cover the basics, like how the program runs in the memory, how the function call takes place, what is the return address, etc. So let’s start with the basics.

A stack is a continuous block of memory which is used to store the temporary data within your program. The stack works on a Last in First out (LIFO) basis. PUSH and POP are the two working functions in which PUSH is used to put data into the stack and POP is used to remove the data from the stack, and it grows downwards towards lower memory addresses to higher memory addresses on Intel based systems. In Intel 32 bit architecture the maximum data size would be 4 bytes, which is equal to 32 bits for each PUSH and POP operation. Basically, the stack holds following types of data of the program.

  • Argument to the Function
  • Calling Function Address
  • Return Address
  • Local Variable
  • and couple of other things.

We will see each of them in detail further in this article. Before that, let us install some tools needed for the practical session. Here, we are using the following setup.

  • We have configured Windows XP Service Pack 2 on Virtual Machine.
  • Immunity Debugger (We can download it by searching Google or we can by clicking on the URL which is given in the references at the end of the article.)
  • Dev C ++ (We can download the tool by clicking the form below.)

Note: We are assuming that you have configured the required tools and have basic knowledge about assembly language.

First of all, we will start looking at the things like function call in the memory and return address etc. from the very starting stage. We have already written a simple C program in which we have defined a function which is called from the main program. The EXE file and source code of the program are given at the end of the article. We can download the EXE and the source code of the files by clicking the URL given at the end of the article. Let’s have a look at the source code of the program so that we can understand the basic concepts.

As seen in the above screen shot, we have written a simple program in which we have defined some local variables; after that we have called a function with one argument value. Then, we defined the function which will print the message – “Function is called”, then we returned value 1 to the main program.

After compiling the program, we will open the program with Immunity Debugger. We can open the program by clicking the file menu or dragging the .exe file into the debugger. After that, we will see the following type of code on the screen.

Now, we can see four sections on the screen. Each section represents a different type of CPU processing stat. Each section is defined below.

  1. In this section, we can see the de-assemble output of the .exe file.
  2. This section gives the information about various registers and their values. We can see the various type of registers and their values in the above screen.
  3. In this section, we can see the memory dump of the program. We can see what type of data has been written by the program into the memory. We can also edit these values dynamically according to the requirement.
  4. This is the most important part: it shows the stack status of the program.

If we closely look into the screen we can see ‘paused’ written in the right corner of the window. This means that after opening the program with the debugger, the debugger has paused the program, so we will have to start the program manually.

Now let’s start our analysis.

The first step is to identify the main function and the other functions we have defined in the program, which are then loaded into the computer memory. If we scroll down in the first area, which is the de-assemble output of the program, we can see the ASCII main function with the EXE name and another function with the assembly language code. In our case the EXE file name was First.exe so we can see the same on the screen with some extra value.

In the above screenshot we have pointed to some numbers. These numbers are defined in the below section for better understanding.

  • This is the main function in the assembly language. We can see it in the screenshot name “First.xxxxx ASCII “Main Function””. In this case, First.xxx is the name of the .exe file we have loaded into the debugger. In the left hand side, we can see the memory address according to each assembly instruction. This is basically the physical memory address of the instruction. 00401290 is the address in our case in which the program has been started. And 004012DA is the address (in our case) in the memory where the program has finished.
  • It is the function call from the main program. This line is basically calling the function which we have defined in the C program.

When we closely look at the main function call, we can see that it is calling the “First.004012D3” in which the last 8 digits are the address of the function which will be called.

  • In the end, we can see the function loaded into the memory, which is printing the value “Function is called” through the printf function, which has been called from the program. The function has started by pushing the EBP register into the stack. In our case, the starting address of the function is “004012DE”, and the function completion address is “004012FE”. Both the addresses can be seen in the above screenshot.

It is the most important part, as we will see how we can define the break point in the program. A break point helps us to freeze the program at that location and allows us to analyse things like register status, stack and frame pointer. It also allows us to change the values dynamically. So, let us create a break point just before the function call. We can set the break point by selecting the row by just clicking on it and pressing F2 for creating the break point.

As shown in the above screenshot after creating the break point, the row has been highlighted. Now, we need to start the program. We can do it by hitting the F9 key. We will see some changes in the numbers on the screen.

In the above screenshot we can see that the stack value has been changed and the register value has also been changed. Another interesting thing is the EIP register which holds the value at which we had set the break point.

Now, we will have to use two options.

  • Step Into
  • Step Over

Step Into: when we want to execute the next instruction after creating the break point, then we will have to use Step Into. The keyboard shortcut key for Step Into is F7.

Step Over: Sometimes we do not want to get into the details of the function call, so in such situations we can use Step Over. The keyboard shortcut key for Step Over is F8.

  • Now, we have to go to the next instruction, so press the F7 key as mentioned in the above step. Let us analyse the stack now.

As shown in the above screenshot, nothing interesting is showing in window no 4, but we can see in window no 1 that the next instruction has been executed, and the next instruction address has been assigned into the EIP register in window no 2.

  • We will execute the next instruction by pressing the F7 key. After this step, the screen will look like this:

We can see that the next instruction has been executed and we can see lots of changes in the screen. First of all, we have noticed that the instruction executed controller has reached the address given in the previous call function. This is visible in window no 1.

When we closely look into window no 4, at the top of the stack we can see the address 004012D4, which is the return address to the main program. It means after the function execution has completed, by using this address the counter will go to the main program and the program execution gets completed.

So in this article we have learned…

  • Analysing the Stack
  • Creating the Break Point
  • Analysing the Function Call and Register Status.