IDA: Cross references / Xrefs

January 11, 2013 by Dejan Lukan

Cross references can help us determine where certain functions were called from, which is useful for a number of reasons. Let's say that we've found a function we're interested in: maybe it contains vulnerable code we could use to execute malicious shellcode, or maybe it performs the actual encryption of the data we're interested in. Once we've located the function, we must determine how program execution reaches it. Without cross references, we would have to set a breakpoint on the function, run the program until the breakpoint is hit, and then inspect the previous stack frames to determine which functions were called on the way to the current one. We would have to backtrace the program execution manually, function by function, which can take a lot of time. IDA can do this for us automatically with cross references.

Cross references show how a particular piece of information is accessed. The information can be either code or data, so we distinguish between code and data cross references. Code cross references tell us the locations from which a function is called (or a location is jumped to), while data cross references tell us the locations from which a variable is accessed.

Cross references are shown in the disassembly view, as can be seen in the picture below:

In the picture above we can see the function at address 0x00436629, which contains a cross reference comment: CODE XREF. Code cross references are presented as "CODE XREF", whereas data cross references are presented as "DATA XREF"; in our case it's a code cross reference. The cross reference originates from the address 0x00436608, so the current function is called from that address. We also see an arrow pointing up, which indicates that the calling code is located above the current function in the disassembly (it has a lower virtual address). After the arrow there's a single letter that tells us what kind of execution transfer it was: a function call or a conditional/unconditional jump. The letter 'j' denotes a jump, while the letter 'p' denotes a function (procedure) call.
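
The anatomy of such a comment can be captured in a few lines of Python. The sketch below is only a model of the layout described above, not IDA's own code. It assumes the comment is built from three parts: a location (function name plus hexadecimal offset, or a bare virtual address), an arrow giving the direction of the referencing code, and a one-letter type. The helper name `xref_comment` and the `kind` strings are hypothetical, and real IDA comments may contain extra details such as a segment prefix.

```python
from typing import Optional

# One-letter suffixes as described in the text (the kind names are hypothetical).
XREF_TYPE_CHARS = {
    "call": "p",    # code: function (procedure) call
    "jump": "j",    # code: conditional or unconditional jump
    "write": "w",   # data: location is written to
    "read": "r",    # data: location is read from
    "offset": "o",  # data: address of the location is taken
}
CODE_KINDS = {"call", "jump"}


def xref_comment(src_ea: int, dst_ea: int, kind: str,
                 func_name: Optional[str] = None,
                 func_start: Optional[int] = None) -> str:
    """Model of an IDA-style xref comment at dst_ea for a reference from src_ea."""
    prefix = "CODE XREF" if kind in CODE_KINDS else "DATA XREF"
    if func_name is not None and func_start is not None:
        # The referencing instruction lies inside a known function: name+offset.
        where = f"{func_name}+{src_ea - func_start:X}"
    else:
        # Otherwise fall back to the bare virtual address.
        where = f"{src_ea:08X}"
    # Up arrow: the referencing code has a lower address than the target.
    arrow = "\u2191" if src_ea < dst_ea else "\u2193"
    return f"{prefix}: {where}{arrow}{XREF_TYPE_CHARS[kind]}"


# The call from 0x00436608 to the function at 0x00436629 in the picture above.
comment = xref_comment(0x00436608, 0x00436629, "call")
print(comment)
```

Running it prints `CODE XREF: 00436608↑p`, an upward call reference like the one discussed above.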

Data cross references are shown in the picture below:

We can see that data cross references are marked with the "DATA XREF" string. They identify the location the data was accessed from by function name and offset if the function is known, or by virtual address if it is not. All the data cross references in the picture above have an arrow pointing up, which tells us that these data variables are accessed from code at a lower virtual address. The data cross references also have different letters after the arrow: 'w' denotes that the location is written to, 'r' that it is read from, and 'o' that its address is taken (an offset reference). In the picture above we can also see three dots ('…'), which tell us that other locations in the program also reference this location, but those cross references are not displayed. To display more than two cross references, we can go to Options - General - Cross-references and change "Number of displayed xrefs" from 2 to a larger number.
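
The effect of the "Number of displayed xrefs" setting can be modelled as simple truncation. In this sketch, which is an assumption about the visible behaviour and not IDA's implementation, only the first `max_displayed` comments are kept. Whenever something was hidden, the three dots are appended to the last visible comment. The function name and the sample comment strings are hypothetical.

```python
from typing import List


def visible_xref_lines(xrefs: List[str], max_displayed: int = 2) -> List[str]:
    """Return the xref comment lines that fit into the disassembly view.

    max_displayed mirrors the "Number of displayed xrefs" option (default 2).
    """
    shown = list(xrefs[:max_displayed])
    if len(xrefs) > len(shown):
        # Some references are hidden: mark the last visible line with '...'.
        if shown:
            shown[-1] += " ..."
        else:
            shown.append("...")
    return shown


# Fifteen hypothetical references to the same data location.
refs = [f"DATA XREF: sub_{0x44A000 + i * 0x100:X}+10\u2191r" for i in range(15)]
for line in visible_xref_lines(refs):
    print(line)
```

With the default of 2, two lines are printed and the second one ends with the dots. Passing 5 gives five lines, still ending with the dots, and a limit of 15 or more shows every reference without them.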

Often a location is referenced so many times that we can't display all of its cross references in the disassembly comments without making the code hard to read. Let's take a look at the following data cross reference (shown in the picture below):

We can see that the locations 0x0044A7CE and 0x0044B1ED reference the data at address 0x00474B94, but other cross references also exist (note the three dots '…'). Let's change "Number of displayed xrefs" in the IDA options to 5 and take a look at the cross references again (shown below):

Now five cross references are displayed, but more of them are still hidden. We could raise the number of displayed xrefs to 10, 20, 30, etc., but that would make the view messy. A better way is to place the cursor on the 0x00474B94 line and select View - Open subviews - Cross references, which opens a list of all cross references to our data variable. That list can be seen in the picture below:

We can see that there are 15 cross references, all of them listed. Double-clicking an entry takes us to the address of that cross reference. The first column shows the direction (up or down) in which the referencing location lies. The second column shows whether it's a read, write or pointer (offset) cross reference. The third column shows the address where the cross reference is located, and the last column shows the disassembled instruction that references our data variable.
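
The columns of this list can be derived from the target address and each referencing instruction alone. The sketch below builds such rows from a hand-written list. Inside IDA, the same raw data is available to IDAPython scripts through `idautils.XrefsTo(ea)`, whose items carry the referencing address in `frm`. The `XrefRow` class, the read/write types and the instruction texts are hypothetical, and the Address column is simplified to a bare virtual address.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class XrefRow:
    direction: str  # "Up" if the reference comes from a lower address
    kind: str       # "r" (read), "w" (write) or "o" (offset)
    address: str    # where the referencing instruction is located
    text: str       # the disassembled referencing instruction


def xref_rows(target_ea: int, refs: List[Tuple[int, str, str]]) -> List[XrefRow]:
    """Build one row per (address, kind, instruction) reference, sorted by address."""
    return [
        XrefRow("Up" if ea < target_ea else "Down", kind, f"{ea:08X}", text)
        for ea, kind, text in sorted(refs)
    ]


# Two references to 0x00474B94 (types and instruction texts are made up).
rows = xref_rows(0x00474B94, [
    (0x0044B1ED, "w", "mov dword_474B94, eax"),
    (0x0044A7CE, "r", "mov eax, dword_474B94"),
])
for row in rows:
    print(row.direction, row.kind, row.address, row.text, sep="\t")
```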

We've seen how data cross references are presented, but what about functions? To find out which functions call the function we're currently in, we can select View - Open subviews - Function calls. An example can be seen below:

The picture displays the addresses from which the current function is called. Only one function calls the current function, and the call is located at 0x0042733B. The lower part of the window lists the functions called by the current function. This view shows how functions are related to one another, which is a great help when reverse engineering a program.
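
Both halves of the Function calls window (callers at the top, callees at the bottom) can be answered from a flat list of call instructions. The sketch below assumes that each call is recorded as its call-site address plus the calling and the called function. The names `Call`, `function_calls`, `current`, `caller`, `helper_a` and `helper_b` are placeholders, and so are the addresses, except the call site 0x0042733B from the picture.

```python
from typing import List, NamedTuple, Tuple


class Call(NamedTuple):
    site: int     # address of the call instruction
    caller: str   # function containing the call
    callee: str   # function being called


def function_calls(func: str, calls: List[Call]) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Return (callers, callees) of func as (function, call site) pairs."""
    callers = sorted((c.caller, c.site) for c in calls if c.callee == func)
    callees = sorted((c.callee, c.site) for c in calls if c.caller == func)
    return callers, callees


calls = [
    Call(0x0042733B, "caller", "current"),
    Call(0x00427400, "current", "helper_a"),  # hypothetical call sites
    Call(0x00427420, "current", "helper_b"),
]
callers, callees = function_calls("current", calls)
print("called from:", [(f, hex(ea)) for f, ea in callers])
print("calls:", [(f, hex(ea)) for f, ea in callees])
```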

There are a number of graphs we can use to display cross references and other information about the program being analyzed. Selecting View - Graphs - Flow chart displays the current function's basic blocks: sequences of instructions that are executed as a whole once the first instruction of the block is entered. An example flow chart of the sub_4025DB function can be seen below:

We can see that the function has quite a lot of basic blocks. Let's take a closer look at the top three blocks of the graph above:

We can see that each block contains a straight-line sequence of instructions: a block ends with a jump (conditional or unconditional), and the jump targets start new blocks. The graph also shows the jumps between the blocks, which can be very valuable. We've just seen how a function can be divided into several basic blocks, which makes the jump instructions inside a specific function easy to follow.
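
The rule for where one block ends and the next begins can be written down directly. The sketch below uses the classic "leader" approach on a toy instruction list. The first instruction, every jump target and every instruction following a jump or return start a new block, while calls do not split blocks. The `(address, mnemonic, target)` tuples are an assumed representation. IDA computes its own blocks, which IDAPython exposes through `idaapi.FlowChart`.

```python
from typing import List, Optional, Tuple

# (address, mnemonic, jump target or None): an assumed toy representation.
Insn = Tuple[int, str, Optional[int]]


def ends_block(mnem: str) -> bool:
    """Jumps (jz, jnz, jmp, ...) and returns end a basic block; calls do not."""
    return mnem.startswith("j") or mnem in ("ret", "retn")


def basic_blocks(insns: List[Insn]) -> List[List[Insn]]:
    """Split a function's instructions (sorted by address) into basic blocks."""
    if not insns:
        return []
    addresses = {ea for ea, _, _ in insns}
    leaders = {insns[0][0]}  # the first instruction always starts a block
    for i, (ea, mnem, target) in enumerate(insns):
        if ends_block(mnem):
            if target in addresses:
                leaders.add(target)           # a jump target starts a block
            if i + 1 < len(insns):
                leaders.add(insns[i + 1][0])  # so does the fall-through successor
    blocks: List[List[Insn]] = []
    for insn in insns:
        if insn[0] in leaders:
            blocks.append([])
        blocks[-1].append(insn)
    return blocks


toy = [
    (0x10, "cmp", None),
    (0x12, "jz", 0x18),
    (0x14, "call", None),  # a call does not end the block
    (0x16, "jmp", 0x1A),
    (0x18, "xor", None),
    (0x1A, "retn", None),
]
for block in basic_blocks(toy):
    print([hex(ea) for ea, _, _ in block])
```

The toy function yields four blocks: `0x10-0x12`, `0x14-0x16`, `0x18` and `0x1A`.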

Besides the flow chart of a single function's basic blocks, we can also display a graph of all the functions inside the executable by selecting View - Graphs - Function calls. The function calls of the entire Meterpreter executable can be seen in the picture below:

We can't really read the text in the nodes, but let's focus on the colors for now. Purple is used for library functions, which becomes evident if we zoom in on the first part of the image, shown below:

Did you notice that all of the above function names are actually library functions? The green color is used to denote the entry point of the executable as seen below:

All the black nodes are regular functions defined in the executable itself. Usually there are too many nodes and connections to make this graph useful, so it's rarely used. Another reason is that we can only zoom in or out, which is very tedious if the graph is large, and we can't search for specific functions.
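
One way around the missing search is to work with the call graph as text. The sketch below writes a Graphviz DOT description that reuses the color scheme described above: green for the entry point, purple for library functions, black for everything else. A file like this can be searched with any editor and rendered with Graphviz. This is an independent sketch, not something IDA produces. The edge list, the function names and the `call_graph_dot` helper are hypothetical.

```python
from typing import Iterable, Set, Tuple


def call_graph_dot(edges: Iterable[Tuple[str, str]], library: Set[str], entry: str) -> str:
    """Render caller -> callee edges as Graphviz DOT, colored like IDA's call graph."""
    edge_list = sorted(set(edges))
    nodes = sorted({name for edge in edge_list for name in edge} | {entry})
    lines = ["digraph calls {"]
    for name in nodes:
        if name == entry:
            color = "green"   # entry point of the executable
        elif name in library:
            color = "purple"  # library function
        else:
            color = "black"   # function defined in the executable itself
        lines.append(f'  "{name}" [color={color}];')
    for caller, callee in edge_list:
        lines.append(f'  "{caller}" -> "{callee}";')
    lines.append("}")
    return "\n".join(lines)


dot = call_graph_dot(
    [("start", "sub_401000"), ("sub_401000", "_memset"), ("start", "_exit")],
    library={"_memset", "_exit"},
    entry="start",
)
print(dot)
```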

We've already talked about how IDA can help us determine which function calls the current function and which functions are called by the current function. We haven't yet mentioned that IDA can also generate graphs from the cross references. First, the cursor should be located in a function that contains at least one cross reference, as follows:

We can see that the address 0x004068C8 (which is inside the current function at 0x00406811) has a code cross reference from sub_406811: one of the jump instructions in that function targets this address. In other words, we're looking at a location inside the current function, and the cross reference points back into the very function we're located in. To display a graph of all functions that call the current function, recursively, up to the entry point of the program, we can select View - Graphs - Xrefs to, which opens the following window:

We can see that the node circled in blue represents the function we're currently in, and the other nodes show how the program execution reaches that function.
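
What the Xrefs to graph shows can be thought of as every chain of calls that leads from the entry point down to the current function. The sketch below enumerates those chains by walking a callers map backwards with a depth-first search. It skips functions already on the current path, so recursion cannot make it loop forever. The callers map and all function names except sub_406811 are hypothetical. IDA's graph may also contain callers that are never reached from the entry point, and this sketch leaves those out.

```python
from typing import Dict, List, Set


def call_chains(callers: Dict[str, Set[str]], target: str, entry: str) -> List[List[str]]:
    """Return every caller chain from `entry` down to `target`."""
    chains: List[List[str]] = []

    def walk(func: str, path: List[str]) -> None:
        if func == entry:
            chains.append(path[::-1])  # the path was built target-first
            return
        for caller in sorted(callers.get(func, set())):
            if caller not in path:  # skip cycles caused by recursion
                walk(caller, path + [caller])

    walk(target, [target])
    return chains


# Hypothetical callers map: function -> functions that call it.
callers = {
    "sub_406811": {"sub_401000", "sub_402000"},
    "sub_402000": {"sub_401000"},
    "sub_401000": {"start"},
}
for chain in call_chains(callers, "sub_406811", "start"):
    print(" -> ".join(chain))
```

This prints two chains: `start -> sub_401000 -> sub_406811` and `start -> sub_401000 -> sub_402000 -> sub_406811`.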

We can also create a graph of the functions called by the current function by selecting View - Graphs - Xrefs from, which opens the window in the picture below:

We can see that we're starting from the current function at address 0x00406811 and displaying all the other functions that get called from this function.

The Xrefs to and Xrefs from graphs have the same problem as the function call graph: they can only be zoomed in and out, which isn't enough when we're dealing with a very complex program that has thousands of nodes and even more connections between them. This is why IDA has another graph, accessible via View - Graphs - User xrefs chart, which we can control in various ways. This graph can join the Xrefs to and Xrefs from graphs into one, and it can be instructed to show only part of the total graph, which is handy when dealing with large graphs. The preferences for this graph are presented in the picture below:

In the picture above we can see that the start and end addresses are equal, which means that the graph will present all of the Xrefs to and Xrefs from the currently chosen function. If they differ, the graph will only contain references in the selected range. There are also a number of parameters we can choose when generating the graph. The "cross references to" and "cross references from" options instruct IDA to search for symbols leading to and from the current function, as we've already seen. The "recursive" option makes IDA follow those references recursively from the currently selected function. The "follow only current direction" option makes IDA keep following each reference in the direction it was found (callers of callers, callees of callees) instead of also adding the other parents and children of the nodes found along the way. The "recursion depth" option limits how far the graph extends from the selected function; -1 means there is no limit, so IDA displays all reachable nodes.

The "ignore" option specifies which nodes to exclude from the resulting graph. The options are: externals, data, from library functions and to library functions. We can select any of these four types to leave them out of the graph, which can produce a much simpler graph that is easier to read. The printing options control whether each node includes the function's comments from the disassembly view, and whether special '…' nodes are printed whenever a node has more cross references that won't be followed because of the recursion limit.
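
The interplay of the "recursive", "follow only current direction", "recursion depth" and "ignore" options can be modelled as a breadth-first search over reference edges. The sketch below is a simplified model, not IDA's algorithm. It assumes every reference is an edge from the referencing node to the referenced node. It also collapses IDA's four ignore checkboxes into a set of hypothetical node kinds (such as "data"). All names in the example graph except sub_406811 are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]  # (referencing node, referenced node)


def xrefs_chart(edges: Iterable[Edge], start: str, *,
                xrefs_to: bool = True, xrefs_from: bool = True,
                recursive: bool = True, follow_current_direction: bool = True,
                depth: int = -1, kinds: Optional[Dict[str, str]] = None,
                ignore: Iterable[str] = ()) -> Tuple[Set[str], Set[Edge]]:
    """Return the nodes and edges of a simplified "user xrefs chart"."""
    kinds = kinds or {}
    ignored = set(ignore)
    succ: Dict[str, Set[str]] = {}
    pred: Dict[str, Set[str]] = {}
    for src, dst in edges:
        succ.setdefault(src, set()).add(dst)
        pred.setdefault(dst, set()).add(src)

    # Without recursion only direct neighbours are shown (depth 1);
    # -1 means "no limit", as in the dialog.
    max_depth = depth if recursive else 1
    nodes: Set[str] = {start}
    chart_edges: Set[Edge] = set()
    queue: Deque[Tuple[str, str, int]] = deque()
    if xrefs_to:
        queue.append((start, "to", 0))
    if xrefs_from:
        queue.append((start, "from", 0))
    seen: Set[Tuple[str, str]] = set()
    while queue:
        node, direction, level = queue.popleft()
        if (node, direction) in seen:
            continue
        seen.add((node, direction))
        if max_depth != -1 and level >= max_depth:
            continue
        neighbours = pred.get(node, set()) if direction == "to" else succ.get(node, set())
        for nxt in sorted(neighbours):
            if kinds.get(nxt) in ignored:
                continue
            nodes.add(nxt)
            chart_edges.add((nxt, node) if direction == "to" else (node, nxt))
            # Either keep walking the same way, or expand parents and children alike.
            for d in ([direction] if follow_current_direction else ["to", "from"]):
                queue.append((nxt, d, level + 1))
    return nodes, chart_edges


EDGES = [
    ("start", "sub_401000"),
    ("sub_401000", "sub_406811"),
    ("sub_406811", "sub_407000"),
    ("sub_406811", "dword_474B94"),
    ("sub_409000", "sub_407000"),  # another caller of one of our callees
]
KINDS = {"dword_474B94": "data"}

default_nodes, _ = xrefs_chart(EDGES, "sub_406811", kinds=KINDS)
print(sorted(default_nodes))
```

With the defaults, the chart contains the chain of callers up to `start` and the direct callees. Turning off `follow_current_direction` also pulls in `sub_409000`, a caller of one of the callees, which is how the graph grows unreadable. `recursive=False` or `depth=1` leaves only the direct neighbours, and `ignore={"data"}` drops the data node.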

If we display the new graph with the default options, we get a graph similar to the one below:

Notice that the graph is the same as the previously generated Xrefs to and Xrefs from graphs joined together. The graph is quite simple because the executable is very small, so we don't really need the user xrefs chart here; the default Xrefs to and Xrefs from graphs are enough. But let's try various options of the user xrefs chart to better understand them.

If we disable the recursive option, we get the following graph, which only shows the nodes directly connected to the current node sub_406811. The xrefs to/from were not followed recursively, so only direct ancestors and descendants are visible.

If we disable "follow only current direction" to include all the parents and children of the found nodes, we get the graph presented below. The graph is completely unreadable, since many more symbols are now included. This suggests that we should keep the "follow only current direction" option enabled at all times.

If we enter the number 1 into the recursion depth field, we get the graph presented below, since we limited IDA to recurse only one level deep:

If we ignore the Data and Externals, the generated graph will look something like this (the purple nodes will be hidden):

Conclusion

In this article we've looked at cross references, which are a valuable resource when we want to figure out exactly where a function was called from and which functions it calls. This spares us from walking the stack frames manually to find the caller of the current function: IDA already provides this information for free.

Dejan Lukan is a security researcher for InfoSec Institute and a penetration tester from Slovenia. He is very interested in finding new bugs in real-world software products with source code analysis, fuzzing and reverse engineering. He also has a great passion for developing his own simple scripts for security-related problems and learning about new hacking techniques. He knows a great deal about programming languages, as he can write in a couple of dozen of them. He is also passionate about antivirus bypassing techniques, malware research and operating systems, mainly Linux, Windows and BSD. He also has his own blog available here: http://www.proteansec.com/.