Demystifying dot NET reverse engineering: Introducing Round-trip engineering
After covering the basics of dot NET reverse engineering in the first articles (see the references), it’s time to go deeper into the dot NET MSIL assembly language. The purpose of this article is not to teach you how to program in this language. I’ll try to clarify the IL code you have seen so far, show how to deal with it using some “new” tools, and present the technique of “round trip engineering”, which is not specific to dot NET, to see how we can be creative and use it to our advantage as reversers.
In this article I’ll present IL assembly in much more depth and show how two synchronized tools can be used to remove the first protection of a Crack Me I made for this purpose. Once you can handle these tools and know the “basics” of round trip engineering, a second article on advanced round trip engineering will perform more advanced manipulation on the same Crack Me to remove the second protection, so take a deep breath because here we go!
Definition of “Round trip engineering”
The IL assembler and disassembler were first made as strictly internal tools used to facilitate the development of the Common Language Runtime. When both tools became synchronized enough, third party dot NET oriented compilers based on ILAsm started to appear, and the two tools were released to the developers’ community along with the dot NET Framework class library, compilers and tools.
Technically, round trip engineering is the ability of two or more software development tools to be synchronized. In the dot NET context, it means taking a managed assembly, generated from any high level programming language such as Microsoft Visual C# or Microsoft Visual Basic .NET, or from a low level language like ILAsm, as long as a dot NET oriented compiler was used; disassembling it; modifying the code by adding, removing or changing IL; then reassembling it into a new, modified, working assembly / module.
Note: Managed dot NET applications are called “assemblies” and a managed dot NET executable is called a “module”.
The subject’s core
In this article, we will work exclusively with two tools officially released by Microsoft within the Windows SDK tools: the IL assembler (ILASM) and the IL disassembler (ILDASM, which was presented in previous parts of “Demystifying dot NET reverse engineering”; please refer to the references section). Theoretically, we can reverse engineer every dot NET based assembly / module using only these two tools.
ILDASM, the disassembler, can be found under the folder C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin. ILASM, the assembler, can be found under C:\WINDOWS\Microsoft.NET\Framework\vx.x (depending on the version of the framework used to produce the assembly we want to modify).
Before starting the analysis of our target (not yet presented), I will clarify in depth some dot NET aspects, starting with the Common Language Runtime.
The Common Language Runtime is a layer between dot NET assemblies and the operating system on which they are supposed to run. As you know by now (hopefully), every dot NET assembly is “translated” into a low level intermediate language (Common Intermediate Language, CIL, which was earlier called Microsoft Intermediate Language, MSIL). Whatever high level language the assembly was developed in, the resulting intermediate code is independent of the target platform. This kind of “abstraction” makes interoperation between different development languages possible.
The Common Intermediate Language is based on a set of specifications guaranteeing this interoperation; this set of specifications is known as the Common Language Specification (CLS), as defined in the Common Language Infrastructure standard of Ecma International and the International Organization for Standardization, ISO (a link to download Partition I is listed in the references section).
Dot NET assemblies and modules which are designed to run under the Common Language Runtime (CLR) are composed essentially of Metadata and Managed Code.
Managed code is the set of instructions that makes up the “core” of the assembly / module functionality. It represents the application’s functions and methods encoded into the abstract and standardized form known as MSIL or CIL; “managed code” is Microsoft’s name for code that runs exclusively under the CLR.
Metadata, on the other hand, is an ambiguous term, and can be described as “data that describes data.” In our context, metadata is a system of descriptors concerning the “content” of the assembly; it refers to a data structure embedded within the low level CIL that describes the high level structure of the code. It describes the relationships between classes, their members, the return types, global items, and method parameters. To generalize (always in the context of the Common Language Runtime), metadata describes all items that are declared or referenced in a module.
Based on this, we can say that the two principal components of a module are metadata and IL code. The CLR itself is subdivided into two major subsystems, the “loader” and the just-in-time compiler.
The loader parses the metadata and builds in memory a kind of layout / pattern representation of the inner structure of the module; then, depending on the result, the just-in-time compiler (also called the jitter) compiles the Intermediate Language code into the native code of the platform concerned.
The figure below describes how a managed module is created and executed:
Our target, a managed module named “CrackMe3-InfoSecInstitute-dotNET-Reversing.exe” (link to download in the references section), looks like this:
It shows a nag screen that we have to remove and a Windows form asking for a serial number. We will discover together how we can use some creative round trip engineering to remove all the protections present in this Crack Me.
Step 1: Disassembling
First of all, let’s disassemble our Crack Me using ILDASM and see the IL code embedded in our managed module. To avoid getting lost, let’s just focus on two nodes:
Our managed module has only one form, called Form1, as seen in the picture above, and a class with the appealing name “GenSerial”. Let’s expand the first node and see:
Double click on the Form1_Load method to see actual IL code:
.method private instance void Form1_Load(object sender, class [mscorlib]System.EventArgs e) cil managed
{
  // Code size 19 (0x13)
  .maxstack 8
  IL_0000: ldstr "I'm a nag screen, remove me."
  IL_0005: ldc.i4.s 16
  IL_0007: ldstr "Nagging you!"
  IL_000c: call valuetype [Microsoft.VisualBasic]Microsoft.VisualBasic.MsgBoxResult [Microsoft.VisualBasic]Microsoft.VisualBasic.Interaction::MsgBox(object, valuetype [Microsoft.VisualBasic]Microsoft.VisualBasic.MsgBoxStyle, object)
  IL_0011: pop
  IL_0012: ret
} // end of method Form1::Form1_Load
All ILAsm keywords are marked in bold to increase code readability. I’ll take you through this piece of code line by line to clarify what it does and what we will have to do.
Understanding the Form1_Load() method:
.method private instance void Form1_Load(…) cil managed defines the metadata item Method Definition.
The keywords private and instance define the flags of the Method Definition item. The keyword private means that the method Form1_Load() can be accessed only from within the class that declares it, here Form1. The keyword instance tells us that the method is associated with an object rather than with a class.
The keyword void explicitly defines the return type of the current method. Void means that this method does not return any value.
The keywords cil and managed indicate that the method body is expressed in Intermediate Language (IL); they define the implementation flags of the Method Definition.
.maxstack 8 is the directive that specifies the maximum number of items that can be present on the evaluation stack at any time during method execution.
IL_0000: is a label and does not occupy anything in memory. ILDASM marks every line (instruction) with a label; labels are not compiled and are used exclusively to identify offsets within the IL code at compile time.
As you know, IL is strictly a stack based language: everything MUST pass through the evaluation stack, and every instruction puts something onto, or takes something (or nothing) from, the top of the evaluation stack. When we talk about pushing / pulling elements onto / from the stack, we are talking in terms of items, regardless of their sizes.
ldstr "I'm a nag screen, remove me." creates an object of type string from the given string constant and loads a reference to this object onto the evaluation stack. This kind of constant is stored in the metadata; such a common language runtime string constant, or metadata string constant, is always stored in Unicode (UTF-16).
ldc.i4.s 16 is the short form (notice the “.s”) of the instruction that loads the value 16 of type int32 onto the evaluation stack. In this sample, it loads the message box style that displays the critical message icon.
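call valuetype [Microsoft.VisualBasic]Microsoft.VisualBasic.MsgBoxResult [Microsoft.VisualBasic]Microsoft.VisualBasic.Interaction::MsgBox(object, valuetype [Microsoft.VisualBasic]Microsoft.VisualBasic.MsgBoxStyle, object) calls the (non virtual) method MsgBox of the class Interaction. We provide the full signature of the method, including the library name in square brackets in front of each type. The keyword valuetype precedes the name of a value type (here the enumerations MsgBoxResult and MsgBoxStyle), just as class precedes the name of a reference type; these keywords are obligatory in generic type instantiations since these are represented in metadata as TypeSpecs. The call takes its three arguments from the evaluation stack and leaves the returned MsgBoxResult value on it.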
The instruction pop removes the top item from the stack. Here that item is the MsgBoxResult value returned by MsgBox, which the method does not use.
The return instruction ret returns immediately to the call site. Depending on the called method, ret returns one value of a certain type, and a value of this type has to be on the evaluation stack. The method concerned here returns void, meaning the evaluation stack must be empty at the moment of return (as it is in this case).
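To make the stack discipline described above concrete, here is a minimal sketch in Python (it is not part of the Microsoft tooling) that replays the six instructions of Form1_Load on a toy evaluation stack and records the stack depth after each one. The function name run_il and the tuple encoding of the instructions are hypothetical, and the call is modelled only as “take three arguments, push one result”, which is all we need here.

```python
def run_il(instructions):
    """Replay a tiny subset of IL on a list used as the evaluation stack.

    Returns a list of (opcode, stack depth after the instruction).
    """
    stack = []
    trace = []
    for opcode, operand in instructions:
        if opcode in ("ldstr", "ldc.i4.s"):
            # Both instructions push exactly one item, whatever its size.
            stack.append(operand)
        elif opcode == "call":
            # Toy model: operand = (number of arguments, result or None).
            arg_count, result = operand
            del stack[len(stack) - arg_count:]
            if result is not None:
                stack.append(result)
        elif opcode == "pop":
            stack.pop()
        elif opcode == "ret":
            trace.append((opcode, len(stack)))
            break
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
        trace.append((opcode, len(stack)))
    return trace


# The body of Form1_Load, encoded by hand for this sketch.
FORM1_LOAD = [
    ("ldstr", "I'm a nag screen, remove me."),
    ("ldc.i4.s", 16),
    ("ldstr", "Nagging you!"),
    ("call", (3, "MsgBoxResult")),
    ("pop", None),
    ("ret", None),
]

if __name__ == "__main__":
    for opcode, depth in run_il(FORM1_LOAD):
        print(f"{opcode:<10} depth={depth}")
```

In this model the depth peaks at 3 just before the call, well under the 8 items allowed by .maxstack, and it is back to 0 when ret executes, as a void method requires.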
Now we know that this method (Form1_Load) does one thing only: it prepares and shows the nag screen, and we have to get rid of it.
Technically we have to generate an .il file from our assembly (by dumping it from ILDasm), manipulate it and reassemble it, but before doing this we have to know the version of our assembly so we can recompile it (using ILASM) without getting problems.
We can use ILDasm to get this kind of information by viewing the assembly manifest content:
.ver directive specifies the version number of the assembly:
Image 1 Assembly version
Now let’s dump our assembly by clicking File->Dump->Ok (or Ctrl+D), choose a directory and save it. You should get something like this:
The file with the .il extension contains every IL instruction present in the managed module. Open it using your favorite text editor and try to find the Form1_Load method:
At this point many possibilities are available to us. We can remove all the instructions inside our method, remove the whole method, or force a ret at the beginning. I prefer to remove the content of this method, which means transforming Form1_Load() into a method that does nothing!
Remove from line number 1192 to line number 1201 and save modifications:
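If you would rather script this edit than delete lines by hand, the following Python sketch does a comparable job on the dumped text. It assumes the dump follows the usual ILDASM layout (the header line contains both .method and the method name, and braces sit at the start of their own lines). The helper name stub_out_method is hypothetical, and keeping a single ret in the emptied body is my assumption about what a “method that does nothing” should contain.

```python
def stub_out_method(il_text, method_name):
    """Return il_text with the body of `method_name` reduced to a bare ret."""
    lines = il_text.splitlines()

    # Find the method definition (not a call site): it carries ".method".
    header = next(
        (i for i, line in enumerate(lines)
         if ".method" in line and f" {method_name}(" in line),
        None,
    )
    if header is None:
        raise ValueError(f"method {method_name} not found")

    # The body starts at the first line that begins with "{".
    open_idx = next(
        (i for i in range(header, len(lines))
         if lines[i].strip().startswith("{")),
        None,
    )
    if open_idx is None:
        raise ValueError("opening brace not found")

    # Walk to the matching "}" so nested blocks (.try and so on) are skipped.
    depth = 0
    close_idx = None
    for i in range(open_idx, len(lines)):
        stripped = lines[i].strip()
        if stripped.startswith("{"):
            depth += 1
        elif stripped.startswith("}"):
            depth -= 1
            if depth == 0:
                close_idx = i
                break
    if close_idx is None:
        raise ValueError("unbalanced braces in method body")

    brace_line = lines[open_idx]
    indent = " " * (len(brace_line) - len(brace_line.lstrip()) + 2)
    stub = [indent + ".maxstack 8", indent + "ret"]
    return "\n".join(lines[:open_idx + 1] + stub + lines[close_idx:]) + "\n"
```

To use it, read the dumped .il file into a string, call stub_out_method(text, "Form1_Load"), and write the result back before reassembling. Keep a copy of the original dump, since the sketch does no validation beyond brace matching.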
Now we need to reassemble our modified .il file. For this we will use the ILAsm assembler, a tool provided with Visual Studio and the Windows SDK. With it, we will be able to generate portable executables from Microsoft Intermediate Language.
Step 2: Reassembling
You still remember the version of our assembly, which is 4.0.30319 (Image 1), so the version of ILASM we are going to use is the one under C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319.
We have to go through the Windows Command Prompt (CMD) or the Visual Studio Command Prompt. Either way, the result remains the same.
ilasm filename.il -res=filename.res
Here filename is the full path to the .il file. The -res parameter is optional and is used to keep the resources of the original assembly (like icons).
If the compilation process was successful you will get:
Resolving local member refs: 0 -> 0 defs, 0 refs, 0 unresolved
Writing PE file
Operation completed successfully
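For repeated experiments it can be handy to build this command line from a script. The sketch below only assembles the argument list; the helper name build_ilasm_command and the file names in the usage note are hypothetical, and the framework folder is the one mentioned above. It spells the resource option in its long form, /resource=, and adds /exe and /output=, which are standard ILAsm options.

```python
from pathlib import PureWindowsPath

# Folder named in this article for the 4.0.30319 framework.
FRAMEWORK_DIR = PureWindowsPath(r"C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319")


def build_ilasm_command(il_file, res_file=None, output=None,
                        framework_dir=FRAMEWORK_DIR):
    """Return the ILAsm invocation as a list of arguments."""
    cmd = [str(PureWindowsPath(framework_dir) / "ilasm.exe"), str(il_file), "/exe"]
    if res_file is not None:
        # Optional: keeps the resources dumped next to the .il file.
        cmd.append(f"/resource={res_file}")
    if output is not None:
        cmd.append(f"/output={output}")
    return cmd
```

On a Windows machine the list returned by, for example, build_ilasm_command("CrackMe3.il", res_file="CrackMe3.res") can be handed to subprocess.run(cmd, check=True), which raises an error if the assembler reports a failure.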
And you get your new, modified, working file, without the nag screen:
Note: You can change the texts that appear on the message box by changing the strings loaded onto the stack, and change the style of the message box by changing the value loaded by ldc.i4.s:
OKOnly (0): Displays OK button only.
OKCancel (1): Displays OK and Cancel buttons.
AbortRetryIgnore (2): Displays Abort, Retry, and Ignore buttons.
YesNoCancel (3): Displays Yes, No, and Cancel buttons.
YesNo (4): Displays Yes and No buttons.
RetryCancel (5): Displays Retry and Cancel buttons.
Critical (16): Displays Critical Message icon.
Question (32): Displays Warning Query icon.
Exclamation (48): Displays Warning Message icon.
Information (64): Displays Information Message icon.
DefaultButton1 (0): First button is default.
DefaultButton2 (256): Second button is default.
DefaultButton3 (512): Third button is default.
ApplicationModal (0): Application is modal. The user must respond to the message box before continuing work in the current application.
SystemModal (4096): System is modal. All applications are suspended until the user responds to the message box.
MsgBoxSetForeground (65536): Specifies the message box window as the foreground window.
MsgBoxRight (524288): Text is right-aligned.
MsgBoxRtlReading (1048576): Specifies text should appear as right-to-left reading on Hebrew and Arabic systems.
Table 1 Visual Basic MsgBoxStyle enumeration values (source: Microsoft MSDN)
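Since the style is a single integer combining several of these values, a small decoder helps when you meet an unfamiliar operand after ldc.i4. The Python sketch below uses the member names and numeric values of the Visual Basic MsgBoxStyle enumeration; the function name decode_msgbox_style is hypothetical, and splitting the integer into button, icon and default-button fields with the masks 0xF, 0xF0 and 0xF00 is my assumption, modelled on the Win32 message box flags that the enumeration mirrors.

```python
BUTTONS = {
    0: "OKOnly", 1: "OKCancel", 2: "AbortRetryIgnore",
    3: "YesNoCancel", 4: "YesNo", 5: "RetryCancel",
}
ICONS = {16: "Critical", 32: "Question", 48: "Exclamation", 64: "Information"}
DEFAULT_BUTTONS = {0: "DefaultButton1", 256: "DefaultButton2", 512: "DefaultButton3"}
EXTRA_FLAGS = {
    65536: "MsgBoxSetForeground",
    524288: "MsgBoxRight",
    1048576: "MsgBoxRtlReading",
}


def decode_msgbox_style(value):
    """Split a MsgBoxStyle integer into the names of its components."""
    names = [BUTTONS.get(value & 0xF, "unknown buttons")]
    icon = ICONS.get(value & 0xF0)
    if icon is not None:
        names.append(icon)
    names.append(DEFAULT_BUTTONS.get(value & 0xF00, "unknown default button"))
    # ApplicationModal is the value 0, so it is simply the absence of SystemModal.
    names.append("SystemModal" if value & 4096 else "ApplicationModal")
    names.extend(name for bit, name in EXTRA_FLAGS.items() if value & bit)
    return names


if __name__ == "__main__":
    # 16 is the operand of ldc.i4.s in Form1_Load.
    print(decode_msgbox_style(16))
```

For the value 16 used by our Crack Me, the decoder reports an OK-only box with the critical icon; replacing 16 with, say, 4 + 32 would give a Yes / No box with the question icon.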
The second problem we are facing is much more complicated.
If you want to bypass the serial validation or want to calculate the correct serial, this may be relatively easy.
I want to show you how we can take advantage of some “creative” round trip engineering and let the Crack Me do something that it was not supposed to do, like tell us the correct serial number!
References
- ECMA Partition I: http://www.ecma-international.org/publications/standards/Ecma-335.htm
- Crack Me#3: http://www.mediafire.com/?r73aumddbt06b7d
- MSIL Disassembler: http://msdn.microsoft.com/en-us/library/ceats605.aspx
- MSIL Assembler: http://msdn.microsoft.com/en-us/library/496e4ekx.aspx