Introduction

The objective of this article is to introduce the .NET native language, the Common Intermediate Language (CIL), which lays the foundation for .NET reverse engineering. Here, you will learn the distinction between CIL directives, attributes and opcodes, and come across several CIL tools that play a significant role in code execution. The motivation for writing this article is to provide a close examination of CIL grammar.

The source code of any software or executable application is the intellectual property of its vendor and is normally not disclosed. Without the source code, we have to rely on the compiled binary, which for a .NET application contains CIL, so it is necessary to delve into CIL before moving on to code disassembling. Apart from that, the forthcoming articles of this series discuss some advanced concepts related to reverse engineering, such as round-trip engineering, obfuscation and code disassembling, using tools such as IDA Pro, OllyDbg, a hex editor, Ilasm and Reflector.

Abstract

MSIL (Microsoft Intermediate Language), also called CIL, is an essential part of the CLR, and code that is written for and executed under the CLR is referred to as “managed code”. The managed compiler translates source code (for example, a *.cs file) into CIL code, a manifest and metadata. A managed application typically undergoes two compilation phases. The first is performed by the language compiler, which transforms source code into MSIL. The second occurs at run time, when the MSIL code is compiled to native code. The .NET platform is considered language-independent because the execution process of a managed application is identical regardless of the source language. Finally, CIL is a full-fledged .NET programming language, with its own syntax and compiler.

The beauty of MSIL code is that it is compiled once and can execute anywhere: the JIT compiler compiles assemblies into native binary code that targets a specific platform. You can write an application and deploy it to Windows, Linux, Mac and other platforms that support a .NET runtime.

Prerequisite

In order to execute and examine MSIL/CIL code, configure your machine with the following tools:

  • .NET Framework 3.5 or higher
  • Either SharpDevelop or Xamarin Studio
  • Visual Studio Command Prompt
  • IL Disassembler (ildasm.exe)
  • Reflector

Understanding CIL

When you build a .NET assembly using your managed language of choice (C#, VB, F#, Perl, COBOL), the associated compiler translates your source code into Common Intermediate Language. CIL is itself just another structured .NET programming language, so it is possible to build .NET assemblies directly using CIL and the CIL compiler (ilasm.exe) that ships with the .NET Framework.

The better you understand the grammar of CIL, the better able you are to move into the arena of advanced .NET programming. A programmer with comprehensive knowledge of CIL can perform the following tasks:

  1. Disassemble an existing assembly, edit the CIL code, and recompile the updated code.
  2. Access every aspect of the CTS and CLS, since CIL is the only .NET language that exposes all of them.
  3. Build in-house dynamic assemblies using the System.Reflection.Emit namespace API.

CIL does not simply define a general set of keywords such as public, private, new, get, set, this. Rather, the token set understood by the CIL compiler is sub-divided into three categories. Each category of CIL token is expressed using a particular syntax. The three categories are:

CIL Directive

Directives are represented syntactically using a single dot prefix (.class, .assembly). CIL directives are the set of CIL tokens used to describe the structure of a .NET assembly. They inform the CIL compiler how to define the namespaces, classes and methods that will populate an assembly.

CIL Attributes

Sometimes a CIL directive is not descriptive enough to fully express the definition of a given type. In that case, the directive can be further qualified with various CIL attributes that specify how it should be processed.

CIL Opcodes

Opcodes (operation codes) provide the implementation logic once a .NET assembly, namespace and type have been defined in terms of CIL code.
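
To make the three token categories concrete, the following is a minimal annotated sketch of a complete *.il file. The assembly and class name TokenDemo is a hypothetical name chosen for illustration; the comments label each token as a directive, an attribute or an opcode.

```cil
// TokenDemo.il (hypothetical file and assembly name, for illustration only)
.assembly extern mscorlib {}          // directive: reference an external assembly
.assembly TokenDemo {}                // directive: name this assembly

// ".class" is a directive; "public auto ansi" are attributes that qualify it.
.class public auto ansi TokenDemo extends [mscorlib]System.Object
{
  // ".method" is a directive; "public static" and "cil managed" are attributes.
  .method public static void Main() cil managed
  {
    .entrypoint                       // directive: marks the program entry point
    .maxstack 1                       // directive: maximum evaluation stack depth
    ldstr "Three kinds of tokens"     // opcode: push a string reference
    call void [mscorlib]System.Console::WriteLine(string)  // opcode: call a method
    ret                               // opcode: return to the caller
  }
}
```

Directives always carry the dot prefix, attributes follow a directive on the same declaration, and opcodes appear only inside a method body.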

Despite its numerous advantages, CIL programming has some drawbacks, chiefly the difficulty of maintaining safe code. Hand-written CIL does not get the checks that a high-level language compiler performs, so mistakes can easily produce invalid or unverifiable code.

First CIL Program

We need a code editor in order to author our first CIL program, for instance Notepad or WordPad, but it is better to use a full-fledged open source .NET IDE such as SharpDevelop or Xamarin Studio. These integrate with the existing .NET FCL and recognize CIL directives automatically. No matter which IDE or editor we use, the important point is to save the CIL code file with the *.il extension.

The following code illustrates a first hello world program in the CIL programming language. Open Notepad, enter the following code, and save the file as Test.il:

.assembly extern mscorlib {}
.assembly FirstApp
 {

 }

 .namespace FirstApp
 {
     .class private auto ansi beforefieldinit Test
     {
         .method public hidebysig static void Main(string[] argd) cil managed
         {
              .entrypoint
              .maxstack    1
              ldstr        "Welcome to CIL programming world"
              call         void [mscorlib] System.Console::WriteLine(string)
              ret
         }
     }
 }

File: Test.il

CIL code Compilation

After you finish coding, save the file as Test.il and compile it with ilasm.exe, the tool shipped with the .NET Framework, using the following command:

ILASM /exe /debug Test.il

Here the /exe option indicates that the target is an executable (*.exe) rather than a library. The /debug option asks the compiler to generate a debug file (Test.pdb) for the application, which is useful for viewing source code in a debugger or disassembler.


After Test.il compiles successfully, Test.exe is created in the project directory. Running it prints the message Welcome to CIL programming world.


When building or modifying assemblies using CIL code, it is always advisable to verify that the compiled binary image is a well-formed .NET image using the peverify.exe utility, for example by running peverify Test.exe.


If peverify reports no errors, all opcodes within the Test.exe binary are valid CIL. The CIL compiler also has numerous command-line options, including the following:

Flag      Description
/dll      Produces a *.dll file as output.
/exe      Produces an *.exe file as output. This is the default option.
/debug    Includes debug information.
/output   Specifies the output file name and extension.
/key      Compiles the *.il file with a strong name, using the given key (*.snk) file.

In the Test.il source file above, the first declaration is an external reference to the mscorlib library. mscorlib.dll contains the core of the .NET Framework FCL, which includes the System.Console class. The second assembly directive gives the simple name of the assembly, which is FirstApp, and the third directive defines the namespace.

.assembly extern mscorlib {}
.assembly FirstApp
{

}
// class namespace
.namespace FirstApp
{ ……}

The following lines define a class and a method within the class. The .class directive introduces a class named Test, declared private here, which implicitly inherits from System.Object. The .method directive defines the public static Main as a member method. The cil keyword indicates that the method body contains intermediate code.

.class private auto ansi beforefieldinit Test
    {
        .method public hidebysig static void Main(string[] argd) cil managed
        { …}
    }

The Main method commences with two directives. The .entrypoint directive designates Main as the entry point of the application. The .maxstack directive sets the maximum depth of the evaluation stack to 1 slot. The remaining lines are opcodes rather than directives: ldstr pushes a reference to the string onto the stack, call pops that item from the stack and passes it to the WriteLine method, which displays it, and ret returns from the method.

.entrypoint
.maxstack    1
ldstr     "Welcome to CIL programming world"
call      void [mscorlib] System.Console::WriteLine(string)
ret
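
As a further illustration of how .maxstack relates to the instructions in a method body, here is a small sketch that needs two stack slots instead of one. The assembly and class name StackDemo is hypothetical; the comments trace the assumed contents of the evaluation stack after each instruction.

```cil
// StackDemo.il (hypothetical name, for illustration only)
.assembly extern mscorlib {}
.assembly StackDemo {}

.class private auto ansi beforefieldinit StackDemo extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 2          // two values are on the stack at the deepest point
    ldstr "Hello, "      // stack: "Hello, "
    ldstr "CIL"          // stack: "Hello, ", "CIL"
    call string [mscorlib]System.String::Concat(string, string)  // stack: "Hello, CIL"
    call void [mscorlib]System.Console::WriteLine(string)        // stack: empty
    ret
  }
}
```

Because two strings are on the stack just before Concat is called, declaring .maxstack 1 here would be expected to fail verification with peverify.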

CIL Code Post-mortem Analysis

CIL is much easier to understand and interpret than assembly language. CIL source code is case sensitive like C#, but statements are not terminated with a semicolon. Apart from that, the most significant parts of a CIL application are the dot-prefixed directives and the actual executable code. The CIL grammar defines several categories of directives, such as assembly, class and method directives.

In order to understand CIL directives, we shall write a console application using Xamarin Studio that adds two integers. Although we could develop such an application using other code editors, Xamarin Studio provides more functionality for writing IL code than most of them.

So first open Xamarin Studio and select New Solution from the File menu. Then choose the IL Console Project type from the project templates.


Thereafter, rename main.il to MathFun.il and place the following code in the MathFun.il file. We shall discuss each segment of the *.il file in the following sections.

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 2:0:0:0
}
.assembly MathFun
{
  .ver 1:0:0:0
  .locale "en.US"
}
.module MathFun.exe

.imagebase 0x00400000
.file alignment 0x00000200
.stackreserve 0x00100000
.subsystem 0x0003
.corflags 0x00000003

// =============== CLASS MEMBERS DECLARATION ===================

.class public auto ansi beforefieldinit MathFun
       extends [mscorlib]System.Object
{
  .field private string '<Name>k__BackingField'
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
  .method public hidebysig specialname rtspecialname
          instance void  .ctor(string name) cil managed
  {
    // Code size       18 (0x12)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  nop
    IL_0008:  ldarg.0
    IL_0009:  ldarg.1
    IL_000a:  call       instance void MathFun::set_Name(string)
    IL_000f:  nop
    IL_0010:  nop
    IL_0011:  ret
  } // end of method MathFun::.ctor

  .method public hidebysig specialname instance string get_Name() cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       11 (0xb)
    .maxstack  1
    .locals init (string V_0)
    IL_0000:  ldarg.0
    IL_0001:  ldfld      string MathFun::'<Name>k__BackingField'
    IL_0006:  stloc.0
    IL_0007:  br.s       IL_0009

    IL_0009:  ldloc.0
    IL_000a:  ret
  } // end of method MathFun::get_Name

  .method public hidebysig specialname instance void set_Name(string 'value') cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  ldarg.1
    IL_0002:  stfld      string MathFun::'<Name>k__BackingField'
    IL_0007:  ret
  } // end of method MathFun::set_Name

  .method public hidebysig instance string Display() cil managed
  {
    // Code size       22 (0x16)
    .maxstack  2
    .locals init ([0] string CS$1$0000)
    IL_0000:  nop
    IL_0001:  ldstr      "Hello "
    IL_0006:  ldarg.0
    IL_0007:  call       instance string MathFun::get_Name()
    IL_000c:  call       string [mscorlib]System.String::Concat(string,string)
    IL_0011:  stloc.0
    IL_0012:  br.s       IL_0014

    IL_0014:  ldloc.0
    IL_0015:  ret
  } // end of method MathFun::Display

  .method public hidebysig instance int32 Addition(int32 x, int32 y) cil managed
  {
    // Code size       9 (0x9)
    .maxstack  2
    .locals init ([0] int32 CS$1$0000)
    IL_0000:  nop
    IL_0001:  ldarg.1
    IL_0002:  ldarg.2
    IL_0003:  add
    IL_0004:  stloc.0
    IL_0005:  br.s       IL_0007

    IL_0007:  ldloc.0
    IL_0008:  ret
  } // end of method MathFun::Addition

  .property instance string Name()
  {
    .get instance string MathFun::get_Name()
    .set instance void MathFun::set_Name(string)
  } // end of property MathFun::Name
} // end of class MathFun

.class private auto ansi beforefieldinit MathFun extends [mscorlib]System.Object
{
  .method private hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    // Code size       57 (0x39)
    .maxstack  4
    .locals init ([0] class MathFun obj)
    IL_0000:  nop
    IL_0001:  ldstr      "Ajay"
    IL_0006:  newobj     instance void MathFun::.ctor(string)
    IL_000b:  stloc.0
    IL_000c:  ldloc.0
    IL_000d:  callvirt   instance string MathFun::Display()
    IL_0012:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_0017:  nop
    IL_0018:  ldstr      "Addition is: {0}"
    IL_001d:  ldloc.0
    IL_001e:  ldc.i4.s   15
    IL_0020:  ldc.i4.s   35
    IL_0022:  callvirt   instance int32 MathFun::Addition(int32,int32)
    IL_0027:  box        [mscorlib]System.Int32
    IL_002c:  call        void [mscorlib]System.Console::WriteLine(string,object)
    IL_0031:  nop
    IL_0032:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
    IL_0037:  pop
    IL_0038:  ret
  }
}

MathFun.il

Now build this program by pressing F8. After a successful build, the final executable MathFun.exe is created in the Bin/Debug folder of the solution directory.

Assembly Directives

Assembly directives contain information that the compiler writes to the manifest, which is the metadata pertaining to the overall assembly. This section lists the common assembly directives:

.assembly extern

This directive references an external assembly. The public types and methods of the referenced assembly become available to the current assembly. The syntax is:

.assembly extern name as aliasname { }

We use this construct in the MathFun.il file to reference mscorlib.dll as follows:

.assembly extern mscorlib
{
.publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
.ver 2:0:0:0
}

Because of the importance of mscorlib.dll, the ILASM compiler automatically includes an external assembly reference to that library.
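
The optional alias in the syntax above can be sketched as follows. This is a minimal example under the assumption that the alias replaces the assembly name inside the square brackets of a type reference; the alias corlib and the name AliasDemo are hypothetical.

```cil
// AliasDemo.il (hypothetical name, for illustration only)
.assembly extern mscorlib as corlib { }   // "corlib" is an assumed alias
.assembly AliasDemo {}

.class private auto ansi AliasDemo extends [corlib]System.Object
{
  .method public static void Main() cil managed
  {
    .entrypoint
    .maxstack 1
    ldstr "Resolved through an alias"
    // The alias, not the real assembly name, appears in the brackets.
    call void [corlib]System.Console::WriteLine(string)
    ret
  }
}
```

An alias is mostly useful when two references to assemblies with the same name, for example different versions, must be told apart in one file.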

.assembly

It defines the simple name of the assembly. An assembly can be defined by specifying the friendly name of the binary:

.assembly CILType { }

Several sub-directives are available within the assembly block, including:

  • .ver
  • .locale
  • .publickey

Taking the MathFun.il file as a reference, the assembly definition includes a version number of 1.0.0.0 using the .ver directive and culture information using .locale, as follows:

.assembly MathFun
{
.ver 1:0:0:0
.locale "en.US"
}

.module

The .module directive names the module file of the assembly, including its final extension such as *.exe:

.module MathFun.exe

.imagebase

The .imagebase directive sets the base address at which the application is loaded. The default is 0x00400000.

.imagebase 0x00400000

.file

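The .file directive adds a file to the manifest of the assembly, which is useful for associating helper documents with an assembly. Its nometadata option stipulates that the file contains no metadata, that is, the file is unmanaged. MathFun.il uses the separate .file alignment directive, which sets the file alignment of the sections in the output image:

.file alignment 0x00000200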

.stackreserve

The .stackreserve directive configures the stack size. The value 0x00100000 is the default.

.stackreserve 0x00100000

.subsystem

The .subsystem directive indicates the subsystem used by the application, such as the console or GUI subsystem. The syntax is:

.subsystem number

In the example above, we are building a console application, so the value is 3. A GUI application uses 2.

.subsystem 0x0003

.corflags

The .corflags directive sets the runtime flags in the CLI header. The default value is 1, which stipulates an IL-only assembly. MathFun.il uses 3, which additionally sets the 32-bit required flag (0x2):

.corflags 0x00000003

.maxstack

The .maxstack directive establishes the maximum number of values that may be on the evaluation stack at any one time during the execution of a method.

.maxstack 8 // default value

Class Directives

This part describes the important class directives:

.class

The .class directive defines a new reference, value or interface type. The syntax is:

.class attributes classname extends basetype implements interface

In the MathFun.il file above, we declare the class MathFun using the .class directive as follows:

.class public auto ansi beforefieldinit MathFun
extends [mscorlib]System.Object

The .class directive can also be adorned with a variety of attributes. Here is a short list of the most common ones:

  • abstract: indicates a class that can’t be instantiated.
  • ansi and unicode: determine how strings are marshaled when the type interoperates with unmanaged code.
  • auto: the CLR controls the memory layout of the fields.
  • beforefieldinit: the runtime may initialize the type at any time before the first access to one of its static fields.
  • private and public: set the visibility of the class outside its assembly.
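
The three kinds of type that .class can define are sketched below, using some of the attributes just listed. The names TypeKinds, Shape, Point and IDrawable are hypothetical and do not appear in MathFun.il; the file has no entry point, so it would be assembled as a library with the /dll option.

```cil
// TypeKinds.il (hypothetical name, for illustration only)
.assembly extern mscorlib {}
.assembly TypeKinds {}

// Reference type: "abstract" means it cannot be instantiated directly.
.class public abstract auto ansi beforefieldinit Shape
       extends [mscorlib]System.Object
{
}

// Value type: a sealed class that extends System.ValueType.
.class public sequential ansi sealed beforefieldinit Point
       extends [mscorlib]System.ValueType
{
  .field public int32 X
  .field public int32 Y
}

// Interface type: marked with the "interface" attribute and no base type.
.class interface public abstract auto ansi IDrawable
{
  .method public hidebysig newslot abstract virtual
          instance void Draw() cil managed
  {
  }
}
```

In this sketch the value type uses sequential layout instead of auto, which is the layout the C# compiler normally emits for a struct.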

The class also declares a constructor that initializes the field data. In the C# version, where the class is named Test, it looks like this:

public Test(string name)
        {
            this.Name = name;
        }

Its IL code is as follows:

.field private string '<Name>k__BackingField'
  .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
  .method public hidebysig specialname rtspecialname
          instance void  .ctor(string name) cil managed
  {
    // Code size       18 (0x12)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  nop
    IL_0007:  nop
    IL_0008:  ldarg.0
    IL_0009:  ldarg.1
    IL_000a:  call       instance void MathFun::set_Name(string)
    IL_000f:  nop
    IL_0010:  nop
    IL_0011:  ret
  }

.property

The .property directive adds a property member to a class. The syntax is:

.property attributes return propertyname parameters default { body }

If we define a property in C# code as follows:

public String Name
        {
            get;
            set;
        }

Then the corresponding MSIL code for the get and set accessors is as follows:

  .method public hidebysig specialname instance string
          get_Name() cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       11 (0xb)
    .maxstack  1
    .locals init (string V_0)
    IL_0000:  ldarg.0
    IL_0001:  ldfld      string MathFun::'<Name>k__BackingField'
    IL_0006:  stloc.0
    IL_0007:  br.s       IL_0009

    IL_0009:  ldloc.0
    IL_000a:  ret
  } // end of method MathFun::get_Name

  .method public hidebysig specialname instance void
          set_Name(string 'value') cil managed
  {
    .custom instance void [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() = ( 01 00 00 00 )
    // Code size       8 (0x8)
    .maxstack  8
    IL_0000:  ldarg.0
    IL_0001:  ldarg.1
    IL_0002:  stfld      string MathFun::'<Name>k__BackingField'
    IL_0007:  ret
  }
.property instance string Name()
  {
    .get instance string MathFun::get_Name()
    .set instance void MathFun::set_Name(string)
  }

.method

This directive defines a method in a class. The syntax is:

.method attributes callingconv return methodname arguments { body }

We define two methods, Display() and Addition(). The first returns the “Hello” text followed by the name, and the second computes the sum of the two integers passed to it:

public String Display()
        {
            return "Hello " + Name;
        }
public int Addition(int x, int y)
        {
            return (x+y);
        }

The corresponding IL code for Display() is:

.method public hidebysig instance string
          Display() cil managed
  {
    // Code size       22 (0x16)
    .maxstack  2
    .locals init ([0] string CS$1$0000)
    IL_0000:  nop
    IL_0001:  ldstr      "Hello "
    IL_0006:  ldarg.0
    IL_0007:  call       instance string MathFun::get_Name()
    IL_000c:  call       string [mscorlib]System.String::Concat(string,
                                                                string)
    IL_0011:  stloc.0
    IL_0012:  br.s       IL_0014

    IL_0014:  ldloc.0
    IL_0015:  ret
  }

The .method directive accepts some additional attributes:

  • hidebysig: the method hides base-class methods that have the same name and signature, rather than every method with the same name.
  • specialname: used for special methods such as get_Property and set_Property.
  • rtspecialname: indicates that the name has special meaning to the runtime, as with a constructor (.ctor).
  • cil or il: the method contains MSIL code.
  • native: the method contains platform-specific code.
  • managed: indicates that the implementation is managed.
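
One detail that the static attribute affects is argument numbering. In the instance method Addition() shown earlier, argument 0 is the object reference, so x and y are loaded with ldarg.1 and ldarg.2. The sketch below shows a static counterpart, where numbering starts at the first declared parameter. The names ArgDemo and Add are hypothetical.

```cil
// ArgDemo.il (hypothetical name, for illustration only)
.assembly extern mscorlib {}
.assembly ArgDemo {}

.class private auto ansi beforefieldinit ArgDemo extends [mscorlib]System.Object
{
  // Static method: there is no "this", so x is argument 0 and y is argument 1.
  .method public hidebysig static int32 Add(int32 x, int32 y) cil managed
  {
    .maxstack 2
    ldarg.0        // push x
    ldarg.1        // push y
    add            // pop both, push x + y
    ret            // return the value on top of the stack
  }

  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 2
    ldc.i4.s 15
    ldc.i4.s 35
    call int32 ArgDemo::Add(int32, int32)
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 50
    ret
  }
}
```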

.field

The .field directive declares a new field, which holds state information for a class. The syntax is:

.field attributes type fieldname

For example, two private read-only integer fields can be declared in CIL as follows:

.field private initonly int32 x
.field private initonly int32 y
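
A sketch of how such fields might be used is shown below. It assumes a hypothetical class Pair in a hypothetical assembly FieldDemo; the initonly fields are written with stfld inside the constructor and read back with ldfld in an instance method.

```cil
// FieldDemo.il (hypothetical name, for illustration only)
.assembly extern mscorlib {}
.assembly FieldDemo {}

.class private auto ansi beforefieldinit Pair extends [mscorlib]System.Object
{
  .field private initonly int32 x
  .field private initonly int32 y

  .method public hidebysig specialname rtspecialname
          instance void .ctor(int32 a, int32 b) cil managed
  {
    .maxstack 2
    ldarg.0                       // this
    call instance void [mscorlib]System.Object::.ctor()
    ldarg.0                       // this
    ldarg.1                       // a
    stfld int32 Pair::x           // this.x = a
    ldarg.0
    ldarg.2                       // b
    stfld int32 Pair::y           // this.y = b
    ret
  }

  .method public hidebysig instance int32 Sum() cil managed
  {
    .maxstack 2
    ldarg.0
    ldfld int32 Pair::x           // push this.x
    ldarg.0
    ldfld int32 Pair::y           // push this.y
    add
    ret
  }

  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 2
    ldc.i4.s 15
    ldc.i4.s 35
    newobj instance void Pair::.ctor(int32, int32)
    call instance int32 Pair::Sum()
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 50
    ret
  }
}
```

The initonly attribute corresponds to readonly in C#, so verifiable code stores to such fields only from within a constructor.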

Main() Method Directives

The method block can contain both directives and the implementation code (CIL).

.entrypoint

This directive designates a method as the entry point of the application. It may appear anywhere within the method body, but only one method in an assembly can carry it.

.locals

The .locals directive declares the local variables of a method and makes them available by name. For example, two local variables of integer type can be declared as:

.locals init ([0] int32 x,[1] int32 y)

In MathFun.il, the Main method declares a single local variable that holds the MathFun instance created by the constructor call:

.locals init ([0] class MathFun obj)
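
The way local variables are written and read can be sketched as follows. The name LocalsDemo is hypothetical; stloc pops a value from the evaluation stack into a local slot and ldloc pushes it back.

```cil
// LocalsDemo.il (hypothetical name, for illustration only)
.assembly extern mscorlib {}
.assembly LocalsDemo {}

.class private auto ansi beforefieldinit LocalsDemo extends [mscorlib]System.Object
{
  .method public hidebysig static void Main() cil managed
  {
    .entrypoint
    .maxstack 2
    .locals init ([0] int32 x, [1] int32 y)
    ldc.i4.s 15
    stloc.0        // x = 15
    ldc.i4.s 35
    stloc.1        // y = 35
    ldloc.0        // push x
    ldloc.1        // push y
    add            // push x + y
    call void [mscorlib]System.Console::WriteLine(int32)   // prints 50
    ret
  }
}
```

The init keyword requests that the locals be zero-initialized on method entry, which verifiable code requires.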

MSIL Instructions

Each MSIL instruction is assigned an opcode, which is commonly 1 or 2 bytes long. Opcodes provide an alternative means of identifying MSIL instructions and are used primarily when producing code dynamically at run time. The body of Main in MathFun.il uses the following instructions:

    IL_0000:  nop
    IL_0001:  ldstr      "Ajay"
    IL_0006:  newobj     instance void MathFun::.ctor(string)
    IL_000b:  stloc.0
    IL_000c:  ldloc.0
    IL_000d:  callvirt   instance string MathFun::Display()
    IL_0012:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_0017:  nop
    IL_0018:  ldstr      "Addition is: {0}"
    IL_001d:  ldloc.0
    IL_001e:  ldc.i4.s   15
    IL_0020:  ldc.i4.s   35
    IL_0022:  callvirt   instance int32 MathFun::Addition(int32,
                                                                     int32)
    IL_0027:  box        [mscorlib]System.Int32
    IL_002c:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                  object)
    IL_0031:  nop
    IL_0032:  call       valuetype [mscorlib]System.ConsoleKeyInfo [mscorlib]System.Console::ReadKey()
    IL_0037:  pop
    IL_0038:  ret

Synopsis

This article touched briefly on the most important features of the common language runtime and ILAsm. You now know how the runtime functions, how a program in ILAsm is written and compiled using either ilasm or Xamarin Studio, and how to define the basic components (classes, fields, properties and methods). We will cover the opcode specification in depth, along with the remaining crucial segments of the MSIL grammar, in the next articles of this series.