1. Introduction

Web applications that use applets to transfer data between the client and the server are hard to attack through their security holes, for the simple reason that the code within the applet is difficult to modify. Although the code can be recovered by decompiling the applet, it is very difficult to recompile it back into a jar file after making the required changes. Decompilation usually leaves portions of code that are not properly recovered and must be reconstructed manually by the analyst, which is very error-prone. There will also be many dependencies that cannot be resolved, and depending on how the code was compiled, some information may be lost completely.

In this paper we’ll discuss a technique for modifying an applet’s code without having to recompile it, and for running the applet in a standalone manner so that it can be debugged live and its values can be manipulated to exploit security holes.

We’ll use a common deployment scenario:

  1. The applets are signed.
  2. The applets run in the context of Internet Explorer, using the proxy and other settings imported from the browser.

In the next section we’ll discuss the basics of the Java Virtual Machine and the class file format.

2. Java Virtual Machine

Programs written in the Java language are compiled into a portable binary format called byte code. Every class is represented by a single class file containing class related data and byte code instructions. These files are loaded dynamically into an interpreter (Java Virtual Machine, aka. JVM) and executed.

Here is an example of this:

Above, a Java source file is compiled by the javac compiler into a .class file, which is then executed by the Java Virtual Machine (the byte code interpreter).

  2.1 Data Types in the Java VM

    The Java VM has two kinds of values:

    1. Primitive Type
    2. Reference Type

    The primitive type values consist of:

    1. Numeric (byte, short, int, long, char, float, double)
    2. Boolean
    3. ReturnAddress: The returnAddress type is used by the Java virtual machine’s jsr, ret, and jsr_w instructions. The values of the returnAddress type are pointers to the opcodes of Java virtual machine instructions. Unlike the numeric primitive types, the returnAddress type does not correspond to any Java programming language type and cannot be modified by the running program.

    The Reference type consists of:

    1. class types
    2. array types
    3. interface types

    Their values are references to dynamically created class instances, arrays, and class instances or arrays that implement interfaces, respectively. A reference value may also be the special null reference, a reference to no object, which will be denoted here by null.

  2.2 Runtime Data Areas

    The Java VM uses several data areas at runtime to hold the execution context. Some data areas are created per thread, while others are created when the VM starts up and destroyed when the VM exits.

  2.2.1 The PC Register

    The PC register is similar to EIP (the instruction pointer) in x86 assembly. Each thread has its own PC register, which holds the address of the Java VM instruction currently being executed. Because Java allows native code integration, a thread may be executing a native method; in that case the value of the PC register is undefined.

  2.2.2 Java VM Stack

    Each thread has its own Java VM stack, which is used to store frames (explained in section 2.2.7).

  2.2.3 Heap

    The heap is shared by all the threads running in the Java VM. Memory for all class instances and arrays is allocated from the heap.

  2.2.4 Method Area

    This is similar to the text section of a PE executable: it contains the code to be executed and is shared by all threads. It stores per-class structures such as the runtime constant pool, field and method data, and the code for methods and constructors.

    The method data contains the maximum depth of the method’s operand stack, the number of local variables and an array of byte code instructions. It may also contain local variable names for use by a debugger, but only if the class was compiled with debug information.

  2.2.5 Runtime Constant Pool

    The runtime constant pool is a per-class/per-interface data area allocated from the method area. It stores constants such as integers and floats, as well as symbolic references to classes, interfaces, methods and fields that are resolved dynamically at runtime.

  2.2.6 Native Stack Area

    These stacks are used by native methods. They are conventional stacks (i.e. similar to the ones used by regular PE executables) and are allocated per thread.

  2.2.7 Frames

    Frames are like the activation records created during function calls in C/C++ programs; a new frame is created each time a method is invoked. A frame is used to:

    1. Store local variables
    2. Store dynamic linking information
    3. Hold the operand stack
    4. Dispatch exceptions

  2.3 Instruction Set in Java VM

    This section gives a brief overview of how the Java VM executes opcodes, and of some basic Java VM instructions.

    Each opcode is one byte long and is followed by zero or more operands.

    Here’s how the execution loop works:

    while (there are instructions to execute)
    {
        fetch the opcode at the PC;
        if (the opcode has operands)
            fetch the operands that follow it in the code;
        perform the operation, popping its inputs from the operand stack
            and pushing any result back onto the operand stack;
        move the PC to the next instruction;
    }
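To make the loop above concrete, here is a minimal Java sketch of an interpreter for a handful of real JVM opcodes. It is a toy under stated assumptions, not how a production VM works: the class name MiniInterpreter is hypothetical, only int values are supported, and local variables, frames and verification are left out. Note that bipush carries an inline operand fetched from the code, while iadd and imul take their inputs from the operand stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy fetch/decode/execute loop over a few real JVM opcodes (int values only).
final class MiniInterpreter {
    static final int ICONST_0 = 0x03; // iconst_0 .. iconst_5 are 0x03 .. 0x08
    static final int ICONST_5 = 0x08;
    static final int BIPUSH = 0x10;   // one inline operand: a signed byte
    static final int IADD = 0x60;     // no inline operands: pops two ints, pushes the sum
    static final int IMUL = 0x68;     // no inline operands: pops two ints, pushes the product
    static final int IRETURN = 0xac;  // pops the result and leaves the method

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // the operand stack
        int pc = 0;                                 // plays the role of the PC register
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF;         // fetch the one-byte opcode
            if (opcode >= ICONST_0 && opcode <= ICONST_5) {
                stack.push(opcode - ICONST_0);      // the constant is implied by the opcode
            } else if (opcode == BIPUSH) {
                stack.push((int) code[pc++]);       // fetch the inline operand, sign-extended
            } else if (opcode == IADD) {
                int b = stack.pop(), a = stack.pop();
                stack.push(a + b);
            } else if (opcode == IMUL) {
                int b = stack.pop(), a = stack.pop();
                stack.push(a * b);
            } else if (opcode == IRETURN) {
                return stack.pop();
            } else {
                throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("reached the end of the code without ireturn");
    }

    public static void main(String[] args) {
        // (2 + 3) * 100  ->  iconst_2, iconst_3, iadd, bipush 100, imul, ireturn
        byte[] code = {0x05, 0x06, 0x60, 0x10, 100, 0x68, (byte) 0xac};
        System.out.println(run(code)); // 500
    }
}
```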

    The number and size of an instruction’s operands are determined by its opcode. For example, iadd has no inline operands: it pops two integers from the operand stack, adds them, and pushes the result back onto the stack. The opcode alone tells us that it works on two integers.

    Operands are stored on byte boundaries; an operand longer than one byte is stored in big-endian order (high-order byte first).
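As a small illustration of big-endian operands, the sketch below combines two operand bytes into an unsigned 16-bit value, which is how two-byte constant pool indices are encoded. The class and method names are hypothetical; the ByteBuffer call is only there to show that Java’s default big-endian byte order gives the same result.

```java
import java.nio.ByteBuffer;

final class BigEndian {
    // Combines two operand bytes, high-order byte first, into an unsigned 16-bit value.
    static int u2(byte high, byte low) {
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }

    public static void main(String[] args) {
        byte[] operands = {0x01, 0x2C};
        System.out.println(u2(operands[0], operands[1]));                 // 300
        // ByteBuffer is big-endian by default, so reading a short yields the same bits.
        System.out.println(ByteBuffer.wrap(operands).getShort() & 0xFFFF); // 300
    }
}
```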

    2.3.1 Byte code instruction set

    The Java VM instruction set consists of about 200 instructions, which can be grouped into the following categories:

    1. Stack Operations

    Constants are pushed onto the operand stack using instructions such as ldc, ldc_w and iconst_0, and values already on the stack can be manipulated with instructions such as pop, dup and swap.

    2. Arithmetic Operations

    The Java VM uses different opcodes for different value types. For example, opcodes starting with “i” take integers as their operands (iadd adds two integers and pushes the result back onto the stack), while the corresponding “l”, “f” and “d” opcodes operate on longs, floats and doubles.

    3. Control Flow

    These are similar to branch instructions in x86 (jle, jge, jmp), as they redirect the execution of the current thread to a different address. Examples are goto and if_icmpeq, which compares two integers and branches if they are equal.

    4. Load/Store

    These push a value from a local variable onto the operand stack (for example, aload_0) and store a value into a local variable by popping it from the operand stack (for example, astore_1).

    5. Field Access

    There are two kinds of field access, instance field access and static field access, and they use different opcodes. getfield reads an instance field: the object reference is popped from the operand stack, and a two-byte operand gives the index into the constant pool of the symbolic reference to the field.

    The getstatic opcode reads static fields such as “System.out”, where “out” is a static field of the System class.

    6. Method Invocation

    These instructions invoke static, virtual, private and interface methods, each kind using a different opcode: invokestatic for static methods, invokevirtual for virtual methods, invokespecial for constructors, private methods and superclass methods, and invokeinterface for interface methods.

    7. Object Allocation

    Different opcodes allocate different kinds of objects. For example, class instances are created with the “new” opcode, arrays of primitive types such as char[] with newarray, arrays of references with anewarray, and multidimensional arrays with multianewarray.

    8. Conversion and Type Checking

    The opcodes in this category convert one primitive data type to another; for example, i2f converts an integer to a float. Type checking is done with instanceof and checkcast.

    2.4 Specially Named Initialization Methods

    This is a very important point to remember while patching the byte code of class files. Certain methods are given special names by the compiler: instance constructors are named “<init>”, and static initializers are named “<clinit>”.

    For example, if you create a new instance of the Client class, the following byte code instructions are generated.

    Java Code:

    Client n = new Client();

    Byte Code:

    new emmo/Client
    dup
    invokespecial emmo/Client.<init>()V
    astore_1

    2.5 Descriptors

    Descriptors are strings that symbolically represent the types of fields and methods (this article also calls method descriptors signatures). They are stored in the constant pool and have a special string representation. Below is a brief overview of how various types are represented using strings.

    We’ll specify the descriptor format using a grammar. There are two kinds of descriptors: field descriptors and method descriptors.

    The grammar we’ll be using has the following properties:

    1. Terminal symbols will be italic
    2. Non-Terminal symbols will be regular and bold.
    3. The definition of a non-terminal symbol has the non-terminal on the LHS, followed by the symbol “->”, followed by terminal/non-terminal symbols on the RHS.
    4. If more than one alternative exists for the RHS, the alternatives are separated by a “/” symbol.
    5. Comments will be given by enclosing them between “”.
    6. The symbol “*” indicates that zero or more instances of the preceding terminal/non-terminal may be present in the string.

    Here is a sample production:

    A->B/i

    Here, A is the non-terminal whose definition is being given, and B and i are the alternative RHS symbols for A.

    I. Field Descriptor

    FieldDescriptor -> FieldType

    FieldType -> BaseType / ObjectType / ArrayType

    BaseType -> B / C / D / F / I / J / S / Z

    ObjectType -> L ClassName ;

    ArrayType -> [ FieldType

    B byte
    C char
    D double
    F float
    I int
    J long
    S short
    Z boolean
    [ reference, one array dimension
    L ClassName ; reference, an instance of class ClassName

    For example…

    int array[][][];

    Will be represented in symbolic reference as

    [[[I

    Object obj;

    Will be represented as

    Ljava/lang/Object;

    II. Method Descriptor

    Method descriptors are represented by:

    MethodDescriptor -> ( ParameterDescriptor* ) ReturnDescriptor

    This represents the parameters that the method takes and the type of value it returns on completion.

    ParameterDescriptor -> FieldType

    ReturnDescriptor -> FieldType / V

    Here, “V” indicates void, i.e. the method does not return any value.

    Here is an example of how a method signature looks:

    String copy(int length, String arr)

    has the following descriptor:

    (ILjava/lang/String;)Ljava/lang/String;
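The grammar translates directly into a few lines of code. The following sketch, with hypothetical class and method names, builds field and method descriptors from Class objects and reproduces the examples above. It covers only the base types in the table plus object and array types.

```java
// Hypothetical helper that applies the descriptor grammar to Class objects.
final class Descriptors {
    // FieldType: BaseType, ObjectType or ArrayType.
    static String field(Class<?> c) {
        if (c.isArray()) return "[" + field(c.getComponentType()); // one '[' per dimension
        if (c == byte.class) return "B";
        if (c == char.class) return "C";
        if (c == double.class) return "D";
        if (c == float.class) return "F";
        if (c == int.class) return "I";
        if (c == long.class) return "J";
        if (c == short.class) return "S";
        if (c == boolean.class) return "Z";
        if (c == void.class) throw new IllegalArgumentException("void is not a field type");
        // ObjectType: 'L' + class name with '/' as the package separator + ';'
        return "L" + c.getName().replace('.', '/') + ";";
    }

    // MethodDescriptor: '(' ParameterDescriptor* ')' ReturnDescriptor, where V means void.
    static String method(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) sb.append(field(p));
        sb.append(')');
        sb.append(returnType == void.class ? "V" : field(returnType));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(field(int[][][].class));                        // [[[I
        System.out.println(field(Object.class));                           // Ljava/lang/Object;
        System.out.println(method(String.class, int.class, String.class)); // (ILjava/lang/String;)Ljava/lang/String;
        System.out.println(method(void.class));                            // ()V
    }
}
```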

    2.6 Java Class File Format

    The image below shows the basic layout of a class file. The left-hand side of the figure is an abstract view of the class layout that groups related members so that it is easier to understand; the boxes on the right-hand side are a more detailed version that shows every member of the class format and the group it belongs to (shown by arrows).

    Below are a few members that will be critical to the rest of the paper: the constant pool, the interface array, the field array and the method array.

    Here, u1 represents an unsigned one-byte quantity and u2 an unsigned two-byte quantity.
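To see this layout in practice, here is a minimal sketch that reads the fixed-size start of a class file: the four-byte (u4) magic number 0xCAFEBABE, the u2 minor and major version numbers, and the u2 constant_pool_count. The class name is hypothetical; when no file name is passed, it parses a hand-built sample header instead of a real class file.

```java
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ClassHeader {
    // Reads u4 magic, u2 minor_version, u2 major_version and u2 constant_pool_count.
    static String describe(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default, like the class file format
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
        int minor = buf.getShort() & 0xFFFF;
        int major = buf.getShort() & 0xFFFF;
        int cpCount = buf.getShort() & 0xFFFF; // entries are indexed from 1 to cpCount - 1
        return "version " + major + "." + minor + ", constant_pool_count " + cpCount;
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            System.out.println(describe(Files.readAllBytes(Paths.get(args[0]))));
        } else {
            // Hand-built header: magic, minor 0, major 50, constant_pool_count 16.
            byte[] sample = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 50, 0, 16};
            System.out.println(describe(sample)); // version 50.0, constant_pool_count 16
        }
    }
}
```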

  2.6.1 Constant Pool

    Java VM instructions do not rely on the runtime layout of classes; instead, the byte code refers to symbolic references to classes, fields and methods, and these references are stored in the constant pool.

    Each constant pool entry has the following format:

    cp_info {
        u1 tag;
        u1 info[];
    }

    The tag field specifies which kind of constant is stored at a given index in the constant pool table. The possible values are:

    Constant Type                  Tag Value
    CONSTANT_Class                 7
    CONSTANT_Fieldref              9
    CONSTANT_Methodref             10
    CONSTANT_InterfaceMethodref    11
    CONSTANT_String                8
    CONSTANT_Integer               3
    CONSTANT_Float                 4
    CONSTANT_Long                  5
    CONSTANT_Double                6
    CONSTANT_NameAndType           12
    CONSTANT_Utf8                  1

    Each tag byte is followed by two or more bytes that give more specific information about the constant.

    Here is an example of this specific information for a constant of type class. Its format is:

    CONSTANT_Class_info {
        u1 tag;
        u2 name_index;
    }

    Its tag value is 7 (refer to the table above). name_index is an index into the constant pool table; the entry at that index is a CONSTANT_Utf8 entry holding the fully qualified class name in internal form, with “/” as the package separator (for example, java/lang/Object).

    For more info please read [1].
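Because each entry’s size depends on its tag, the constant pool has to be walked sequentially. The sketch below, with hypothetical names, handles exactly the tags in the table above and decodes a small hand-built pool. One detail worth knowing: CONSTANT_Long and CONSTANT_Double entries occupy two consecutive indices, which is why the walker skips a slot after them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ConstantPoolWalker {
    // Walks constant_pool[1 .. count-1]; element 0 of the result is a placeholder
    // because constant pool indices start at 1.
    static List<String> walk(ByteBuffer buf, int count) {
        List<String> out = new ArrayList<>();
        out.add("(unused)");
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF; // u1 tag, then tag-specific info
            switch (tag) {
                case 1: { // CONSTANT_Utf8: u2 length, then that many bytes
                    byte[] bytes = new byte[buf.getShort() & 0xFFFF];
                    buf.get(bytes);
                    // Class files use "modified UTF-8"; plain UTF-8 is fine for ASCII names.
                    out.add("Utf8 " + new String(bytes, StandardCharsets.UTF_8));
                    break;
                }
                case 3: out.add("Integer " + buf.getInt()); break;
                case 4: out.add("Float " + buf.getFloat()); break;
                case 5: // CONSTANT_Long and CONSTANT_Double: 8 bytes, and they use two slots
                case 6:
                    out.add(tag == 5 ? "Long " + buf.getLong() : "Double " + buf.getDouble());
                    out.add("(second slot)");
                    i++;
                    break;
                case 7: out.add("Class name_index=" + (buf.getShort() & 0xFFFF)); break;
                case 8: out.add("String string_index=" + (buf.getShort() & 0xFFFF)); break;
                case 9: case 10: case 11: case 12: { // Fieldref, Methodref, InterfaceMethodref, NameAndType
                    int first = buf.getShort() & 0xFFFF;
                    int second = buf.getShort() & 0xFFFF;
                    out.add("tag " + tag + " refs " + first + "," + second);
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown tag " + tag + " at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] name = "java/lang/Object".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put((byte) 1).putShort((short) name.length).put(name); // #1 CONSTANT_Utf8
        buf.put((byte) 7).putShort((short) 1);                     // #2 CONSTANT_Class -> #1
        buf.put((byte) 5).putLong(1234567890123L);                 // #3 (and #4) CONSTANT_Long
        buf.flip();
        List<String> pool = walk(buf, 5); // constant_pool_count = 5
        for (int i = 1; i < pool.size(); i++) System.out.println("#" + i + " " + pool.get(i));
    }
}
```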

  2.6.2 Fields Array

    This gives a complete description of every field in the class/interface. Here is the format of the field_info structure:

    field_info {
        u2 access_flags;
        u2 name_index;
        u2 descriptor_index;
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

    Here, name_index and descriptor_index both point to UTF-8 strings in the constant pool that give the name and the descriptor of the field. The attributes associated with a field can vary; one of them is ConstantValue. (A ConstantValue attribute represents the value of a constant field, which must be explicitly or implicitly static.)
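The access_flags field is a bit mask. The sketch below (hypothetical class name) lists the standard field flag values from the JVM specification and relies on the fact that these values coincide with the constants in java.lang.reflect.Modifier, so Modifier.toString can render them. A public static final constant, for example, carries 0x0019.

```java
import java.lang.reflect.Modifier;

final class FieldFlags {
    // Standard access_flags bits for fields in the class file format.
    static final int ACC_PUBLIC = 0x0001, ACC_PRIVATE = 0x0002, ACC_PROTECTED = 0x0004,
                     ACC_STATIC = 0x0008, ACC_FINAL = 0x0010, ACC_VOLATILE = 0x0040,
                     ACC_TRANSIENT = 0x0080;

    // These bits match java.lang.reflect.Modifier, so Modifier.toString can print them.
    static String describe(int accessFlags) {
        return Modifier.toString(accessFlags);
    }

    public static void main(String[] args) {
        int flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL; // 0x0019
        System.out.println(describe(flags));             // public static final
    }
}
```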

  2.6.3 Method Array

    The method array is another important structure to understand before reverse engineering applications, because it holds the byte code of the class. Its structure is similar to the field_info structure.

    The important difference is the presence of the Code and Exceptions attributes.

    I. Code Attribute

    This is a variable-length structure that holds the byte code of the method, the maximum number of local variables, the maximum operand stack size, and an exception table that gives the extent of each try block and its corresponding catch block. It can also hold the LineNumberTable and LocalVariableTable attributes, which debuggers use to locate local variables and to match the byte code with line numbers in the original source code.

    II. Exceptions Attribute

    This is another variable-length structure; it lists the exceptions that a method may throw (its throws clause).

    Above, we discussed some of the critical data structures that are useful in reverse engineering. Next, we will move on to the actual reverse engineering of a Java applet. We use an applet because applets require special attention while reversing: their security settings must be changed before the reversing begins.

    3. Reverse Engineering Java Applets

    In this section, we’ll see how to patch byte code and perform other kinds of manipulation on the applet’s class files. We’ll also see how to get a signed applet to run in a standalone manner (as an application).

    3.1 Removing Signatures and Providing Permissions

    A signature verifies that an applet or application comes from a reliable source and can be trusted, so that it can run with the permissions given in the policy file.

    If we try to modify the byte code or any other data structure, we get the following error:

    This indicates that the integrity of the file has been compromised: the digest stored in the signature files does not match the digest calculated when the jar file was read.

    The easiest way to remove the signature is simply to delete the two signature files, “SIGNFILE.DSA” and “SIGNFILE.SF”, from the “META-INF” directory.
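Deleting the files by hand in an archive tool works, but the step can also be scripted. The following sketch (hypothetical class name StripSignatures) copies a jar while skipping the signature files directly under META-INF: the .SF file plus the signature block, which may be .DSA, .RSA or .EC depending on the key type. It leaves MANIFEST.MF and every other entry unchanged.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class StripSignatures {
    // True for META-INF/<name>.SF and the signature block files directly under META-INF.
    static boolean isSignatureEntry(String name) {
        String upper = name.toUpperCase();
        if (!upper.startsWith("META-INF/") || upper.indexOf('/', 9) >= 0) return false;
        return upper.endsWith(".SF") || upper.endsWith(".DSA")
            || upper.endsWith(".RSA") || upper.endsWith(".EC");
    }

    // Copies every entry except the signature files into a new archive.
    static void strip(InputStream in, OutputStream out) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            byte[] buf = new byte[8192];
            for (ZipEntry e; (e = zin.getNextEntry()) != null; ) {
                if (isSignatureEntry(e.getName())) continue;
                zout.putNextEntry(new ZipEntry(e.getName())); // fresh entry: sizes are recomputed
                for (int n; (n = zin.read(buf)) != -1; ) zout.write(buf, 0, n);
                zout.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] = signed input jar, args[1] = unsigned output jar
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            strip(in, out);
        }
    }
}
```

Usage: pass the signed jar and the path for the unsigned copy as the two arguments.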

    The second modification is to give the applet permissions so that it can access resources on the machine.

    To do this, we create a policy file (sjava.policy) with the following entries:

    grant {
        permission java.security.AllPermission;
    };


    We can then start “appletviewer” with this policy file using the following command, followed by the HTML page (or URL) that embeds the applet:

    appletviewer -J-Djava.security.policy=sjava.policy

    This policy file grants the applet complete permissions, so it can use any resource on the machine. That is acceptable for a trusted application, but if the code might be malware or otherwise malicious, you should use a more fine-grained policy file.

    The next step is to understand the various tools that will be used for reverse engineering and byte code patching.

    3.2 Byte Code Manipulation Tools

    These are the tools that will be used:

    1. Class Construction Kit (CCK) by M. Dahm
    2. Java Decompiler
    3. jclasslib bytecode viewer by ej-technologies
    4. JDB

    3.2.1 Class Construction Kit (CCK)

    This is a great tool for the visual creation and modification of class files. It allows you to append your own byte code instructions, change existing ones, and update many data structures such as the LineNumberTable, the LocalVariableTable, fields, attributes, etc.

    3.2.2 Java Decompiler

    This allows you to generate Java source files from class files. It can be used to understand and locate the code you are going to modify, and to verify after the modification that decompilation still produces a proper Java source file.

    3.2.3 jclasslib Bytecode Viewer

    This tool parses the entire class file and gives you a clear picture of its contents, showing information such as the constant pool table, interfaces, etc. It helps you understand at a high level how every byte code instruction is formed, and it lays out the constant pool table and its various information structures (explained previously).

    3.2.4 JDB

    JDB is the Java debugger; with it we can set breakpoints on methods in the classes and perform dynamic analysis.

    3.3 Byte Code Patching

    Let’s take a look at how jclasslib displays the data:

    The left pane shows many of the class’s data structures, such as the constant pool and the interfaces implemented by the class. The expanded tree node lists the methods declared by the class, and the highlighted method is “start”. The method has just one attribute, Code, which was explained previously. The right-hand side of the window shows the byte code of the method; the green links are the arguments to the opcodes.

    Take a look at the following short instruction as an illustration:
    getstatic #11

    Here, #11 refers to entry 11 in the constant pool, which is a CONSTANT_Fieldref_info structure.

    The instructions have the following layout:

    mnemonic
    operand1
    operand2

    The mnemonic is the name of the opcode, and operand1/operand2 are the operands generated at compile time; they are embedded in the class file together with the instruction. The remaining operands are produced at runtime and are placed on the operand stack.

    For example: getstatic <java/lang/System.out>

    The memory layout will be as follows:

    getstatic
    indexbyte1
    indexbyte2

    Here, indexbyte1 and indexbyte2 are part of the compiled code, so in binary form the instruction is represented as follows:

    178 00 61

    Here, 178 (0xB2) is the opcode and 0x0061 is the index into the constant pool table.

    The operand stack will have the following format:

    … => …, value

    The left-hand side of the symbol ‘=>’ shows the stack contents consumed by the opcode, and the right-hand side shows the stack after the opcode runs. getstatic consumes nothing and pushes the value of the field.

    Let’s see another example…

    “anewarray” is used to create an array of references.

    The memory layout will be:

    189 00 61

    Here, 189 (0xBD) is the opcode and 0x0061 is the index into the constant pool, where a symbolic reference to a class, array or interface type is present.

    The operand stack contains the following values:

    …, count => …, arrayref

    Hence, at runtime the opcode pops the length of the array from the operand stack and pushes a reference to the newly created array.

    More information can be found at [1].

    As we can see, an instruction’s operands are split into compile-time operands (embedded in the code) and runtime operands (taken from the operand stack), and both are required to produce the result.
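The compile-time half of an instruction can be decoded mechanically. Below is a minimal disassembler sketch (hypothetical class name) that knows only a few opcodes discussed in this article, together with their numbers of inline operand bytes, and decodes the byte sequences shown above. It handles nothing else, such as variable-length instructions like tableswitch.

```java
import java.util.HashMap;
import java.util.Map;

final class Disassembler {
    // Mnemonic and number of inline (compile-time) operand bytes for a few opcodes.
    private static final Map<Integer, String> NAMES = new HashMap<>();
    private static final Map<Integer, Integer> OPERAND_BYTES = new HashMap<>();

    private static void op(int opcode, String name, int operandBytes) {
        NAMES.put(opcode, name);
        OPERAND_BYTES.put(opcode, operandBytes);
    }

    static {
        op(0x2a, "aload_0", 0);
        op(0x4c, "astore_1", 0);
        op(0x59, "dup", 0);
        op(0x12, "ldc", 1);       // u1 constant pool index
        op(0x13, "ldc_w", 2);     // u2 constant pool index
        op(0xb2, "getstatic", 2);
        op(0xb6, "invokevirtual", 2);
        op(0xb7, "invokespecial", 2);
        op(0xbb, "new", 2);
        op(0xbd, "anewarray", 2);
        op(0xb1, "return", 0);
    }

    static String disassemble(byte[] code) {
        StringBuilder sb = new StringBuilder();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            Integer n = OPERAND_BYTES.get(opcode);
            if (n == null) throw new IllegalArgumentException("opcode " + opcode + " not in table");
            int index = 0;
            for (int k = 1; k <= n; k++) index = (index << 8) | (code[pc + k] & 0xFF); // big-endian
            sb.append(pc).append(": ").append(NAMES.get(opcode));
            if (n > 0) sb.append(" #").append(index);
            sb.append('\n');
            pc += 1 + n;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 178 00 61 and 189 00 61 from the text, concatenated only for decoding
        // (together they are not a meaningful method body).
        byte[] code = {(byte) 178, 0x00, 0x61, (byte) 189, 0x00, 0x61};
        System.out.print(disassemble(code)); // 0: getstatic #97 / 3: anewarray #97
    }
}
```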

    Now let’s move on to patching the class. The best approach is to first decompile the class files using Java Decompiler; this helps us locate the file containing the method that needs to be altered.

    We will use a sample applet to demonstrate this process. The jar file is extracted and the target class file is recovered; in this case it is called DataApplet. The class file is decompiled using Java Decompiler, which partially recovers the source. We then locate the method start(), which has the following implementation:

    From the implementation we can see that the method calls getCodeBase() to get the base URL from which the applet was loaded. If we try to run the applet in a standalone manner, that is, by saving the jar file and then running it with appletviewer, it throws a NullPointerException because getCodeBase() returns null. The variable url is therefore null, and when it is used later in other parts of the code, the exception is raised.

    Now, let us look into the byte code representation of the above method:

    The code section that needs to be removed is:

    aload_0
    invokevirtual echo/DataApplet.getCodeBase()Ljava/net/URL;

    In Java source code, this corresponds to:

    java.net.URL url = getCodeBase();

    And we need to replace this with:

    URL url = new URL("http://xxx.xxx.xxx.xxx/secured/MainPage.html");

    The next step is to figure out how this is represented in Java byte code. The easiest method is to write the desired code, compile it into a class file and then recover the byte code from that class file. Once you have a thorough understanding of how the Java VM uses the various opcodes, the byte code can be written directly without this step.
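For example, the desired statement can be placed in a small throwaway class, compiled, and then opened in jclasslib (or disassembled with javap -c) to read off the instructions. The class PatchTemplate and its method codeBase() are hypothetical, and the address keeps the placeholder from the text. Because codeBase() is a static method, its local variable indices differ from those in start(); the part of interest is the new, dup, ldc/ldc_w and invokespecial sequence that builds the URL.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical template: compile it, then inspect the byte code of codeBase().
final class PatchTemplate {
    static URL codeBase() throws MalformedURLException {
        // Same statement as the replacement above; only parsed, no connection is made.
        URL url = new URL("http://xxx.xxx.xxx.xxx/secured/MainPage.html");
        return url;
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(codeBase());
    }
}
```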

    The generated class file was viewed using the byte code viewer “jclasslib”.

    Now this code is injected into the class using the Class Construction Kit (CCK):

    While patching applet code, the most important thing to be careful about is maintaining the state of the operand stack.

    It is important to understand how each instruction behaves; let’s take the code above as an example.

    The instruction “new” will have the following operand stack format:

    … => …, objectref

    That is, an object reference is left on the stack after the instruction executes, and the compile-time operand specifies which class is instantiated. At this point the stack holds an extra value that has to be consumed. The instructions dup and ldc_w also push values onto the operand stack, and these values have to be consumed to restore the expected stack state. This is done by the “invokespecial” opcode.

    The stack state expected by this opcode is:

    …, objectref, [arg1, [arg2 …]] => …

    This opcode pops the objectref and the arguments from the stack and invokes the method given by its compile-time operand (here, the URL constructor <init>); it does not return any value. However, we still need a reference to the newly created object. That is why we use dup, which duplicates the object reference on the stack. The first copy is consumed by the invokespecial instruction, and the second copy is left on the stack, exactly where getCodeBase() would have left its result, so the store instruction that follows assigns it to the url local variable.
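This bookkeeping can be checked with simple arithmetic on the net effect (values pushed minus values popped) of each instruction. The following sketch, with hypothetical names, tracks the operand stack depth through the injected sequence and through the original aload_0 / invokevirtual getCodeBase() pair. Both leave exactly one reference on the stack, which is why the store instruction that follows still works. The stack effects are written in by hand, so this is an illustration rather than a verifier.

```java
// Hypothetical stack-depth tracker for a hand-annotated instruction sequence.
final class StackDepth {
    // One instruction with its net effect on the operand stack.
    static final class Insn {
        final String text;
        final int effect;
        Insn(String text, int effect) { this.text = text; this.effect = effect; }
    }

    // Returns the stack depth after each instruction and fails on underflow.
    static int[] track(int startDepth, Insn... code) {
        int[] after = new int[code.length];
        int depth = startDepth;
        for (int i = 0; i < code.length; i++) {
            depth += code[i].effect;
            if (depth < 0) throw new IllegalStateException("stack underflow at " + code[i].text);
            after[i] = depth;
        }
        return after;
    }

    public static void main(String[] args) {
        Insn[] patch = {
            new Insn("new java/net/URL", +1),    // ... => ..., objectref
            new Insn("dup", +1),                 // ..., ref => ..., ref, ref
            new Insn("ldc_w <URL string>", +1),  // pushes the String argument
            new Insn("invokespecial java/net/URL.<init>(Ljava/lang/String;)V", -2), // pops ref and arg
        };
        Insn[] removed = {
            new Insn("aload_0", +1),                    // pushes this
            new Insn("invokevirtual getCodeBase()", 0), // pops this, pushes the URL
        };
        int[] p = track(0, patch);
        int[] r = track(0, removed);
        for (int i = 0; i < patch.length; i++) System.out.println(p[i] + "  " + patch[i].text);
        // Both sequences end with exactly one value (a URL reference) on the stack.
        System.out.println("patched ends at " + p[p.length - 1] + ", original ends at " + r[r.length - 1]);
    }
}
```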

    After this modification, the applet connects to a fixed address every time, and it can be run and debugged as a Java application.

    Debugging an applet compiled without debug information is difficult because, as mentioned earlier, the LocalVariableTable and LineNumberTable attributes of the Code attribute are then absent. Local variable names can’t be fetched, and therefore their values can’t be inspected or manipulated by name in the Java debugger.

    The simplest way to study the runtime behavior of such applications is to inject code such as the following:

    getstatic <java/lang/System.out>
    …
    invokevirtual

    This is a very primitive way of getting local variable values, but also very effective (in the absence of debug info) in understanding the overall behavior of the method.

    4. Conclusion

    This paper shows the reverse engineering of an applet that does not use any code obfuscation, string encryption or other code protection techniques. It is intended as a starting point for reverse engineering Java applets and applications. Building on the basic concepts presented here will allow you to tackle more complex reverse engineering problems more easily.

    We chose applets because they pose additional hurdles compared with regular Java applications, so explaining them also explains how to reverse engineer normal Java applications.

    5. Reference

    1. The Java™ Virtual Machine Specification, Second Edition, by Tim Lindholm and Frank Yellin
    2. Class Construction Kit – http://bcel.sourceforge.net/cck.html
    3. BCEL Manual – http://jakarta.apache.org/bcel/manual.html