Java bytecode reverse engineering

January 31, 2014 by Ajay Yadav

This article shows how to crack a Java executable by disassembling its bytecode. Disassembling Java bytecode is the act of transforming compiled bytecode back into a readable form, from which the original Java source code can largely be recovered. It is a long-standing problem for the software industry, which loses revenue to software piracy. Security engineers have developed countermeasures against Java bytecode disassembling, including software watermarking and code obfuscation. A large part of this paper is dedicated to tactics that are commonly considered to be reverse engineering.

The methods presented here, however, are intended for professional software developers, and each technique is demonstrated on a custom-created application. We are not encouraging any kind of malicious hacking by presenting this article; in fact, its contents help to pinpoint vulnerabilities in source code and to learn the various methods developers can use to shield their intellectual property from reverse engineering. We shall explain the process of disassembling in terms of obtaining sensitive information and cracking a Java executable without having the original source code.

Prerequisite

I presume that the reader has a thorough understanding of programming, debugging and compiling in Java on platforms such as Linux and Windows and, of course, knowledge of the JVM's inner workings. Apart from that, the following tools are required for bytecode reverse engineering:

  • JDK (javac, javap)
  • Eclipse
  • JVM
  • JAD

Java bytecode

Engineers usually write software in a high-level language such as Java, which is comprehensible to them but cannot be executed by the machine directly. This textual form of a computer program, known as source code, must be converted into a form that can be executed. Java source code is compiled into an intermediate language known as Java bytecode, which is not executed directly by the CPU but by a Java virtual machine (JVM). Compilation is typically the act of transforming a high-level language into a low-level language such as machine code or bytecode. We do not need to understand Java bytecode in order to write Java programs, but doing so can assist debugging and can help improve performance and memory consumption.

The JVM is essentially a simple stack-based machine whose runtime data can be separated into several areas: for instance, the stack, heap, registers, method area, and native method stacks. An advantage of the virtual machine architecture is portability: any machine that implements the Java virtual machine specification is able to execute Java bytecode, in the spirit of "Write once, run anywhere." Java bytecode is not strictly linked to the Java language, and there are many compilers and other tools available that produce Java bytecode, such as the Eclipse IDE, NetBeans, and the Jasmin bytecode assembler. Another advantage of the Java virtual machine is the runtime type safety of programs. The Java virtual machine specification defines the required behavior of a Java virtual machine but does not specify any implementation details. Therefore a Java virtual machine can be implemented in different ways for diverse platforms, as long as it adheres to the specification.
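To make the stack-based model concrete, here is a minimal sketch. The class `StackDemo` and its `add` method are hypothetical and not part of the sample application; the comments list the instructions that javac emits for such a method body, each of which reads from or writes to the operand stack.

```java
// Hypothetical example class, not part of the LoginTest application.
class StackDemo {

    // For this static method, `javap -c` shows four instructions:
    //   iload_0   push local variable 0 (a) onto the operand stack
    //   iload_1   push local variable 1 (b) onto the operand stack
    //   iadd      pop both ints and push their sum
    //   ireturn   pop the sum and return it to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

There are no general-purpose registers to allocate here: every intermediate value lives on the operand stack, which is one reason bytecode is comparatively easy to read and to reverse.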

Sample cracked application

The following Java console application, "LoginTest", was developed in order to explain Java bytecode disassembling. It validates users through a simple user name and password login mechanism. Suppose we have obtained this application from some other source as an unregistered user, and obviously we don't possess its source code. As a result, we do not know a valid user name and password, which are only provided to registered users.

Without having the source code of the application or a set of login credentials, we can still manage to log in by disassembling its bytecode, which exposes sensitive information related to the user login.

Disassemble bytecode

Disassembling is the reverse of compilation: it transforms a low-level language back into a higher-level, human-readable one, which is feasible for Java because of the standard and well-documented structure of bytecode. It basically recovers the source code, or a close approximation of it, from Java bytecode. We run a disassembler to obtain a readable listing for the given bytecode, just as running a compiler yields bytecode from the source code. Disassembling is used to ascertain the implementation logic in the absence of the relevant documentation and source code, which is why vendors often explicitly prohibit disassembling and reverse engineering in the license agreement. Here are some of the reasons to decompile:

  • Fixing critical bugs in the software for which no source code exists.
  • Troubleshooting software or a JAR file that does not have proper documentation.
  • Recovering the source code that was accidentally lost.
  • Learning the implementation of a mechanism.
  • Learning to protect your code from reverse engineering.

The process of disassembling Java bytecode is quite simple, and not as complex as it is for a native C/C++ binary. The first step is to compile the Java source code file, which has the *.java extension, with the javac utility, which produces a *.class file containing the bytecode. Then, by using javap, which is a built-in utility of the JDK, we can disassemble the bytecode from the corresponding *.class file. The javap utility prints its listing to standard output, which can be redirected to a file (for example, a *.bc file).
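The two steps can also be driven from Java itself. The following is a minimal sketch, assuming a full JDK (version 9 or later) so that the `javac` and `javap` tools are available through `java.util.spi.ToolProvider`; the class `DisassemblePipeline`, its method names and the tiny `Demo` class are hypothetical and only serve as an illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

// Hypothetical helper; needs a JDK (9+), not just a Java runtime.
class DisassemblePipeline {

    // Runs one JDK tool (for example "javac" or "javap") and returns what it printed.
    static String runTool(String name, String... args) {
        ToolProvider tool = ToolProvider.findFirst(name)
                .orElseThrow(() -> new IllegalStateException(name + " not found; is this a JDK?"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int exitCode = tool.run(outWriter, errWriter, args);
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException(name + " failed: " + out + err);
        }
        return out.toString();
    }

    // Step 1: javac turns <className>.java into <className>.class.
    // Step 2: javap -c turns <className>.class into a textual bytecode listing.
    static String compileAndDisassemble(String className, String source) {
        try {
            Path dir = Files.createTempDirectory("bytecode-demo");
            Path javaFile = dir.resolve(className + ".java");
            Files.write(javaFile, source.getBytes(StandardCharsets.UTF_8));
            runTool("javac", "-d", dir.toString(), javaFile.toString());
            return runTool("javap", "-c", "-cp", dir.toString(), className);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String source = "class Demo { static String greet() { return \"hello\"; } }";
        System.out.println(compileAndDisassemble("Demo", source));
    }
}
```

The listing is returned as a string here, which mirrors the fact that javap itself only prints text; saving it to a file is a separate, optional step.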

Opening a *.class file does not by itself give us access to the implementation logic. If we open the generated class file in Notepad or any other text editor after compiling the Java source code with javac, we find mostly unreadable binary data. The following figure displays the .class file's data:

So opening the class file directly in a text editor isn't successful. Instead, we can use a hex editor such as WinHex to inspect the bytecode, which shows the implementation logic as hexadecimal bytes, along with the strings that are used in the application. Although we can reveal sensitive information about a Java application this way, the operation is laborious: unless we know how to match each hex byte to the corresponding instruction, we can't obtain much information.
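The strings that a hex editor reveals live in the class file's constant pool, and they can be listed with a few lines of code. The sketch below is a hedged illustration rather than a complete class file parser: the class `ClassFileStrings` and its method are hypothetical names, and it relies only on the documented layout of the class file header (magic number 0xCAFEBABE, minor and major version, constant pool count) and on the standard constant pool tags.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that lists the UTF-8 constants stored in a class file.
class ClassFileStrings {

    static List<String> utf8Constants(byte[] classBytes) {
        List<String> found = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file: bad magic number");
            }
            in.readUnsignedShort();             // minor_version
            in.readUnsignedShort();             // major_version (52 means Java 8, for example)
            int count = in.readUnsignedShort(); // constant_pool_count
            // Valid constant pool indexes run from 1 to count - 1.
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: // Utf8: u2 length followed by modified UTF-8 bytes
                        found.add(in.readUTF());
                        break;
                    case 5: case 6: // Long, Double: 8 bytes, and they occupy two slots
                        in.skipBytes(8);
                        i++;
                        break;
                    case 3: case 4:                     // Integer, Float
                    case 9: case 10: case 11: case 12:  // Fieldref, Methodref, InterfaceMethodref, NameAndType
                    case 17: case 18:                   // Dynamic, InvokeDynamic
                        in.skipBytes(4);
                        break;
                    case 15: // MethodHandle
                        in.skipBytes(3);
                        break;
                    case 7: case 8: case 16: case 19: case 20: // Class, String, MethodType, Module, Package
                        in.skipBytes(2);
                        break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ClassFileStrings <file.class>");
            return;
        }
        utf8Constants(Files.readAllBytes(Paths.get(args[0]))).forEach(System.out::println);
    }
}
```

Utf8 entries hold class names, method names and type descriptors as well as string literals, so for a class with hard-coded credentials the output would be expected to contain those literals among many other names.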

Reversing bytecode

It is relatively easy to disassemble the bytecode of a Java application, compared to other binaries. The javap utility that ships with the JDK plays a significant role in disassembling Java bytecode, as well as in revealing sensitive information. It accepts a class name (or a *.class file) as an argument, as follows:

[java]
Drive:> javap LoginTest
[/java]

Once this command is executed, it shows the class declaration stored in the class file. Note, however, that by default it displays only the signatures of the methods, not their bodies, as follows:

[java]
Compiled from "LoginTest.java"
public class LoginTest
{
  public LoginTest();
  public static void main(java.lang.String[]);
  static boolean verify(java.lang.String, char[]);
}
[/java]

The bytecode instructions of every method in the class are displayed by the javap -c switch, as follows:

[java]
Drive:> javap -c LoginTest
[/java]

This command dumps the entire bytecode of the program in the form of opcode instructions. The meaning of each instruction in the context of this program will be explained in a later section of this paper. The section that matters most is the call to verify at offset 62 and the conditional branch that follows it, from which we can obtain critical information.

[java]
Compiled from "LoginTest.java"
public class LoginTest {
  public LoginTest();
    Code:
      0: aload_0
      1: invokespecial #1 // Method java/lang/Object."<init>":()V
      4: return

  public static void main(java.lang.String[]);
    Code:
      0: invokestatic #2 // Method java/lang/System.console:()Ljava/io/Console;
      3: astore_1
      4: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
      7: ldc #4 // String Login Verification
      9: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      12: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
      15: ldc #6 // String ************************
      17: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      20: aload_1
      21: ldc #7 // String Enter username:
      23: iconst_0
      24: anewarray #8 // class java/lang/Object
      27: invokevirtual #9 // Method java/io/Console.printf:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/Console;
      30: pop
      31: aload_1
      32: invokevirtual #10 // Method java/io/Console.readLine:()Ljava/lang/String;
      35: astore_2
      36: aload_1
      37: ldc #11 // String Enter password:
      39: iconst_0
      40: anewarray #8 // class java/lang/Object
      43: invokevirtual #9 // Method java/io/Console.printf:(Ljava/lang/String;[Ljava/lang/Object;)Ljava/io/Console;
      46: pop
      47: aload_1
      48: invokevirtual #12 // Method java/io/Console.readPassword:()[C
      51: astore_3
      52: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
      55: ldc #13 // String -------------------------
      57: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      60: aload_2
      61: aload_3
      62: invokestatic #14 // Method verify:(Ljava/lang/String;[C)Z
      65: ifeq 79
      68: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
      71: ldc #15 // String Status::Login Succesfull
      73: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      76: goto 87
      79: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
      82: ldc #16 // String Status::Login Failed
      84: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      87: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
      90: ldc #13 // String -------------------------
      92: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      95: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
      98: ldc #17 // String !!!Thank you!!!
      100: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
      103: return
}
[/java]

From offset 62, we can easily conclude that the login mechanism is implemented by a method called verify, which checks the user-entered username and password. If verify returns true, the "Status::Login Succesfull" message is printed; otherwise, the ifeq instruction at offset 65 branches to offset 79 and "Status::Login Failed" is printed:

[java]
62: invokestatic #14 // Method verify:(Ljava/lang/String;[C)Z
65: ifeq 79
68: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
71: ldc #15 // String Status::Login Succesfull
73: invokevirtual #5 // Method java/io/PrintStream.println:(Ljava/lang/String;)V
76: goto 87
79: getstatic #3 // Field java/lang/System.out:Ljava/io/PrintStream;
82: ldc #16 // String Status::Login Failed
[/java]

But we are still unable to grab the username and password. If we analyze the instructions of the verify method, however, we can easily find that the username and password are hard-coded in the code itself, as the operands of the two ldc instructions at offsets 10 and 19:

[java]
static boolean verify(java.lang.String, char[]);
  Code:
    0: new #18 // class java/lang/String
    3: dup
    4: aload_1
    5: invokespecial #19 // Method java/lang/String."<init>":([C)V
    8: astore_2
    9: aload_0
    10: ldc #20 // String ajay
    12: invokevirtual #21 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
    15: ifeq 29
    18: aload_2
    19: ldc #22 // String test
    21: invokevirtual #21 // Method java/lang/String.equals:(Ljava/lang/Object;)Z
    24: ifeq 29
    27: iconst_1
    28: ireturn
    29: iconst_0
    30: ireturn
}
[/java]

We finally come to the conclusion that this program accepts ajay as the username and test as the password; both appear as string constants loaded by the ldc instructions.
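Reading the two listings side by side, the original source can be approximated by hand. The following is only a reconstruction under assumptions: local variable names are not shown by javap -c, so every identifier other than LoginTest, main and verify is a guess, the null check on the console is an addition for this sketch, and the class is declared without `public` so that it compiles from a file of any name.

```java
import java.io.Console;

// Hand-reconstructed approximation of LoginTest, derived from the javap -c listing.
// Variable names are guesses; they do not appear in the disassembly.
class LoginTest {

    public static void main(String[] args) {
        Console console = System.console();
        if (console == null) {
            // Guard added for this sketch only; the disassembled main has no such check.
            System.err.println("No interactive console available");
            return;
        }
        System.out.println("Login Verification");
        System.out.println("************************");
        console.printf("Enter username:");
        String username = console.readLine();
        console.printf("Enter password:");
        char[] passwordChars = console.readPassword();
        System.out.println("-------------------------");
        if (verify(username, passwordChars)) {            // offsets 60-65
            System.out.println("Status::Login Succesfull");
        } else {
            System.out.println("Status::Login Failed");
        }
        System.out.println("-------------------------");
        System.out.println("!!!Thank you!!!");
    }

    static boolean verify(String username, char[] passwordChars) {
        String password = new String(passwordChars);              // offsets 0-8
        if (username.equals("ajay") && password.equals("test")) { // offsets 9-24
            return true;                                          // offsets 27-28
        }
        return false;                                             // offsets 29-30
    }
}
```

The two string literals compared in verify are exactly the constants that the ldc instructions load, which is why hard-coded credentials offer no protection once the class file is in an attacker's hands.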

Now launch the application once again and enter the aforesaid credentials. Bingo! We have successfully subverted the login authentication mechanism without even having the source code.

Bytecode instruction specification

As in assembly programming, the JVM's machine code is expressed through bytecode opcodes, which are the instructions that the JVM executes on any platform. Each opcode is one byte long, which allows for up to 256 distinct opcodes, and each has a mnemonic. Java bytecode instructions fall into these major categories:

  • Load and store
  • Method invocation and return
  • Control transfer
  • Arithmetical operation
  • Type conversion
  • Object manipulation
  • Operand stack management

We shall only discuss the opcode instructions that are used in the previous Java binary. The following table gives the meaning of each opcode along with its hex value:

Java opcode     Hex value   Meaning
aload           19          Load a reference onto the stack from a local variable
aload_0         2a          Load a reference onto the stack from local variable 0
aload_1         2b          Load a reference onto the stack from local variable 1
aload_2         2c          Load a reference onto the stack from local variable 2
anewarray       bd          Create a new array of references of length count and component type identified by the class reference index in the constant pool
astore          3a          Store a reference into a local variable
astore_0        4b          Store a reference into local variable 0
astore_1        4c          Store a reference into local variable 1
astore_2        4d          Store a reference into local variable 2
dup             59          Duplicate the value on top of the stack
getstatic       b2          Get a static field value of a class, where the field is identified by a field reference index in the constant pool
goto            a7          Go to another instruction at the branch offset
invokespecial   b7          Invoke an instance method on object objectref, where the method is identified by a method reference index in the constant pool
invokestatic    b8          Invoke a static method, where the method is identified by a method reference index in the constant pool
invokevirtual   b6          Invoke a virtual method on object objectref, where the method is identified by a method reference index in the constant pool
ifeq            99          If the value is 0, branch to the instruction at the branch offset
iconst_0        03          Load the int value 0 onto the stack
iconst_1        04          Load the int value 1 onto the stack
ireturn         ac          Return an integer from a method
ldc             12          Push the constant at the given index in the constant pool onto the stack
pop             57          Discard the top value on the stack
return          b1          Return void from a method
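The hex values in the table are enough to turn raw code bytes back into mnemonics by hand. The sketch below shows the idea with a deliberately tiny disassembler. It is a hypothetical illustration, not how javap is implemented: it only knows the opcodes listed in the table plus `new` (hex bb), and the byte sequence in `main` was derived by hand from the verify listing shown earlier, so treat it as an assumption rather than a dump of the real class file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mini-disassembler covering only the opcodes from the table plus `new`.
class MiniDisassembler {

    static String mnemonic(int opcode) {
        switch (opcode) {
            case 0x03: return "iconst_0";
            case 0x04: return "iconst_1";
            case 0x12: return "ldc";
            case 0x19: return "aload";
            case 0x2a: return "aload_0";
            case 0x2b: return "aload_1";
            case 0x2c: return "aload_2";
            case 0x3a: return "astore";
            case 0x4b: return "astore_0";
            case 0x4c: return "astore_1";
            case 0x4d: return "astore_2";
            case 0x57: return "pop";
            case 0x59: return "dup";
            case 0x99: return "ifeq";
            case 0xa7: return "goto";
            case 0xac: return "ireturn";
            case 0xb1: return "return";
            case 0xb2: return "getstatic";
            case 0xb6: return "invokevirtual";
            case 0xb7: return "invokespecial";
            case 0xb8: return "invokestatic";
            case 0xbb: return "new";
            case 0xbd: return "anewarray";
            default:
                throw new IllegalArgumentException(String.format("unsupported opcode 0x%02x", opcode));
        }
    }

    // Number of operand bytes that follow the opcode byte.
    static int operandLength(int opcode) {
        switch (opcode) {
            case 0x12: case 0x19: case 0x3a:
                return 1; // ldc, aload, astore: one-byte index
            case 0x99: case 0xa7: case 0xb2: case 0xb6:
            case 0xb7: case 0xb8: case 0xbb: case 0xbd:
                return 2; // branch offset or two-byte constant pool index
            default:
                return 0;
        }
    }

    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xff;
            int length = operandLength(opcode);
            if (pc + length >= code.length && length > 0) {
                throw new IllegalArgumentException("truncated instruction at offset " + pc);
            }
            String line = pc + ": " + mnemonic(opcode);
            if (length == 1) {
                int operand = code[pc + 1] & 0xff;
                line += (opcode == 0x12 ? " #" : " ") + operand;
            } else if (length == 2) {
                int operand = ((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff);
                if (opcode == 0x99 || opcode == 0xa7) {
                    line += " " + (pc + (short) operand); // signed offset relative to this instruction
                } else {
                    line += " #" + operand;               // constant pool index
                }
            }
            lines.add(line);
            pc += 1 + length;
        }
        return lines;
    }

    // Convenience for writing byte sequences as plain hex literals.
    static byte[] bytes(int... values) {
        byte[] result = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = (byte) values[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Code bytes of verify, reconstructed by hand from the listing (an assumption).
        byte[] verify = bytes(0xbb, 0x00, 0x12, 0x59, 0x2b, 0xb7, 0x00, 0x13, 0x4d, 0x2a,
                0x12, 0x14, 0xb6, 0x00, 0x15, 0x99, 0x00, 0x0e, 0x2c, 0x12, 0x16,
                0xb6, 0x00, 0x15, 0x99, 0x00, 0x05, 0x04, 0xac, 0x03, 0xac);
        disassemble(verify).forEach(System.out::println);
    }
}
```

Note that a branch operand is stored as a signed offset relative to the branching instruction, which is why the bytes 99 00 0e at offset 15 are printed as ifeq 29.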

In brief

This paper illustrates the mechanism of disassembling Java bytecode in order to reveal sensitive information when you do not have the source of a Java binary. We have seen how to apply such reverse engineering tactics by using JDK utilities. This article also covers the importance of bytecode disassembling and of the JVM's internal workings in the context of reversing bytecode, and it explains the meaning of the essential bytecode opcodes. Finally, we have seen how to subvert login authentication on a live Java console application by applying disassembly tactics. In the forthcoming paper, we shall explain how to patch Java bytecode in the context of reverse engineering.

Sources

Demystifying java internals introduction

Ajay Yadav

Ajay Yadav is an author, Cyber Security Specialist, SME, Software Engineer, and System Programmer with more than eight years of work experience. He earned master's and bachelor's degrees in Computer Science, along with numerous professional certifications. For several years, he has been researching Reverse Engineering, Secure Source Coding, Advanced Software Debugging, Vulnerability Assessment, System Programming and Exploit Development.

He is a regular contributor to programming journals and assists the developer community with blogs, research articles, tutorials, training material and books on sophisticated technology. His spare-time activities include tourism, movies and meditation. He can be reached at om.ajay007[at]gmail[dot]com