Using Base64 for malware obfuscation
What is Malware?
Malware is short for malicious software. Software, in simple terms, is a program written in a programming language. A program that is intentionally written to damage a computer or server, or to gain unauthorized access to a system, is called malware.
What is Obfuscation?
Obfuscation is a widely used technique for concealing the original code written by the programmer. It makes the executable code difficult to read and understand while preserving its functionality.
Malware obfuscation techniques
Malware writers use many obfuscation techniques, such as Base64, exclusive OR (XOR), ROT13, dead code insertion, instruction changes, and packers.
In this post, we will focus on the Base64 obfuscation technique.
Base64 is a simple malware obfuscation technique. Base64 encoding is used because it converts binary data into an ASCII string. Attackers can therefore encode data in Base64 and send it over text-based protocols such as HTTP. Base64 uses only 64 characters for encoding, hence the name. The characters are the uppercase letters A-Z, the lowercase letters a-z, the digits 0-9, “+” and “/”.
“=” is used for padding.
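As a minimal sketch, the alphabet and the padding behaviour described above can be checked in Python. The names BASE64_ALPHABET and padding_for are hypothetical helpers introduced here for illustration; only the base64 and string modules come from the standard library.

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+' and '/' (64 characters).
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)


def padding_for(data: bytes) -> str:
    """Return the '=' padding that b64encode appends for this input.

    Hypothetical helper for illustration only.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return encoded[len(encoded.rstrip("=")):]


if __name__ == "__main__":
    print(len(BASE64_ALPHABET))
    # Inputs of 1, 2 and 3 bytes need two, one and zero padding characters.
    for sample in (b"a", b"ab", b"abc"):
        print(sample, base64.b64encode(sample).decode("ascii"))
```

The padding depends only on the input length modulo 3, which is why some Base64 strings carry no “=” at all.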
Base64 encoding method
The Base64 table maps each 6-bit value (0 to 63) to one character, and it is used to convert normal strings to Base64. As per the table, 0 corresponds to the letter ‘A’, 45 corresponds to the letter ‘t’, 63 corresponds to ‘/’, and so on.
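To make the table lookup concrete, here is a rough sketch of encoding by hand: every 3 input bytes form 24 bits, which are split into four 6-bit indexes into the table. The names BASE64_TABLE and manual_b64encode are assumptions made for this example; in practice the standard base64 module does the same job.

```python
import base64
import string

# Index 0 is 'A', index 45 is 't', index 63 is '/'.
BASE64_TABLE = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)


def manual_b64encode(data: bytes) -> str:
    """Encode bytes by hand (illustrative helper, not a library function)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack the chunk into a 24-bit integer, zero-filling missing bytes.
        bits = int.from_bytes(chunk.ljust(3, b"\x00"), "big")
        # Split into four 6-bit indexes and look each one up in the table.
        chars = [BASE64_TABLE[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # A chunk of n bytes yields n + 1 real characters; the rest is '='.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


if __name__ == "__main__":
    text = b"InfosecInstitute"
    print(manual_b64encode(text))
    # Cross-check against the standard library implementation.
    print(manual_b64encode(text) == base64.b64encode(text).decode("ascii"))
```

The cross-check against base64.b64encode is there so the hand-rolled version is never trusted on its own.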
Encoding and decoding Base64
There are many tools and websites available to encode and decode Base64 strings.
One can use the following URL to encode and decode a Base64 string –
In our case, we will use Python to encode and decode Base64 strings. As an example, let’s encode and decode “InfosecInstitute”.
Encoding – Here is how encoding is done in Python. Open the Python terminal and run the following commands –
>>> import base64
>>> plain_text = b"InfosecInstitute"  # b64encode expects bytes in Python 3
>>> encoded = base64.b64encode(plain_text)
>>> print(encoded.decode("ascii"))
SW5mb3NlY0luc3RpdHV0ZQ==
This is how simple it is to encode a base64 string.
Decoding – Here is how decoding is done in Python. Open the Python terminal and run the following commands –
>>> import base64
>>> encoded = "SW5mb3NlY0luc3RpdHV0ZQ=="
>>> decoded = base64.b64decode(encoded)  # returns bytes in Python 3
>>> print(decoded.decode("ascii"))
InfosecInstitute
This is how simple it is to decode the base64 string.
It is not difficult to identify Base64 strings in a binary or in network traffic. Base64-encoded data usually appears as a long string made up of the Base64 character set (alphanumeric characters, + and /). If you come across such a long string, chances are high that it is Base64 encoded. Another simple technique is to check whether the long string ends with = or ==.
Example – SW5mb3NlYw==
The above string ends with ==. Base64 strings often end with = or ==, where = is used for padding. The padding is absent only when the length of the original data is a multiple of 3 bytes.
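The heuristics above (a long run of Base64 characters, optionally ending in = or ==) can be sketched in Python as follows. This is only an illustration under stated assumptions: MIN_LENGTH, CANDIDATE and find_base64_candidates are hypothetical names, and the threshold of 16 characters is an arbitrary choice rather than an established value.

```python
import base64
import binascii
import re

# Assumed threshold: shorter runs are too common in ordinary text.
MIN_LENGTH = 16

# A run of Base64 characters followed by at most two '=' padding characters.
CANDIDATE = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % MIN_LENGTH)


def find_base64_candidates(text: str):
    """Return (token, decoded_bytes) pairs for strings that look like Base64."""
    results = []
    for match in CANDIDATE.finditer(text):
        token = match.group(0)
        # Padded Base64 always has a length that is a multiple of 4.
        if len(token) % 4 != 0:
            continue
        try:
            # validate=True rejects characters outside the Base64 alphabet.
            decoded = base64.b64decode(token, validate=True)
        except (binascii.Error, ValueError):
            continue
        results.append((token, decoded))
    return results


if __name__ == "__main__":
    request = "POST /upload?data=SW5mb3NlY0luc3RpdHV0ZQ== HTTP/1.1"
    for token, decoded in find_base64_candidates(request):
        print(token, decoded)
```

A check like this produces false positives, because any sufficiently long alphanumeric word with a length divisible by 4 also decodes cleanly, so the decoded bytes still need to be inspected.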
Another method to identify Base64 is to use a YARA rule. Here is a sample YARA rule to identify Base64-encoded strings –
$a or $b