Decrypting Malware Command and Control
Encryption and Encoding in Malware
Network traffic analysis (NTA) is a common approach to cyber defense. By performing deep packet inspection on inbound and outbound traffic, an organization can potentially identify the signatures of malware or other cyberattacks within that traffic.
Malware authors don’t want their traffic detected and analyzed by an organization’s security team, since that analysis would speed up identification and eradication of the infection. To help hide their presence on a system, they use encryption and encoding to protect their traffic against casual inspection.
Decoding Malware C2 Traffic
If a malware author used strong encryption for their network traffic, like the algorithms used in ransomware for file encryption, then traffic analysis would provide little or no benefit. However, many malware authors use the simplest tools available, leaving room for analysis.
To demonstrate this, let’s look at a packet taken from the command and control (C2) traffic of a malware sample.
The above packet does not contain any readable information. However, it does have certain features that make it possible to identify the encoding algorithm used to conceal it:
- Limited Alphabet: Apart from its final character, the sample consists only of alphanumeric characters.
- Trailing Equal Sign: The last character of the sample is an equal sign.
These two features are indicative of Base64 encoding. The standard Base64 alphabet consists of the alphanumeric characters plus two special characters, + and / (neither is present in the above sample), and its padding scheme appends one or two trailing equal signs when the length of the plaintext in bytes is not a multiple of three. Thus, this traffic sample is likely Base64 encoded.
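The two features above can be turned into a quick heuristic check. The following is a minimal sketch, not a definitive detector: the function name looks_like_base64 is hypothetical, and the string being tested is a stand-in (the Base64 form of "hello world"), not the captured packet.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, '+' and '/'.
B64_ALPHABET = set(string.ascii_letters + string.digits + "+/")


def looks_like_base64(text: str) -> bool:
    """Heuristic only: True if text has the shape of padded standard Base64."""
    body = text.rstrip("=")
    padding = len(text) - len(body)
    return (
        len(text) > 0
        and len(text) % 4 == 0   # padded Base64 comes in 4-character groups
        and padding <= 2         # at most two trailing '=' characters
        and all(ch in B64_ALPHABET for ch in body)
    )


# Stand-in value, NOT the packet from the capture.
print(looks_like_base64("aGVsbG8gd29ybGQ="))  # True
print(looks_like_base64("hello world!"))      # False
```

A positive result only means the string has the right shape. Short alphanumeric strings can pass by coincidence, so the real test is whether decoding produces something meaningful.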
Reversing the Encoding
To test this theory, we can use the base64 library in Python.
The code for doing so is shown above. Note that the b64decode function is imported from the library as d. This function d is then used to decode the sample collected from the network traffic.
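For reference, a minimal sketch of a decoding step of this kind follows. It keeps the import of b64decode as d described here, but the sample value is a stand-in (the Base64 form of "hello world"), because the captured packet is not reproduced in this text.

```python
from base64 import b64decode as d

# Stand-in value, NOT the packet from the capture.
sample = "aGVsbG8gd29ybGQ="

# b64decode accepts an ASCII string or bytes and returns bytes.
decoded = d(sample)
print(decoded)  # b'hello world'
```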
Analyzing the Results
Running the Python code above produces the following output.
Looking at this output, the decoding was likely successful: the decoder reported no problems with the length of the input, padding, etc.
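To see what a failed decode looks like by comparison, the sketch below wraps the decoder so that malformed input is reported rather than raised. The helper name try_decode and the test strings are hypothetical. Passing validate=True makes b64decode reject characters outside the Base64 alphabet instead of silently discarding them.

```python
import binascii
from base64 import b64decode


def try_decode(text: str):
    """Return the decoded bytes, or None if text is not valid standard Base64."""
    try:
        return b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        # Raised for bad padding, bad length, or non-alphabet characters.
        return None


print(try_decode("aGVsbG8gd29ybGQ="))  # b'hello world'
print(try_decode("aGVsbG8gd29ybGQ"))   # None (padding character missing)
print(try_decode("aGVsbG8!d29ybGQ="))  # None ('!' is not in the alphabet)
```

A clean decode is evidence, not proof. Any string with the right alphabet and length decodes without error, which is why the content of the result still needs to be examined.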
One possible interpretation of the decoded data is that it is a file. It starts with the characters MQ, which may be a file’s magic number.
The string MQ is recorded as 4D 51 in hexadecimal (ASCII encoding). Looking this up in a file signature search engine (as shown above) reveals that it is not a known magic number.
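The same lookup can be approximated locally. This is a rough sketch under stated assumptions: the table holds only a handful of well-known signatures, far fewer than a real signature database, and the identify helper is hypothetical. The header bytes are the first line of the decoded message as quoted in this article.

```python
# A few well-known file signatures; a miss here is not conclusive,
# since real lookup services hold many more entries.
KNOWN_SIGNATURES = {
    b"MZ": "DOS/Windows executable",
    b"PK\x03\x04": "ZIP archive",
    b"%PDF": "PDF document",
    b"\x89PNG": "PNG image",
    b"\x7fELF": "ELF executable",
}


def identify(data: bytes):
    """Return the name of the first matching signature, or None."""
    for magic, name in KNOWN_SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


header = b"MQ\x06\x01\x16@\x12\x0e\x0e"
magic_hex = " ".join(f"{byte:02X}" for byte in header[:2])
print(magic_hex)         # 4D 51
print(identify(header))  # None
```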
However, further analysis of the decoded data reveals repeated structural elements. The sequence \n@UU appears several times in the result, which suggests that this is structured data rather than random noise produced by a failed decoding, or encrypted data. This likely means that MQ is some type of command code and the rest of the result is the collected data.
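Counting a suspected delimiter is easy to automate. The sketch below is illustrative only: the full decoded message is not reproduced in this text, so the input is a made-up stand-in in which only the first line and the \n@UU sequence come from the article. The helper name split_records is hypothetical.

```python
MARKER = b"\n@UU"


def split_records(data: bytes, marker: bytes = MARKER):
    """Count marker occurrences and split the data around them."""
    return data.count(marker), data.split(marker)


# Made-up stand-in: only the header and the marker come from the article.
decoded = (
    b"MQ\x06\x01\x16@\x12\x0e\x0e"
    + MARKER + b"aaaa"
    + MARKER + b"bbbb"
    + MARKER + b"cc"
)

count, records = split_records(decoded)
print(count)         # 3
print(len(records))  # 4
print(records[0])    # b'MQ\x06\x01\x16@\x12\x0e\x0e'
```

A four-byte sequence recurring several times in a short message would be unlikely in random or encrypted data, which is what makes this kind of count useful as a sanity check.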
Analysis of another traffic connection (shown above) to the same host from this PCAP seems to support this conclusion. The Base64 encoded value decodes to A1\x89\x01\x14@\x07. This is structurally very similar to the first line of the original message (MQ\x06\x01\x16@\x12\x0e\x0e): both begin with two printable characters, and both have \x01 as the fourth byte and @ as the sixth.
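The structural similarity between the two decoded values quoted above can be checked byte by byte. This is a small sketch using only those two values; the helper name shared_positions is hypothetical.

```python
first = b"MQ\x06\x01\x16@\x12\x0e\x0e"
second = b"A1\x89\x01\x14@\x07"


def shared_positions(a: bytes, b: bytes):
    """Return the zero-based offsets at which both messages hold the same byte."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x == y]


print(shared_positions(first, second))  # [3, 5]
```

Offsets 3 and 5 (zero-based) hold \x01 and @ in both messages, which is consistent with a fixed header layout in which only some fields vary. Two samples are too few to confirm that layout.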
The Limits of Cryptanalysis and NTA
Analysis of the network traffic sample revealed quite a bit about how the malware communicates with its server. The results suggest that requests and responses follow a consistent format, which is obscured by Base64 encoding.
However, this is as far as traffic analysis alone can go. Understanding the C2 structure and the meaning of the observed traffic would require analysis of the malware itself.