Network traffic analysis for IR: Data exfiltration

Introduction

Understanding network behavior is a prerequisite for developing effective incident detection and response capabilities. ESG research has found that 87 percent of companies use Network Traffic Analysis (NTA) tools for threat detection and response capabilities, and 43 percent say that NTA is their first line of defense for that purpose.

Network communication is one of the channels that cybercriminals use for data exfiltration. They can send files over HTTP or FTP so that incident response (IR) teams analyzing network traffic mistake the transfer for legitimate communication. Alternatively, attackers can use the Tor browser to mask their location and traffic.

IR teams working in a Security Operations Center (SOC) counter data exfiltration using NTA tools and other prevention techniques. In this article, we will look at what data exfiltration is, how attackers steal your data over the network, which distribution techniques they use, how exfiltration can be detected through network traffic analysis and how dangerous it can be.

What is data exfiltration?

Data exfiltration is the unauthorized transfer of critical data or information from a targeted network to infrastructure controlled by the attackers. Detecting it is a daunting task: data routinely moves in and out of networks, and exfiltration traffic closely resembles normal network traffic.

How do attackers steal your data using network traffic?

To infiltrate a network and exfiltrate data, threat actors mostly rely on Advanced Persistent Threat (APT) campaigns and botnets, both of which are high-risk threats. Before the actual exfiltration, attackers locate the information they want using various data collection and monitoring tools. They then usually combine malicious and legitimate tools and methods to extract vital data from the victim’s machine(s), for example by using common internet protocols to move large volumes of data off the targeted machines.

Let’s take a quick look at these tools and techniques.

File Transfer Protocol (FTP)

FTP is a network protocol used to transfer files between a client and a server. Plain FTP does not encrypt traffic or protect data integrity, so credentials and file contents cross the network in cleartext. Nevertheless, it is still widely used as a dependable way to transfer files.

To perform data exfiltration using FTP, adversaries authenticate to an external FTP server from a compromised host within a corporate network. Permissive firewall rules in many organizations’ networks fail to block outbound connections, which lets attackers easily connect back to their own malicious infrastructure. Additionally, most operating systems include a default FTP client, so malicious actors don’t have to install any additional tools on a compromised host.
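As a rough illustration of how a defender might look for this pattern, the following Python sketch filters flow records for FTP control connections that leave the internal network. The `Flow` record, the `INTERNAL_NETS` ranges and the `allowed_servers` parameter are assumptions made for this example, not part of any particular NTA product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Assumed internal address space; adjust to the real network.
INTERNAL_NETS = [ip_network("10.0.0.0/8"), ip_network("192.168.0.0/16")]
FTP_CONTROL_PORT = 21  # standard FTP control port


@dataclass
class Flow:
    """Hypothetical, minimal flow record."""
    src: str
    dst: str
    dst_port: int


def is_internal(addr: str) -> bool:
    """True if the address falls inside one of the assumed internal ranges."""
    ip = ip_address(addr)
    return any(ip in net for net in INTERNAL_NETS)


def outbound_ftp(flows, allowed_servers=frozenset()):
    """Return FTP control flows from internal hosts to unapproved external servers."""
    return [
        f for f in flows
        if f.dst_port == FTP_CONTROL_PORT
        and is_internal(f.src)
        and not is_internal(f.dst)
        and f.dst not in allowed_servers
    ]


flows = [
    Flow("10.0.0.5", "203.0.113.9", 21),    # internal host to external FTP server
    Flow("10.0.0.5", "10.0.0.20", 21),      # internal host to internal FTP server
    Flow("10.0.0.7", "198.51.100.4", 443),  # HTTPS, not FTP
]
suspicious = outbound_ftp(flows)
for f in suspicious:
    print(f"{f.src} -> {f.dst}:{f.dst_port}")
```

The check keys on the control port because FTP data connections use separately negotiated ports. A real deployment would also need to account for FTP servers listening on non-standard ports.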

HyperText Transfer Protocol (HTTP)

HTTP is an application layer protocol used to transmit information between a client and a server. Web browsers use HTTP to access websites and communicate with web servers.

Cybercriminals use HTTP as a channel for data exfiltration. To this end, they employ adaptive attacks that hide their HTTP communication inside legitimate network traffic. First, threat actors sniff the traffic of an infected host and build a model of the observed communication, known as a template. They then transform their own communication so that it fits the template and blends in with benign traffic.

Windows Management Instrumentation (WMI)

WMI can be used to find out which files targeted users or employees have opened. Attackers can then collect these files for exfiltration.

The following threat groups and tools are known to use WMI:

Cobalt Strike can utilize WMI to deliver a payload to the remote host
Deep Panda group uses WMI for lateral movement
BlackEnergy uses WMI to collect details of the victim host
APT32 group employs WMI to deploy their tools on remote computers and to glean information about the Outlook processes
APT29 group uses WMI to steal credentials and execute backdoors

What are some potential exfiltration distribution techniques?

Three techniques for exfiltration distribution are described below.

Random distribution

This technique splits the data randomly across the streams sent to the attackers. Connections are opened at random, and a random amount of data is transferred over each one to multiple malicious servers. This makes it very difficult for detection mechanisms to recognize patterns. However, the attackers have to reconstruct the data on the receiving end.

Round-robin distribution

With this type of distribution, the attackers’ malware sends each stream of packets carrying exfiltrated data to a different attacker-controlled server. After the last server has been used, the next stream goes to the first server again, and the cycle continues in round-robin fashion.

Single-server distribution

This technique sends all traffic to a single server rather than spreading it across several. It therefore adds no covertness to the exfiltration attempt.
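From the defender’s side, one consequence of these techniques is that a volume threshold applied to each destination separately can miss data spread across several servers, while the same threshold applied to a host’s total outbound volume can still catch it. The following Python sketch shows the difference. The flow tuples and the `THRESHOLD` value are made up for illustration.

```python
from collections import defaultdict

# Hypothetical flow records: (source host, destination server, bytes sent out)
flows = [
    ("10.0.0.5", "203.0.113.1", 40_000_000),
    ("10.0.0.5", "203.0.113.2", 40_000_000),
    ("10.0.0.5", "203.0.113.3", 40_000_000),
    ("10.0.0.8", "198.51.100.7", 5_000_000),
]

THRESHOLD = 100_000_000  # assumed alert threshold in bytes, for illustration only

per_pair = defaultdict(int)    # bytes per (source, destination) pair
per_source = defaultdict(int)  # bytes per source across all destinations
for src, dst, sent in flows:
    per_pair[(src, dst)] += sent
    per_source[src] += sent

pair_alerts = [pair for pair, total in per_pair.items() if total > THRESHOLD]
source_alerts = [src for src, total in per_source.items() if total > THRESHOLD]

print(pair_alerts)    # no single destination exceeds the threshold
print(source_alerts)  # the combined outbound volume of one source does
```

Here the host 10.0.0.5 stays under the threshold for every individual server, but its combined outbound volume does not.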

Detecting data exfiltration using network traffic analysis

Behavior-based approach

In their research paper “Behaviour Model for Detecting Data Exfiltration in Network Environment,” Rajamenakshi et al. presented a behavior-based approach to detecting data exfiltration. The researchers conducted an experiment to detect abnormal behavior in network traffic.

Exfiltrated data can flow from a host to a destination outside the network or to another host within it. In both cases, a massive amount of data leaves the infected host. Transferring such a large amount of data affects two parameters of the host system:

Memory utilization
CPU utilization

On most hosts, incoming data usually exceeds outgoing data. However, the balance depends on the host’s role. For example, desktop hosts do not normally send large amounts of data outside the network, whereas servers such as web or FTP servers do.
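This role-dependent expectation can be sketched as a simple ratio check in Python. The `MAX_RATIO` ceilings and the host figures below are assumptions chosen for illustration; they do not come from the paper.

```python
def out_in_ratio(bytes_out: int, bytes_in: int) -> float:
    """Ratio of outgoing to incoming bytes; infinite if data was sent but none received."""
    if bytes_in == 0:
        return float("inf") if bytes_out > 0 else 0.0
    return bytes_out / bytes_in


# Assumed ceilings per host role: desktops mostly download, servers mostly upload.
MAX_RATIO = {"desktop": 1.0, "server": 50.0}


def exceeds_role_baseline(role: str, bytes_out: int, bytes_in: int) -> bool:
    """True if the host sends more, relative to what it receives, than its role allows."""
    return out_in_ratio(bytes_out, bytes_in) > MAX_RATIO[role]


# Hypothetical daily totals: name -> (role, bytes out, bytes in)
hosts = {
    "desk-01": ("desktop", 20_000_000, 400_000_000),
    "desk-02": ("desktop", 900_000_000, 300_000_000),
    "web-01": ("server", 5_000_000_000, 250_000_000),
}

flagged = [
    name for name, (role, sent, received) in hosts.items()
    if exceeds_role_baseline(role, sent, received)
]
print(flagged)
```

The web server sends far more than it receives without being flagged, while the desktop that uploads three times what it downloads stands out.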

To detect data exfiltration, the researchers analyze the network traffic, extract key parameters for each host using the Simple Network Management Protocol (SNMP) and compute Kernel Density Estimation (KDE) values for them. These KDE values form the basis for detecting exfiltration.

The researchers worked in two phases: a learning phase and a detection phase. During the learning phase, they analyze each host in the network and compute KDE values individually for its network and system parameters. In the detection phase, they compute KDE values for the same parameters and correlate the current KDEs with the learned ones using Karl Pearson’s correlation coefficient. A weak correlation is treated as an anomaly.
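The two phases can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the researchers’ implementation: the Gaussian kernel, the `BANDWIDTH`, the evaluation `GRID`, the `WEAK_CORRELATION` cutoff and the sample values (outbound megabytes per polling interval for one host) are all chosen for this example.

```python
import math


def gaussian_kde(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / math.sqrt(var_x * var_y)


GRID = list(range(0, 121, 5))  # assumed evaluation points (MB per interval)
BANDWIDTH = 5.0                # assumed kernel bandwidth
WEAK_CORRELATION = 0.8         # assumed cutoff below which a window is anomalous

# Learning phase: density of outbound volume observed during normal operation.
learned_samples = [10, 12, 11, 9, 10, 13, 11, 10]
learned_kde = gaussian_kde(learned_samples, GRID, BANDWIDTH)


def correlation_with_baseline(current_samples):
    """Detection phase: correlate the current density with the learned one."""
    current_kde = gaussian_kde(current_samples, GRID, BANDWIDTH)
    return pearson(learned_kde, current_kde)


normal_window = [11, 10, 12, 10, 9, 11, 12, 10]
exfil_window = [10, 11, 95, 110, 105, 98, 102, 108]

for label, window in (("normal", normal_window), ("exfil", exfil_window)):
    r = correlation_with_baseline(window)
    print(label, round(r, 3), "anomaly" if r < WEAK_CORRELATION else "ok")
```

A window that resembles the learned behavior yields a density of almost the same shape and therefore a correlation close to 1, while a window dominated by large transfers shifts most of the density elsewhere and the correlation drops.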

The researchers also checked for matching variations in other parameters to confirm data exfiltration anomalies. For instance, if CPU utilization appears anomalous, they check whether the correlation of the outgoing network traffic’s KDEs is also anomalous during the same period. If it is, the combined anomalies indicate a data exfiltration attack.
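This confirmation step amounts to requiring that two parameters be anomalous in the same time window. A minimal Python sketch follows; the per-window correlation values and the `WEAK` cutoff are hypothetical.

```python
WEAK = 0.8  # assumed cutoff below which a correlation counts as weak


def confirmed_windows(cpu_corr, net_out_corr, weak=WEAK):
    """Return indices of time windows where both correlations are weak."""
    return [
        i for i, (cpu, net) in enumerate(zip(cpu_corr, net_out_corr))
        if cpu < weak and net < weak
    ]


# Hypothetical correlation of current vs. learned KDEs, one value per time window.
cpu_corr = [0.97, 0.95, 0.41, 0.38, 0.96]
net_out_corr = [0.98, 0.52, 0.33, 0.29, 0.97]

windows = confirmed_windows(cpu_corr, net_out_corr)
print(windows)
```

Window 1 shows weak correlation for outgoing traffic only and is not confirmed, whereas windows 2 and 3 are anomalous on both parameters.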

Signature-based approach

To exfiltrate data, attackers mostly use a Command & Control (C&C) channel, which resembles a client-server architecture and provides remote connectivity between threat actors and compromised host(s). Because remote-control channels of this kind also have legitimate uses, they cannot simply be blocked on the grounds that they might be misused.

The signature-based detection method identifies malicious C&C channels by looking for known patterns, or signatures. Developers generate signatures from known malware samples, and new network traffic is compared against these signatures to identify attacks. If new traffic matches a signature, it is classified as C&C traffic.

Before creating signatures, developers analyze confirmed C&C traffic gleaned from numerous sources such as sandboxes and honeynets. A log of the malware’s activity is collected by running the sample in a controlled environment. This log and the observed behavior are incorporated into the signature.
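The matching step can be illustrated with a few lines of Python. The two signatures below are invented for this example and do not correspond to any real malware family; production signatures are considerably richer.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str
    pattern: re.Pattern  # compiled regular expression over raw payload bytes


# Hypothetical signatures, as if derived from traffic captured in a sandbox.
SIGNATURES = [
    Signature("example-beacon-uri", re.compile(rb"GET /gate\.php\?id=[0-9a-f]{16} ")),
    Signature("example-user-agent", re.compile(rb"User-Agent: ExampleBot/1\.0\r\n")),
]


def match_signatures(payload: bytes, signatures=SIGNATURES):
    """Return the names of all signatures found in the payload."""
    return [s.name for s in signatures if s.pattern.search(payload)]


benign = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"User-Agent: Mozilla/5.0\r\n\r\n"
)
beacon = (
    b"GET /gate.php?id=0123456789abcdef HTTP/1.1\r\n"
    b"Host: example.net\r\n"
    b"User-Agent: ExampleBot/1.0\r\n\r\n"
)

print(match_signatures(benign))  # nothing matches
print(match_signatures(beacon))  # both hypothetical signatures match
```

Traffic that matches no signature passes unflagged, which is why this approach only catches C&C channels whose patterns are already known.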

When data is exfiltrated over a C&C channel whose patterns are already known, the signature-based approach can detect the attack before actual damage is done.

How dangerous can data exfiltration be?

Data is said to be the digital currency of organizations. It is of paramount importance when it comes to national security, military secrets and compliance standards. Unauthorized data disclosure or compromise of Personally Identifiable Information (PII) can cause huge financial and reputational damage to a victim company. For example, under the General Data Protection Regulation (GDPR), the most serious infringements can draw fines of up to €20 million or 4% of annual global turnover, whichever is higher.

Attacks can also target national secrets. In a shadow network attack, attackers stole secret, restricted and confidential documents from computers in government offices on several continents, including confidential embassy documents about India’s relationships in Russia, West Africa and the Middle East.

Military secrets are the crown jewels of valuable data, and their disclosure can be highly detrimental to the country concerned and beneficial to an enemy country. For example, hostile actors in one country can use data exfiltration to learn where the enemy country’s major troops and ballistic missiles are deployed.

Conclusion: The way forward

As we’ve seen, data exfiltration requires both technical methods and some degree of sophistication on the part of attackers. They can use FTP, HTTP, WMI and other communication channels to move your critical data out of your network, and they use data obfuscation, packet adaptation and stream adaptation to stay undetected.

Nevertheless, several techniques have been developed to detect data exfiltration. Two prominent examples are the behavior-based approach and the signature-based approach. Whether detection techniques like these will curb this dangerous form of data theft remains to be seen.