Introduction

Data collection and analysis for network engineers, security professionals and incident responders has exploded over the years with the growth of cloud-based services, mobile devices and tablets, remote workforces, interconnected applications and global enterprises. In fact, research has found that 41 percent of organizations said they were collecting significantly more network data for security analysis than they knew how to process. The same research found that 49 percent of organizations had trouble correlating security issues with network performance. At the same time, cyberattacks are becoming more complex, sophisticated and tailored.

However, despite all of these complexities, the fundamental role of data collection, processing and analysis in incident response and security monitoring is unchanged: it remains crucial to identifying and dealing with network intrusions. What has changed is that organizations have begun to use additional categories of network data. This allows security professionals to gain deeper insight into their network’s activity, measure its security and make sense of otherwise overwhelming levels of data in order to detect cyberattacks.

Role of data collection

While most of the data that passes through network devices is of no value to network engineers, let alone security professionals, there are key pieces that contain vital information that should be collected, processed, protected and analyzed. Proactively, security professionals can employ real-time monitoring, testing and analysis of network data to help to identify network vulnerabilities, measure performance, evaluate service levels and even initially detect anomalous activity. 

Despite the structured nature of individual network data, different security threats, attacks and intrusions can cause differences in the type, amount, source and destination of network traffic. Combined with endpoint or host performance data, threat intelligence, application data and information from security products, security analysts have ample data from which to predict network threats, enhance network security and take action in the event of a threat.

Types of collection

Whether for training, testing, detection or incident response, the data produced by network components and by internal and external hosts, and then selected for network analysis, can vary by need. Further complicating network traffic data collection and analysis are the quality of the data collected, the tools available to record and structure it and the ability to analyze it effectively and efficiently.

For all of these reasons, organizations and security professionals need to have a strong grasp of the types of data available for collection and, ultimately, what the data can tell them about what is occurring or has occurred within the network.

In the following sections, we will explore the different categories of data, their potential applications and which types of devices produce them.

Packet-based

Packet-based data is built on the Transmission Control Protocol (TCP), User Datagram Protocol (UDP) and Internet Control Message Protocol (ICMP), among others. Packets carry a lot of data useful for network traffic analysis, and packet capture is often the most common method of data collection because of the wide range of devices that use these protocols, from mobile phones to large-scale enterprise networks.

Information is divided and encapsulated into packets and paired with addresses and headers before it is sent to a destination node, where it is decoded and processed. The header guides the packet in transit and can be used to identify and filter packets as they flow through a network, using information such as IP addresses, ports and protocols. Inside the packet, information can be sent “in the clear” or encrypted.

On the other hand, the sheer volume of packets flowing through a network can easily overwhelm analysts if they do not know what they are looking for, especially if they do not have a tool (e.g., Wireshark) to help structure and visualize the data. Without such a tool, a simple collection of network packets at a physical interface, such as with WinPcap or tcpdump, can hinder efficient and effective analysis.

Given the sheer volume and complexity of packet-level data, the following components can be the most helpful to capture and review when network traffic analysis is necessary:

Source and destination IP addresses

Source and destination IP addresses are fundamental to packet transmission protocols and, when collected, can be used to establish a baseline of “normal” traffic. Packet capture that includes a review of IP addresses, for example, can point toward distributed denial-of-service attacks or botnets when source IP addresses fall outside of normal IP ranges or are more concentrated than those of legitimate users.
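As a minimal sketch of the baseline idea above, the following Python snippet counts packets whose source address falls outside a set of expected networks. The networks in EXPECTED_NETWORKS and the function name unexpected_sources are assumptions made for illustration; a real baseline would be derived from observed traffic.

```python
from collections import Counter
from ipaddress import ip_address, ip_network

# Hypothetical "normal" ranges for this example only.
EXPECTED_NETWORKS = [ip_network("10.0.0.0/8"), ip_network("192.168.1.0/24")]


def unexpected_sources(source_ips):
    """Count packets per source IP that falls outside every expected network."""
    counts = Counter()
    for raw in source_ips:
        addr = ip_address(raw)
        if not any(addr in net for net in EXPECTED_NETWORKS):
            counts[raw] += 1
    return counts


# Source addresses as they might be extracted from captured packet headers.
sample = ["10.1.2.3", "203.0.113.7", "192.168.1.20", "203.0.113.7", "198.51.100.4"]
flagged = unexpected_sources(sample)
print(flagged.most_common())
```

A source that appears many times in the result is a candidate for closer review, not proof of an attack.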

Source and destination port

Packets travel to different ports based on their purpose or protocol, and analysis of their frequency over time can form a baseline to monitor for anomalies. When irregular port scans occur or unusual traffic to or from a port appears, security analysts can use this behavior to trigger additional research.
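One simple way to spot a possible port scan is to count how many distinct destination ports each source contacts. The sketch below assumes packets have already been reduced to (source IP, destination port) pairs; the function name and the max_ports threshold are hypothetical and would need tuning against a real baseline.

```python
from collections import defaultdict


def possible_port_scanners(packets, max_ports=3):
    """Return sources that contacted more than max_ports distinct destination ports.

    packets is an iterable of (source_ip, destination_port) tuples.
    """
    ports_by_source = defaultdict(set)
    for src, dst_port in packets:
        ports_by_source[src].add(dst_port)
    return {
        src: sorted(ports)
        for src, ports in ports_by_source.items()
        if len(ports) > max_ports
    }


sample = [("10.0.0.5", 443), ("10.0.0.5", 443)]
sample += [("203.0.113.9", port) for port in (21, 22, 23, 80, 445)]
scanners = possible_port_scanners(sample)
print(scanners)
```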

Packet content

As mentioned before, there are two pieces to a packet: the packet header and the packet content, both of which can be collected and analyzed for network security purposes. Packet headers can be inspected by security tools such as firewalls or intrusion detection systems to, for example, monitor whether abnormal levels of network activity are targeting applications known to be vulnerable or include unusual source IP information.

Packet content, by contrast, can be reviewed to identify whether malicious code is present or unusual application commands are included, either of which can point toward external cyberattacks. However, packet content inspection can be limited by the transmission protocol of the packet and the types and location of security and monitoring devices used. While protocols like HTTP and DNS send packet information “in the clear,” traffic sent via Secure Sockets Layer (SSL) and Transport Layer Security (TLS) is encrypted, making packet content analysis slower and more difficult.

Packet size 

The size of a packet can also be useful for network monitoring and security analysis. As with other packet-based data, network behavior usually follows logical and consistent patterns. It is the same for the size of the information within the packet, measured in bytes. 

Because packet size can vary by the source of the packet, the collection of packet size data sent to or from certain locations within a network can be used to detect attacks that fall out of usual bounds. The same can occur for large cumulative packet sizes over small windows of time or outside of normal periods of network activity.
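The idea of packet sizes falling "out of usual bounds" can be sketched with a basic z-score check against a baseline. This is an illustration under simplifying assumptions: the function name, the sample sizes and the threshold of three standard deviations are all hypothetical choices, not a recommended detection rule.

```python
from statistics import mean, stdev


def size_outliers(baseline_sizes, observed_sizes, threshold=3.0):
    """Return observed packet sizes (bytes) far from the baseline mean.

    A size is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the baseline sizes.
    """
    mu = mean(baseline_sizes)
    sigma = stdev(baseline_sizes)
    if sigma == 0:
        # Every baseline packet had the same size; flag anything different.
        return [s for s in observed_sizes if s != mu]
    return [s for s in observed_sizes if abs(s - mu) / sigma > threshold]


baseline = [500, 520, 480, 510, 490, 505, 495]
outliers = size_outliers(baseline, [502, 1500, 64])
print(outliers)
```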

Packet quantity

A final packet-based data type is the quantity of packets sent to or from a network as well as the types of packets sent. For the former, a large number of incoming packets can point toward denial-of-service flooding attacks, while large numbers of outgoing packets to unusual destination IPs can indicate potential data exfiltration or network probing.

The latter, packet types, focuses on the different protocols networks use to function, including TCP packets and the flags sent (e.g., SYN, ACK, FIN and so on), ICMP request/reply packets, unusual UDP packets (e.g., TFTP and DNS) and others. The amount of network activity organized by these different packet and protocol types can, when compared to IP addresses, be used to investigate for potential malicious network behavior if they fall outside of normal activity.
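To illustrate counting packets by type, the sketch below computes the share of TCP packets that carry SYN without ACK, which tends to rise during a SYN flood or a scan. It assumes each packet has already been reduced to a set of flag names; the function name is hypothetical.

```python
def bare_syn_ratio(packets):
    """Return the fraction of packets with SYN set and ACK not set.

    packets is an iterable of sets of TCP flag names, e.g. {"SYN", "ACK"}.
    """
    packets = list(packets)
    if not packets:
        return 0.0
    bare_syn = sum(1 for flags in packets if "SYN" in flags and "ACK" not in flags)
    return bare_syn / len(packets)


# Eight connection attempts, one SYN-ACK reply and one plain ACK.
sample = [{"SYN"}] * 8 + [{"SYN", "ACK"}, {"ACK"}]
ratio = bare_syn_ratio(sample)
print(ratio)
```

What counts as a suspicious ratio depends on the network's normal mix of traffic, so the value is only meaningful when compared with a baseline.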

Flow-level data

Packet-level data collection narrows the focus of network analysis to a very micro level. Given that today’s enterprises have network speeds measured in the hundreds of gigabits per second, complex application environments and data encryption, packet analysis alone can miss a lot of environmental and contextual data that can aid incident detection and remediation.

Therefore, the practice of flow-level data collection has matured, providing a macro-level view of network activity, where groups of packets that share similar destinations, sources, protocol types or other attributes listed in the packet’s header are analyzed together. When related packets are analyzed together, flow analysis can aid in monitoring network performance, application health or host activity as well as flagging unusual network traffic that may point toward a potential intrusion.
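Grouping packets that share header attributes can be sketched as follows. The Packet record and the choice of a five-field key (source, destination, both ports and protocol) are assumptions made for this example; flow collection tools may define their flow keys differently.

```python
from collections import defaultdict, namedtuple

# Hypothetical, simplified packet record used only for this sketch.
Packet = namedtuple("Packet", "src dst sport dport proto size")


def group_into_flows(packets):
    """Group packets into flows keyed by (src, dst, sport, dport, proto)."""
    flows = defaultdict(list)
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flows[key].append(p)
    return dict(flows)


packets = [
    Packet("10.0.0.5", "203.0.113.1", 50000, 443, "TCP", 600),
    Packet("10.0.0.5", "203.0.113.1", 50000, 443, "TCP", 1400),
    Packet("10.0.0.8", "198.51.100.2", 53000, 53, "UDP", 80),
]
flows = group_into_flows(packets)
for key, members in flows.items():
    print(key, len(members))
```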

When using flow-level data collection, organizations need to determine both where the data collection will occur and what the scope of that data collection will be. First, while flow data collection can occur at any point within a network and can even be conducted in multiple network points simultaneously, it is usually most effective when paired with network edge nodes where data passing in and out of a local area network can be monitored. 

At the same time, flow data can be collected with either a “depth-first” or a “breadth-first” method. The former narrows the type of flow data collected to match certain criteria present in packet headers, while the latter seeks to collect as much information as possible in order to obtain a broad view of overall network activity.

Once collected, flow-level data can be organized and reviewed according to the following classification types: 

Flow count

Flow count tracks the frequency of different types of flows travelling through a network, classified by the flow key or attribute selected. For example, data can be organized by packet source address, destination address or protocol type (e.g., DNS, TCP, HTTP) so network behavior over time can be evaluated. If there are abrupt changes or unusual activity, this can be a flag to initiate a more detailed investigation.

Flow size

Similar to flow count, flow size aggregates the quantity and content size of the packets collected as part of the flow. During a flooding attack, for example, a dramatic increase in the flow size will be identifiable. 

Flow direction

Flow direction can help network analysts to monitor how quickly and how much data is moving into or out of a network. 

Flow duration

The length of the flow, in combination with its frequency, can help to identify if a particular instance of network activity is a brief, innocuous network scan or web crawler or a more malicious network probe.

Flow rate

Flow rate measures how fast the packets grouped into a certain flow move. As with flow size, flow rate can point toward an impending denial-of-service attack or other malicious activity if rates are higher than normal.
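The size, duration and rate measures described above can be computed for a single flow along these lines. The input format, a list of (timestamp, size) pairs, and the function name are assumptions for illustration.

```python
def flow_metrics(packets):
    """Summarize one flow given its packets as (timestamp_seconds, size_bytes)."""
    timestamps = [t for t, _ in packets]
    total_bytes = sum(size for _, size in packets)
    duration = max(timestamps) - min(timestamps)
    # A single-packet flow has zero duration, so no rate can be computed.
    rate = total_bytes / duration if duration > 0 else None
    return {
        "packets": len(packets),
        "bytes": total_bytes,
        "duration_s": duration,
        "bytes_per_s": rate,
    }


metrics = flow_metrics([(0.0, 1000), (1.0, 1500), (2.0, 500)])
print(metrics)
```

Comparing these values for the same flow key across time windows is what makes an abrupt change visible.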

Connection-based data

Flow-based and packet-based data collection provide comprehensive network information for review, but connection-based data can provide a deeper level of understanding about the nature of network traffic between specific network devices.

In particular, connection-based data is the aggregate of network traffic between two specific IP addresses, one internal and one external, covering both inflow and outflow. It provides granular detail about activity that other collection methods or analyses have flagged.

Analysis of traffic between two specific points can be tracked and evaluated by several different methods:

Connection duration

Connection-based data can be evaluated by duration or by time, such as the amount of data transmitted during a specific time period or during certain hours over several days or weeks. This information can be useful to identify the intent of the communication or whether it falls outside of the usual patterns.

Connection size

Connection data can also be sorted by the size of the individual packets transmitted or the overall size of the flow of data between two hosts. Tracking this type of data can help to distinguish between normal data flows and instances where external payloads may be incoming or unusual amounts of data are moving out of a network. 

Connection count

The connection count of a particular IP address assigned to a host can help to monitor the number of connections between that host and others within a certain time period. A change in the number of connections or new, unexplained connections can signal the need for deeper investigation.
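A connection count per host and time period can be sketched as below. The event format, the 60-second window and the function name are hypothetical; the sketch counts distinct external peers per internal host in each window.

```python
from collections import defaultdict


def peers_per_window(events, window_seconds=60):
    """Count distinct external peers per internal host in each time window.

    events is an iterable of (timestamp_seconds, internal_ip, external_ip).
    Returns {(internal_ip, window_index): number_of_distinct_peers}.
    """
    peers = defaultdict(set)
    for ts, internal, external in events:
        window = int(ts // window_seconds)
        peers[(internal, window)].add(external)
    return {key: len(value) for key, value in peers.items()}


events = [
    (5, "10.0.0.5", "203.0.113.1"),
    (20, "10.0.0.5", "203.0.113.2"),
    (30, "10.0.0.5", "203.0.113.1"),
    (70, "10.0.0.5", "198.51.100.9"),
]
counts = peers_per_window(events)
print(counts)
```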

Connection type

Finally, connection data can be evaluated by the type or protocol of the traffic transmitted, including TCP, UDP and ICMP. By comparing this information with other data such as flow-based or packet-level data, network analysts can begin to distinguish between potentially malicious activity and normal traffic. 

Host-based data

The last type of network traffic that can be collected and used for analysis is host-based data. Unlike the previous types of data, host-level data includes information about device and network activity stored on individual clients. Tools such as LoadRunner for Windows and Collectl for Linux can provide useful information about system events like configuration changes, computing resource usage and system behavior.

As network attacks often seek to change or exploit hosts, host-based intrusion detection systems installed on individual clients can be used to flag unusual activity like privilege escalation, login attempts, directory changes and data sent or received.

Computer resource usage

Collecting and reviewing CPU and memory usage can provide useful information about the behavior of individual clients present in a network. CPU and memory usage at both an enterprise level and the individual host level that is outside of normal bounds can point toward the potential for data exploitation, distributed denial-of-service attacks or targeted attacks on individual users.

System logs

Events such as user logins, attempted logins, directory updates or changes, file modification and even instances of media being inserted into clients can all be recorded at the host level. For each user or application action, a log entry in an event or message log is created, including the date, time, user and associated action. Therefore, when a device begins to fail or service degrades, a review of event logs can help to diagnose the cause. This can point toward situations such as missing patches, unauthorized connections, abnormal port activity, unauthorized system usage and data calls that may fall out of normal bounds.
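As an illustration of reviewing such log entries, the sketch below counts failed logins per user. The log line format shown here (date, time, user=..., action=...) and the action name login_failed are invented for this example and do not correspond to any particular operating system's log format.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> user=<name> action=<action>"
LINE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) user=(?P<user>\S+) action=(?P<action>\S+)$"
)


def failed_logins(lines):
    """Count entries per user whose action is 'login_failed'."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line.strip())
        if match and match.group("action") == "login_failed":
            counts[match.group("user")] += 1
    return counts


log = [
    "2024-01-05 10:15:02 user=alice action=login_failed",
    "2024-01-05 10:15:09 user=alice action=login_failed",
    "2024-01-05 10:15:30 user=alice action=login",
    "2024-01-05 10:16:00 user=bob action=file_modified",
]
failures = failed_logins(log)
print(failures)
```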

Bringing it all together

Every phase of network data collection can be a challenge for analysts, ranging from identifying when and where to capture network traffic and what type of data should be collected to when that information actually points to potentially malicious behavior. However, the field of network traffic analysis has matured to offer analysts several different methods to collect, classify and structure data and the tools needed to process that information to efficiently and effectively make sense of what is an otherwise overwhelming amount of data. 

 

Sources

  1. Xuyang Jing, Zheng Yan, Witold Pedrycz. “Security Data Collection and Data Analytics in the Internet: A Survey,” IEEE
  2. Network visibility and monitoring tools now amp up security, SearchSecurity

Patrick
Mallory
