Network traffic analysis for IR: Data analysis for incident response
While no two incidents are the same, security professionals have come to rely on pre-established procedures and best practices to help contain a security breach and recover from it. Having an incident response plan in place is also a requirement for compliance with regulations and standards ranging from HIPAA to PCI-DSS, and its practical application is a core topic in major cybersecurity professional certifications.
One of the most common incident response plans is presented by the National Institute of Standards and Technology (NIST). While there are many others, they often share many of the same components as the life cycle recommended by NIST:
- Preparation
- Detection and analysis
- Containment, eradication and recovery
- Post-incident activity
The development, implementation and maintenance of an incident response plan, as well as the various roles played throughout an organization, could each fill a university course. But one skill in particular, data analysis, is often overlooked despite the role it can play throughout the incident life cycle.
Therefore, the goal of this article is to highlight the different ways that security professionals utilize and analyze data throughout the incident response process and provide recommendations for organizations to follow so they can be better prepared to use the information available to them.
The role of data analysis in incident response
A common theme across incident response plans is the role of preparation: not just in creating and maintaining an incident response procedure, but also in having the tools and systems in place to help an organization prevent, detect and mitigate the negative effects of an incident.
Although many of these components can be considered more of a “science,” the way data can inform, validate and support decision-making across the phases of incident response is more of an “art.” For this reason, the sections below move through each phase of incident response and highlight how data can be used to accomplish its objectives.
Preparing to handle incidents
Long before data analysts can play a role in incident response, they first need a grasp of the tools and resources available to them that may be useful in the event of a breach. While every organization and analyst is different, the list below can serve as a starting point for establishing a data analysis component within an organization’s incident response plan, separate from the resources used for the continuity of operations plan.
To start, each member of the incident response team, including anyone in a data analysis role, should have a dedicated laptop. This laptop should have the software needed to perform packet analysis, malware analysis and database manipulation, separate from their daily workstation. Having a separate device helps keep data involved in an incident apart from other working data, which prevents confusion and helps stop the spread of any malicious code.
Other key information that should be readily available to a data analyst includes:
- Network diagrams updated to display assets and network devices
- Lists of ports in use at the organization and their uses
- Application documentation including key configurations, user access lists, APIs, databases, operating systems, security tools and so on
- Network performance data that displays expected network and application uptime and behavior
- Access to an enterprise issue tracking system (e.g., help desk)
- Change management logs
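The list of ports in use is most valuable when it can be checked against what is actually listening on the network. As a minimal sketch, assuming the inventory is kept as a mapping from (host, port) to its documented use and that observed ports come from a scan or flow export, the comparison reduces to two set differences. The hostnames, ports and function names below are hypothetical.

```python
# Hypothetical inventory: (host, port) -> documented use.
DOCUMENTED_PORTS: dict[tuple[str, int], str] = {
    ("web01", 443): "Public HTTPS",
    ("web01", 22): "Admin SSH (VPN only)",
    ("db01", 5432): "PostgreSQL, reachable from web01 only",
}


def find_undocumented_ports(
    observed: set[tuple[str, int]], documented: dict[tuple[str, int], str]
) -> list[tuple[str, int]]:
    """Observed (host, port) pairs with no entry in the inventory."""
    return sorted(pair for pair in observed if pair not in documented)


def find_stale_entries(
    observed: set[tuple[str, int]], documented: dict[tuple[str, int], str]
) -> list[tuple[str, int]]:
    """Documented pairs that were not observed (the inventory may be out of date)."""
    return sorted(pair for pair in documented if pair not in observed)


if __name__ == "__main__":
    # Assumed input, e.g. parsed from a port scan or flow export.
    observed = {("web01", 443), ("web01", 22), ("web01", 4444)}
    print(find_undocumented_ports(observed, DOCUMENTED_PORTS))  # [('web01', 4444)]
    print(find_stale_entries(observed, DOCUMENTED_PORTS))       # [('db01', 5432)]
```

Undocumented ports are leads to investigate; stale entries suggest the documentation itself needs updating before an incident makes it matter.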
Data analysis can also play a role in preventing an incident from occurring in the first place. There are several key tools and exercises that organizations can undertake to better understand their threat environment, vulnerabilities and assets, and whether their security posture is sufficient to prevent incidents. One of the most important is a risk assessment, in which data analysts can help collect, organize, analyze and present organizational data to make it actionable.
Risk assessments should be conducted periodically so they accurately reflect the current state of an organization’s assets, network and security. This information is then combined with a current understanding of the threats and vulnerabilities facing organizational assets to identify risks. These risks should then be prioritized and paired with quantified potential impacts so organizations can make informed decisions on whether to mitigate, transfer or accept each risk.
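One common way to attach quantifiable impacts to risks, though not the only one and not necessarily the approach this article assumes, is annualized loss expectancy (ALE): single loss expectancy (asset value multiplied by an exposure factor) multiplied by the expected annual rate of occurrence. The sketch below ranks a small, entirely hypothetical risk register by that figure.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One risk register entry; all figures are illustrative."""

    name: str
    asset_value: float      # value of the affected asset
    exposure_factor: float  # fraction of that value lost per occurrence (0 to 1)
    annual_rate: float      # expected occurrences per year (ARO)

    @property
    def single_loss(self) -> float:
        # Single loss expectancy: SLE = asset value x exposure factor
        return self.asset_value * self.exposure_factor

    @property
    def annual_loss(self) -> float:
        # Annualized loss expectancy: ALE = SLE x ARO
        return self.single_loss * self.annual_rate


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest annualized loss expectancy."""
    return sorted(risks, key=lambda r: r.annual_loss, reverse=True)


if __name__ == "__main__":
    register = [
        Risk("Ransomware on file server", 200_000, 0.5, 0.2),
        Risk("Phishing leads to account takeover", 50_000, 0.3, 2.0),
        Risk("Web page defacement", 10_000, 0.1, 1.0),
    ]
    for risk in prioritize(register):
        print(f"{risk.name}: ALE = {risk.annual_loss:,.0f}")
```

With these made-up numbers the phishing risk ranks first (ALE 30,000), ahead of ransomware (20,000) and defacement (1,000). The ranking is only as good as the estimates fed into it, which is where collecting and organizing organizational data pays off.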
Detecting an incident
While many organizations have intrusion detection systems, firewalls and other security tools in place, identifying when an incident is actually occurring can still be a challenging process. The same goes for determining the type of incident, the extent of its reach and the scale of its possible impact.
Working alongside other security professionals and automated system alerts, data analysts can help sift through network device and security system logs, as well as problems reported by users, to identify anomalies that may point to malicious behavior, also known as indicators. Because organizations may generate dozens or even hundreds of alerts each day, the ability to organize and triage this data can shorten the time it takes to respond to a cyberattack.
While not every incident will have indicators that are immediately identifiable or detectable, organizations should have tools and resources in place to help identify possible precursors, which may give them a chance to prevent the incident or to catch it early. This information can be supplemented by publicly available threat information, user-reported data and other software logs. Examples of key metrics that data analysts and security professionals should be collecting and monitoring include:
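Triage often starts with collapsing duplicates and ordering what remains. The sketch below assumes a simplified alert record (rule name, source IP, severity) and a four-level severity scale; both are hypothetical, since real SIEM and IDS products define their own formats.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity scale; real tools each define their own levels.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass(frozen=True)  # frozen makes alerts hashable, so Counter can group them
class Alert:
    rule: str       # detection rule that fired
    source_ip: str  # host that triggered it
    severity: str   # a key of SEVERITY_RANK


def triage(alerts: list[Alert]) -> list[tuple[Alert, int]]:
    """Collapse duplicate alerts and order the queue for review.

    Identical alerts become one entry with a count; entries are sorted by
    severity first, then by how often they fired (most frequent first).
    """
    counts = Counter(alerts)
    return sorted(
        counts.items(),
        key=lambda item: (SEVERITY_RANK[item[0].severity], -item[1]),
    )


if __name__ == "__main__":
    alerts = (
        [Alert("port-scan", "10.0.0.5", "low")] * 40
        + [Alert("admin-login-failure", "10.0.0.9", "high")] * 3
        + [Alert("malware-signature", "10.0.0.7", "critical")]
    )
    for alert, count in triage(alerts):
        print(f"{alert.severity:<8} {alert.rule} from {alert.source_ip} (x{count})")
```

In this example the single critical malware alert is reviewed before the 40 repetitive low-severity port-scan alerts, even though the latter are far more numerous.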
- Network device performance data and traffic flows, to identify anomalous activity such as hosts that usually do not communicate with each other or large amounts of data being exfiltrated
- Web server and database logs that monitor user account access and activity
- Intrusion detection and prevention systems with signature- or anomaly-based alerts tuned to key servers and hosts
- Event flagging with collection of related data, including the time, date, type of probe, and the source and destination IP addresses and network devices, where available
- Antivirus software with alerts set to detect attempts to infect systems with malware
- Anti-spam software that stops spam, phishing attacks and malware from reaching user mailboxes
- Configuration management tools or file integrity software that record and alert on system, operating system and application changes
- Data loss prevention tools that flag keywords, PII or other sensitive data in transit leaving the network
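Two of the anomalies listed above, hosts that do not normally talk to each other and unusually large outbound transfers, can be checked directly against flow data. This sketch assumes flow records already reduced to (source, destination, bytes) tuples, a baseline set of host pairs observed during normal operation, an internal range of 10.0.0.0/8 and a 500 MB threshold; all of these are placeholder assumptions.

```python
import ipaddress
from collections import defaultdict

# Assumptions: the internal range and the byte threshold are placeholders.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/8")
OUTBOUND_LIMIT_BYTES = 500_000_000  # 500 MB per reporting window


def is_internal(addr: str) -> bool:
    return ipaddress.ip_address(addr) in INTERNAL_NET


def flag_flows(
    flows: list[tuple[str, str, int]],
    baseline_pairs: set[tuple[str, str]],
    outbound_limit: int = OUTBOUND_LIMIT_BYTES,
) -> tuple[list[tuple[str, str]], list[str]]:
    """Flag two anomaly types in (src, dst, bytes) flow records.

    new_pairs: source/destination pairs never seen during the baseline period.
    heavy_senders: internal hosts whose total traffic to external addresses
    exceeds outbound_limit, a possible sign of exfiltration.
    """
    new_pairs = sorted({(s, d) for s, d, _ in flows if (s, d) not in baseline_pairs})
    outbound = defaultdict(int)
    for src, dst, nbytes in flows:
        if is_internal(src) and not is_internal(dst):
            outbound[src] += nbytes
    heavy_senders = sorted(h for h, total in outbound.items() if total > outbound_limit)
    return new_pairs, heavy_senders


if __name__ == "__main__":
    baseline = {("10.0.0.5", "10.0.0.20"), ("10.0.0.7", "203.0.113.10")}
    flows = [
        ("10.0.0.5", "10.0.0.20", 1_200),
        ("10.0.0.5", "10.0.0.99", 800),
        ("10.0.0.7", "198.51.100.23", 750_000_000),
    ]
    print(flag_flows(flows, baseline))
```

Both checks depend on a good baseline period; a pair that looks new may simply be legitimate traffic that the baseline missed.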
On its face, incident analysis seems like a straightforward process, but events vary widely, as do the types and amount of data available for each. Adding to the frustration and confusion, security analysts often have to spend time and resources investigating everything from false positives raised by misconfigured software to user errors or complaints that turn out to be nothing. Further, even when an indicator or alert is accurate, the cause can turn out to be equipment or a configuration operating improperly rather than an attack.
Unless every precursor, indicator and alert is working properly, it falls on security analysts to investigate each potential incident with whatever data and time they have available.
Even among genuine security incidents, ease of detection and impact vary widely. A defaced web page or ransomware is obvious once deployed, but other intrusions can begin with something as benign as a port scan, an invalid login attempt or a minor system configuration change. While technology has evolved to make detection easier, it often comes down to a team performing data analysis efficiently so proper actions can be taken as quickly as possible.
This analysis should be performed consistently each time an incident is detected, and documented for future reference. First, an initial analysis should be conducted to capture the scope of the incident, including:
- The applications, networks and systems affected
- The origin of the incident (e.g., a user or system alert)
- The nature of the incident (e.g., network outage, data availability, login attempts)
- Which systems or vulnerabilities are being exploited
This initial analysis should be comprehensive enough to allow the incident to be prioritized. It should also be filed so staff can reference it if a similar incident occurs in the future.
To support this initial phase, data analysts need a baseline of “normal” activity against which to compare suspected incident activity. The following baselines are recommended to make incident validation easier:
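Filing the initial analysis in a consistent, machine-readable shape makes it easier to prioritize and to find again when a similar incident occurs. A minimal sketch, with hypothetical field names mirroring the scope items listed above, might serialize each record to JSON:

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class InitialAnalysis:
    """Scope captured during initial analysis; field names are illustrative."""

    incident_id: str
    affected: list[str]  # applications, networks and systems affected
    origin: str          # e.g. "user report" or "IDS alert"
    nature: str          # e.g. "repeated failed logins on VPN gateway"
    exploited: list[str] = field(default_factory=list)  # systems or vulnerabilities exploited
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize the record so it can be filed and searched later."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = InitialAnalysis(
        incident_id="IR-0001",
        affected=["vpn-gw01"],
        origin="IDS alert",
        nature="repeated failed logins on VPN gateway",
    )
    print(record.to_json())
```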
- Network devices and performance: Data that captures system and file integrity (e.g., checksums), bandwidth usage over time, and common host communications and port activity
- Security software alert activity: Intrusion detection systems, firewalls and other security tools have logging capabilities that should be reviewed regularly, so trends, false positives and false negatives are understood during “normal” operation. Then, when unexplained activities or alerts occur, analysts can identify them more easily
- Packet sniffers: If an incident is suspected, a packet sniffer can be used to monitor traffic flow over a certain range of devices to check for malicious activity that falls out of the norm. These tools can be tuned to collect data that matches certain criteria in order to aid in the analysis
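For the file-integrity part of the baseline, a common approach is to record a cryptographic hash of each monitored file while the system is known to be good and re-check it later. The sketch below uses SHA-256 via Python's standard `hashlib`; the monitored file names and the demo scenario are hypothetical, and dedicated file integrity tools add features, such as protected baseline storage, that this omits.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths: list[Path]) -> dict[str, str]:
    """Record a known-good checksum for each monitored file."""
    return {str(p): sha256_file(p) for p in paths}


def compare_to_baseline(baseline: dict[str, str]) -> dict[str, list[str]]:
    """Report monitored files that changed or disappeared since the baseline."""
    changed, missing = [], []
    for name, expected in baseline.items():
        path = Path(name)
        if not path.exists():
            missing.append(name)
        elif sha256_file(path) != expected:
            changed.append(name)
    return {"changed": sorted(changed), "missing": sorted(missing)}


def demo_integrity_check() -> dict[str, list[str]]:
    """Simulate one modified and one deleted file in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        sshd, syslog = Path(tmp, "sshd_config"), Path(tmp, "rsyslog.conf")
        sshd.write_text("PermitRootLogin no\n")
        syslog.write_text("*.* @@loghost:514\n")
        baseline = build_baseline([sshd, syslog])
        sshd.write_text("PermitRootLogin yes\n")  # simulated unauthorized change
        syslog.unlink()                            # simulated deletion
        result = compare_to_baseline(baseline)
    # Keep only file names so the output does not depend on the temp path.
    return {key: [Path(n).name for n in names] for key, names in result.items()}


if __name__ == "__main__":
    print(demo_integrity_check())
```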
In addition to these recommendations, security analysts should establish and maintain a knowledge base that is easy to search and reference during incident analysis. Entries should be tagged by device or application type, incident type, indicator type and other relevant filters, and the knowledge base can include documents and procedures to aid in triage or remediation.
Categorizing an incident’s severity and priority is a key part of the incident handling process. Although it may be tempting to handle incidents in chronological order, the variability of incidents and limited resources make this an unsound approach. Instead, organizations should use the available data to prioritize incidents by their severity and potential impact, based on several factors:
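Even a simple tag-based lookup can make such a knowledge base usable under pressure. This sketch assumes each entry carries a free-form set of tags and a pointer to the full document; the entries and paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KBEntry:
    title: str
    tags: set[str]  # e.g. device type, incident type, indicator type
    location: str   # where the full document or procedure lives (hypothetical paths)


def search(entries: list[KBEntry], *required: str) -> list[KBEntry]:
    """Return entries carrying every requested tag (case-insensitive), sorted by title."""
    wanted = {tag.lower() for tag in required}
    hits = [e for e in entries if wanted <= {t.lower() for t in e.tags}]
    return sorted(hits, key=lambda e: e.title)


KB = [
    KBEntry("Ransomware triage checklist", {"ransomware", "windows", "triage"},
            "kb/ir/ransomware-triage.md"),
    KBEntry("VPN brute-force response", {"vpn", "authentication", "remediation"},
            "kb/ir/vpn-bruteforce.md"),
    KBEntry("Isolating a Windows host", {"windows", "containment", "remediation"},
            "kb/ir/isolate-windows.md"),
]

if __name__ == "__main__":
    for entry in search(KB, "windows"):
        print(entry.title, "->", entry.location)
```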
- Operational impact: The number of business systems and/or business operations negatively affected by an incident. Impact can be measured in downtime, frequency of occurrence, opportunity cost, lost revenue, damage to customer reputation or costs to repair, among others
- Informational impact: Incidents can affect the confidentiality, integrity and availability of the organization’s data or that of its partners, customers or payment providers
- Recoverability: The amount of time and resources that must be spent to recover from the incident
These factors can be assessed independently and considered collectively to determine a response the organization feels is appropriate for its operations and mission. Ultimately, incidents with a high operational impact that take little effort to recover from are candidates for immediate action, while those without an easy recovery path, such as a massive data leak, need to be considered strategically according to their level of impact.
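As an illustration of assessing the factors independently and then combining them, the sketch below places incidents with no recovery path on a separate strategic-review list and orders the rest by operational impact, then recovery effort. The recoverability category names come from NIST SP 800-61 Rev. 2; the four-level impact scale used for both operational and informational impact, and the tie-breaking order, are simplifying assumptions.

```python
from dataclasses import dataclass

# Assumed ordinal impact scale (NIST uses None/Low/Medium/High for functional impact).
IMPACT = {"none": 0, "low": 1, "medium": 2, "high": 3}
# Recovery effort, using the recoverability categories from NIST SP 800-61 Rev. 2.
RECOVERY_EFFORT = {"regular": 0, "supplemented": 1, "extended": 2, "not_recoverable": 3}


@dataclass
class Incident:
    name: str
    operational: str     # IMPACT key
    informational: str   # IMPACT key (simplified, assumed scale)
    recoverability: str  # RECOVERY_EFFORT key


def plan(incidents: list[Incident]) -> tuple[list[Incident], list[Incident]]:
    """Split incidents into an immediate-action queue and a strategic-review list.

    Recoverable incidents are ordered by operational impact (highest first),
    then recovery effort (lowest first), then informational impact.
    Incidents with no recovery path go to strategic review, ordered by impact.
    """
    immediate = [i for i in incidents if i.recoverability != "not_recoverable"]
    strategic = [i for i in incidents if i.recoverability == "not_recoverable"]
    immediate.sort(key=lambda i: (-IMPACT[i.operational],
                                  RECOVERY_EFFORT[i.recoverability],
                                  -IMPACT[i.informational]))
    strategic.sort(key=lambda i: (-IMPACT[i.informational], -IMPACT[i.operational]))
    return immediate, strategic


if __name__ == "__main__":
    incidents = [
        Incident("Ransomware on file server", "high", "low", "supplemented"),
        Incident("Web page defacement", "medium", "low", "regular"),
        Incident("Customer database leaked", "low", "high", "not_recoverable"),
        Incident("Mail gateway outage", "high", "none", "regular"),
    ]
    immediate, strategic = plan(incidents)
    print("Act now:", [i.name for i in immediate])
    print("Strategic review:", [i.name for i in strategic])
```

Here the mail gateway outage and the ransomware infection share the same operational impact, so the one that is cheaper to recover from is handled first, while the leaked database is routed to strategic review.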
In the case of a data breach, incident response should focus on a holistic evaluation, impact mitigation, customer notification and the potential to prevent future breaches instead of quick, tactical actions.
Containment and recovery
Data analysis still plays a key role once incident handling reaches the containment and recovery phase. Containment should begin as soon as the organization is able, to limit the scale and scope of damage to the network and to give incident handlers more time to devise appropriate remediation steps.
This is where data analysis can help. A key part of containment and recovery is efficient and informed decision-making by organizational leadership and security professionals. Knowing if and when to shut down systems, which network components should be disconnected and what the impact of each move will be can assist in limiting operational damage.
These decisions are much easier to make if there are not only predetermined incident containment and recovery strategies, but also data available to justify them and inform potential downstream impacts. Data can also show whether the organization’s risk profile has changed, which may call for adjustments to risk management.
Finally, after an incident is contained and eradicated and systems are recovered, data analysis can be used to confirm that malware or malicious activity is no longer present on the network. Data can also be collected and compared against previously established baselines to confirm that systems are functioning properly.
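One way to inform downstream impacts is to keep a dependency map of which components rely on which, then walk it before isolating anything. The sketch below does a breadth-first traversal over a hypothetical map; real dependency data might come from a configuration management database or from the network diagrams and application documentation listed earlier.

```python
from collections import deque

# Hypothetical dependency map: component -> components that depend on it.
DEPENDENTS = {
    "core-switch-2": ["db01", "file01"],
    "db01": ["web01", "billing-api"],
    "billing-api": ["customer-portal"],
    "file01": [],
    "web01": [],
    "customer-portal": [],
}


def downstream_impact(component: str, dependents: dict[str, list[str]] = DEPENDENTS) -> list[str]:
    """List every component that would lose service if `component` is isolated."""
    seen: set[str] = set()
    queue = deque([component])
    while queue:  # breadth-first walk of everything that depends on the component
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(component)  # ignore cycles that lead back to the starting point
    return sorted(seen)


if __name__ == "__main__":
    print(downstream_impact("db01"))  # ['billing-api', 'customer-portal', 'web01']
```

With this map, isolating `db01` would also take down `billing-api`, `customer-portal` and `web01`, which is exactly the kind of trade-off leadership needs to see before approving the action.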
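A simple form of that confirmation is sweeping post-recovery logs for the indicators of compromise (IOCs) identified during the incident. The sketch below uses hypothetical indicators and log lines and plain substring matching; an empty result only means the reviewed logs show no known indicator, not that the network is proven clean.

```python
def ioc_sweep(log_lines: list[str], iocs: set[str]) -> list[tuple[int, str]]:
    """Return (line number, indicator) for every log line containing a known IOC.

    Case-insensitive substring matching keeps the sketch short, but it can
    over-match (e.g. an IP that is a prefix of another); real tooling would
    parse fields and normalize values before comparing.
    """
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        for ioc in sorted(iocs):  # sorted so results are deterministic
            if ioc.lower() in lowered:
                hits.append((lineno, ioc))
    return hits


if __name__ == "__main__":
    # Hypothetical indicators from the incident and a few post-recovery log lines.
    iocs = {"198.51.100.23", "bad-updates.example"}
    logs = [
        "10:00 10.0.0.7 -> 203.0.113.10:443 allowed",
        "10:05 10.0.0.7 -> 198.51.100.23:443 allowed",
        "10:06 dns query BAD-UPDATES.example from 10.0.0.12",
    ]
    for lineno, ioc in ioc_sweep(logs, iocs):
        print(f"line {lineno}: {ioc}")
```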
Post-incident activity
Once the buzz of incident response recedes, data analysts still need to remain active and prepare for the next security challenge. One of the key activities is conducting a “lessons learned” exercise that captures objective and subjective information about the incident, such as the time to respond and the cost of the incident. Over time, organizations can consolidate these lessons learned documents to identify trends in security gaps, justify additional security resources or change organizational policies and procedures.
Ultimately, the information collected and evaluated should be actionable and meaningful. Possible data and information that could be presented include:
- Number of incidents handled, by type
- Time per incident, including:
  - Total man-hours spent working on the incident
  - Elapsed time from the beginning of the incident through the end of each stage of the incident response cycle
- The cause of the incident and its vector of attack
- Corrective actions that could prevent a similar incident in the future
- Resources that could help prevent, detect or mitigate a future incident
- The systems affected
- The estimated monetary damage from the incident
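Several of these metrics can be computed directly once each incident is recorded consistently. The sketch below assumes a hypothetical record holding the incident type, staff hours, start time, the end time of each response stage and an estimated damage figure, and aggregates them across incidents.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class IncidentRecord:
    kind: str                        # incident type, e.g. "phishing"
    staff_hours: float               # total hours worked on the incident
    started: datetime                # when the incident began
    stage_ends: dict[str, datetime]  # response stage -> when that stage ended
    damage: float                    # estimated monetary damage


def summarize(records: list[IncidentRecord]) -> dict:
    """Aggregate lessons-learned metrics across incident records."""
    return {
        "by_type": dict(Counter(r.kind for r in records)),
        "staff_hours": sum(r.staff_hours for r in records),
        "damage": sum(r.damage for r in records),
        # Hours from incident start to the end of each stage, per incident.
        "elapsed_hours": [
            {stage: (end - r.started).total_seconds() / 3600
             for stage, end in r.stage_ends.items()}
            for r in records
        ],
    }


if __name__ == "__main__":
    start = datetime(2024, 3, 1, 8, 0)
    phishing = IncidentRecord(
        "phishing", 14.5, start,
        {"detection_analysis": datetime(2024, 3, 1, 9, 30),
         "containment": datetime(2024, 3, 1, 14, 0),
         "recovery": datetime(2024, 3, 2, 8, 0)},
        12_000,
    )
    print(summarize([phishing, IncidentRecord("malware", 30, start, {}, 45_000)]))
```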
Furthermore, organizations should establish policies and procedures governing the retention of data related to an incident. These policies should align with applicable laws and best practices to ensure the data’s availability and chain of custody, in case it is needed as evidence or for further forensic analysis of the malware or threat actor.
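Chain of custody is commonly supported by hashing evidence when it is collected and re-verifying the hash whenever it changes hands, while retention is tracked per item. The sketch below combines both; the retention period, the legal-hold flag and the evidence name are placeholders, since actual retention periods depend on policy and applicable law.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class EvidenceItem:
    name: str
    sha256: str          # hash recorded at collection time
    collected: date
    retention_days: int  # set by policy and applicable law; values below are placeholders
    legal_hold: bool = False

    def retain_until(self) -> date:
        return self.collected + timedelta(days=self.retention_days)

    def may_dispose(self, today: date) -> bool:
        """Disposal is allowed only after the retention period and with no legal hold."""
        return not self.legal_hold and today > self.retain_until()

    def verify(self, data: bytes) -> bool:
        """Re-hash the evidence (e.g. at each transfer) to show it is unchanged."""
        return hashlib.sha256(data).hexdigest() == self.sha256


def register_evidence(name: str, data: bytes, collected: date, retention_days: int) -> EvidenceItem:
    """Create a custody record with the hash taken at collection time."""
    return EvidenceItem(name, hashlib.sha256(data).hexdigest(), collected, retention_days)


if __name__ == "__main__":
    item = register_evidence("fw-logs.tar", b"firewall log export", date(2024, 1, 1), 365)
    print(item.retain_until())                   # 2024-12-31
    print(item.verify(b"firewall log export"))   # True
```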
Conclusion: Bringing it all together
Of course, no organization wants to experience a data breach or cyberattack, but it is essential to be ready for one when — not if — it does occur.
While plenty of time and resources are spent on having the right systems, tools, training and alerts in place so security professionals are better able to act when an incident occurs, the role that data plays in each phase of incident response should not be overlooked. If properly leveraged, data can help organizations prepare, act, contain and, most importantly, learn how to improve so they are ready for the next attempt.
Sources
- Rodrigo Werlinger and David Botta, “Detecting, Analyzing and Responding to Security Incidents: A Qualitative Analysis”
- NIST, “Computer Security Incident Handling Guide,” SP 800-61 Rev. 2