CISSP: Incident management
This article is part of our CISSP certification prep series. For more CISSP-related resources, see our CISSP certification hub.
In the IT industry, incident management is the management of activities to detect, analyze, respond to, and correct an organization’s security situation. The operational security measures covered by the CISSP certification exam all reduce the likelihood of a security incident occurring. Even so, incidents remain inevitable, no matter what precautions are taken. For that reason, the CISSP emphasizes regimented, well-organized methodologies for identifying and responding to security events.
Incident response and handling concern how an organization reacts to a security incident. Reporting on such incidents can be stressful. In high-alert situations, documentation tends to be overlooked while the team focuses on resolving the issue. That is a risk, because it is difficult to know in advance whether an investigation will end up in a court of law.
Different organizations use different terms and phases associated with incident response processes. The NIST Computer Security Incident Handling Guide divides the incident response lifecycle into the following four steps:
- Preparation
- Detection and Analysis
- Containment, Eradication and Recovery
- Post-incident Activity
In the CISSP, these steps are divided further. The following eight steps are subdivisions of NIST’s four:
- Preparation
- Detection
- Response
- Mitigation
- Reporting
- Recovery
- Remediation
- Lessons Learned
Preparation can also be called the pre-incident phase. It involves the steps that are taken before an incident occurs. In other words, this is the time in which the team prepares for any incident. This can include training, defining policies and procedures, gathering tools and necessary software, procuring necessary hardware equipment, etc. This phase should include everything that can aid in faster resolution of an incident. An incident handling checklist is also prepared at this stage.
Detection is the primary and most important step in the incident response process. Sometimes also called the identification phase, it is the phase in which events are analyzed to determine whether they constitute a security incident. Without a proper and effective detection and analysis mechanism, the organization will not be in a position to respond to a security incident in a timely way. This mechanism should include an automated and regimented operation to pull system events and logs and bring them into an organizational context. Context matters during analysis because it can prevent overlooking an event that is important but not immediately apparent. The most important aspect of this phase is to deal with the situation quickly and decisively, whether the incident is still occurring or occurred in the past.
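The idea of bringing events into an organizational context can be sketched as follows. This is a minimal illustration, not a detection product: the asset inventory, the host names, the event types, and the `triage` function are all hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Hypothetical asset inventory: host name -> business criticality.
ASSET_CRITICALITY = {
    "hr-db-01": "high",
    "kiosk-17": "low",
}

# Hypothetical event types that are worth a look regardless of host.
SUSPICIOUS_EVENT_TYPES = {"failed_login_burst", "new_admin_account"}


@dataclass
class Event:
    host: str
    event_type: str


def triage(event: Event) -> str:
    """Return 'escalate', 'review', or 'ignore' for a single event."""
    # Hosts missing from the inventory are treated as unknown, not as low risk.
    criticality = ASSET_CRITICALITY.get(event.host, "unknown")
    suspicious = event.event_type in SUSPICIOUS_EVENT_TYPES

    # A suspicious event on a critical or unidentified host goes straight up.
    if suspicious and criticality in {"high", "unknown"}:
        return "escalate"
    # Either signal alone still deserves a human look.
    if suspicious or criticality == "high":
        return "review"
    return "ignore"
```

The same event type yields a different outcome depending on where it happens, which is the point of adding context: `triage(Event("hr-db-01", "failed_login_burst"))` escalates, while the same burst on `kiosk-17` is only queued for review.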
The response phase is also called the containment phase. As the name suggests, this phase deals with the response team’s actual interaction with the affected system. The intent is to contain the damage and keep it from affecting more systems. Responses depend on the scenario, but general responses might include taking a system off the network, powering off the system, isolating traffic, and other related tasks. This phase typically starts with forensically backing up the system involved in the incident. Volatile memory is also captured and dumped in this step, before the system is powered off.
Bringing systems down can have a negative impact on the organization, so it is essential to get permission from management before taking action and to keep them updated on the severity of the incident. Stopping an incident from spreading takes priority over curing it at the point where it was first discovered.
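The article does not say how a forensic backup is validated. One common practice, assumed here for illustration, is to compare a cryptographic hash of the original image with a hash of the copy. The sketch below uses only Python's standard `hashlib`; the function names and file paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_original(original: Path, copy: Path) -> bool:
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of_file(original) == sha256_of_file(copy)
```

Recording the digest at capture time, alongside who took the image and when, gives later reviewers a way to confirm that the evidence they are examining has not changed since it was collected.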
Eradication is another name for the mitigation phase. It involves analyzing the incident, including understanding the root cause. This understanding helps in cleaning the systems reliably and in implementing security measures against future incidents. Once the cause is known, the system can be returned to a stable state, preferably without risk of recurrence. Note that removing the obvious malware is not sufficient until the cause is known and understood.
After the cause is known, the system is returned to a functioning state. This restoration could involve rebuilding the system from scratch or restoring from a known-good backup. Root-cause analysis plays an important role in locating a trustworthy backup image: the timeline of events it produces can help determine which iteration of the backup is safest.
The final part of this phase is to prevent similar incidents from having an impact in the future. Patching and stronger firewall configurations can be effective preventive steps.
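Choosing a backup from the incident timeline can be sketched as picking the most recent backup taken before the earliest known sign of compromise. This is a simplified illustration under that single assumption; `latest_safe_backup` and its parameters are hypothetical names, and a real decision would also weigh how reliable the timeline itself is.

```python
from datetime import datetime
from typing import List, Optional


def latest_safe_backup(
    backup_times: List[datetime],
    earliest_compromise: datetime,
) -> Optional[datetime]:
    """Return the newest backup strictly older than the earliest compromise.

    Returns None when every backup was taken at or after the compromise,
    in which case rebuilding from scratch may be the only safe option.
    """
    candidates = [t for t in backup_times if t < earliest_compromise]
    return max(candidates, default=None)
```

For example, with weekly backups on January 1, 8, and 15 and an earliest compromise on January 10, the function returns the January 8 backup. If root-cause analysis later pushes the earliest compromise back to January 5, the answer changes to January 1, which is why the quality of the timeline matters.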
Reporting starts at the beginning of the incident and continues to its conclusion. It must begin immediately upon detection of the incident. The reporting phase can be divided into two categories: technical reporting and non-technical reporting.
The incident response team is responsible for reporting the technical details of the incident. It is also crucial that the team updates management about serious incidents. Non-technical stakeholders should be updated as the incident-handling process progresses; this is an important step in reporting and shouldn’t be ignored. Formal reporting starts as the process reaches its recovery phase. The incident handling team prepares and sends formal reports to technical and non-technical management staff as it recovers the systems and puts them back into production.
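Because reporting runs for the whole incident and serves both technical and non-technical audiences, it can help to record every update with a timestamp and an intended audience as it happens. The sketch below is one hypothetical way to do that; the `IncidentLog` and `LogEntry` classes and the audience labels are assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

VALID_AUDIENCES = {"technical", "non-technical"}


@dataclass
class LogEntry:
    timestamp: datetime
    audience: str
    note: str


@dataclass
class IncidentLog:
    entries: List[LogEntry] = field(default_factory=list)

    def record(self, audience: str, note: str) -> LogEntry:
        """Append a UTC-timestamped entry for one audience."""
        if audience not in VALID_AUDIENCES:
            raise ValueError(f"unknown audience: {audience!r}")
        entry = LogEntry(datetime.now(timezone.utc), audience, note)
        self.entries.append(entry)
        return entry

    def for_audience(self, audience: str) -> List[LogEntry]:
        """Return entries for one audience, in the order they were recorded."""
        return [e for e in self.entries if e.audience == audience]
```

Entries captured this way during the incident can later be assembled into the formal reports, instead of being reconstructed from memory once the systems are back in production.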
As the name indicates, the recovery phase covers the processes involved in restoring the system to its operational state. Normally, the business unit responsible for the system decides when it goes back online. Caution is needed, as the infection or attack may have persisted through the mitigation phase, so close monitoring of the system is necessary after it has been restored to production. Non-peak production hours are the best time to restore operations, because this makes security monitoring of the system much easier.
Remediation steps begin during the mitigation phase, where the vulnerabilities found during root-cause analysis are mitigated. Remediation continues directly after mitigation, and its scope later becomes broader. Managing a security incident requires root-cause analysis, which helps determine the vulnerabilities that allowed the incident to occur. Without it, the recovered system could still have a weakness that affects other systems or causes the same incident to occur again. For example, if the chosen backup does not go far back enough, the restored version could contain the same vulnerability, starting the whole cycle over again.
“Lessons Learned” is the post-incident phase and, unfortunately, also the most ignored phase. It can be the most effective one: done right, it can bring positive changes to the overall security of the organization. The goal of this phase is to prepare a final report on the incident and deliver it to management, along with suggested improvements to avoid such incidents in the future.
Important considerations could include how identification could be made sooner, how the response could be made quicker, which organizational shortcomings might have contributed to the incident, and any potential improvements to the system. Feedback from this phase feeds directly into continuous preparation, where the lessons learned help improve readiness for future incidents.
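Questions such as whether identification or response could have been faster are easier to answer with numbers. A minimal sketch, assuming the team recorded three timestamps per incident, is shown below; the `IncidentTimeline` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimeline:
    occurred_at: datetime   # earliest known sign of the incident
    detected_at: datetime   # when the team identified it
    contained_at: datetime  # when further spread was stopped

    @property
    def time_to_detect(self) -> timedelta:
        """How long the incident went unnoticed."""
        return self.detected_at - self.occurred_at

    @property
    def time_to_contain(self) -> timedelta:
        """How long containment took once the incident was detected."""
        return self.contained_at - self.detected_at
```

Comparing these two durations across incidents gives the final report something concrete to point to when it recommends changes to detection or response.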
Moving beyond the actual moment of incident response, the importance of incident management in creating an actionable framework of appropriate steps cannot be overstated. It might be tempting to run at full speed to simply “put out the fire” of the incident, but choosing speed over process can leave you open to further vulnerabilities. Worse, you may find yourself putting out the same fires over and over again without learning anything about how to prevent them in the future.