Developments in Machine Learning vs. Traditional SIEM Solutions
For decades, information security analysts have been scanning through security logs, trying to find anomalies that could indicate security incidents. In the beginning, the log data was limited, and the complexity of attacks did not require many different data feeds to be combined to come to a conclusion. Log analysis tools provided some assistance, but in essence, the correlation was done by actual people.
This manual process became increasingly difficult as organizations had to deal with ever-increasing data volumes and a growing number of data sources. They needed to effectively separate the useful information from the useless information hidden inside that data. More advanced log solutions were created, and these slowly evolved into the SIEM (Security Information and Event Management) solutions we now see, such as ArcSight and AlienVault.
A SIEM solution will sift through a potentially enormous amount of security-related data to present only relevant, prioritized and actionable events to a Security Team. The determinations are based on pre-programmed logic, designed and developed by Security Engineers.
Some SIEM solutions can combine different event sources to correlate that information with a deeper level of analysis, which adds more value.
Imagine 50 failed login attempts from a single source, directed at a single target, over the course of 5 minutes. This is a very common scenario. Sometimes this is caused by a misconfigured internal system. For instance, it could be triggered by the wrong credentials being stored in a scheduled maintenance script. In this case, a SIEM content modification can suppress these alerts while the underlying problem is addressed.
In other cases, these events are part of a brute-force attack to “guess” the login credentials to the target. As long as the events indicate a login failure, the security implications are limited. Most likely there has not been a breach yet. Otherwise, the attacker would not keep trying (or the attacker is very clever). Now, imagine the same 50 failed logins followed by a single, successful login from the same source directed at the same target. A well-developed SIEM system should correlate these two different event types, the failed and the successful logins, and raise a higher-priority Security Event. The same well-developed SIEM solution should also correlate these login events with, for instance, internal IDS events generated as a result of the attacker’s successful login. This will provide further certainty that the attack is real, and it will give the Security Team a clearer picture of the attack as it happens.
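The correlation described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the rule language of any particular SIEM product: the LoginEvent type, the correlate_logins function and the "low" and "high" priority labels are all hypothetical names, and the threshold of 50 failures within 5 minutes is taken from the example scenario.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LoginEvent:
    """Hypothetical, simplified login event; real log schemas differ."""
    timestamp: datetime
    source: str
    target: str
    success: bool


def correlate_logins(events, threshold=50, window=timedelta(minutes=5)):
    """Return a list of (priority, source, target, timestamp) alerts.

    "low":  at least `threshold` failures for one source/target pair
            inside the sliding time window.
    "high": a successful login that directly follows such a burst.
    """
    failures = defaultdict(deque)  # (source, target) -> recent failure times
    alerted = set()                # pairs that already raised a "low" alert
    alerts = []

    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.source, event.target)
        recent = failures[key]

        # Drop failures that have fallen out of the time window.
        while recent and event.timestamp - recent[0] > window:
            recent.popleft()
        if len(recent) < threshold:
            alerted.discard(key)

        if event.success:
            # A success right after a burst of failures is the escalated case.
            if len(recent) >= threshold:
                alerts.append(("high", event.source, event.target, event.timestamp))
            recent.clear()
            alerted.discard(key)
        else:
            recent.append(event.timestamp)
            # Raise the low-priority alert once per burst, not once per failure.
            if len(recent) >= threshold and key not in alerted:
                alerts.append(("low", event.source, event.target, event.timestamp))
                alerted.add(key)

    return alerts
```

Even this small rule has to encode a threshold, a time window, per-pair state and de-duplication by hand. Adding the IDS correlation mentioned above would mean another hand-written rule on top of this one.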
The problem with a traditional SIEM solution is that all this content, covering all possible attack chains, needs to be thought through before it can be seen in action. In a perfect situation this is quite possible, but in practice, time constraints and human error can limit the effectiveness of the SIEM solution. Combine this with the growing costs of good SIEM engineers, the skills shortage and the time it takes to create good SIEM content, and it is easy to start looking around for other options. Some SIEM providers have come up with add-ons such as behavioural content modules, but in the end, these are mostly just complex, pre-configured SIEM content, often with a serious additional price tag.
This is why a more dynamic solution is needed. The next logical step beyond a traditional SIEM is a monitoring system that can learn normal behaviour and detect anomalies, after which it automatically changes its own content.
The science covering a computer system performing certain actions without the need to be explicitly programmed is called Machine Learning. Machine Learning is often confused with Artificial Intelligence, but it is merely one of the many subjects under that much larger (and more popular in modern media) umbrella.
Machine Learning systems have been around for quite some time. Some examples are Search Engines, Web Stores and Social Media Sites, but also traditional environments such as many postal services and banks. IBM, Amazon, Google, Facebook and many other organisations are working continuously on more and more advanced Machine Learning systems, and some of these are trickling down into the valuable IT Security market.
Imagine the traditional SIEM solution, but now smart enough to create baselines of user and network behaviour, generated from a very large set of data sources. It will then look at current and historical events and compare these against the created baselines with (learned) complex algorithms, without much input from any Security Engineer. The capabilities would be nearly unlimited, and the cost savings to organisations would be enormous. This solution could automate much of the security analysis traditionally performed manually by human analysts, without the need for a traditional SIEM solution.
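To make the idea of a baseline concrete, here is a deliberately simple Python sketch. It uses a plain statistical z-score per user rather than a learned model, so it is far cruder than what the paragraph above envisions, and it does not represent any vendor's approach. The function names, the daily login counts and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Compute (mean, standard deviation) of past activity per user.

    `history` maps a user name to a list of daily event counts.
    Users with fewer than two observations get no baseline.
    """
    return {
        user: (mean(counts), stdev(counts))
        for user, counts in history.items()
        if len(counts) >= 2
    }


def is_anomalous(baseline, user, observed, z_threshold=3.0):
    """Flag `observed` if it deviates strongly from the user's baseline."""
    if user not in baseline:
        # Assumption: activity from a user without a baseline deserves review.
        return True
    mu, sigma = baseline[user]
    if sigma == 0:
        # Perfectly constant history: any change counts as a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > z_threshold


# Hypothetical daily login counts for one user.
history = {"alice": [10, 12, 11, 9, 13]}
baseline = build_baseline(history)
print(is_anomalous(baseline, "alice", 12))  # within normal variation
print(is_anomalous(baseline, "alice", 40))  # far above the baseline
```

The point of the sketch is that nobody wrote a rule saying how many logins are suspicious for this user. The limit follows from the observed data and shifts when the baseline is rebuilt.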
IBM Watson (the AI system known for winning the Jeopardy! quiz show) recently passed the CISSP exam, and an implementation suggested by Marc van Zadelhoff, IBM General Manager Security, is to use it inside a Security Operations Center for assistance and raw data analysis. Looking at the increasing costs of security solutions and the scarcity of qualified IT Security staff, this will most likely happen in one way or another.
Most large security vendors are working on Machine Learning systems in one way or another. Splunk offers User Behaviour Analytics, which is based on data science combined with Machine Learning in order to identify threats such as cyber attacks or insider threats. HP ArcSight ThreatDetector uses Machine Learning to automate pattern discovery and to facilitate intelligent rule creation. IBM QRadar has similar capabilities, and the list goes on.
It is not hard to see what the future will hold for the traditional SIEM solutions.
More and more automation will be implemented by new and existing security vendors through continuously improving Machine Learning systems. This will eventually render many of the traditional SIEM Engineering skills obsolete, and professionals in this field need to be aware and prepared. It will also lead to opportunities. The more high-quality intelligence is applied to the vast amount of raw security data, the clearer the picture of security incidents will become for human analysts. After all, there is plenty going on inside and outside large networks. Smarter detection will surface more incidents that previously flew under the radar. This will only make the security profession more interesting.
There will also be a growing market for highly skilled data scientists who are not security specialists. They will be working on these new Machine Learning environments, tweaking existing algorithms and developing new ones. The rapid developments in this area over the previous decade support that view.