The Value of Online Malware Collections
The Problem with Malware
One of the biggest security threats to a modern business is a malware outbreak. The risk of its occurrence is fairly high, thanks to the prevalence of malware-spam campaigns and easy propagation via USB devices and network vulnerabilities, and the impact on a business can be devastating. Think of a company-wide ransomware attack! Malware has already taken hospitals, government departments, power grids and airlines offline for days or weeks.
Now, malware in its many varieties is not new. For decades there has been a battle between antivirus companies and malware creators. Why has there never been a solution for this problem?
The answer is fairly simple: malware authors will always be one step ahead of the anti-malware vendors. Malware detection and prevention are inherently reactive, because a defense can only be built once a new piece of malware exists; after all, it’s hard to fix a problem that hasn’t been created yet. Some progress has been made with machine-learning program classification and sandboxing, but these are expensive and far from reliable. What has been successful, however, is the collection of threat intelligence around malware.
The Collection, Analysis and Categorization of Malware
Threat intelligence attempts to gather as many unique identifiers related to a particular malware sample as possible. This means that new files that are suspicious or malicious (or that sometimes even appear clean at first glance) can easily be compared to the existing malware dataset.
This comparison can be done manually, but these days many endpoint products such as CrowdStrike Falcon and Carbon Black, or SIEM products such as Splunk Enterprise Security and ArcSight, can do it automatically. They interface with cloud-based malware collection platforms such as VirusTotal and Hybrid Analysis. All that is usually needed is an API request carrying a file hash; in response, the platform returns its intelligence on that file to the customer.
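To make the hash lookup concrete, here is a minimal Python sketch of the client side: it computes a file's SHA-256 hash and builds, without sending, the request that would carry it. The endpoint URL and the header name are hypothetical placeholders, not the API of any particular platform; each real service documents its own URL scheme and authentication.

```python
import hashlib
import urllib.request

# Hypothetical endpoint and header name, used only for illustration.
LOOKUP_URL = "https://intel.example.com/api/files/"
API_KEY_HEADER = "X-Api-Key"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_lookup_request(file_hash, api_key):
    """Build (but do not send) a GET request asking for intelligence on a hash."""
    request = urllib.request.Request(LOOKUP_URL + file_hash, method="GET")
    request.add_header(API_KEY_HEADER, api_key)
    return request
```

Only the hash leaves the machine in this flow, not the file itself, which is part of why such lookups are cheap enough to automate.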
This is where the business model of these collection-hosting companies starts. A free API is usually available, but it allows only a few queries in a given time window and sometimes limits the information returned. To apply these intelligence lookups at a larger, automated scale, a paid subscription is needed.
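A client that wants to stay inside such a free quota has to throttle itself. The sketch below is one possible way to do that with a sliding window; the class name and the quota numbers in the usage note are illustrative assumptions, not the limits of any real service.

```python
import time
from collections import deque


class QuotaWindow:
    """Client-side sliding-window limiter for a quota of N calls per period."""

    def __init__(self, max_calls, period_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period_seconds
        self.clock = clock
        self._sent = deque()  # timestamps of calls still inside the window

    def seconds_until_allowed(self):
        """Return 0.0 if a call may go out now, else the wait in seconds."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.period:
            self._sent.popleft()
        if len(self._sent) < self.max_calls:
            return 0.0
        return self.period - (now - self._sent[0])

    def record_call(self):
        """Note that a lookup was just sent."""
        self._sent.append(self.clock())
```

With an assumed quota of, say, four lookups per minute, a script would create `QuotaWindow(4, 60)`, sleep for `seconds_until_allowed()` before each lookup, and call `record_call()` after sending it. At that pace a backlog of thousands of hashes takes hours, which is exactly the gap the paid tiers fill.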
Value of the Data
These paid malware intelligence subscriptions show the value of the data that is collected; companies are willing to pay a lot for this information. Of course, the fees cover the upkeep of the often-enormous systems that gather and provide this intelligence data. Some estimates indicate that at least 360,000 new malware samples are found every day. The number of file submissions to just one service, including files that are actually clean, could easily top a million per day. These samples are often several megabytes in size and, once fully analyzed, have many so-called Indicators of Compromise (IoCs) attached to them. On top of this malware-specific threat intelligence, the platforms also hold information on where the files came from and, based on that, which sites should be blocked.
The value of the collected data is so evident that even antivirus companies now heavily rely on these malware collections as a layer of defense. Think, for instance, about blocking a file if more than five of the 50+ VirusTotal-linked antivirus engines see it as malicious. This method could be even more reliable than the vendor’s own single engine, and these partnerships with antivirus vendors are worth a lot of money for the collection platforms.
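The multi-engine rule described above is simple to express in code. This is a minimal sketch that assumes the per-engine verdicts have already been fetched and reduced to a mapping of engine name to a boolean; the function name and the input shape are illustrative, not part of any platform's API.

```python
def should_block(engine_verdicts, threshold=5):
    """Return True if more than `threshold` engines flag the file as malicious.

    `engine_verdicts` maps an engine name to True (malicious) or False.
    """
    flagged = sum(1 for is_malicious in engine_verdicts.values() if is_malicious)
    return flagged > threshold
```

The threshold is the tuning knob: a lower value catches more malware but lets a handful of false-positive engines block a clean file, while a higher value does the opposite.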
Developments in the Cloud
When Google took over the Spanish firm VirusTotal in 2012, there was a lot of speculation about why it decided to do so. Was it to integrate its Chrome browser with VirusTotal data in order to create a more secure browser product? Was the collection of threat and malware intelligence important to Google’s own security capability? Or was it just absorbing another profitable business? Maybe it was a combination of all these factors that led to the decision.
However, there certainly seems to be a trend. In 2017, security and next-gen antivirus company CrowdStrike acquired Payload Security, which owned the online sandboxing platform Hybrid Analysis. Hybrid Analysis also stores a vast amount of malware and its surrounding intelligence data. This would be very valuable for any antivirus company. Being able to provide full integration with a (cloud-enabled) endpoint antivirus product and having access to the latest uploaded malware samples, possibly containing zero-day vulnerability exploits, can give a security company a leg up in a very competitive market.
The need for more malware samples and data in order to make the platform the best in the market was shown by Google’s decision in 2016, when it started limiting VirusTotal access for companies that took intelligence data from VirusTotal but didn’t share data and samples in return. This platform, and many others like it, would not exist without having access to shared community data. The business model is all about scale.
The service of massive malware collection and analysis makes a lot of sense, both from a security perspective and from a business perspective. Of course, setting up and maintaining the platform itself will require a significant investment, but the actual data is mostly free. The malware is written by third parties and has no copyrights attached, and the users and their automated security products upload it for free to the cloud platform. This data is then transformed into valuable information and resold, often to the same customers that can benefit from the cross-customer-correlated intelligence.
360K New Malware Samples Hit the Scene Every Day, Infosecurity Magazine
Hybrid Analysis Grows Up – Acquired by CrowdStrike, Zeltser.com
VirusTotal Access to be Limited: Google, Comodo Antivirus