Top 7 malware sample databases and datasets for research and training

May 3, 2021 by Greg Belding

Research and training are integral parts of cybersecurity, but how do you research and train for something that changes every day and, frankly, by the minute? Have no fear about the ever-changing malware threat landscape: malware sample databases and datasets keep track of the world of malware so that aspiring and working cybersecurity professionals alike can stay on top of malware research and training.

Let’s explore the top 7 malware databases and datasets for research and training so you will be well equipped with the online resources needed to make a difference in the fight against malware. 

Become a certified reverse engineer!

Get live, hands-on malware analysis training from anywhere, and become a Certified Reverse Engineering Analyst.

7. SoReL-20M

In response to the lack of large-scale, standardized and realistic data for malware research, researchers at Sophos and ReversingLabs have released SoReL-20M, a dataset covering 20 million files, including 10 million disarmed malware samples.

Samples in SoReL-20M come with features extracted in the style of the Ember 2.0 dataset, along with detection metadata, labels and, for the malware samples, complete binaries. The dataset is still young and consists mostly of Windows executable files, so it has room to grow. That said, researchers can already make use of the sheer number of samples it contains.

6. VirusShare

A longtime staple of malware sample datasets, VirusShare deserves to be in the top seven. After registering for an account (by emailing admin@virusshare.com and asking for one), you can search for samples, grab hashes, and research specific malware families and other key details about any of the more than 37 million malware samples that VirusShare contains.

Don’t let the barebones aesthetics of VirusShare fool you: it is one of the most useful sources for malware research and training out there.
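
Hash-based lookups, like the ones VirusShare supports, start with a digest of the file you are investigating. The following is a minimal sketch using Python's standard hashlib module. The helper names hash_chunks and hash_file are made up for illustration and are not part of any VirusShare tooling.

```python
import hashlib
from typing import Dict, Iterable


def hash_chunks(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Feed byte chunks into MD5, SHA-1 and SHA-256 and return the hex digests."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    for chunk in chunks:
        for digest in digests.values():
            digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def hash_file(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Hash a file in fixed-size chunks so a large sample never sits fully in memory."""
    with open(path, "rb") as handle:
        # iter() keeps calling handle.read() until it returns b"" (end of file).
        return hash_chunks(iter(lambda: handle.read(chunk_size), b""))
```

Computing several digests in one pass is convenient because different databases index samples by different hash types. Hashing only reads the file, but a suspicious sample should still be handled inside an isolated analysis environment.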

5. InQuestLabs

Earning its spot on the list due to usability (not to mention offering features that the others don’t) is InQuestLabs. This malware database offers a solid list of features:

  • Deep file inspection (DFI)
  • Aggregate reputation database
  • Indicators of compromise (IOC)
  • Base64 regular expression generator
  • Mixed hex case generator
  • UInt() trigger generator
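
To give a feel for why a Base64 regular expression generator is useful: a keyword hidden inside Base64-encoded data can show up in three different encoded forms, depending on where it falls relative to the 3-byte groups that Base64 works on. The sketch below illustrates that general technique with the Python standard library. It is not InQuestLabs' implementation, and the function names are hypothetical.

```python
import base64
import re
from typing import List


def base64_variants(keyword: bytes) -> List[str]:
    """Return the Base64 fragments that stay constant for each of the three alignments."""
    if len(keyword) < 3:
        raise ValueError("keyword is too short to yield a useful pattern")
    variants = []
    # skip = leading characters that also depend on the unknown bytes before the keyword
    for offset, skip in ((0, 0), (1, 2), (2, 3)):
        padded = b"\x00" * offset + keyword
        encoded = base64.b64encode(padded).decode("ascii").rstrip("=")
        # Keep only characters whose 6 bits are fully covered by known bytes.
        stable_end = (len(padded) * 8) // 6
        variants.append(encoded[skip:stable_end])
    return variants


def base64_pattern(keyword: bytes) -> "re.Pattern":
    """Compile a regex that matches the keyword inside Base64 text at any alignment."""
    return re.compile("|".join(re.escape(v) for v in base64_variants(keyword)))
```

A pattern built this way, for example base64_pattern(b"powershell"), can then be run over Base64 text with its search method, whatever bytes surround the keyword in the original data.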

4. MalwareBazaar

While MalwareBazaar may not have the sheer number of malware samples that others have, it offers great insights for malware research and training. One of its most useful aspects is the information available. The dashboard is referred to as “browse” and, at first glance, it tells you how many samples were uploaded to the database in the last 24 hours, the most seen malware family in the last 24 hours and the number of malware samples currently in the database. It also provides a syntax search field and a running list of the most recently uploaded samples, in descending order by upload date.

MalwareBazaar organizes samples by date, SHA256 hash, file type, signature, tags and reporter of the malware. Once you have found your sample, you can download it as a zip file and open it with the password that MalwareBazaar provides for the sample.
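
Because samples are keyed by SHA256 hash, a lookup can also be scripted. The sketch below only builds the HTTP request and does not send it. The endpoint URL and the query and hash form fields are assumptions based on abuse.ch's public API documentation and should be checked against the current docs. Any authentication the service requires is left out, and build_info_request is a hypothetical helper name.

```python
import re
import urllib.parse
import urllib.request

# Assumed endpoint; verify against the current MalwareBazaar API documentation.
API_URL = "https://mb-api.abuse.ch/api/v1/"

SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")


def build_info_request(sha256: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking for metadata on one sample."""
    if not SHA256_RE.fullmatch(sha256):
        raise ValueError("expected a 64-character hexadecimal SHA256 hash")
    # Assumed form fields: query=get_info selects a metadata lookup by hash.
    form = {"query": "get_info", "hash": sha256.lower()}
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(API_URL, data=body, method="POST")
```

Validating the hash before building the request catches copy-and-paste mistakes early. Sending the request with urllib.request.urlopen and parsing the response is left to the reader.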

3. Hybrid Analysis

Hybrid Analysis offers a database of malware samples, but two things set it apart. The first is a free malware analysis service open to all: to get a file analyzed, you simply drag and drop the file you think is suspicious and you are off to the races. The second is the aptly named Hybrid Analysis technology that the search uses to compare the sample. It checks multiple databases and file collections to detect some of the rarer malware samples.

Note that to get full access to all malware samples, you will want to use one of the paid versions of Hybrid Analysis.

2. URLhaus

Let’s face it: sometimes all you have to go on is a URL. If you think it may be suspicious, the last thing you want to do is paste the URL into your browser and visit it. For those situations, URLhaus is just what you are looking for. This malware database stores URLs for known malware, lets users propose new malware URLs, and offers the dataset as a parsable list of URLs via the URLhaus API.
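
As a rough illustration of working with a parsable URL list, the sketch below checks whether a URL's host appears in a CSV-style dump, without ever visiting the URL. The column layout is a simplified assumption and the sample rows are made up; a real URLhaus export may use different or additional columns.

```python
import csv
from urllib.parse import urlsplit

# Made-up rows for illustration only; these are not real URLhaus entries.
SAMPLE_DUMP = """\
# Illustrative data, assumed columns: id,dateadded,url,url_status,threat
"1","2021-05-01 10:00:00","http://malicious.example/payload.exe","online","malware_download"
"2","2021-05-01 11:30:00","http://files.example.net/doc.zip","offline","malware_download"
"""

FIELDS = ["id", "dateadded", "url", "url_status", "threat"]  # assumed layout


def parse_dump(text):
    """Parse CSV text into dictionaries, skipping blank lines and '#' comments."""
    rows = (line for line in text.splitlines()
            if line.strip() and not line.startswith("#"))
    return list(csv.DictReader(rows, fieldnames=FIELDS))


def is_listed(url, records):
    """Return True if the URL's host matches the host of any listed URL."""
    known_hosts = {urlsplit(record["url"]).hostname for record in records}
    return urlsplit(url).hostname in known_hosts


records = parse_dump(SAMPLE_DUMP)
print(is_listed("http://malicious.example/other.bin", records))
```

Matching on the host rather than the full URL is a deliberate simplification: it flags other paths on a listed host, at the cost of false positives on shared hosting.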

Offering statistics for a malware sample database is fairly common, but what is not common is what URLhaus provides:

  • Most delivered payload
  • Average takedown time
  • Top malware-hosting network
  • Blocklist comparison
  • Average reaction time

1. VirusBay

VirusBay offers what virtually no one else can: a collaborative support system that connects SOC professionals, learners and novices with high-end malware researchers. This heightened collaboration within cybersecurity is intended to help organizations respond to and recover from information security incidents when it would not be possible for external experts to come out to the site or facility.

VirusBay offers the following features:

  • Secure (and free) malware sample exchange
  • Security incident report generator
  • Indicators of compromise (IOC) Q&A
  • One-click call for papers (CFP)
  • A credits-based community where your skills are noticeable immediately. Each user’s skill is measured separately, and actions taken by users can earn credits.

Utilize a wide array of malware databases for your work and education

Malware sample databases and datasets are among the best ways to research and train for any of the many roles within an organization that works with malware. The list of these resources keeps growing, and those listed above are the top seven focused on research and training.

 

Sources

Sophos, ReversingLabs Release 20 Million Sample Dataset for Malware Research, Security Week

Greg Belding

Greg is a Veteran IT Professional working in the Healthcare field. He enjoys Information Security, creating Information Defensive Strategy, and writing – both as a Cybersecurity Blogger as well as for fun.