Using data science to combat deepfakes, malware and social engineering
Cybersecurity data science is a fast-growing field, but it’s difficult to find up-to-date training due to all the new technologies, said Infosec Skills author Emmanuel Tsukerman.
“Deep learning is very recent, but look at what kind of impact it’s doing politically,” said Emmanuel, who recently published a series of courses on Cybersecurity Data Science.
“People are constantly talking about if you can trust videos that you’re seeing online. It’s making huge waves,” Emmanuel said. “But if you want to learn cybersecurity data science, you don’t have that many venues where you can learn it.”
It’s a challenge cybersecurity professionals must be prepared to address. The misuse of deepfake technology alone is expected to cost businesses over a quarter of a billion dollars in 2020, according to Forrester. However, deepfakes are just part of a larger trend affecting digital content and cybersecurity.
“It might become hard to tell who content is written by — if it’s really written by a human,” Emmanuel said. “On the other side, we’ll also see a lot of machine learning being used to detect when something is real or not.”
Data scientist: the best job in the U.S.
Data science is a natural fit for cybersecurity professionals, Emmanuel said, since it’s all about automatically learning from the datasets you have.
“In cybersecurity we generate massive datasets all the time, and there’s so much promise around how we can use that data.”
That promise is what drew Emmanuel to the field of cybersecurity data science.
“It’s a job that will not be going away anytime soon,” Emmanuel said. “On the other hand, there is the content itself, which is an ever-evolving landscape. There are always interesting, emerging technologies like the deepfake stuff and the implementation of deep learning around malware — from both sides, creating and detecting it.”
Leveraging machine learning for cybersecurity
Emmanuel believes data science is becoming more relevant for all types of cybersecurity roles.
“Let’s say you’re working on anti-malware or anti-virus. With millions of new strands or millions of new samples coming out every day, it’s almost essential nowadays to use machine learning in your threat detection.”
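To make the idea of machine learning in threat detection concrete, here is a minimal sketch of a sample classifier. It is not taken from Emmanuel's courses. It assumes two simple features (byte entropy and the share of printable ASCII bytes) and a nearest-centroid model, and the training samples are synthetic stand-ins rather than real malware. The names `features`, `train_centroids`, `predict` and `TRAINING` are hypothetical.

```python
import math
from collections import Counter


def features(data: bytes) -> tuple:
    """Return (byte entropy scaled to 0..1, fraction of printable ASCII bytes)."""
    if not data:
        return (0.0, 0.0)
    total = len(data)
    counts = Counter(data)
    # Shannon entropy in bits per byte; 8 bits is the maximum for byte data.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    printable = sum(1 for b in data if 32 <= b < 127) / total
    return (entropy / 8.0, printable)


def train_centroids(samples):
    """Average the feature vectors per label (a nearest-centroid model)."""
    sums = {}
    for data, label in samples:
        entropy, printable = features(data)
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += entropy
        acc[1] += printable
        acc[2] += 1
    return {label: (a / n, b / n) for label, (a, b, n) in sums.items()}


def predict(centroids, data: bytes) -> str:
    """Return the label whose centroid is closest to the sample's features."""
    f = features(data)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))


# Synthetic stand-ins (assumption): readable text plays the benign role,
# uniformly distributed bytes play the role of a packed or encrypted payload.
TRAINING = [
    (b"The quick brown fox jumps over the lazy dog", "benign"),
    (b"configuration file with plain readable text", "benign"),
    (bytes(range(256)), "suspicious"),
    (bytes(range(255, -1, -1)) * 2, "suspicious"),
]

if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(predict(model, b"GET /index.html HTTP/1.1"))
    print(predict(model, bytes(range(128, 256))))
```

A real detector would train on a large labeled corpus with far richer features. The point of the sketch is only the workflow: extract features, fit a model, then score new samples automatically.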
One case study highlighted in his courses demonstrates how pentesters can improve phishing success rates by adding machine learning to a Twitter phishing bot.
“It’s based on an experiment showing a much higher phishing success rate — something like 30 or 40 percent — versus if you use an ordinary spam bot,” Emmanuel said.
With machine learning, the bot could grab all of a user’s tweets, learn the topics and then start posting about those topics.
“It might mention you, and you’re like, ‘What’s this about me? What’s this link?’ Then you click it. It’s much more convincing,” Emmanuel said. “You can take it further and further. You can use machine learning to know when to send a tweet, when the user is checking their Twitter, who their friends are. You can make it more and more intelligent, and more and more dangerous.”
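The "learn the topics" step can be illustrated with a very small sketch, which is useful for seeing how little public text is needed to profile someone's interests. This is not the method from the experiment or the courses. It assumes the posts are already available as a list of strings and simply ranks words by how many posts they appear in. `top_topics`, `STOPWORDS` and the sample posts are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (assumption); real systems use larger ones.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "from", "are", "was"}


def top_topics(posts, k=3):
    """Return the k words that appear in the most posts."""
    counts = Counter()
    for post in posts:
        # Drop URLs and @mentions so they are not counted as topics.
        text = re.sub(r"https?://\S+|@\w+", " ", post.lower())
        # Use a set so each word counts once per post (document frequency).
        words = set(re.findall(r"[a-z]+", text))
        counts.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    # Sort by frequency, then alphabetically, so ties resolve deterministically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    sample_posts = [
        "Just finished a marathon training run",
        "Marathon training in the rain is brutal",
        "New espresso machine, coffee before marathon day",
        "coffee and training plans",
    ]
    print(top_topics(sample_posts))
```

Counting words is a crude stand-in for the topic models a real system would use, but it shows why a handful of public posts is enough to make a lure feel personal.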
Learning through hands-on projects
Understanding theory is important, Emmanuel said, but many cybersecurity data science questions can only be answered by getting your hands dirty.
“Let’s say you have this idea for combating ransomware,” Emmanuel said. “It may sound promising, but how would you know if it works? The only way is to get hands-on. Set up the lab, find the sample, try to dump the memory and see if it actually worked. If it works, that’s great. You have a legitimate solution that will help a lot of people get all their data back and not have to pay. If it doesn’t work, now you know because you’ve tried it.”
Emmanuel said he tried to keep his courses as hands-on as possible, including a capture-the-flag project designed to test your skills.
“Pretty much all the lessons are hands-on in the sense you have code that you can run on your own machine, and you have data sets provided for you.”
His Cybersecurity Data Science learning path covers the following topics:
- Preparation for Cybersecurity Data Science
- Malware Detection via Machine Learning
- Machine Learning for Intrusion Detection
- Machine Learning for Social Engineering
- Machine Learning for Pentesting
“It’s as real as it gets, as hands-on as it gets,” Emmanuel said.
The future of cybersecurity data science
Emmanuel recommends maximizing your cybersecurity training with relevant, hands-on courses that apply directly to the work you’re doing, or the work you want to do. That will help shorten the learning curve so you can stay ahead of the latest technology shifts.
When asked to predict where the evolving cybersecurity data science landscape will head next, Emmanuel said he expects fake content to become an even larger issue for organizations.
He added, “It’s an interesting job, and it’s always evolving.”
About Emmanuel Tsukerman
Emmanuel Tsukerman graduated from Stanford University and obtained his PhD from UC Berkeley. In 2018, Dr. Tsukerman’s anti-ransomware product was listed among the Top 10 Ransomware Products of 2018 by PC Magazine. In 2019, Dr. Tsukerman worked at Palo Alto Networks on research and development of large-scale machine learning solutions for cybersecurity data. When he isn’t developing cybersecurity products, Dr. Tsukerman enjoys fishing in the sunny outdoors and spending time with family.