
Spamdexing (SEO spam malware)

July 15, 2020 by Daniel Brecht

About SEO spam — is my website a target?

You’ve spent time and energy positioning your website high in search engine rankings through good SEO practices. Then you realize that someone has hijacked your site by inserting their own spam. You are a victim of SEO spam, also known as spamdexing, web spam or search engine spam.

This malware comes in many forms. Malicious hackers typically use it to bank on the good ranking of reputable sites and spread their own links to as many users as possible.


This threat is carried out through a number of SEO manipulation tactics, including building websites that trick search engine algorithms into ranking spam content highly. The practice first appeared in 1996, when the schemes used by spammers mostly revolved around excessively repeating unrelated phrases or inflating the number of links, all with the goal of attracting traffic.

Spamdexing evolved to include other techniques, such as comment spam (used to build backlinks), which automatically posts irrelevant comments that add no value to the discussion but are used to improve a webpage’s ranking. As search engines fine-tuned their algorithms, however, spamdexing has become more difficult for hackers to pull off. Yet it is far from being a problem of the past.

What is spamdexing?

Spamdexing is the term used for a “website optimized, or attractive, to the major search engines for optimal indexing.” This SEO spamming involves getting a site more exposure than it deserves for its keywords and, as a result, more visible placement on search engine results pages (SERPs). Search engine poisoning (SEP) abuses ranking algorithms and takes visitors to pages they did not intend to visit. In some cases, websites are artfully created for exactly that purpose; in many others, however, attackers exploit legitimate sites.

As mentioned, spamdexing comes in different forms and has different features. Many of these tactics mimic legitimate techniques used in SEO optimization. However, while “white-hat SEO” aims to improve the overall quality of a website and boost its ranking through approved methods, “black-hat SEO” promotes useless content that lures users into visiting pages that serve only the malicious hackers’ agenda.

What does SEO spam look like? It can use content or links.

Content spam comes in several forms:

  • Keyword stuffing: one of the oldest tactics, involving the repeated use of keywords
  • Article spinning: re-purposing content by merely swapping out words
  • Gateway pages: pages that rank high artificially but hold no meaningful information, pushing visitors to other pages for the actual content
  • Hidden text: not always a spam attempt, but often used to disguise extra keywords in a page, whether by setting the text color equal to the background, hiding words behind images or even using a font size of zero
  • Duplication of copyrighted content from high-ranking websites
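
Hidden text in particular lends itself to a quick automated check. Below is a minimal sketch, assuming Python with the third-party BeautifulSoup library installed; the inline-style patterns and the sample snippet are illustrative choices, not an exhaustive rule set.

# Minimal sketch: flag common hidden-text indicators in a page's HTML.
# Assumes the page source has already been fetched into a string; the
# style patterns below are illustrative and far from exhaustive.
import re
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

HIDDEN_STYLE_PATTERNS = [
    r"display\s*:\s*none",
    r"visibility\s*:\s*hidden",
    r"font-size\s*:\s*0",
]

def find_hidden_text(html: str):
    """Return (tag name, text) pairs for elements hidden via inline styles."""
    soup = BeautifulSoup(html, "html.parser")
    hits = []
    for tag in soup.find_all(style=True):
        style = tag["style"].lower()
        if any(re.search(pattern, style) for pattern in HIDDEN_STYLE_PATTERNS):
            text = tag.get_text(strip=True)
            if text:  # a hidden element that still carries text is worth a look
                hits.append((tag.name, text[:80]))
    return hits

if __name__ == "__main__":
    sample = '<p>Welcome!</p><div style="display:none">cheap pills buy now</div>'
    for name, text in find_hidden_text(sample):
        print(f"Suspicious hidden <{name}>: {text}")

Text colored to match the background is harder to flag this way, since it usually requires resolving the page’s external CSS rather than just inline styles.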

Link spam, instead, is based on using links to rank pages higher. It is accomplished through:

  • Hidden hyperlinks
  • Cookie stuffing: dropping affiliate cookies on users’ computers (without their knowledge) to earn from affiliate sales
  • Link farms: groups of pages that link to each other, or pages that are merely collections of hyperlinks
  • Comment spam: numerous, meaningless comments posted to a page or blog post
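
One practical way to spot injected link spam on your own pages is to audit outbound links against the domains you actually intend to link to. The sketch below assumes the same Python/BeautifulSoup setup as above; the KNOWN_GOOD whitelist and the sample HTML are invented placeholders.

# Minimal sketch: list outbound links on a page and flag any domain that is
# not on a whitelist of sites you intentionally link to.
from urllib.parse import urlparse
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

KNOWN_GOOD = {"example.com", "www.example.com"}  # domains you expect to link to

def audit_outbound_links(html: str):
    """Return (domain, anchor text) pairs for links outside the whitelist."""
    soup = BeautifulSoup(html, "html.parser")
    suspicious = []
    for a in soup.find_all("a", href=True):
        domain = urlparse(a["href"]).netloc.lower()
        if domain and domain not in KNOWN_GOOD:
            suspicious.append((domain, a.get_text(strip=True) or "(no anchor text)"))
    return suspicious

if __name__ == "__main__":
    page = ('<a href="https://example.com/about">About</a>'
            '<a href="http://spammy-pharma.example.net/deal">cheap meds</a>')
    for domain, anchor in audit_outbound_links(page):
        print(f"Unexpected outbound link to {domain}: {anchor}")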

Is spamdexing still a problem?

“Spamdexing was a big problem in the 1990s, and search engines were fairly useless because they were compromised by spamdexing,” reports webspam.org. Then Google broke onto the scene and began tailoring algorithms and page ranking systems to promote good sites and good content. Spammy sites or pages that violated the guidelines set by the search engine were penalized in different ways, from downranking to blacklisting. Nevertheless, spamdexing continues, albeit at a lower level.

Cloaking, another form of spamdexing, was identified on Google’s SERPs in 2011. Cloaking is considered a violation of Google’s Webmaster Guidelines because it provides users with different results than they expected. In fact, this technique is based on presenting different content to users and search engines: for example, while users see images, the search engine sees HTML text.
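
Because cloaking keys off the visitor’s identity (typically the User-Agent header or the requesting IP address), a rough way to test a page is to fetch it twice, once as a browser and once as a crawler, and compare what comes back. The sketch below assumes Python with the requests library; the URL, the User-Agent strings and the 50% size threshold are arbitrary choices for illustration, and a mismatch is only a hint, since many legitimate sites also vary content per client.

# Minimal sketch: a crude cloaking check that requests the same URL with a
# browser-like User-Agent and a crawler-like User-Agent, then compares the
# two responses. A large difference is only a hint, not proof of cloaking.
import requests  # third-party: pip install requests

BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def cloaking_hint(url: str) -> bool:
    """Return True if the two responses differ wildly in size (illustrative threshold)."""
    as_browser = requests.get(url, headers={"User-Agent": BROWSER_UA}, timeout=10).text
    as_crawler = requests.get(url, headers={"User-Agent": CRAWLER_UA}, timeout=10).text
    longer = max(len(as_browser), len(as_crawler)) or 1
    return abs(len(as_browser) - len(as_crawler)) / longer > 0.5

if __name__ == "__main__":
    print(cloaking_hint("https://example.com/"))  # hypothetical URL to audit

Note that sophisticated cloakers key on crawler IP ranges rather than the User-Agent string, so a check like this can miss them.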

Google’s Penguin algorithm update, rolled out in 2012, was deployed to address this and other forms of search engine spam, including black-hat SEO. It was only the first of many increasingly refined algorithm updates that have reshaped the SEO world over the years since: Google Hummingbird in 2013, Fred in 2017.

In its Webspam Report 2017, Google reported that it had doubled its efforts in “removing unnatural links via ranking improvements and scalable manual actions,” to the point of keeping spam below 1% of search results. The May 2020 Core Update, launched with the aim of rewarding good, relevant, original content, promises to improve those numbers even further.

How to remove spam and provide protection for your site

Is spamdexing (SEO spam malware) a threat for website owners? Absolutely. It can have a negative impact on their site's search rankings, cause its removal from the search engine index and damage the webmaster’s and client’s reputations.

The easiest way to identify a Google penalty is through Google Search Console, using the “Search Traffic” and “Manual Actions” features in the sidebar.

If anything fishy is spotted, it’s advisable to file a spam report (a Google Account is required). Google encourages anyone who witnesses unfair techniques used to artificially inflate PageRank to notify its webspam team, either through webmaster tools or the public spam report, to help combat spamdexing.

In addition, it is important to implement essential security measures and take steps for fixing any vulnerabilities on your site, such as:

  • Apply anti-spam tools
  • Enable spam filtering 
  • Get the most out of Google Alerts to monitor a site for spammy content or to help detect hacked pages
  • Capitalize on Chrome extensions to report spam to help with Google’s scalable spam-fighting efforts
  • Add a noindex robots meta tag to pages that don’t need to appear in search results; keeping them out of the index removes the incentive for spammers to target them
  • Use the “disavow backlinks” tool to tell Google to ignore spammy, artificial or low-quality links pointing to your site
  • Use a web application firewall (WAF) if you’re serious about preventing a search engine spam infection
  • Monitor your backlinks with Google Search Console and look for any abnormal link spam issues or patterns. You can also receive alerts when Google encounters indexing issues on your site
  • Use the “nofollow” attribute on untrusted links to deter spammers from targeting your site and give you more control (see the sketch after this list)
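
As a concrete example of the nofollow advice above, the sketch below rewrites user-submitted HTML (such as a blog comment) so that every link carries rel="nofollow" before it is published, removing the ranking payoff a comment spammer is after. It again assumes Python with BeautifulSoup, and the sample comment is an invented placeholder.

# Minimal sketch: add rel="nofollow" to every link in user-submitted HTML
# (e.g., blog comments) before publishing it, so dropped links pass no
# ranking benefit. The sample comment below is an invented placeholder.
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

def nofollow_links(comment_html: str) -> str:
    """Return the comment HTML with rel="nofollow" set on every anchor."""
    soup = BeautifulSoup(comment_html, "html.parser")
    for a in soup.find_all("a", href=True):
        a["rel"] = "nofollow"
    return str(soup)

if __name__ == "__main__":
    comment = 'Great post! <a href="http://spammy.example.net">click here</a>'
    print(nofollow_links(comment))
    # Great post! <a href="http://spammy.example.net" rel="nofollow">click here</a>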

It is also worthwhile to browse how-to guides:

  • Google has a guide to show how to check if your site is hacked. It shows an example of hacked content and cloaking
  • Google also has a guide to identify the hack and then fix vulnerabilities on a site

Conclusion

Spamdexing has been an issue for decades and has evolved over time through the use of different techniques. Many search engines, including Google, routinely update their algorithms, check for instances of spamdexing and remove offending pages from their indexes. They may block an entire website if unethical methods are used to make the site rank highly.

In fact, Google has stated: “We continued to protect the value of authoritative and relevant links as an important ranking signal for Search. We continued to deal swiftly with egregious link spam, and made a number of bad linking practices less effective for manipulating ranking.”

Nevertheless, users and webmasters should always be on guard against these tactics and learn to recognize signs of spamdexing early.



Daniel Brecht

Daniel Brecht has been writing for the Web since 2007. His interests include computers, mobile devices and cyber security standards. He has enjoyed writing on a variety of topics ranging from cloud computing to application development, web development and e-commerce. Brecht has several years of experience as an Information Technician in the military and as an education counselor. He holds a graduate Certificate in Information Assurance and a Master of Science in Information Technology.