Introduction:

We are all familiar with the Google, Yahoo and Bing search engines; they need no introduction. We use them regularly to answer our day-to-day questions.

Google and other search engines use automated programs called spiders or crawlers to discover web pages. These search engines also maintain a large index of keywords and the locations where those words can be found. Their crawling and indexing capabilities make search engines powerful, but they also open the door for hackers to identify vulnerable targets on the Internet. This is called Search Engine Hacking.

Search Engine Hacking involves using advanced, operator-based search queries to identify exploitable targets and sensitive data.

In this article, we will learn how to use various Google search operators to identify vulnerable targets on the Internet, and we will also look at a new tool that can automate this process.

Special Search Characters:

The Google search engine provides various special search characters for advanced searching. A partial list follows:

  1. Quotes [“search query”]: Quotes are used to search for an exact phrase or set of words.

    E.g. The query [“The monk who sold his Ferrari”] will return only pages containing that exact phrase.

  2. Minus sign [-]: The minus sign tells the Google search engine to exclude results containing the word that immediately follows it.

    E.g. [-red apple] will display search results for apple that do not contain the word red.

  3. Tilde operator [~]: Adding a tilde in front of a word searches for results containing that word as well as its synonyms. (Google has since retired this operator.)

    E.g. [~jokes] will display search results that include the word jokes as well as synonyms such as funny, humor, etc.


  4. OR operator or vertical bar [|]: Using OR (in uppercase) or the vertical bar between two or more keywords tells Google to search for pages that contain any of those words.

    E.g. [Android OR Apple] will display search results containing either of the words.

  5. Asterisk operator [*]: The asterisk is a wildcard that lets the search engine fill in that position with any word or words. It is most useful inside double quotes for more precise searches.

    E.g. The query [“today is * day”] will display search results like “today is a good day” or “today is mother’s day”, etc.
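The special characters above are just text inside the query, so combining them is simple string assembly. The Python sketch below uses hypothetical helper names (phrase, exclude, any_of, query); it only builds query strings and does not contact Google.

```python
def phrase(text):
    """Wrap text in double quotes to match the exact phrase."""
    return f'"{text}"'


def exclude(word):
    """Prefix a word with a minus sign to drop results containing it."""
    return f"-{word}"


def any_of(*words):
    """Join keywords with an uppercase OR so that any of them may match."""
    return " OR ".join(words)


def query(*parts):
    """Join query fragments with spaces."""
    return " ".join(parts)


print(query(phrase("The monk who sold his Ferrari")))  # "The monk who sold his Ferrari"
print(query(exclude("red"), "apple"))                  # -red apple
print(any_of("Android", "Apple"))                      # Android OR Apple
print(phrase("today is * day"))                        # "today is * day"
```

The resulting strings can be pasted into the search box as-is; keeping each character in its own helper makes it easy to see which part of a long query does what.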

Basic Searching Techniques:

The Google search engine provides various operators to customize search results.

The basic syntax of a Google advanced operator is:

operator:search_term
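The operator:search_term pattern can be composed the same way. In this minimal sketch, op and dork are hypothetical helpers; wrapping multi-word terms in double quotes is an assumption about how you want them handled, matching the [intitle:“Index of”] style used in this article.

```python
def op(name, term):
    """Format an advanced operator as operator:search_term.

    Multi-word terms are wrapped in double quotes (an assumption) so the
    whole phrase stays attached to the operator.
    """
    if " " in term and not term.startswith('"'):
        term = f'"{term}"'
    return f"{name}:{term}"


def dork(*fragments):
    """Join operators and plain keywords into a single query string."""
    return " ".join(fragments)


print(op("intitle", "Index of"))                             # intitle:"Index of"
print(dork(op("filetype", "pdf"), op("site", "yahoo.com")))  # filetype:pdf site:yahoo.com
print(dork("news", op("site", "yahoo.com")))                 # news site:yahoo.com
```

Several operators can be combined in one query; Google applies all of them together, which is how the examples below narrow results to a file type on a single site.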

The list below provides some of the key operators useful in creating search queries to retrieve valuable information from the web.

  1. Intitle operator:

    The query [intitle:keyword] returns pages that contain the keyword in their title.

    E.g. 1: The query [intitle:Google] will return all the web pages containing Google in the title.

    E.g. 2: Google Hacking using the intitle operator

    The query [intitle:“Index of”] returns web pages with “Index of” in the title. This can be used to identify web servers that have directory listing enabled (a directory listing displays the contents of a directory).


  2. Site operator:

    The query [site:www.site.com] narrows a search to a particular site, domain or sub-domain.

    E.g. 1: The query [news site:yahoo.com] will search for the keyword “news” on yahoo.com and its sub-domains.

    E.g. 2: Google Hacking – Information gathering on sub-domains

    The query [site:yahoo.com] will display indexed pages from yahoo.com and its sub-domains. This operator is useful for gathering information about the sub-domains of a specific target site.

  3. Inurl operator:

    The query [inurl:keyword] returns pages whose URL contains the keyword.

    E.g. 1 – The query [inurl:contactus site:www.MySite.com] will search MySite for pages whose URL contains the word “contactus”.

    E.g. 2 – Google Hacking – Looking for Admin Portals

    The query [inurl:admin.php] will list websites that may expose an admin login page. Such pages attract attackers, who may try to brute-force the login page to gain access to the admin interface.

  4. Cache operator:

    Google keeps a snapshot of the pages it has crawled. The query [cache:URL] displays Google’s cached version of that page.

    E.g. – The query [cache:www.yahoo.com] will display Google’s cached copy of Yahoo.com. This can be useful for gathering information from earlier versions of a page. (Google has since discontinued its cache feature, so this operator may no longer work.)

    Another useful website for obtaining cached pages is the Internet Archive at http://archive.org/

    This website stores snapshots of websites in a calendar format and can be used to view a page as it appeared on a previous date. The screenshot below displays a cached page of Yahoo.com dated 9 Feb 2010.



  5. Filetype operator:

    The query [filetype:extension] restricts results to files with a particular extension. Google can search for many different file types, such as pdf, doc, rtf, ppt, xls, etc.

    E.g. The query [filetype:pdf site:yahoo.com] will return links to PDF files found on Yahoo.com.
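The site: and archive.org techniques above lend themselves to small post-processing helpers. The sketch below assumes you already have a list of result URLs (gathered however you choose) and, separately, builds a request for the Internet Archive's Wayback Machine availability endpoint (archive.org/wayback/available), which returns JSON describing the snapshot closest to a given timestamp. The helper names are hypothetical, and the network request itself is left out.

```python
import json
from urllib.parse import urlencode, urlsplit


def subdomains_from_results(urls, domain):
    """Return sorted, unique host names under `domain` found in `urls`.

    `urls` is assumed to be a list of result links you have already
    collected, e.g. from a [site:yahoo.com] search.
    """
    domain = domain.lower().lstrip(".")
    hosts = set()
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        # Keep the domain itself and anything ending in ".<domain>".
        if host == domain or host.endswith("." + domain):
            hosts.add(host)
    return sorted(hosts)


def wayback_lookup_url(site, timestamp):
    """Build a Wayback Machine availability query (timestamp: YYYYMMDD)."""
    params = urlencode({"url": site, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + params


def closest_snapshot(response_text):
    """Return the closest archived snapshot URL from the JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


results = ["https://news.yahoo.com/a", "https://mail.yahoo.com/", "https://example.com/"]
print(subdomains_from_results(results, "yahoo.com"))  # ['mail.yahoo.com', 'news.yahoo.com']
print(wayback_lookup_url("yahoo.com", "20100209"))
# https://archive.org/wayback/available?url=yahoo.com&timestamp=20100209
```

Fetching the lookup URL with any HTTP client and passing the response body to closest_snapshot gives the archived copy nearest to 9 Feb 2010, in the spirit of the Yahoo.com example above.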

Google Hacking through keyword search:

Let’s look at some of the keyword searches and the operators that can be used to build search queries to carry out Google Hacking.

  1. Digging Google for Configuration Files:

    Configuration files store the initial settings for computer programs. An attacker with access to a configuration file can learn a great deal about how the program is deployed.

    For example, a Google query like [filetype:ini inurl:ws_ftp.ini] would retrieve the configuration file used by the WS_FTP client program, as shown in the screenshot below:


  2. Digging Google for Log Files:

    Web servers record information such as IP addresses, timestamps, HTTP requests, and sometimes even usernames and passwords in log files. These log files usually have the .log extension and may be accessible over the Internet due to inadequate protection.

    For example, a Google query like [filetype:log cron.log] would retrieve the UNIX cron log, as shown in the screenshot below:


  3. Digging Google for database information leaked by web applications:

    Google Hackers search Google for pieces of database information leaked by vulnerable servers. This information can be used to identify a vulnerable target and launch a more sophisticated attack against it.

    For example, a Google query like [filetype:inc intext:mysql_connect] will retrieve .inc files that contain MySQL user credentials and other details used to connect to the database.


  4. Digging Google for leakage of information through error messages:

    Error messages that leak information are very useful for gathering information and launching further attacks on websites. If an application lacks proper exception/error handling, it might leak sensitive details in its error messages, such as database details, stack traces, etc.

    E.g. a Google query like [intitle:“Apache Tomcat” “Error Report”] will display search results containing Apache Tomcat error messages.
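The same searches can be turned around for defense: before a search engine indexes a configuration file, log file or .inc file, you can look for it in your own web root. This is a minimal sketch; the file names, extensions and content patterns are assumptions drawn from the examples above rather than an exhaustive rule set, and audit_web_root is a hypothetical name.

```python
import os
import re

# Assumed rules, modeled on the search examples above (not exhaustive).
RISKY_NAMES = {"ws_ftp.ini"}
RISKY_EXTENSIONS = {".ini", ".log", ".inc"}
CONTENT_PATTERNS = {
    "db_connect": re.compile(r"mysql_connect\s*\("),
    "tomcat_error": re.compile(r"Apache Tomcat.*Error report",
                               re.IGNORECASE | re.DOTALL),
}


def audit_web_root(root):
    """Yield (path, reason) for files under `root` that match a rule."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            lower = name.lower()
            ext = os.path.splitext(lower)[1]
            # Flag by file name or extension.
            if lower in RISKY_NAMES or ext in RISKY_EXTENSIONS:
                yield path, f"risky file type ({ext or lower})"
            # Flag by content; only the first million characters are read.
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    text = fh.read(1_000_000)
            except OSError:
                continue
            for label, pattern in CONTENT_PATTERNS.items():
                if pattern.search(text):
                    yield path, f"content matches {label}"
```

For example, iterating over audit_web_root("/var/www/html") (substitute your own document root) lists each flagged file with the reason it was flagged; anything reported should either be moved outside the web root or blocked by the web server.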


We have briefly discussed the directives that can be used to carry out search engine hacking. Trying out each of these directives manually can be cumbersome, so automated tools are used to speed up search engine hacking and retrieve juicy information.

Automated tools available for Google Hacking:

The above tools are useful for Google Hacking. However, let’s look at a new tool called SearchDiggity, which provides a graphical user interface and can retrieve a lot of information from both the Bing and Google search engines.

SearchDiggity:

SearchDiggity is Stach & Liu’s Microsoft Windows GUI application that serves as a front end to the most recent versions of the Diggity tools:

  • GoogleDiggity
  • BingDiggity
  • Bing LinkFromDomainDiggity
  • CodeSearchDiggity
  • DLPDiggity
  • FlashDiggity
  • MalwareDiggity
  • PortScanDiggity
  • SHODANDiggity
  • BingBinaryMalwareSearch
  • NotInMyBackYard Diggity

More information on these modules can be found here:
http://www.stachliu.com/resources/tools/google-hacking-diggity-project/attack-tools/

Let’s explore a few of these key modules to learn more about the art of search engine hacking.

GoogleDiggity:

GoogleDiggity automates the Google Hacking process. It queries the search engine through the Google JSON/ATOM Custom Search API to identify vulnerabilities and information disclosures.

Google uses bot detection, so querying Google directly with automated Google Hacking tools is difficult. This is overcome with the Google JSON/ATOM Custom Search API, which uses an API key. A user can register for an API key with a valid Gmail account and get 100 free requests per day. Additional queries are available at a cost (Google charges $5 per 1000 queries).

The tool provides a well-structured interface that allows the user to:

  • Select search queries from the list
  • Enter the API key
  • Specify the target site/domain/IP address
  • Kick off the scan with the Scan button, etc.
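The GoogleDiggity workflow (selected queries, an API key and a target) maps onto the Custom Search JSON API's request parameters: key for the API key, cx for the search engine ID, and q for the query. In this sketch, appending the target to each query with the site: operator is an assumption about how a tool might restrict results, build_requests and estimated_cost are hypothetical names, and the cost estimate simply applies the 100 free queries per day and $5 per 1000 figures quoted above as a linear rate.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
FREE_QUERIES_PER_DAY = 100   # free quota quoted in the article
PRICE_PER_1000_USD = 5.0     # price quoted in the article for extra queries


def build_requests(queries, target, api_key, engine_id):
    """Return one request URL per selected query, restricted to `target`.

    `engine_id` is the search engine ID sent as the `cx` parameter.
    """
    urls = []
    for q in queries:
        params = {"key": api_key, "cx": engine_id, "q": f"{q} site:{target}"}
        urls.append(f"{API_ENDPOINT}?{urlencode(params)}")
    return urls


def estimated_cost(queries_per_day):
    """Estimate the daily cost in USD for queries beyond the free quota."""
    paid = max(0, queries_per_day - FREE_QUERIES_PER_DAY)
    return paid / 1000 * PRICE_PER_1000_USD


urls = build_requests(["filetype:log cron.log"], "example.com", "YOUR_KEY", "YOUR_CX")
print(urls[0])
print(estimated_cost(1100))  # 5.0
```

Keeping the API key out of the query text and in its own parameter is what lets the tool avoid scraping Google's HTML results; each selected query costs one request per page of results.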

Bing Diggity:

Similar to GoogleDiggity, Bing Diggity is a Bing search engine hacking tool. It uses the Bing 2.0 API (which allows 1000 results per query) and Stach & Liu’s newly developed Bing Hacking Database (BHDB) to find vulnerabilities and sensitive information disclosures related to your organization that are exposed via Microsoft’s Bing search engine.

The tool provides a well-structured interface that allows the user to:

  • Select Bing-specific search queries from the list
  • Enter the API key
  • Specify the target site/domain/IP address
  • Kick off the scan with the Scan button, etc.

DLPDiggity:

DLPDiggity is a data loss prevention tool that leverages Google/Bing to identify exposures of sensitive information (e.g. SSNs, credit card numbers, etc.) via common document formats such as .doc, .xls, and .pdf. First, GoogleDiggity and BingDiggity are used to locate and download files belonging to target domains/sites on the Internet. Then, DLPDiggity is used to analyze those downloaded files for sensitive information disclosures.

DLPDiggity uses IFilters (an IFilter is a plugin that allows the Windows Indexing Service and the newer Windows Desktop Search to index different file formats so that they become searchable) to search through the actual contents of files, as opposed to just the metadata. Using .NET regular expressions, DLPDiggity can find almost any type of sensitive data within common document file formats.

Over the last few years, there has been a tremendous increase in the volume of office documents that have been indexed and made searchable by Google and Bing. DLPDiggity taps into that in order to find documents containing sensitive information.

The tool provides a well-structured interface that allows the user to:

  • Select DLPDiggity search queries from the list to search Google/Bing for documents
  • Select the regular expressions that will be used to search the documents in the target directory for leaks of sensitive information such as SSNs and credit card numbers
  • Analyze the documents with the Search button
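DLPDiggity's regular expressions are .NET patterns; the Python sketch below only illustrates the idea with simplified, assumed patterns for US SSNs and card numbers. Card-like matches are additionally filtered with the Luhn checksum, a common way to cut false positives; that filtering step is an addition here, not necessarily DLPDiggity's behavior, and the helper names are hypothetical.

```python
import re

# Illustrative patterns (assumptions, far simpler than production DLP rules).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, match) pairs for SSN-like and Luhn-valid card-like strings."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_valid(digits):
            hits.append(("card", m.group()))
    return hits


print(find_sensitive("SSN 123-45-6789, card 4111 1111 1111 1111."))
# [('ssn', '123-45-6789'), ('card', '4111 1111 1111 1111')]
```

In a full pipeline, the text passed to find_sensitive would come from the extracted contents of the downloaded documents, which is the role IFilters play for DLPDiggity.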

FlashDiggity:

FlashDiggity automates Google searching/downloading/decompiling/analysis of SWF files to identify Flash vulnerabilities and information disclosures.

FlashDiggity first leverages GoogleDiggity to identify Adobe Flash SWF applications on target domains via Google searches such as ext:swf. Next, it downloads all of the SWF files in bulk for analysis. The SWF files are decompiled back to their ActionScript source code and then analyzed for code-based vulnerabilities.

The tool provides a well-structured interface that allows the user to:

  • Select FlashDiggity search queries from the list to search Google for SWF files
  • Select the regular expressions that will be used to search through the ActionScript of decompiled SWF Flash files for code-based vulnerabilities and information disclosures.
  • Decompile and analyze the SWF files with the Search button
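FlashDiggity's own rules are not reproduced here; the sketch below shows the general approach of running a set of regular expressions over decompiled ActionScript, line by line. The three rules and the name scan_actionscript are illustrative assumptions: Security.allowDomain("*") lets SWF files from any domain script the movie, ExternalInterface.call bridges into the hosting page's JavaScript, and the last rule flags string literals assigned to password-like variables.

```python
import re

# Hypothetical rule set for decompiled ActionScript source.
AS_RULES = {
    "wildcard_allowDomain": re.compile(r"Security\.allowDomain\(\s*[\"']\*[\"']\s*\)"),
    "external_interface": re.compile(r"ExternalInterface\.call\s*\("),
    # e.g. var password:String = "..."; or password = "...";
    "hardcoded_password": re.compile(
        r"password\w*\s*(?::\s*\w+\s*)?=\s*[\"'][^\"']+[\"']", re.IGNORECASE),
}


def scan_actionscript(source):
    """Return (line_number, rule_name, line_text) for every rule hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in AS_RULES.items():
            if pattern.search(line):
                findings.append((lineno, name, line.strip()))
    return findings


sample = 'Security.allowDomain("*");\nvar password:String = "s3cret";\ntrace("ok");'
for hit in scan_actionscript(sample):
    print(hit)
```

Line-by-line matching keeps the report easy to map back to the decompiled file; rules that need to span several lines would have to search the whole source instead.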