Google Hacking Overview
Google Hacking is a term that covers a wide range of techniques for querying Google to reveal vulnerable web applications and, sometimes, to pinpoint vulnerabilities within a specific web application. Besides revealing flaws in web applications, Google Hacking lets you find sensitive data that is useful in the Reconnaissance stage of an attack: email addresses associated with a site, database dumps or other files containing usernames and passwords, unprotected directories with sensitive files, URLs of login portals, system logs such as firewall and access logs, and unprotected pages that expose sensitive information, such as web-connected printers or cameras that publish data about their usage, status and location.
Advanced operators for querying Google
Advanced operators allow you to get more specific search results from your queries. Most of the time, they let you narrow the list down to the most relevant and useful results. For example, you can use advanced operators to get only files of a particular type, or to limit the results of your search to a specific website. If you simply use a Google search term, you will see all the results that match the given terms. Advanced operators, however, make it possible to get a subset of the original results that match certain characteristics. This is easily illustrated by querying Google for a domain and comparing that to querying with the site operator for the same domain. The former query returns results from all kinds of external websites that mention that domain, while the latter narrows the results down to those originating from the chosen domain.
Advanced operators usually take the form operator:search-term and are written directly in your query string. There should be no space between the operator and the search term, and the search term itself cannot contain spaces, or the query will fail. To use spaces, you have to surround the phrase with quotation marks. Quotation marks tell Google to search for an exact match. To test this, try searching Google for a phrase like there is a lot of fish in the sea, and then retry the search with the same phrase enclosed in quotation marks: "there is a lot of fish in the sea".
Figure 1: Results from enclosing search words with quotation marks vs. no quotation marks
For example, the query site:infosecinstitute.com filetype:pdf uses two advanced operators: the site operator, which limits the results to those originating from the given website, and the filetype operator, which limits the results to a certain file type (in this case, pdf).
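The operator:search-term rule described above can be sketched in a few lines of Python. This is only an illustration: the helper names operator_term and build_query are hypothetical and not part of any Google tooling.

```python
def operator_term(operator: str, term: str) -> str:
    """Return operator:term, quoting the term when it contains whitespace."""
    if any(ch.isspace() for ch in term):
        # A term with spaces must be wrapped in quotation marks.
        term = f'"{term}"'
    # No space is allowed between the operator, the colon and the term.
    return f"{operator}:{term}"


def build_query(*parts: str) -> str:
    """Join operators and plain keywords into one query string."""
    return " ".join(part.strip() for part in parts if part.strip())


query = build_query(
    operator_term("site", "infosecinstitute.com"),
    operator_term("filetype", "pdf"),
)
print(query)  # site:infosecinstitute.com filetype:pdf
print(operator_term("intext", "Apache Server Status"))  # intext:"Apache Server Status"
```

Keeping the quoting logic in one place makes it harder to produce a query that fails because of a stray space after the colon.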
Below is a table containing some of the commonly used Google operators and symbols for Google hacking:
|Operator||Description||Example|
|intitle:||Searches in the title of the page (the <title> HTML element located in the <head> element of the page's markup)||intitle:admin|
|inurl:||Searches within the URL of the crawled web pages||inurl:wp-content/uploads filetype:sql; inurl:.ssh intitle:index.of authorized_keys|
|intext:||Searches within the text of the web pages (the text that regular users see when browsing the pages)||intext:"powered by webcamXP 5"; intext:"Powered by net2ftp" inurl:ftp; inurl:"server-status" intext:"Apache Server Status"|
|allintext:/allinurl:/allintitle:||All three operators work like the ones above, except that they cannot be combined with other operators and they look for all the words that follow them in the text/URL/title of the web page||allintext: "Please login to continue…" "ZTE Corporation. All rights reserved."; allintitle:Welcome to Windows XP Server Internet Services|
|filetype:||Limits the results to web resources matching the desired file type (not always accurate)||filetype:xls intext:email intext:password|
|site:||Limits the results to web resources within a given website||filetype:xls site:apple.com|
|info:||Shows additional links/actions that can be followed, such as showing Google's cache of the website, visiting similar pages, viewing pages that link to the given page and so on||info:apple.com|
|-||Excludes the term/operator from the results||inurl:citrix inurl:login.asp -site:citrix.com|
|"search-term"||Enclosing the phrase in quotation marks returns only results that exactly match it||inurl:"server-status" intext:"Apache Server Status"|
|*||A wildcard for any unknown/arbitrary word. It does not complete a word, as in foot*, but indicates that any word could be at that position||a * saved is a * earned|
|+||The phrase that follows the + modifier must exist within the results. It can be used to include an overly common word that Google typically ignores in queries||"Machine gun" +uzi|
|.||A single-character wildcard: any single character can be in that place||inurl:.ssh intitle:index.of authorized_keys|
Numerous cheat sheets exist which show details about most of the advanced operators available for use in queries such as the one posted by Google Guide.
Google also provides a web page with an interface for making some advanced queries located at https://www.google.com/advanced_search
Google Hacking Database
The Google Hacking Database (GHDB) contains user-submitted queries divided into different categories, such as vulnerable files, files containing passwords, information about the server and the software on it, online devices and so on. A dork is simply a previously discovered Google query that is known to return useful results such as exploits or sensitive data. When browsing the dorks available in the Google Hacking Database, check their submission date, as some dorks are old and may no longer prove useful. Old submissions relating to exploits, vulnerabilities and other flaws of specific software versions can easily become irrelevant over time. However, some dorks that deal with ways to harvest information still work no matter the submission date, such as ways to find database dumps, to find pages with downloads, and (to some extent) to get unprotected directory listings.
Basic penetration testing through Google Hacking
As shown above, Google can be used for (passive) information gathering. It is a great tool for footprinting and allows for mobility and anonymity during the footprinting process. The information that Google Hacking reveals is generally publicly available and could be found manually, should one have the time and resources to search for it. With Google Hacking, you are not actively engaging with the target system, but you can easily collect the information typically sought in the Reconnaissance phase of an attack: error messages, passwords, usernames, sensitive directories, online devices and hardware, web servers and the vulnerabilities within them, pages with access forms, and sensitive e-banking and e-commerce information. Thus, you can directly find usernames and passwords that could be used to gain access, as well as devices and software that can be targeted, which makes Google an invaluable tool. In fact, Google Hacking is a concept you have to be acquainted with if you plan on taking an exam such as the Certified Ethical Hacker (CEH) exam.
There are many ways to look for usernames and passwords through Google queries. For example, you can search for .sql files that contain dumps of the databases of different websites. Those databases usually contain most of the data related to a website, such as its users, passwords and user details. One such query is: filetype:sql inurl:backup inurl:wp-content. This searches for database dumps on websites whose URL contains the words backup and wp-content. In WordPress, the popular CMS on which many websites are built, wp-content is the folder where the user and some plugins upload their files, and backup can narrow the results to people who decided to place a copy of their database online in case something happens.
Figure 2: Querying Google for database dumps
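As a minimal sketch, the query above can be turned into a search URL with Python's standard library. The search_url helper is a hypothetical name, and the https://www.google.com/search?q= address is assumed here as the usual search endpoint; it is not taken from the article.

```python
from urllib.parse import urlencode

# Assumption: the standard Google search endpoint with the query in "q".
GOOGLE_SEARCH = "https://www.google.com/search"


def search_url(query: str) -> str:
    """URL-encode a dork and attach it to the search endpoint."""
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


dump_query = "filetype:sql inurl:backup inurl:wp-content"
print(search_url(dump_query))
# https://www.google.com/search?q=filetype%3Asql+inurl%3Abackup+inurl%3Awp-content
```

Encoding the query this way keeps colons, quotation marks and spaces intact when the link is pasted into a browser or a report.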
The query returned many results, most of which were actual database dumps of WordPress installations. Those database dumps contained information about the WordPress administrative users such as their username, email, hashed password, amongst other potentially useful information. The WordPress administrative users themselves are usually located in the wp_users table (which can have a different prefix than wp – the prefix is set in the initial installation of WordPress).
Figure 3: Locating the administrative user and its associated data in one of the database dumps
Figure 4: The administrative user, his/her email, names, and hashed password in another database dump
There are many files used by different kinds of software which contain lists of usernames and passwords. For example, .htpasswd can be used by websites to perform Basic Authentication. With Basic Authentication, the browser shows a login prompt, and the submitted credentials are checked against the entries in an .htpasswd file on the server.
Figure 5: Basic Authentication in a website. The browser shows login fields whose values are checked for matches within an .htpasswd file
There are many ways to search for this particular file. The Google Hacking Database proposes simply typing htpasswd, but you can also search for htpasswd.bak, filetype:htpasswd and so on. As seen here, searches for one type of information can often expose other data that can be used in the pen testing process.
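To make the Basic Authentication mechanism more concrete, here is a small sketch of what the browser sends and of the user:hash layout of an .htpasswd line. The function names are hypothetical, and the sample credentials are the well-known illustrative pair from the HTTP Basic Authentication specification, not data from any real site.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value sent for Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    # The credentials are only Base64-encoded, not encrypted.
    return "Basic " + base64.b64encode(raw).decode("ascii")


def split_htpasswd_line(line: str) -> tuple:
    """Split one 'user:hash' line of an .htpasswd file into its two fields."""
    user, _, hashed = line.strip().partition(":")
    return user, hashed


print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

# Hypothetical .htpasswd entry; the hash value is a placeholder.
print(split_htpasswd_line("admin:$apr1$placeholder$hash\n"))
```

Because each line pairs a username with a password hash, a publicly indexed .htpasswd file hands an attacker both the account names and material for offline password cracking.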
Figure 6: An arbitrary file with a username and password found online
Figure 7: An arbitrary file with a username and password found online
Identifying system version information
As we have seen in the operators table, we can get directory listings by incorporating “index.of” in our searches. Queries such as intitle:index.of server.at can pinpoint directory listings with some server information which is shown in such listings by default in web servers such as Apache.
You can add the site: operator to that query to search for directory listings leaking server information in specific websites. For example, a search for intitle:index.of server.at site:somewebsite.edu revealed the particular server software (Apache), its version and the operating system of the machine it is on as seen in the picture below.
Figure 8: Querying to retrieve server information
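The server line shown at the bottom of such a listing can be broken into its parts with a regular expression. The sketch below assumes the common Apache footer layout "Software/version (OS) Server at host Port N"; the sample string and the parse_signature name are made up for illustration.

```python
import re

# Assumed footer layout: "Apache/2.2.22 (Debian) Server at host Port 80"
SIGNATURE = re.compile(
    r"(?P<software>[\w-]+)/(?P<version>[\d.]+)"
    r"(?:\s+\((?P<os>[^)]+)\))?"
    r".*?\bServer at (?P<host>\S+) Port (?P<port>\d+)"
)


def parse_signature(footer: str):
    """Return the signature fields as a dict, or None if nothing matches."""
    match = SIGNATURE.search(footer)
    return match.groupdict() if match else None


# Hypothetical footer text copied from a directory listing.
info = parse_signature("Apache/2.2.22 (Debian) Server at somewebsite.edu Port 80")
print(info)
```

The extracted software name and version can then be compared against published advisories for that release.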
Finding websites using vulnerable software
Another use for Google hacking is to identify systems that are running a known vulnerable version of software.
Many web applications add a “Powered By” field somewhere on the page and sometimes mention the version of the software. That means that if you find a vulnerability in, say, vBulletin, you can search for other websites that are also susceptible to this vulnerability.
Figure 9: An example of “Powered By” field that indicates the software versioning.
The picture above shows vBulletin installed on a website which is noted by the informational footer. Should a vulnerability exist in that version of vBulletin, other vulnerable sites would be easily reached.
Queries to start your tests with
- site:targetsite.com intitle:index.of – when you start examining a website, it is a good idea to look at any potential directory listings first. Those can sometimes reveal information about the server and will show files that may reveal additional information. This query mostly returns Apache-style directory listings, so sites served by other software such as Node.js may not show up, although Apache is still very widely deployed.
- site:targetsite.com intext:error|warning: – languages like PHP give an option for errors and warnings to be displayed directly on the page where they occurred, which is useful for development purposes. However, many websites run in production without hiding such errors. The actual error or warning is usually prefixed with error: or warning:, so you can search for those on a particular website. Depending on the website and its subject matter, false positives may appear.
Figure 14: Searching for errors and warnings in a specific website
Figure 15: The MySQL database user is revealed from a PHP warning found through Google
As you can see above, a simple search for errors and warnings in a website revealed a database error which showed that the database user is artshis2, that a MySQL database is used on the machine, and that the website uses a legacy PHP MySQL extension, which suggests older code that may be vulnerable to SQL injection.
- inurl:temp | inurl:tmp | inurl:backup | inurl:bak – searching for temporary or backup files can be quite fruitful. This search captures files, directories and file extensions on the server that contain one of the most common backup/temporary names. You can add further parameters to the query to get more specific results. For example, adding inurl:wp-content to the query shows backup files and directories that are inside the public assets folder of a WordPress installation. You can also combine this with other searches, such as the filetype:sql search mentioned earlier.
Figure 16: Searching for backup or temporary files and folders within WordPress installations
The image above shows that searching for temporary files and backups within WordPress installations can reveal quite a lot: in this case, public backup copies of databases and of entire WordPress installations.
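The starter queries listed above can be generated for any target with a short helper. This is a sketch only: any_of and starter_queries are hypothetical names, and targetsite.com is the placeholder domain used in the list.

```python
def any_of(operator: str, terms: list) -> str:
    """Combine operator:term alternatives with Google's | (OR) symbol."""
    return " | ".join(f"{operator}:{term}" for term in terms)


def starter_queries(domain: str) -> list:
    """Return the three starter queries, in the order they are listed above."""
    return [
        f"site:{domain} intitle:index.of",       # directory listings
        f"site:{domain} intext:error|warning:",  # exposed errors and warnings
        any_of("inurl", ["temp", "tmp", "backup", "bak"]),  # temp/backup names
    ]


for query in starter_queries("targetsite.com"):
    print(query)
```

The third query is left without a site: restriction, as in the list; it can be narrowed further with operators such as inurl:wp-content or filetype:sql.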
Taking advantage of web software with Google queries
When you have acquired information about a given target and the software that is running on it, you can use further Google queries to find leaks resulting from that software. For example, if you know that the website is built using PHP, you can use the search for errors and warnings mentioned above. If you know that PHP creates .log files that in certain cases might become public, you can try further queries directed at locating those logs, for example: filetype:log "PHP Parse error" | "PHP Warning".
The Google Hacking Database contains quite a few dorks for software that may be exploited in different ways. For example, the WordPress plugin BackupBuddy used to upload copies of the entire website to the public uploads directory, so that any attacker could access the archive with the website's data and possibly take control of the site. A dork for finding potential backup archives can be found at https://www.exploit-db.com/ghdb/4306/.
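The same two markers can be used to check a log file you control for the lines such a dork would match. The sketch below is illustrative: find_php_leaks is a hypothetical helper and the sample log lines are invented.

```python
import re

# The two phrases searched for by the log dork.
PHP_LOG_MARKERS = re.compile(r"PHP (Parse error|Warning)")


def find_php_leaks(lines):
    """Return the log lines that contain a PHP parse error or warning."""
    return [line.rstrip("\n") for line in lines if PHP_LOG_MARKERS.search(line)]


# Invented sample lines standing in for a real .log file.
sample_log = [
    "[01-Jan-2016 10:00:00] PHP Warning:  mysql_connect(): Access denied\n",
    "[01-Jan-2016 10:00:01] GET /index.php 200\n",
    "[01-Jan-2016 10:00:02] PHP Parse error:  syntax error in /var/www/a.php\n",
]
for hit in find_php_leaks(sample_log):
    print(hit)
```

Lines that match usually include file paths and function names, which is exactly the detail that makes a public log file valuable to an attacker.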
Google Hacking Tools
In the past, there were many programs that could help you automate Google Hacking. Unfortunately, most of them, such as Metagoofil, are outdated and no longer work. Metagoofil lets you choose a domain, fetch a certain number of files from it that were found through Google, and instantly view interesting data extracted from them, such as emails, machine usernames and servers.
The current source of Metagoofil may no longer work. However, we are including a link to a modified version that should function correctly. Be aware that some servers may deny Metagoofil access to the files it tries to download, which leads to an error for that particular file.
To install Metagoofil, all you have to do is download or clone the repository https://github.com/DimoffX/Metagoofil2016, open your Command line/Terminal, navigate to the trunk directory of the repository with cd and type python metagoofil.py to get help. You would need to have Python 2 installed on your machine to run it, which you can get from https://www.python.org/downloads/.
A sample command that you can try out is:
python metagoofil.py -d example.com -t pdf -l 5 -n 5 -o bgsites -f "example.html"
This will search the given website for PDF files, download up to five of them, save them in the bgsites folder and create an HTML report named example.html.
Figure 17: A Metagoofil HTML report for a website
Figure 18: An inline Metagoofil report for a Word document found on a website
Another useful tool currently available is the website https://www.shodan.io/ which is itself a different search engine, one that allows us to find specific types of machines (web cameras, routers, servers) that are connected to the Internet along with metadata about them such as the software behind them.
Google’s cache is a great way to view websites that were changed or no longer exist. You can also use it for anonymously visiting web pages without establishing a connection with the server of the web page as you only make an HTTP request to Google.
To do this, visit the cached copy of the desired web page with &strip=1 appended to Google's web cache URL, which shows the text-only version of the page. Without the strip parameter, your browser still requests external resources, such as images, directly from the original website.
Figure 10: Anonymously visiting a Wikipedia page
Anonymous Googling is especially useful when combined with proxies.
A simple way to open the text of the desired web page from the Google search results without accessing the normal cache is as follows: click the arrow next to the web page's URL, right-click the cache link in the menu that pops up, copy the link's address, paste it into your browser's address bar, and add &strip=1 or %26strip%3D1 (the URL-encoded form of &strip=1) to the URL-encoded webcache.googleusercontent.com URL.
Below are some explanatory images.
Figure 11: Copying the URL of the cached web page
Figure 12: Pasting the URL into the address bar and adding &strip=1 to the cached web page
Figure 13: You need to end up in the text-only version of that web page
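The manual steps above amount to building one URL. The following sketch assumes the cache address has the form https://webcache.googleusercontent.com/search?q=cache:PAGE, which is an assumption about the URL layout rather than something stated in this article; text_only_cache_url is a hypothetical helper name.

```python
from urllib.parse import quote

# Assumed layout of the web cache address; only the host name and the
# strip parameter are taken from the text above.
CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"


def text_only_cache_url(page_url: str) -> str:
    """Build a cache URL that requests the text-only version of a page."""
    # safe="" also encodes ":" and "/" so the page URL stays a single value.
    return CACHE_PREFIX + quote(page_url, safe="") + "&strip=1"


print(text_only_cache_url("https://en.wikipedia.org/wiki/Google"))
```

The important part is that strip=1 is a parameter of the cache URL itself, so it is added after the encoded page address rather than inside it.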
Be aware that if you follow any link in the cached page, you will end up on the actual website, outside of the cache and without anonymity.
Finally, another way to ensure anonymity is to view the web page through Google Translate where the request to the web page would be made by Google servers instead of your browser.
Google Hacking is not only a great way to discover and view web pages without being exposed to the targeted systems, but also a practical way of uncovering information in a typical Information Gathering phase of an attack. It is a must-know for most Information Security examinations and can bear fruit if applied correctly. Many queries are publicly shared in the GHDB for you to discover and examine, while specific, personalized tests against sites can be made using advanced operators.