Introduction to Information Gathering

Penetration testing begins with a pre-engagement phase in which the pen tester gets acquainted with the client and agrees on the goals, limitations, and scope of the penetration test. The actual test then usually starts with information gathering: the pen tester locates publicly available information about the client and looks for ways it could be exploited to get into the client's systems. In this phase, the pen tester also uses tools such as port scanners to understand which systems are on the network and what software runs on them. With that information, the pen tester can judge what impact the different findings may have on the client. The vulnerability analysis phase then uses the gathered information to locate possible vulnerabilities in the systems, and the subsequent exploitation phase attempts to exploit those vulnerabilities to get into the systems. Thus, without good information gathering, there would be no vulnerabilities to find and exploit.

Now is the time to make an important distinction: that between passive and active information gathering. Passive information gathering means collecting as much information as possible without establishing contact between the pen tester (yourself) and the target you are collecting information about. Active information gathering involves direct contact between the pen tester and the target. Once you actively query systems, you move into a legal gray area, as most countries prohibit attempts to break into systems without the necessary permission. Thus, if you do not have permission to test a system (a “get-out-of-jail-free card”), do not query it actively. For example, if you use Nmap to find open ports and applications on a remote system, you are actively interacting with that system in an attempt to find weaknesses, whereas running a whois lookup, browsing the company’s website, or querying search engines for information about the company are ways of collecting information passively. Calling company staff and trying to trick them into divulging privileged information is also a form of active information gathering.

The pre-attack phase can be described as follows (adapted from Gregg, 2016):

  1. Passive information gathering to discover preliminary information about the systems, their software and the people involved with the target.
  2. Passively determining the network range to find out which machines in the network you can focus on.
  3. The pen tester actively checks which of the located machines are alive to know what to target.
  4. The pen tester actively looks for open ports, and the applications listening on them, on each machine in the network to find the most promising way in.
  5. The pen tester fingerprints the operating system behind each of the machines, for example with Nmap’s OS detection.
  6. The pen tester maps the network, using tools such as traceroute and Cheops and by writing down and visualizing all the data collected, and then moves on to the attack phase.
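
Step 2 produces a network range, usually written in CIDR notation, and the active steps that follow need that range as a list of candidate addresses. A minimal sketch using Python’s standard ipaddress module; the helper name candidate_hosts is hypothetical, and 192.0.2.0/29 is a reserved documentation range used only for illustration:

```python
import ipaddress


def candidate_hosts(cidr: str) -> list[str]:
    """Return the usable host addresses in a range such as "192.0.2.0/29".

    For IPv4 networks larger than /31, hosts() skips the network and
    broadcast addresses. strict=True rejects ranges with host bits set.
    """
    network = ipaddress.ip_network(cidr, strict=True)
    return [str(host) for host in network.hosts()]


if __name__ == "__main__":
    # 192.0.2.0/24 is reserved for documentation (RFC 5737).
    for address in candidate_hosts("192.0.2.0/29"):
        print(address)  # 192.0.2.1 through 192.0.2.6
```

Each address in that list is then a candidate for the liveness and port checks in steps 3 and 4.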

Passive Information Gathering Tools


We can use The Harvester to collect email addresses associated with a target domain. We can then use these addresses to initiate social engineering or launch other attacks. The Harvester is written in Python, so to run it you need Python on your machine, preferably added to your PATH environment variable. You can download Python from the official Python website. If you have git on your machine, open your Terminal, navigate to a desired folder, and type git clone followed by the address of The Harvester’s GitHub repository to download the tool. Otherwise, open the GitHub repository, click on “Clone or download” and download the source code as a ZIP. Once you have it installed, open your command line/Terminal, navigate to the folder in which you installed The Harvester, and run its main script with python to get the help screen.

Figure 1: The Harvester’s help screen should look something like this

Now, if we want to look for emails in a particular domain (here, my own), we can perform a query like this:

python -d -l 100 -b google

This will search Google for email addresses in the domain and limit the search to the first 100 Google results.
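
The Harvester handles the searching itself. Once result pages come back, the core of the extraction is picking out addresses that end in the target domain. A minimal sketch of that step, assuming the results are already available as plain text; this is an illustration, not The Harvester’s own code, and emails_in_domain and example.com are placeholders:

```python
import re


def emails_in_domain(text: str, domain: str) -> list[str]:
    """Return the unique email addresses in `text` that belong to `domain`
    or one of its subdomains, lowercased, in order of first appearance."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@"                 # local part
        r"(?:[A-Za-z0-9-]+\.)*"               # optional subdomain labels
        + re.escape(domain)
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",  # the domain must end here
        re.IGNORECASE,
    )
    found: list[str] = []
    for match in pattern.finditer(text):
        address = match.group(0).lower()
        if address not in found:
            found.append(address)
    return found


if __name__ == "__main__":
    sample = "Contact info@example.com or sales@mail.example.com. Ignore spam@notexample.com."
    print(emails_in_domain(sample, "example.com"))
    # ['info@example.com', 'sales@mail.example.com']
```

The negative lookahead keeps look-alike hosts such as example.com.evil.net from being counted as part of the target domain.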

Figure 2: No emails found in my domain as I have obfuscated them

Let us try searching for emails in InfoSec Institute’s domain. We type the same query, changing only the domain: python -d -l 100 -b google

Figure 3: The query for emails in InfoSec Institute came up with four emails that we can potentially take advantage of

To broaden our research, we can repeat the search with different search engines. If we search for emails within the first 20 results of Yahoo’s search engine, we end up with only one email, but it is different from the four that we collected with Google. We type python -d -l 20 -b yahoo and end up with the following output:
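
Because each search engine returns a different slice of the web, the results are worth merging while keeping track of which engine found what. A small sketch; the function names and the sample data are hypothetical:

```python
def merge_by_source(results: dict[str, set[str]]) -> dict[str, set[str]]:
    """Combine per-source result sets into one mapping: address -> sources that found it."""
    found_by: dict[str, set[str]] = {}
    for source, addresses in results.items():
        for address in addresses:
            found_by.setdefault(address.lower(), set()).add(source)
    return found_by


def only_found_by(found_by: dict[str, set[str]], source: str) -> list[str]:
    """Addresses that `source` found and no other source did, sorted."""
    return sorted(addr for addr, sources in found_by.items() if sources == {source})


if __name__ == "__main__":
    # Fictional results for illustration only.
    results = {
        "google": {"a@example.com", "b@example.com"},
        "yahoo": {"b@example.com", "c@example.com"},
    }
    found = merge_by_source(results)
    print(len(found))                     # 3
    print(only_found_by(found, "yahoo"))  # ['c@example.com']
```

Addresses that only one engine returns are a direct measure of what you would have missed by querying a single source.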

Figure 4: Harvester search with Yahoo yielded a different result

As you might have noticed, besides printing the emails within that domain, The Harvester lists all of the subdomains it managed to find for the given domain and maps them to their respective IP addresses. This information is useful as well, because different subdomains may use different server software and development frameworks, and may even run on different machines. Those subdomains may therefore be prone to different vulnerabilities and differ in how exploitable they are.
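
The Harvester does this mapping for you, but the idea is simple to sketch: resolve each discovered hostname that belongs to the target domain, then group hosts by IP address. Hosts sharing an address may be served by the same machine (or by the same load balancer or CDN). The function names are hypothetical; resolution goes through your configured DNS resolver, and the resolver is a parameter so it can be replaced with a stub for offline testing:

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def map_subdomains(
    hostnames: Iterable[str],
    domain: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Resolve each hostname that belongs to `domain`; None marks names that did not resolve."""
    domain = domain.lower().rstrip(".")
    mapping: Dict[str, Optional[str]] = {}
    for name in hostnames:
        host = name.lower().rstrip(".")
        if host != domain and not host.endswith("." + domain):
            continue  # outside the target domain
        try:
            mapping[host] = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            mapping[host] = None
    return mapping


def group_by_ip(mapping: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group resolved hostnames by IP address (unresolved names are skipped)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, ip in mapping.items():
        if ip is not None:
            groups[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in groups.items()}
```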


Netcraft is a web application that shows detailed information about the software, the web server and the web host behind an arbitrary website. Netcraft makes gathering this information painless, as a lookup takes only moments. Knowing the website’s hosting provider, for example, could prove useful if you decide to launch a social engineering attack: you could write an email to an administrator that appears to originate from their hosting company, asking him/her to open a link and change a setting. Furthermore, the software development frameworks that Netcraft reveals can have their own vulnerabilities, which you can then attempt to exploit. The image below shows a lookup on InfoSec Institute in Netcraft.

Figure 5: Searching for information about InfoSec Institute in Netcraft

We can see that InfoSec Institute’s hosting provider is DigitalOcean, so we may note that down. We can also see that the website uses a Content Management System (CMS) called ExpressionEngine, a software development framework called CodeIgniter, that the back-end language is PHP, and so on. We may look for flaws in those technologies that would expose the website to attacks. We can also see that the web server software was recently changed from Apache/2.2.31. Knowing the exact version of the server software is useful because vulnerabilities in server software are published online regularly and fixed in newer versions, which many websites are slow to install. Knowing the server software also narrows the scope of research: if we know that the server is running Apache, we would not search for vulnerabilities in Microsoft IIS. Note, however, that the server information a website reports can sometimes be deliberately misleading, or even bait for a honeypot.
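
Netcraft gathers these details for you. One way to see part of the raw signal yourself is the HTTP response headers a site sends, such as Server and X-Powered-By. The sketch below is not how Netcraft works internally; the helper names are hypothetical, servers may omit or alter these optional headers (as noted above), and some servers reject HEAD requests, in which case urlopen raises an error:

```python
import re
import urllib.request
from typing import Dict, List, Optional, Tuple


def parse_server_banner(banner: str) -> List[Tuple[str, Optional[str]]]:
    """Split a Server header such as "Apache/2.2.31 (Unix)" into (product, version) pairs."""
    without_comments = re.sub(r"\([^)]*\)", " ", banner)  # drop "(Unix)"-style comments
    pairs: List[Tuple[str, Optional[str]]] = []
    for token in without_comments.split():
        product, _, version = token.partition("/")
        pairs.append((product, version or None))
    return pairs


def fetch_banner_headers(url: str, timeout: float = 10.0) -> Dict[str, Optional[str]]:
    """Send a HEAD request and return the headers that commonly reveal server software."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {name: response.headers.get(name) for name in ("Server", "X-Powered-By")}
```

For example, parse_server_banner("Apache/2.2.31 (Unix)") yields [("Apache", "2.2.31")], and the version string is what you would then look up in vulnerability databases.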


Maltego is a data mining tool that helps us gather intelligence and visualize it. It comes in several editions, one of which is free to experiment with: the so-called “Community Edition”, which you can download from Paterva’s website. To run it, all you have to do is register an account with Paterva. The free edition, however, limits the number of mined results shown to you.

Figure 6: Maltego’s pre-built machines restrict results to 12 entities in the free version

Maltego Primer

Once you have installed Maltego, set up your account and logged in, you should see a final step as shown below:

Figure 7: Successfully installed Maltego

Click on “Open a blank graph and let me play around.”

In the Palette box on the left, you can see the different entities that you can incorporate into your searches. Drag the Domain entity from the Palette and drop it onto the graph. You should see a single Earth icon labeled with a default domain name.

Figure 8: Adding a domain entity to the graph

Now, change the domain name to the domain about which you want to acquire new information. To do this, double-click the domain name and type your desired domain name.

After you have prepared the domain to explore, you can run transforms (queries for different kinds of information about the domain) by right-clicking the domain entity and choosing the desired transform.

Figure 9: A slice of the different transforms that are available in Maltego


To illustrate how those transforms work, if you choose to transform the entity to phone numbers using search engines, you will get a few phone numbers for InfoSec Institute. You can use these phone numbers to make your social engineering attack more convincing, employ vishing, pretexting and so on.

Figure 10: The transform to phone numbers using search engines is applied on InfoSec Institute and five telephone numbers are visualized.

Each additional transform we apply adds to our results. Even better, we can apply new transforms to the results of previous transforms.

To illustrate this, if we right-click the first returned phone number, we get the transforms that can be applied to it. Suppose we choose to transform the phone number to URLs (that is, get the URLs where the telephone number is mentioned). That takes us a level deeper into the hierarchy of results and shows us a few URLs from which we can manually check who answers that telephone number, and so on.

Figure 11: Going a level deeper in Maltego. Getting the URLs where one of the phone numbers associated with InfoSec Institute is mentioned
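
The chaining of transforms can be pictured as a graph that grows outward from the seed entity. The sketch below is a toy model of that idea, not Maltego’s API: the (type, value) entity tuples, the expand function and the two stub transforms are all hypothetical, and the stubs return fixed, fictional data instead of querying real sources:

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# An entity is a (type, value) pair, e.g. ("domain", "example.com").
Entity = Tuple[str, str]
# A transform takes one entity and returns the entities it discovered.
Transform = Callable[[Entity], List[Entity]]


def expand(seed: Entity, transforms: Dict[str, List[Transform]], max_depth: int) -> Dict[Entity, List[Entity]]:
    """Apply every transform registered for an entity's type, then apply transforms
    to the results, breadth-first, up to `max_depth` levels below the seed.
    Returns an adjacency list: entity -> entities derived from it."""
    graph: Dict[Entity, List[Entity]] = {seed: []}
    queue = deque([(seed, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not go deeper than requested
        for transform in transforms.get(entity[0], []):
            for child in transform(entity):
                if child not in graph[entity]:
                    graph[entity].append(child)
                if child not in graph:  # expand each entity only once
                    graph[child] = []
                    queue.append((child, depth + 1))
    return graph


# Stub transforms with fictional data (555-0100 to 555-0199 are reserved for fiction).
def to_phone_numbers(entity: Entity) -> List[Entity]:
    return [("phone", "+1-555-0100"), ("phone", "+1-555-0101")]


def to_urls(entity: Entity) -> List[Entity]:
    return [("url", "https://example.com/contact#" + entity[1])]


if __name__ == "__main__":
    graph = expand(("domain", "example.com"),
                   {"domain": [to_phone_numbers], "phone": [to_urls]}, max_depth=2)
    for parent, children in graph.items():
        for child in children:
            print(parent, "->", child)
```

With max_depth=2 the graph holds the domain, two phone numbers and one URL per phone number, mirroring the “one level deeper” step in Figure 11.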

As you can see, there are quite a lot of entities and transforms to explore.

Active Information Gathering Tools

Nmap allows us to scan target machines to see which ports are open on them and thus which applications are likely running on them. The difference from the tools mentioned above is that you are actively interacting with the machine by sending specially crafted packets to it. Besides discovering open ports, Nmap can also attempt to detect the machine’s operating system and identify the services running on the open ports.

You can download Nmap from the official Nmap website. You can use Nmap from your command line or from a GUI application.

After installation, to see the different ways in which you can use Nmap, just type nmap and press Enter.

Figure 12: Nmap’s help screen

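A simple check for open ports and the services commonly associated with them is nmap -sS -Pn <IP ADDRESS>. The -sS option performs a TCP SYN (“stealth”) scan, which never completes the TCP handshake and usually requires root or administrator privileges, and -Pn skips host discovery and treats the machine as alive.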
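
A SYN scan needs raw packet access, but the basic idea of port scanning can be shown with ordinary sockets. The following is a minimal sketch of the simpler TCP connect scan (the technique Nmap uses with -sT), not of -sS: it completes the full handshake, so it is more likely to show up in the target’s logs. The function name is hypothetical, and it handles IPv4 only:

```python
import socket
from typing import Iterable, List


def tcp_connect_scan(host: str, ports: Iterable[int], timeout: float = 1.0) -> List[int]:
    """Return the ports on `host` that accept a full TCP connection."""
    open_ports: List[int] = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect_ex returns 0 on success and an error number otherwise
                if sock.connect_ex((host, port)) == 0:
                    open_ports.append(port)
            except OSError:
                pass  # e.g. the host name could not be resolved
    return open_ports


if __name__ == "__main__":
    # Scan your own machine; only scan other hosts you have permission to test.
    print(tcp_connect_scan("127.0.0.1", [22, 80, 443]))
```

Scanning ports one at a time like this is slow over a real network, which is one reason dedicated scanners such as Nmap send probes in parallel.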

To detect the operating system that the machine is running, add the -O flag; to probe open ports for the actual services and their versions, add -sV.
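
When the same scan has to be repeated across many hosts, it can help to script Nmap rather than type each command. A minimal sketch that only assembles the options described above and runs the nmap binary through subprocess; it assumes nmap is installed and on your PATH, and nmap_command and run_nmap are hypothetical helper names:

```python
import shutil
import subprocess
from typing import List


def nmap_command(target: str, detect_os: bool = False, detect_versions: bool = False) -> List[str]:
    """Build the argument list for a SYN scan that skips host discovery."""
    args = ["nmap", "-sS", "-Pn"]
    if detect_os:
        args.append("-O")   # OS detection
    if detect_versions:
        args.append("-sV")  # service/version detection on open ports
    args.append(target)
    return args


def run_nmap(target: str, detect_os: bool = False, detect_versions: bool = False) -> str:
    """Run nmap and return its normal output. -sS and -O usually need root privileges."""
    if shutil.which("nmap") is None:
        raise FileNotFoundError("nmap was not found on PATH")
    completed = subprocess.run(
        nmap_command(target, detect_os, detect_versions),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


if __name__ == "__main__":
    print(" ".join(nmap_command("192.0.2.5", detect_os=True)))  # nmap -sS -Pn -O 192.0.2.5
```

Passing a list instead of a single string to subprocess.run avoids shell quoting problems with unusual target names.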

If Nmap seems like a tool that you want to explore more, please visit Irfan Shakeel’s article on Nmap.


We have described the basics of the penetration testing process and the role of information gathering in it, and we have demonstrated some of the most popular information gathering tools. These tools can not only help you gather sufficient information about a target, but can also increase the efficiency and effectiveness of the whole penetration testing process.


Gregg, M. (2016). The Seven-Step Information Gathering Process | Certified Ethical Hacker Exam Prep: Understanding Footprinting and Scanning | Pearson IT Certification. [online] Available at: [Accessed 23 Jun. 2016].