The art of searching for open source intelligence
The Internet is a vast ocean, carrying loads of information you might be interested in or looking for, but where and how do you find that information? Search engines like Google make query-based searching possible, but is that enough? If you think so, think again: there is the World Wide Web, the deep web, and the dark web. Are you getting information from every corner of the Internet? Well, it depends on how you search for particular information; this is why it is called the “art of searching.”
The art of searching applies to many areas, but here it is used specifically for open source intelligence. The objective of this series is to discuss open source intelligence (OSINT) concepts, tools, methodology, and processes; the art of searching is one part of it. Throughout this series, we will look at the Internet from a different perspective and take a different approach.
Internet research experts are a myth
People claim to be Internet research experts who can find anything using their techniques, but nobody knows everything that is actually out there. First, the Internet changes constantly, within fractions of a second; significant changes have occurred on it while you have been reading this. The second reason is its size: the Internet is huge, and nobody can claim to have scanned every corner and reported the information accurately.
You can always find the information you are looking for on the Internet, but because it keeps changing, its accuracy can't be measured once and for all. This makes open source intelligence a continuous process. Also, the World Wide Web is not the Internet; it is only part of it. The Internet is a network of networks, an umbrella over connected devices (computers, printers, routers, switches, servers, etc.). Think of Shodan: it does not search web pages, it searches devices. Likewise, a quick port scan against any technology infrastructure looks for devices, not web pages.
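To make the device-versus-page distinction concrete, here is a minimal Python sketch of a TCP connect probe, the simplest form of port scan. The function names `tcp_port_open` and `quick_scan` are hypothetical, and the usage line assumes you point it at 127.0.0.1, i.e. your own machine. The probe only asks whether something is listening on a port; no web page is involved.

```python
import socket

def tcp_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds (a 'connect' probe)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return s.connect_ex((host, port)) == 0

def quick_scan(host, ports):
    """Return the subset of ports on which a service accepted a connection."""
    return [p for p in ports if tcp_port_open(host, p)]

# Hypothetical usage against the local machine; results depend on what is running.
print(quick_scan("127.0.0.1", [22, 80, 443]))
```

A full connect is noisy (the target's logs see it), which is why it counts as active rather than passive information gathering.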
So, the important points to conclude here:
- Open source intelligence is a continuous process, and organizations should treat it as one.
- The World Wide Web is not the entire Internet; search the whole Internet for information.
- War dialing is not dead; it has evolved and changed shape (think of random port scanning).
New school open source intelligence
Intelligence gathering is not a new topic; people in every era gathered intelligence using their own techniques. Those techniques have evolved, and now we have a ‘new’ way of searching for information.
The word ‘open’ refers to publicly available sources; it has nothing to do with open-source software. “Open source intelligence (OSINT) is the process of collecting intelligence from publicly available sources, paid or free, print or electronic.” The scope of OSINT is not limited to cyber security; it also covers business and corporate intelligence, military intelligence, and any other field where information matters.
Businesses hire information brokers and private investigators to gather information about their competitors. This is known as competitive intelligence, a corporate term for the process of gathering information about competitors. The process draws on the same sources as any other information gathering:
- Web-based communities: social media websites, forums/blogs, wikis, video and image sharing websites, news portals, and other user-generated media
- Dark web
- Newspapers, magazines, radio, television, and computer-based information
- Government reports, press conferences, marketing surveys, speeches, press releases, and official statements (tweets, Facebook posts, etc.)
- Academic research papers, theses, dissertations, and interviews
Military and security agencies use open source intelligence to counter terrorism and to gather information about their opponents; for example, content analysis of the Middle East’s regional newspapers is often effective for predicting the region's stability or instability. It is an effective technique for gathering cultural and demographic intelligence from areas not covered by military intelligence agents. Commercial imagery sources and digital maps provide military commanders with up-to-date information about airfields, roads, bridges, buildings, and government offices.
Offensive vs. defensive OSINT
As discussed, OSINT has a broad scope, but this series focuses primarily on cyber attacks. Offensive OSINT is when you study an attack before it happens, while defensive OSINT is learning about the attacks against a company. OSINT gives opportunities to both the defender and the attacker: you can learn about a company's weakness and fix it, while at the same time the same weakness could be exploited.
The OSINT process
In the first step, you identify the sources from which you can get the required data. There are many techniques for acquiring data, but identification is the most important step, because it determines the result of the overall activity. Every step of the process will be discussed throughout this series.
Harvesting is divided into two types:
- Active harvesting: the target can learn about the harvesting
- Passive harvesting: no connection is made with the target, so the target never knows about it
What information to look for
- Technology infrastructure
- Software / hardware versions and OS information
- Network diagram
- Documents, papers, presentations, spreadsheets and configuration files
- Email and employee searches (names and other personal information)
The information above can lead to the following cyber attacks:
- Brute force (password)
- Denial of service
- Social engineering
The search engine seems like a rich source for finding particular information; in many cases, however, it is not. You can't find classified information just by Googling: for one thing, a site can use robots.txt to ask crawlers not to crawl particular pages, and a search engine can only return the pages it has indexed. Apart from search engines, use online libraries and private forums/blogs.
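The sketch below uses Python's standard `urllib.robotparser` module to show how a crawler decides what it may fetch. The robots.txt content and the `crawl_allowed` helper are hypothetical. Keep in mind that robots.txt is a request to well-behaved crawlers, not access control: a disallowed page stays reachable by anyone who has its URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, as a site might publish it at /robots.txt
ROBOTS_TXT = """\
User-agent: *
Disallow: /intranet/
Disallow: /admin/
"""

def crawl_allowed(url, agent="*"):
    """Return True if the rules above allow `agent` to crawl `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching
    return parser.can_fetch(agent, url)

print(crawl_allowed("https://example.com/intranet/login.html"))  # → False
print(crawl_allowed("https://example.com/about.html"))           # → True
```

A compliant search engine skips `/intranet/`, so those pages never show up in its results; a human or a non-compliant bot can still request them directly.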
Students learn Boolean logic when studying digital electronics or related courses; the same logic applies to search engines, and the operators are AND, OR, and NOT.
|" " (quotation marks)||Match the exact phrase, in that order. Example: “African Americans”|
|-word (minus sign)||Exclude a word from the results. Example: “African Americans” diet -kid -girl -“marriages”|
|AND, the default operator||Writing ‘infosec training’ or ‘infosec AND training’ makes no difference; either way, the results contain every keyword. Write AND explicitly only when combining it with other operators.|
|OR, allows more than one term||The terms do not have to appear in a specific order, but at least one of them must appear in the result. Example: “African Americans” OR blacks|
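The Boolean rules above can be captured in a small query builder. `build_query` is a hypothetical helper, not part of any search engine's API; it only assembles query text following the conventions in the table: quotes for exact phrases, a space for the implicit AND, OR between alternatives, and a leading minus to exclude a term.

```python
def quote_term(term):
    """Wrap multi-word terms in quotes so they are matched as an exact phrase."""
    return f'"{term}"' if " " in term else term

def build_query(all_of=(), any_of=(), exclude=()):
    """Assemble a search query string from Boolean parts."""
    parts = [quote_term(t) for t in all_of]                    # implicit AND
    if any_of:
        parts.append(" OR ".join(quote_term(t) for t in any_of))  # at least one must match
    parts += ["-" + quote_term(t) for t in exclude]            # NOT via minus sign
    return " ".join(parts)

print(build_query(all_of=["African Americans", "diet"], exclude=["kid", "girl", "marriages"]))
# → "African Americans" diet -kid -girl -marriages
print(build_query(any_of=["African Americans", "blacks"]))
# → "African Americans" OR blacks
```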
Meta search engines
Yippy clustering search engine
Yippy, formerly known as Clusty, is arguably the best of the metasearch tools available so far. It is unique because it uses its own clustering engine, software that organizes unstructured information into hierarchical folders. Clusty offers clustered results for web, news, and certain specialty searches. By default, Clusty searches the web using Live Search, Gigablast, Ask, Wikipedia, and the Open Directory. Let's look at the logical categories it creates:
It creates clusters whether or not the query is spelled correctly.
It lets the user see the sources of the search results and the types of sites (e.g., .com, .gov). It also supports all the advanced search queries and operators discussed earlier. It automatically groups large amounts of information logically while also surfacing new areas of development in the subject. It also lets the user create a custom tab suited to the search; you can select news sources, directories, and particular domain extensions.
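Yippy's clustering engine is proprietary, so the sketch below does not reproduce it; it only illustrates the simpler idea of grouping results by type of site. The hypothetical `group_by_extension` function buckets result URLs by the last label of the hostname (com, gov, and so on), and the URLs are made-up sample data.

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_by_extension(urls):
    """Group URLs by the last label of their hostname, e.g. 'com' or 'gov'."""
    groups = defaultdict(list)
    for url in urls:
        host = urlparse(url).hostname or ""   # hostname is lowercased by urlparse
        ext = host.rsplit(".", 1)[-1] if "." in host else "(none)"
        groups[ext].append(url)
    return dict(groups)

# Hypothetical search results
results = [
    "https://www.nasa.gov/report",
    "https://example.com/blog",
    "https://www.usa.gov/data",
]
print(group_by_extension(results))
# → {'gov': ['https://www.nasa.gov/report', 'https://www.usa.gov/data'], 'com': ['https://example.com/blog']}
```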
Keep in mind that no search engine is the best: use more than one search engine during your research, and use specialized search engines for specific cases.
Other Metasearch engines:
Geographically limited search engines:
|Yandex||Russia, Turkey, Ukraine, Belarus, Kazakhstan|
Search documents and files
- Megasearch http://megasearch.co/
- Chegg for education
- BASE (Bielefeld Academic Search Engine) for academic material
- Library of Congress
There are also numerous people search engines; we will discuss them in detail during the analysis, where we will see how attackers gather valuable information to launch phishing and social engineering attacks.
Accessing the darknet for information is crucial nowadays; people share valuable information over Tor, and during the open source intelligence process you need to dig into anything and everything. You can access the darknet using Tor, but you also need a capable darknet search engine like ahmia.fi to find relevant information there.
Advanced search operators – Google
|intitle:||Search page title||Yes||Yes||Yes||Yes|
|allintitle:||Search page title||Yes||Yes||Yes||Yes|
|inurl:||Search URL||Yes||Yes||No||Not really|
|allinurl:||Search URL||Yes||Yes||Yes||Not really|
|site:||Search specific site||Yes||Yes||No||Not really|
|allintext:||Search text of page only||Yes||Yes||Yes||Yes|
|filetype:||Search specific file type||Yes||Yes||No||Not really|
|insubject:||Search group subject||Like intitle||Like intitle||Yes||Like intitle|
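Operators from the table are combined by writing them side by side, with no space between the colon and the value. The hypothetical `operator_query` helper below only formats that text; `example.com` and the search terms are placeholders, and nothing is sent to Google.

```python
def operator_query(terms="", **operators):
    """Compose a query from Google-style advanced operators.

    Each keyword argument becomes name:value with no space after the colon;
    values containing spaces are quoted so they match as a phrase.
    """
    parts = []
    for name, value in operators.items():   # keyword order is preserved
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{name}:{value}")
    if terms:
        parts.append(terms)                 # plain keywords go last
    return " ".join(parts)

print(operator_query("annual report", site="example.com", filetype="pdf"))
# → site:example.com filetype:pdf annual report
print(operator_query(intitle="welcome to intranet"))
# → intitle:"welcome to intranet"
```

Because keyword arguments must be unique, this sketch cannot repeat an operator (for example, two `inurl:` terms); a list of `(name, value)` pairs would lift that limit.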
Search engines are so powerful that they sometimes show information that should not be available to the public. Even a basic search operator can reveal it:
inurl:admin inurl:orders
This is very dangerous for a company, and companies should take it seriously.
Company intranets, or private networks, are sometimes open even though they should be protected; this is what we call a vulnerability. A quick Google search for “Welcome to Intranet” reveals many intranet addresses that could be exploited. There are hundreds of examples of hackers using a search engine to find important information about a company. Another example shows how someone can get the username and password for NOD32 antivirus just by Googling intext:"eav" filetype:txt.
This is certainly not the end of the art of searching, or even of the operators. The Google Hacking Database provides lists of operator combinations (dorks) for different purposes. In this article, we have discussed OSINT from the search point of view, but there is much more to discuss, including but not limited to metadata searching, people searching, technology infrastructure, and how these link to an attack. The next article in this series will focus on these topics.