Open Source Intelligence: Can I just Google it?

January 23, 2019

One of the most common questions people ask when considering any type of investigation or research is “if the information is online, can I just Google it?”

While a professional investigator or intelligence analyst may chuckle at this assumption, many consumers do indeed consider Google to be the Holy Grail of finding information on the web. People turn to Google to answer nearly every question imaginable, and many believe that Google is the “be-all, end-all” for finding information. It’s no wonder: over 70,000 Google searches are performed every second. That’s roughly 6 billion Google searches per day worldwide. Google also holds over 90% of the worldwide search engine market.

Having Google – or any search engine for that matter – at our fingertips has transformed the way we seek and interact with information. But the efficiency and convenience of simply “googling” something has also given us a false sense of assurance when it comes to the reliability, accuracy, and depth of the information and answers we seek.

Just the Tip of the Iceberg

In order to understand what Google can and cannot find on the web, we first need to understand how the internet is structured.

Conceptually, the internet resembles an iceberg. The part of the iceberg above the water level is what is visible to all. This is known as the “surface web” or “clear web,” and it contains everything that is, in theory, publicly findable. The surface web is the only layer that is accessible to Google. It consists of news websites, online stores, and some social media. Although there are trillions of web pages in the surface web, they amount to only about 4% of the entire internet.

As with an iceberg, the vast majority of the internet’s substance lies beneath the surface. This underwater section, known as the “deep web,” amounts to about 93% of all content on the internet. The deep web consists of everything you’d typically find behind a website’s login screen or paywall. These pages are not indexed by search engines like Google because they aren’t meant to be public-facing. This does not necessarily mean that they are dangerous. Emails, social media interactions, online databases, internal company websites, financial records, and file storage are all part of the deep web. This content is still on the internet, but it cannot be discovered through a Google search.

The final layer of the internet is the dark web. The dark web is an intentionally hidden network of websites typically used for anonymous and often dangerous and illegal activities. The dark web can only be accessed through special browsers and comprises about 3% of all internet content. Drug trafficking, weapons sales, and other illicit exchanges make up the majority of dark web activity.

What’s Under the Hood

Of course, there are obvious reasons to be reluctant to rely too heavily on Google. Not everything on the internet is true. Sources can be unreliable or inaccurate. Web sites disappear, rank differently, and change over time as new information is added to the web. But there are also several not-so-apparent drawbacks to using Google of which much of the public is unaware.

Google search results are based on an enormously complex mix of algorithms, configurations, advertising, and web sites that have been specifically optimized by their owners to rank higher in search engine results. Your Google searches are also affected by where you’re Googling from. If you Google “florists” and you’re physically located in Los Angeles, you’re not likely to find similar businesses located in New York, even though they exist. This is how Google has been configured to give you what it thinks are the most relevant results.

We also need to keep in mind Google’s incentive structure. Google is a business trying to make money, like any other. Google rakes in billions of dollars annually from advertising revenue and through marketing of its own brand of products and services. Google’s foremost priority is not to provide you with free, easy, and complete “point-and-click” online investigations.

It’s Gotta Be Here Somewhere!

Here’s an example of how Google search results can be unreliable and incomplete: try Googling a cell phone number. You’ll see dozens upon dozens of bogus websites claiming to be able to trace the number to an owner or address. You know the number must belong to someone, but Google only shows websites trying to sell you dubious services.

Here’s another example: try Googling a name along with the phrase “criminal records.” Again, you’ll see similar gimmicky websites offering instant, cheap background checks. Yet you will find none of these services accurate or reliable. What Google and these background check websites rarely mention is that many counties in the U.S. do not provide online access to criminal and court records. Some jurisdictions store these records only in hard copy at clerks’ offices and courthouses. Thus, a Google search isn’t the most reliable way to answer such an important question.

The point is that even if we know certain information exists, we are not always able to find it through Google. Google has inherent limitations, including the fact that it indexes only a tiny fraction of the entire internet. Relying solely on Google’s results for truth and completeness is a dangerous endeavor.

Are You “Feeling Lucky?”

So, can you “just Google it?” It depends on whether you’re seeking a quick search result or a verified, reliable, and thorough investigation that will uncover information that cannot be found by Google or any other search engine. In any case, we can unequivocally guarantee that there is more out there than what you’ll find in a Google search.

Google, like any search database, can only show you what it has indexed in its catalog of information and web sites. A Google search is not a query of the entire internet.
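
For the technically curious, here is a toy sketch of that idea in Python. It is not how Google actually works; the page addresses, the "public" flag, and the function names are all made up for illustration. It simply shows that a search can only return pages that made it into the index, so a page behind a login never appears, even when it contains the word you searched for.

```python
from collections import defaultdict

# Hypothetical pages (invented for this example). Only public pages get crawled.
PAGES = {
    "news.example/story": {"text": "city council approves new budget", "public": True},
    "shop.example/roses": {"text": "fresh roses from local florists", "public": True},
    "bank.example/account": {"text": "account statement and budget summary", "public": False},
}


def build_index(pages):
    """Map each word to the set of public page addresses that contain it."""
    index = defaultdict(set)
    for url, page in pages.items():
        if not page["public"]:
            continue  # behind a login screen: the crawler never sees this page
        for word in page["text"].lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the indexed pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(PAGES)
print(search(index, "budget"))   # only the news story, not the bank statement
print(search(index, "account"))  # empty: the only matching page was never indexed
```

The bank page mentions "budget" too, but because it was never indexed, the search has no way of knowing it exists.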

The bottom line: by relying solely on Google, you don’t know what you could be missing. It’s impossible to know exactly how much data exists on the internet. Google has indexed hundreds of billions of sites totaling trillions of individual web pages. Google is, no doubt, a great starting point for research. But Google is only capable of indexing the 4% of websites that are part of the surface web, leaving out 96% of all online content. This means that when you’re doing a Google search, you’re really only seeing a fraction of a fraction. What meets the eye on Google may be just the tip of the iceberg.

Nighthawk Strategies’ “Ask a PI” blog series seeks to explore some of the common questions – and sometimes myths and misconceptions – about private investigations, online research, social media intelligence, privacy, and security. Got a question? Ask a PI! Email us your questions/suggestions at [email protected].
