The terms “deep web” and “dark web” are often confused or conflated. They refer to different things, and the distinction matters — both for understanding how the internet actually works and for making good decisions when building web applications.

What Is the Deep Web?

The deep web is simply every part of the web that isn’t indexed by search engines. That’s it.

Google and other search engines crawl and index publicly accessible web pages. Anything that isn’t publicly accessible, or that has been explicitly excluded from indexing, is “deep web” content.

The deep web is enormous. Estimates suggest it makes up the vast majority of web content by volume. Most of it is completely mundane:

  • Your email inbox (not indexed by Google)
  • Your bank account portal (behind authentication)
  • Corporate intranets (internal to organisations)
  • Subscription content (paywalled)
  • Database-driven content that doesn’t have stable, linkable URLs
  • Pages with noindex directives

There is nothing inherently illicit about the deep web. The reason it’s “hidden” from search engines is usually access control — content that requires authentication or payment to access.

What Is the Dark Web?

The dark web is a subset of the deep web that requires special software to access — typically Tor (The Onion Router). Dark web sites have .onion addresses and are deliberately designed to make traffic and user identity difficult to trace.

Legitimate uses of the dark web include:

  • Journalists and whistleblowers communicating with sources in repressive regimes
  • Privacy-conscious individuals accessing the internet without their ISP tracking their browsing behaviour
  • Security researchers studying threat actor activity
  • Activists in countries with internet censorship
  • News organisations with anonymous tip lines (The Guardian, The New York Times, and others have .onion addresses)

Illegitimate uses do exist. The dark web hosts illegal marketplaces and other criminal activity. But framing the dark web as purely criminal is like framing email as purely for spam — the technology is neutral; the use cases vary.

What This Means for Web Developers

Your own application content is deep web by default. Authentication walls, private user data, and dynamic content that doesn’t have indexable URLs are all deep web. This is correct behaviour. You don’t want Google indexing your users’ private data.

Search engine indexing is an active choice. Publicly accessible, linked content gets crawled and indexed by default; content behind authentication or without discoverable URLs does not. robots.txt and noindex directives are how you manage what search engines can see — and they do different jobs. robots.txt tells crawlers not to fetch a URL, while noindex tells them not to include a fetched page in the index; a URL disallowed in robots.txt can still appear in search results if other sites link to it.
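The crawling side of this can be seen with Python’s standard-library robots.txt parser. This is a minimal sketch — the rules and URLs are illustrative, not from any real site:

```python
# Evaluate illustrative robots.txt rules with Python's standard-library parser.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /account/
Disallow: /admin/
Allow: /blog/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Public blog content: crawlers may fetch it.
print(parser.can_fetch("*", "https://example.com/blog/post-1"))       # True
# Private account pages: crawling is disallowed.
print(parser.can_fetch("*", "https://example.com/account/settings"))  # False
```

Note that `can_fetch` answers “may this be crawled?”, not “will this be indexed?” — keeping a page out of the index itself requires a noindex directive.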

Security implications. “Security through obscurity” — hiding sensitive pages and hoping search engines don’t find them — is not security. If a URL is guessable and unauthenticated, it can be found whether or not it’s in Google’s index. Authentication, authorisation, and proper access controls are the actual security layer.
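The difference between obscurity and access control can be sketched in a few lines. The `User` and `Document` types and the ownership rule below are hypothetical — the point is that the server checks *who is asking*, not whether the URL was discoverable:

```python
# A minimal sketch of authentication + authorisation as the real security layer.
from dataclasses import dataclass
from typing import Optional

@dataclass
class User:
    id: int

@dataclass
class Document:
    id: int
    owner_id: int

def can_view(user: Optional[User], doc: Document) -> bool:
    """Deny anonymous requests, then check ownership."""
    if user is None:                 # authentication: who are you?
        return False
    return user.id == doc.owner_id   # authorisation: may *you* see this?

doc = Document(id=7, owner_id=42)
print(can_view(None, doc))        # False — unauthenticated
print(can_view(User(id=1), doc))  # False — authenticated but not authorised
print(can_view(User(id=42), doc)) # True  — the owner
```

A check like this holds whether or not the URL is guessable or indexed; obscurity adds nothing to it.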

Crawlability vs. accessibility. A page can be accessible to users but not crawlable by search engines (due to JavaScript rendering requirements, authentication, or noindex directives). These are independent properties. Modern SEO requires understanding both.
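One way a page can be user-accessible but excluded from indexing is a robots meta tag. This sketch, using Python’s standard-library HTML parser on illustrative markup, shows how a crawler might detect one:

```python
# Detect a <meta name="robots" content="noindex"> directive in page HTML.
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr_map = dict(attrs)
            name = (attr_map.get("name") or "").lower()
            content = (attr_map.get("content") or "").lower()
            if name == "robots" and "noindex" in content:
                self.noindex = True

html = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
detector = NoindexDetector()
detector.feed(html)
print(detector.noindex)  # True — accessible to users, but asks not to be indexed
```

The same directive can also be delivered as an `X-Robots-Tag` HTTP header, which is useful for non-HTML resources like PDFs.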

Tor and anonymisation networks. If you’re building applications that could face political censorship, providing a .onion address gives users in restricted regions access without exposing them to surveillance. This is standard practice for news organisations and human rights groups.
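Exposing an existing web application as an onion service is mostly Tor configuration. A minimal illustrative torrc fragment might look like this — the directory path and the backend port are assumptions about your setup, not requirements:

```
# Illustrative torrc fragment: publish a local web service as an onion service.
# Tor generates the .onion hostname into HiddenServiceDir/hostname.
HiddenServiceDir /var/lib/tor/my_site/
HiddenServicePort 80 127.0.0.1:8080
```

Here Tor forwards onion-service traffic on virtual port 80 to a web server already listening locally on port 8080, so the application itself needs no Tor-specific code.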

Common Developer Misconceptions

“The deep web is dangerous.” No — it’s your email, your bank, your company’s internal tools. It’s not dangerous; it’s private.

“You need special tools to access the deep web.” No — you access the deep web every time you log in to any website. The deep web is just behind authentication.

“The dark web is mostly illegal.” The dark web includes significant legitimate use. The illegal markets get press coverage; the mundane privacy-focused uses don’t.

“My app’s private pages are secure because they’re not indexed.” No — unindexed doesn’t mean secured. Security requires authentication and authorisation, not obscurity.


Understanding how indexing, access control, and anonymisation networks work helps developers make better architectural decisions. If you’re building an application with specific privacy, security, or accessibility requirements, talk to us.