Technical · 3 min read

How Google discovers new pages

Before a page can rank — before it can even be indexed — it has to be discovered. Discovery is the step most SEOs take for granted and the one that silently fails most often: a page nobody links to is a page Google may never meaningfully crawl. Discovery isn't magic or luck; it's a graph walk with a few assists. Understanding exactly how it works tells you why some pages get found in hours and others sit undiscovered for months.


Discovery is a graph walk

The primary way Google finds pages is by following links from pages it already knows. Googlebot starts from a frontier of known URLs, fetches them, extracts their links, and queues the new ones — repeating endlessly. Your site is a graph, and discovery is a crawler walking outward along its edges. A page with inbound internal links sits on that walk; a page with none is off the map.

On the walk vs off it
KNOWN ──► [Home] ──► [Hub] ──► [New Page A]   discovered
                       └──► [New Page B]   discovered

          [New Page C]   <- no inbound link
          (in sitemap only)   discovered slowly / maybe not,
                              crawled rarely if at all
Pages A and B are reachable by following links, so they're discovered on the next crawl of the hub. Page C is only in the sitemap — a hint, not a path — so it's discovered late if at all, and barely recrawled.
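
The walk in the diagram can be sketched as a breadth-first traversal over a link graph. This is a minimal illustration, not Googlebot's actual scheduler: the `links` dictionary, the `discover` function, and the URL paths are hypothetical stand-ins for the pages in the diagram.

```python
from collections import deque


def discover(links, seeds):
    """Walk the link graph breadth-first from the known URLs.

    links: dict mapping a URL to the list of URLs it links to (assumed input).
    seeds: URLs the crawler already knows (the starting frontier).
    Returns URLs in the order they are first discovered.
    """
    discovered = list(seeds)
    seen = set(seeds)
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        # "Fetch" the page and extract its links.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                discovered.append(target)
                frontier.append(target)  # queue the new URL for a later fetch
    return discovered


# Hypothetical site mirroring the diagram: C exists but nothing links to it.
links = {
    "/": ["/hub"],
    "/hub": ["/new-page-a", "/new-page-b"],
    "/new-page-c": [],
}

found = discover(links, ["/"])
print(found)  # ['/', '/hub', '/new-page-a', '/new-page-b']
```

`/new-page-c` never enters the frontier because no edge leads to it, which is the sense in which an unlinked page is off the map.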

Sitemaps help — but they're not enough

An XML sitemap is a list of URLs you'd like crawled. It genuinely helps discovery, especially for new or large sites, but it has two limits people forget: it's a hint, not a guarantee of crawling or indexing, and it carries no authority. A URL that's in the sitemap but has zero inbound internal links is still an orphan — Google may know it exists but treats it as unimportant, crawling it rarely and ranking it poorly.

The sitemap trap: “It's in the sitemap, so Google has it” is the most common discovery misconception. Submission is not the same as discovery, crawling, indexing, or ranking. Internal links do the heavy lifting; the sitemap is a supplement.
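
One way to check for this trap is to compare the URLs listed in the sitemap with the URLs that internal links actually point to. The sketch below assumes you already have the set of internal link targets from a crawl of your own site; `sitemap_only`, `SAMPLE`, and the example.com URLs are illustrative names, not part of any tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def sitemap_only(xml_text, link_targets):
    """URLs that are in the sitemap but that no internal link points to."""
    return sitemap_urls(xml_text) - set(link_targets)


# Hypothetical sitemap and crawl result.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/hub</loc></url>
  <url><loc>https://example.com/new-page-c</loc></url>
</urlset>"""

linked = {"https://example.com/", "https://example.com/hub"}

orphans = sitemap_only(SAMPLE, linked)
print(sorted(orphans))  # ['https://example.com/new-page-c']
```

Anything this returns has been submitted but has no path leading to it, so it is a candidate for a new internal link.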

Why pages go undiscovered

  • No inbound internal links: the page isn't on any crawl path. The number-one cause.
  • Buried too deep: pages many hops from a well-crawled entry point are reached late and recrawled rarely (crawl depth).
  • JavaScript-only links: links that exist only after client-side rendering are discovered less reliably and later than links in the server HTML.
  • Crawl budget exhaustion: on large sites, low-priority and duplicate URLs soak up budget, so genuinely new pages wait in line.
  • Blocked or noindexed paths: robots.txt disallows or noindex on intermediary pages can cut off the route to pages beyond them.
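
The first two causes can be checked against a crawl of your own site by measuring each page's click depth from an entry point. The sketch below is one possible approach: the `links` graph is hypothetical, and the `max_depth=3` threshold is an assumption chosen for illustration, not a documented Google limit.

```python
from collections import deque


def crawl_depths(links, entry):
    """Shortest number of clicks from the entry page to each reachable page."""
    depths = {entry: 0}
    queue = deque([entry])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in depths:
                depths[target] = depths[url] + 1
                queue.append(target)
    return depths


def flag_pages(links, entry, max_depth=3):
    """Split problem pages into unreachable ones and ones deeper than max_depth."""
    depths = crawl_depths(links, entry)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    unreachable = sorted(all_pages - set(depths))
    too_deep = sorted(url for url, depth in depths.items() if depth > max_depth)
    return unreachable, too_deep


# Hypothetical site: a long chain of pages plus one page nothing links to.
links = {
    "/": ["/hub"],
    "/hub": ["/a"],
    "/a": ["/b"],
    "/b": ["/c"],
    "/c": ["/d"],
    "/orphan": ["/"],
}

unreachable, too_deep = flag_pages(links, "/")
print(unreachable)  # ['/orphan']
print(too_deep)     # ['/c', '/d']
```

Note that `/orphan` links out to the homepage but still counts as unreachable: only inbound links put a page on the crawl path.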

How to get new pages found fast

  1. Link to the new page from already-crawled, high-authority pages at publish time — the homepage, a relevant hub, the cluster pillar. This puts it directly on the next crawl path.
  2. Make the inbound links contextual and descriptively anchored, so discovery and relevance arrive together.
  3. Keep it shallow — within a few clicks of a strong entry point — so it's crawled sooner and more often.
  4. Include it in the sitemap as a supplement, and ensure the links pointing to it are in the server-rendered HTML, not JS-injected.
  5. For time-sensitive pages, request indexing in Search Console — but treat that as a nudge, not a substitute for internal links.
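
To verify step 4, you can confirm that the link to a new page is present in the raw HTML the server returns, before any JavaScript runs. This sketch uses Python's standard `html.parser`; the `LinkExtractor` class, the sample markup, and the example.com base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href> tags in raw (unrendered) HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(self.base_url, href))


def links_in_server_html(html, base_url):
    """Return the links a crawler could read without rendering the page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical server response: one link is only created by client-side script.
HTML = """
<nav><a href="/hub">Hub</a></nav>
<main><a href="/new-page-a">Read the new guide</a></main>
<script>
  const a = document.createElement("a");
  a.href = "/new-page-b";
  document.body.appendChild(a);
</script>
"""

server_links = links_in_server_html(HTML, "https://example.com/")
print("https://example.com/new-page-a" in server_links)  # True
print("https://example.com/new-page-b" in server_links)  # False
```

A link that shows up in the browser but not in this set depends on rendering, so it is worth moving into the server-rendered template.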

Build discovery into the template: Related-content modules, breadcrumbs, and hub pages mean every new page is born with inbound links instead of being published into the void. That single habit prevents most discovery failures — and most orphan pages.

FAQ

Will Google find my page if it's in the sitemap?

Maybe, but slowly and unreliably. A sitemap is a discovery hint with no authority — a page in the sitemap but with no inbound internal links is treated as unimportant and crawled rarely. Internal links from already-crawled pages are what reliably get a page discovered and recrawled.

How long does it take Google to discover a new page?

It varies from hours to never. A page linked from a frequently-crawled, high-authority page can be found on the next crawl; an orphaned page deep in the site or in the sitemap only can wait months or be missed. Strong internal links are the biggest lever on discovery speed.

Do JavaScript links get discovered?

Less reliably. Links that only appear after client-side JavaScript runs depend on Google rendering the page, which happens later and less consistently than reading server HTML. For dependable discovery, keep important internal links in the server-rendered markup.

What the fix list looks like

Health: 82
Grade: B+

Strong structure with a few high-impact internal links to add. Acting on the list below could unlock a meaningful lift in organic visibility.

Internal links to add

  • /blog/how-to-improve-seo → /features/internal-linking (High)
    Anchor: internal linking strategy
    Placement: Paragraph 3, sentence 2
  • /blog/content-marketing-guide → /pricing (Moderate)
    Anchor: structural SEO platform
    Placement: Paragraph 6, sentence 1
  • /guides/keyword-research → /blog/topic-clusters (Moderate)
    Anchor: build topic clusters
    Placement: Paragraph 2, sentence 4

Quick wins: 14
Orphan pages: 12
Anchor gaps: 9