Measuring crawl efficiency
Crawl efficiency is the share of a search engine's crawling effort that lands on pages you actually want indexed — versus effort wasted on duplicates, dead ends, and junk. On a small site it barely registers; on a large one it's decisive, because crawl budget is finite and every request spent on a useless URL is a request not spent on a valuable one. This guide covers what eats crawl efficiency, how to measure it, and how to claw it back.
What crawl efficiency actually is
Search engines allocate each site a rough crawl budget: the number of URLs they will fetch in a given period, driven by the site's perceived importance and by how quickly and reliably it responds. Crawl efficiency is how well that budget is spent. A high-efficiency site spends most of its crawl on canonical, valuable, indexable pages; a low-efficiency one burns it on parameter variants, soft 404s, redirect chains, and infinite filter combinations.
When it matters: Below a few thousand URLs, Google generally crawls everything that matters and budget isn't a constraint — don't over-optimise it. Crawl efficiency becomes important at scale (large ecommerce, publishers, faceted catalogs) where wasted budget directly delays valuable pages getting crawled and indexed.
What wastes the budget
BUDGET SPENT ON…              SHOULD BE…
───────────────────────────   ─────────────────────────────────────
?sort= ?color= ?page=         canonical product/category pages only
(faceted/parameter URLs)
session IDs / tracking        one clean URL per page
redirect chains A->B->C       direct links to final URL
soft 404s / thin pages        real, indexable content
infinite calendars/filters    bounded, crawlable paths
deep pagination tails         curated links to key items
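To get a feel for how much of a URL list falls into the first two rows of the table, a rough sketch like the one below can label URLs by their query parameters and collapse variants onto one clean URL for counting. The parameter names in `FACET_PARAMS` and `TRACKING_PARAMS` are assumptions (the facet names come from the table, the tracking names are common examples), and `classify_url` and `group_key` are hypothetical helper names; swap in whatever your site actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed parameter names; replace with the ones your site really emits.
FACET_PARAMS = {"sort", "color", "page"}
TRACKING_PARAMS = {"sessionid", "sid", "utm_source", "utm_medium", "utm_campaign"}


def classify_url(url: str) -> str:
    """Label a URL as 'tracking', 'faceted', or 'clean' from its query keys."""
    query = urlsplit(url).query
    keys = {key.lower() for key, _ in parse_qsl(query, keep_blank_values=True)}
    if keys & TRACKING_PARAMS:
        return "tracking"
    if keys & FACET_PARAMS:
        return "faceted"
    return "clean"


def group_key(url: str) -> str:
    """Strip facet and tracking parameters so variants of a page group together.

    Remaining parameters are kept in sorted order; the fragment is dropped.
    """
    parts = urlsplit(url)
    ignored = FACET_PARAMS | TRACKING_PARAMS
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in ignored
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))
```

Counting how many distinct URLs share one `group_key` gives a quick estimate of the "one page, thirty URLs" problem. It is a counting aid only, not a statement of which URL should carry the canonical tag.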
Note how many of these are structural: redirect chains, deep pagination, and duplicate URLs are the same problems that distort authority flow. Crawl efficiency and authority distribution are two symptoms of the same underlying link-graph health.
How to measure it
- Server log analysis: the ground truth. Logs show exactly which URLs Googlebot fetches and how often. A high proportion of crawls hitting parameter URLs, redirects, or non-indexable pages is a direct efficiency signal.
- Search Console Crawl Stats: total requests over time, broken down by response code, file type, and crawl purpose. Spikes in 'other' or redirects are red flags.
- Crawl-vs-index gap: compare pages crawled to pages indexed, and crawlable URLs to your canonical page count. A large gap between known URLs and valuable pages signals bloat.
- Crawl-vs-sitemap: when a structural crawl reaches far fewer pages than the sitemap lists, part of the shortfall may come from the crawl's own page limit rather than from orphaned pages. This caveat matters most on big sites.
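The log-analysis approach can be sketched roughly as follows. This assumes access logs in the common "combined" format and identifies Googlebot by user-agent string alone, which can be spoofed (verifying the requesting IP is out of scope here). The bucket names and the helpers `bucket`, `crawl_breakdown`, and `efficiency` are illustrative, and the sample log lines are made up. Note that soft 404s return status 200, so they will not show up as errors in a status-code breakdown like this one.

```python
import re
from collections import Counter

# Matches the common "combined" access log format (an assumption about your logs).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bucket(path: str, status: int) -> str:
    """Assign one crawl hit to a coarse bucket."""
    if 300 <= status < 400:
        return "redirect"
    if status >= 400:
        return "error"
    if "?" in path:
        return "parameter"
    return "clean"


def crawl_breakdown(lines) -> Counter:
    """Count Googlebot hits per bucket, skipping other agents and unparsable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None or "Googlebot" not in match["agent"]:
            continue
        counts[bucket(match["path"], int(match["status"]))] += 1
    return counts


def efficiency(counts: Counter) -> float:
    """Share of Googlebot hits that landed on clean, successful URLs."""
    total = sum(counts.values())
    return counts["clean"] / total if total else 0.0


# Made-up sample lines for illustration only.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SAMPLE = [
    f'192.0.2.1 - - [10/Jan/2025:10:00:00 +0000] "GET /products/widget HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'192.0.2.1 - - [10/Jan/2025:10:00:01 +0000] "GET /products/widget?sort=price HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'192.0.2.1 - - [10/Jan/2025:10:00:02 +0000] "GET /old-page HTTP/1.1" 301 0 "-" "{GOOGLEBOT}"',
    f'192.0.2.1 - - [10/Jan/2025:10:00:03 +0000] "GET /missing HTTP/1.1" 404 312 "-" "{GOOGLEBOT}"',
    '192.0.2.7 - - [10/Jan/2025:10:00:04 +0000] "GET /products/widget HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    breakdown = crawl_breakdown(SAMPLE)
    print(dict(breakdown), f"efficiency={efficiency(breakdown):.2f}")
```

Treating every query-string URL as waste is a deliberate simplification. On a site where some parameters are legitimate, the `bucket` function would need an allowlist.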
Don't confuse a crawl cap with a site problem: When an audit crawl hits its page cap, the 'not crawled' pages reflect the limit of that crawl, not necessarily neglect on the site. Measure efficiency against what's actually fetchable, and caveat partial crawls rather than reporting them as site-wide failures.
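One way to keep that caveat attached to the numbers is to carry the cap through to the report. The sketch below is a minimal illustration: `sitemap_coverage`, its `page_cap` argument, and the verdict strings are all hypothetical, and it assumes both URL lists are already normalised to the same form.

```python
def sitemap_coverage(sitemap_urls, crawled_urls, page_cap=None):
    """Compare sitemap URLs with crawled URLs and flag capped (partial) crawls."""
    sitemap = set(sitemap_urls)
    crawled = set(crawled_urls)
    not_reached = sitemap - crawled
    # If the crawl stopped at its cap, missing pages cannot be blamed on the site.
    hit_cap = page_cap is not None and len(crawled) >= page_cap

    if not not_reached:
        verdict = "all sitemap URLs reached"
    elif hit_cap:
        verdict = "inconclusive: crawl stopped at its page cap"
    else:
        verdict = "sitemap URLs not reached: check for orphans or blocked paths"

    return {
        "sitemap_total": len(sitemap),
        "reached": len(sitemap & crawled),
        "not_reached": len(not_reached),
        "crawl_hit_cap": hit_cap,
        "verdict": verdict,
    }
```

With a cap of 2 and two pages crawled out of a three-URL sitemap, the verdict comes back as inconclusive rather than as a site failure, which is the distinction the caveat asks for.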
How to improve it
- Consolidate duplicates — canonical tags, parameter handling, and consistent URL formatting so each page is one URL, not thirty.
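- Block the junk: use robots.txt disallow rules for faceted combinations, internal search results, and infinite calendars that have no business being crawled. A noindex tag keeps a page out of the index, but the page still has to be fetched for the tag to be seen, so on its own it does not save crawl budget.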
- Fix redirect chains and 404s — link directly to final URLs; repair or remove dead pages so budget isn't spent fetching nothing.
- Flatten and prune — shorten deep pagination with curated links, and stop linking to low-value pages you don't want crawled.
- Keep important pages well-linked and shallow — efficiency isn't only about removing waste; it's about making sure the valuable pages are easy to reach so they're crawled often.
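Before shipping new blocking rules, it can help to confirm that they block the junk and nothing else. The sketch below uses Python's standard `urllib.robotparser` for that check. The rules, paths, and the `check_rules` helper are hypothetical examples. One limitation to be aware of: this parser does simple prefix matching and does not implement the `*` and `$` wildcards that Google supports, so the example sticks to plain path prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; plain path prefixes only (no wildcards).
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /calendar/
"""


def check_rules(robots_txt, must_block, must_allow, agent="Googlebot"):
    """Return a list of problems where the rules disagree with expectations."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    problems = []
    for url in must_block:
        if parser.can_fetch(agent, url):
            problems.append(f"should be blocked but is crawlable: {url}")
    for url in must_allow:
        if not parser.can_fetch(agent, url):
            problems.append(f"should be crawlable but is blocked: {url}")
    return problems


if __name__ == "__main__":
    issues = check_rules(
        ROBOTS_TXT,
        must_block=[
            "https://example.com/search?q=shoes",
            "https://example.com/calendar/2031/01",
        ],
        must_allow=[
            "https://example.com/products/blue-widget",
            "https://example.com/category/shoes",
        ],
    )
    print(issues or "rules behave as expected")
```

Keeping a short list of must-allow URLs alongside the must-block list guards against the more expensive mistake: a prefix that is too broad and quietly blocks valuable pages.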
Limitations
- Crawl budget is opaque and not a published number; you infer efficiency from logs and Search Console, not a dashboard figure.
- For most small-to-mid sites it's not the bottleneck; prioritise it only when scale makes it one.
- Improving efficiency helps valuable pages get crawled sooner; it doesn't by itself make them rank. Content, relevance, and authority still decide that.
FAQ
What is crawl efficiency?
It's how much of a search engine's crawling effort lands on pages you actually want indexed, versus effort wasted on duplicates, redirects, soft 404s, and low-value URLs. High efficiency means most of your finite crawl budget is spent on canonical, valuable pages.
Does crawl budget matter for small sites?
Rarely. Below a few thousand URLs, search engines generally crawl everything that matters, so budget isn't the constraint. Crawl efficiency becomes important at scale — large ecommerce, publishers, and faceted catalogs — where wasted crawls delay valuable pages.
How do I measure crawl efficiency?
Server logs are the ground truth — they show exactly which URLs Googlebot fetches. Supplement with Search Console Crawl Stats and by comparing crawled versus indexed versus canonical page counts. A high share of crawls hitting parameter URLs, redirects, or non-indexable pages signals low efficiency.