RankForgeBot — Our Web Crawler

Identifies as

RankForgeBot

Look for the token RankForge/1.0 (+https://rankforge.cc/bot) in your access logs.

robots.txt

Respected

We obey User-agent: * directives on every crawl. If robots.txt can't be fetched, we treat the site as crawlable (a common fail-open convention among crawlers).

JavaScript

Not executed

We read the raw HTML your server returns — the same first-wave view a search engine sees. We don't run client-side scripts.

Rate

Polite + adaptive

A short delay between requests, a bounded number of parallel connections, and automatic back-off if your server returns 429 / 503.

How to recognise RankForgeBot

Every request our crawler makes carries an identifying token in its User-Agent header so you can spot it in your server logs:

RankForge/1.0 (+https://rankforge.cc/bot)

The +https://rankforge.cc/bot URL points back to this page, so anyone inspecting their traffic can find out who we are. If you see requests carrying this token, that is RankForge running an audit that a user — possibly you — requested.
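To check your own logs for the token, a minimal sketch like the following may help. It assumes your access log records the User-Agent string somewhere on each line; the helper name and the two sample lines are made up for illustration.

```python
# Token taken from the User-Agent string shown above.
RANKFORGE_TOKEN = "RankForge/1.0"


def rankforge_requests(log_lines):
    """Return the log lines that contain the RankForge User-Agent token."""
    return [line for line in log_lines if RANKFORGE_TOKEN in line]


# Illustrative sample lines only (documentation IP ranges, invented timestamps).
sample_log = [
    '203.0.113.7 - - [10/May/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 '
    '"-" "RankForge/1.0 (+https://rankforge.cc/bot)"',
    '198.51.100.4 - - [10/May/2025:10:00:01 +0000] "GET /about HTTP/1.1" 200 734 '
    '"-" "Mozilla/5.0"',
]

hits = rankforge_requests(sample_log)
for line in hits:
    print(line)
```

In practice you would pass an open log file instead of the sample list, since iterating a file yields one line at a time.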

We respect robots.txt

Before crawling, RankForgeBot fetches your /robots.txt and honors it. Concretely:

Disallow rules are obeyed. Any path disallowed for User-agent: * is skipped and never fetched. Robots handling is on for every audit by default.
Crawl-delay is honored. If your robots.txt sets a Crawl-delay, we slow down to at least that interval, even when it's longer than our own default delay.
Sitemaps are used. We read Sitemap: directives from robots.txt to discover your canonical URL set, rather than guessing.
Fail-open on errors. If robots.txt can't be fetched (network error or a 4xx/5xx response), we treat the site as crawlable. Many crawlers apply the same convention when robots.txt is missing.
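If you want to preview how rules like these are interpreted, the sketch below uses Python's standard urllib.robotparser. It is not RankForgeBot's own code; the robots.txt content, the example.com URLs, and the helper names are assumptions for illustration. The 0.1 second default is the one this page states.

```python
from urllib.robotparser import RobotFileParser

DEFAULT_DELAY = 0.1  # seconds; the crawler's stated default delay

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url):
    """True if the User-agent: * group permits fetching this URL."""
    return parser.can_fetch("*", url)


def effective_delay():
    """Use Crawl-delay when it is longer than the default delay."""
    crawl_delay = parser.crawl_delay("*")  # None when no Crawl-delay is set
    return max(DEFAULT_DELAY, crawl_delay or 0)


# site_maps() returns None when no Sitemap: directive is present.
sitemaps = parser.site_maps() or []

print(allowed("https://example.com/blog/post"))
print(allowed("https://example.com/private/report"))
print(effective_delay(), sitemaps)
```

Note that urllib.robotparser only accepts whole-number Crawl-delay values; other parsers may differ.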

We crawl politely

An audit should be invisible to your infrastructure. RankForgeBot is built to take a snapshot, not to stress-test your server:

A delay between requests. We pause briefly between requests to the same host (default 0.1s, or your robots.txt Crawl-delay if it's longer) and cap how many connections we open in parallel.
Automatic back-off. If your server starts returning 429 (Too Many Requests) or 503, we increase the delay progressively, up to several seconds between requests, and honor any Retry-After header you send.
A hard budget. Each audit stops at the requesting plan's page limit (10 pages for anonymous audits, 100 on the free tier) and a wall-clock cap. When either is hit, we stop and analyze what we already have.
Read-only, GET only. We only issue GET requests to fetch pages. We never submit forms, never log in, and never attempt to modify anything on your site.
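The back-off behaviour can be pictured with a small sketch. This is an illustration under stated assumptions, not the crawler's actual algorithm: doubling the delay, the 5 second cap, and resetting after a healthy response are all assumed, since this page only says the delay grows "progressively" up to "several seconds".

```python
BASE_DELAY = 0.1  # seconds; the default delay stated on this page
MAX_DELAY = 5.0   # assumed cap; the page only says "several seconds"


def next_delay(current, status, retry_after=None):
    """Return the pause to use before the next request to the same host."""
    if status not in (429, 503):
        # Assumption: drop back to the base delay once the server is healthy.
        return BASE_DELAY
    # Assumption: double the delay on each throttling response, up to the cap.
    delay = min(max(current, BASE_DELAY) * 2, MAX_DELAY)
    if retry_after is not None:
        try:
            # Honor Retry-After given in seconds, even beyond the cap.
            delay = max(delay, float(retry_after))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled here
    return delay


print(next_delay(0.1, 429))
print(next_delay(4.0, 503))
print(next_delay(0.2, 429, retry_after="3"))
```

A site with a longer Crawl-delay would start from that value instead of BASE_DELAY.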

What we access — and what we don't

RankForgeBot only reads publicly accessible pages on the domain being audited. From each page we extract the structural signals the analysis needs:

Page URLs and HTTP status codes. So we can map your site and flag broken or redirected links.
HTML structure. Titles, headings, meta tags, and canonical tags, all parsed from the raw HTML.
Internal links and anchor text. The link graph is the core of the audit: which page links to which, and with what anchor.
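To make the kind of extraction described above concrete, here is a rough sketch using Python's standard html.parser. It only pulls the title, the canonical URL, and links with their anchor text from raw HTML, with no script execution. The class name and sample markup are hypothetical, and RankForgeBot's real parser may work differently.

```python
from html.parser import HTMLParser


class SignalExtractor(HTMLParser):
    """Collect the title, canonical URL, and (href, anchor text) pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.canonical = None
        self.links = []
        self._in_title = False
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)


# Made-up page for illustration.
SAMPLE_HTML = (
    '<html><head><title>Home</title>'
    '<link rel="canonical" href="https://example.com/"></head>'
    '<body><h1>Welcome</h1><a href="/about">About us</a></body></html>'
)

extractor = SignalExtractor()
extractor.feed(SAMPLE_HTML)
print(extractor.title, extractor.canonical, extractor.links)
```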

We deliberately do not:

Crawl behind a login. Password-protected areas, member dashboards, and authenticated content are never touched. We only fetch what an anonymous visitor could load.
Submit forms or run scripts. No form submissions, no POST requests, no JavaScript execution, so nothing on your side is triggered or changed.
Crawl private or internal hosts. Requests to localhost, private IP ranges, and other non-public addresses are blocked outright, so RankForge can't be pointed at internal infrastructure.
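A guard against non-public targets could look roughly like the sketch below, built on Python's standard ipaddress module. It is a simplified illustration rather than RankForge's actual check: the function name is hypothetical, and it only inspects literal IP addresses and the localhost name. A fuller check would also resolve hostnames through DNS and test every returned address.

```python
import ipaddress
from urllib.parse import urlsplit


def is_blocked_url(url):
    """Return True when the URL targets localhost or a non-public IP address."""
    host = urlsplit(url).hostname  # lowercased, IPv6 brackets removed
    if host is None:
        return True  # no usable host, so refuse it
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A regular hostname. This sketch stops here; a fuller check
        # would resolve it and test each resolved address as well.
        return False
    # is_global is False for loopback, private, link-local and reserved ranges.
    return not address.is_global


for candidate in ("http://127.0.0.1/admin", "http://10.0.0.5/", "https://example.com/"):
    print(candidate, is_blocked_url(candidate))
```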

How long we keep crawl data

We don't hoard your pages. Retention depends on who ran the audit:

Anonymous audits: 24 hours. An audit run without an account is stored temporarily and automatically deleted after 24 hours. Nothing anonymous is kept beyond that window.
Registered audits: until deleted. When a signed-in user audits a site, the result is kept while their account is active so they can compare crawls over time. They can delete any audit at any point.
Server logs: up to 90 days. Request logs (used for rate limiting, security, and debugging) are retained for at most 90 days.

Full detail is in our Privacy Policy and Security overview.

How to opt out

If you'd rather RankForge never crawl your site, you have two reliable options.

Option 1 — block in robots.txt

RankForgeBot honors User-agent: * rules, so a standard disallow will stop it:

User-agent: *
Disallow: /

Scope it to specific paths if you only want to exclude part of your site. Note that a User-agent: * rule also applies to search engines. If you want to keep Google in but shut RankForge out, use Option 2.

Option 2 — ask us directly

Email [email protected] with your domain and we'll exclude it, so RankForge won't crawl it regardless of who requests an audit — without you having to change robots.txt or affect any other crawler.

Questions about a crawl?

If RankForgeBot did something unexpected on your site, or you want it excluded, we want to hear about it. Reach us at [email protected].

Read our security overview
