Advanced · RankForge Research · 5 min read

How topical clusters are detected

A topic cluster is easy to declare on a content plan and surprisingly hard to detect in a live site, because the map you intended and the graph you actually built rarely match. RankForge detects clusters from the site itself — combining the shape of the internal link graph with the semantics of the pages — rather than trusting a folder structure or a spreadsheet. This article explains the signals it uses, why it insists two independent lenses agree before calling something a cluster, and where the method honestly reaches its limits.

Detecting clusters vs. declaring them

Most teams 'have' clusters the way they have a New Year's resolution: on paper. The content brief grouped twelve articles under a pillar, but six months and three writers later the actual link graph tells a different story — half the supporting pages never linked back, two drifted off-topic, and the pillar quietly became an orphan. Detection means reading the structure that exists, not the one that was planned.

This is why URL folders are a weak signal on their own. /blog/seo/internal-links and /blog/seo/crawl-budget sharing a path says the CMS filed them together; it says nothing about whether they link to each other or cover the same topic. A real topic cluster is a property of the graph and the content at once — detect it from those, and the folder becomes a tiebreaker, not the evidence.

The two lenses: structure and semantics

RankForge looks at a candidate group through two independent lenses and only treats it as a cluster when both agree. Either lens alone produces confident nonsense; together they cancel each other's failure modes.

Why one lens isn't enough

Case                                              STRUCTURE says         SEMANTICS says       Outcome
                                                  "these link tightly"   "these are about
                                                                         the same thing"
────────────────────────────────────────────────  ─────────────────────  ───────────────────  ───────────────────────
Real cluster                                      YES                    YES                  detect
Mega-menu / nav (everything links to everything)  YES                    NO                   reject (link noise)
Unlinked topic silos                              NO                     YES                  flag as "missing links"
Random pair                                       NO                     NO                   ignore

Structure alone is fooled by site-wide navigation (everything links to everything). Semantics alone can't tell a wired cluster from a pile of related-but-isolated pages. The interesting cases live where the two lenses disagree — those become recommendations, not clusters.
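
The four rows of the table reduce to a small decision function. The following is a minimal sketch, not RankForge's implementation: the names `Verdict` and `classify_group` are hypothetical, and it assumes each lens has already been collapsed into a yes/no answer.

```python
from enum import Enum


class Verdict(Enum):
    CLUSTER = "cluster"              # both lenses agree: detect
    LINK_NOISE = "link noise"        # linked but not about the same thing: reject
    MISSING_LINKS = "missing links"  # same topic but not wired together: flag
    IGNORE = "ignore"                # neither lens sees a group


def classify_group(structure_agrees: bool, semantics_agrees: bool) -> Verdict:
    """Map the two lens outcomes onto the four rows of the table."""
    if structure_agrees and semantics_agrees:
        return Verdict.CLUSTER
    if structure_agrees:
        return Verdict.LINK_NOISE
    if semantics_agrees:
        return Verdict.MISSING_LINKS
    return Verdict.IGNORE


# A tightly linked group with no shared topic is treated as navigation noise.
print(classify_group(True, False).value)
```

Only the first branch yields a cluster; the two disagreement branches are the ones that turn into recommendations.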

The structural lens

Structurally, a cluster looks like a hub-and-spoke subgraph: one page (the pillar) receives links from several others (the supporting pages), which also link among themselves more than they link outside the group. RankForge measures internal link density within a candidate group versus its links to the rest of the site, and weights contextual links above navigation links. A group held together only by the global menu is not a cluster; it is just the nav.
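
One way to picture the density comparison is as the share of a group's outgoing link weight that stays inside the group. This is a sketch under stated assumptions: the `Link` type, the `density_ratio` function, and the 1.0 / 0.1 weights are all hypothetical, chosen only to illustrate "contextual above navigation". The weights RankForge actually uses are not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    contextual: bool  # True for in-body links, False for menu/footer links


# Hypothetical weights: contextual links count fully, navigation links barely.
CONTEXTUAL_WEIGHT = 1.0
NAV_WEIGHT = 0.1


def link_weight(link: Link) -> float:
    return CONTEXTUAL_WEIGHT if link.contextual else NAV_WEIGHT


def density_ratio(links: list[Link], group: set[str]) -> float:
    """Share of the group's outgoing link weight that stays inside the group."""
    internal = 0.0
    outward = 0.0
    for link in links:
        # Only links that start inside the group matter; skip self-links.
        if link.source not in group or link.source == link.target:
            continue
        if link.target in group:
            internal += link_weight(link)
        else:
            outward += link_weight(link)
    total = internal + outward
    return internal / total if total else 0.0


links = [
    Link("/pillar", "/a", True),
    Link("/a", "/pillar", True),
    Link("/b", "/pillar", True),
    Link("/a", "/b", True),
    Link("/a", "/pricing", False),  # menu link leaving the group
    Link("/b", "/pricing", False),
]
ratio = density_ratio(links, {"/pillar", "/a", "/b"})
print(f"{ratio:.2f}")
```

A group whose only internal links come from the menu scores low on this ratio, because each of those links contributes a fraction of the weight of an in-body link.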

The semantic lens

Semantically, the pages in a cluster should be about the same thing. RankForge looks at lexical and topical overlap across titles, headings, and body content: the shared vocabulary that signals a common subject. It also weighs how descriptive the anchor text is. A supporting page that links to its pillar with the anchor 'internal linking guide' sends a far stronger topical signal than one that uses 'read more'.
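
As a rough illustration of lexical overlap, the sketch below uses Jaccard similarity over word sets. That is a deliberately simple stand-in, since the article does not say which overlap measure RankForge uses. The stopword list, the generic-anchor list, and all function names are assumptions made for the example.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny lists for illustration only.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "for", "in", "on", "is", "how", "your"}
GENERIC_ANCHORS = {"read more", "click here", "here", "learn more", "this"}


def tokens(text: str) -> set[str]:
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_overlap(pages: dict[str, str]) -> float:
    """Average Jaccard similarity over every pair of pages in the group."""
    token_sets = [tokens(text) for text in pages.values()]
    pairs = list(combinations(token_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def is_descriptive_anchor(anchor: str) -> bool:
    """Treat an anchor as descriptive if it is not generic and has 2+ content words."""
    return anchor.strip().lower() not in GENERIC_ANCHORS and len(tokens(anchor)) >= 2


pages = {
    "/guide": "internal linking guide",
    "/audit": "internal linking audit",
}
print(mean_pairwise_overlap(pages))                     # 0.5
print(is_descriptive_anchor("internal linking guide"))  # True
print(is_descriptive_anchor("read more"))               # False
```

In practice the text passed in would combine title, headings, and body, and a real system would likely use a richer representation than raw word sets.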

The signals, concretely

Under those two lenses, the concrete signals RankForge weighs include:

- Internal link density: how interconnected a candidate group is relative to its links outward. High internal / low external density is the signature of a real cluster.
- Hub asymmetry: whether one page receives disproportionately many inbound links from the group (a pillar) versus a flat mesh of equals (a cluster without a pillar, which is a weaker structure).
- Anchor-text topicality: whether links within the group use descriptive, on-topic anchors, and whether those anchors converge on a consistent theme.
- Content overlap: shared terminology and subject matter across the group's titles, headings, and body, distinguishing a real topic from coincidental co-linking.
- Path and co-citation hints: URL proximity and being co-linked from the same external/source pages, used as tiebreakers, never as primary evidence.
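
Of the signals listed, hub asymmetry is the easiest to make concrete. The sketch below is an assumption-laden illustration rather than the product's formula: `hub_asymmetry` is a hypothetical helper that returns the most-linked page in a group and the share of in-group inbound links it receives.

```python
from collections import Counter
from typing import Optional


def hub_asymmetry(
    edges: list[tuple[str, str]], group: set[str]
) -> tuple[Optional[str], float]:
    """Return (top page, its share of in-group inbound links).

    A share near 1/len(group) suggests a flat mesh of equals;
    a much larger share suggests one page is acting as the pillar.
    """
    inbound = Counter(
        target
        for source, target in sorted(set(edges))  # dedupe, deterministic order
        if source in group and target in group and source != target
    )
    total = sum(inbound.values())
    if total == 0:
        return None, 0.0
    page, count = inbound.most_common(1)[0]
    return page, count / total


group = {"/pillar", "/a", "/b", "/c"}
edges = [
    ("/a", "/pillar"),
    ("/b", "/pillar"),
    ("/c", "/pillar"),
    ("/pillar", "/a"),
    ("/a", "/b"),
]
print(hub_asymmetry(edges, group))  # ('/pillar', 0.6)
```

Here the pillar receives three of the five in-group links. In a full mesh of three pages, every page would receive exactly one third.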

Computed once, read everywhere: Cluster membership is computed in the backend and feeds the Structural Health Score, the cluster maps, and the link recommendations from one place — the web view and PDF read it rather than re-deriving it.

Why isolated articles fail — and clusters win

An isolated article is a single sample. Even if it's excellent, it asks a search engine to rate your expertise on a topic from one data point, and it accumulates only whatever authority its own backlinks earn. A cluster changes the unit of evidence from one page to a connected body of work: coverage signals you answer the full range of questions, and the link structure concentrates authority on the pillar while lending it back to the supporting pages.

This is also why clusters compound. Each new supporting page that links into the cluster strengthens the pillar, which lends authority back down — the structural mechanism behind topical authority. A pile of isolated articles on the same subject has the raw material for that and captures none of it, purely because the wiring is missing.
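
The concentration effect can be seen in a toy model. The sketch below uses a textbook PageRank power iteration purely as an illustration. Nothing here claims that RankForge or any search engine scores authority this way, and the four-page site is made up. Because the toy graph is closed, total rank is fixed at 1, so it shows only how wiring redistributes authority toward the pillar, not the effect of external backlinks.

```python
def pagerank(
    links: dict[str, list[str]], damping: float = 0.85, iterations: int = 50
) -> dict[str, float]:
    """Plain power-iteration PageRank over an internal link graph."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Same four pages, first with no internal links, then wired as hub and spoke.
isolated = pagerank({"/pillar": [], "/a": [], "/b": [], "/c": []})
clustered = pagerank({
    "/pillar": ["/a", "/b", "/c"],
    "/a": ["/pillar"],
    "/b": ["/pillar"],
    "/c": ["/pillar"],
})
print(round(isolated["/pillar"], 2), round(clustered["/pillar"], 2))
```

With no links, every page holds an equal quarter of the rank. Once the supporting pages link to the pillar, its share rises well above that.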

Limitations — there is no one perfect algorithm

Cluster detection is inference, not measurement, and honesty about that is part of the methodology. The hard cases are real:

- Boundary pages legitimately belong to two clusters (a page on 'internal linking for ecommerce' bridges two topics). Any hard partition will misfile some of these; RankForge surfaces overlap rather than forcing a single home.
- Thresholds are judgement calls. How dense is dense enough to be a cluster? There's no universal cutoff, so the detector errs toward flagging 'almost a cluster, missing links' rather than inventing structure that isn't there.
- Degenerate sites defeat the structural lens. When a mega-menu links every page to every page, internal-link density is uninformative. RankForge detects this saturation and caveats cluster findings instead of reporting confident groups.
- Thin or client-rendered content weakens the semantic lens. If a page has little more than its title as text after rendering, topical overlap is harder to judge, and the report says so.
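
The saturation case can be sketched as a simple check on how many of the possible page-to-page links actually exist. This is an illustrative assumption, not the product's detector: `link_saturation`, `should_caveat`, and the 0.8 threshold are hypothetical, and the threshold in particular is exactly the kind of judgement call described in the list.

```python
from itertools import permutations


def link_saturation(edges: set[tuple[str, str]], pages: set[str]) -> float:
    """Fraction of all possible directed page pairs that are actually linked."""
    n = len(pages)
    if n < 2:
        return 0.0
    distinct = {(s, t) for s, t in edges if s in pages and t in pages and s != t}
    return len(distinct) / (n * (n - 1))


# Hypothetical cutoff: above this, density says little about topic structure.
SATURATION_CAVEAT_THRESHOLD = 0.8


def should_caveat(edges: set[tuple[str, str]], pages: set[str]) -> bool:
    """True when the site is so densely linked that cluster findings need a caveat."""
    return link_saturation(edges, pages) >= SATURATION_CAVEAT_THRESHOLD


pages = {"/", "/a", "/b", "/c"}
mega_menu = set(permutations(pages, 2))  # every page links to every other page
sparse = {("/a", "/"), ("/b", "/"), ("/c", "/")}

print(should_caveat(mega_menu, pages))  # True
print(should_caveat(sparse, pages))     # False
```

When the check fires, the sensible response is to caveat the structural findings, or to re-run the density measure on contextual links only.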

The useful output isn't a perfect map: It's the disagreements — semantically-related pages that don't link (add links) and tightly-linked pages that aren't topically related (likely nav noise or a cannibalization risk). That's where the Topical Authority Checker produces action.

FAQ

Does RankForge use my URL folders to find clusters?

Only as a weak tiebreaker. Folders show how your CMS filed pages, not whether they actually link to each other or cover the same topic. Clusters are detected from the internal link graph and page content; the URL path is a hint, not evidence.

Why does a group of related pages not show as a cluster?

Usually because the semantic lens sees the topic but the structural lens doesn't see the links — the pages are related but isolated. RankForge flags that as a missing-links opportunity rather than calling it a cluster, because a cluster without internal links doesn't behave like one.

Can a page belong to more than one cluster?

Yes. Bridge pages legitimately span topics. RankForge surfaces overlap instead of forcing each page into a single cluster, because a hard partition would misrepresent how real sites are structured.