Entity & Semantic SEO

For most of search history, Google was a string-matching machine. You typed cheapest flights to tokyo, and the engine looked for documents containing those exact strings. SEO, accordingly, was a game of putting the right strings in the right places: title tags, H1s, anchor text, and — in the bad old days — keyword-stuffed footers.

That world is gone. Since the Knowledge Graph launched in 2012 and the Hummingbird update reorganized the ranking pipeline in 2013, Google has been migrating from strings to things. The official phrasing in their announcement was literal: “things, not strings.” Instead of asking “which pages contain these words,” modern Google asks “which real-world entities is this query about, and which pages are authoritative sources for those entities?”

This shift is the single most important mental model for advanced SEO in 2026. If you still think in keywords, you are optimizing for an engine that no longer exists. This guide shows you how to think — and mark up — in entities instead.

Entities vs keywords

A keyword is a string of text. An entity is a thing — a person, organization, place, product, concept, or event — that exists independently of any particular wording.

Take the entity Tim Berners-Lee. The keywords tim berners-lee, inventor of the world wide web, creator of HTTP, and TBL all point at the same entity. Google resolves them to one node in its Knowledge Graph, assigns it a machine-readable ID (a Knowledge Graph MID like /m/07d5b, or a Wikidata QID like Q80), and connects it to other entities: World Wide Web, CERN, MIT, W3C.

Here is the difference in one table:

Aspect	Keyword (string)	Entity (thing)
Nature	A sequence of characters	A real-world concept with a stable ID
Identity	`nyc` ≠ `new york city`	`nyc` = `new york city` = `Q60`
Ambiguity	“Jaguar” is one string	Animal vs car vs OS: three entities
Relationships	None inherent	Connected in a graph (`isA`, `partOf`, `worksFor`)
How Google uses it	Lexical match	Disambiguation + reasoning + retrieval

🧑‍💻 Developer’s view: Think of keywords as raw string input and entities as resolved foreign keys. Search used to be WHERE body LIKE '%term%'. Now there is an entity-resolution layer in front — the query string is parsed into entity IDs, and ranking happens partly over a graph of those IDs. Your job is to make your pages cleanly joinable to the right nodes.
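To make the foreign-key analogy concrete, here is a minimal sketch of the two models side by side. The alias table, function names, and naive substring matching are assumptions for illustration only; Google's actual resolution layer is not public and is far more sophisticated.

```python
# Hypothetical alias table: surface strings -> Wikidata QIDs.
# Q80 (Tim Berners-Lee) and Q60 (New York City) are the IDs used in this guide;
# the alias lists themselves are illustrative assumptions.
ALIASES = {
    "tim berners-lee": "Q80",
    "inventor of the world wide web": "Q80",
    "tbl": "Q80",
    "nyc": "Q60",
    "new york city": "Q60",
}


def lexical_match(query: str, body: str) -> bool:
    """Old model: does the document contain the exact query string?"""
    return query.lower() in body.lower()


def resolve_entities(text: str) -> set[str]:
    """Toy resolver: map any known alias found in the text to its entity ID.

    Substring matching is deliberately naive and would misfire on real text.
    """
    lowered = text.lower()
    return {qid for alias, qid in ALIASES.items() if alias in lowered}


def entity_match(query: str, body: str) -> bool:
    """New model: match when query and document share at least one entity."""
    return bool(resolve_entities(query) & resolve_entities(body))


doc = "A short biography of the inventor of the World Wide Web."
print(lexical_match("tim berners-lee", doc))  # False: the string is absent
print(entity_match("tim berners-lee", doc))   # True: both resolve to Q80
```

The document never contains the query string, yet it matches once both sides are resolved to the same ID. That is the join your markup is trying to make easy.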

Why does this matter in practice?

Disambiguation. When Google understands that your page about “Python” is about the programming language (Q28865) and not the snake, which is a separate Wikidata item, the page stops competing in irrelevant SERPs and starts ranking where you belong.
Synonym and intent coverage. You no longer need to repeat fifteen keyword variants. Cover the entity well and you rank for the cluster of strings that resolve to it.
Reasoning. Google can infer that a page covering React, JSX, hooks, and the virtual DOM is a genuine authority on front-end frameworks, because those entities are graph-neighbors. Keyword density can’t fake that.
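The disambiguation and reasoning points above can be sketched as a neighbor-overlap score. This is a toy model under stated assumptions: the candidate names and their neighbor sets are hand-written here, whereas a real system would draw them from a learned graph.

```python
# Hypothetical candidates for the ambiguous string "python", each with a
# hand-picked set of neighboring concepts (assumed for illustration).
CANDIDATES = {
    "Python (programming language)": {"interpreter", "django", "pip", "function", "library"},
    "Python (snake)": {"reptile", "constrictor", "habitat", "species", "prey"},
}


def disambiguate(page_text: str) -> str:
    """Pick the candidate whose neighbors overlap most with words on the page.

    Ties go to the first candidate in CANDIDATES.
    """
    words = set(page_text.lower().split())
    scores = {name: len(neighbors & words) for name, neighbors in CANDIDATES.items()}
    return max(scores, key=scores.get)


page = "Install the library with pip and call the function from the interpreter"
print(disambiguate(page))  # Python (programming language)
```

The page never has to say "programming language". The surrounding entities do the work, which is why covering a topic's neighbors matters more than repeating the head term.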

Becoming an entity

The first strategic move is to make yourself — your brand and your authors — into entities Google recognizes. An unrecognized brand is invisible to the reasoning layer; it’s just another string. A recognized entity gets a Knowledge Panel, earns trust signals (the “T” in E-E-A-T), and becomes eligible to be cited as a source.

Google builds an entity from corroborating evidence across the web. Your job is to feed it consistent, machine-readable, cross-referenced signals. Three levers matter most.

1. sameAs — link your entity to known authorities

The sameAs property in schema.org markup is the most direct way to say “the entity on this page is the same as this node you already trust.” Point it at Wikidata, Wikipedia, LinkedIn, GitHub, Crunchbase, and official social profiles. Wikidata and Wikipedia are the highest-value targets because Google’s Knowledge Graph is partly seeded from them.

Here is Organization markup for a fictional dev-tools company. Put it in JSON-LD in your <head> or before </body>, ideally on your homepage and About page:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://acme.dev/#organization",
  "name": "Acme Dev Tools",
  "url": "https://acme.dev/",
  "logo": "https://acme.dev/logo.png",
  "foundingDate": "2019-04-01",
  "description": "Acme builds open-source observability tooling for distributed systems.",
  "sameAs": [
    "https://www.wikidata.org/wiki/Q123456789",
    "https://en.wikipedia.org/wiki/Acme_Dev_Tools",
    "https://www.linkedin.com/company/acme-dev-tools",
    "https://github.com/acme-dev",
    "https://www.crunchbase.com/organization/acme-dev-tools",
    "https://x.com/acmedev"
  ]
}
</script>

And the matching Person markup for an author. Note how worksFor references the organization’s @id, wiring the two entities together:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Person",
  "@id": "https://acme.dev/team/jane-doe/#person",
  "name": "Jane Doe",
  "jobTitle": "Principal Engineer",
  "worksFor": { "@id": "https://acme.dev/#organization" },
  "url": "https://acme.dev/team/jane-doe/",
  "knowsAbout": ["Distributed tracing", "OpenTelemetry", "Go"],
  "sameAs": [
    "https://www.linkedin.com/in/janedoe",
    "https://github.com/janedoe",
    "https://orcid.org/0000-0002-1825-0097"
  ]
}
</script>

💡 Tip: The @id values are internal anchors that let your schema graph reference itself. Using a stable URL fragment like #organization means your Article, Person, and WebSite blocks can all point back to one canonical organization node instead of redefining it. This is exactly how a graph should be modeled: define once, reference everywhere.
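One way to keep "define once, reference everywhere" honest is to lint your markup for references that point at nothing. The sketch below assumes you emit your nodes in a single JSON-LD @graph; the helper names are hypothetical, and the rule that a dict containing only @id is a reference is a simplifying assumption.

```python
import json

# A single @graph holding both nodes; values mirror the examples above.
graph = {
    "@context": "https://schema.org",
    "@graph": [
        {
            "@type": "Organization",
            "@id": "https://acme.dev/#organization",
            "name": "Acme Dev Tools",
        },
        {
            "@type": "Person",
            "@id": "https://acme.dev/team/jane-doe/#person",
            "name": "Jane Doe",
            "worksFor": {"@id": "https://acme.dev/#organization"},
        },
    ],
}


def collect_references(value):
    """Yield every @id used as a pure reference (a dict containing only @id)."""
    if isinstance(value, dict):
        if set(value) == {"@id"}:
            yield value["@id"]
        else:
            for child in value.values():
                yield from collect_references(child)
    elif isinstance(value, list):
        for item in value:
            yield from collect_references(item)


def dangling_references(doc):
    """Return referenced @ids that no node in the @graph defines."""
    defined = {node["@id"] for node in doc["@graph"] if "@id" in node}
    return sorted(set(collect_references(doc["@graph"])) - defined)


print(dangling_references(graph))  # [] because worksFor resolves to the Organization node
print(json.dumps(graph, indent=2))
```

Running a check like this in CI catches the common failure where someone renames a fragment on one template and silently orphans every reference to it.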

2. about / mentions — declare what your content covers

about says “this page is primarily about entity X.” mentions says “this page refers to entity Y.” Use real entity IDs so there’s no ambiguity:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "OpenTelemetry vs Jaeger: choosing a tracing backend",
  "author": { "@id": "https://acme.dev/team/jane-doe/#person" },
  "publisher": { "@id": "https://acme.dev/#organization" },
  "about": {
    "@type": "Thing",
    "name": "OpenTelemetry",
    "sameAs": "https://www.wikidata.org/wiki/Q98961422"
  },
  "mentions": [
    { "@type": "SoftwareApplication", "name": "Jaeger",
      "sameAs": "https://www.wikidata.org/wiki/Q60803298" },
    { "@type": "Thing", "name": "Distributed tracing",
      "sameAs": "https://www.wikidata.org/wiki/Q105749746" }
  ]
}
</script>
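If you generate this markup from a CMS, a small helper keeps the about and mentions entries uniform. This is a sketch, not a prescribed approach: entity_ref and article_markup are hypothetical names, and the QIDs are simply the ones used in the example above.

```python
import json


def entity_ref(name: str, qid: str, schema_type: str = "Thing") -> dict:
    """Build an about/mentions entry that points at a Wikidata item."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    return {
        "@type": schema_type,
        "name": name,
        "sameAs": f"https://www.wikidata.org/wiki/{qid}",
    }


def article_markup(headline: str, about: dict, mentions: list[dict]) -> dict:
    """Assemble the Article node; author and publisher are omitted for brevity."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "about": about,
        "mentions": mentions,
    }


markup = article_markup(
    "OpenTelemetry vs Jaeger: choosing a tracing backend",
    about=entity_ref("OpenTelemetry", "Q98961422"),
    mentions=[
        entity_ref("Jaeger", "Q60803298", "SoftwareApplication"),
        entity_ref("Distributed tracing", "Q105749746"),
    ],
)
print(json.dumps(markup, indent=2))
```

Rejecting malformed IDs at build time is the point of the helper: a typo in a QID fails loudly instead of shipping markup that points at the wrong entity.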

3. Consistent NAP and naming

NAP — Name, Address, Phone — must be byte-for-byte identical everywhere: your site, Google Business Profile, LinkedIn, Crunchbase, directories. Inconsistent NAP fractures your entity into several weakly-supported candidates, and Google may fail to merge them. Treat your canonical entity facts like a single source of truth in config:

# entity.yml — the one place your NAP lives; render everything from this
name:    "Acme Dev Tools, Inc."
url:     "https://acme.dev/"
phone:   "+1-415-555-0142"
address:
  street:  "500 Howard St, Suite 400"
  city:    "San Francisco"
  region:  "CA"
  postal:  "94105"
  country: "US"

⚠️ Caution: “Acme Dev Tools” vs “Acme Dev Tools, Inc.” vs “ACME” are three different strings to a naive matcher. Pick one legal name and one common name, and use them consistently. The same discipline applies to your domain: don’t split brand authority across acme.dev, acme.io, and getacme.com unless you have a deliberate redirect strategy.
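A byte-for-byte rule is easy to check mechanically. The sketch below compares each listing against the canonical facts; the LISTINGS data is a hypothetical snapshot you would collect by hand or by scraping, and the canonical values are copied into a dict rather than loaded from entity.yml to avoid a YAML dependency.

```python
# Canonical facts, mirroring entity.yml above.
CANONICAL = {
    "name": "Acme Dev Tools, Inc.",
    "phone": "+1-415-555-0142",
    "postal": "94105",
}

# Hypothetical snapshot of how each property currently lists the business.
LISTINGS = {
    "website": {"name": "Acme Dev Tools, Inc.", "phone": "+1-415-555-0142", "postal": "94105"},
    "linkedin": {"name": "Acme Dev Tools", "phone": "+1-415-555-0142", "postal": "94105"},
    "directory": {"name": "Acme Dev Tools, Inc.", "phone": "(415) 555-0142", "postal": "94105"},
}


def nap_mismatches(canonical: dict, listings: dict) -> list[tuple]:
    """Return (source, field, found) for every value that is not an exact match."""
    problems = []
    for source, facts in listings.items():
        for field, expected in canonical.items():
            found = facts.get(field)
            if found != expected:
                problems.append((source, field, found))
    return problems


for source, field, found in nap_mismatches(CANONICAL, LISTINGS):
    print(f"{source}: {field} is {found!r}, expected {CANONICAL[field]!r}")
```

Exact comparison is intentionally strict here. Normalizing before comparing would hide exactly the variations the caution above warns about.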

Topical authority

Entities don’t live alone; they live in clusters. OpenTelemetry is connected to distributed tracing, spans, OTLP (the OpenTelemetry Protocol), Prometheus, sampling strategies, and instrumentation libraries. Topical authority means demonstrably covering that whole neighborhood of entities and the questions users ask about them, not ranking for one fat keyword.

The old keyword playbook produced ten thin pages each targeting one phrase (opentelemetry tutorial, opentelemetry vs jaeger, opentelemetry python…), often cannibalizing each other. The entity playbook produces a content cluster: one comprehensive pillar plus interlinked supporting pages, together covering every meaningful sub-entity and sub-question.

	Keyword approach	Entity / topical approach
Unit of planning	A search phrase	A topic and its sub-entities
Page count logic	One page per keyword	Pages mapped to a knowledge map
Internal links	Random / by relevance hunch	Structured around entity relationships
Success metric	Rank for the keyword	Own the topic across the SERP
Risk	Cannibalization, thin content	Coverage gaps

To build the map, mine the entities Google already associates with your topic:

The People Also Ask box and “Related searches” — each is effectively a sub-question or sub-entity.
Wikipedia’s table of contents and “See also” for your core entity — a free, human-curated entity graph.
Wikidata’s property panel — the actual graph edges (subclass of, uses, has part).
A quick programmatic pull of the Knowledge Graph entities that match your topic, using the Knowledge Graph Search API:

curl -s "https://kgsearch.googleapis.com/v1/entities:search" \
  --data-urlencode "query=OpenTelemetry" \
  --data-urlencode "types=Thing" \
  --data-urlencode "limit=10" \
  --data-urlencode "key=$GOOGLE_KG_API_KEY" -G \
  | jq '.itemListElement[].result | {name, id: ."@id", desc: .description}'
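If you would rather script this than pipe through jq, here is a rough Python equivalent using only the standard library. It assumes the same endpoint, parameters, and GOOGLE_KG_API_KEY environment variable as the curl call; the function names are hypothetical, and the sample payload is a hand-written placeholder in the itemListElement/result shape that the jq filter above relies on.

```python
import json
import os
import urllib.parse
import urllib.request

KG_ENDPOINT = "https://kgsearch.googleapis.com/v1/entities:search"


def build_url(query: str, api_key: str, limit: int = 10) -> str:
    """Build the same GET request the curl command sends."""
    params = {"query": query, "types": "Thing", "limit": limit, "key": api_key}
    return f"{KG_ENDPOINT}?{urllib.parse.urlencode(params)}"


def summarize(payload: dict) -> list[dict]:
    """Reduce a response to name, id, and description, like the jq filter."""
    rows = []
    for element in payload.get("itemListElement", []):
        result = element.get("result", {})
        rows.append({
            "name": result.get("name"),
            "id": result.get("@id"),
            "desc": result.get("description"),
        })
    return rows


def search(query: str) -> list[dict]:
    """Perform the live request; needs GOOGLE_KG_API_KEY in the environment."""
    url = build_url(query, os.environ["GOOGLE_KG_API_KEY"])
    with urllib.request.urlopen(url) as response:
        return summarize(json.load(response))


# Offline demonstration with placeholder data, so no API key is needed.
sample = {
    "itemListElement": [
        {"result": {"@id": "kg:/m/EXAMPLE", "name": "Example entity", "description": "Placeholder"}}
    ]
}
print(summarize(sample))
```

Call search("OpenTelemetry") once your key is set. Keeping build_url and summarize separate from the network call makes both easy to test without hitting the API.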

💡 Tip: A useful rule of thumb — for a topic to earn authority, your cluster should answer every question a knowledgeable colleague could ask about it in a 30-minute conversation. If a sub-question has no page and no section, that’s a coverage gap a competitor will fill.

The payoff compounds. Once Google trusts you as the authority on distributed tracing, new pages in that cluster rank faster, because the trust attaches to your entity within the topic, not just to individual URLs.

Practical steps

Here’s the workflow, end to end. Each step is concrete enough to put on a ticket.

1. Run an entity inventory. List the entities your site is (brand, products, authors, locations) and the entities you want to be authoritative for (your core topics and their sub-entities). For each, find or create the canonical reference — a Wikidata QID if one exists, an internal @id if not.

We ARE:        Acme Dev Tools (Q123456789), Jane Doe (no QID → /team/jane-doe/#person)
We OWN:        distributed tracing, OpenTelemetry, observability, OTLP, span sampling
Gaps to fill:  span sampling (no page), OTLP (only a passing mention)
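The gaps line of that inventory can be derived rather than maintained by hand. The sketch below assumes a content audit that records each page's primary entity and the entities it only mentions; the URLs and the PAGES structure are hypothetical.

```python
# Topics we want to own, from the inventory above.
WANT_TO_OWN = ["distributed tracing", "OpenTelemetry", "observability", "OTLP", "span sampling"]

# Hypothetical audit: page URL -> (primary entity, entities merely mentioned).
PAGES = {
    "/guides/distributed-tracing/": ("distributed tracing", ["OpenTelemetry", "OTLP"]),
    "/guides/opentelemetry/": ("OpenTelemetry", ["observability"]),
    "/guides/observability/": ("observability", []),
}


def coverage(want: list[str], pages: dict) -> dict[str, str]:
    """Classify each target topic as covered, mentioned in passing, or missing."""
    primary = {about for about, _ in pages.values()}
    mentioned = {m for _, mentions in pages.values() for m in mentions}
    report = {}
    for topic in want:
        if topic in primary:
            report[topic] = "covered"
        elif topic in mentioned:
            report[topic] = "passing mention only"
        else:
            report[topic] = "no page"
    return report


for topic, status in coverage(WANT_TO_OWN, PAGES).items():
    print(f"{topic:22} {status}")
```

With this sample data the report flags OTLP as a passing mention and span sampling as having no page, matching the gaps listed above.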

2. Mark up entities with schema. Add Organization + Person to your identity pages, and Article with about/mentions to content pages, all wired together via @id. Generate and validate the JSON-LD with the Schema generator, then confirm it with Google’s Rich Results Test (which reports only on types eligible for rich results) and the Schema Markup Validator at validator.schema.org. This is the structured-data discipline covered in depth in Layer 4: content & structure.

3. Organize internal links by entity, not by whim. Every supporting page should link up to its pillar and across to sibling entities, using descriptive anchor text that names the entity (distributed tracing, not click here). This is how you make your cluster legible as a graph:

        [ Pillar: Distributed Tracing ]
        /        |          |         \
 [OpenTelemetry][Jaeger][Span sampling][OTLP]
        \________cross-links between siblings________/
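The diagram translates directly into a link audit. This sketch assumes you already have a crawl that maps each page in the cluster to the internal links found on it; the URLs and the LINKS data are hypothetical.

```python
PILLAR = "/distributed-tracing/"

# Hypothetical crawl result: page -> set of internal links found on it.
LINKS = {
    "/distributed-tracing/": {"/opentelemetry/", "/jaeger/", "/span-sampling/", "/otlp/"},
    "/opentelemetry/": {"/distributed-tracing/", "/jaeger/", "/otlp/"},
    "/jaeger/": {"/distributed-tracing/", "/opentelemetry/"},
    "/span-sampling/": {"/opentelemetry/"},
    "/otlp/": {"/distributed-tracing/"},
}


def audit_cluster(pillar: str, links: dict) -> list[tuple]:
    """Flag supporting pages that miss the link up or any cross-link to a sibling."""
    supporting = [page for page in links if page != pillar]
    issues = []
    for page in supporting:
        outgoing = links[page]
        siblings = set(supporting) - {page}
        if pillar not in outgoing:
            issues.append((page, "missing link up to pillar"))
        if not outgoing & siblings:
            issues.append((page, "no cross-link to a sibling"))
    return issues


for page, problem in audit_cluster(PILLAR, LINKS):
    print(f"{page}: {problem}")
```

In this sample, the span sampling page never links up to the pillar and the OTLP page has no sibling links, so both show up in the report. Anchor text quality still needs a human eye; this only checks that the edges exist.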

4. Earn authoritative mentions. sameAs is your claim; third-party corroboration is the proof. Get cited, profiled, and linked by sources Google already trusts: conference talks, GitHub READMEs of popular projects, industry publications, and, where genuinely warranted, a Wikidata entry. Unlinked brand mentions may count too, since they can act as entity-association signals.

5. Measure and iterate. Track whether a Knowledge Panel appears for your brand, whether you show up in “From sources across the web” panels and AI Overviews, and which cluster pages rank. Fill gaps; deepen weak nodes; refresh the updated date when you do.

Entity inventory complete (we-are + we-own + gaps)
Organization and Person schema with full sameAs shipped
about/mentions on every important content page
NAP consistent across all properties
Internal links structured around the entity graph
At least one new authoritative third-party mention per quarter

Key takeaways

✅ Google resolves queries to entities (things) rather than just matching keywords (strings), so optimize for the concept, not the exact phrase.
✅ Make your brand and authors into recognized entities with Organization/Person schema and sameAs links to Wikidata, LinkedIn, GitHub, and Crunchbase.
✅ Use about and mentions with real entity IDs to tell Google precisely what each page covers.
✅ Keep your NAP and naming byte-for-byte consistent so Google never splits your entity in two.
✅ Build topical authority by covering a topic’s full entity neighborhood in an interlinked content cluster, not one keyword per page.
✅ Generate and validate your markup with the Schema generator and apply it through Layer 4: content & structure.