International SEO & hreflang in Practice
Ship multi-language, multi-region sites that rank in each market — with hreflang done right.
You reach for international SEO the moment your content stops serving one audience. Two triggers force the issue. The first is language: you translate the site into German, Japanese, or Spanish, and you need each translation to surface for the people who read that language. The second is region: the language is the same but the market differs — a US store and a UK store both in English, with different prices, shipping, spelling, and legal copy. Often you hit both at once: English for the US, English for the UK, German for Germany, German for Austria.
Search engines do not automatically know that your /de/produkt page is the German equivalent of /en/product. Left to guess, they get it wrong, and the cost is concrete:
- Wrong-region ranking. Your US dollar pricing page ranks for British searchers, who bounce when they hit checkout in the wrong currency.
- Self-cannibalization. Two near-identical English pages compete for the same query. Google picks one almost at random, splits your signals, and neither ranks as well as a single page would.
- Duplicate-content dilution. Untranslated boilerplate and shared templates across locales look like thin, repetitive content, dragging down the whole cluster.
hreflang is the mechanism that fixes this. It is a set of annotations that tell search engines, “this URL is the X-language, Y-region version of this page, and here are all its siblings.” Done right, it routes each searcher to the page built for them. Done wrong — and it is very easy to do wrong — it silently does nothing, or worse, points crawlers at the wrong pages. This guide is the practitioner’s version: structure first, then implementation, then the failure modes that eat hours.
URL structure — the foundation hreflang sits on
Before any annotation, you choose how locales map onto URLs. This decision is hard to reverse (it changes every URL on the site), so weigh it carefully. There are three viable patterns.
| Approach | Example | SEO authority | Geo signal to Google | Cost & setup | Maintenance |
|---|---|---|---|---|---|
| ccTLD (country-code top-level domain) | example.de, example.co.uk | Split — each domain builds authority from scratch | Strongest — .de is an unmistakable Germany signal | High — buy/defend many domains, separate hosting & certs | High — every domain is its own SEO property |
| Subdirectory | example.com/de/, example.com/en-gb/ | Consolidated — all locales share one domain’s authority | Moderate — set via hreflang + content | Low — one domain, one cert, one deploy | Low — single codebase, single property |
| Subdomain | de.example.com, uk.example.com | Mostly split — Google treats subdomains as semi-separate | Moderate | Medium — DNS + cert per subdomain | Medium |
Recommendation for most sites: subdirectories. They let every locale inherit the authority your main domain has already earned, instead of forcing each market to climb from zero. They are the cheapest to run — one repository, one deploy pipeline, one TLS certificate, one property to monitor. And they are exactly what static site generators produce naturally.
Reserve ccTLDs for cases where the geo signal is worth the cost: you are a large brand with the resources to build authority in each country independently, local trust matters enormously (finance, healthcare, government-adjacent), or a regulator effectively requires a local domain. Subdomains are a middle option, usually chosen for infrastructure reasons (a separate stack per region) rather than SEO ones — Google has stated it handles them fine, but in practice authority consolidation is less clean than with subdirectories.
💡 Tip: put the locale segment first in the path: /de/blog/post, not /blog/de/post. A top-level prefix is unambiguous, easy to route, and lets you apply locale logic (currency, language headers) at one branch of the routing tree.
🧑‍💻 Developer view: whatever you pick, make the locale a single source of truth in code, one config object mapping locale -> { hreflang, currency, path prefix }. Every other system (sitemap, hreflang tags, language switcher, analytics) should read from that one place. The day you add fr-CA, you want to edit one file, not seven.
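Here is a minimal sketch of that single source of truth in Python. It mirrors the locale -> { hreflang, currency, path prefix } shape described above. The LOCALES mapping, the Locale fields, the alternates() helper and the example.com values are all hypothetical. Your framework's real config format will differ.

```python
from dataclasses import dataclass

SITE = "https://example.com"  # hypothetical origin


@dataclass(frozen=True)
class Locale:
    hreflang: str  # value emitted in hreflang attributes
    currency: str  # ISO 4217 code read by pricing components
    prefix: str    # first path segment, e.g. "de" -> /de/...


# Hypothetical config: the one place every other system reads from.
LOCALES = {
    "en": Locale(hreflang="en", currency="USD", prefix="en"),
    "en-gb": Locale(hreflang="en-GB", currency="GBP", prefix="en-gb"),
    "de": Locale(hreflang="de", currency="EUR", prefix="de"),
}
DEFAULT_LOCALE = "en"  # also used as the x-default target


def alternates(slugs: dict[str, str]) -> dict[str, str]:
    """Build the hreflang -> absolute URL map for one page.

    slugs maps locale key -> translated slug, because slugs can differ
    per language (/en/product vs /de/produkt).
    """
    cluster = {
        LOCALES[key].hreflang: f"{SITE}/{LOCALES[key].prefix}/{slug}"
        for key, slug in slugs.items()
    }
    default = LOCALES[DEFAULT_LOCALE]
    cluster["x-default"] = f"{SITE}/{default.prefix}/{slugs[DEFAULT_LOCALE]}"
    return cluster
```

Calling alternates({"en": "product", "en-gb": "product", "de": "produkt"}) returns the en, en-GB, de and x-default URLs for one product page. The head tags, the sitemap and a language switcher could all consume that same dict.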
hreflang implementation — three delivery methods
hreflang is the same data delivered through one of three channels. You pick one primary method per page; mixing them on the same URL invites contradictions. The non-negotiable rule that overrides everything below: annotations must be bidirectional (return tags). If page A lists B as its German alternate, then B must list A as its English alternate. If even one link is missing the return, Google ignores the entire cluster for that page. Think of it as a handshake — both hands must extend, or no deal.
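The handshake rule is mechanical enough to check in a few lines. This sketch assumes you have already collected each page's declared alternates into a dict of {page_url: {hreflang: href}}. The helper name missing_return_links() is hypothetical, and the function skips x-default entries for brevity.

```python
def missing_return_links(pages: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where source lists target as an
    alternate but target does not list source back."""
    problems = []
    for source, alts in pages.items():
        for code, target in alts.items():
            # Self-references need no handshake; x-default skipped for brevity.
            if code == "x-default" or target == source:
                continue
            back = pages.get(target, {})  # an uncrawled target counts as missing
            if source not in back.values():
                problems.append((source, target))
    return problems
```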
Method 1 — HTML <link> elements
The most common method. Each page’s <head> lists every locale alternate, including itself (self-referential), plus x-default.
```html
<!-- In the <head> of https://example.com/en/product -->
<link rel="alternate" hreflang="en" href="https://example.com/en/product" />
<link rel="alternate" hreflang="en-GB" href="https://example.com/en-gb/product" />
<link rel="alternate" hreflang="de" href="https://example.com/de/produkt" />
<link rel="alternate" hreflang="zh-CN" href="https://example.com/zh-cn/product" />
<link rel="alternate" hreflang="x-default" href="https://example.com/en/product" />
```
Every alternate page in this set must carry the same block (self-referencing its own URL as one of the entries). That sameness is the bidirectional handshake.
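The block is identical on every page in the cluster, so it is natural to render it from one data structure rather than hand-write it. Here is a minimal sketch, assuming the cluster is a dict of hreflang code -> absolute URL. The name link_block() is hypothetical and is not a framework API.

```python
from html import escape


def link_block(cluster: dict[str, str]) -> str:
    """Render one <link rel="alternate"> tag per entry.

    Emitting this exact output on every page in the cluster is what makes
    the set both bidirectional and self-referential.
    """
    return "\n".join(
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}" />'
        for code, url in cluster.items()
    )
```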
Method 2 — XML sitemap
Best for large sites: you maintain the annotations in one file instead of in thousands of page heads, and a single sitemap entry declares the whole cluster at once.
```xml
<url>
  <loc>https://example.com/en/product</loc>
  <xhtml:link rel="alternate" hreflang="en" href="https://example.com/en/product"/>
  <xhtml:link rel="alternate" hreflang="en-GB" href="https://example.com/en-gb/product"/>
  <xhtml:link rel="alternate" hreflang="de" href="https://example.com/de/produkt"/>
  <xhtml:link rel="alternate" hreflang="x-default" href="https://example.com/en/product"/>
</url>
```
Remember the namespace on the root element: xmlns:xhtml="http://www.w3.org/1999/xhtml". And the bidirectional rule still applies — every <url> entry in the cluster needs the full set of xhtml:link children, not just a link back to one peer.
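If you generate the sitemap yourself, Python's standard xml.etree.ElementTree can emit both namespaces and the full set of links for every entry. This sketch makes the same assumption as above: each cluster is a dict of hreflang code -> absolute URL. The sitemap() helper is hypothetical. A real build would also write the string to sitemap.xml and split output that exceeds the sitemap protocol's size limits.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SM_NS)          # default namespace for urlset/url/loc
ET.register_namespace("xhtml", XHTML_NS)  # emits xmlns:xhtml on the root element


def sitemap(clusters: list[dict[str, str]]) -> str:
    """Emit one <url> per distinct URL, each carrying the cluster's full set."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for cluster in clusters:
        # dict.fromkeys de-duplicates while keeping order, so an x-default
        # that repeats a locale URL does not produce a second <url> entry.
        for loc in dict.fromkeys(cluster.values()):
            url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
            ET.SubElement(url, f"{{{SM_NS}}}loc").text = loc
            for code, href in cluster.items():
                ET.SubElement(url, f"{{{XHTML_NS}}}link",
                              rel="alternate", hreflang=code, href=href)
    return ET.tostring(urlset, encoding="unicode")
```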
Method 3 — HTTP Link header
The only option for non-HTML files like PDFs, where there is no <head> to edit. Delivered as a response header.
```http
Link: <https://example.com/en/manual.pdf>; rel="alternate"; hreflang="en",
      <https://example.com/de/handbuch.pdf>; rel="alternate"; hreflang="de"
```
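The header value is easy to assemble from the same cluster data. This is a minimal sketch, and link_header() is a hypothetical helper. How you attach the value (server config, edge function, or application code) depends on your stack.

```python
def link_header(cluster: dict[str, str]) -> str:
    """Build one Link header value listing every alternate in the cluster."""
    return ", ".join(
        f'<{url}>; rel="alternate"; hreflang="{code}"'
        for code, url in cluster.items()
    )
```

Serve every PDF in the set with this same header. That way the en and de responses confirm each other, just as the HTML tags do.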
Picking a method
| Method | Best for | Watch out for |
|---|---|---|
| HTML <link> | Small/medium sites, full template control | Page weight; every page edited on locale change |
| XML sitemap | Large sites, programmatic generation | Sitemap must stay perfectly in sync with live URLs |
| HTTP header | PDFs and other non-HTML assets | Needs server/edge config; easy to forget |
Getting the codes right
The value is language or language-REGION:
- Language only: en, de, zh, fr. Targets speakers of that language anywhere.
- Language + region: en-GB, en-US, zh-CN, pt-BR. Targets speakers of that language in that country.
The language code is ISO 639-1 (two letters). The region code is ISO 3166-1 alpha-2 (two letters, a country, not a continent or language). The classic trap: the UK’s country code is GB, not UK. And x-default is the fallback bucket — the page shown to anyone whose language/region you do not explicitly cover (more on it next). Case is not enforced, but the conventional form is lowercase language, uppercase region: zh-CN, en-GB.
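A cheap shape check catches the most common code mistakes before they ship. This sketch is not a full validator. It only confirms the ll or ll-RR pattern and flags a few known-bad regions (UK, EU). Real validation should test membership in the complete ISO 639-1 and ISO 3166-1 lists. The pattern also rejects script subtags such as zh-Hant, which you may need to allow. The names check_hreflang(), normalize() and KNOWN_BAD_REGIONS are hypothetical.

```python
import re

# Two-letter language, optionally followed by a two-letter region.
CODE_RE = re.compile(r"([a-z]{2})(?:-([a-z]{2}))?", re.IGNORECASE)

# A few regions people commonly invent; deliberately not exhaustive.
KNOWN_BAD_REGIONS = {"UK": "use GB for the United Kingdom", "EU": "not a country code"}


def check_hreflang(value: str) -> list[str]:
    """Return problems with one hreflang value; an empty list means it passed."""
    if value.lower() == "x-default":
        return []
    match = CODE_RE.fullmatch(value)
    if not match:
        return [f"{value!r}: expected language or language-REGION, e.g. en or en-GB"]
    region = (match.group(2) or "").upper()
    if region in KNOWN_BAD_REGIONS:
        return [f"{value!r}: region {region} is invalid ({KNOWN_BAD_REGIONS[region]})"]
    return []


def normalize(value: str) -> str:
    """Apply the conventional casing: lowercase language, uppercase region."""
    if value.lower() == "x-default":
        return "x-default"
    lang, _, region = value.partition("-")
    return f"{lang.lower()}-{region.upper()}" if region else lang.lower()
```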
Common mistakes — the failure modes that cost hours
hreflang fails silently. There is no error page; the tags simply get ignored and your locales drift in the rankings. These are the recurring causes.
- Missing return tags. Page A points to B, but B does not point back to A. This is the single most common failure, and it invalidates the whole cluster, not just the broken link. Always render the complete, symmetric set on every page.
- Wrong or invalid codes. en-UK instead of en-GB. Inventing region codes (en-EU: there is no EU country). Using a language where a region belongs. A single invalid entry can void the set. Validate against the ISO lists.
- Relative URLs. hreflang requires absolute URLs with protocol and host: https://example.com/de/produkt, never /de/produkt. Relative URLs are ignored.
- canonical fighting hreflang. This is the subtle killer. Each locale page's canonical must be self-referential: /de/produkt canonicalizes to /de/produkt. If every locale instead canonicalizes to the English version, you are telling Google “these are duplicates, only index the English one,” which directly contradicts hreflang's “these are distinct locale equivalents.” The canonical wins, and your translations vanish from the index. Rule: self-referencing canonical + hreflang set, on every page.
- Missing x-default. Not strictly required, but without it Google has no declared fallback for unmatched users. Always include an x-default, usually pointing at a language selector page or your most general/default-language version.
- Annotating redirecting or non-200 URLs. Every URL in the set must return 200 OK and be indexable. Pointing hreflang at a URL that 301-redirects or 404s breaks that node of the cluster.
- Mixing delivery methods inconsistently. Sitemap says one thing, HTML head says another. Pick one source of truth.
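Several of these failure modes are visible in a single page's own data. Here is a sketch of a per-page lint. It assumes you already have the page URL, its canonical href, and its hreflang dict. The name lint_page() is hypothetical. The 200-status check is left out because it needs live HTTP requests.

```python
from urllib.parse import urlsplit


def lint_page(url: str, canonical: str, cluster: dict[str, str]) -> list[str]:
    """Flag relative URLs, a non-self canonical, a missing self-reference,
    and a missing x-default for one page. HTTP status is not checked."""
    problems = []
    for code, href in cluster.items():
        parts = urlsplit(href)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"{code}: {href!r} is not an absolute URL")
    if canonical != url:
        problems.append(f"canonical is {canonical!r}; it should be the page itself")
    if url not in cluster.values():
        problems.append("the page does not list itself in its own hreflang set")
    if "x-default" not in cluster:
        problems.append("no x-default entry")
    return problems
```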
⚠️ Note: hreflang is a clustering and routing hint, not a ranking boost. It will not lift a weak page. It will, when correct, make sure the right already-ranking page is the one shown to each searcher. Expect “users now land on the correct locale,” not “traffic doubled.”
Language vs region targeting — don’t out-compete yourself
The hardest strategic call is how granular to go. More locale pages is not better; it multiplies maintenance and risks self-competition.
Target by language alone (en, de, fr) when the content is genuinely identical for all speakers of that language regardless of country — same product, same pricing, same copy. One German page serving Germany, Austria, and Switzerland is simpler and concentrates signals on a single URL.
Target by language + region (en-US, en-GB) when the same language needs different content per market: different prices or currency, different shipping or availability, region-specific legal text, or spelling/idiom differences worth the split (color/colour, fall/autumn).
The danger zone is same language, multiple regions with near-identical content — en-US and en-GB pages that differ only by a price string. To Google these look like duplicates competing for the same queries, the classic self-cannibalization trap. Two ways out:
- Differentiate enough to justify the split — distinct pricing, local examples, market-specific sections — and wire up correct hreflang so Google routes by region instead of choosing for you.
- Or collapse to language-only (en with x-default) if the markets truly do not differ. One strong page beats two weak twins.
💡 Tip: a useful test — if a US and UK visitor would be equally well served by the same page, you don’t need separate region pages. Split only when the experience genuinely diverges.
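To see why collapsing helps, consider a deliberately simplified model of hreflang routing. It tries an exact language-region match first, then language only, then x-default. This illustrates the targeting intent described in this section and is not Google's actual selection logic. The name pick() is hypothetical.

```python
from typing import Optional


def pick(cluster: dict[str, str], lang: str, region: str) -> Optional[str]:
    """Simplified routing model: exact language-region, then language only,
    then x-default. Illustrative only; not Google's selection algorithm."""
    by_code = {code.lower(): url for code, url in cluster.items()}
    return (
        by_code.get(f"{lang}-{region}".lower())
        or by_code.get(lang.lower())
        or by_code.get("x-default")
    )
```

With only en and x-default, US and UK searchers resolve to the same URL, so there is no twin to cannibalize. With en-US and en-GB, each market gets its own page and everyone else falls through to x-default.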
Tooling & validation
hreflang is too error-prone to verify by eye at scale. Build it into your workflow.
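- Google Search Console. The International Targeting report that once listed hreflang errors (“no return tags,” “unknown language code”) was retired in 2022, so Search Console no longer audits hreflang directly. It is still the ground truth for what Google does with your locales. URL Inspection shows the user-declared versus Google-selected canonical for any locale URL. On a translation, a Page indexing status of “Duplicate, Google chose different canonical than user” is the classic sign of a canonical/hreflang conflict. Check it after every locale rollout.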
- hreflang validators. Dedicated checkers (Merkle’s hreflang tags testing tool, TechnicalSEO.com, Aleyda Solis’s generator) take a URL and verify reciprocity, code validity, and absolute URLs. Use the generator to produce a correct first set, then a validator to confirm the live pages.
- Crawlers. Screaming Frog, Sitebulb, or Ahrefs' site audit crawl the whole site and flag missing return tags, non-200 alternates, and canonical/hreflang conflicts across thousands of URLs at once. This is the only practical way to audit a large site.
- curl for headers and a quick head check. For one-off verification of HTTP-header hreflang or of the hreflang tags in a page's head:

```shell
# Inspect HTTP Link-header hreflang (e.g. on a PDF)
curl -sI https://example.com/en/manual.pdf | grep -i '^link:'

# Print every <link> tag carrying hreflang in the server-delivered HTML
curl -s https://example.com/en/product | grep -io '<link[^>]*hreflang[^>]*>'
```
🧑‍💻 Developer view: add a CI check. Crawl your built output, parse every page's hreflang set, and assert reciprocity, absolute URLs, and self-referential canonicals before deploy. A 30-line script in your pipeline catches the missing-return-tag bug at PR time instead of three weeks later in Search Console.
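Here is a sketch of that CI check using only Python's standard library. It makes two assumptions. First, the build output is directory-style (dist/en/product/index.html served at https://example.com/en/product). Second, hrefs match those URLs exactly as strings. Adjust load_dist() if your trailing-slash convention differs. HeadLinks, audit() and load_dist() are hypothetical names.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path
from typing import Optional


class HeadLinks(HTMLParser):
    """Collect <link rel="alternate" hreflang> and <link rel="canonical">."""

    def __init__(self) -> None:
        super().__init__()
        self.alternates: dict[str, str] = {}
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rel = (a.get("rel") or "").lower()
        if rel == "alternate" and a.get("hreflang") and a.get("href"):
            self.alternates[a["hreflang"]] = a["href"]
        elif rel == "canonical":
            self.canonical = a.get("href")


def parse(html: str) -> tuple[Optional[str], dict[str, str]]:
    parser = HeadLinks()
    parser.feed(html)
    return parser.canonical, parser.alternates


def audit(pages: dict[str, str]) -> list[str]:
    """pages maps each page's public URL to its built HTML."""
    parsed = {url: parse(html) for url, html in pages.items()}
    errors = []
    for url, (canonical, alts) in parsed.items():
        if canonical != url:
            errors.append(f"{url}: canonical is {canonical!r}, expected self")
        for code, href in alts.items():
            if not href.startswith(("https://", "http://")):
                errors.append(f"{url}: {code} href {href!r} is not absolute")
            elif code != "x-default" and href != url:
                back = parsed.get(href, (None, {}))[1]
                if url not in back.values():
                    errors.append(f"{url}: {href} has no return link")
    return errors


def load_dist(root: Path, site: str) -> dict[str, str]:
    """Map dist/<path>/index.html to <site>/<path> (assumed URL scheme)."""
    pages = {}
    for f in sorted(root.rglob("index.html")):
        rel = f.parent.relative_to(root).as_posix()
        url = site + "/" if rel == "." else f"{site}/{rel}"
        pages[url] = f.read_text(encoding="utf-8")
    return pages


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_hreflang.py <dist-dir> <site-origin>
    problems = audit(load_dist(Path(sys.argv[1]), sys.argv[2].rstrip("/")))
    print("\n".join(problems) or "hreflang OK")
    sys.exit(1 if problems else 0)
```

In the pipeline you would run something like python check_hreflang.py dist https://example.com (the script name is illustrative). A non-zero exit fails the build.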
🧑‍💻 This site as a worked example
This site is itself bilingual (English and Chinese), so it lives by these rules. It is built with Astro’s i18n support: each guide exists under both /en/ and /zh/ path prefixes (subdirectories — the recommendation above), driven by a single locale config that maps each locale to its hreflang value and path prefix. That config is the single source of truth from the Developer-view tip — the language switcher, the routing, and the hreflang tags all read from it.
The hreflang annotations are generated automatically by Astro’s sitemap integration rather than hand-written into every page head. When the site builds, the integration walks the locale config, emits the full bidirectional xhtml:link cluster for each translated page (including x-default), and writes one sitemap. Adding a new locale is a config edit, not a sweep through hundreds of files — and reciprocity is guaranteed by construction, because the generator emits the symmetric set every time.
This is the framework’s Layer 7 (Advanced & Extensions) in action — international SEO is an advanced concern you layer on once the fundamentals are solid. But it is implemented down in Layer 2 (the Build layer): the i18n routing and sitemap generation are wired into the site’s foundation, not bolted on later. If you want the mechanics of how the static output and sitemap are produced, the Edge SEO and build-layer guides cover the deploy and generation pipeline that makes this automatic.
Key takeaways
- ✅ Default to subdirectories (/de/, /en-gb/): they consolidate authority, cost the least, and are what static generators produce naturally. Reserve ccTLDs for strong geo-trust needs.
- ✅ Make every hreflang set bidirectional and self-referential, with absolute URLs and an x-default. One broken return tag voids the whole cluster.
- ✅ Keep canonicals self-referential on each locale; never canonicalize translations to the default language, or they drop from the index.
- ✅ Use ISO codes correctly: language is ISO 639-1, region is ISO 3166-1 alpha-2 (en-GB, not en-UK).
- ✅ Split by region only when content genuinely differs; otherwise target by language to avoid self-cannibalization.
- ✅ Validate continuously — Search Console + a crawler + a CI reciprocity check, because hreflang fails silently.