The Internet's Lingua Franca: We Classified 1.74 Million Websites Across 54 Country Domains — 64% Are in English

English is the native language of roughly 380 million people, about 4.7% of the global population. It is a second language for perhaps another 750 million, depending on how generously you define fluency. By native-speaker count, English is a minority language on a planet where Mandarin, Hindi, and Spanish each claim more.

On the internet, demographics do not apply.

W3Techs reports that English accounts for 49.5% of content across the top 10 million websites — a figure that has held roughly stable since 2019 while every other language's share has either stagnated or declined. The conventional reading is that English dominance is gradually diluting as non-English-speaking populations come online. The data tells a different story. When you move beyond content volume on global TLDs and examine what language actual websites use on country-code domains — the namespaces explicitly created for national communities — the English footprint is not 49.5%. It is substantially higher, and it reaches places no one expected.

We used LLMSE, DomainsProject's AI-powered website classification engine, to detect the language of 1,744,730 websites drawn from the DomainsProject dataset and indexed in the LLMSE classification pipeline. The sample spans every major country-code TLD and dozens of generic TLDs, with language detection performed via HTML lang attribute extraction with fallback to the lingua computational linguistics library. We then cross-referenced these language classifications against the full DomainsProject dataset — over 600 million domain records across 244 country-code namespaces — to contextualize the findings against the scale of each national namespace. External triangulation draws on the EF English Proficiency Index (2025 edition, 2.2 million test-takers across 123 countries), W3Techs content language surveys (April 2026), and Ethnologue language data.

The headline: 1,121,764 of the 1,744,730 classified websites — 64.3% — are in English. Among the 54 country-code TLDs with sufficient sample sizes for analysis, English exceeds 90% on 8 ccTLDs, exceeds 50% on 16, and falls below 15% on only 8. Japan's .jp, at 7.0% English, is the global floor. Australia's .au, at 98.3%, is the ceiling. Between those two extremes, the data reveals not a single story of English dominance but five distinct mechanisms — colonial inheritance, commercial necessity, cultural resistance, tech-brand capture, and neutral-language default — each producing a measurably different English footprint on the domains of countries that, in most cases, do not speak English at home.

The Data

LLMSE continuously classifies websites using a combination of HTML metadata extraction, computational language detection, and LLM-powered content analysis. Here is the dataset scope for this analysis:

Metric | Value
Total websites classified | 1,744,730
Websites with detected language | 1,688,092
Languages detected | 146
English-language websites | 1,121,764 (64.3%)
Country-code TLDs analyzed | 54 (min. 100 classified sites each)
DomainsProject domain records (context) | 600M+ across 244 ccTLDs
Classification period | Q1 2026

The 1.74 million classified websites are drawn from the DomainsProject crawl pipeline and represent active, resolving websites — not parked pages, DNS-only records, or redirect chains. Language detection succeeded for 96.8% of classified sites. The remaining 3.2% (56,638 sites) returned no detectable language — typically single-page JavaScript applications, image-only pages, or sites behind authentication walls.

Global Language Distribution (Top 15)

Percentages in this table are calculated against the 1,688,092 language-detected sites (excluding 56,638 with no detectable language). The headline 64.3% figure uses the full 1,744,730 denominator.

Rank | Language | Websites | Share
1 | English | 1,121,764 | 66.5%
2 | German | 100,548 | 6.0%
3 | French | 56,705 | 3.4%
4 | Spanish | 46,355 | 2.7%
5 | Dutch | 42,008 | 2.5%
6 | Chinese | 41,489 | 2.5%
7 | Japanese | 35,849 | 2.1%
8 | Portuguese | 26,785 | 1.6%
9 | Russian | 20,142 | 1.2%
10 | Polish | 18,939 | 1.1%
11 | Vietnamese | 15,680 | 0.9%
12 | Italian | 13,968 | 0.8%
13 | Czech | 12,368 | 0.7%
14 | Indonesian | 12,083 | 0.7%
15 | Turkish | 11,313 | 0.7%

English at 66.5% of language-detected websites is 11x the share of the next language (German at 6.0%). The gap between first and second place is larger than the combined share of languages ranked 2 through 15. This is not a distribution with a long tail competing for the top — it is a distribution with one language and then everything else.
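Because the headline and the table use different denominators, it can help to recompute both. The following is a minimal Python sketch, not part of the LLMSE pipeline, that derives the two English shares and checks the dominance claim from the counts in the table above. The variable and function names are ours, chosen for illustration; only the counts come from the data.

```python
# Counts copied from the dataset scope and Top 15 tables above.
TOTAL_CLASSIFIED = 1_744_730
LANGUAGE_DETECTED = 1_688_092
TOP_15 = {
    "English": 1_121_764, "German": 100_548, "French": 56_705,
    "Spanish": 46_355, "Dutch": 42_008, "Chinese": 41_489,
    "Japanese": 35_849, "Portuguese": 26_785, "Russian": 20_142,
    "Polish": 18_939, "Vietnamese": 15_680, "Italian": 13_968,
    "Czech": 12_368, "Indonesian": 12_083, "Turkish": 11_313,
}


def share(count: int, denominator: int) -> float:
    """Percentage share rounded to one decimal place."""
    return round(100 * count / denominator, 1)


# Headline figure: English against every classified site.
headline_share = share(TOP_15["English"], TOTAL_CLASSIFIED)
# Table figure: English against language-detected sites only.
detected_share = share(TOP_15["English"], LANGUAGE_DETECTED)

# Dominance check: the gap between ranks 1 and 2 versus ranks 2-15 combined.
gap_first_to_second = TOP_15["English"] - TOP_15["German"]
ranks_2_to_15 = sum(n for lang, n in TOP_15.items() if lang != "English")

print(headline_share, detected_share)
print(gap_first_to_second > ranks_2_to_15)
```

The same English count yields 64.3% or 66.5% depending on whether the 56,638 undetected sites are kept in the denominator, which is why the two figures should not be compared directly.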

Methodology

Definitions

  • Classified website: A URL that was fetched, rendered, and processed through the LLMSE classification pipeline, producing at least one metadata classification (language, category, or quality metric). Each URL represents one website — typically the apex domain or www subdomain.
  • Detected language: The natural language of the website's primary text content, determined by the LLMSE language detection pipeline. A website is assigned exactly one language.
  • English prevalence (%): The percentage of language-detected websites on a given ccTLD that were detected as English-language. Calculated as: (English-language sites on TLD) / (total language-detected sites on TLD) × 100.
  • Country-code TLD (ccTLD): A two-letter top-level domain assigned to a country or territory per ISO 3166-1 alpha-2, plus .eu (European Union). We include .io and .ai as ccTLDs per their IANA delegation, while noting their anomalous usage patterns.
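To make the English prevalence definition concrete, here is a short Python sketch that applies the formula to three rows of the scorecard later in this post. The function name english_prevalence is hypothetical, and the sketch assumes the scorecard's "Classified" column is the per-TLD denominator.

```python
def english_prevalence(english_sites: int, detected_sites: int) -> float:
    """English prevalence (%) = English sites / language-detected sites * 100."""
    if detected_sites <= 0:
        raise ValueError("detected_sites must be positive")
    if not 0 <= english_sites <= detected_sites:
        raise ValueError("english_sites must be between 0 and detected_sites")
    return 100 * english_sites / detected_sites


# (English sites, classified sites) copied from the scorecard rows.
SCORECARD_ROWS = {
    ".au": (19_650, 19_982),
    ".eu": (11_427, 23_035),
    ".jp": (1_466, 20_911),
}

for tld, (english, total) in SCORECARD_ROWS.items():
    print(f"{tld}: {english_prevalence(english, total):.1f}%")
```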

Classification Heuristics

Language detection follows a two-stage pipeline:

  1. HTML lang attribute extraction (primary): If the HTML document declares a lang attribute on the <html> element, the ISO 639 code is resolved to a language name. This method is deterministic and precise but depends on correct markup — approximately 65% of websites in our sample provide a usable lang attribute.
  2. Computational detection via lingua (fallback): For the remaining ~35%, the extracted text content (meta description, keywords, and body text) is passed to the lingua language detection library, which uses statistical models trained on character n-grams and word frequency distributions. Lingua supports 75+ languages with high accuracy for texts over 50 characters.

A website is classified as "English" if either method identifies English as the primary language. No secondary or minority language detection is performed — a bilingual English/French website with English as the lang attribute is classified as English.
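The two-stage logic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LLMSE implementation: the function names, the four-entry ISO 639-1 table, and the pluggable fallback callable are all hypothetical. In the pipeline described here the fallback would be the lingua library; the sketch takes any callable so it runs with the standard library alone.

```python
from html.parser import HTMLParser
from typing import Callable, Optional

# Tiny illustrative subset of ISO 639-1 codes; a real table would be far larger.
ISO_639_1 = {"en": "English", "de": "German", "fr": "French", "ja": "Japanese"}


class _HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag that declares one."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def language_from_lang_attribute(html: str) -> Optional[str]:
    """Stage 1: resolve <html lang="..."> to a language name, if usable."""
    parser = _HtmlLangParser()
    parser.feed(html)
    if not parser.lang:
        return None
    # "en-GB" and "en_US" both reduce to the primary subtag "en".
    primary = parser.lang.strip().lower().replace("_", "-").split("-")[0]
    return ISO_639_1.get(primary)


def detect_language(
    html: str,
    text: str,
    fallback: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Stage 1 first; stage 2 (statistical fallback) only if stage 1 fails."""
    declared = language_from_lang_attribute(html)
    if declared is not None:
        return declared
    return fallback(text) if text.strip() else None
```

A call such as detect_language(page_html, page_text, fallback=my_detector) returns exactly one language name or None, matching the single-language rule above. Note that the sketch, like the described pipeline, trusts a declared lang attribute even when the body text is in another language.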

Dataset Scope

  • The 1.74 million classified websites are a convenience sample drawn from the LLMSE classification queue, not a statistically random sample of all websites on each ccTLD. The classification queue is populated from the DomainsProject crawl pipeline, which enumerates domains via active DNS resolution across all delegated TLDs.
  • Sample sizes per ccTLD range from 134 (.zw) to 342,471 (.io). The median sample size across the analyzed ccTLDs is 4,602. Statistical significance varies by TLD: findings for ccTLDs with fewer than 1,000 classified sites should be treated as directional indicators rather than precise measurements.
  • Classification reflects the state of websites at the time of LLMSE processing during Q1 2026. Websites may change language over time.
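One way to see why small samples are only directional is to put an interval around each percentage. The sketch below uses the Wilson score interval, a standard method we are bringing in for illustration; it is not part of the published methodology. It also captures only sampling noise: because this is a convenience sample, it says nothing about selection bias. The counts come from the .th and .io scorecard rows.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# (English sites, classified sites) from the scorecard.
th_low, th_high = wilson_interval(621, 1_086)        # .th, a small sample
io_low, io_high = wilson_interval(304_985, 342_471)  # .io, the largest sample

print(f".th: {100 * th_low:.1f}% to {100 * th_high:.1f}%")
print(f".io: {100 * io_low:.1f}% to {100 * io_high:.1f}%")
```

For .th the interval spans roughly three percentage points either side of the point estimate, while for .io it is a small fraction of a point, so rankings among small-sample ccTLDs with similar percentages should not be over-read.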

Known Limitations

  • Convenience sample, not random sample: The LLMSE classification queue oversamples websites that appear in DNS crawls and undersamples sites behind CDN-only configurations or those with non-standard DNS patterns. This may bias toward established, well-configured websites and away from informal or community-run sites.
  • Single-language classification: Bilingual and multilingual websites are assigned one language based on the primary signal. In multilingual countries (Belgium, Switzerland, Canada), this approach undercounts minority languages on individual domains.
  • HTML lang attribute reliability: Some websites declare lang="en" by default regardless of content language — a known issue with CMS templates and boilerplate HTML. This may slightly inflate English counts for non-English websites using English-default templates.
  • Sample size variation: .io (342,471) and .de (83,961) have 2,500x and 627x the sample size of .zw (134). Per-TLD percentages should be compared with awareness of this variance.
  • No longitudinal data: This is a single-snapshot analysis. Claims about trends, shifts, or trajectories are not supported by the data and are not made in this post.

The full DomainsProject dataset is available for independent analysis at /dataset. LLMSE classification data is accessible via the LLMSE platform.

The Scorecard

English Prevalence Across 54 Country-Code TLDs

Rank | ccTLD | Country/Territory | Classified | English | English % | Top Local Language | Local %
1 | .au | Australia | 19,982 | 19,650 | 98.3% | |
2 | .nz | New Zealand | 3,959 | 3,883 | 98.1% | |
3 | .ie | Ireland | 2,661 | 2,607 | 98.0% | Irish | 0.5%
4 | .uk | United Kingdom | 41,919 | 40,950 | 97.7% | |
5 | .za | South Africa | 6,885 | 6,689 | 97.2% | Afrikaans | 0.2%
6 | .sg | Singapore | 1,747 | 1,695 | 97.0% | Chinese | 0.6%
7 | .in | India | 11,158 | 10,702 | 95.9% | Hindi | 0.8%
8 | .ph | Philippines | 1,233 | 1,159 | 94.0% | Tagalog | 2.1%
9 | .ca | Canada | 14,712 | 13,165 | 89.5% | French | 8.8%
10 | .io | Br. Indian Ocean Terr. | 342,471 | 304,985 | 89.1% | Chinese | 2.8%
11 | .my | Malaysia | 2,604 | 2,313 | 88.8% | Chinese | 3.1%
12 | .us | United States | 5,297 | 4,633 | 87.5% | Vietnamese | 1.9%
13 | .ai | Anguilla | 39,303 | 34,140 | 86.9% | French | 2.1%
14 | .hk | Hong Kong | 1,410 | 941 | 66.7% | Chinese | 27.1%
15 | .id | Indonesia | 4,881 | 3,180 | 65.2% | Indonesian | 32.8%
16 | .th | Thailand | 1,086 | 621 | 57.2% | Thai | 34.3%
17 | .eu | European Union | 23,035 | 11,427 | 49.6% | German | 11.4%
18 | .rs | Serbia | 1,951 | 811 | 41.6% | Serbian | 35.1%
19 | .ee | Estonia | 2,040 | 826 | 40.5% | Estonian | 49.0%
20 | .ro | Romania | 7,097 | 2,848 | 40.1% | Romanian | 56.9%
21 | .gr | Greece | 4,572 | 1,775 | 38.8% | Greek | 53.4%
22 | .hr | Croatia | 2,252 | 809 | 35.9% | Croatian | 62.4%
23 | .lt | Lithuania | 1,323 | 444 | 33.6% | Lithuanian | 62.1%
24 | .be | Belgium | 9,298 | 2,842 | 30.6% | Dutch | 46.3%
25 | .bg | Bulgaria | 1,444 | 430 | 29.8% | Bulgarian | 64.0%
26 | .lv | Latvia | 1,195 | 353 | 29.5% | Latvian | 63.8%
27 | .ar | Argentina | 4,928 | 1,452 | 29.5% | Spanish | 68.2%
28 | .vn | Vietnam | 3,704 | 1,091 | 29.5% | Vietnamese | 68.3%
29 | .nl | Netherlands | 42,240 | 12,368 | 29.3% | Dutch | 69.2%
30 | .cn | China | 6,636 | 1,911 | 28.8% | Chinese | 50.0%
31 | .ch | Switzerland | 13,674 | 3,940 | 28.8% | German | 51.5%
32 | .se | Sweden | 8,525 | 2,384 | 28.0% | Swedish | 66.3%
33 | .pt | Portugal | 3,916 | 1,098 | 28.0% | Portuguese | 68.6%
34 | .mx | Mexico | 5,133 | 1,436 | 28.0% | Spanish | 68.6%
35 | .de | Germany | 83,961 | 23,429 | 27.9% | German | 70.2%
36 | .no | Norway | 4,347 | 1,213 | 27.9% | Norwegian Bokmål | 37.3%
37 | .dk | Denmark | 8,971 | 2,181 | 24.3% | Danish | 71.8%
38 | .cl | Chile | 4,109 | 984 | 23.9% | Spanish | 73.8%
39 | .hu | Hungary | 7,400 | 1,757 | 23.8% | Hungarian | 73.0%
40 | .il | Israel | 1,992 | 471 | 23.6% | Hebrew | 72.0%
41 | .tr | Turkey | 3,507 | 813 | 23.2% | Turkish | 75.0%
42 | .fi | Finland | 4,602 | 1,017 | 22.1% | Finnish | 74.6%
43 | .sk | Slovakia | 3,696 | 786 | 21.3% | Slovak | 68.3%
44 | .tw | Taiwan | 3,607 | 756 | 21.0% | Chinese | 68.0%
45 | .at | Austria | 9,558 | 1,765 | 18.5% | German | 79.5%
46 | .es | Spain | 10,841 | 1,913 | 17.6% | Spanish | 78.4%
47 | .fr | France | 25,867 | 3,804 | 14.7% | French | 82.9%
48 | .pl | Poland | 17,594 | 2,533 | 14.4% | Polish | 83.7%
49 | .br | Brazil | 18,937 | 2,580 | 13.6% | Portuguese | 84.5%
50 | .cz | Czechia | 12,536 | 1,597 | 12.7% | Czech | 83.5%
51 | .kr | South Korea | 3,838 | 483 | 12.6% | Korean | 78.8%
52 | .ua | Ukraine | 9,449 | 904 | 9.6% | Russian | 45.5%
53 | .it | Italy | 6,726 | 623 | 9.3% | Italian | 89.3%
54 | .jp | Japan | 20,911 | 1,466 | 7.0% | Japanese | 83.4%

The spread from 98.3% (.au) to 7.0% (.jp) — a 91-point gap — is the single most important number in this table. It means the question "how much of the web is in English?" has no single answer. It depends entirely on which slice of the web you examine. On Australian domains, English is near-universal. On Japanese domains, English is a rounding error. The global aggregate of 64.3% obscures a variance so wide that it contains entirely different internets.

English Prevalence by Tier

Tier | English % | ccTLDs | Count | Defining Pattern
English-native | >85% | .au, .nz, .ie, .uk, .za, .sg, .in, .ph, .ca, .io, .my, .us, .ai | 13 | Native speakers or colonial inheritance
English-majority | 50–85% | .hk, .id, .th | 3 | Commerce and post-colonial bilingualism
Bilingual/mixed | 30–50% | .eu, .rs, .ee, .ro, .gr, .hr, .lt, .be | 8 | EU integration, transitional economies
Local-dominant | 15–30% | .nl, .de, .se, .dk, .no, .ch, .vn, .ar, .mx, .pt, .bg, .lv, .hu, .il, .tr, .fi, .sk, .tw, .cn, .at, .es, .cl | 22 | Strong domestic language, English for commerce
Strong resistance | <15% | .fr, .pl, .br, .cz, .kr, .ua, .it, .jp | 8 | Large domestic internet ecosystems

The largest tier is "local-dominant" at 22 ccTLDs: countries where the local language typically holds 65-80% of the namespace (less in split or multilingual namespaces such as .ch, .cn, and .no) and English holds a 15-30% commercial minority. This is the equilibrium state for most of the world's internet: local language for local audiences, English for international commerce and technical content. The 8 "strong resistance" ccTLDs are the exception that proves the rule. These are countries with domestic internet economies large enough to sustain entirely local-language ecosystems.
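The tier boundaries can be expressed as a small lookup. The Python sketch below is illustrative only: the function name tier_for is hypothetical, and the handling of values that fall exactly on a boundary (assigned to the lower tier) is our assumption, since the tier table does not specify it.

```python
# (exclusive lower bound, tier label), ordered from the highest tier down.
TIERS = [
    (85.0, "English-native"),
    (50.0, "English-majority"),
    (30.0, "Bilingual/mixed"),
    (15.0, "Local-dominant"),
]
LOWEST_TIER = "Strong resistance"


def tier_for(english_pct: float) -> str:
    """Return the tier label for an English prevalence percentage."""
    if not 0.0 <= english_pct <= 100.0:
        raise ValueError("english_pct must be between 0 and 100")
    for lower_bound, label in TIERS:
        if english_pct > lower_bound:
            return label
    # Assumption: a value exactly on a boundary falls into the lower tier.
    return LOWEST_TIER


# A few English % values copied from the scorecard.
SAMPLE = {".au": 98.3, ".hk": 66.7, ".be": 30.6, ".bg": 29.8, ".fr": 14.7, ".jp": 7.0}
for tld, pct in SAMPLE.items():
    print(f"{tld}: {pct}% -> {tier_for(pct)}")
```

Belgium (30.6%) and Bulgaria (29.8%) land in different tiers despite a gap of less than one percentage point, a reminder that the tiers are descriptive bins rather than natural breaks in the data.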

The Colonial Footprint: English Where the Empire Built the Internet

The most striking pattern in the scorecard is not that English-speaking countries have English-language websites. It is that former British and American colonies, countries where English is spoken natively by a fraction of the population, show English prevalence rates that are in most cases indistinguishable from the Anglosphere itself.

ccTLD | Country | English % | Population (M) | Native English Speakers (%) | Official Language Status
.in | India | 95.9% | 1,428 | ~0.02% | Co-official (with Hindi)
.sg | Singapore | 97.0% | 5.9 | ~37% | Official (one of four)
.ph | Philippines | 94.0% | 117 | ~0.05% | Official (with Filipino)
.za | South Africa | 97.2% | 62 | ~8.1% | Official (one of eleven)
.my | Malaysia | 88.8% | 34 | ~1.5% | Second language
.hk | Hong Kong | 66.7% | 7.5 | ~3.5% | Official (with Chinese)

India is the defining case. With 1.4 billion people and roughly 260,000 native English speakers (0.02% of the population), India's .in domain space is 95.9% English. Hindi — spoken by 528 million people as a first language — accounts for 0.8% of classified .in websites. The disparity is not demographic. It is institutional. British colonial administration established English as the language of law, higher education, and central government. Post-independence India retained English as a co-official language precisely because it was no one's mother tongue — a neutral administrative register in a country with 22 scheduled languages. When India's formal-sector economy built its internet presence, it built it in the language its institutions already operated in.

The Philippines tells the same story with a different colonial path. English-medium instruction has been mandatory in Philippine schools since the Philippine Commission Act of 1901 — a US colonial policy that persisted through independence. Filipino and Tagalog dominate daily life, but the formal economy — banking, law, corporate communications — operates in English. Our data shows .ph at 94.0% English with Tagalog at just 2.1%. The gap between spoken language and web language is a colonial artifact preserved by institutional inertia.

Singapore at 97.0% English is the most complete example of post-colonial English assimilation. Singapore's language policy deliberately elevated English as the inter-ethnic lingua franca after independence in 1965, making it the medium of instruction across all government schools. Unlike India and the Philippines, where English coexists with dominant regional languages, Singapore engineered English into the operational foundation of the state. The 97.0% figure is not inherited passively — it reflects 60 years of deliberate policy.

South Africa at 97.2% presents an apparent paradox. English is the home language of only 8.1% of the population — behind Zulu (24.4%), Xhosa (16.3%), and Afrikaans (12.2%). Yet .za domains are 97.2% English, with Afrikaans registering at just 0.2%. The explanation follows the same institutional pattern: English dominates South African commerce, media, and digital communication despite being a demographic minority language. The domain data captures the formal-sector internet, which operates in the language of business, not the language of home.

The EF English Proficiency Index provides a useful triangulation point. South Africa scores 602 (13th globally, "Very High" proficiency) — consistent with strong institutional English. Malaysia at 581 (24th, "High" proficiency) aligns with our 88.8% English finding. The Philippines ranks 28th (score 569, "High" proficiency). Singapore was removed from the EF EPI in 2025 because EF reclassified it as a native English-speaking country — the strongest possible external validation of our 97.0% finding.

The post-colonial pattern is clear: in every former British or American colony in the table above except Hong Kong (66.7%), English prevalence on ccTLD domains exceeds 88%. This is not because these populations speak English at home. The data is consistent with a model where colonial-era institutions (law, finance, education, government) established English as the language of the formal sector, and the formal-sector internet inherited that default. The web reflects the language of institutions, not the language of people.

Commercial English: The Language of Global Trade Penetrates Local Namespaces

If colonial inheritance explains the >85% tier, a different mechanism operates below it, in the 20-67% range. These are mostly countries with no colonial relationship to English (Hong Kong, shown below for comparison, is the exception) that nevertheless host substantial English-language web presences, driven not by institutional inheritance but by commercial necessity.

The Indonesia Anomaly

ccTLD | Country | English % | English Speakers (est.) | EF EPI Rank
.id | Indonesia | 65.2% | ~10% | 80th (Low)
.th | Thailand | 57.2% | ~27% (tourism-influenced) | 116th (Very Low)
.hk | Hong Kong | 66.7% | ~53% (bilingual) | 39th (Moderate)

Indonesia at 65.2% English is the most counterintuitive data point in the entire analysis. Indonesia has 270 million people, a national language (Bahasa Indonesia) with near-universal adoption, and an English proficiency level that the EF rates 80th globally in the "Low" band. Yet nearly two-thirds of classified .id websites are in English. Indonesian — a language spoken by virtually the entire population — accounts for only 32.8% of classified websites.

The data is consistent with a commercial driver: Indonesia's export-oriented businesses, e-commerce platforms targeting international buyers, and tourism infrastructure register English-language websites to reach global audiences. Indonesia's tech startup ecosystem — Tokopedia, Gojek, Traveloka — builds in English for investor visibility and regional expansion across Southeast Asia. The .id namespace is not a mirror of Indonesian society; it is a mirror of Indonesian commerce.

Thailand at 57.2% is consistent with the tourism and commerce hypothesis. Thailand ranks 116th out of 123 countries on the EF EPI, deep in the "Very Low" proficiency band. Yet over half of classified .th websites are in English. Thailand is Southeast Asia's largest tourism economy, receiving over 35 million international visitors in 2024. Hotels, tour operators, and hospitality businesses publish in English because their customers read English, even if their staff barely speak it.

The European Ceiling: Where English Stops at 30%

Across Northern and Western Europe — countries with "Very High" English proficiency on the EF EPI — English prevalence on ccTLD domains clusters in a remarkably narrow band:

ccTLD | Country | English % | EF EPI Rank | EF EPI Score
.nl | Netherlands | 29.3% | 1st | 624
.de | Germany | 27.9% | 4th | 615
.se | Sweden | 28.0% | 8th | 609
.no | Norway | 27.9% | 5th | 613
.dk | Denmark | 24.3% | 7th | 611
.fi | Finland | 22.1% | 12th | 603
.at | Austria | 18.5% | 3rd | 616

The Netherlands ranks first in the world for English proficiency — and .nl is only 29.3% English. Germany is fourth — and .de is 27.9%. The EF EPI measures population English ability; our data measures what language people actually choose for their websites. The gap reveals that the decision to publish in English is not determined by whether a population can speak English but by whether it needs to.

The Dutch, Germans, Austrians, Swedes, Danes, Norwegians, and Finns are among the most English-proficient non-native populations on Earth. They overwhelmingly choose to publish their national websites in their national languages. English holds an 18-30% minority, the commercial and technical layer: international-facing businesses, tech companies, academic institutions, and export-oriented industries. The domestic internet (local businesses, government services, news media, community organizations) operates in the local language because its audience is local.

This 20-30% band appears to be the natural equilibrium point for English in countries with strong domestic languages and high English proficiency. Below this band, you find countries with either low English proficiency or large domestic internet ecosystems (or both). Above it, you find either former colonies or transitional economies integrating into global markets.

Language Resistance: Where the Domestic Internet Wins

At the bottom of the scorecard, eight ccTLDs show English prevalence below 15%. These are not small or underdeveloped internet markets — several host among the largest national domain namespaces in the world. Their low English percentages reflect internet economies that are large enough, and linguistically coherent enough, to function without English as a significant component.

ccTLD | Country | English % | Local Language | Local % | Dataset Domains
.jp | Japan | 7.0% | Japanese | 83.4% | 42,409,543
.it | Italy | 9.3% | Italian | 89.3% | 21,460,389
.ua | Ukraine | 9.6% | Russian/Ukrainian | 85.0% | 5,603,693
.kr | South Korea | 12.6% | Korean | 78.8% | 4,315,289*
.cz | Czechia | 12.7% | Czech | 83.5% | 10,517,658
.br | Brazil | 13.6% | Portuguese | 84.5% | 33,955,472
.pl | Poland | 14.4% | Polish | 83.7% | 16,304,968
.fr | France | 14.7% | French | 82.9% | 20,515,453

*South Korea's primary domain namespace is co.kr; the .kr figure represents the broader ccTLD.

Japan at 7.0% is the global floor for English on a major ccTLD. Japan's resistance is structural, not merely cultural. The Japanese writing system — a combination of hiragana, katakana, and kanji — creates a natural barrier to English-language content production and consumption. More importantly, Japan built its domestic internet ecosystem early and comprehensively: Yahoo! Japan (launched 1996), Rakuten, Line, and NTT's portal infrastructure created a Japanese-language internet that captured the market before global English-language platforms could establish dominance. With 42.4 million domain records in the DomainsProject dataset, .jp is the world's fourth-largest ccTLD by our record count — an entire internet operating almost exclusively in Japanese.

France at 14.7% is the only country in the dataset whose low English penetration is partially legislated. The Toubon Law (Loi n° 94-665 of August 4, 1994) mandates French as the language of commerce, public services, and advertising. For businesses headquartered in France, all commercial communication — including websites promoting products or services — must be in French, with foreign-language content permitted only when accompanied by a French translation "as legible, audible, and intelligible as the foreign version." Non-compliance carries criminal penalties of €750 per offense for individuals and €3,750 for corporations. France is not merely culturally resistant to English — it has encoded that resistance into statute.

Brazil at 13.6% reflects the scale of a self-contained Portuguese-language internet. With roughly 34 million domain records in the DomainsProject dataset, .br (including .com.br) is one of the world's largest national namespaces. Brazil's internet population of 181 million users consumes content primarily in Portuguese, produced by domestic media conglomerates (Globo, Folha, UOL), e-commerce platforms (Mercado Livre, Magazine Luiza), and social networks where Portuguese is the default. The 13.6% English layer is likely concentrated in the tech sector and international-facing businesses.

Ukraine at 9.6% English presents a unique case — not because of low English, but because of what the remaining 90.4% reveals. The .ua language breakdown is 45.5% Russian, 39.5% Ukrainian, and 9.6% English. This is a wartime dataset. Our single-snapshot data cannot establish whether these proportions represent a shift or a longstanding equilibrium — longitudinal data would be required to make directional claims. What the snapshot does show is near-parity between Russian and Ukrainian on .ua domains, a distribution that external reporting on post-2022 language policy changes suggests may not have existed a decade ago, though we cannot confirm that from our data alone.

Accidental Englishes: When a TLD Escapes Its Country

Two ccTLDs in the >85% tier arrived there through a mechanism entirely unrelated to colonial history, commercial incentive, or language policy. They arrived there because developers thought the letters looked cool.

ccTLD | Territory | Population | English % | Dataset Records | Actual Use
.io | British Indian Ocean Territory | ~3,000 (military) | 89.1% | 13,234,503 | Tech/developer namespace
.ai | Anguilla | ~15,753 | 86.9% | 3,450,000+ | AI/tech startup namespace

.io at 89.1% English is the internet's most successful case of semantic hijacking. As we documented in our .io analysis, the British Indian Ocean Territory has no permanent civilian population. Its 13.2 million domain records exist because "I/O" — input/output — is the foundational abstraction of computer science, and developers adopted .io as an on-brand TLD for tech projects. The 89.1% English rate reflects the global developer community's language of work, not any territorial characteristic.

.ai at 86.9% followed an identical path a decade later, driven by the AI boom. Anguilla, a Caribbean island of 15,753 people, found its ccTLD adopted by AI startups, machine learning platforms, and tech companies seeking the ".ai" brand association. The 86.9% English rate reflects Silicon Valley's lingua franca projected onto a Caribbean country code.

These two ccTLDs are instructive because they reveal what happens when a domain namespace has no local population to set a language default: it defaults to the language of whoever adopts it first. In both cases, that was the English-speaking global tech industry.

Multilingual Complexity: English as the Language Nobody Chose

The final mechanism is the most structurally interesting: what happens when a domain namespace serves a population with no single dominant language? The data suggests English fills the vacuum — not as a chosen national language, but as the neutral default that no constituency objects to.

The .eu Paradox

Language | .eu Share | EU Population Share (approx.)
English | 49.6% | ~1% (native), ~38% (any competence)
German | 11.4% | 18% (native)
French | 10.7% | 12% (native)
Polish | 5.5% | 8% (native)
Dutch | 5.2% | 5% (native)
Italian | 4.0% | 13% (native)
Spanish | 2.5% | 8% (native)
Czech | 2.3% | 2% (native)

English at 49.6% of .eu domains is 4.3x the share of the next language (German at 11.4%) — despite English being the native language of roughly 1% of the post-Brexit EU population. Ireland and Malta are the remaining English-official member states, with a combined 5.5 million people in a union of 450 million. English persists on .eu not because EU citizens speak it at home, but because it functions as the EU's administrative lingua franca — the language of cross-border commerce, pan-European institutions, and multinational organizations. When a company registers a .eu domain, it chooses the language that the widest audience across 27 member states can read. That language is English, by default.

Belgium and Switzerland: National Multilingualism, English as Arbitrator

Rank | .be (Belgium) | .ch (Switzerland)
Language 1 | Dutch 46.3% | German 51.5%
Language 2 | English 30.6% | English 28.8%
Language 3 | French 19.4% | French 14.8%
Language 4 | German 1.6% | Italian 1.8%

In Belgium, English at 30.6% is the second most common language on .be domains, ahead of French (19.4%) in a country where French is one of three official languages. Belgium is officially trilingual (Dutch, French, German), with the population split roughly 60/40 between Dutch-speaking Flanders and French-speaking Wallonia. On .be domains, Dutch leads at 46.3% (consistent with Flanders' economic dominance), but English at 30.6% sits ahead of French in second place. English is not a Belgian language; it is the language Belgian businesses use when they want to be understood by both Flemish and Walloon audiences without choosing sides.

Switzerland shows the same pattern with a German tilt. German (51.5%) reflects the German-speaking majority, but English (28.8%) outperforms French (14.8%) and Italian (1.8%) — Switzerland's second and third official languages. In a country with four national languages, the commercially neutral option is none of them.

These multilingual cases reveal English's most subtle function on the internet: it is not the language people want to use. It is the language people use when using their own language would exclude part of their audience. In the EU, in Belgium, in Switzerland, English prevalence is a measure of linguistic fragmentation — the more languages a namespace must serve, the more it defaults to the one that offends no one.

What's at Stake

  • The 64.3% figure may understate English dominance in commerce and technology. If the analysis were restricted to business-facing websites (excluding personal blogs, community sites, and government services), the English share would likely be significantly higher. The commercial internet is probably more English than the internet at large.

  • Post-colonial English prevalence on ccTLDs is nearly invisible in language policy debates. India's .in at 95.9% English means that the country's digital economy operates in a language spoken natively by 0.02% of its population. The implications for digital inclusion — access to information, government services, e-commerce — are structural and largely unexamined.

  • The 20-30% European ceiling is consistent with an equilibrium rather than an expanding frontier. The Netherlands, Germany, and Scandinavia demonstrate that high English proficiency does not lead to English dominance on national domains. Population ability and web-publishing behavior are different variables. Whether English's share of these namespaces is stable or growing cannot be established from a single snapshot.

  • Japan, Italy, France, and Brazil demonstrate that linguistic resistance at scale is achievable — but only when a country has both a large domestic internet ecosystem and a commercially self-sufficient language. Countries with smaller internet economies and minority languages face a different equation: their content is either in English or invisible.

  • The .eu case is a leading indicator for supranational digital governance. If the EU's digital single market operates 49.6% in English post-Brexit, the question of whether EU digital regulation should privilege multilingualism or accept English as the de facto standard is no longer theoretical — the market has already answered.

  • LLMSE classification data at 1.74 million sites is the largest known language-by-TLD cross-reference. W3Techs surveys the top 10 million sites globally but does not break down by ccTLD. Our data provides a ccTLD-level view that has not previously been available at this scale.

What Would Help

1. Registries: publish language metadata alongside registration data. To our knowledge, no ccTLD registry currently exposes the language of registered websites in its zone or registration data. Adding a voluntary lang field to WHOIS/RDAP records would enable real-time linguistic monitoring of national namespaces without requiring external classification infrastructure.

2. CMS developers: default the lang attribute correctly. Our methodology relies on HTML lang attributes as the primary signal, but approximately 35% of websites in our sample lack a usable one, and some of the rest declare en by default regardless of content language. WordPress, Wix, Squarespace, and other major platforms should set the lang attribute based on the user's configured language, not the template's origin language. This is both an accessibility requirement (WCAG 3.1.1) and a data quality issue: the LLMSE classification pipeline would produce higher-confidence results with better source markup.

3. Researchers: normalize by ccTLD, not by global aggregate. The headline statistic that "English is 49.5% of the web" (W3Techs) or "64.3% of classified sites" (our data) is misleading without TLD-level disaggregation. English is 7.0% of the Japanese web and 98.3% of the Australian web — per-TLD breakdowns are available at /stats/. Any serious analysis of web language diversity must specify which namespace it is measuring.

4. Policymakers: examine the post-colonial English default on ccTLDs. India (.in at 95.9% English), the Philippines (.ph at 94.0%), and South Africa (.za at 97.2%) have formal-sector internet economies that operate almost exclusively in English despite multilingual populations. Digital inclusion initiatives should assess whether critical services — health information, government portals, legal resources — are accessible in the languages people actually speak, not just the language the web defaults to.

5. The EU: reconcile digital single market policy with the 49.6% .eu English reality. Post-Brexit, the EU's commitment to multilingualism is a stated policy goal but not a digital reality. If half of .eu domains default to English, digital regulation that assumes multilingual compliance may be building on a foundation that the market has already eroded.
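The case for per-ccTLD normalization in recommendation 3 can be shown with four scorecard rows. The Python sketch below is a hedged illustration rather than a reanalysis: the function names are hypothetical, and the four ccTLDs are chosen only because their sample sizes differ widely. It contrasts a pooled share, where large samples such as .io dominate, with an unweighted mean of per-ccTLD shares.

```python
from typing import Dict, Tuple

# (English sites, classified sites) copied from the scorecard.
ROWS: Dict[str, Tuple[int, int]] = {
    ".io": (304_985, 342_471),
    ".de": (23_429, 83_961),
    ".jp": (1_466, 20_911),
    ".au": (19_650, 19_982),
}


def pooled_share(rows: Dict[str, Tuple[int, int]]) -> float:
    """English % over all sites pooled together (weighted by sample size)."""
    english = sum(e for e, _ in rows.values())
    total = sum(n for _, n in rows.values())
    return 100 * english / total


def mean_of_shares(rows: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of per-ccTLD English %, one vote per namespace."""
    return sum(100 * e / n for e, n in rows.values()) / len(rows)


pooled = pooled_share(ROWS)
per_tld_mean = mean_of_shares(ROWS)
print(f"pooled: {pooled:.1f}%  mean of per-ccTLD shares: {per_tld_mean:.1f}%")
```

With these four rows the pooled figure sits well above the per-ccTLD mean because .io contributes most of the sites. Neither number is wrong, but they answer different questions, which is why any aggregate should state its weighting.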


This analysis is based on LLMSE language classification of 1,744,730 websites indexed during Q1 2026, cross-referenced with the DomainsProject dataset of over 600 million domain records across 244 country-code namespaces. Language detection uses HTML lang attribute extraction with computational linguistics fallback via the lingua library. External triangulation references the EF English Proficiency Index (2025 edition), W3Techs content language surveys (April 2026), and Ethnologue language data. The full DomainsProject dataset is available at /dataset. LLMSE classification data is accessible at llmse.ai. Country-level domain statistics are available at /stats/.