Posts tagged "dataset"

5 posts found.

May 03, 2026 32 min read

The Kubernetes Census Hidden in DNS: 74,508 Apex Domains, 20,420 Cluster Identities, and One Default Value That Owns Them All

We extracted every Kubernetes signal we could find from a 17 April 2026 DNS crawl: heritage=external-dns TXT markers and CNAME chains terminating in managed-Kubernetes ingress endpoints (AWS ELB k8s-prefixed names, .azmk8s.io, .gke.goog, .openshiftapps.com, .k8s.ondigitalocean.com, etc.). 74,508 unique apex domains carry at least one strict-precision Kubernetes signal (41,565 with TXT markers, 34,219 with strict CNAME pointers, 1,276 with both). 20,420 distinct cluster identities are visible. 13,620 apexes (32.8% of the TXT-marker side) use the literal string "default" as their cluster identifier. 815 use the literal example strings from the ExternalDNS README. 6,842 apexes publish a sensitive Kubernetes namespace (argocd, vault, kube-system, istio-system) to public DNS. 1,936 apexes have already migrated to the Gateway API. This is the first combined-signal cluster-identity census of the public Kubernetes footprint.
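To make the TXT-marker signal concrete, here is a minimal Python sketch of how one such record might be read. It assumes the TXT value uses the comma-separated layout commonly written by ExternalDNS (`heritage=external-dns,external-dns/owner=...,external-dns/resource=kind/namespace/name`). The function name, the returned field names, and the sample record are hypothetical and are not taken from the post's own pipeline.

```python
from typing import Optional

# Namespaces the post lists as sensitive when exposed in public DNS.
SENSITIVE_NAMESPACES = {"argocd", "vault", "kube-system", "istio-system"}


def parse_external_dns_txt(value: str) -> Optional[dict]:
    """Parse a TXT value; return None if it is not an ExternalDNS marker.

    Assumed layout (not confirmed by the post):
    heritage=external-dns,external-dns/owner=<id>,external-dns/resource=<kind>/<ns>/<name>
    """
    fields = {}
    for part in value.strip().strip('"').split(","):
        key, sep, val = part.partition("=")
        if sep:  # keep only well-formed key=value pairs
            fields[key.strip()] = val.strip()

    if fields.get("heritage") != "external-dns":
        return None

    owner = fields.get("external-dns/owner")
    pieces = fields.get("external-dns/resource", "").split("/")
    # Only trust the namespace when the resource has exactly kind/namespace/name.
    namespace = pieces[1] if len(pieces) == 3 else None

    return {
        "owner": owner,
        "namespace": namespace,
        "default_owner": owner == "default",
        "sensitive": namespace in SENSITIVE_NAMESPACES,
    }


# Hypothetical sample record for illustration only.
sample = ('"heritage=external-dns,external-dns/owner=default,'
          'external-dns/resource=ingress/argocd/argocd-server"')
record = parse_external_dns_txt(sample)
```

A record like the sample would count toward both the "default" owner tally and the sensitive-namespace tally; an unrelated TXT value such as an SPF record returns `None`.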

May 02, 2026 21 min read

The Hidden SaaS Map: What 840 GB of DNS TXT Records Reveal About Who Owns the Internet's Apex

We classified every TXT record from a 17 April 2026 DNS crawl (840 GB of raw JSONL, 56 GB after xz compression) and built a vendor census from the verification tokens domains leak into DNS. 40.2 million unique apexes carry at least one tracked SaaS verification token. Google's 26.0 million-apex footprint is 3.4x Microsoft 365's 7.6 million. Domain marketplaces (Afternic + dan.com + 4.cn + Aliyun + west.cn + 17ex + Sedo + DomainEasy) collectively touch 5.0 million apexes, more than the Atlassian, Stripe, Adobe, Apple, and DocuSign verification tokens combined. Zoho's 1.23 million is the single largest non-Google, non-Microsoft SaaS verification footprint we measure. The TXT layer is the closest thing the public Internet has to a SaaS census.

April 30, 2026 24 min read

The Dead Web: 1.65 Billion Hostnames That No Longer Resolve

We compared the master DomainsProject corpus (3.12 billion unique hostnames ever observed) against the 17 April 2026 active crawl (1.47 billion currently resolving) and found that 52.9% of the observable web no longer resolves in DNS. .com alone holds 808 million dead hostnames; the five Freenom-managed ccTLDs (.tk, .ml, .ga, .cf, .gq) are 99% extinct; the new-gTLD program is 75% dead; and a small spine of restrictive ccTLDs (.jp, .it, .de, .nl) sits below 30% dead.
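The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded numbers quoted in the summary, expressed in millions, so the result is approximate; the variable names are illustrative.

```python
# Rounded headline figures from the summary, in millions of hostnames.
total_observed_m = 3120   # 3.12 billion hostnames ever observed
resolving_m = 1470        # 1.47 billion currently resolving

# Dead hostnames are those observed at some point but no longer resolving.
dead_m = total_observed_m - resolving_m
dead_share_pct = round(100 * dead_m / total_observed_m, 1)

print(f"dead: {dead_m / 1000:.2f} billion ({dead_share_pct}%)")
```

With these inputs the difference comes to 1.65 billion, and the share rounds to the 52.9% stated in the summary.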