DomainsProject.org news feed
- After adding some generic subdomains (.com.xx, .net.xx, etc.) the resulting dataset grew significantly. The machine ran out of disk space at 3.4T of new data (7.4T total) / 384 billion records.
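To illustrate why a handful of generic second-level labels blow the dataset up so quickly, here is a minimal sketch of that kind of expansion. The function, word list, and TLD list are hypothetical, not the project's actual generator:

```python
# Sketch: expanding a word list with generic second-level labels
# such as "com" and "net" under country-code TLDs (word.com.xx).
# All names below are illustrative only.

def expand(words, cctlds, generic_slds=("com", "net", "org")):
    """Yield candidates: word.<cctld> plus word.<sld>.<cctld> for each label."""
    for word in words:
        for cc in cctlds:
            yield f"{word}.{cc}"
            for sld in generic_slds:
                yield f"{word}.{sld}.{cc}"

candidates = list(expand(["example", "test"], ["uk", "nz"]))
# Each word now produces 1 + len(generic_slds) names per ccTLD,
# so every added generic label multiplies the generated dataset.
```

With two words, two ccTLDs, and three generic labels the sketch already emits 16 candidates instead of 4.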
- At least several other registrars (.fm is a known culprit) are pulling a dirty trick with non-existent domains. Special thanks to the community for catching this.
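One common way to catch a zone that answers for non-existent domains is to resolve a random, almost certainly unregistered label: if gibberish still resolves, every answer from that zone is suspect. A minimal sketch of that probe, with the resolver injected so it runs without network access (this is an assumption about the check, not the project's exact tooling):

```python
import random
import string

def looks_wildcarded(tld, resolve):
    """Probe a zone with a random gibberish label.

    `resolve` is any callable that returns an address string or raises
    OSError on a negative answer (socket.gethostbyname behaves this way).
    If a nonsense name gets an answer, the zone responds for
    non-existent domains too.
    """
    label = "".join(random.choices(string.ascii_lowercase, k=24))
    try:
        resolve(f"{label}.{tld}")
        return True   # gibberish resolved: the zone is answering for anything
    except OSError:
        return False  # negative answer, as expected for a random label

# Fake resolvers for illustration:
def honest(name):
    raise OSError("NXDOMAIN")  # simulated negative answer
```

In practice `resolve` would be a real DNS lookup; domains found under a zone that fails this probe need extra verification before entering the dataset.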
- 54,081,701 new words added to the dataset. At least 82,312,348,922 new domains (1.6T) to check, which brings the total generated dataset to a pretty serious 5.6T.
- The crawler code is now closed source and used internally. Most of the job is now done by Freya.
- The 4.0T dataset of generated DNS names is now being processed. The yield is small, about 8-10k domains per 1 million records.
- Some of those are already in the database, so 212 billion records are expected to yield about 20 million new domains.
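The expected yield is straightforward arithmetic on the numbers in this entry; the overlap with the existing database is inferred from the 20 million figure rather than stated directly:

```python
records = 212_000_000_000           # generated records left to check
hits_per_million = (8_000, 10_000)  # observed resolving domains per 1M records

low = records // 1_000_000 * hits_per_million[0]   # raw resolving names, low end
high = records // 1_000_000 * hits_per_million[1]  # raw resolving names, high end

# Most of those resolve to domains already in the database;
# ~20 million genuinely new domains implies ~99% overlap.
overlap = 1 - 20_000_000 / low
```

So the 8-10k/1M yield gives roughly 1.7-2.1 billion raw hits, of which only about 1% are expected to be new.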
- There's a separate process, called autovacuum, that runs on a regular basis. It cleans the dataset of unreachable (expired, SERVFAIL, etc.) domains.
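A minimal sketch of what one such cleanup pass could look like. The reachability probe is injected, and the function and status names are hypothetical, not the project's actual implementation:

```python
# Statuses treated as unreachable, per the entry above (expired, servfail, etc.)
UNREACHABLE = {"NXDOMAIN", "SERVFAIL", "EXPIRED"}

def autovacuum_pass(domains, probe):
    """Return (kept, removed) after one cleanup pass.

    `probe` maps a domain to a status string such as "OK" or "SERVFAIL";
    in a real pipeline this would be a DNS query, here it is injected
    so the pass itself stays testable.
    """
    kept, removed = [], []
    for d in domains:
        (removed if probe(d) in UNREACHABLE else kept).append(d)
    return kept, removed

statuses = {"alive.com": "OK", "gone.com": "NXDOMAIN", "flaky.com": "SERVFAIL"}
kept, removed = autovacuum_pass(statuses, statuses.get)
```

Running this regularly keeps the database limited to domains that actually resolve.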
- Added this news file :-)