https://github.com/spaze/domains
Unofficial and incomplete lists of various domain names
- Host: GitHub
- URL: https://github.com/spaze/domains
- Owner: spaze
- Created: 2017-02-17T15:54:04.000Z (over 8 years ago)
- Default Branch: main
- Last Pushed: 2023-02-04T01:17:36.000Z (over 2 years ago)
- Last Synced: 2024-12-25T13:41:35.962Z (5 months ago)
- Topics: domains, tld, tlds, x-powered-by
- Homepage:
- Size: 53.7 MB
- Stars: 71
- Watchers: 14
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# tld-cz.txt
Unofficial and incomplete lists of various domain names, for research purposes only.

The CZ list is an append-only list because there's no easy way to filter out expired domains, see [this issue](https://github.com/spaze/domains/issues/2). [@k47](https://twitter.com/kaja47) ran a scan and says roughly half of the domains are dead. Consider yourself warned.
## Sources
- [Initial set](https://blog.root.cz/oskar/jak-vylistovat-domenu-cz/866150/) by Ondřej Caletka
- A few updates by [Patrik Votoček](https://github.com/Vrtak-CZ)
- [Majestic Million](https://blog.majestic.com/development/majestic-million-csv-daily/)
- [Alexa Top 1M](http://s3.amazonaws.com/alexa-static/top-1m.csv.zip)
- [regZone.cz](https://www.regzone.cz/uvolnovane-domeny/)
- [HSTS preloaded list](https://cs.chromium.org/chromium/src/net/http/transport_security_state_static.json)
- [scans.io dataset](https://gist.github.com/roycewilliams/b87a140a4869baf4d2c907c6e352b970) by [@roycewilliams](https://github.com/roycewilliams)
- [Expired domains archive](http://wladass.cz/archiv-expirovanych-domen/) from [Monitoruju.net](https://www.monitoruju.net/expirovane-domeny-archiv/)
- [Nette Framework-powered](https://gist.github.com/Myiyk/7589213) list by [@Myiyk](https://github.com/Myiyk)
- [Certificate Transparency](https://www.certificate-transparency.org/) logs
- A list by [Vítězslav Lindovský](https://vitezslav-lindovsky.cz/)
- A private collection by [Vladimír Smitka](https://github.com/lynt-smitka)
- Another private collection by [Kamil Vavra](https://github.com/vavkamil)
- A list by [domainsproject.org](https://github.com/tb0hdan/domains/tree/master/data/czech_republic)

Thanks!
# whois-cz.csv
Registrant ID and expiration date from Whois for the domains in tld-cz.txt. It also contains some historical data.
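A minimal Python sketch of how this file might be used to keep only domains that have not expired. The column names `domain` and `registrant_id`, the ISO `YYYY-MM-DD` date format, and the function names are assumptions for illustration; only `expire_at` is named in this README, so check the actual CSV header before relying on this.

```python
import csv
import io
from datetime import date


def live_domains(whois_csv, today):
    """Return the set of domains whose expire_at is later than `today`."""
    live = set()
    for row in csv.DictReader(whois_csv):
        # Assumed columns: "domain" and "expire_at" (hypothetical layout).
        raw = row.get("expire_at") or ""
        try:
            expires = date.fromisoformat(raw[:10])
        except ValueError:
            continue  # missing or unparsable date: treat as not existing
        if expires > today:
            live.add(row["domain"])
    return live


def filter_existing(domains, live):
    """Keep only the domains that have an unexpired Whois record."""
    return [d for d in domains if d in live]


# Tiny in-memory stand-in for whois-cz.csv (made-up rows).
sample = io.StringIO(
    "domain,registrant_id,expire_at\n"
    "example.cz,ID-1,2999-01-01\n"
    "expired.cz,ID-2,2001-01-01\n"
)
live = live_domains(sample, today=date(2024, 1, 1))
existing = filter_existing(["example.cz", "expired.cz", "unknown.cz"], live)
print(existing)
```

In real use, `today` would be `date.today()` and `whois_csv` an open file handle. Because the file also holds historical data, a domain may appear in more than one row; collecting into a set means one unexpired row is enough to keep it.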
If a domain from tld-cz.txt is not in whois-cz.csv with `expire_at > current_date`, it should be considered non-existent.

# cz-gov.txt
Czech government domains (fully-qualified, not just eTLD+1) originally from circa 2018.

## Want to contribute?
Have a list of domains you'd like to add? Feel free to create a pull request!

Here's a short how-to:
1. Include only second-level domains (*foo.tld*), not subdomains (*bar.foo.tld*). Each line should match `^[a-z0-9-]+\.tld$`; for example, `grep --only-matching --perl-regexp "[a-z0-9-]+\.cz" data.txt > yourlist.txt`
2. Use LF newlines, not CRLF (`dos2unix yourlist.txt`)
3. Generate a new list, for example with `cat tld-cz.txt yourlist.txt | sort | uniq > tld-cz-new.txt`
4. Review `tld-cz-new.txt`, rename it to `tld-cz.txt`, and open a pull request

Thank you!
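For anyone who would rather script the how-to steps than chain shell commands, here is a rough Python sketch of the same flow: extract second-level `.cz` names, normalise line endings, merge with the existing list, and sort without duplicates. The function names and sample data are hypothetical, and the trailing `\b` in the search pattern is a small tightening that the `grep` example does not have.

```python
import re

# Line format from step 1, specialised to .cz.
SLD_LINE = re.compile(r"[a-z0-9-]+\.cz")
# Roughly the grep pattern from step 1; the \b is an added assumption so that
# "foo.czech.example" does not yield "foo.cz".
SLD_FIND = re.compile(r"[a-z0-9-]+\.cz\b")


def extract_domains(text):
    """Pull second-level .cz names out of arbitrary text."""
    return SLD_FIND.findall(text)


def merge_lists(existing_text, new_domains):
    """Merge new domains into the existing list, sorted and de-duplicated."""
    for d in new_domains:
        if not SLD_LINE.fullmatch(d):
            raise ValueError(f"not a second-level .cz domain: {d!r}")
    # splitlines() accepts both LF and CRLF input; output is always LF.
    merged = set(existing_text.splitlines()) | set(new_domains)
    merged.discard("")
    return "\n".join(sorted(merged)) + "\n"


# Made-up sample data standing in for tld-cz.txt and data.txt.
existing = "alpha.cz\nzeta.cz\n"
raw = "Visit https://www.beta.cz/ and mail.alpha.cz\r\nalso zeta.cz"

new_domains = extract_domains(raw)
merged = merge_lists(existing, new_domains)
print(merged, end="")
```

Python's `sorted` orders strings by code point, which may differ from `sort` under a non-C locale, so the result can be ordered differently from the shell pipeline. Reviewing the merged file before opening a pull request (step 4) still applies.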