https://github.com/nimblemachines/analyzing-iana-root-db
Simple scripts to fetch and parse the IANA root zone database (the current set of delegated top-level domains (TLDs)) into one of two forms: Lua tables, or a tab-separated CSV file suitable for conversion to a spreadsheet.
- Host: GitHub
- URL: https://github.com/nimblemachines/analyzing-iana-root-db
- Owner: nimblemachines
- Created: 2019-09-03T06:03:32.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2021-10-13T19:41:39.000Z (about 3 years ago)
- Last Synced: 2024-08-02T15:22:23.060Z (3 months ago)
- Language: HTML
- Size: 531 KB
- Stars: 6
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: changed.lua
README
Curious as to the story with the program for New gTLDs
(https://newgtlds.icann.org/en/) and looking at the DNS root db
(http://www.iana.org/domains/root/db), I noticed something odd and suspicious:
a lot of gTLDs were registered by companies with eerily similar names: Half
Hallow, LLC; Knob Town, LLC; Steel Falls, LLC.

I finally decided to get to the bottom of the matter, downloaded the HTML
version of the root db, and went at it with Lua, trying to parse out the bits.
I haven't yet written code to download all the domain files (they are also
HTML, and would also need to be scraped), but I did write some simple filters
to show all of Google's or Donuts' registered domains.

That last sentence was kind of a spoiler, I guess. All those weird names
(except for "Beats Electronics, LLC", which matches the pattern) belong to
Donuts, Inc, a Bellevue, WA company that is aggressively adding new TLDs. So
far they have the biggest portfolio of a single company: 201 domains. (241 if
you also count the 40 that have been delegated to United TLD Holdco, which
Donuts now owns.)

I created a parsing mode that blats out the entire database in a form that can
be easily uploaded to Drive as a Sheet. This makes it easy to sort and play
with. And the URLs are finally handled in a nice way (using =HYPERLINK()).

This code is a work in progress! I hope to grab new versions of the root file
and the constituent domain files on a regular basis. Weekly would be great --
these things are changing rapidly!

Running ``./fetch.sh`` will grab the latest root zone db as an HTML file, date
stamp it, and stash it in ``root-db/``.

Running ``./gen.sh`` will read the files in ``root-db/`` and for each one it
finds it will generate two files in ``out/``: a .lua file containing the
database as a table of tables; and a .txt file that is a CSV file suitable for
uploading to Google Docs as a spreadsheet (e.g., for further analysis).

``changes.lua`` is a work-in-progress. Since things are changing all the time
-- almost daily -- I thought it would be nice to make it easy to see what has
changed between two snapshots, but I haven't figured out how to do it yet.

But that's not the whole story.
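
As an aside on the snapshot comparison: one way it might be approached, sketched here in Python rather than the repository's Lua, is to treat each snapshot as a mapping from TLD to sponsoring organisation and use set operations on the keys. The ``Snapshot`` shape and the ``diff_snapshots`` name are assumptions made for illustration; they are not taken from ``changes.lua``, and the sample entries are placeholders, not real delegations.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical snapshot shape: TLD -> sponsoring organisation.
Snapshot = Dict[str, str]
Changes = Dict[str, Union[List[str], List[Tuple[str, str, str]]]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Changes:
    """Report TLDs added, removed, or reassigned between two snapshots."""
    # dict key views support set arithmetic directly.
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    # TLDs present in both snapshots whose organisation changed.
    reassigned = sorted(
        (tld, old[tld], new[tld])
        for tld in old.keys() & new.keys()
        if old[tld] != new[tld]
    )
    return {"added": added, "removed": removed, "reassigned": reassigned}


# Placeholder data, not real root zone entries.
old = {"alpha": "Example Registry One", "beta": "Example Registry Two"}
new = {"alpha": "Example Registry Three", "gamma": "Example Registry Two"}

changes = diff_snapshots(old, new)
print(changes["added"])       # ['gamma']
print(changes["removed"])     # ['beta']
print(changes["reassigned"])  # [('alpha', 'Example Registry One', 'Example Registry Three')]
```

This only compares the organisation field. Comparing more columns would mean storing a tuple or a table per TLD instead of a single string.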
The IANA root zone database is a moving target. It shows the list of
currently-delegated domains, and who is currently responsible for each one.
But what about the original applications? Is there a list somewhere? It turns
out that there used to be, but it's now hard to find. I saw a URL in this gist:
https://gist.github.com/lukaszkorecki/2924179
and decided to try downloading it myself. This is the URL:
http://newgtlds.icann.org/en/program-status/application-results/strings-1200utc-13jun12-en
As of August 2018 that URL redirects to
https://gtldresult.icann.org/application-result/applicationstatus
The page at the original URL was a big HTML table (with .CSV and .PDF download
options); the page at the redirected-to URL shows the first of 56 pages, which
you could presumably download one by one and concatenate.

Fuck that!
Luckily there is a copy of the original table in the Internet Archive's
Wayback Machine. The URL to *that* is
https://web.archive.org/web/20120613142047if_/http://newgtlds-cloudfront.icann.org/sites/default/files/reveal/strings-1200utc-13jun12-en.html
and with that data we can see who the original culprits were...
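
A minimal sketch of how such a tally might look, in Python rather than the repository's Lua, assuming the table has already been parsed into (applied-for string, applicant name) pairs. Because one company can file under many differently named subsidiaries, a plain count by applicant name would miss it; the optional ``parent_of`` mapping is a hypothetical, hand-maintained lookup from subsidiary to parent that the reader would have to supply. The function name and the sample rows are placeholders, not data from the archived table.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Tuple


def tally_applicants(
    rows: Iterable[Tuple[str, str]],
    parent_of: Optional[Mapping[str, str]] = None,
) -> List[Tuple[str, int]]:
    """Count applied-for strings per applicant, largest portfolio first.

    rows      -- (applied-for string, applicant name) pairs
    parent_of -- optional subsidiary -> parent lookup (hypothetical)
    """
    parent_of = parent_of or {}
    counts: Counter = Counter()
    for _applied_string, applicant in rows:
        name = applicant.strip()
        # Fold known subsidiaries into their parent; otherwise count as-is.
        counts[parent_of.get(name, name)] += 1
    return counts.most_common()


# Placeholder rows, not real applications.
rows = [
    ("alpha", "Shell Co A, LLC"),
    ("beta", "Shell Co B, LLC"),
    ("gamma", "Solo Registry"),
]
parents = {"Shell Co A, LLC": "Parent Co", "Shell Co B, LLC": "Parent Co"}

for applicant, count in tally_applicants(rows, parents):
    print(f"{count}\t{applicant}")
```

Without ``parent_of``, each shell company shows up as its own applicant with a small count, which is how the similar-sounding names blend into the rest of the list.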