{"id":22695890,"url":"https://github.com/sarukuku/fi-tld-scraper","last_synced_at":"2026-04-21T22:35:50.781Z","repository":{"id":148784268,"uuid":"122625778","full_name":"sarukuku/fi-tld-scraper","owner":"sarukuku","description":"A command line tool that scrapes data of .fi domains from an open api provided by the The Finnish Communications Regulatory Authority.","archived":false,"fork":false,"pushed_at":"2018-02-23T16:27:03.000Z","size":11,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-07-03T17:52:40.013Z","etag":null,"topics":["nodejs","scraper","tld"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sarukuku.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-02-23T13:37:42.000Z","updated_at":"2022-05-24T17:00:10.000Z","dependencies_parsed_at":null,"dependency_job_id":"ec11cb55-8b85-409c-92c6-e27ee0afe063","html_url":"https://github.com/sarukuku/fi-tld-scraper","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sarukuku/fi-tld-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarukuku%2Ffi-tld-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarukuku%2Ffi-tld-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarukuku%2Ffi-tld-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/saru
kuku%2Ffi-tld-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sarukuku","download_url":"https://codeload.github.com/sarukuku/fi-tld-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sarukuku%2Ffi-tld-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32113314,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-21T11:25:29.218Z","status":"ssl_error","status_checked_at":"2026-04-21T11:25:28.499Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nodejs","scraper","tld"],"created_at":"2024-12-10T04:12:16.489Z","updated_at":"2026-04-21T22:35:50.751Z","avatar_url":"https://github.com/sarukuku.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# .fi TLD scraper\nA command line tool that scrapes data on .fi domains from an open API provided by the Finnish Communications Regulatory Authority and saves the results to a JSON file and a CSV file.\n\n## How to install\nClone or download the repository and run `yarn` in it.\n\n## Dependencies\nThe app has been tested only on macOS High Sierra and has the following dependencies:\n- availability of `/bin/bash`\n- availability of the `yarn` command in `/bin/bash`\n- Node.js version \u003e= 
9.3.0\n- a good network connection\n\n## How to use\nRun a full scrape. This means that the scraper will fetch all data on all .fi domains owned by organizations and unions. At the time of writing, that is close to 370 000 unique domains. As a result, you'll get a single JSON file and a single CSV file, each around 250 MB in size. A full scrape takes about 20 minutes.\n```bash\nnpm run start\n```\n\nRun a limited scrape. The `--soft-limit` flag sets a soft limit on the number of domains scraped. The exact number of returned domains isn't guaranteed to equal the limit.\n```bash\nnpm run start -- --soft-limit=500\n```\n\nDisable JSON output.\n```bash\nnpm run start -- --no-json\n```\n\nDisable CSV output.\n```bash\nnpm run start -- --no-csv\n```\n\nAll of the flags above can be combined freely.\n\n## Example output of a limited scrape\n```bash\n$ npm run start -- --soft-limit=500\n\n\u003e @ start /fi-tld-scraper\n\u003e node index.js \"--soft-limit=500\"\n\nFetching page https://odata.domain.fi/v4/odata/domains\nFetching page https://odata.domain.fi/v4/odata/domains?$skip=100\nFetching page https://odata.domain.fi/v4/odata/domains?$skip=200\nFetching page https://odata.domain.fi/v4/odata/domains?$skip=300\nFetching page https://odata.domain.fi/v4/odata/domains?$skip=400\nScrape duration: 2275.034ms\n```\n\n## Form of the resulting JSON file\nThe resulting JSON file with one domain would look like this (actual data redacted).\n```json\n{\n  \"domains\":[\n    {\n      \"Name\":\"\",\n      \"State\":\"\",\n      \"GrantDate\":\"\",\n      \"LastValidityDate\":\"\",\n      \"IsDNSSecInUse\":\"\",\n      \"Holder\":\"\",\n      \"Registrar\":\"\",\n      \"OrganizationId\":\"\",\n      \"Address\":\"\",\n      \"PostalCode\":\"\",\n      \"PostalArea\":\"\",\n      \"AssociationType\":\"\",\n      \"PhoneNumber\":\"\",\n      \"DepartmentOrContactPerson\":\"\",\n      \"Country\":\"\",\n      \"NameServer1\":\"\",\n      \"NameServer2\":\"\",\n      
\"NameServer3\":\"\",\n      \"NameServer4\":\"\",\n      \"NameServer5\":\"\",\n      \"NameServer6\":\"\",\n      \"NameServer7\":\"\",\n      \"NameServer8\":\"\",\n      \"NameServer9\":\"\",\n      \"NameServer10\":\"\"\n    }\n  ]\n}\n```\n\n## Form of the resulting CSV file\nThe resulting CSV file with one domain would look like this (actual data redacted).\n```csv\nName;State;GrantDate;LastValidityDate;IsDNSSecInUse;Holder;Registrar;OrganizationId;Address;PostalCode;PostalArea;AssociationType;PhoneNumber;DepartmentOrContactPerson;Country;NameServer1;NameServer2;NameServer3;NameServer4;NameServer5;NameServer6;NameServer7;NameServer8;NameServer9;NameServer10\n\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\";\"\"\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsarukuku%2Ffi-tld-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsarukuku%2Ffi-tld-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsarukuku%2Ffi-tld-scraper/lists"}