{"id":13413961,"url":"https://github.com/elliotwutingfeng/go-fasttld","last_synced_at":"2025-04-29T06:30:30.231Z","repository":{"id":37053768,"uuid":"480251158","full_name":"elliotwutingfeng/go-fasttld","owner":"elliotwutingfeng","description":"go-fasttld is a high performance effective top level domains (eTLD) extraction module.","archived":false,"fork":false,"pushed_at":"2024-09-07T19:04:03.000Z","size":852,"stargazers_count":33,"open_issues_count":1,"forks_count":5,"subscribers_count":0,"default_branch":"main","last_synced_at":"2024-09-07T20:25:03.567Z","etag":null,"topics":["compressed-trie","etld","extract","golang","hacktoberfest","idn","idna","ipv4","ipv6","mozilla","osint","parser","public","public-suffix-list","punycode","radix-tree","suffix","tld","tldextract","url"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elliotwutingfeng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-04-11T06:17:49.000Z","updated_at":"2024-09-07T19:04:04.000Z","dependencies_parsed_at":"2024-01-09T03:03:02.956Z","dependency_job_id":"08bdd517-d853-4067-8364-eb965ccf7e44","html_url":"https://github.com/elliotwutingfeng/go-fasttld","commit_stats":null,"previous_names":[],"tags_count":14,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2Fgo-fasttld","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2Fgo-fasttld/tags","releases_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2Fgo-fasttld/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elliotwutingfeng%2Fgo-fasttld/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elliotwutingfeng","download_url":"https://codeload.github.com/elliotwutingfeng/go-fasttld/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224151083,"owners_count":17264436,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compressed-trie","etld","extract","golang","hacktoberfest","idn","idna","ipv4","ipv6","mozilla","osint","parser","public","public-suffix-list","punycode","radix-tree","suffix","tld","tldextract","url"],"created_at":"2024-07-30T20:01:53.834Z","updated_at":"2024-11-11T18:17:12.394Z","avatar_url":"https://github.com/elliotwutingfeng.png","language":"Go","funding_links":[],"categories":["Text Processing","文本处理","Template Engines","Bot Building"],"sub_categories":["Parsers/Encoders/Decoders","解析 器/Encoders/Decoders"],"readme":"# go-fasttld\n\n[![Go Reference](https://img.shields.io/badge/go-reference-blue?logo=go\u0026logoColor=white\u0026style=for-the-badge)](https://pkg.go.dev/github.com/elliotwutingfeng/go-fasttld)\n[![Go Report 
Card](https://goreportcard.com/badge/github.com/elliotwutingfeng/go-fasttld?style=for-the-badge)](https://goreportcard.com/report/github.com/elliotwutingfeng/go-fasttld)\n[![Coveralls](https://img.shields.io/coverallsCoverage/github/elliotwutingfeng/go-fasttld?logo=coveralls\u0026style=for-the-badge)](https://coveralls.io/github/elliotwutingfeng/go-fasttld?branch=main)\n[![Mentioned in Awesome Go](https://img.shields.io/static/v1?logo=awesomelists\u0026label=\u0026labelColor=CCA6C4\u0026logoColor=261120\u0026message=Mentioned%20in%20awesome\u0026color=494368\u0026style=for-the-badge)](https://github.com/avelino/awesome-go)\n\n[![GitHub license](https://img.shields.io/badge/LICENSE-BSD--3--CLAUSE-GREEN?style=for-the-badge)](LICENSE)\n\n## Summary\n\n**go-fasttld** is a high performance [effective top level domains (eTLD)](https://wiki.mozilla.org/Public_Suffix_List) extraction module that extracts subcomponents from [URLs](https://en.wikipedia.org/wiki/URL).\n\nURLs can either contain hostnames, IPv4 addresses, or IPv6 addresses. eTLD extraction is based on the [Mozilla Public Suffix List](http://www.publicsuffix.org). Private domains listed in the [Mozilla Public Suffix List](http://www.publicsuffix.org) like 'blogspot.co.uk' and 'sinaapp.com' are also supported.\n\n![Demo](demo.gif)\n\nSpot any bugs? Report them [here](https://github.com/elliotwutingfeng/go-fasttld/issues)\n\n## Installation\n\n```sh\ngo get github.com/elliotwutingfeng/go-fasttld\n```\n\n## Try the CLI\n\nFirst, build the CLI application.\n\n```sh\n# `git clone` and `cd` to the go-fasttld repository folder first\nmake build_cli\n```\n\nAfterwards, try extracting subcomponents from a URL.\n\n```sh\n# `git clone` and `cd` to the go-fasttld repository folder first\n./dist/fasttld extract https://user@a.subdomain.example.a%63.uk:5000/a/b\\?id\\=42\n```\n\n## Try the example code\n\nAll of the following examples can be found at `examples/demo.go`. 
To play the demo, run the following command:\n\n```sh\n# `git clone` and `cd` to the go-fasttld repository folder first\nmake demo\n```\n\n### Hostname\n\n```go\n// Initialise fasttld extractor\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\n\n// Extract URL subcomponents\nurl := \"https://user@a.subdomain.example.a%63.uk:5000/a/b?id=42\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url})\n\n// Display results\nfasttld.PrintRes(url, res) // Pretty-prints res.Scheme, res.UserInfo, res.SubDomain etc.\n```\n\n| Scheme   | UserInfo | SubDomain   | Domain  | Suffix | RegisteredDomain | Port | Path       | HostType     |\n|----------|----------|-------------|---------|--------|------------------|------|------------|--------------|\n| https:// | user     | a.subdomain | example | a%63.uk  | example.a%63.uk    | 5000 | /a/b?id=42 | hostname     |\n\n### IPv4 Address\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://127.0.0.1:5000\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url})\n```\n\n| Scheme   | UserInfo | SubDomain | Domain    | Suffix | RegisteredDomain | Port | Path | HostType     |\n|----------|----------|-----------|-----------|--------|------------------|------|------|--------------|\n| https:// |          |           | 127.0.0.1 |        | 127.0.0.1        | 5000 |      | ipv4 address |\n\n### IPv6 Address\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://[aBcD:ef01:2345:6789:aBcD:ef01:2345:6789]:5000\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url})\n```\n\n| Scheme   | UserInfo | SubDomain | Domain                                  | Suffix | RegisteredDomain                        | Port | Path | HostType     |\n|----------|----------|-----------|-----------------------------------------|--------|-----------------------------------------|------|------|--------------|\n| https:// |          |           | aBcD:ef01:2345:6789:aBcD:ef01:2345:6789 |        | 
aBcD:ef01:2345:6789:aBcD:ef01:2345:6789 | 5000 |      | ipv6 address |\n\n### Internationalised label separators\n\n**go-fasttld** supports the following internationalised label separators (IETF RFC 3490)\n\n| Full Stop  | Ideographic Full Stop | Fullwidth Full Stop | Halfwidth Ideographic Full Stop |\n|------------|-----------------------|---------------------|---------------------------------|\n| U+002E `.` | U+3002 `。`           | U+FF0E `．`         | U+FF61 `｡`                      |\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://brb\\u002ei\\u3002am\\uff0egoing\\uff61to\\uff0ebe\\u3002a\\uff61fk\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url})\n```\n\n| Scheme   | UserInfo | SubDomain                             | Domain | Suffix    | RegisteredDomain  | Port | Path | HostType     |\n|----------|----------|---------------------------------------|--------|-----------|-------------------|------|------|--------------|\n| https:// |          | brb\\u002ei\\u3002am\\uff0egoing\\uff61to | be     | a\\uff61fk | be\\u3002a\\uff61fk |      |      | hostname     |\n\n## Public Suffix List options\n\n### Specify custom public suffix list file\n\nYou can use a custom public suffix list file by setting `CacheFilePath` in `fasttld.SuffixListParams{}` to its absolute path.\n\n```go\ncacheFilePath := \"/absolute/path/to/file.dat\"\nextractor, err := fasttld.New(fasttld.SuffixListParams{CacheFilePath: cacheFilePath})\n```\n\n### Updating the default Public Suffix List cache\n\nWhenever `fasttld.New` is called without specifying `CacheFilePath` in `fasttld.SuffixListParams{}`, the local cache of the default Public Suffix List is updated automatically if it is more than 3 days old. 
You can also manually update the cache by using `Update()`.\n\n```go\n// Automatic update performed if `CacheFilePath` is not specified\n// and local cache is more than 3 days old\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\n\n// Manually update local cache\nif err := extractor.Update(); err != nil {\n    log.Println(err)\n}\n```\n\n### Private domains\n\nAccording to the [Mozilla.org wiki](https://wiki.mozilla.org/Public_Suffix_List/Uses), the Mozilla Public Suffix List contains private domains like `blogspot.com` and `sinaapp.com`.\n\nBy default, these private domains are excluded (i.e. `IncludePrivateSuffix = false`)\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://google.blogspot.com\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url})\n```\n\n| Scheme   | UserInfo | SubDomain | Domain   | Suffix | RegisteredDomain | Port | Path | HostType     |\n|----------|----------|-----------|----------|--------|------------------|------|------|--------------|\n| https:// |          | google    | blogspot | com    | blogspot.com     |      |      | hostname     |\n\nYou can _include_ private domains by setting `IncludePrivateSuffix = true`\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{IncludePrivateSuffix: true})\nurl := \"https://google.blogspot.com\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url})\n```\n\n| Scheme   | UserInfo | SubDomain | Domain | Suffix       | RegisteredDomain    | Port | Path | HostType     |\n|----------|----------|-----------|--------|--------------|---------------------|------|------|--------------|\n| https:// |          |           | google | blogspot.com | google.blogspot.com |      |      | hostname     |\n\n## Extraction options\n\n### Ignore Subdomains\n\nYou can ignore subdomains by setting `IgnoreSubDomains = true`. 
By default, subdomains are extracted.\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://maps.google.com\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url, IgnoreSubDomains: true})\n```\n\n| Scheme   | UserInfo | SubDomain | Domain | Suffix | RegisteredDomain | Port | Path | HostType     |\n|----------|----------|-----------|--------|--------|------------------|------|------|--------------|\n| https:// |          |           | google | com    | google.com       |      |      | hostname     |\n\n### Punycode\n\nBy default, internationalised URLs are not converted to punycode before extraction.\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://hello.世界.com\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url})\n```\n\n| Scheme   | UserInfo | SubDomain | Domain | Suffix | RegisteredDomain | Port | Path | HostType     |\n|----------|----------|-----------|--------|--------|------------------|------|------|--------------|\n| https:// |          | hello     | 世界   | com    | 世界.com         |      |      | hostname     |\n\nYou can convert internationalised URLs to [punycode](https://en.wikipedia.org/wiki/Punycode) before extraction by setting `ConvertURLToPunyCode = true`.\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://hello.世界.com\"\nres, _ := extractor.Extract(fasttld.URLParams{URL: url, ConvertURLToPunyCode: true})\n```\n\n| Scheme   | UserInfo | SubDomain | Domain      | Suffix | RegisteredDomain | Port | Path | HostType     |\n|----------|----------|-----------|-------------|--------|------------------|------|------|--------------|\n| https:// |          | hello     | xn--rhqv96g | com    | xn--rhqv96g.com  |      |      | hostname     |\n\n## Parsing errors\n\nIf the URL is invalid, the second value returned by `Extract()`, **error**, will be non-nil. 
Partially extracted subcomponents can still be retrieved from the first value returned, **ExtractResult**.\n\n```go\nextractor, _ := fasttld.New(fasttld.SuffixListParams{})\nurl := \"https://example!.com\" // invalid characters in hostname\ncolor.New().Println(\"The following line should be an error message\")\n// Declare res outside the if statement so that it stays in scope for PrintRes\nres, err := extractor.Extract(fasttld.URLParams{URL: url})\nif err != nil {\n    color.New(color.FgHiRed, color.Bold).Print(\"Error: \")\n    color.New(color.FgHiWhite).Println(err)\n}\nfasttld.PrintRes(url, res) // Partially extracted subcomponents can still be retrieved\n```\n\n| Scheme   | UserInfo | SubDomain | Domain | Suffix | RegisteredDomain | Port | Path | HostType |\n|----------|----------|-----------|--------|--------|------------------|------|------|----------|\n| https:// |          |           |        |        |                  |      |      |          |\n\n## Testing\n\n```sh\n# `git clone` and `cd` to the go-fasttld repository folder first\nmake tests\n\n# Alternatively, run tests without race detection\n# Useful for systems that do not support the -race flag like windows/386\n# See https://tip.golang.org/src/cmd/dist/test.go\nmake tests_without_race\n```\n\n## Benchmarks\n\n```sh\n# `git clone` and `cd` to the go-fasttld repository folder first\nmake bench\n```\n\n### Modules used\n\n| Benchmark Name       | Source                           |\n|----------------------|----------------------------------|\n| GoFastTld            | go-fasttld (this module)         |\n| JPilloraGoTld        | github.com/jpillora/go-tld       |\n| JoeGuoTldExtract     | github.com/joeguo/tldextract     |\n| Mjd2021USATldExtract | github.com/mjd2021usa/tldextract |\n\n### Results\n\nBenchmarks performed on AMD Ryzen 7 5800X, Manjaro Linux.\n\n**go-fasttld** performs especially well on longer URLs.\n\n---\n\n#### #1\n\n\u003ccode\u003ehttps://iupac.org/iupac-announces-the-2021-top-ten-emerging-technologies-in-chemistry/\u003c/code\u003e\n\n| Benchmark Name       | 
Iterations | ns/op       | B/op     | allocs/op   | Fastest            |\n|----------------------|------------|-------------|----------|-------------|--------------------|\n| GoFastTld            | 8037906    | 150.8 ns/op | 0 B/op   | 0 allocs/op | :heavy_check_mark: |\n| JPilloraGoTld        | 1675113    | 716.1 ns/op | 224 B/op | 2 allocs/op |                    |\n| JoeGuoTldExtract     | 2204854    | 515.1 ns/op | 272 B/op | 5 allocs/op |                    |\n| Mjd2021USATldExtract | 1676722    | 712.0 ns/op | 288 B/op | 6 allocs/op |                    |\n\n---\n\n#### #2\n\n\u003ccode\u003ehttps://www.google.com/maps/dir/Parliament+Place,+Parliament+House+Of+Singapore,+Singapore/Parliament+St,+London,+UK/@25.2440033,33.6721455,4z/data=!3m1!4b1!4m14!4m13!1m5!1m1!1s0x31da19a0abd4d71d:0xeda26636dc4ea1dc!2m2!1d103.8504863!2d1.2891543!1m5!1m1!1s0x487604c5aaa7da5b:0xf13a2197d7e7dd26!2m2!1d-0.1260826!2d51.5017061!3e4\u003c/code\u003e\n\n| Benchmark Name       | Iterations | ns/op       | B/op      | allocs/op   | Fastest            |\n|----------------------|------------|-------------|-----------|-------------|--------------------|\n| GoFastTld            | 6381516    | 181.9 ns/op | 0 B/op    | 0 allocs/op | :heavy_check_mark: |\n| JPilloraGoTld        | 431671     | 2603 ns/op  | 928 B/op  | 4 allocs/op |                    |\n| JoeGuoTldExtract     | 893347     | 1176 ns/op  | 1120 B/op | 6 allocs/op |                    |\n| Mjd2021USATldExtract | 1030250    | 1165 ns/op  | 1120 B/op | 6 allocs/op |                    |\n\n---\n\n#### #3\n\n\u003ccode\u003ehttps://a.b.c.d.e.f.g.h.i.j.k.l.m.n.oo.pp.qqq.rrrr.ssssss.tttttttt.uuuuuuuuuuu.vvvvvvvvvvvvvvv.wwwwwwwwwwwwwwwwwwwwww.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy.zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz.cc\u003c/code\u003e\n\n| Benchmark Name       | Iterations | ns/op      | B/op      | allocs/op   | Fastest            
|\n|----------------------|------------|------------|-----------|-------------|--------------------|\n| GoFastTld            | 833682     | 1424 ns/op | 0 B/op    | 0 allocs/op | :heavy_check_mark: |\n| JPilloraGoTld        | 734790     | 1640 ns/op | 304 B/op  | 3 allocs/op |                    |\n| JoeGuoTldExtract     | 695475     | 1452 ns/op | 1040 B/op | 5 allocs/op |                    |\n| Mjd2021USATldExtract | 330717     | 3628 ns/op | 1904 B/op | 8 allocs/op |                    |\n\n---\n\n## Implementation details\n\n### Why not split on \".\" and take the last element instead?\n\nSplitting on \".\" and taking the last element works only for simple eTLDs like `com`, not for more complex ones like `oseto.nagasaki.jp`.\n\n### eTLD tries\n\n![Trie](Trie_example.svg)\n\n**go-fasttld** stores eTLDs in [compressed tries](https://en.wikipedia.org/wiki/Trie).\n\nValid eTLDs from the [Mozilla Public Suffix List](http://www.publicsuffix.org) are added to the compressed trie in reverse order (rightmost label first).\n\n```sh\nGiven the following eTLDs\nau\nnsw.edu.au\ncom.ac\nedu.ac\ngov.ac\n\nand the example URL host `example.nsw.edu.au`\n\nThe compressed trie will be structured as follows:\n\nSTART\n ╠═ au 🚩 ✅\n ║  ╚═ edu ✅\n ║     ╚═ nsw 🚩 ✅\n ╚═ ac\n    ╠═ com 🚩\n    ╠═ edu 🚩\n    ╚═ gov 🚩\n\n=== Symbol meanings ===\n🚩 : path to this node is a valid eTLD\n✅ : path to this node found in example URL host `example.nsw.edu.au`\n```\n\nThe URL host subcomponents are parsed from right to left until no more matching nodes can be found. In this example, the path of matching nodes is `au -\u003e edu -\u003e nsw`. 
Reversing the nodes gives the extracted eTLD `nsw.edu.au`.\n\n## Acknowledgements\n\nThis module is a port of the Python [fasttld](https://github.com/jophy/fasttld) module, with additional modifications to support extraction of subcomponents from full URLs, IPv4 addresses, and IPv6 addresses.\n\n- [fasttld (Python)](https://github.com/jophy/fasttld)\n- [tldextract (Python)](https://github.com/john-kurkowski/tldextract)\n- [ICANN IDN Character Validation Guidance](https://www.icann.org/resources/pages/idna-protocol-2012-02-25-en)\n- [IETF RFC 2396](https://www.ietf.org/rfc/rfc2396.txt)\n- [IETF RFC 3490](https://www.ietf.org/rfc/rfc3490.txt)\n- [IETF RFC 3986](https://www.ietf.org/rfc/rfc3986.txt)\n- [IETF RFC 6874](https://www.ietf.org/rfc/rfc6874.txt)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felliotwutingfeng%2Fgo-fasttld","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felliotwutingfeng%2Fgo-fasttld","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felliotwutingfeng%2Fgo-fasttld/lists"}