{"id":13468613,"url":"https://github.com/john-kurkowski/tldextract","last_synced_at":"2025-12-29T02:29:23.573Z","repository":{"id":652874,"uuid":"1413932","full_name":"john-kurkowski/tldextract","owner":"john-kurkowski","description":"Accurately separates a URL’s subdomain, domain, and public suffix, using the Public Suffix List (PSL).","archived":false,"fork":false,"pushed_at":"2025-04-22T06:19:41.000Z","size":1072,"stargazers_count":1891,"open_issues_count":13,"forks_count":212,"subscribers_count":46,"default_branch":"master","last_synced_at":"2025-05-07T10:52:46.636Z","etag":null,"topics":["country-codes","publicsuffix","publicsuffixlist","python","suffix","tld","tldextract"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/john-kurkowski.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["john-kurkowski"]}},"created_at":"2011-02-26T06:48:00.000Z","updated_at":"2025-05-03T20:27:46.000Z","dependencies_parsed_at":"2023-07-07T10:16:25.178Z","dependency_job_id":"3cf42985-5094-4429-af62-01957e404070","html_url":"https://github.com/john-kurkowski/tldextract","commit_stats":{"total_commits":398,"total_committers":46,"mean_commits":8.652173913043478,"dds":"0.29648241206030146","last_synced_commit":"485799c835e96861df47f3bcd3eef27b72b8d113"},"previous_names":[],"tags_count":63,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/john-kurkowski%2Ftldextract","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/john-kurkowski%2Ftldextract/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/john-kurkowski%2Ftldextract/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/john-kurkowski%2Ftldextract/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/john-kurkowski","download_url":"https://codeload.github.com/john-kurkowski/tldextract/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254129481,"owners_count":22019628,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["country-codes","publicsuffix","publicsuffixlist","python","suffix","tld","tldextract"],"created_at":"2024-07-31T15:01:14.892Z","updated_at":"2025-12-29T02:29:23.566Z","avatar_url":"https://github.com/john-kurkowski.png","language":"Python","readme":"# tldextract [![PyPI version](https://badge.fury.io/py/tldextract.svg)](https://badge.fury.io/py/tldextract) [![Build Status](https://github.com/john-kurkowski/tldextract/actions/workflows/ci.yml/badge.svg)](https://github.com/john-kurkowski/tldextract/actions/workflows/ci.yml)\n\n`tldextract` accurately separates a URL's subdomain, domain, and public suffix,\nusing [the Public Suffix List (PSL)](https://publicsuffix.org).\n\n**Why?** Naive URL parsing like splitting on dots fails for domains like\n`forums.bbc.co.uk` (gives \"co\" instead of \"bbc\"). 
`tldextract` handles the edge\ncases, so you don't have to.\n\n## Quick Start\n\n```python\n\u003e\u003e\u003e import tldextract\n\n\u003e\u003e\u003e tldextract.extract('http://forums.news.cnn.com/')\nExtractResult(subdomain='forums.news', domain='cnn', suffix='com', is_private=False)\n\n\u003e\u003e\u003e tldextract.extract('http://forums.bbc.co.uk/')\nExtractResult(subdomain='forums', domain='bbc', suffix='co.uk', is_private=False)\n\n\u003e\u003e\u003e # Access the parts you need\n\u003e\u003e\u003e ext = tldextract.extract('http://forums.bbc.co.uk')\n\u003e\u003e\u003e ext.domain\n'bbc'\n\u003e\u003e\u003e ext.top_domain_under_public_suffix\n'bbc.co.uk'\n\u003e\u003e\u003e ext.fqdn\n'forums.bbc.co.uk'\n```\n\n## Install\n\n```zsh\npip install tldextract\n```\n\n## How-to Guides\n\n### How to disable HTTP suffix list fetching for production\n\n```python\nno_fetch_extract = tldextract.TLDExtract(suffix_list_urls=())\nno_fetch_extract('http://www.google.com')\n```\n\n### How to set a custom cache location\n\nVia environment variable:\n\n```zsh\nexport TLDEXTRACT_CACHE=\"/path/to/cache\"\n```\n\nOr in code:\n\n```python\ncustom_cache_extract = tldextract.TLDExtract(cache_dir='/path/to/cache/')\n```\n\n### How to update TLD definitions\n\nCommand line:\n\n```zsh\ntldextract --update\n```\n\nOr delete the cache folder:\n\n```zsh\nrm -rf $HOME/.cache/python-tldextract\n```\n\n### How to treat private domains as suffixes\n\n```python\nextract = tldextract.TLDExtract(include_psl_private_domains=True)\nextract('waiterrant.blogspot.com')\n# ExtractResult(subdomain='', domain='waiterrant', suffix='blogspot.com', is_private=True)\n```\n\n### How to use a local suffix list\n\n```python\nextract = tldextract.TLDExtract(\n    suffix_list_urls=[\"file:///path/to/your/list.dat\"],\n    cache_dir='/path/to/cache/',\n    fallback_to_snapshot=False)\n```\n\n### How to use a remote suffix list\n\n```python\nextract = tldextract.TLDExtract(\n    
suffix_list_urls=[\"https://myserver.com/suffix-list.dat\"])\n```\n\n### How to add extra suffixes\n\n```python\nextract = tldextract.TLDExtract(\n    extra_suffixes=[\"foo\", \"bar.baz\"])\n```\n\n### How to validate URLs before extraction\n\n```python\nfrom urllib.parse import urlsplit\n\nsplit_url = urlsplit(\"https://example.com:8080/path\")\nresult = tldextract.extract_urllib(split_url)\n```\n\n## Command Line\n\n```zsh\n$ tldextract http://forums.bbc.co.uk\nforums bbc co.uk\n\n$ tldextract --update  # Update cached suffix list\n$ tldextract --help    # See all options\n```\n\n## Understanding Domain Parsing\n\n### Public Suffix List\n\n`tldextract` uses the [Public Suffix List](https://publicsuffix.org), a\ncommunity-maintained list of domain suffixes. The PSL contains both:\n\n- **Public suffixes**: Where anyone can register a domain (`.com`, `.co.uk`,\n  `.org.kg`)\n- **Private suffixes**: Operated by companies for customer subdomains\n  (`blogspot.com`, `github.io`)\n\nWeb browsers use this same list for security decisions like cookie scoping.\n\n### Suffix vs. TLD\n\nWhile `.com` is a top-level domain (TLD), many suffixes like `.co.uk` are\ntechnically second-level. The PSL uses \"public suffix\" to cover both.\n\n### Default behavior with private domains\n\nBy default, `tldextract` treats private suffixes as regular domains:\n\n```python\n\u003e\u003e\u003e tldextract.extract('waiterrant.blogspot.com')\nExtractResult(subdomain='waiterrant', domain='blogspot', suffix='com', is_private=False)\n```\n\nTo treat them as suffixes instead, see\n[How to treat private domains as suffixes](#how-to-treat-private-domains-as-suffixes).\n\n### Caching behavior\n\nBy default, `tldextract` fetches the latest Public Suffix List on first use and\ncaches it indefinitely in `$HOME/.cache/python-tldextract`.\n\n### URL validation\n\n`tldextract` accepts any string and is very lenient. 
It prioritizes ease of use\nover strict validation, attempting extraction even on partial URLs and\nnon-URLs.\n\n## FAQ\n\n### Can you add/remove suffix \\_\\_\\_\\_?\n\n`tldextract` doesn't maintain the suffix list. Submit changes to\n[the Public Suffix List](https://publicsuffix.org/submit/).\n\nMeanwhile, use the `extra_suffixes` parameter, or fork the PSL and pass it to\nthis library with the `suffix_list_urls` parameter.\n\n### My suffix is in the PSL but not extracted correctly\n\nCheck if it's in the \"PRIVATE\" section. See\n[How to treat private domains as suffixes](#how-to-treat-private-domains-as-suffixes).\n\n### Why does it parse invalid URLs?\n\nSee [URL validation](#url-validation) and\n[How to validate URLs before extraction](#how-to-validate-urls-before-extraction).\n\n## Contribute\n\n### Setting up\n\n1. `git clone` this repository.\n2. Change into the new directory.\n3. `pip install --upgrade --editable '.[testing]'`\n\n### Running tests\n\n```zsh\ntox --parallel       # Test all Python versions\ntox -e py311         # Test specific Python version\nruff format .        # Format code\n```\n\n## History\n\nThis package started from a\n[StackOverflow answer](http://stackoverflow.com/questions/569137/how-to-get-domain-name-from-url/569219#569219)\nabout regex-based domain extraction. 
The regex approach fails for many domains,\nso this library switched to the Public Suffix List for accuracy.\n","funding_links":["https://github.com/sponsors/john-kurkowski"],"categories":["Python","Reconnaissance && Information Gathering && Subdomain Discovery & Enumeration && OSINT","Python (1887)","URL Utilities","Tools"],"sub_categories":["Subdomain Enumeration && Brute-forcing"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohn-kurkowski%2Ftldextract","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohn-kurkowski%2Ftldextract","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohn-kurkowski%2Ftldextract/lists"}