{"id":13520202,"url":"https://github.com/Intsights/PyDomainExtractor","last_synced_at":"2025-03-31T16:30:54.379Z","repository":{"id":42433135,"uuid":"228214620","full_name":"Intsights/PyDomainExtractor","owner":"Intsights","description":"A blazingly fast domain extraction library written in Rust","archived":false,"fork":false,"pushed_at":"2024-07-24T08:26:10.000Z","size":301,"stargazers_count":65,"open_issues_count":7,"forks_count":6,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-10-01T21:18:52.723Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Intsights.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-12-15T16:24:53.000Z","updated_at":"2024-07-24T08:26:14.000Z","dependencies_parsed_at":"2024-07-24T10:12:18.517Z","dependency_job_id":null,"html_url":"https://github.com/Intsights/PyDomainExtractor","commit_stats":{"total_commits":49,"total_committers":3,"mean_commits":"16.333333333333332","dds":0.04081632653061229,"last_synced_commit":"d1769d94f3ed7a8d4d1e34b5a39ae108dfcafbc8"},"previous_names":[],"tags_count":38,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Intsights%2FPyDomainExtractor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Intsights%2FPyDomainExtractor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Intsights%2FPyDomainExtractor/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/Intsights%2FPyDomainExtractor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Intsights","download_url":"https://codeload.github.com/Intsights/PyDomainExtractor/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":222670691,"owners_count":17020513,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T05:02:13.947Z","updated_at":"2024-11-02T03:30:37.651Z","avatar_url":"https://github.com/Intsights.png","language":"Python","readme":"\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/Intsights/PyDomainExtractor\"\u003e\n        \u003cimg src=\"https://raw.githubusercontent.com/Intsights/PyDomainExtractor/master/images/logo.png\" alt=\"Logo\"\u003e\n    \u003c/a\u003e\n    \u003ch3 align=\"center\"\u003e\n        A blazingly fast domain extraction library written in Rust\n    \u003c/h3\u003e\n\u003c/p\u003e\n\n![license](https://img.shields.io/badge/MIT-License-blue)\n![Python](https://img.shields.io/badge/Python-3.7%20%7C%203.8%20%7C%203.9%20%7C%203.10-blue)\n![Build](https://github.com/Intsights/PyDomainExtractor/workflows/Build/badge.svg)\n[![PyPi](https://img.shields.io/pypi/v/PyDomainExtractor.svg)](https://pypi.org/project/PyDomainExtractor/)\n\n## Table of Contents\n\n- [Table of Contents](#table-of-contents)\n- [About The Project](#about-the-project)\n  - [Built With](#built-with)\n  - [Performance](#performance)\n    - [Extract From Domain](#extract-from-domain)\n    - [Extract From URL](#extract-from-url)\n  
- [Installation](#installation)\n- [Usage](#usage)\n  - [Extraction](#extraction)\n  - [URL Extraction](#url-extraction)\n  - [Validation](#validation)\n  - [TLDs List](#tlds-list)\n- [License](#license)\n- [Contact](#contact)\n\n\n## About The Project\n\nPyDomainExtractor is a Python library designed to parse domain names quickly.\nTo achieve the highest possible performance, the library is written in Rust.\n\n\n### Built With\n\n* [AHash](https://github.com/tkaitchuck/aHash)\n* [idna](https://github.com/servo/rust-url/)\n* [memchr](https://github.com/BurntSushi/memchr)\n* [once_cell](https://github.com/matklad/once_cell)\n* [Public Suffix List](https://publicsuffix.org/)\n\n\n### Performance\n\n\n#### Extract From Domain\n\nTests were run on a file containing 10 million random domains from various top-level domains (Mar. 13th 2022)\n\n| Library  | Function | Time |\n| ------------- | ------------- | ------------- |\n| [PyDomainExtractor](https://github.com/Intsights/PyDomainExtractor) | pydomainextractor.extract | 1.50s |\n| [publicsuffix2](https://github.com/nexb/python-publicsuffix2) | publicsuffix2.get_sld | 9.92s |\n| [tldextract](https://github.com/john-kurkowski/tldextract) | \\_\\_call\\_\\_ | 29.23s |\n| [tld](https://github.com/barseghyanartur/tld) | tld.parse_tld | 34.48s |\n\n\n#### Extract From URL\n\nThe test was conducted on a file containing 1 million random URLs (Mar. 
13th 2022)\n\n| Library  | Function | Time |\n| ------------- | ------------- | ------------- |\n| [PyDomainExtractor](https://github.com/Intsights/PyDomainExtractor) | pydomainextractor.extract_from_url | 2.24s |\n| [publicsuffix2](https://github.com/nexb/python-publicsuffix2) | publicsuffix2.get_sld | 10.84s |\n| [tldextract](https://github.com/john-kurkowski/tldextract) | \\_\\_call\\_\\_ | 36.04s |\n| [tld](https://github.com/barseghyanartur/tld) | tld.parse_tld | 57.87s |\n\n\n### Installation\n\n```sh\npip3 install PyDomainExtractor\n```\n\n\n## Usage\n\n\n### Extraction\n\n```python\nimport pydomainextractor\n\n\n# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.\ndomain_extractor = pydomainextractor.DomainExtractor()\n\ndomain_extractor.extract('google.com')\n\u003e\u003e\u003e {\n\u003e\u003e\u003e     'subdomain': '',\n\u003e\u003e\u003e     'domain': 'google',\n\u003e\u003e\u003e     'suffix': 'com'\n\u003e\u003e\u003e }\n\n# Loads custom suffix list data. It should follow PublicSuffixList's format.\ndomain_extractor = pydomainextractor.DomainExtractor(\n    'tld\\n'\n    'custom.tld\\n'\n)\n\ndomain_extractor.extract('google.com')\n\u003e\u003e\u003e {\n\u003e\u003e\u003e     'subdomain': 'google',\n\u003e\u003e\u003e     'domain': 'com',\n\u003e\u003e\u003e     'suffix': ''\n\u003e\u003e\u003e }\n\ndomain_extractor.extract('google.custom.tld')\n\u003e\u003e\u003e {\n\u003e\u003e\u003e     'subdomain': '',\n\u003e\u003e\u003e     'domain': 'google',\n\u003e\u003e\u003e     'suffix': 'custom.tld'\n\u003e\u003e\u003e }\n```\n\n\n### URL Extraction\n\n```python\nimport pydomainextractor\n\n\n# Loads the current supplied version of PublicSuffixList from the repository. 
Does not download any data.\ndomain_extractor = pydomainextractor.DomainExtractor()\n\ndomain_extractor.extract_from_url('http://google.com/')\n\u003e\u003e\u003e {\n\u003e\u003e\u003e     'subdomain': '',\n\u003e\u003e\u003e     'domain': 'google',\n\u003e\u003e\u003e     'suffix': 'com'\n\u003e\u003e\u003e }\n```\n\n\n### Validation\n\n```python\nimport pydomainextractor\n\n\n# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.\ndomain_extractor = pydomainextractor.DomainExtractor()\n\ndomain_extractor.is_valid_domain('google.com')\n\u003e\u003e\u003e True\n\ndomain_extractor.is_valid_domain('domain.اتصالات')\n\u003e\u003e\u003e True\n\ndomain_extractor.is_valid_domain('xn--mgbaakc7dvf.xn--mgbaakc7dvf')\n\u003e\u003e\u003e True\n\ndomain_extractor.is_valid_domain('domain-.com')\n\u003e\u003e\u003e False\n\ndomain_extractor.is_valid_domain('-sub.domain.com')\n\u003e\u003e\u003e False\n\ndomain_extractor.is_valid_domain('\\xF0\\x9F\\x98\\x81nonalphanum.com')\n\u003e\u003e\u003e False\n```\n\n\n### TLDs List\n\n```python\nimport pydomainextractor\n\n\n# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.\ndomain_extractor = pydomainextractor.DomainExtractor()\n\ndomain_extractor.get_tld_list()\n\u003e\u003e\u003e [\n\u003e\u003e\u003e     'bostik',\n\u003e\u003e\u003e     'backyards.banzaicloud.io',\n\u003e\u003e\u003e     'biz.bb',\n\u003e\u003e\u003e     ...\n\u003e\u003e\u003e ]\n```\n\n\n## License\n\nDistributed under the MIT License. 
See `LICENSE` for more information.\n\n\n## Contact\n\nGal Ben David - gal@intsights.com\n\nProject Link: [https://github.com/Intsights/PyDomainExtractor](https://github.com/Intsights/PyDomainExtractor)\n","funding_links":[],"categories":["Python","Utilities"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIntsights%2FPyDomainExtractor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FIntsights%2FPyDomainExtractor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FIntsights%2FPyDomainExtractor/lists"}