{"id":15047972,"url":"https://github.com/mohammadraziei/liburlparser","last_synced_at":"2025-06-11T07:33:29.199Z","repository":{"id":168256206,"uuid":"643921823","full_name":"MohammadRaziei/liburlparser","owner":"MohammadRaziei","description":"Fastest domain extractor library written in C++ with python binding.","archived":false,"fork":false,"pushed_at":"2025-03-01T10:35:24.000Z","size":248,"stargazers_count":10,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-10T01:10:56.049Z","etag":null,"topics":["binding","cpp","lib","parser","psl","public-suffix-list","python","uri","url","urlparser"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MohammadRaziei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-22T12:39:49.000Z","updated_at":"2025-03-21T12:27:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"4d76759d-0db7-4ccb-b2a4-54293c021f81","html_url":"https://github.com/MohammadRaziei/liburlparser","commit_stats":null,"previous_names":["mohammadraziei/liburlparser"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohammadRaziei%2Fliburlparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohammadRaziei%2Fliburlparser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MohammadRaziei%2Fliburlparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/MohammadRaziei%2Fliburlparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MohammadRaziei","download_url":"https://codeload.github.com/MohammadRaziei/liburlparser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248137886,"owners_count":21053775,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["binding","cpp","lib","parser","psl","public-suffix-list","python","uri","url","urlparser"],"created_at":"2024-09-24T21:06:23.804Z","updated_at":"2025-04-10T01:11:05.701Z","avatar_url":"https://github.com/MohammadRaziei.png","language":"C++","readme":"\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/mohammadraziei/liburlparser\"\u003e\n    \u003cimg src=\"https://github.com/MohammadRaziei/liburlparser/raw/master/docs/images/logo/liburlparser-logo-1.svg\" alt=\"Logo\"\u003e\n  \u003c/a\u003e\n  \u003ch3 align=\"center\"\u003e\n    Fastest domain extractor library written in C++ with python binding.\n  \u003c/h3\u003e\n  \u003ch4 align=\"center\"\u003e\n    First and complete library for parsing url in C++ and Python and Command Line\n  \u003c/h4\u003e\n\u003c/p\u003e\n\n[![mohammadraziei - liburlparser](https://img.shields.io/static/v1?label=mohammadraziei\u0026message=liburlparser\u0026color=white\u0026logo=github)](https://github.com/mohammadraziei/liburlparser \"Go to GitHub repo\")\n[![stars - liburlparser](https://img.shields.io/github/stars/mohammadraziei/liburlparser?style=social)](https://github.com/mohammadraziei/liburlparser)\n[![forks - 
liburlparser](https://img.shields.io/github/forks/mohammadraziei/liburlparser?style=social)](https://github.com/mohammadraziei/liburlparser)\n\n[![PyPi](https://img.shields.io/pypi/v/liburlparser.svg)](https://pypi.org/project/liburlparser/)\n![Python](https://img.shields.io/badge/Python-3.8%20%7C%203.9%20%7C%203.10%20%7C%203.11-blue)\n![Cpp](https://img.shields.io/badge/C++-17-blue)\n\n\n[![GitHub release](https://img.shields.io/github/release/mohammadraziei/liburlparser?include_prereleases=\u0026sort=semver\u0026color=purple)](https://github.com/mohammadraziei/liburlparser/releases/)\n[![License](https://img.shields.io/badge/License-MIT-purple)](#license)\n[![issues - liburlparser](https://img.shields.io/github/issues/mohammadraziei/liburlparser)](https://github.com/mohammadraziei/liburlparser/issues)\n\n\n[![SonarCloud](https://sonarcloud.io/images/project_badges/sonarcloud-white.svg)](https://sonarcloud.io/summary/new_code?id=MohammadRaziei_liburlparser)\n\n[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=MohammadRaziei_liburlparser\u0026metric=alert_status)](https://sonarcloud.io/summary/new_code?id=MohammadRaziei_liburlparser)\n[![CodeFactor](https://www.codefactor.io/repository/github/mohammadraziei/liburlparser/badge/master)](https://www.codefactor.io/repository/github/mohammadraziei/liburlparser/overview/master)\n[![snyk.io](https://snyk.io/advisor/python/liburlparser/badge.svg)](https://snyk.io/advisor/python/liburlparser)\n\n[//]: # ([![View site - GH Pages]\u0026#40;https://img.shields.io/badge/View_site-GH_Pages-2ea44f?style=for-the-badge\u0026#41;]\u0026#40;https://mohammadraziei.github.io/liburlparser/\u0026#41;)\n\n\n\n\u003c!--\n![Build](https://github.com/Intsights/PyDomainExtractor/workflows/Build/badge.svg)\n[![PyPi](https://img.shields.io/pypi/v/PyDomainExtractor.svg)](https://pypi.org/project/PyDomainExtractor/)\n--\u003e\n\u003c!--\n## Table of Contents\n\n- [Table of Contents](#table-of-contents)\n- [About The 
Project](#about-the-project)\n  - [Built With](#built-with)[README.md](README.md)\n  - [Performance](#performance)\n    - [Extract From Domain](#extract-from-domain)\n    - [Extract From URL](#extract-from-url)\n  - [Installation](#installation)\n- [Usage](#usage)\n  - [Extraction](#extraction)\n  - [URL Extraction](#url-extraction)\n  - [Validation](#validation)\n  - [TLDs List](#tlds-list)\n- [License](#license)\n- [Contact](#contact)\n--\u003e\n\n## About The Project\n\n**liburlparser** is a powerful domain extractor library written in C++ with Python bindings. It provides efficient URL parsing capabilities for both C++ and Python, making it a valuable tool for projects that involve working with web addresses.\n\n### Features\n\n\nHere are some key features of **liburlparser**:\n\n1. **Multiple Language Support**:\n   - liburlparser can be used in multiple programming languages, including `Python`, `C++`, and `Shell`.\n   - It offers an intuitive interface that remains consistent across both C++ and Python.\n\n2. **Clean Code Design**:\n   - The library provides two separate classes: `Url` and `Host`.\n   - This separation allows for cleaner and more organized code when dealing with URLs.\n\n3. **Public Suffix List Support**:\n   - liburlparser supports known combinatorial suffixes (e.g., \"ac.ir\") using the public_suffix_list.\n   - It can also handle unknown suffixes (e.g., \"comm\" in \"google.comm\").\n\n4. **Automatic Public Suffix List Updates**:\n   - Before each build and deployment, liburlparser updates the public_suffix_list automatically.\n\n5. **Host Properties**:\n   - The `Host` class includes properties such as subdomain, domain, domain name, and suffix.\n\n6. 
**URL Properties**:\n   - The `Url` class provides properties like protocol, userinfo, host (and all host properties), port, path, query parameters, and fragment.\n\n\n\u003c!--\n* Multiple programming language supported such as `Python`, `C++` and `Shell`\n* Intuitive interface and identical in C++ and Python\n* Provide two separate classes Url and Host for the purpose of clean code\n* Also support [public_suffix_list](https://publicsuffix.org/list/public_suffix_list.dat) for known combinatorial suffix such as \"ac.ir\"\n* Support unknown suffix like \"google.comm\" (it detects \"comm\" as suffix)\n* Update public_suffix_list automatically before each build and deploy\n* Host properties:\n  * subdomain\n  * domain\n  * domain_name\n  * suffix\n* Url properties:\n  * protocol\n  * userinfo\n  * host (and all the host properties)\n  * port\n  * path\n  * query\n  * params\n  * fragment\n--\u003e\n\n## Usage\n\n### Command Line\n```sh\npython -m liburlparser --help # show help section\npython -m liburlparser --version # show version\npython -m liburlparser --url \"https://mail.google.com/about\" | jq # return as json\npython -m liburlparser --host \"mail.google.com\" | jq # return as json\n```\n\n\n### Python\n\nYou can use liburlparser intuitively.\n\nAll classes have a help section:\n```python\nimport liburlparser\nhelp(liburlparser)\nprint(liburlparser.__version__)\n\nfrom liburlparser import Url, Host\nhelp(Url)\nhelp(Host)\n```\n\nParse a URL and a host:\n```python\nfrom liburlparser import Url, Host\n## parse url:\nurl = Url(\"https://ee.aut.ac.ir/#id\") # parse all parts of the url\nprint(url, url.suffix, url.domain, url.fragment, url.host, url.to_dict(), url.to_json())\n## parse host\nhost = url.host # ee.aut.ac.ir\n# or\nhost = Host(\"ee.aut.ac.ir\")\n# or \nhost = Host.from_url(\"https://ee.aut.ac.ir/#id\") # the fastest way to parse the host from a url\n# all of these return a Host object with the host part of the url already parsed \nprint(host, 
host.domain, host.suffix, host.to_dict(), host.to_json())\n```\nThere are also some helper APIs that give better performance for small tasks:\n\n```python\n# if you need to extract the host of a url as a string without any parsing \nhost_str = Url.extract_host(\"https://ee.aut.ac.ir/about\") # very fast\n```\nIf you are a fan of `pydomainextractor`, liburlparser offers a similar interface:\n```python\nimport pydomainextractor\nextractor = pydomainextractor.DomainExtractor()\nextractor.extract(\"ee.aut.ac.ir\") # from host\nextractor.extract_from_url(\"https://ee.aut.ac.ir/about\") # from url\n\n# alternatively you can use:\nfrom liburlparser import Host\nHost.extract(\"ee.aut.ac.ir\") # from host\nHost.extract_from_url(\"https://ee.aut.ac.ir/about\") # from url\n# as you can see, the API is the same\n```\n\n### C++\nThere are some examples in the [examples](https://github.com/MohammadRaziei/liburlparser/tree/master/examples) folder.\n\n```c++\n#include \"urlparser.h\"\n...\n/// for parsing url\nTLD::Url url(\"https://ee.aut.ac.ir/about\");\nstd::string domain = url.domain(); // also for subdomain, port, params, ...\n/// for parsing host\nTLD::Host host(\"ee.aut.ac.ir\");\n// or\nTLD::Host host = url.host();\n// or\nTLD::Host host = TLD::Host::fromUrl(\"https://ee.aut.ac.ir/about\");\n```\nAs you can see, the methods available in Python can be used in C++ just as easily.\n\n\n\n## Installation\n### C++:\n\n#### build steps:\n```sh\ngit clone https://github.com/mohammadraziei/liburlparser\ncd liburlparser\nmkdir -p build; cd build\ncmake ..\n# Build the project:\nmake\n# [Optional] run tests:\nmake test\n# [Optional] make documents:\nmake docs\n# [Optional] Run examples:\n./example\n# Make install\nsudo make install\n```\n\n\n\n### Python and Command Line:\nNote that it requires `python\u003e=3.8`.\n#### Installation\n###### pip by [pypi](https://pypi.org/project/liburlparser/)\n```sh\npip install liburlparser\n```\nIf you want to use psl.update to update the public suffix list, you must install the `online` 
version\n```sh\npip install \"liburlparser[online]\"\n```\n\n\nOr\n###### pip by [git](https://github.com/mohammadraziei/liburlparser)\n```sh\npip install git+https://github.com/mohammadraziei/liburlparser\n```\nOr\n###### manually\n```sh\ngit clone https://github.com/mohammadraziei/liburlparser\npip install ./liburlparser\n```\n\n\n\n### Performance\n\n\n#### Extract From Host\n\nTests were run on a file containing 10 million random domains from various top-level domains (Mar. 13th 2022)\n\n| Library  | Function | Time |\n| ------------- | ------------- | ------------- |\n| [liburlparser](https://github.com/mohammadraziei/liburlparser) | liburlparser.Host | 1.12s |\n| [PyDomainExtractor](https://github.com/Intsights/PyDomainExtractor) | pydomainextractor.extract | 1.50s |\n| [publicsuffix2](https://github.com/nexb/python-publicsuffix2) | publicsuffix2.get_sld | 9.92s |\n| [tldextract](https://github.com/john-kurkowski/tldextract) | \\_\\_call\\_\\_ | 29.23s |\n| [tld](https://github.com/barseghyanartur/tld) | tld.parse_tld | 34.48s |\n\n\n#### Extract From URL\n\nThe test was conducted on a file containing 1 million random urls (Mar. 13th 2022)\n\n| Library                                                             | Function | Time   |\n|---------------------------------------------------------------------| ------------- |--------|\n| [liburlparser](https://github.com/mohammadraziei/liburlparser)      | liburlparser.Host.from_url | 2.10s  |\n| [PyDomainExtractor](https://github.com/Intsights/PyDomainExtractor) | pydomainextractor.extract_from_url | 2.24s  |\n| [publicsuffix2](https://github.com/nexb/python-publicsuffix2)       | publicsuffix2.get_sld | 10.84s |\n| [tldextract](https://github.com/john-kurkowski/tldextract)          | \\_\\_call\\_\\_ | 36.04s |\n| [tld](https://github.com/barseghyanartur/tld)                       | tld.parse_tld | 57.87s |\n\n\n\n## License\n\nDistributed under the MIT License. 
See [LICENSE](LICENSE) for more information.\n\n\n## Stats\n[![Stars](https://starchart.cc/mohammadraziei/liburlparser.svg?variant=adaptive)](https://starchart.cc/mohammadraziei/liburlparser)\n\n## Contact\n\nProject Link:\n- [https://github.com/mohammadraziei/liburlparser](https://github.com/mohammadraziei/liburlparser)\n- [https://pypi.org/project/liburlparser](https://pypi.org/project/liburlparser)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohammadraziei%2Fliburlparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmohammadraziei%2Fliburlparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohammadraziei%2Fliburlparser/lists"}