{"id":18735931,"url":"https://github.com/iampukar/url_crawler","last_synced_at":"2025-04-12T19:21:23.312Z","repository":{"id":62586637,"uuid":"475852225","full_name":"iampukar/url_crawler","owner":"iampukar","description":"A Python library to crawl the details of a URL.","archived":false,"fork":false,"pushed_at":"2022-03-30T14:56:59.000Z","size":12,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T13:38:21.433Z","etag":null,"topics":["page-crawler","python-crawler","python-webcrawler","url-crawler","webpage-crawler"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/url-crawler/1.0.0/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iampukar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-30T11:42:44.000Z","updated_at":"2022-03-31T13:28:38.000Z","dependencies_parsed_at":"2022-11-03T22:05:15.617Z","dependency_job_id":null,"html_url":"https://github.com/iampukar/url_crawler","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iampukar%2Furl_crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iampukar%2Furl_crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iampukar%2Furl_crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iampukar%2Furl_crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iampukar","download_url":"https://codeload.g
ithub.com/iampukar/url_crawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247817463,"owners_count":21001190,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["page-crawler","python-crawler","python-webcrawler","url-crawler","webpage-crawler"],"created_at":"2024-11-07T15:18:45.529Z","updated_at":"2025-04-12T19:21:23.284Z","avatar_url":"https://github.com/iampukar.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Overview\n-----------------------------------------------------------------------------\n\nurl_crawler is a Python library to crawl the details of a URL. \n\n## Package Installer \n\n    pip install url-crawler==1.0.0\n\n## Usage\n\n    from url_crawler import url_crawler\n    '''\n      url -\u003e string URL to crawl for information.\n    '''\n    url_details = url_crawler(url)\n    \n    print(url_details.url)\n    print(url_details.domain)\n    print(url_details.check_https)\n    print(url_details.dot_count)\n    print(url_details.digit_count)\n    print(url_details.url_length)\n    \n**Utilities**\n\n| Name           | Output | Description  |\n| ------------- | -----| -----|\n| url | str | Returns the string url. |\n| domain | str | Returns the domain of the url. |\n| registrar | str | Returns the registrar for the given URL. |\n| registered_country | str | Returns the registered domain country of the given URL. |\n| whois | dict | Returns the whois information of the given URL. 
|\n| registration_date | int | Returns the number of days since registration of the given URL. |\n| expiry_date | int | Returns the number of days to expiration of the given URL. |\n| intended_lifespan | int | Returns the number of days of intended lifespan of the given URL. |\n| dot_count | int | Returns the dot (.) count in the given URL. |\n| digit_count | int | Returns the digit count in the given URL. |\n| url_length | int | Returns the length of the given URL. |\n| fragments_count | int | Returns the fragment count in the given URL. |\n| entropy | int | Returns the entropy of the given URL. |\n| check_http | bool | Checks for http headers in the given URL. |\n| check_https | bool | Checks for https headers in the given URL. |\n| url_response | bool | Checks for the URL response. |\n| check_encoding | bool | Checks for encoding in the given URL. |\n| check_client | bool | Checks for client keyword in the given URL. |\n| check_admin | bool | Checks for admin keyword in the given URL. |\n| check_server | bool | Checks for server keyword in the given URL. |\n| check_login | bool | Checks for login keyword in the given URL. |\n| check_ports | bool | Checks for any ports in the given URL. 
|\n\n## Requirements\n\nThe `requirements.txt` file lists all Python libraries required by this package, which can be installed using \n```\npip install -r requirements.txt\n```\n\n## Organization\n\n    ├── src\n    │   ├── url_crawler\n              ├── init             \u003c- init\n              ├── url_crawler      \u003c- package source code for URL crawler\n    ├── setup.py             \u003c- setup file \n    ├── LICENSE              \u003c- LICENSE\n    ├── README.md            \u003c- README\n    ├── CONTRIBUTING.md      \u003c- contribution\n    ├── test.py              \u003c- test cases for unit testing\n    ├── requirements.txt     \u003c- requirements file for reproducing the code package\n\n## License\n\nMIT\n\n## Contributions\n\nFor steps on code contribution, please see [CONTRIBUTING](./CONTRIBUTING.md).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiampukar%2Furl_crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiampukar%2Furl_crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiampukar%2Furl_crawler/lists"}