Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iampukar/url_crawler
A Python library to crawl the details of a URL.
page-crawler python-crawler python-webcrawler url-crawler webpage-crawler
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/iampukar/url_crawler
- Owner: iampukar
- License: mit
- Created: 2022-03-30T11:42:44.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-03-30T14:56:59.000Z (over 2 years ago)
- Last Synced: 2024-10-06T20:06:14.037Z (3 months ago)
- Topics: page-crawler, python-crawler, python-webcrawler, url-crawler, webpage-crawler
- Language: Python
- Homepage: https://pypi.org/project/url-crawler/1.0.0/
- Size: 11.7 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
# Overview
url_crawler is a Python library to crawl the details of a URL.
## Package Installer
```
pip install url-crawler==1.0.0
```
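To keep the dependency isolated, a virtual environment install is a common pattern (a generic sketch; only the `url-crawler==1.0.0` pin comes from the command above):
```
python -m venv .venv
source .venv/bin/activate   # on Windows: .venv\Scripts\activate
pip install url-crawler==1.0.0
```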
## Usage
```
from url_crawler import url_crawler

# url -> string URL to crawl for information.
url = "https://example.com"

url_details = url_crawler(url)

print(url_details.url)
print(url_details.domain)
print(url_details.check_https)
print(url_details.dot_count)
print(url_details.digit_count)
print(url_details.url_length)
```
**Utilities**

| Name | Output | Description |
| ------------- | -----| -----|
| url | str | Returns the string url. |
| domain | str | Returns the domain of the url. |
| registrar | str | Returns the registrar for the given URL. |
| registered_country | str | Returns the registered domain country of the given URL. |
| whois | dict | Returns the whois information of the given URL. |
| registration_date | int | Returns the number of days since registration of the given URL. |
| expiry_date | int | Returns the number of days to expiration of the given URL. |
| intended_lifespan | int | Returns the number of days of intended lifespan of the given URL. |
| dot_count | int | Returns the dot(.) count in the given URL. |
| digit_count | int | Returns the digit count in the given URL. |
| url_length | int | Returns the length of the given URL. |
| fragments_count | int | Returns the fragment counts in the given URL. |
| entropy | int | Returns the entropy of the given URL. |
| check_http | bool | Checks for http headers in the given URL. |
| check_https | bool | Checks for https headers in the given URL. |
| url_response | bool | Checks for the URL response. |
| check_encoding | bool | Checks for encoding in the given URL. |
| check_client | bool | Checks for client keyword in the given URL. |
| check_admin | bool | Checks for admin keyword in the given URL. |
| check_server | bool | Checks for server keyword in the given URL. |
| check_login | bool | Checks for login keyword in the given URL. |
| check_ports | bool | Checks for any ports in the given URL. |
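The whois-backed fields can be combined with the structural checks above. A minimal sketch, assuming the names in the table are plain attributes on the returned object and that the whois lookups have network access:
```
from url_crawler import url_crawler

# Sketch only: attribute names are taken from the utilities table above;
# whois-backed fields (registrar, registration_date, expiry_date) need
# network access and may be unavailable for unregistered domains.
details = url_crawler("https://example.com/admin/login?id=42#section")

print(details.registrar)          # registrar reported by whois
print(details.registration_date)  # days since the domain was registered
print(details.expiry_date)        # days until the domain expires
print(details.entropy)            # entropy of the URL string
print(details.fragments_count)    # number of fragments in the URL
print(details.check_admin)        # True if "admin" appears in the URL
print(details.check_login)        # True if "login" appears in the URL
print(details.check_ports)        # True if the URL specifies a port
```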
## Requirements
The `requirements.txt` file lists all Python libraries this package depends on; they can be installed using
```
pip install -r requirements.txt
```
## Organization
```
├── src
│   ├── url_crawler
│       ├── init             <- init
│       ├── url_crawler      <- package source code for URL crawler
├── setup.py                 <- setup file
├── LICENSE                  <- LICENSE
├── README.md                <- README
├── CONTRIBUTING.md          <- contribution
├── test.py                  <- test cases for unit testing
├── requirements.txt         <- requirements file for reproducing the code package
```
## License
MIT
## Contributions
For steps on code contribution, please see [CONTRIBUTING](./CONTRIBUTING.md).