# Overview

url_crawler is a Python library to crawl the details of a URL.

## Installation

```
pip install url-crawler==1.0.0
```
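
As a quick smoke test that the package installed correctly, the import used throughout this README can be run from the command line:

```
python -c "from url_crawler import url_crawler"
```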

## Usage

```
from url_crawler import url_crawler

# url -> string URL to crawl for information.
url = "https://www.example.com"

url_details = url_crawler(url)

print(url_details.url)
print(url_details.domain)
print(url_details.check_https)
print(url_details.dot_count)
print(url_details.digit_count)
print(url_details.url_length)
```

### Utilities

| Name | Output | Description |
| ------------- | -----| -----|
| url | str | Returns the string url. |
| domain | str | Returns the domain of the url. |
| registrar | str | Returns the registrar for the given URL. |
| registered_country | str | Returns the country in which the domain of the given URL is registered. |
| whois | dict | Returns the whois information of the given URL. |
| registration_date | int | Returns the number of days since registration of the given URL. |
| expiry_date | int | Returns the number of days to expiration of the given URL. |
| intended_lifespan | int | Returns the intended lifespan of the given URL in days. |
| dot_count | int | Returns the dot(.) count in the given URL. |
| digit_count | int | Returns the digit count in the given URL. |
| url_length | int | Returns the length of the given URL. |
| fragments_count | int | Returns the fragment count in the given URL. |
| entropy | int | Returns the entropy of the given URL. |
| check_http | bool | Checks for http headers in the given URL. |
| check_https | bool | Checks for https headers in the given URL. |
| url_response | bool | Checks whether the given URL returns a response. |
| check_encoding | bool | Checks for encoding in the given URL. |
| check_client | bool | Checks for client keyword in the given URL. |
| check_admin | bool | Checks for admin keyword in the given URL. |
| check_server | bool | Checks for server keyword in the given URL. |
| check_login | bool | Checks for login keyword in the given URL. |
| check_ports | bool | Checks for any ports in the given URL. |
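
As a short sketch of the richer utilities, the WHOIS-backed and lexical attributes can be read the same way as in the Usage example above (example.com is only an illustrative URL; the WHOIS-derived values depend on live registration data):

```
from url_crawler import url_crawler

details = url_crawler("https://www.example.com")

# WHOIS-backed registration details
print(details.registrar)           # sponsoring registrar
print(details.registered_country)  # country of registration
print(details.registration_date)   # days since registration
print(details.expiry_date)         # days to expiration

# Lexical features of the URL string itself
print(details.entropy)             # entropy of the URL
print(details.fragments_count)     # fragment count
print(details.check_ports)         # whether the URL specifies a port
```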

## Requirements

The `requirements.txt` file lists all Python libraries this package depends on; they can be installed using
```
pip install -r requirements.txt
```

## Organization

```
├── src
│   └── url_crawler
│       ├── init        <- init
│       └── url_crawler <- package source code for URL crawler
├── setup.py            <- setup file
├── LICENSE             <- LICENSE
├── README.md           <- README
├── CONTRIBUTING.md     <- contribution
├── test.py             <- test cases for unit testing
└── requirements.txt    <- requirements file for reproducing the code package
```
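
The `test.py` file at the repository root holds the unit tests. Assuming it is directly runnable (for example, standard `unittest` cases behind a `__main__` guard), the suite can be executed from a checkout with:

```
python test.py
```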

## License

MIT

## Contributions

For steps on code contribution, please see [CONTRIBUTING](./CONTRIBUTING.md).