https://github.com/0memo07/web-crawler

Web Crawler with Python

Host: GitHub
URL: https://github.com/0memo07/web-crawler
Owner: 0MeMo07
License: mit
Created: 2023-06-01T15:39:55.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2023-08-20T10:44:24.000Z (almost 2 years ago)
Last Synced: 2025-03-30T05:51:11.222Z (3 months ago)
Topics: beautifulsoup4, bs4, crawler, crawlers, crawling, crawling-python, web-crawler, web-crawler-python, web-crawling, webcrawler
Language: Python
Homepage:
Size: 8.79 KB
Stars: 7
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Python Web Crawler

This Python project is a web crawler that starts from a specified URL. As it navigates outward from that URL, it keeps a detailed log for each URL it visits.

## Features

- Fetches and searches HTML pages, starting from a given URL.
- Finds links in the fetched HTML content and queues their URLs to be visited.
- Lets you set the maximum crawl depth.
- Keeps a list of visited URLs and does not revisit the same URL.
- Handles failures with error messages and exception handling.
- Uses colored logging.
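
The features above can be illustrated with a minimal sketch of a depth-limited crawl. This is not the code in `crawler.py`: the names `LinkParser`, `fetch_html`, and `crawl` are hypothetical, and the standard library's `html.parser` is swapped in for BeautifulSoup so that the example has no third-party dependencies. Plain `print` calls stand in for the colored logging.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher (an assumption): download a page with urllib."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=2, fetch=fetch_html):
    """Breadth-first crawl from start_url, following links up to max_depth hops.

    Returns the set of URLs that were attempted, including ones that failed.
    """
    visited = set()
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        # Skip URLs already seen and URLs beyond the depth limit.
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        try:
            html = fetch(url)
        except Exception as exc:  # report the failure and keep crawling
            print(f"[error] {url}: {exc}")
            continue
        print(f"[visited] depth={depth} {url}")
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before queueing.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in visited:
                queue.append((absolute, depth + 1))
    return visited
```

Passing the fetcher as a parameter is a design choice made for this sketch: it lets the crawl logic be exercised against an in-memory dictionary of pages without touching the network.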

## Use

1. Clone the project:

```bash
git clone https://github.com/0MeMo07/Web-Crawler.git
```

2. Go to the project directory:

```bash
cd Web-Crawler
```

3. Install required dependencies:

```bash
pip install -r requirements.txt
```

4. Run the crawler Python file:

```bash
python crawler.py
```

## Support me
