https://github.com/0memo07/web-crawler
Web Crawler with Python
beautifulsoup4 bs4 crawler crawlers crawling crawling-python web-crawler web-crawler-python web-crawling webcrawler
- Host: GitHub
- URL: https://github.com/0memo07/web-crawler
- Owner: 0MeMo07
- License: mit
- Created: 2023-06-01T15:39:55.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-08-20T10:44:24.000Z (almost 2 years ago)
- Last Synced: 2025-03-30T05:51:11.222Z (3 months ago)
- Topics: beautifulsoup4, bs4, crawler, crawlers, crawling, crawling-python, web-crawler, web-crawler-python, web-crawling, webcrawler
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 7
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Python Web Crawler
This Python project is a web crawler that starts from a URL you specify. It follows the links it finds while navigating from that URL and keeps a detailed log entry for each URL it visits.
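To make the crawl behavior concrete, here is a minimal sketch of a depth-limited crawler that remembers visited URLs and logs each page. It is an illustration under stated assumptions, not the code in `crawler.py`: the names `LinkParser`, `fetch`, `crawl`, and `fetch_page` are hypothetical, and it swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra dependencies.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_depth=2, fetch_page=fetch):
    """Breadth-first crawl from start_url; returns fetched URLs in visit order."""
    visited = set()
    order = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        # Skip URLs already seen and anything beyond the depth limit.
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        try:
            html = fetch_page(url)
        except Exception as exc:
            # Log the failure and keep crawling the rest of the queue.
            print(f"[error] {url}: {exc}")
            continue
        order.append(url)
        print(f"[depth {depth}] {url}")
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in visited:
                queue.append((link, depth + 1))
    return order
```

Passing `fetch_page` as a parameter is a design choice for this sketch only: it lets you try the traversal logic against an in-memory dictionary of pages before pointing it at a real site.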
## Features
- Crawls a large number of HTML pages starting from a single URL.
- Finds links in the fetched HTML content and queues those URLs to be visited.
- Lets you set the maximum crawl depth.
- Keeps a list of visited URLs and does not revisit the same URL.
- Handles exceptions and reports failures with descriptive error messages.
- Uses color-coded logging.
## Use
1. Clone the project:
```bash
git clone https://github.com/0MeMo07/Web-Crawler.git
```
2. Go to the project directory:
```bash
cd Web-Crawler
```
3. Install the required dependencies:
```bash
pip install -r requirements.txt
```
4. Run the crawler Python file:
```bash
python crawler.py
```
## Support me