https://github.com/tirkarthi/simple-crawler
A simple web crawler
- Host: GitHub
- URL: https://github.com/tirkarthi/simple-crawler
- Owner: tirkarthi
- License: mit
- Created: 2018-03-20T17:14:28.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-03-20T17:30:52.000Z (about 7 years ago)
- Last Synced: 2025-01-05T12:42:35.977Z (5 months ago)
- Language: HTML
- Size: 9.77 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Crawler
[Build status](https://travis-ci.org/tirkarthi/simple-crawler)
A simple crawler that fetches a page and classifies the links found on it.
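The README does not say how links are classified. What follows is a minimal, hypothetical sketch, not the repository's `run.py`. It assumes that "classifying" means splitting links into internal (same host as the page) and external URLs, and that `--limit` caps how many internal pages are fetched breadth-first. The names `classify_links`, `crawl`, and `build_parser`, and the `--limit` default of 1, are assumptions. Only the standard library is used.

```python
import argparse
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(page_url, html):
    """Hypothetical: split a page's links into internal (same host) and external URLs."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(page_url).netloc
    internal, external = set(), set()
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop #fragments.
        absolute, _ = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc == base_host:
            internal.add(absolute)
        else:
            external.add(absolute)
    return internal, external


def fetch(url):
    """Download a page and decode it using the charset the server reports."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, limit, fetch_page=fetch):
    """Breadth-first crawl of at most `limit` internal pages (assumed meaning of --limit)."""
    queue = deque([start_url])
    seen = {start_url}
    results = {}
    while queue and len(results) < limit:
        url = queue.popleft()
        internal, external = classify_links(url, fetch_page(url))
        results[url] = {"internal": sorted(internal), "external": sorted(external)}
        # Only internal links are queued, so the crawl never leaves the start host.
        for link in sorted(internal):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results


def build_parser():
    # Mirrors the documented usage text; the --limit default is an assumption.
    parser = argparse.ArgumentParser(prog="run.py", description="A simple web crawler")
    parser.add_argument("--url", required=True, help="URL to crawl")
    parser.add_argument("--limit", type=int, default=1,
                        help="Number of internal URLs to crawl")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    for page, links in crawl(args.url, args.limit).items():
        print(page)
        for kind in ("internal", "external"):
            for link in links[kind]:
                print(f"  [{kind}] {link}")


if __name__ == "__main__":
    main()
```

Saved as a hypothetical `crawler_sketch.py`, running it with `--url https://www.example.com --limit 1` would fetch only the start page and print its links. Comparing `netloc` values means `www.example.com` and `example.com` count as different hosts, a simplification a fuller crawler might handle differently.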
# Installation
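* Clone the repository with `git clone https://github.com/tirkarthi/simple-crawler` and change into the `simple-crawler` directory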
* Create a virtualenv with `python3 -m venv crawler-env`
* Activate the virtualenv with `source crawler-env/bin/activate`
* Install requirements with `pip install -r requirements.txt`
* Run the crawler with `python run.py --url https://www.example.com --limit 1`

# Usage
```
usage: run.py [-h] --url URL [--limit LIMIT]

A simple web crawler

optional arguments:
  -h, --help     show this help message and exit
  --url URL      URL to crawl
  --limit LIMIT  Number of internal URLs to crawl
```

# License
Copyright © 2018 Karthikeyan S
Distributed under the MIT License.