https://github.com/tirkarthi/simple-crawler

A simple web crawler

# Crawler [![Build Status](https://travis-ci.org/tirkarthi/simple-crawler.svg?branch=master)](https://travis-ci.org/tirkarthi/simple-crawler)

A simple crawler that classifies the links on a page.
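The README does not say how links are classified. As a rough illustration only, the sketch below assumes the split is internal versus external, decided by comparing hostnames (the `--limit` option mentions "internal URLs"). The names `LinkCollector` and `classify_links` are hypothetical and are not taken from this repository; only the Python standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(base_url, html):
    """Split the links in `html` into internal and external sets.

    Hypothetical helper (assumption, not the project's code): a link is
    "internal" when its host matches the host of `base_url`.
    """
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    result = {"internal": set(), "external": set()}
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        kind = "internal" if parsed.netloc == base_host else "external"
        result[kind].add(absolute)
    return result
```

Calling `classify_links("https://www.example.com/", page_html)` returns a dictionary with an `"internal"` and an `"external"` set of absolute URLs. The real crawler may use different rules, for example treating subdomains as internal.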

# Installation

* Clone the repo with `git clone https://github.com/tirkarthi/simple-crawler`
* Create a virtualenv with `python3 -m venv crawler-env`
* Activate the virtualenv with `source crawler-env/bin/activate`
* Install requirements with `pip install -r requirements.txt`
* Run the crawler with `python run.py --url https://www.example.com --limit 1`

# Usage

```
usage: run.py [-h] --url URL [--limit LIMIT]

A simple web crawler

optional arguments:
  -h, --help     show this help message and exit
  --url URL      URL to crawl
  --limit LIMIT  Number of internal URLs to crawl
```
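The usage text above has the shape of `argparse` output. A minimal sketch of a parser that would accept the same options might look like the following. The function name `build_parser`, the `int` type for `--limit`, and its `None` default are assumptions, since the README does not document them.

```python
import argparse


def build_parser():
    """Build a parser accepting the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="run.py", description="A simple web crawler"
    )
    # --url is mandatory, matching "--url URL" without brackets in the usage line.
    parser.add_argument("--url", required=True, help="URL to crawl")
    parser.add_argument(
        "--limit",
        type=int,      # assumption: the limit is a whole number
        default=None,  # assumption: no default is documented
        help="Number of internal URLs to crawl",
    )
    return parser
```

With this sketch, `build_parser().parse_args(["--url", "https://www.example.com", "--limit", "1"])` yields a namespace with `url` and `limit` attributes, mirroring the command shown in the installation steps.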

# License

Copyright © 2018 Karthikeyan S

Distributed under the MIT License.