https://github.com/rly0nheart/tarantula
Python web crawler tool
crawling scraping web-crawler web-scraping
- Host: GitHub
- URL: https://github.com/rly0nheart/tarantula
- Owner: rly0nheart
- License: mit
- Created: 2021-08-11T10:15:03.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2021-11-04T11:42:28.000Z (about 3 years ago)
- Last Synced: 2023-03-06T22:34:59.381Z (almost 2 years ago)
- Topics: crawling, scraping, web-crawler, web-scraping
- Language: Python
- Homepage: https://git.io/JRihm
- Size: 590 KB
- Stars: 10
- Watchers: 1
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
![Python Version](https://img.shields.io/badge/python-3.x-blue?style=flat&logo=python)
![OS](https://img.shields.io/badge/OS-GNU%2FLinux-red?style=flat&logo=linux)
![GitHub](https://img.shields.io/github/license/rlyonheart/tarantula?style=flat)
![GitHub repo size](https://img.shields.io/github/repo-size/rlyonheart/tarantula)
![Lines of code](https://img.shields.io/tokei/lines/github/rlyonheart/tarantula)
![CodeFactor](https://www.codefactor.io/repository/github/rlyonheart/tarantula/badge)
![Twitter](https://img.shields.io/twitter/follow/rly0nheart?&style=flat&logo=twitter)
[![asciicast](https://asciinema.org/a/446985.svg)](https://asciinema.org/a/446985)

Python web crawler tool. Scrapes internal and external URLs.

# Installation
**Clone this repo:**
```
git clone https://github.com/rlyonheart/tarantula.git
```
```
cd tarantula
```
```
pip install -r requirements.txt
```

# Optional Arguments
| Flag | MetaVar | Usage |
| ------------- |:----------------------:|:---------:|
| -c/--count | **NUMBER** | *Number of links to crawl (default is 30)* |
| -v/--verbose | | *run tarantula in verbose mode* |
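
The two flags in the table could be declared with Python's standard `argparse` module roughly as follows. This is only a sketch and not tarantula's actual source: the `build_parser` helper, the `prog` name, and the choice of `type=int` for the count are assumptions made for illustration. The flag names, the `NUMBER` metavar, and the default of 30 come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="tarantula",  # assumed program name
        description="Python web crawler tool",
    )
    # -c/--count: how many links to crawl; the table gives 30 as the default.
    # Parsing the value as an int is an assumption.
    parser.add_argument(
        "-c", "--count",
        metavar="NUMBER",
        type=int,
        default=30,
        help="number of links to crawl (default is 30)",
    )
    # -v/--verbose: a boolean switch that takes no value.
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="run tarantula in verbose mode",
    )
    return parser


# Parse an explicit argument list instead of sys.argv to show the result.
args = build_parser().parse_args(["--count", "5", "-v"])
print(args.count, args.verbose)  # 5 True
```

When neither flag is given, a parser declared like this would yield `count=30` and `verbose=False`.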