https://github.com/thesurlydev/surly-spider

A command line interface for the spider library

# spider

A command line interface for crawling websites and storing their content.

## usage

```shell
USAGE:
    ss [FLAGS] [OPTIONS] --domain

FLAGS:
    -h, --help              Prints help information
    -r, --respect-robots    Respect robots.txt file and not scrape not allowed files
    -V, --version           Prints version information
    -v, --verbose           Turn verbose logging on

OPTIONS:
    -c, --concurrency     How many request can be run simultaneously
    -d, --domain          Domain to crawl
    -p, --polite-delay    Polite crawling delay in milli seconds
    -m, --max-depth       Maximum crawl depth from the starting URL
    -t, --timeout         Timeout for HTTP requests in seconds
    -u, --user-agent      Custom User-Agent string for HTTP requests
    -o, --output-dir      Directory to store output (default: ./spider-output)
```
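
As a rough illustration of how the flags above combine, the following sketch assembles a command line and prints it for review rather than running it. Only the flag names come from the help text; the `ss_command` helper, the `example.com` target, and every numeric value are hypothetical. The help text does not say whether `--domain` expects a bare domain or a full URL, so treat the value as a placeholder.

```shell
# Sketch only: ss_command is a hypothetical helper, not part of the tool.
# It prints an ss invocation built from the documented flags.
ss_command() {
  domain="$1"                          # target to crawl (placeholder format)
  output_dir="${2:-./spider-output}"   # falls back to the documented default

  # --polite-delay is given in milliseconds, --timeout in seconds.
  echo "ss --domain $domain --respect-robots --concurrency 4 --polite-delay 500 --max-depth 2 --timeout 30 --output-dir $output_dir"
}

# Print the command for a placeholder domain; copy it to a shell to run it.
ss_command "example.com"
```

Printing the command first makes it easy to check the units before starting a crawl. On many Linux systems `ss` is also the name of the iproute2 socket utility, so it is worth confirming which binary comes first on your `PATH`.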