https://github.com/thesurlydev/surly-spider
A command line interface for the spider library
- Host: GitHub
- URL: https://github.com/thesurlydev/surly-spider
- Owner: thesurlydev
- License: mit
- Created: 2021-11-13T00:57:11.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-03-05T17:14:48.000Z (over 1 year ago)
- Last Synced: 2025-09-30T05:55:27.222Z (8 months ago)
- Topics: crawl, crawler, rust, spider, surly, surly-spider
- Language: Rust
- Homepage:
- Size: 46.9 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# spider
A command line interface for crawling websites and storing their content.
## usage
```shell
USAGE:
    ss [FLAGS] [OPTIONS] --domain

FLAGS:
    -h, --help              Prints help information
    -r, --respect-robots    Respect robots.txt file and not scrape not allowed files
    -V, --version           Prints version information
    -v, --verbose           Turn verbose logging on

OPTIONS:
    -c, --concurrency       How many request can be run simultaneously
    -d, --domain            Domain to crawl
    -p, --polite-delay      Polite crawling delay in milli seconds
    -m, --max-depth         Maximum crawl depth from the starting URL
    -t, --timeout           Timeout for HTTP requests in seconds
    -u, --user-agent        Custom User-Agent string for HTTP requests
    -o, --output-dir        Directory to store output (default: ./spider-output)
```
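
### example

The flags above can be combined into a single crawl command. The sketch below is illustrative only. The binary path `./target/release/ss` is an assumption (Cargo's default release output directory with the `ss` name from the usage text), `example.com` is a placeholder target, and the numeric values are arbitrary. The usage text does not say whether `--domain` expects a bare host or a full URL, so adjust it as needed. The script falls back to printing the command when the binary is not found at the assumed path.

```shell
#!/bin/sh
# Assumed location of the built binary; override by exporting SS_BIN.
# On Linux a bare `ss` may resolve to the iproute2 socket tool instead,
# so an explicit path is used here.
SS_BIN="${SS_BIN:-./target/release/ss}"

# Placeholder target; replace with the site you want to crawl.
DOMAIN="example.com"

# Flags are taken from the usage text; the values are illustrative.
# --polite-delay is in milliseconds and --timeout is in seconds.
# None of the values contain spaces, so a plain string is safe to split.
CRAWL_ARGS="--domain $DOMAIN --max-depth 2 --concurrency 4 --polite-delay 500 --timeout 10 --respect-robots --output-dir ./spider-output"

if [ -x "$SS_BIN" ]; then
  # Intentionally unquoted so the string splits into separate arguments.
  "$SS_BIN" $CRAWL_ARGS
else
  # Binary not found at the assumed path: show what would be run.
  echo "dry run: $SS_BIN $CRAWL_ARGS"
fi
```

Keeping the concurrency low and the polite delay non-zero, together with `--respect-robots`, is a conservative starting point for sites you do not control.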