Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Antosser/web-crawler
Rust Web Crawler that finds every page, image, and script on a website (and downloads it)
crawler html rust seo web
Last synced: 6 days ago
- Host: GitHub
- URL: https://github.com/Antosser/web-crawler
- Owner: Antosser
- Created: 2023-01-22T19:43:09.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-04-23T08:41:47.000Z (9 months ago)
- Last Synced: 2024-04-23T22:55:50.191Z (9 months ago)
- Topics: crawler, html, rust, seo, web
- Language: Rust
- Homepage:
- Size: 220 KB
- Stars: 5
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-rust-list - Antosser/web-crawler: Rust Web Crawler that finds every page, image, and script on a website (and downloads it) (Web Crawler)
README
# Web Crawler
Finds every page, image, and script on a website (and downloads it)
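Pages, images, and scripts are typically discovered through the `href` and `src` attributes in a page's HTML. The sketch below illustrates that idea with the standard library only. It is not taken from this project's source: `extract_links` is a hypothetical helper, and it is deliberately naive (double-quoted attributes only, no HTML parsing, no relative-URL resolution).

```rust
/// Collects the values of double-quoted `href="..."` and `src="..."`
/// attributes from an HTML string.
/// Hypothetical helper for illustration; a real crawler would use an HTML parser.
fn extract_links(html: &str) -> Vec<String> {
    let mut links = Vec::new();
    // All `href` values are collected first, then all `src` values.
    for attr in ["href=\"", "src=\""] {
        let mut rest = html;
        while let Some(start) = rest.find(attr) {
            // Text immediately after the opening quote of the attribute value.
            let after = &rest[start + attr.len()..];
            match after.find('"') {
                Some(end) => {
                    links.push(after[..end].to_string());
                    // Continue scanning after the closing quote.
                    rest = &after[end + 1..];
                }
                // Unterminated attribute value: stop scanning for this attribute.
                None => break,
            }
        }
    }
    links
}

fn main() {
    let html = r#"<a href="/about">About</a><img src="/logo.png"><script src="/app.js"></script>"#;
    for link in extract_links(html) {
        println!("{link}");
    }
}
```

Each value found this way would still need to be resolved against the page URL and de-duplicated before being queued for a visit.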
## Usage
```
Rust Web Crawler

Usage: web-crawler [OPTIONS]
Arguments:
Options:
-d, --download
Download all files
-c, --crawl-external
Whether or not to crawl other websites it finds a link to. Might result in downloading the entire internet
-m, --max-url-length
Maximum url length it allows. Will ignore page if url length reaches this limit [default: 300]
-e, --exclude
Will ignore paths that start with these strings (comma-separated)
--export
Where to export found URLs
--export-internal
Where to export internal URLs
--export-external
Where to export external URLs
-t, --timeout
Timeout between requests in milliseconds [default: 100]
-h, --help
Print help
-V, --version
Print version
```
## How to compile yourself
1. Install Rust
2. Run `cargo build -r`
3. The executable is in `target/release`

**or**

1. Install Rust
2. Run `cargo install web-crawler`
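
To make the filtering options from the Usage section more concrete, here is a minimal sketch of how `--exclude`, `--max-url-length`, and `--crawl-external` might combine into a single "should this URL be visited?" check. It is an assumption about the behavior described in the help text, not the project's actual code: `Filter`, `parse_excludes`, `split_url`, and `allows` are hypothetical names, and "reaches this limit" is read as "length is greater than or equal to the limit".

```rust
/// Hypothetical settings mirroring `--exclude`, `--max-url-length`,
/// and `--crawl-external` from the help text.
struct Filter {
    exclude_prefixes: Vec<String>,
    max_url_length: usize,
    crawl_external: bool,
}

/// Parses a comma-separated `--exclude` value into path prefixes,
/// trimming whitespace and dropping empty entries.
fn parse_excludes(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Splits "scheme://host/path" into (host, path).
/// Naive: ports, userinfo, and query strings are not treated specially.
fn split_url(url: &str) -> Option<(&str, &str)> {
    let rest = url.split_once("://")?.1;
    Some(match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    })
}

impl Filter {
    /// Returns true if `url` should be visited when crawling `start_host`.
    fn allows(&self, start_host: &str, url: &str) -> bool {
        // Assumed reading of the help text: skip once the limit is reached.
        if url.len() >= self.max_url_length {
            return false;
        }
        let (host, path) = match split_url(url) {
            Some(parts) => parts,
            None => return false,
        };
        // External hosts are only followed when crawl_external is set.
        if host != start_host && !self.crawl_external {
            return false;
        }
        // Skip any path that starts with an excluded prefix.
        !self.exclude_prefixes.iter().any(|p| path.starts_with(p.as_str()))
    }
}

fn main() {
    let filter = Filter {
        exclude_prefixes: parse_excludes("/admin, /tmp"),
        max_url_length: 300,
        crawl_external: false,
    };
    for url in [
        "https://example.com/blog/post",
        "https://example.com/admin/login",
        "https://other.example/page",
    ] {
        println!("{url}: {}", filter.allows("example.com", url));
    }
}
```

With these settings only the first URL is allowed: the second matches the excluded `/admin` prefix and the third is on an external host.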