Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/skallwar/suckit
Suck the InTernet
https://github.com/skallwar/suckit
hacktoberfest rust webscraping
Last synced: about 2 months ago
JSON representation
Suck the InTernet
- Host: GitHub
- URL: https://github.com/skallwar/suckit
- Owner: Skallwar
- License: apache-2.0
- Created: 2019-10-29T17:21:51.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-04-05T16:17:01.000Z (9 months ago)
- Last Synced: 2024-10-31T22:33:11.628Z (2 months ago)
- Topics: hacktoberfest, rust, webscraping
- Language: Rust
- Homepage:
- Size: 888 KB
- Stars: 741
- Watchers: 10
- Forks: 39
- Open Issues: 37
-
Metadata Files:
- Readme: README.md
- License: LICENSE-APACHE
Awesome Lists containing this project
- awesome-datahoarding - Suck-It - threaded) (Download utilities / General)
README
![Build and test](https://github.com/Skallwar/suckit/workflows/Build%20and%20test/badge.svg)
[![codecov](https://codecov.io/gh/Skallwar/suckit/branch/master/graph/badge.svg?token=ZLD369AY2G)](https://codecov.io/gh/Skallwar/suckit)
[![Crates.io](https://img.shields.io/crates/v/suckit.svg)](https://crates.io/crates/suckit)
[![Docs](https://docs.rs/suckit/badge.svg)](https://docs.rs/suckit)
[![Deps](https://deps.rs/repo/github/Skallwar/suckit/status.svg)](https://deps.rs/repo/github/Skallwar/suckit)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
![MSRV](https://img.shields.io/badge/MSRV-1.70.0-blue)# SuckIT
`SuckIT` allows you to recursively visit and download a website's content to
your disk.![SuckIT Logo](media/suckit_logo.png)
# Features
* [x] Vacuums the entirety of a website recursively
* [x] Uses multithreading
* [x] Writes the website's content to your disk
* [x] Enables offline navigation
* [x] Offers random delays to avoid IP banning
* [ ] Saves application state on CTRL-C for later pickup# Options
```console
USAGE:
suckit [FLAGS] [OPTIONS]FLAGS:
-c, --continue-on-error Flag to enable or disable exit on error
--disable-certs-checks Dissable SSL certificates verification
--dry-run Do everything without saving the files to the disk
-h, --help Prints help information
-V, --version Prints version information
-v, --verbose Enable more information regarding the scraping process
--visit-filter-is-download-filter Use the dowload filter in/exclude regexes for visiting as wellOPTIONS:
-a, --auth ...
HTTP basic authentication credentials space-separated as "username password host". Can be repeated for
multiple credentials as "u1 p1 h1 u2 p2 h2"
--cookie
Cookie to send with each request, format: key1=value1;key2=value2 [default: ]--delay
Add a delay in seconds between downloads to reduce the likelihood of getting banned [default: 0]-d, --depth
Maximum recursion depth to reach when visiting. Default is -1 (infinity) [default: -1]-e, --exclude-download
Regex filter to exclude saving pages that match this expression [default: $^]--exclude-visit
Regex filter to exclude visiting pages that match this expression [default: $^]--ext-depth
Maximum recursion depth to reach when visiting external domains. Default is 0. -1 means infinity [default:
0]
-i, --include-download
Regex filter to limit to only saving pages that match this expression [default: .*]--include-visit
Regex filter to limit to only visiting pages that match this expression [default: .*]-j, --jobs Maximum number of threads to use concurrently [default: 1]
-o, --output Output directory
--random-range
Generate an extra random delay between downloads, from 0 to this number. This is added to the base delay
seconds [default: 0]
-t, --tries Maximum amount of retries on download failure [default: 20]
-u, --user-agent User agent to be used for sending requests [default: suckit]ARGS:
Entry point of the scraping
```# Example
A common use case could be the following:
`suckit http://books.toscrape.com -j 8 -o /path/to/downloaded/pages/`
![asciicast](media/suckit-adjusted-120cols-40rows-100ms.svg)
# Installation
As of right now, `SuckIT` does not work on Windows.
To install it, you need to have Rust installed.
* Check out [this link](https://www.rust-lang.org/learn/get-started) for
instructions on how to install Rust.* If you just want to install the suckit executable, you can simply run
`cargo install --git https://github.com/skallwar/suckit`* Now, run it from anywhere with the `suckit` command.
### Arch Linux
`suckit` can be installed from available [AUR packages](https://aur.archlinux.org/packages/?O=0&SeB=b&K=suckit&outdated=&SB=n&SO=a&PP=50&do_Search=Go) using an [AUR helper](https://wiki.archlinux.org/index.php/AUR_helpers). For example,
```
yay -S suckit
```__Want to contribute ? Feel free to
[open an issue](https://github.com/Skallwar/suckit/issues/new) or
[submit a PR](https://github.com/Skallwar/suckit/compare) !__# License
SuckIT is primarily distributed under the terms of both the MIT license
and the Apache License (Version 2.0)See [LICENSE-APACHE](LICENSE-APACHE) and [LICENSE-MIT](LICENSE-MIT) for details.