{"id":13502928,"url":"https://github.com/Skallwar/suckit","last_synced_at":"2025-03-29T12:33:16.212Z","repository":{"id":40350067,"uuid":"218343351","full_name":"Skallwar/suckit","owner":"Skallwar","description":"Suck the InTernet","archived":false,"fork":false,"pushed_at":"2024-04-05T16:17:01.000Z","size":909,"stargazers_count":761,"open_issues_count":38,"forks_count":40,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-03-24T01:47:01.734Z","etag":null,"topics":["hacktoberfest","rust","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Skallwar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-29T17:21:51.000Z","updated_at":"2025-03-18T02:18:41.000Z","dependencies_parsed_at":"2024-06-18T19:57:41.854Z","dependency_job_id":null,"html_url":"https://github.com/Skallwar/suckit","commit_stats":{"total_commits":254,"total_committers":21,"mean_commits":"12.095238095238095","dds":0.6732283464566929,"last_synced_commit":"208c74716587addeb011348080b846a7e073b7cc"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Skallwar%2Fsuckit","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Skallwar%2Fsuckit/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Skallwar%2Fsuckit/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Skallwar%2Fsuckit/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Skallwar","download_url":"https://codeload.github.com/Skallwar/suckit/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246187190,"owners_count":20737459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hacktoberfest","rust","webscraping"],"created_at":"2024-07-31T22:02:30.495Z","updated_at":"2025-03-29T12:33:15.898Z","avatar_url":"https://github.com/Skallwar.png","language":"Rust","readme":"![Build and test](https://github.com/Skallwar/suckit/workflows/Build%20and%20test/badge.svg)\n[![codecov](https://codecov.io/gh/Skallwar/suckit/branch/master/graph/badge.svg?token=ZLD369AY2G)](https://codecov.io/gh/Skallwar/suckit)\n[![Crates.io](https://img.shields.io/crates/v/suckit.svg)](https://crates.io/crates/suckit)\n[![Docs](https://docs.rs/suckit/badge.svg)](https://docs.rs/suckit)\n[![Deps](https://deps.rs/repo/github/Skallwar/suckit/status.svg)](https://deps.rs/repo/github/Skallwar/suckit)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n![MSRV](https://img.shields.io/badge/MSRV-1.70.0-blue)\n\n# SuckIT\n\n`SuckIT` allows you to recursively visit and download a website's content to\nyour disk.\n\n![SuckIT Logo](media/suckit_logo.png)\n\n# Features\n\n* [x] Vacuums the entirety of a website recursively\n* [x] Uses multithreading\n* [x] Writes the website's content to your disk\n* 
[x] Enables offline navigation\n* [x] Offers random delays to avoid IP banning\n* [ ] Saves application state on CTRL-C for later pickup\n\n# Options\n```console\nUSAGE:\n    suckit [FLAGS] [OPTIONS] \u003curl\u003e\n\nFLAGS:\n    -c, --continue-on-error                  Flag to enable or disable exit on error\n        --disable-certs-checks               Disable SSL certificate verification\n        --dry-run                            Do everything without saving the files to the disk\n    -h, --help                               Prints help information\n    -V, --version                            Prints version information\n    -v, --verbose                            Enable more information regarding the scraping process\n        --visit-filter-is-download-filter    Use the download filter in/exclude regexes for visiting as well\n\nOPTIONS:\n    -a, --auth \u003cauth\u003e...\n            HTTP basic authentication credentials space-separated as \"username password host\". Can be repeated for\n            multiple credentials as \"u1 p1 h1 u2 p2 h2\"\n        --cookie \u003ccookie\u003e\n            Cookie to send with each request, format: key1=value1;key2=value2 [default: ]\n\n        --delay \u003cdelay\u003e\n            Add a delay in seconds between downloads to reduce the likelihood of getting banned [default: 0]\n\n    -d, --depth \u003cdepth\u003e\n            Maximum recursion depth to reach when visiting. Default is -1 (infinity) [default: -1]\n\n    -e, --exclude-download \u003cexclude-download\u003e\n            Regex filter to exclude saving pages that match this expression [default: $^]\n\n        --exclude-visit \u003cexclude-visit\u003e\n            Regex filter to exclude visiting pages that match this expression [default: $^]\n\n        --ext-depth \u003cext-depth\u003e\n            Maximum recursion depth to reach when visiting external domains. Default is 0. 
-1 means infinity [default:\n            0]\n    -i, --include-download \u003cinclude-download\u003e\n            Regex filter to limit to only saving pages that match this expression [default: .*]\n\n        --include-visit \u003cinclude-visit\u003e\n            Regex filter to limit to only visiting pages that match this expression [default: .*]\n\n    -j, --jobs \u003cjobs\u003e                            Maximum number of threads to use concurrently [default: 1]\n    -o, --output \u003coutput\u003e                        Output directory\n        --random-range \u003crandom-range\u003e\n            Generate an extra random delay between downloads, from 0 to this number. This is added to the base delay\n            seconds [default: 0]\n    -t, --tries \u003ctries\u003e                          Maximum number of retries on download failure [default: 20]\n    -u, --user-agent \u003cuser-agent\u003e                User agent to be used for sending requests [default: suckit]\n\nARGS:\n    \u003curl\u003e    Entry point of the scraping\n```\n\n# Example\n\nA common use case could be the following:\n\n`suckit http://books.toscrape.com -j 8 -o /path/to/downloaded/pages/`\n\n![asciicast](media/suckit-adjusted-120cols-40rows-100ms.svg)\n\n# Installation\n\nAs of right now, `SuckIT` does not work on Windows.\n\nTo install it, you need to have Rust installed.\n\n* Check out [this link](https://www.rust-lang.org/learn/get-started) for\ninstructions on how to install Rust.\n\n* If you just want to install the suckit executable, you can simply run\n`cargo install --git https://github.com/skallwar/suckit`\n\n* Now, run it from anywhere with the `suckit` command.\n\n### Arch Linux\n\n`suckit` can be installed from available [AUR packages](https://aur.archlinux.org/packages/?O=0\u0026SeB=b\u0026K=suckit\u0026outdated=\u0026SB=n\u0026SO=a\u0026PP=50\u0026do_Search=Go) using an [AUR helper](https://wiki.archlinux.org/index.php/AUR_helpers). 
For example,\n\n```console\nyay -S suckit\n```\n\n__Want to contribute? Feel free to\n[open an issue](https://github.com/Skallwar/suckit/issues/new) or\n[submit a PR](https://github.com/Skallwar/suckit/compare)!__\n\n# License\n\nSuckIT is primarily distributed under the terms of both the MIT license\nand the Apache License (Version 2.0).\n\nSee [LICENSE-APACHE](LICENSE-APACHE) and [LICENSE-MIT](LICENSE-MIT) for details.\n","funding_links":[],"categories":["Rust","Applications"],"sub_categories":["Utilities"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSkallwar%2Fsuckit","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSkallwar%2Fsuckit","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSkallwar%2Fsuckit/lists"}