{"id":25098224,"url":"https://github.com/antosser/web-crawler","last_synced_at":"2025-09-04T07:16:15.516Z","repository":{"id":128802474,"uuid":"592057998","full_name":"Antosser/web-crawler","owner":"Antosser","description":"Rust Web Crawler that finds every page, image, and script on a website (and downloads it)","archived":false,"fork":false,"pushed_at":"2024-04-23T08:41:47.000Z","size":225,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-04-23T22:55:50.191Z","etag":null,"topics":["crawler","html","rust","seo","web"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Antosser.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-22T19:43:09.000Z","updated_at":"2024-08-05T14:13:34.536Z","dependencies_parsed_at":null,"dependency_job_id":"ccb37d3e-8790-48dd-8861-d2020beb9c04","html_url":"https://github.com/Antosser/web-crawler","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antosser%2Fweb-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antosser%2Fweb-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antosser%2Fweb-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Antosser%2Fweb-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Antosse
r","download_url":"https://codeload.github.com/Antosser/web-crawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249698873,"owners_count":21312285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","html","rust","seo","web"],"created_at":"2025-02-07T18:30:43.299Z","updated_at":"2025-04-19T12:48:41.229Z","avatar_url":"https://github.com/Antosser.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Web Crawler\n\nFinds every page, image, and script on a website (and downloads it)\n\n## Usage\n\n```\nRust Web Crawler\n\nUsage: web-crawler [OPTIONS] \u003cURL\u003e\n\nArguments:\n  \u003cURL\u003e\n\nOptions:\n  -d, --download\n          Download all files\n  -c, --crawl-external\n          Whether or not to crawl other websites it finds a link to. Might result in downloading the entire internet\n  -m, --max-url-length \u003cMAX_URL_LENGTH\u003e\n          Maximum url length it allows. 
Will ignore a page if its url length reaches this limit [default: 300]\n  -e, --exclude \u003cEXCLUDE\u003e\n          Will ignore paths that start with these strings (comma-separated)\n      --export \u003cEXPORT\u003e\n          Where to export found URLs\n      --export-internal \u003cEXPORT_INTERNAL\u003e\n          Where to export internal URLs\n      --export-external \u003cEXPORT_EXTERNAL\u003e\n          Where to export external URLs\n  -t, --timeout \u003cTIMEOUT\u003e\n          Timeout between requests in milliseconds [default: 100]\n  -h, --help\n          Print help\n  -V, --version\n          Print version\n```\n\n## How to compile yourself\n\n1. Download Rust\n2. Type `cargo build -r`\n3. Executable is in `target/release`\n\n**or**\n\n1. Download Rust\n2. Install using `cargo install web-crawler`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantosser%2Fweb-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fantosser%2Fweb-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fantosser%2Fweb-crawler/lists"}