{"id":15780301,"url":"https://github.com/ecklf/reddit-clawler","last_synced_at":"2025-03-31T16:35:14.610Z","repository":{"id":211752255,"uuid":"728891901","full_name":"ecklf/reddit-clawler","owner":"ecklf","description":"A command-line tool written in Rust that crawls Reddit posts from a user or subreddit","archived":false,"fork":false,"pushed_at":"2024-09-09T16:59:15.000Z","size":141,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-10-04T18:41:43.716Z","etag":null,"topics":["cli","crawler","downloader","downloader-for-reddit","reddit"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ecklf.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-07T23:30:57.000Z","updated_at":"2024-09-09T16:58:01.000Z","dependencies_parsed_at":"2023-12-12T23:30:20.927Z","dependency_job_id":"a271f7e6-4c04-41d6-a7e5-127dc17f7063","html_url":"https://github.com/ecklf/reddit-clawler","commit_stats":null,"previous_names":["ecklf/reddit-clawler"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecklf%2Freddit-clawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecklf%2Freddit-clawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecklf%2Freddit-clawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ecklf%2Freddit-clawler/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ecklf","download_url":"https://codeload.github.com/ecklf/reddit-clawler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246498506,"owners_count":20787326,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","crawler","downloader","downloader-for-reddit","reddit"],"created_at":"2024-10-04T18:41:11.438Z","updated_at":"2025-03-31T16:35:14.586Z","avatar_url":"https://github.com/ecklf.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Reddit Clawler 🐾\n\nA command-line tool written in Rust that crawls Reddit posts from a user, subreddit, or search term.\n\n## Usage\n\nInstall the following dependencies:\n\n- [yt-dlp](https://github.com/yt-dlp/yt-dlp)\n\n\n## Commands\n\nYou can see all available commands by running:\n\n```sh\n./reddit_clawler --help\n```\n\nBy default, the tool downloads posts to the `output/{subcommand}/{value}` folder.\n\n### User\nCrawls posts from `/u/spez` in the `new` category, spawning `50` tasks and downloading to `./downloads/user/spez`:\n\n```sh\n./reddit_clawler user spez --category new --tasks 50 -o ./downloads\n```\n\n### Subreddit\nCrawls posts from `/r/redpandas` from the `top` category, filtered by `hour`:\n\n```sh\n./reddit_clawler subreddit redpandas --category top --timeframe hour\n```\n\n### Search\nCrawls posts matching the search term `olympics` from the `top` category, filtered by `hour`:\n\n```sh\n./reddit_clawler search olympics --category top --timeframe hour\n```\n\n## Features\n\n### Providers (these are 
the most common I found):\n\n- [x] Reddit Media\n- [x] Imgur Media\n- [x] YouTube Videos\n- [x] Redgifs Videos\n\n### Caching\n\nAfter the downloads have finished, a `cache.json` file will be created in the folder of the downloaded resource.\nThis file keeps track of the posts you have already downloaded so that they are skipped on subsequent runs.\n\n### Rate limiting\n\nQuerying posts is paginated (100 items per request) and can lead to rate limiting.\nTo avoid this, you can provide a `--limit` flag to limit the number of requests for fetching a resource.\nThis can be useful for subsequent crawls.\n\n### File format\n\nBy default, the tool prefers `mp4` over `gif` when both are available.\n\n## Planned\n\n- [ ] Providing custom filename scheme\n- [ ] Configuration for conversion to other/smaller formats (`avif`/`webp`/`webm`)\n- [ ] Remove duplicates\n\n## Development\n\nYou can use the `--skip` flag to skip the download process:\n\n```sh\ncargo run -- user spez --skip\n```\n\nYou can use the `--mock` flag to provide a mock file for the responses of the Reddit client:\n\n```sh\ncargo run -- user spez --mock ./tests/mocks/reddit/submitted_response/reddit_video.json\n```\n\n## License\n\nReddit Clawler is licensed under the GNU General Public License v3.0. See the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fecklf%2Freddit-clawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fecklf%2Freddit-clawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fecklf%2Freddit-clawler/lists"}