{"id":37657526,"url":"https://github.com/aneesh-aparajit/reddit-crawler","last_synced_at":"2026-01-16T11:43:55.793Z","repository":{"id":65335556,"uuid":"582619418","full_name":"aneesh-aparajit/reddit-crawler","owner":"aneesh-aparajit","description":"Reddit Crawler API for collecting datasets from Reddit.","archived":false,"fork":false,"pushed_at":"2022-12-31T07:56:51.000Z","size":94,"stargazers_count":11,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-27T15:18:57.808Z","etag":null,"topics":["crawler","nlp","python","reddit","scraper","web-crawler"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aneesh-aparajit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-12-27T11:37:44.000Z","updated_at":"2025-01-02T08:13:17.000Z","dependencies_parsed_at":"2023-01-31T18:32:24.484Z","dependency_job_id":null,"html_url":"https://github.com/aneesh-aparajit/reddit-crawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/aneesh-aparajit/reddit-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneesh-aparajit%2Freddit-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneesh-aparajit%2Freddit-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneesh-aparajit%2Freddit-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneesh-aparajit%2Freddit-crawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/owners/aneesh-aparajit","download_url":"https://codeload.github.com/aneesh-aparajit/reddit-crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aneesh-aparajit%2Freddit-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28478359,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T06:30:42.265Z","status":"ssl_error","status_checked_at":"2026-01-16T06:30:16.248Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","nlp","python","reddit","scraper","web-crawler"],"created_at":"2026-01-16T11:43:55.254Z","updated_at":"2026-01-16T11:43:55.768Z","avatar_url":"https://github.com/aneesh-aparajit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Reddit Multimodal Crawler [![Downloads](https://static.pepy.tech/personalized-badge/reddit-multimodal-crawler?period=total\u0026units=abbreviation\u0026left_color=blue\u0026right_color=blue\u0026left_text=Downloads)](https://pepy.tech/project/reddit-multimodal-crawler)\n\nThis is a wrapper around the `PRAW` package that scrapes content from Reddit and saves it as `csv`, `json`, `tsv`, or `sql` files.\n\nThis repository helps you scrape various subreddits and returns their multimedia attributes.\n\nYou can pip install this to integrate with another application, or use it as a comm
and-line application.\n\n- PyPI Link: https://pypi.org/project/reddit-multimodal-crawler/\n\n```commandLine\npip install reddit-multimodal-crawler\n```\n\n## How to use the repository?\n\nBefore running the code, register with the Reddit API and create a sample project to obtain the `client_id` and `client_secret`, then choose a `user_agent`. Pass all three as arguments.\n\nThe easiest way to get the package is `pip install reddit-multimodal-crawler`.\n\n## Functionalities\n\nLike `PRAW`, this package scrapes multiple subreddits, but it also returns and saves datasets for them. It scrapes both posts and comments.\n\n### Sample Code\n\n```python\nimport argparse\n\nimport nltk\nfrom reddit_multimodal_crawler.crawler import Crawler\n\nnltk.download(\"vader_lexicon\")\n\nif __name__ == \"__main__\":\n\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\n        \"--subreddit_file_path\",\n        help=\"A path to the file which contains the subreddits to scrape from.\",\n        type=str,\n    )\n    parser.add_argument(\n        \"--limit\", help=\"The maximum number of articles to scrape.\", type=int\n    )\n    parser.add_argument(\"--client_id\", help=\"The Client ID provided by Reddit.\", type=str)\n    parser.add_argument(\n        \"--client_secret\", help=\"The client secret provided by Reddit.\", type=str\n    )\n    parser.add_argument(\n        \"--user_agent\",\n        help=\"The User Agent in the form of \u003cAPP_NAME\u003e \u003cVERSION\u003e by /u/\u003cREDDIT_USERNAME\u003e\",\n        type=str,\n    )\n    # Boolean switches use store_true: with type=bool, argparse would treat\n    # any non-empty string (including \"False\") as True.\n    parser.add_argument(\n        \"--posts\", help=\"Scrape the posts of each subreddit.\", action=\"store_true\"\n    )\n    parser.add_argument(\n        \"--comments\",\n        help=\"Scrape the comments of the top posts of each subreddit.\",\n        action=\"store_true\",\n    )\n\n    args = parser.parse_args()\n\n    # parse_args() returns a Namespace, so values are read as attributes.\n    client_id = args.client_id\n    client_secret = args.client_
secret\n    user_agent = args.user_agent\n    file_path = args.subreddit_file_path\n    limit = args.limit\n\n    r = Crawler(client_id=client_id, client_secret=client_secret, user_agent=user_agent)\n\n    # Read the whitespace-separated subreddit names from the file.\n    with open(file_path, \"r\") as f:\n        subreddit_list = f.read().split()\n\n    print(subreddit_list)\n\n    if args.posts:\n        r.get_posts(subreddit_names=subreddit_list, sort_by=\"top\", limit=limit)\n\n    if args.comments:\n        r.get_comments(subreddit_names=subreddit_list, sort_by=\"top\", limit=limit)\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faneesh-aparajit%2Freddit-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faneesh-aparajit%2Freddit-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faneesh-aparajit%2Freddit-crawler/lists"}