{"id":34103767,"url":"https://github.com/ericlingit/subreddit-trawler","last_synced_at":"2026-03-12T04:30:58.131Z","repository":{"id":63703891,"uuid":"570022480","full_name":"ericlingit/subreddit-trawler","owner":"ericlingit","description":"Scrape a subreddit's posts.","archived":false,"fork":false,"pushed_at":"2022-11-24T07:54:34.000Z","size":963,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-02-24T23:42:17.549Z","etag":null,"topics":["reddit","scraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ericlingit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-11-24T06:48:33.000Z","updated_at":"2022-11-24T06:51:00.000Z","dependencies_parsed_at":"2022-11-24T08:17:21.883Z","dependency_job_id":null,"html_url":"https://github.com/ericlingit/subreddit-trawler","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ericlingit/subreddit-trawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericlingit%2Fsubreddit-trawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericlingit%2Fsubreddit-trawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericlingit%2Fsubreddit-trawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericlingit%2Fsubreddit-trawler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ericlingit","download_url":"https://codeload.github.com/ericlingit/subr
eddit-trawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ericlingit%2Fsubreddit-trawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30415448,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-12T04:25:42.844Z","status":"ssl_error","status_checked_at":"2026-03-12T04:25:34.624Z","response_time":114,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["reddit","scraper"],"created_at":"2025-12-14T17:50:50.478Z","updated_at":"2026-03-12T04:30:58.114Z","avatar_url":"https://github.com/ericlingit.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Subreddit trawler\n\nScrape subreddit posts using the old URL `https://old.reddit.com`.\n\nhttps://old.reddit.com/r/Chinatown_irl/\n\nhttps://old.reddit.com/r/China_irl/\n\n\n\n- scrape subreddit\n    - visit each post link\n        - skip announcements\n            - if the URL contains `predictions?tournament`, always skip this link. 
no old version is available.\n                - eg: `https://www.reddit.com/r/wallstreetbets/predictions?tournament=tnmt-0b14066a-ad68-4351-8261-d1c0740c44d2`\n        - scrape comments\n            - submit text\n            - submit image\n            - submit video\n            - nsfw/spoiler\n\n- find next button\n    - extract link\n    - go to link\n    - repeat above\n\nExamples for various post types:\n- [Text post](https://old.reddit.com/r/China_irl/comments/z0oio5)\n- [Image post](https://old.reddit.com/r/China_irl/comments/z0ojwn)\n- [Video post](https://old.reddit.com/r/China_irl/comments/yzv625)\n- [Gallery](https://old.reddit.com/r/China_irl/comments/z0728o)\n- [NSFW text (Whats the most NSFW experience you witnessed right in front of your eyes?)](https://old.reddit.com/r/AskReddit/comments/z0uq39)\n- [NSFW image (Grown man ass-kissing)](https://www.reddit.com/r/cringepics/comments/z0xhwy)\n- [NSFW video (Ukrainian drone flies right into the Russian trench)](https://old.reddit.com/r/CombatFootage/comments/z1391l)\n\n## Notes\n\nSample video PostLink:\n\n```json\n{\n    \"id\": \"z09a7r\",\n    \"author\": \"Dry_Illustrator5642\",\n    \"timestamp\": 1668963979000,\n    \"url\": \"https://v.redd.it/4huchegx4x0a1\",\n    \"permalink\": \"https://old.reddit.com/r/China_irl/comments/z09a7r/翼刀性感电臀舞/\",\n    \"domain\": \"v.redd.it\",\n    \"comments_count\": 1,\n    \"score\": 0,\n    \"nsfw\": false,\n    \"spoiler\": false,\n    \"type\": \"video\"\n}\n```\n\nActual downloadable video addr: `https://v.redd.it/4huchegx4x0a1/DASH_720.mp4`\nAudio addr: `https://v.redd.it/4huchegx4x0a1/DASH_audio.mp4`\n\n\nSample image PostLink:\n\n```json\n{\n    \"id\": \"wv4ydl\",\n    \"author\": \"darkyknight01\",\n    \"timestamp\": 1661201834000,\n    \"url\": \"https://i.redd.it/6b66lj3fwbj91.jpg\",\n    \"permalink\": \"https://old.reddit.com/r/zenfone6/comments/wv4ydl/in_delhi_i_need_info_for_that_how_should_i/\",\n    \"domain\": \"i.redd.it\",\n    
\"comments_count\": 1,\n    \"score\": 1,\n    \"nsfw\": false,\n    \"spoiler\": false,\n    \"type\": \"image\"\n}\n```\n\nSample text PostLink:\n\n```json\n{\n    \"id\": \"xg61f6\",\n    \"author\": \"silver2006\",\n    \"timestamp\": 1663370013000,\n    \"url\": \"/r/zenfone6/comments/xg61f6/need_help_unlocking_the_bootloader/\",\n    \"permalink\": \"https://old.reddit.com/r/zenfone6/comments/xg61f6/need_help_unlocking_the_bootloader/\",\n    \"domain\": \"self.zenfone6\",\n    \"comments_count\": 4,\n    \"score\": 1,\n    \"nsfw\": false,\n    \"spoiler\": false,\n    \"type\": \"text\"\n}\n```\n\nSample link PostLink:\n\n```json\n{\n    \"id\": \"z2bhbm\",\n    \"author\": \"Counterhaters\",\n    \"timestamp\": 1669166866000,\n    \"url\": \"https://www.zaobao.com.sg/realtime/china/story20221122-1335992\",\n    \"permalink\": \"https://old.reddit.com/r/China_irl/comments/z2bhbm/消息中国拟对蚂蚁处以逾10亿美元罚款/\",\n    \"domain\": \"zaobao.com.sg\",\n    \"comments_count\": 1,\n    \"score\": 4,\n    \"nsfw\": false,\n    \"spoiler\": false,\n    \"type\": \"link\"\n}\n```\n\nGallery element:\n\n```html\n\u003cdiv class=\"media-gallery\"\u003e\n    \u003cdiv class=\"gallery-tiles\"\u003e\n        \u003cdiv class=\"gallery-tile gallery-navigation\"\u003e\n            \u003cdiv class=\"media-preview-content gallery-tile-content\"\u003e\n                \u003cimg class=\"preview\", src=\"...\", width=..., height=...\u003e\n            \u003c/div\u003e\n        \u003c/div\u003e\n    \u003c/div\u003e\n\u003c/div\u003e\n```\n\nThe \"next\" button element:\n\n```html\n\u003cspan class=\"next-button\"\u003e\n    \u003ca href=\"https://old.reddit.com/r/Music/?count=25\u0026after=t3_z1lqur\" rel=\"nofollow next\"\u003enext ›\u003c/a\u003e\n\u003c/span\u003e\n```\n\nThe element that lists all posts:\n\n```html\n\u003cdiv id=\"siteTable\" class=\"sitetable linklisting\"\u003e\n```\n\n![screenshot of element that has all the links](Screenshot-link-list.png)\n\nWhen you forget to 
change user-agent:\n\n```html\n\u003c!doctype html\u003e\n\u003chtml\u003e\n\n\u003chead\u003e\n    \u003ctitle\u003eToo Many Requests\u003c/title\u003e\n\u003c/head\u003e\n\n\u003cbody\u003e\n    \u003ch1\u003ewhoa there, pardner!\u003c/h1\u003e\n    \u003cp\u003ewe're sorry, but you appear to be a bot and we've seen too many requests from you lately. we enforce a hard\n        speed limit on requests that appear to comefrom bots to prevent abuse.\u003c/p\u003e\n    \u003cp\u003eif you are not a bot but are spoofing one via your browser's user agentstring: please change your user agent\n        string to avoid seeing this messageagain.\u003c/p\u003e\n    \u003cp\u003eplease wait 1 second(s) and try again.\u003c/p\u003e\n    \u003cp\u003eas a reminder to developers, we recommend that clients make no more than \u003ca\n            href=\"http://github.com/reddit/reddit/wiki/API\"\u003eone request every two seconds\u003c/a\u003e to avoid seeing this\n        message.\u003c/p\u003e\n\u003c/body\u003e\n\n\u003c/html\u003e\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericlingit%2Fsubreddit-trawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fericlingit%2Fsubreddit-trawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericlingit%2Fsubreddit-trawler/lists"}