https://github.com/ericlingit/subreddit-trawler

Scrape a subreddit's posts.

Host: GitHub
URL: https://github.com/ericlingit/subreddit-trawler
Owner: ericlingit
License: gpl-3.0
Created: 2022-11-24T06:48:33.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-11-24T07:54:34.000Z (over 3 years ago)
Last Synced: 2026-02-24T23:42:17.549Z (4 months ago)
Topics: reddit, scraper
Language: Python
Homepage:
Size: 940 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Subreddit trawler

Scrape a subreddit's posts through the old Reddit interface at `https://old.reddit.com`.

https://old.reddit.com/r/Chinatown_irl/

https://old.reddit.com/r/China_irl/

- scrape the subreddit
  - visit each post link
    - skip announcements
    - if the URL contains `predictions?tournament`, always skip the link; no old version is available
      - e.g. `https://www.reddit.com/r/wallstreetbets/predictions?tournament=tnmt-0b14066a-ad68-4351-8261-d1c0740c44d2`
    - scrape comments
    - text submission
    - image submission
    - video submission
    - NSFW/spoiler
- find the "next" button
  - extract its link
  - go to the link
  - repeat the steps above
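
The steps above can be sketched as a small pagination loop. This is a minimal sketch and not the repository's actual code: `should_skip`, `find_next_link`, `crawl_pages` and the injected `fetch` callable are hypothetical names, and it assumes the "next" button is an `<a>` element whose `rel` attribute contains `next`, which should be checked against the real page.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional, Tuple


def should_skip(url: str) -> bool:
    """Prediction tournaments have no old.reddit.com version, so skip them."""
    return "predictions?tournament" in url


class _NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose rel attribute contains "next".

    Assumption: the "next" button is marked up this way on the listing page.
    """

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr = dict(attrs)
        if "next" in (attr.get("rel") or "").split():
            self.href = attr.get("href")


def find_next_link(html: str) -> Optional[str]:
    """Return the URL of the next listing page, or None on the last page."""
    finder = _NextLinkFinder()
    finder.feed(html)
    return finder.href


def crawl_pages(
    start_url: str,
    fetch: Callable[[str], str],
    max_pages: int = 10,
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) per listing page, following "next" until it runs out."""
    url: Optional[str] = start_url
    seen = set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_link(html)
```

Passing `fetch` in as a parameter keeps the loop independent of how pages are downloaded, so it can be exercised with canned HTML before touching the network.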

Examples for various post types:
- [Text post](https://old.reddit.com/r/China_irl/comments/z0oio5)
- [Image post](https://old.reddit.com/r/China_irl/comments/z0ojwn)
- [Video post](https://old.reddit.com/r/China_irl/comments/yzv625)
- [Gallery](https://old.reddit.com/r/China_irl/comments/z0728o)
- [NSFW text (Whats the most NSFW experience you witnessed right in front of your eyes?)](https://old.reddit.com/r/AskReddit/comments/z0uq39)
- [NSFW image (Grown man ass-kissing)](https://www.reddit.com/r/cringepics/comments/z0xhwy)
- [NSFW video (Ukrainian drone flies right into the Russian trench)](https://old.reddit.com/r/CombatFootage/comments/z1391l)

## Notes

Sample video PostLink:

```json
{
  "id": "z09a7r",
  "author": "Dry_Illustrator5642",
  "timestamp": 1668963979000,
  "url": "https://v.redd.it/4huchegx4x0a1",
  "permalink": "https://old.reddit.com/r/China_irl/comments/z09a7r/翼刀性感电臀舞/",
  "domain": "v.redd.it",
  "comments_count": 1,
  "score": 0,
  "nsfw": false,
  "spoiler": false,
  "type": "video"
}
```

- Direct video download URL: `https://v.redd.it/4huchegx4x0a1/DASH_720.mp4`
- Audio URL: `https://v.redd.it/4huchegx4x0a1/DASH_audio.mp4`
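
Both addresses are the post's `url` with a file name appended, so they can be derived mechanically. A minimal sketch, where `media_urls` is a hypothetical helper; only the `DASH_720.mp4` and `DASH_audio.mp4` names come from the example above, and whether other heights exist for a given video is an assumption to verify.

```python
def media_urls(post_url: str, height: int = 720) -> dict:
    """Build the video and audio URLs for a v.redd.it post URL.

    Only height=720 is confirmed by the example above; other values are
    an assumption and may not exist for every video.
    """
    base = post_url.rstrip("/")
    return {
        "video": f"{base}/DASH_{height}.mp4",
        "audio": f"{base}/DASH_audio.mp4",
    }


urls = media_urls("https://v.redd.it/4huchegx4x0a1")
print(urls["video"])
print(urls["audio"])
```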

Sample image PostLink:

```json
{
  "id": "wv4ydl",
  "author": "darkyknight01",
  "timestamp": 1661201834000,
  "url": "https://i.redd.it/6b66lj3fwbj91.jpg",
  "permalink": "https://old.reddit.com/r/zenfone6/comments/wv4ydl/in_delhi_i_need_info_for_that_how_should_i/",
  "domain": "i.redd.it",
  "comments_count": 1,
  "score": 1,
  "nsfw": false,
  "spoiler": false,
  "type": "image"
}
```

Sample text PostLink:

```json
{
  "id": "xg61f6",
  "author": "silver2006",
  "timestamp": 1663370013000,
  "url": "/r/zenfone6/comments/xg61f6/need_help_unlocking_the_bootloader/",
  "permalink": "https://old.reddit.com/r/zenfone6/comments/xg61f6/need_help_unlocking_the_bootloader/",
  "domain": "self.zenfone6",
  "comments_count": 4,
  "score": 1,
  "nsfw": false,
  "spoiler": false,
  "type": "text"
}
```

Sample link PostLink:

```json
{
  "id": "z2bhbm",
  "author": "Counterhaters",
  "timestamp": 1669166866000,
  "url": "https://www.zaobao.com.sg/realtime/china/story20221122-1335992",
  "permalink": "https://old.reddit.com/r/China_irl/comments/z2bhbm/消息中国拟对蚂蚁处以逾10亿美元罚款/",
  "domain": "zaobao.com.sg",
  "comments_count": 1,
  "score": 4,
  "nsfw": false,
  "spoiler": false,
  "type": "link"
}
```
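
The four samples share one shape, which could be modelled roughly as follows. This is a sketch, not the repository's definition of `PostLink`: the field names are taken from the JSON above, the `created_at` property assumes `timestamp` is in milliseconds since the Unix epoch (which matches the sample values), and `guess_type` is a hypothetical helper that infers `type` from `domain` based only on these four samples. Galleries are not covered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class PostLink:
    id: str
    author: str
    timestamp: int  # assumed: milliseconds since the Unix epoch
    url: str
    permalink: str
    domain: str
    comments_count: int
    score: int
    nsfw: bool
    spoiler: bool
    type: str

    @property
    def created_at(self) -> datetime:
        """Timestamp as a timezone-aware UTC datetime."""
        return datetime.fromtimestamp(self.timestamp / 1000, tz=timezone.utc)


def guess_type(domain: str) -> str:
    """Infer the post type from its domain (heuristic based on the samples)."""
    if domain == "v.redd.it":
        return "video"
    if domain == "i.redd.it":
        return "image"
    if domain.startswith("self."):
        return "text"
    return "link"
```

A sample can be loaded with `PostLink(**json.loads(text))`, since the JSON keys match the field names.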

Gallery element:

```html

```

The "next" button element:

```html

next ›

```

The element that lists all posts:

```html

```

![screenshot of element that has all the links](Screenshot-link-list.png)
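
Extracting the posts from that element might look like the sketch below. The markup is not reproduced in this README, so everything about it here is an assumption to verify in the browser's inspector: that each post is a `<div>` with the class `thing`, and that it carries `data-*` attributes such as `data-fullname`, `data-author` and `data-timestamp`. `ListingParser` and `parse_listing` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://old.reddit.com"


class ListingParser(HTMLParser):
    """Collects one dict per post found in a listing page.

    Assumption: each post is a <div class="... thing ..."> carrying data-*
    attributes. Verify this against the real page before relying on it.
    """

    def __init__(self) -> None:
        super().__init__()
        self.posts = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        fullname = attr.get("data-fullname") or ""
        # "t3_" is Reddit's fullname prefix for posts (comments use "t1_").
        if tag != "div" or "thing" not in classes or not fullname.startswith("t3_"):
            return
        self.posts.append({
            "id": fullname[len("t3_"):],
            "author": attr.get("data-author"),
            "timestamp": int(attr.get("data-timestamp") or 0),
            "url": attr.get("data-url"),
            # Make the permalink absolute, as in the PostLink samples.
            "permalink": urljoin(BASE_URL, attr.get("data-permalink") or ""),
            "domain": attr.get("data-domain"),
            "comments_count": int(attr.get("data-comments-count") or 0),
            "score": int(attr.get("data-score") or 0),
            "nsfw": attr.get("data-nsfw") == "true",
            "spoiler": attr.get("data-spoiler") == "true",
        })


def parse_listing(html: str) -> list:
    """Return the posts found in one listing page's HTML."""
    parser = ListingParser()
    parser.feed(html)
    return parser.posts
```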

The response you get when you forget to change the `User-Agent` header:

```html

Too Many Requests

whoa there, pardner!

we're sorry, but you appear to be a bot and we've seen too many requests from you lately. we enforce a hard
speed limit on requests that appear to come from bots to prevent abuse.

if you are not a bot but are spoofing one via your browser's user agent string: please change your user agent
string to avoid seeing this message again.

please wait 1 second(s) and try again.

as a reminder to developers, we recommend that clients make no more than one request every two seconds to avoid seeing this
message.

```
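
One way to avoid this response is to send a custom `User-Agent` and space requests at least two seconds apart, as the message recommends. A minimal sketch using only the standard library: `USER_AGENT` is a placeholder value, and `Throttle`, `build_request` and `fetch` are hypothetical names, not part of this repository.

```python
import time
import urllib.request

USER_AGENT = "subreddit-trawler-example/0.1"  # placeholder; pick your own
MIN_INTERVAL = 2.0  # seconds between requests, per the message above


class Throttle:
    """Blocks so that consecutive calls to wait() are min_interval apart."""

    def __init__(self, min_interval=MIN_INTERVAL, clock=time.monotonic, sleep=time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Create a request that carries a non-default User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, throttle: Throttle) -> str:
    """Download a page, waiting first if the last request was too recent."""
    throttle.wait()
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

The clock and sleep functions are injectable so the spacing logic can be checked without actually waiting.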
