Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/datavorous/redditminer

Reddit data hoarding scripts for bulk post/img data download and user OSINT, without any Auth

api data-mining hoarding json python reddit reddit-api reddit-downloader reddit-scraper requests scraper webscraping




README

        

### that's a lot of data

Check out [subreddit_data.json](https://github.com/datavorous/RedditSuite/blob/main/subreddit_data.json), around 1 MB.

> [!WARNING]
> Use with caution: Reddit might gift you an IP ban.
> The most I could extract in one run was 2552 posts from 'all'; other subreddits may behave differently.

Try out `example.py`; things are pretty easy to understand.
Run `pip install googlesearch-python` beforehand.
The script uses `data_hoarder()` to extract large amounts of post data from subreddits, saving it as JSON with fields like title, author, permalink, score, comment count, and timestamp. Image and thumbnail URLs are also included when available.

Example data format:
```json
[
  {
    "title": "Example Title",
    "author": "AuthorName",
    "permalink": "/r/SubredditName/comments/ID/PostTitle/",
    "score": 1234,
    "num_comments": 56,
    "created_utc": 1716902623.0
  },
  {
    "title": "Example Title with Image",
    "author": "AuthorName",
    "permalink": "/r/SubredditName/comments/ID/PostTitle/",
    "score": 5678,
    "num_comments": 78,
    "created_utc": 1719949630.0,
    "image_url": "https://i.redd.it/example.png",
    "thumbnail_url": "https://a.thumbs.redditmedia.com/example.jpg"
  }
]
```
> Note: Image and thumbnail URLs are only included if available.
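
As a rough illustration of working with the output, here is a minimal sketch (plain standard-library Python, independent of this repo) that loads a dump produced by `data_hoarder()` and picks out the top-scoring posts and the ones carrying an `image_url`. The `hi.json` filename is taken from the example usage further below.

```python
import json

# Load a dump written by data_hoarder() (filename from the example usage below).
with open("hi.json", encoding="utf-8") as f:
    posts = json.load(f)

# Ten highest-scoring posts.
for post in sorted(posts, key=lambda p: p["score"], reverse=True)[:10]:
    print(f'{post["score"]:>6}  {post["title"]}  ({post["permalink"]})')

# Posts that came with an image URL (the field is only present when available).
with_images = [p for p in posts if "image_url" in p]
print(f"{len(with_images)} of {len(posts)} posts include an image_url")
```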

## Key Features:
- Extract subreddit data in bulk using `data_hoarder()`.
- Retrieve post title, body text, and top-level comments with `scrape_post_details()`.
- Download images directly using `download_image()`.

## Additional Utilities:
- Scrape posts and comments from specific users with `user_osint()`.
- Search Reddit for posts related to a query with `search_reddit()` (see the sketch below).
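
A hedged sketch of how these two helpers might be called; the argument values and shapes are assumptions for illustration, not the confirmed signatures, so check `example.py` for the real calls. `miner` is the scraper object set up there.

```python
# Hypothetical calls: the arguments here are illustrative assumptions,
# not the confirmed API of this repo.
profile_posts = miner.user_osint("some_username")    # posts and comments from one user
search_hits = miner.search_reddit("data hoarding")   # posts related to a query

print(profile_posts)
print(search_hits)
```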

### Example usage:

```python
# `miner` is the scraper object set up in example.py.
miner.data_hoarder('all', limit=2500, category='top', output_file='hi.json')  # bulk-dump top posts from 'all'
miner.download_image('https://i.redd.it/example.jpg')  # save a single image
post_details = miner.scrape_post_details('/r/Subreddit/comments/ID/PostTitle/')  # title, body text, top-level comments
```
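
Putting the pieces together, here is a hedged sketch that downloads every image referenced in a `data_hoarder()` dump. It assumes only the `image_url` field shown in the data format above and the `download_image()` call from the example; `miner` is again the object from `example.py`.

```python
import json

# Reuse the dump written by data_hoarder() in the example above.
with open("hi.json", encoding="utf-8") as f:
    posts = json.load(f)

# image_url is only present when the post actually has an image.
for post in posts:
    url = post.get("image_url")
    if url:
        miner.download_image(url)
```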

Have fun mining Reddit data!