Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/datavorous/redditminer
Reddit data hoarding scripts for bulk post/image data download and user OSINT, without any authentication.
- Host: GitHub
- URL: https://github.com/datavorous/redditminer
- Owner: datavorous
- License: mit
- Created: 2024-09-10T13:18:18.000Z (16 days ago)
- Default Branch: main
- Last Pushed: 2024-09-11T17:13:31.000Z (15 days ago)
- Last Synced: 2024-09-21T18:15:19.835Z (5 days ago)
- Topics: api, data-mining, hoarding, json, python, reddit, reddit-api, reddit-downloader, reddit-scraper, requests, scraper, webscraping
- Language: Python
- Homepage:
- Size: 1.07 MB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
### that's a lot of data
Check out [subreddit_data.json](https://github.com/datavorous/RedditSuite/blob/main/subreddit_data.json), around 1 MB.
> [!WARNING]
> Use with caution; Reddit might gift you an IP ban.
> I could extract at most 2552 posts in one run from r/all using this; I can't be sure about other subreddits.

Try out `example.py`; things are pretty easy to understand.
Also, `pip install googlesearch-python` beforehand.
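For the curious: scraping without authentication works because Reddit serves any listing as JSON when `.json` is appended to the URL. The sketch below is not code from this repo, just a minimal illustration of that idea using `requests` (which the project also depends on), with a descriptive User-Agent and a pause between calls to stay clear of the IP ban mentioned above:

```python
import time
import requests

# Any Reddit listing is available as JSON without OAuth; a descriptive
# User-Agent and a pause between requests keep you under the rate limits.
HEADERS = {"User-Agent": "redditminer-example/0.1 (educational use)"}

def fetch_listing(subreddit, category="top", limit=100, after=None):
    url = f"https://www.reddit.com/r/{subreddit}/{category}.json"
    params = {"limit": limit, "after": after}  # 'after' pages through the listing
    resp = requests.get(url, headers=HEADERS, params=params, timeout=10)
    resp.raise_for_status()
    time.sleep(2)  # politeness delay; hammering the endpoint invites the ban
    return resp.json()["data"]  # holds "children" (the posts) and the next "after"
```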
This script uses `data_hoarder()` to extract large amounts of post data from subreddits, saving it as JSON with fields like title, author, permalink, score, comment count, and timestamp. If available, image and thumbnail URLs are also included.

Example data format:
```json
[
{
"title": "Example Title",
"author": "AuthorName",
"permalink": "/r/SubredditName/comments/ID/PostTitle/",
"score": 1234,
"num_comments": 56,
"created_utc": 1716902623.0
},
{
"title": "Example Title with Image",
"author": "AuthorName",
"permalink": "/r/SubredditName/comments/ID/PostTitle/",
"score": 5678,
"num_comments": 78,
"created_utc": 1719949630.0,
"image_url": "https://i.redd.it/example.png",
"thumbnail_url": "https://a.thumbs.redditmedia.com/example.jpg"
}
]
```
> Note: Image and thumbnail URLs are only included if available.
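Since the output is plain JSON, post-processing needs nothing beyond the standard library. A small hypothetical example, using the `hi.json` filename from the usage example further down and only the fields shown above:

```python
import json

# Load the file written by data_hoarder() (filename from the usage example below).
with open("hi.json", encoding="utf-8") as f:
    posts = json.load(f)

# image_url is optional (see the note above), so use .get() rather than indexing.
with_images = [p for p in posts if p.get("image_url")]
top = max(posts, key=lambda p: p["score"])
print(f"{len(posts)} posts, {len(with_images)} with images")
print(f"highest score: {top['score']} - {top['title']}")
```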
## Key Features:

- Extract subreddit data in bulk using `data_hoarder()`.
- Retrieve post title, body text, and top-level comments with `scrape_post_details()`.
- Download images directly using `download_image()` (see the sketch after this list).
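The repo's `download_image()` implementation isn't shown here; the following is only a guess at the standard `requests` approach, streaming the response to disk so large images never sit fully in memory:

```python
import os
import requests

def save_image(url, out_dir="images"):
    """Hypothetical stand-in for download_image(): stream an i.redd.it URL to disk."""
    os.makedirs(out_dir, exist_ok=True)
    filename = os.path.join(out_dir, url.rsplit("/", 1)[-1])
    headers = {"User-Agent": "redditminer-example/0.1 (educational use)"}
    with requests.get(url, headers=headers, stream=True, timeout=10) as resp:
        resp.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)  # write in 8 KB chunks instead of buffering everything
    return filename
```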
## Additional Utilities:

- Scrape posts and comments from specific users using `user_osint()` (see the sketch after this list).
- Search Reddit for posts related to a query using `search_reddit()`.
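Again a sketch rather than the project's code: a user's submissions are public JSON too, and Reddit paginates every listing with an `after` token (up to 100 items per page), which is presumably how `user_osint()` walks a profile:

```python
import time
import requests

HEADERS = {"User-Agent": "redditminer-example/0.1 (educational use)"}

def fetch_user_posts(username, pages=3):
    """Collect a user's submissions by following Reddit's 'after' pagination token."""
    posts, after = [], None
    for _ in range(pages):
        url = f"https://www.reddit.com/user/{username}/submitted.json"
        resp = requests.get(url, headers=HEADERS,
                            params={"limit": 100, "after": after}, timeout=10)
        resp.raise_for_status()
        data = resp.json()["data"]
        posts.extend(child["data"] for child in data["children"])
        after = data["after"]
        if after is None:  # reached the end of the listing
            break
        time.sleep(2)  # same politeness rule as before
    return posts
```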
### Example usage:

```python
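# assumes `miner` is the project's scraper object (set up in example.py)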
miner.data_hoarder('all', limit=2500, category='top', output_file='hi.json')
miner.download_image('https://i.redd.it/example.jpg')
post_details = miner.scrape_post_details('/r/Subreddit/comments/ID/PostTitle/')
```

Have fun mining Reddit data!