https://github.com/dbeley/reddit-scraper
Various scripts to download posts/submissions/comments of a reddit subreddit/post/user.
- Host: GitHub
- URL: https://github.com/dbeley/reddit-scraper
- Owner: dbeley
- License: mit
- Created: 2018-03-28T23:03:06.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T07:43:17.000Z (about 2 years ago)
- Last Synced: 2023-03-02T21:01:54.357Z (almost 2 years ago)
- Topics: reddit
- Language: Python
- Homepage:
- Size: 115 KB
- Stars: 10
- Watchers: 4
- Forks: 0
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
# reddit-scraper
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/1b70d3ce7401431e88f357e090852ea9)](https://app.codacy.com/app/dbeley/reddit-scraper?utm_source=github.com&utm_medium=referral&utm_content=dbeley/reddit-scraper&utm_campaign=Badge_Grade_Dashboard)
Various scripts to scrape reddit.
- **download_comments_post.py** : Download the comments of one or several posts.
- **download_comments_user.py** : Download the last 1000 comments of one or several users.
- **download_posts_user.py** : Download the posts of one or several users.
- **fetch_posts_subreddit.py** : Download the posts of a subreddit with the help of the Pushshift API.

Some scripts using the Pushshift API wrapper psaw can be found in the psaw folder.
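As a rough illustration of what a Pushshift query for a subreddit's posts could look like, here is a minimal sketch that only builds the request URL. The endpoint and the parameter names (`subreddit`, `size`, `after`, `before`) are assumptions based on the historical public Pushshift API, and `build_query_url` is a hypothetical helper; none of this is taken from the scripts in this repository.

```python
from urllib.parse import urlencode

# Assumption: the historical public Pushshift submission search endpoint.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def build_query_url(subreddit, after=None, before=None, size=100):
    """Build a submission search URL; after/before are Unix timestamps."""
    params = {"subreddit": subreddit, "size": size}
    # Only add the time bounds that were actually provided.
    if after is not None:
        params["after"] = int(after)
    if before is not None:
        params["before"] = int(before)
    return PUSHSHIFT_SUBMISSIONS + "?" + urlencode(params)


print(build_query_url("france"))
```

The sketch stops at building the URL so that it runs without network access; sending the request is left to a client such as `requests` or psaw.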
## Requirements
- tqdm
- praw
- requests
- pandas
- numpy
- xlsxwriter
- xlrd
- psaw

The scripts need a praw.ini file of the following form:
```
[bot]
client_id=id
client_secret=secret
password=password
username=username
```

## Installation of the virtualenv (recommended)
```
pipenv install
```

## Usage
```
python fetch_posts_subreddit.py -s france
```

## Help
### download_comments_post
```
python download_comments_post.py -h
```

```
usage: download_comments_post.py [-h] [--debug] [-i ID] [-u URL]
                                 [--source SOURCE] [--file FILE]
                                 [--export_format EXPORT_FORMAT]
                                 [--import_format IMPORT_FORMAT]

Download comments of a post or a set of posts (by id or by url)

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information
  -i ID, --id ID        IDs of the posts to extract (separated by commas)
  -u URL, --url URL     URLs of the posts to extract (separated by commas)
  --source SOURCE       The name of the json file containing posts ids
  --file FILE           The name of the file containing comments already
                        extracted
  --export_format EXPORT_FORMAT
                        Export format (csv or xlsx). Default : csv
  --import_format IMPORT_FORMAT
                        Import format, if used with --file (csv or xlsx).
                        Default : csv
```

### download_comments_user
```
python download_comments_user.py -h
```

```
usage: download_comments_user.py [-h] [--debug] [-u USERNAME]
                                 [--export_format EXPORT_FORMAT]

Download the last 1000 comments of one or several users

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information
  -u USERNAME, --username USERNAME
                        The users to download comments from (separated by
                        commas)
  --export_format EXPORT_FORMAT
                        Export format (csv or xlsx). Default : csv
```

### download_posts_user
```
python download_posts_user.py -h
```

```
usage: download_posts_user.py [-h] [--debug] [-u USERNAME]
                              [--export_format EXPORT_FORMAT]

Download all the posts of one or several users

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information
  -u USERNAME, --username USERNAME
                        The users to download posts from (separated by commas)
  --export_format EXPORT_FORMAT
                        Export format (csv or xlsx). Default : csv
```

### fetch_posts_subreddit
```
python fetch_posts_subreddit.py -h
```

```
usage: fetch_posts_subreddit.py [-h] [--debug] [-s SUBREDDIT] [-a AFTER]
                                [-b BEFORE] [--source SOURCE] [--file FILE]
                                [--export_format EXPORT_FORMAT]
                                [--import_format IMPORT_FORMAT]

Download all the posts of a specific subreddit

optional arguments:
  -h, --help            show this help message and exit
  --debug               Display debugging information
  -s SUBREDDIT, --subreddit SUBREDDIT
                        The subreddit to download posts from. Default :
                        /r/france
  -a AFTER, --after AFTER
                        The min unixstamp to download
  -b BEFORE, --before BEFORE
                        The max unixstamp to download
  --source SOURCE       The name of the json file containing posts ids
  --file FILE           The name of the file containing posts already
                        extracted
  --export_format EXPORT_FORMAT
                        Export format (csv or xlsx). Default : csv
  --import_format IMPORT_FORMAT
                        Import format, if used with --file (csv or xlsx).
                        Default : csv
```
```
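
The `-a` and `-b` options expect Unix timestamps, which are awkward to type by hand. The following sketch converts calendar dates to timestamps and prints a matching command line. The `to_unixstamp` helper is hypothetical, and treating the dates as midnight UTC is an assumption; the script itself may interpret the values differently.

```python
from datetime import datetime, timezone


def to_unixstamp(date_string):
    """Convert a YYYY-MM-DD date, read as midnight UTC, to a Unix timestamp."""
    day = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp())


# Example bounds: all of 2019 (assumed UTC).
after = to_unixstamp("2019-01-01")
before = to_unixstamp("2020-01-01")
print(f"python fetch_posts_subreddit.py -s france -a {after} -b {before}")
```

Attaching the UTC timezone explicitly keeps the result independent of the local timezone of the machine running the conversion.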