
# redditscraper

A Reddit scraper using the PRAW API for downloading images and videos. Duplicate downloads are avoided efficiently using a trie.

https://github.com/anujdhillxn/redditscraper

## Create a Reddit account!
- Sign up for a Reddit account.
- Click "Are you a developer? Create an app" on Reddit's app preferences page (https://www.reddit.com/prefs/apps).
- Give your program a name and a redirect URL (http://localhost).
- On the final screen, note your client ID and secret.

*(Screenshots for each step: Create Account, Access Developer, Name, ID and secret.)*
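
Once you have the client ID and secret, they get passed to PRAW when the `Reddit` instance is constructed. A minimal sketch of that step, with placeholder credentials and an arbitrary `user_agent` string (PRAW just requires some descriptive identifier):

```python
import praw

# Credentials from the app created above; the user_agent string is an
# arbitrary example, not something the script prescribes.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="redditscraper by u/your_username",
)

# Without a username/password PRAW runs in read-only mode, which is
# enough for downloading posts.
print(reddit.read_only)  # True
```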

## Run the download script!
- Add any subreddits you want to download to sub_list.csv, one per line (see the example CSV after this list).
- Run SubDownload.py
- The first time you run the script it will ask you for your credentials. Note that you don't need to enter a username or password if you don't plan on posting.
- The script saves your credentials to a token.pickle file so you don't have to enter them again. If you mess up your credentials, just delete the pickle file and the script will ask for them again (see the sketch after this list).
- The script will create an images folder and a videos folder and fill them with images and videos respectively. You can change how many posts it checks on each subreddit by adjusting that subreddit's limit in sub_list.csv.
- A database of all the images and videos downloaded so far is also maintained. For now, the title and author name are stored for each file in "posted.csv". You can modify the script to store more items.
- Each downloaded file has a unique ID, which is used to check whether the file is already present in the database; if it is, the file is not downloaded again.
- For fast lookups, a trie built from only the unique IDs is used. It can be serialized and deserialized for future runs (see the trie sketch below).
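
An illustrative sub_list.csv. The exact column layout is an assumption based on the description above (one subreddit per line, with a per-subreddit post limit):

```csv
wallpapers,100
EarthPorn,50
aww,25
```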
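The one-time prompt plus token.pickle cache could look roughly like this. The field names and prompts here are illustrative, not the script's actual ones:

```python
import os
import pickle

TOKEN_FILE = "token.pickle"

def load_credentials():
    """Return cached credentials, prompting once on the first run."""
    if os.path.exists(TOKEN_FILE):
        with open(TOKEN_FILE, "rb") as f:
            return pickle.load(f)
    creds = {
        "client_id": input("Client ID: "),
        "client_secret": input("Client secret: "),
        # Optional: only needed if you plan on posting.
        "username": input("Username (optional): "),
        "password": input("Password (optional): "),
    }
    with open(TOKEN_FILE, "wb") as f:
        pickle.dump(creds, f)  # delete token.pickle to be prompted again
    return creds
```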
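And a sketch of the trie-based duplicate check. The class and method names are illustrative rather than the script's actual API; the point is that a membership test costs O(length of the ID), and the whole structure pickles cleanly between runs:

```python
import pickle

class Trie:
    """Prefix tree over submission IDs for fast duplicate checks."""

    def __init__(self):
        self.children = {}  # char -> Trie
        self.is_end = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def contains(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

# Skip a download if its ID is already known, otherwise record it.
seen = Trie()
seen.insert("abc123")
assert seen.contains("abc123")      # already downloaded: skip
assert not seen.contains("xyz789")  # new post: download, then insert

# Serialize for the next run, mirroring the script's pickle approach.
with open("seen_ids.pickle", "wb") as f:
    pickle.dump(seen, f)
```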