https://github.com/lamdav/redditfeedfetcher
Listen to RSS/Atom Reddit Feed and Download Imgur Albums
atom downloader imgur reddit rss
- Host: GitHub
- URL: https://github.com/lamdav/redditfeedfetcher
- Owner: lamdav
- Created: 2019-02-27T10:28:34.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2023-03-04T03:13:42.000Z (almost 2 years ago)
- Last Synced: 2025-01-11T08:56:31.701Z (13 days ago)
- Topics: atom, downloader, imgur, reddit, rss
- Language: JavaScript
- Homepage:
- Size: 173 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
# Reddit Feed Fetcher
Consume Reddit Feed and Download Imgur Albums

## Motive
I wanted a script/program that could listen in on my saved posts and, if a post was an imgur album,
download and store it for future use/backup.

Reddit provides personal RSS feeds. That is, I have an RSS/Atom feed for my saved posts. Using that,
I can periodically query this endpoint to fetch new posts I have saved and do some
rudimentary checking before downloading the album (i.e. check if it's an imgur post, from a particular sub,
etc.). From there, I can parse out information to pass into the imgur API to fetch the raw image
links.

## Notes
This project is still pretty buggy and inefficient. I should probably use a connection pool and
have some way to throttle my connection when starting to process. I should also include some way to
check if I have already downloaded something to avoid using up bandwidth. All in all, I hacked this
together quickly one night.

### TODO:
- [x] Connection Pooling
- [x] Fetch past recent items on rss feed
- [x] Add ability to skip processed items on rate limit
- [x] Avoid fetching existing images based on path
- [ ] Add rate limiting mitigations/throttling
- [x] Add option to only process recent items
- [ ] Fix occasional hiccups with `undefined` path args and timeout (`x` number of retries?)

## Structure
```
REDDIT_SAVED_RSS_FEED="link to reddit rss feed"
IMGUR_CLIENT_SECRET="imgur client secret"
ENABLE_LOG_SUMMARY="true or false value to enable more robust logging"
DESTINATION="where to drop off image and pdf"
CONNECTIONS=integer value of sockets to use for requests
START_AFTER="reddit id to start after"
SINGLE_BATCH="if defined, only one batch will execute"
```
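As a rough illustration of how the variables in the block above might be read and validated, here is a minimal sketch. It is not taken from the project source: `parseConfig`, the returned field names, the choice of which variables are treated as required, and the default of one connection are all assumptions made for the example.

```javascript
// Hypothetical helper, not part of the project: turns raw environment
// strings (e.g. process.env) into a validated settings object.
function parseConfig(env) {
  // Assumption: these three are mandatory; the rest are optional.
  const required = ["REDDIT_SAVED_RSS_FEED", "IMGUR_CLIENT_SECRET", "DESTINATION"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing .env variables: ${missing.join(", ")}`);
  }

  // Assumption: default to a single connection when CONNECTIONS is unset.
  const connections =
    env.CONNECTIONS === undefined ? 1 : Number.parseInt(env.CONNECTIONS, 10);
  if (!Number.isInteger(connections) || connections < 1) {
    throw new Error("CONNECTIONS must be a positive integer");
  }

  return {
    feedUrl: env.REDDIT_SAVED_RSS_FEED,
    imgurClientSecret: env.IMGUR_CLIENT_SECRET,
    destination: env.DESTINATION,
    connections,
    logSummary: env.ENABLE_LOG_SUMMARY === "true",
    startAfter: env.START_AFTER || null,
    // "if defined, only one batch will execute": any defined value enables it.
    singleBatch: env.SINGLE_BATCH !== undefined,
  };
}
```

Once the `.env` file has been loaded into the environment by whatever loader is in use, the call would look like `const config = parseConfig(process.env);`. Failing early on a missing or malformed value avoids discovering the problem halfway through a batch.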
These `.env` variables need to be defined.

All images are stored using this path convention:
```
DESTINATION/SUBREDDIT_SOURCE/POST_TITLE # base path
/page_0[1-9].png # prefixed with 0 when the page number is below 10
/page_[10+].png
/POST_TITLE.pdf # all pages stitched together
```

Use `START_AFTER` to skip previously processed feed elements if you are rate limited. A
reddit `id` is printed in the logs every batch. Use `CONNECTIONS` to set a limit on the
number of connections used to fetch data.

After every `rss` batch, there is a `3` second delay before the next batch starts. This
gives I/O processes time to keep up and acts as a rudimentary delay to avoid hitting the
imgur servers too frequently.

The PDF stitch is always regenerated. I currently do not have a way to detect whether a previously
missing image has been fetched (i.e. missing because of rate limiting/partial batch processing).

## How to use
1. `git clone https://github.com/lamdaV/RedditFeedFetcher.git`
2. `yarn install` or `npm install`
3. Create a `.env` file and fill out relevant information (see above)
   - [Reddit RSS Wiki](https://www.reddit.com/wiki/rss)
   - [Reddit RSS Personalized Feed](https://redditblog.com/2010/02/02/feed-me/)
   - [Imgur Client Secret](https://apidocs.imgur.com/)
4. `yarn start` or `npm run start`
   - On a Linux or OSX environment, run `yarn start | tee path/to/output.log` for both stdout logging
     and file logging.
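
To make it easier to find files after a run, the path convention described under Structure can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: `pageFileName`, `sanitize`, and `albumPaths` are hypothetical names, pages are assumed to be numbered from 1, and replacing unsafe file name characters with `_` is a guess at reasonable behavior.

```javascript
// Hypothetical helpers illustrating the documented path convention.

// Zero-pads single-digit page numbers: 3 -> "page_03.png", 12 -> "page_12.png".
function pageFileName(pageNumber) {
  return `page_${String(pageNumber).padStart(2, "0")}.png`;
}

// Assumption: characters that are unsafe in file names become "_".
function sanitize(segment) {
  return segment.replace(/[\/\\:*?"<>|]/g, "_").trim();
}

// Builds DESTINATION/SUBREDDIT_SOURCE/POST_TITLE plus the page and PDF paths.
function albumPaths(destination, subreddit, postTitle, pageCount) {
  const title = sanitize(postTitle);
  const root = destination.replace(/\/+$/, ""); // drop trailing slashes
  const base = [root, sanitize(subreddit), title].join("/");
  const pages = [];
  // Assumption: pages are numbered starting at 1.
  for (let page = 1; page <= pageCount; page += 1) {
    pages.push(`${base}/${pageFileName(page)}`);
  }
  return { base, pages, pdf: `${base}/${title}.pdf` };
}
```

For example, `albumPaths("out", "pics", "Trip", 2).pages` yields `out/pics/Trip/page_01.png` and `out/pics/Trip/page_02.png`, with the stitched PDF at `out/pics/Trip/Trip.pdf`.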