https://github.com/pixelastic/reddinx
Import all pictures from a subreddit in a format suitable for Algolia
algolia reddit
Last synced: 11 months ago
- Host: GitHub
- URL: https://github.com/pixelastic/reddinx
- Owner: pixelastic
- License: mit
- Created: 2020-07-23T15:25:30.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2025-04-10T13:38:28.000Z (about 1 year ago)
- Last Synced: 2025-06-26T13:18:12.352Z (12 months ago)
- Topics: algolia, reddit
- Language: JavaScript
- Homepage: https://projects.pixelastic.com/reddinx/
- Size: 2.31 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# reddinx
reddinx is a reddit indexer. It saves all posts of a specific subreddit to
disk, in a format compatible with Algolia records.
## Usage
```javascript
const reddinx = require('reddinx');

(async () => {
  const subredditName = 'example'; // placeholder: replace with your subreddit

  // Launch an initial import of all posts since the subreddit creation
  await reddinx.initial(subredditName);

  // Then, periodically (for example once every day), update the data with
  await reddinx.incremental(subredditName);
})();
```
## What it does
It will get the metadata of all posts of the specified subreddit and save it on
disk in the `./data` folder. You don't need any API key, as both APIs it relies
on (reddit and pushshift.io) are free to use.
Depending on the size of the subreddit, this can take up to several hours for an
initial import and should be much faster on any subsequent incremental import.
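To give an idea of what "a format compatible with Algolia records" can mean, here is a minimal sketch. Algolia identifies each record by a unique `objectID`; everything else below is an assumption made for illustration: `toRecord` and `saveRecord` are hypothetical helpers, and the selected fields and the one-file-per-post layout are not necessarily what reddinx actually writes.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: map a raw reddit post to an Algolia-style record.
// Algolia identifies each record by a unique `objectID`; the other fields
// are illustrative picks, not reddinx's actual schema.
function toRecord(post) {
  return {
    objectID: post.id,
    title: post.title,
    url: post.url,
    score: post.score,
  };
}

// Hypothetical helper: write one JSON file per record in `dataDir`.
// The one-file-per-post layout is an assumption.
function saveRecord(record, dataDir = './data') {
  fs.mkdirSync(dataDir, { recursive: true });
  const filepath = path.join(dataDir, `${record.objectID}.json`);
  fs.writeFileSync(filepath, JSON.stringify(record, null, 2));
  return filepath;
}
```

Storing plain JSON files keyed by `objectID` would let an incremental import overwrite a post in place instead of creating a duplicate.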
## How it works
The reddit API only allows access to the most recent posts, so to get all posts
since the subreddit's creation (which can be years ago), we rely on a
third-party API: pushshift.io.
Pushshift provides an API to query all reddit posts, even very old ones. But its
data is not fresh; it's a snapshot of what the post looked like at the time
pushshift indexed it.
So once we have the list of all posts from pushshift, we query the reddit API to
get the latest content, and save it to disk.
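The two steps described above can be sketched as follows. This is not reddinx's actual code: `importPosts`, `listArchivedIds` and `fetchFresh` are hypothetical names, and the two data sources are passed in as functions so the flow can be tried without any network access. How deleted posts are reported is also an assumption.

```javascript
// Hypothetical sketch of the two-step flow. `listArchivedIds` stands in for
// a pushshift query and `fetchFresh` for a reddit API call; both are
// injected by the caller.
async function importPosts(subredditName, { listArchivedIds, fetchFresh }) {
  // Step 1: the archive gives the complete (but possibly stale) list of posts
  const ids = await listArchivedIds(subredditName);

  // Step 2: reddit gives the current state of each of those posts
  const posts = [];
  for (const id of ids) {
    const fresh = await fetchFresh(id);
    // Assumption: posts deleted since being archived come back as null
    if (fresh) posts.push(fresh);
  }
  return posts;
}
```

With stubbed sources, for example `listArchivedIds: async () => ['a1', 'b2']` and a `fetchFresh` that resolves to `null` for `'b2'`, only the post `a1` ends up in the result.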
## Documentation
The complete documentation can be found at https://projects.pixelastic.com/reddinx/