Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/karlicoss/rexport
Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖
https://github.com/karlicoss/rexport
backup data-liberation export praw reddit takeout
Last synced: 5 days ago
JSON representation
Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖
- Host: GitHub
- URL: https://github.com/karlicoss/rexport
- Owner: karlicoss
- License: mit
- Created: 2019-09-04T22:45:43.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-10-30T22:05:15.000Z (9 days ago)
- Last Synced: 2024-10-30T23:18:22.358Z (9 days ago)
- Topics: backup, data-liberation, export, praw, reddit, takeout
- Language: Python
- Homepage:
- Size: 81.1 KB
- Stars: 164
- Watchers: 6
- Forks: 11
- Open Issues: 0
-
Metadata Files:
- Readme: README.org
- License: LICENSE
Awesome Lists containing this project
- awesome-starred - karlicoss/rexport - Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖 (others)
README
#+begin_src python :dir src :results drawer :exports results
import rexport.export as E; return E.make_parser().prog
#+end_src#+RESULTS:
:results:
Export your personal Reddit data: saves, upvotes, submissions etc. as JSON.
:end:* Setting up
1. The easiest way is =pip3 install --user git+https://github.com/karlicoss/rexport=.Alternatively, use =git clone --recursive=, or =git pull && git submodule update --init=. After that, you can use =pip3 install --editable .=.
2. To use the API, you need to register a [[https://www.reddit.com/prefs/apps][custom 'personal script' app]] and get =client_id= and =client_secret= parameters.
See more [[https://praw.readthedocs.io/en/latest/getting_started/authentication.html][here]].
3. To access user's personal data (e.g. saved posts/comments), Reddit API also requires =username= and =password=.Yes, unfortunately it wants your plaintext Reddit password, you can read more about it [[https://praw.readthedocs.io/en/latest/getting_started/quick_start.html#authorized-reddit-instances][here]].
* Exporting
#+begin_src python :dir src :results drawer :exports results
import rexport.export as E; return E.make_parser().epilog
#+end_src#+RESULTS:
:results:Usage:
*Recommended*: create =secrets.py= keeping your api parameters, e.g.:
: username = "USERNAME"
: password = "PASSWORD"
: client_id = "CLIENT_ID"
: client_secret = "CLIENT_SECRET"If you have two-factor authentication enabled, append the six-digit 2FA token to the password, separated by a colon:
: password = "PASSWORD:343642"
The token will, however, be short-lived.
After that, use:
: python3 -m rexport.export --secrets /path/to/secrets.py
That way you type less and have control over where you keep your plaintext secrets.
*Alternatively*, you can pass parameters directly, e.g.
: python3 -m rexport.export --username --password --client_id --client_secret
However, this is verbose and prone to leaking your keys/tokens/passwords in shell history.
You can also import ~export.py~ as a module and call ~get_json~ function directly to get raw JSON.
I *highly* recommend checking exported files at least once just to make sure they contain everything you expect from your export. If not, please feel free to ask or raise an issue!
:end:
* API limitations
*WARNING*: reddit API [[https://www.reddit.com/r/redditdev/comments/61z088/sample_more_than_1000_submissions_within_subreddit][limits your queries to 1000 entries]].
I *highly* recommend to back up regularly and keep old exports. Easy way to achieve it is command like this:
: python3 -m rexport.export --secrets /path/to/secrets.py >"export-$(date -I).json"
Or, you can use [[https://github.com/karlicoss/arctee][arctee]] that automates this.
# TODO link to exports post?
# TODO link how DAL part can merge them togetherCheck out these links if you're interested in getting older data that's inaccessible by API:
- [[https://www.reddit.com/r/DataHoarder/comments/d0hjs7/reddit_takeout_export_your_account_data_as_json/ezbbcxe][comment]] by /u/binkarus
- [[https://www.reddit.com/r/ideasfortheadmins/wiki/faq#wiki_can_we_have_a_way_to_download_our_entire_history_even_though_reddit_cuts_off_at_a_certain_point.3F][Reddit admis]] say that the rationale behind the API limitation is performance and caching
- perhaps you can request all of your data under [[https://www.reddit.com/r/DataHoarder/comments/d0hjs7/reddit_takeout_export_your_account_data_as_json/eza0nsx][GDPR]]? I haven't tried that personally though.
- [[https://pushshift.io][pushshift]] can help you retrieve old data. You can use [[https://github.com/purarue/pushshift_comment_export][purarue/pushshift_comment_export]] to get the data.
* Example output
See [[file:example-output.json][example-output.json]], it's got some example data you might find in your data export. I've cleaned it up a bit as it's got lots of different fields many of which are probably not relevant.However, this is pretty API dependent and changes all the time, so better check with [[https://www.reddit.com/dev/api][Reddit API]] if you are looking to something specific.
* Using the data
#+begin_src python :dir src :results drawer :exports results
import rexport.exporthelpers.dal_helper as D; return D.make_parser().epilog
#+end_src#+RESULTS:
:results:You can use =rexport.dal= (stands for "Data Access/Abstraction Layer") to access your exported data, even offline.
I elaborate on motivation behind it [[https://beepb00p.xyz/exports.html#dal][here]].- main usecase is to be imported as python module to allow for *programmatic access* to your data.
You can find some inspiration in [[https://beepb00p.xyz/mypkg.html][=my.=]] package that I'm using as an API to all my personal data.
- to test it against your export, simply run: ~python3 -m rexport.dal --source /path/to/export~
- you can also try it interactively: ~python3 -m rexport.dal --source /path/to/export --interactive~
:end:
Example output:
: Your most saved subreddits:
: [('orgmode', 50),
: ('emacs', 36),
: ('QuantifiedSelf', 33),
: ('AskReddit', 33),
: ('selfhosted', 29)]