Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/karlicoss/rexport

Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖
https://github.com/karlicoss/rexport

backup data-liberation export praw reddit takeout

Last synced: 5 days ago
JSON representation

Reddit takeout: export your account data as JSON: comments, submissions, upvotes etc. 🦖

Awesome Lists containing this project

README

        

#+begin_src python :dir src :results drawer :exports results
import rexport.export as E; return E.make_parser().prog
#+end_src

#+RESULTS:
:results:
Export your personal Reddit data: saves, upvotes, submissions etc. as JSON.
:end:

* Setting up
1. The easiest way is =pip3 install --user git+https://github.com/karlicoss/rexport=.

Alternatively, use =git clone --recursive=, or =git pull && git submodule update --init=. After that, you can use =pip3 install --editable .=.
2. To use the API, you need to register a [[https://www.reddit.com/prefs/apps][custom 'personal script' app]] and get =client_id= and =client_secret= parameters.

See more [[https://praw.readthedocs.io/en/latest/getting_started/authentication.html][here]].
3. To access user's personal data (e.g. saved posts/comments), Reddit API also requires =username= and =password=.

Yes, unfortunately it wants your plaintext Reddit password, you can read more about it [[https://praw.readthedocs.io/en/latest/getting_started/quick_start.html#authorized-reddit-instances][here]].

* Exporting

#+begin_src python :dir src :results drawer :exports results
import rexport.export as E; return E.make_parser().epilog
#+end_src

#+RESULTS:
:results:

Usage:

*Recommended*: create =secrets.py= keeping your api parameters, e.g.:

: username = "USERNAME"
: password = "PASSWORD"
: client_id = "CLIENT_ID"
: client_secret = "CLIENT_SECRET"

If you have two-factor authentication enabled, append the six-digit 2FA token to the password, separated by a colon:

: password = "PASSWORD:343642"

The token will, however, be short-lived.

After that, use:

: python3 -m rexport.export --secrets /path/to/secrets.py

That way you type less and have control over where you keep your plaintext secrets.

*Alternatively*, you can pass parameters directly, e.g.

: python3 -m rexport.export --username --password --client_id --client_secret

However, this is verbose and prone to leaking your keys/tokens/passwords in shell history.

You can also import ~export.py~ as a module and call ~get_json~ function directly to get raw JSON.

I *highly* recommend checking exported files at least once just to make sure they contain everything you expect from your export. If not, please feel free to ask or raise an issue!

:end:

* API limitations

*WARNING*: reddit API [[https://www.reddit.com/r/redditdev/comments/61z088/sample_more_than_1000_submissions_within_subreddit][limits your queries to 1000 entries]].

I *highly* recommend to back up regularly and keep old exports. Easy way to achieve it is command like this:

: python3 -m rexport.export --secrets /path/to/secrets.py >"export-$(date -I).json"

Or, you can use [[https://github.com/karlicoss/arctee][arctee]] that automates this.

# TODO link to exports post?
# TODO link how DAL part can merge them together

Check out these links if you're interested in getting older data that's inaccessible by API:

- [[https://www.reddit.com/r/DataHoarder/comments/d0hjs7/reddit_takeout_export_your_account_data_as_json/ezbbcxe][comment]] by /u/binkarus
- [[https://www.reddit.com/r/ideasfortheadmins/wiki/faq#wiki_can_we_have_a_way_to_download_our_entire_history_even_though_reddit_cuts_off_at_a_certain_point.3F][Reddit admis]] say that the rationale behind the API limitation is performance and caching
- perhaps you can request all of your data under [[https://www.reddit.com/r/DataHoarder/comments/d0hjs7/reddit_takeout_export_your_account_data_as_json/eza0nsx][GDPR]]? I haven't tried that personally though.
- [[https://pushshift.io][pushshift]] can help you retrieve old data. You can use [[https://github.com/purarue/pushshift_comment_export][purarue/pushshift_comment_export]] to get the data.


* Example output
See [[file:example-output.json][example-output.json]], it's got some example data you might find in your data export. I've cleaned it up a bit as it's got lots of different fields many of which are probably not relevant.

However, this is pretty API dependent and changes all the time, so better check with [[https://www.reddit.com/dev/api][Reddit API]] if you are looking to something specific.

* Using the data

#+begin_src python :dir src :results drawer :exports results
import rexport.exporthelpers.dal_helper as D; return D.make_parser().epilog
#+end_src

#+RESULTS:
:results:

You can use =rexport.dal= (stands for "Data Access/Abstraction Layer") to access your exported data, even offline.
I elaborate on motivation behind it [[https://beepb00p.xyz/exports.html#dal][here]].

- main usecase is to be imported as python module to allow for *programmatic access* to your data.

You can find some inspiration in [[https://beepb00p.xyz/mypkg.html][=my.=]] package that I'm using as an API to all my personal data.

- to test it against your export, simply run: ~python3 -m rexport.dal --source /path/to/export~

- you can also try it interactively: ~python3 -m rexport.dal --source /path/to/export --interactive~

:end:

Example output:

: Your most saved subreddits:
: [('orgmode', 50),
: ('emacs', 36),
: ('QuantifiedSelf', 33),
: ('AskReddit', 33),
: ('selfhosted', 29)]