Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/karlicoss/ghexport

Export your Github activity: events, repositories, stars, etc.
https://github.com/karlicoss/ghexport

backup data-liberation export github github-api takeout

Last synced: 2 months ago
JSON representation

Export your Github activity: events, repositories, stars, etc.

Awesome Lists containing this project

README

        

#+begin_src python :dir src :results drawer :exports results
import ghexport.export as E; return E.make_parser().prog
#+end_src

#+RESULTS:
:results:
Export your Github personal data: issues, PRs, comments, followers and followings, etc.

*Note*: this only deals with metadata. If you want a download of actual git repositories, I recommend using [[https://github.com/josegonzalez/python-github-backup][python-github-backup]].
:end:

* Setting up
1. The easiest way is =pip3 install --user git+https://github.com/karlicoss/ghexport=.

Alternatively, use =git clone --recursive=, or =git pull && git submodule update --init=. After that, you can use =pip3 install --editable=.
2. To use the API, you need to get a [[https://github.com/settings/tokens][personal access token]] from settings. Note that you need to use =repo= scope.


* Exporting

#+begin_src python :dir src :results drawer :exports results
import ghexport.export as E; return E.make_parser().epilog
#+end_src

#+RESULTS:
:results:

Usage:

*Recommended*: create =secrets.py= keeping your api parameters, e.g.:

: token = "TOKEN"

After that, use:

: python3 -m ghexport.export --secrets /path/to/secrets.py

That way you type less and have control over where you keep your plaintext secrets.

*Alternatively*, you can pass parameters directly, e.g.

: python3 -m ghexport.export --token

However, this is verbose and prone to leaking your keys/tokens/passwords in shell history.

You can also import ~ghexport.export~ as a module and call ~get_json~ function directly to get raw JSON.

I *highly* recommend checking exported files at least once just to make sure they contain everything you expect from your export. If not, please feel free to ask or raise an issue!

:end:

** Extra export options
- you can control specific data you want to export via ~--include~ option (see ~--help~ for available fields)

By default, all data will be included in the export.

- you can include or exclude [[https://docs.github.com/en/rest/metrics/traffic][repository traffic]] data via ~--include-repos-traffic~ or ~--exclude-repos-traffic~.

Currently it's included by default.

You might want to exclude it if you have some issues with traffic API endpoint (it tends to be flakier than other endpoints).

* API limitations

*WARNING*: github API limits extent to which you can retrieve certain data, e.g. [[https://developer.github.com/v3/activity/events][events]] you can only get events from the past 90 days, and not more than 300 events.

I *highly* recommend to export regularly and keep old exports. Easy way to achieve it is command like this:

: python3 -m ghexport.export --secrets /path/to/secrets.py >"export-$(date -I).json"

Or, you can use [[https://github.com/karlicoss/arctee][arctee]] that automates this.

To get your older data past 90 days, you can request a [[https://github.com/settings/admin][manual export]] in your account settings.

# TODO hmm, mention that dal.py can handle this?

* Known Issues

The =requests= (and therefore =PyGithub=) modules on which this depends seems to sometimes fail to login if a =~/.netrc= file is present, see [[https://github.com/psf/requests/issues/5801#issuecomment-901610012][here]] for context.

* Using the data

#+begin_src python :dir src :results drawer :exports results
import ghexport.exporthelpers.dal_helper as D; return D.make_parser().epilog
#+end_src

#+RESULTS:
:results:

You can use =ghexport.dal= (stands for "Data Access/Abstraction Layer") to access your exported data, even offline.
I elaborate on motivation behind it [[https://beepb00p.xyz/exports.html#dal][here]].

- main usecase is to be imported as python module to allow for *programmatic access* to your data.

You can find some inspiration in [[https://beepb00p.xyz/mypkg.html][=my.=]] package that I'm using as an API to all my personal data.

- to test it against your export, simply run: ~python3 -m ghexport.dal --source /path/to/export~

- you can also try it interactively: ~python3 -m ghexport.dal --source /path/to/export --interactive~

:end:

Example output:

: Your events:
: Counter({'PushEvent': 181,
: 'WatchEvent': 27,
: 'CreateEvent': 22,
: 'IssueCommentEvent': 20,
: 'PullRequestEvent': 15,
: 'IssuesEvent': 5,
: 'DeleteEvent': 5,
: 'ForkEvent': 3,
: 'PullRequestReviewCommentEvent': 1})