Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/austil/datapuller
"Easy" data dump of your activity on various web services
- Host: GitHub
- URL: https://github.com/austil/datapuller
- Owner: austil
- License: gpl-3.0
- Created: 2018-03-18T16:53:29.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-07T23:54:16.000Z (almost 2 years ago)
- Last Synced: 2024-08-01T21:55:07.696Z (3 months ago)
- Language: JavaScript
- Homepage:
- Size: 215 KB
- Stars: 13
- Watchers: 2
- Forks: 0
- Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- project-awesome - austil/datapuller - "Easy" data dump of your activity on various web services (JavaScript)
README
# Data Puller
![Pull all CLI](./screenshot_pull.png)
This repository is a collection of scripts I've made to conveniently pull my personal data from the internet services I use the most.
The goal is to get everything about me in one place for further analysis (data science with R, full-text search with Elastic, ...). These scripts pull every bit of interesting data about you available from web service APIs into plain JSON files.
Currently supported:
- Pocket: unread, archived & favorites
- Twitter: likes, tweets, retweets
- YouTube: likes, favorites, history (via manual import & parsing)
- Reddit: upvoted, saved
- GitHub: stars

:hospital: Have a look at [The Data Detox Kit](https://datadetox.myshadow.org/en/detox).
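Since every puller ends up writing plain JSON files, re-running a puller has to fold new items into the previous dump without duplicating them. Below is a minimal sketch of one way to do that in Node.js, assuming each item has a unique `id` field. `mergeById` and `saveDump` are hypothetical helpers, not functions from this repository.

```javascript
// Hypothetical helpers: merge freshly pulled items into an existing JSON dump.
// Assumes each item carries a unique key (default `id`); the real field name
// differs between services.
const fs = require('fs');

function mergeById(existing, pulled, key = 'id') {
  const byId = new Map();
  for (const item of existing) byId.set(item[key], item);
  // Newer copies overwrite older ones so updated metadata wins,
  // while a Map keeps the position of keys that already existed.
  for (const item of pulled) byId.set(item[key], item);
  return [...byId.values()];
}

function saveDump(path, pulled, key = 'id') {
  // Start from the previous dump if there is one, else from an empty list.
  const existing = fs.existsSync(path)
    ? JSON.parse(fs.readFileSync(path, 'utf8'))
    : [];
  const merged = mergeById(existing, pulled, key);
  fs.writeFileSync(path, JSON.stringify(merged, null, 2));
  return merged.length;
}

module.exports = { mergeById, saveDump };
```

Calling `saveDump('data/pocket.json', items)` (a hypothetical path) twice with overlapping items keeps one copy of each `id`, with the newer copy winning.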
## Run
```bash
# A specific puller (for setup or debug), e.g. twitter
node src/pullers/twitter_pull.js
# All pullers at once
npm run start
# Stats
node src/stats.js
# Specific report
node src/reports/twitter_report.js
node src/reports/pocket_readnext.js
```

## Setup
* Run `npm install`
* Provide your API credentials via environment variables or a `./config.json` file (have a look at `./src/config_manager.js`)
* Go through the auth procedure of every configured puller by launching each one separately (with something like `node ./src/pullers/pocket_pull.js`)

### YouTube Restrictions
The watch history and the watch later playlist are [not accessible](https://developers.google.com/youtube/v3/revision_history#september-15-2016) through the YouTube API for privacy reasons.
To get around this, you can obtain a `watch-history.html` file via the [Google Takeout page](https://takeout.google.com/settings/takeout).
Then, put this file in the `drop_zone` folder so it can be parsed by the youtube puller on the next run.
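A minimal sketch of how the `watch-history.html` file could be turned into a list of video IDs, assuming each watched entry links to `https://www.youtube.com/watch?v=<ID>`. The actual Takeout markup, and how the youtube puller really parses it, may differ; `extractVideoIds` and `readDropZone` are hypothetical names.

```javascript
// Hypothetical sketch: pull YouTube video IDs out of a Takeout
// watch-history.html file. Assumes watched entries link to
// https://www.youtube.com/watch?v=<ID>; the real markup may differ.
const fs = require('fs');

function extractVideoIds(html) {
  // Video IDs are currently 11 characters from [A-Za-z0-9_-]
  // (a convention, not a documented guarantee).
  const pattern = /https:\/\/www\.youtube\.com\/watch\?v=([A-Za-z0-9_-]{11})/g;
  const ids = [];
  for (const match of html.matchAll(pattern)) ids.push(match[1]);
  return ids;
}

// Reads the file from the drop_zone folder (file name as in this README).
function readDropZone(file = 'drop_zone/watch-history.html') {
  return extractVideoIds(fs.readFileSync(file, 'utf8'));
}
```

Duplicates are kept on purpose here, since rewatching a video is part of the history.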
As for the watch later playlist, the Google Takeout export is already a JSON file.

Late 2019 update: the watch history is now available in JSON, but it still requires pulling video details.
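For the JSON export, the remaining work is to collect the video IDs and fetch their details in batches. This sketch assumes each history entry exposes a `titleUrl` field pointing at the watch page (an assumption about the Takeout format), and that the YouTube Data API `videos.list` method accepts at most 50 IDs per request; check both against the current documentation. The helper names are hypothetical.

```javascript
// Hypothetical sketch: turn a parsed Takeout watch-history.json into
// batches of unique video IDs. Assumes entries have a `titleUrl` like
// https://www.youtube.com/watch?v=<ID>; entries without one are skipped.
function videoIdsFromHistory(entries) {
  const ids = new Set();
  for (const entry of entries) {
    if (!entry.titleUrl) continue;
    const id = new URL(entry.titleUrl).searchParams.get('v');
    if (id) ids.add(id); // each video only needs its details fetched once
  }
  return [...ids];
}

// Assumed limit: videos.list takes up to 50 IDs per request,
// so the IDs are split into chunks of that size.
function chunk(items, size = 50) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch could then be joined with commas and passed as the `id` parameter of a `videos.list` request with `part=snippet,contentDetails`, which covers the title, category ID and duration.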
### Why this project? Are website data exports not enough?
Website export features have shortcomings (as of late 2019):
- Pocket export is in HTML and does not differentiate favorites from other items
- GitHub export does not include starred repos
- YouTube export does not include any video metadata such as duration and category

As for Facebook, Reddit and Twitter, they're doing a great job, so my scripts may be irrelevant there.