https://github.com/austil/datapuller

"Easy" data dump of your activity on various web services

Host: GitHub
URL: https://github.com/austil/datapuller
Owner: austil
License: gpl-3.0
Created: 2018-03-18T16:53:29.000Z (about 7 years ago)
Default Branch: master
Last Pushed: 2022-12-07T23:54:16.000Z (over 2 years ago)
Last Synced: 2024-08-01T21:55:07.696Z (9 months ago)
Language: JavaScript
Homepage:
Size: 215 KB
Stars: 13
Watchers: 2
Forks: 0
Open Issues: 7
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

project-awesome - austil/datapuller - "Easy" data dump of your activity on various web services (JavaScript)

README

# Data Puller

![Pull all CLI](./screenshot_pull.png)

This repository is a collection of scripts I've made to conveniently pull my personal data from the internet services I use the most.
The goal is to get everything about me in one place for further analysis (data science with R, full-text search with Elastic, ...).

These scripts pull every interesting piece of data about you that the web services' APIs expose and store it in plain JSON files.

Currently supported:

- Pocket: unread, archived & favorites
- Twitter: likes, tweets, retweets
- YouTube: likes, favorites, history (via manual import & parsing)
- Reddit: upvoted, saved
- GitHub: stars

:hospital: Have a look at [The Data Detox Kit](https://datadetox.myshadow.org/en/detox).

## Run

```bash
# A specific puller (for setup or debug), e.g. twitter
node src/pullers/twitter_pull.js
# All pullers at once
npm run start

# Stats
node src/stats.js
# Specific report
node src/reports/twitter_report.js
node src/reports/pocket_readnext.js
```

## Setup

* Run `npm install`
* Provide your API credentials via environment variables or a `./config.json` file (have a look at `./src/config_manager.js`)
* Go through the auth procedure of every configured puller by launching each one separately (with something like `node ./src/pullers/pocket_pull.js`)
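
To illustrate the credentials step above, here is a minimal sketch of how a loader could combine environment variables with a `./config.json` file. It is not the code in `./src/config_manager.js`: the function names, the key names, and the rule that environment variables take precedence over the file are all assumptions made for the example.

```javascript
// Hypothetical sketch, NOT the project's actual src/config_manager.js.
const fs = require("node:fs");

// Read a JSON config file, returning {} when the file does not exist.
function readConfigFile(filePath) {
  if (!fs.existsSync(filePath)) return {};
  return JSON.parse(fs.readFileSync(filePath, "utf8"));
}

// Resolve each requested key from the environment first, then from the file.
// The precedence (env over file) is an assumption of this sketch.
function loadConfig(keys, env = process.env, filePath = "./config.json") {
  const fileConfig = readConfigFile(filePath);
  const config = {};
  for (const key of keys) {
    const value = env[key] ?? fileConfig[key];
    if (value !== undefined) config[key] = value;
  }
  return config;
}
```

Calling `loadConfig(["SOME_API_KEY"])` (a made-up key name) would return an object holding only the keys that were found in either source, so a puller can check for missing credentials before starting its auth procedure.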

## More on this project

### YouTube Restrictions

The watch history and the watch later playlist are [not accessible](https://developers.google.com/youtube/v3/revision_history#september-15-2016) through the YouTube API for privacy reasons.
To get around this, you can obtain a `watch-history.html` file via the [Google Takeout page](https://takeout.google.com/settings/takeout).
Then, put this file in the `drop_zone` folder so it can be parsed by the youtube puller on the next run.
As for the watch later playlist, the Google Takeout export is already a JSON file.
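
As a rough illustration of the parsing step, the sketch below extracts video IDs from the text of a `watch-history.html` file. It is not the project's parser: the function name is hypothetical, and it assumes the export contains ordinary `watch?v=<id>` links with 11-character video IDs.

```javascript
// Hypothetical sketch, NOT the project's actual YouTube puller.
// Assumes the Takeout HTML contains plain "watch?v=<11-character id>" links.
function extractVideoIds(html) {
  const pattern = /watch\?v=([A-Za-z0-9_-]{11})/g;
  // Keep every match in order of appearance; repeated views stay repeated.
  return Array.from(html.matchAll(pattern), (match) => match[1]);
}
```

The resulting IDs could then be used to request video details such as duration and category, which the export itself does not include.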

Late 2019 update: the watch history is now available in JSON, but it still requires pulling video details separately.

### Why this project? Are website data exports not enough?

Websites' export features have shortcomings (as of late 2019):

- Pocket export is in HTML and does not differentiate favorites from other items
- GitHub export does not include starred repos
- YouTube export does not include any video metadata such as duration and category

As for Facebook, Reddit and Twitter, their exports do a great job, so my scripts may be irrelevant for them.
