https://github.com/droher/diachronic

Get daily historical snapshots of every article on any Wiki, formatted as Parquet files

apache-arrow google-cloud terraform wikimedia wikipedia

# diachronic

A parser that turns the revision-history dumps of a set of wiki sites
(e.g. Wikipedia, Wiktionary) into Parquet files of daily snapshots.
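
The idea of a "daily snapshot" can be illustrated with a small sketch. It assumes a snapshot means each page's text as of the end of a given UTC day, i.e. the latest revision timestamped before the following midnight. The `Revision` record and the `daily_snapshot` function are hypothetical and are not part of diachronic. This naive version rescans every revision for each requested day and ignores page deletions, and the README does not describe how diachronic itself does it.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta, timezone


@dataclass(frozen=True)
class Revision:
    # Hypothetical minimal record; real dump revisions carry many more fields.
    page_id: int
    timestamp: datetime  # timezone-aware, UTC
    text: str


def daily_snapshot(revisions, day):
    """Return {page_id: text} as each page stood at the end of `day` (UTC).

    Assumption: a page's state for a day is its latest revision whose
    timestamp falls strictly before midnight at the start of the next day.
    """
    cutoff = datetime.combine(day + timedelta(days=1), time.min, tzinfo=timezone.utc)
    latest = {}
    for rev in revisions:
        if rev.timestamp >= cutoff:
            continue  # revision happened after this day ended
        current = latest.get(rev.page_id)
        if current is None or rev.timestamp > current.timestamp:
            latest[rev.page_id] = rev
    return {page_id: rev.text for page_id, rev in latest.items()}


if __name__ == "__main__":
    utc = timezone.utc
    revs = [
        Revision(1, datetime(2020, 1, 1, 9, tzinfo=utc), "v1"),
        Revision(1, datetime(2020, 1, 1, 23, tzinfo=utc), "v2"),
        Revision(1, datetime(2020, 1, 2, 8, tzinfo=utc), "v3"),
        Revision(2, datetime(2020, 1, 2, 12, tzinfo=utc), "a1"),
    ]
    print(daily_snapshot(revs, date(2020, 1, 1)))  # {1: 'v2'}
    print(daily_snapshot(revs, date(2020, 1, 2)))  # {1: 'v3', 2: 'a1'}
```

Page 2 does not appear in the first snapshot because it had no revision yet on 2020-01-01.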

Uses Apache Arrow for serialization.

The resulting files are uploaded to a specified Google Cloud Storage bucket.