https://github.com/proullon/wikipedia-dl
Wikipedia dumps importer to cockroachdb
- Host: GitHub
- URL: https://github.com/proullon/wikipedia-dl
- Owner: proullon
- Created: 2020-04-30T14:37:50.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2024-01-20T13:44:30.000Z (12 months ago)
- Last Synced: 2024-12-09T22:44:20.434Z (about 1 month ago)
- Language: Go
- Size: 25.4 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Wikipedia to CockroachDB
This is a utility to import the dumps of any Wikipedia language edition into your CockroachDB cluster.
To avoid hammering your cluster, wikipediatocrdb uses [workerpool](https://github.com/proullon/workerpool) to adapt its parallelism to the observed insert speed.
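
As a rough illustration of adapting parallelism to insert speed, here is a minimal sketch of one possible heuristic. It is not the workerpool library's API or algorithm: the function `nextWorkers`, its parameters, and the "one step up, one step down" rule are all assumptions made for this example.

```go
package main

// nextWorkers returns the worker count to use for the next measurement window.
//
// Hypothetical heuristic (not taken from workerpool): add one worker while
// the measured inserts per second keep improving, remove one when throughput
// drops, and keep the current count when it is unchanged. The result is
// clamped to the [minWorkers, maxWorkers] range.
func nextWorkers(current int, prevRate, rate float64, minWorkers, maxWorkers int) int {
	next := current
	switch {
	case rate > prevRate:
		// Throughput improved: the cluster can probably absorb more load.
		next = current + 1
	case rate < prevRate:
		// Throughput dropped: back off to relieve the cluster.
		next = current - 1
	}
	if next < minWorkers {
		next = minWorkers
	}
	if next > maxWorkers {
		next = maxWorkers
	}
	return next
}
```

A caller would measure inserts per second over a fixed window, pass the previous and current measurements to `nextWorkers`, and resize its pool of goroutines accordingly.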
## .dev.conf
To use the Makefile rules, create a `.dev.conf` file from `.dev.conf.example` and fill it in.
## Binary requirements
* wget
* bzip2

## Parameters
* language: Wikipedia language edition to import (default `en`)
* interactive: interactively select which dumps will be imported
* dump-folder: folder used for download and extraction
* tight: remove each dump after it has been imported
* with-page-content: insert the body of each Wikipedia article
* with-page-reference: populate the `article_references` table

## Documentation
* https://en.wikipedia.org/wiki/Wikipedia:Database_download
* https://dumps.wikimedia.org/enwiki/latest/
* https://www.cockroachlabs.com/blog/serializable-lockless-distributed-isolation-cockroachdb/
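
The options listed under Parameters could be declared with Go's standard `flag` package roughly as follows. This is a sketch under assumptions, not the project's actual code: the `options` struct, `parseOptions`, and `latestDumpURL` are hypothetical names, every default except `en` is a guess, and the URL helper simply generalizes the `enwiki` link above to other language codes (codes containing hyphens may need different handling).

```go
package main

import (
	"flag"
	"fmt"
)

// options mirrors the parameters listed in the README (hypothetical struct).
type options struct {
	language          string
	interactive       bool
	dumpFolder        string
	tight             bool
	withPageContent   bool
	withPageReference bool
}

// parseOptions parses command-line arguments into options.
// Only the "en" default comes from the README; the others are assumptions.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("wikipedia-dl", flag.ContinueOnError)
	fs.StringVar(&o.language, "language", "en", "Wikipedia language edition to import")
	fs.BoolVar(&o.interactive, "interactive", false, "select which dumps will be imported")
	fs.StringVar(&o.dumpFolder, "dump-folder", ".", "download and extraction folder")
	fs.BoolVar(&o.tight, "tight", false, "remove dump after import")
	fs.BoolVar(&o.withPageContent, "with-page-content", false, "insert article body")
	fs.BoolVar(&o.withPageReference, "with-page-reference", false, "populate article_references table")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}

// latestDumpURL builds the "latest" dump index URL for a language code,
// following the pattern of https://dumps.wikimedia.org/enwiki/latest/.
func latestDumpURL(language string) string {
	return fmt.Sprintf("https://dumps.wikimedia.org/%swiki/latest/", language)
}
```

For example, `parseOptions([]string{"-language", "fr", "-tight"})` yields options for the French edition with dump cleanup enabled, and `latestDumpURL("fr")` points at the corresponding dump index.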