https://github.com/niklas88/tinypedia
A bare-bones, text-only local Wikipedia server that works directly on the .bz2 compressed dumps
- Host: GitHub
- URL: https://github.com/niklas88/tinypedia
- Owner: niklas88
- License: apache-2.0
- Created: 2018-02-21T19:11:47.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-03-09T12:08:55.000Z (about 8 years ago)
- Last Synced: 2023-10-20T22:04:22.666Z (over 2 years ago)
- Topics: barebones, experimental, wikipedia
- Language: Go
- Size: 77.1 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# tinypedia
A very bare bones, text only, local Wikipedia server. It works directly on the
`.bz2` compressed dumps without creating any additional files.
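
As an illustration of why no additional files are needed: each line of the decompressed multistream index has the form `offset:pageid:title`, where `offset` is the byte position of a bzip2 stream inside the data file. The following is a minimal sketch of parsing such lines, not tinypedia's actual code. The names `indexEntry` and `parseIndexLine` are hypothetical, and the sample lines are made-up illustrative values.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// indexEntry is a hypothetical type holding one line of the multistream index.
type indexEntry struct {
	Offset int64  // byte offset of the bzip2 stream in the data file
	PageID int64  // numeric page ID
	Title  string // page title, which may itself contain colons
}

// parseIndexLine splits a line of the form "offset:pageid:title".
// SplitN with a limit of 3 keeps colons inside the title intact.
func parseIndexLine(line string) (indexEntry, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return indexEntry{}, fmt.Errorf("malformed index line: %q", line)
	}
	offset, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil {
		return indexEntry{}, fmt.Errorf("bad offset in %q: %w", line, err)
	}
	id, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil {
		return indexEntry{}, fmt.Errorf("bad page ID in %q: %w", line, err)
	}
	return indexEntry{Offset: offset, PageID: id, Title: parts[2]}, nil
}

func main() {
	// Illustrative sample data. In practice the lines would come from the
	// index file wrapped in a compress/bzip2 reader.
	sample := "616:10:AccessibleComputing\n616:12:Anarchism\n632461:1028:Category:Foo\n"
	sc := bufio.NewScanner(strings.NewReader(sample))
	for sc.Scan() {
		e, err := parseIndexLine(sc.Text())
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		fmt.Printf("%q is in the stream at byte %d\n", e.Title, e.Offset)
	}
}
```

With the offset known, a server can seek to that position in the data file and decompress only the one stream that contains the requested page.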
## Current State
Retrieving articles by title using the path `#` works,
and the text can be viewed as extracted by `wtf_wikipedia.js` plus some formatting
for sections. Sadly, this fails to extract the text from special markup such as
IPA pronunciations. The raw MediaWiki markup can also be retrieved using
`/wiki/`.
Since we currently use URL encoding directly, titles are not compatible with the
title encoding used by Wikipedia (e.g. `Ada%20Lovelace` instead of `Ada_Lovelace`).
## Building and Installing
First make sure you have Go and the `go` command installed and that
`$GOPATH/bin` is in your path. Then install with a simple `go get`:

    go get github.com/ad-freiburg/tinypedia

## Running
Change to the directory containing the dump files:
* enwiki-latest-pages-articles-multistream-index.txt.bz2
* enwiki-latest-pages-articles-multistream.xml.bz2

Then simply run the `tinypedia` executable:

    tinypedia

If you have named the files differently, use the `-i` and `-d` command line
switches to point `tinypedia` to the _index_ and _data_ files respectively.
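
For readers curious how such switches are typically wired up in Go, here is a minimal sketch using the standard `flag` package. It assumes the default file names listed above. The `config` type and the `parseArgs` function are hypothetical, and tinypedia's real flag handling may differ.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// config is a hypothetical container for the two dump file paths.
type config struct {
	IndexPath string
	DataPath  string
}

// parseArgs defines -i and -d with the default dump file names and
// parses the given arguments. This is an assumed layout, not tinypedia's code.
func parseArgs(args []string) (config, error) {
	fs := flag.NewFlagSet("tinypedia", flag.ContinueOnError)
	var c config
	fs.StringVar(&c.IndexPath, "i",
		"enwiki-latest-pages-articles-multistream-index.txt.bz2",
		"path to the index file")
	fs.StringVar(&c.DataPath, "d",
		"enwiki-latest-pages-articles-multistream.xml.bz2",
		"path to the data file")
	if err := fs.Parse(args); err != nil {
		return config{}, err
	}
	return c, nil
}

func main() {
	c, err := parseArgs(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Println("index:", c.IndexPath)
	fmt.Println("data: ", c.DataPath)
}
```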