https://github.com/i80and/quelt

A simple and fast offline Wikipedia reader

Host: GitHub
URL: https://github.com/i80and/quelt
Owner: i80and
License: mit
Created: 2011-12-23T15:36:55.000Z (over 14 years ago)
Default Branch: master
Last Pushed: 2012-12-01T15:38:22.000Z (over 13 years ago)
Last Synced: 2025-02-02T23:56:43.153Z (over 1 year ago)
Language: C
Homepage:
Size: 164 KB
Stars: 1
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


Quelt
=====
A lightweight offline Wikipedia reader.

Compilation Requirements
------------
* C99 compiler
* Unix environment. Some win32 shims exist, but they are untested.
* Expat (only for quelt-split)
* Zlib

Building
--------
    $ make

Usage
-----
    $ ./quelt-split [path to XML dump] [-v]
    $ ./quelt [part of title] --search [--plain]
    $ ./quelt [exact title] [--plain]

File format
-----------
The initial plan was for Quelt to store every article in a separate file, using
the article name as the path. A cute idea, but reality set in fairly quickly:

* Filenames are restricted, especially on NTFS
* Standard filesystem tools are not built to handle 3 million+ files easily

So instead, Quelt uses a custom binary format made up of two files: `quelt.db`
and `quelt.index`.

`quelt.index`:

`quelt.db` is a concatenated sequence of zlib streams, where the start of each
article is given by the article offsets in `quelt.index`.

The index is broken up into segments, all of which (except the last) are of
length `segment_length` and sorted independently. A lookup runs a binary search
on each segment in turn until the title is found, so a successful search takes
`O((n_segments/2) * log(segment_length))` comparisons on average, and a miss
has to visit every segment. Because only one segment needs to be sorted or
searched at a time, quelt and quelt-split can still run on memory-constrained
machines. Note that the sorted segments could also serve as the first step of a
real external merge sort.
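
The lookup strategy described above can be sketched as follows. This is a minimal in-memory illustration, not Quelt's actual code: the `struct entry` layout and the names `search_segment` and `index_lookup` are assumptions made for the example, and the real index lives on disk in a format not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory index entry; the real on-disk layout may differ. */
struct entry {
    const char *title;
    long offset;    /* assumed: byte offset of the article in quelt.db */
};

/* Binary search within one sorted segment. Returns the entry or NULL. */
static const struct entry *search_segment(const struct entry *seg, size_t n,
                                          const char *title)
{
    size_t lo = 0, hi = n;    /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(title, seg[mid].title);
        if (cmp == 0) return &seg[mid];
        if (cmp < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;
}

/* Search each segment in turn. Every segment holds segment_length entries
 * except possibly the last, which holds whatever remains. */
const struct entry *index_lookup(const struct entry *index, size_t n_entries,
                                 size_t segment_length, const char *title)
{
    assert(segment_length > 0);
    for (size_t start = 0; start < n_entries; start += segment_length) {
        size_t n = n_entries - start;
        if (n > segment_length) n = segment_length;
        const struct entry *hit = search_segment(index + start, n, title);
        if (hit != NULL) return hit;
    }
    return NULL;
}
```

In a disk-backed version, the loop would load one segment at a time instead of indexing into a single array, which keeps peak memory proportional to `segment_length` rather than to the whole index.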
