https://github.com/cscott/mw-ocg-texter.old
Convert mediawiki collection bundles to stripped plaintext
- Host: GitHub
- URL: https://github.com/cscott/mw-ocg-texter.old
- Owner: cscott
- Created: 2013-12-13T19:22:13.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2014-10-01T03:49:29.000Z (over 10 years ago)
- Last Synced: 2024-11-07T06:16:07.307Z (2 months ago)
- Language: JavaScript
- Size: 12.2 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# mw-ocg-texter
[![NPM][NPM1]][NPM2] [![Build Status][1]][2] [![dependency status][3]][4] [![dev dependency status][5]][6]
Converts MediaWiki collection bundles (as generated by [mw-ocg-bundler]) to
stripped plaintext.

This is a proof of concept, but it could be used to archive or embed the
textual content of Wikipedia in a minimal amount of space.

## Installation
Node versions 0.8 and 0.10 have been tested and are known to work.

Install the node package dependencies:
```
npm install
```

Install the other system dependencies (on Debian or Ubuntu):
```
apt-get install unzip
```

## Generating bundles
You may wish to install the [mw-ocg-bundler] npm package to create bundles
from Wikipedia articles. The text below assumes that you have done
so; ignore the `mw-ocg-bundler` references if you have bundles from
some other source.

## Running
To generate a plaintext file named `out.txt` from the English
(`enwiki`) Wikipedia article "United States":
```
mw-ocg-bundler -o us.zip --prefix enwiki "United States"
bin/mw-ocg-texter -o out.txt us.zip
```

By default, the output is word-wrapped at 80 columns. If you would like
"semantic" newlines instead (that is, each newline ends a paragraph and
there are no newlines within paragraphs), use the `--no-wrap`
option:
```
bin/mw-ocg-texter --no-wrap -o out.txt us.zip
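# Optional check (plain awk, not part of mw-ocg-texter): print the length
# of the longest line in out.txt. With --no-wrap each paragraph is a single
# line, so this is usually well over 80; with the default wrapping it
# should normally stay at or near 80. Depending on the awk implementation,
# length is counted in characters or in bytes.
awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }' out.txt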
```

For other options, see:
```
bin/mw-ocg-texter --help
```

## Other ideas
This backend should implement the [Unicode Nearly Plain-Text Encoding of
Mathematics](http://unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf)
to render math content.

## Related Projects
* [mw-ocg-bundler][]
* [mw-ocg-latexer][]

## License
GPLv2
(c) 2013-2014 by C. Scott Ananian
[mw-ocg-bundler]: https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler
[mw-ocg-latexer]: https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-latex_renderer
[NPM1]: https://nodei.co/npm/mw-ocg-texter.svg
[NPM2]: https://nodei.co/npm/mw-ocg-texter/
[1]: https://travis-ci.org/cscott/mw-ocg-texter.svg
[2]: https://travis-ci.org/cscott/mw-ocg-texter
[3]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer.svg
[4]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer
[5]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer/dev-status.svg
[6]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer#info=devDependencies