Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cscott/mw-ocg-texter
Convert mediawiki collection bundles to stripped plaintext.
- Host: GitHub
- URL: https://github.com/cscott/mw-ocg-texter
- Owner: cscott
- Created: 2014-10-01T19:42:03.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2015-09-26T18:45:57.000Z (over 9 years ago)
- Last Synced: 2024-11-07T06:16:07.627Z (2 months ago)
- Language: JavaScript
- Homepage: https://www.mediawiki.org/wiki/Offline_content_generator
- Size: 12.1 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
# mw-ocg-texter
[![NPM][NPM1]][NPM2] [![Build Status][1]][2] [![dependency status][3]][4] [![dev dependency status][5]][6]
Converts mediawiki collection bundles (as generated by [mw-ocg-bundler]) to
stripped plaintext.

This is a proof-of-concept, but it could be used to archive or embed the
textual content of wikipedia in a minimal amount of space.

## Installation
Node versions 0.8 and 0.10 are tested to work.
Install the node package dependencies.
```
npm install
```

Install other system dependencies.
```
apt-get install unzip
```

## Generating bundles
You may wish to install the [mw-ocg-bundler] npm package to create bundles
from wikipedia articles. The below text assumes that you have done
so; ignore the `mw-ocg-bundler` references if you have bundles from
some other source.

## Running
To generate a plaintext file named `out.txt` from the `en.wikipedia.org` article
"United States":
```
$SOMEPATH/bin/mw-ocg-bundler -v -o us.zip -h en.wikipedia.org "United States"
bin/mw-ocg-texter -o out.txt us.zip
```

In the above command `$SOMEPATH` is the place you installed
`mw-ocg-bundler`; if you've used the directory structure recommended
by `mw-ocg-service` this will be `../mw-ocg-bundler`.

The default format does 80-column word wrap. If you would like to
use "semantic" new lines (that is, newlines end paragraphs and there
are no newlines within paragraphs) use the `--no-wrap`
option:
```
bin/mw-ocg-texter --no-wrap -o out.txt us.zip
```

For other options, see:
```
bin/mw-ocg-texter --help
```

## Standalone mode
To convert a single article without the bundle creation step, use:
```
bin/mw-ocg-texter -h en.wikipedia.org -t "United States"
```
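To convert several articles this way, the command can be wrapped in a small shell loop. The following is a minimal sketch and not part of mw-ocg-texter: `out_name` and `convert_titles` are hypothetical helper names, and the sketch assumes it is run from the checkout directory so that `bin/mw-ocg-texter` resolves. It uses only the `-h` and `-t` options shown above, plus `-o` to name each output file.

```shell
#!/bin/sh
# Hypothetical helper: derive an output file name from an article title
# by replacing spaces and slashes with underscores.
out_name() {
  printf '%s.txt\n' "$(printf '%s' "$1" | tr ' /' '__')"
}

# Hypothetical helper: convert each title in standalone mode.
# First argument is the wiki hostname; remaining arguments are titles.
convert_titles() {
  host=$1
  shift
  for title in "$@"; do
    # Stop at the first failed conversion.
    bin/mw-ocg-texter -h "$host" -t "$title" -o "$(out_name "$title")" || return 1
  done
}
```

For example, `convert_titles en.wikipedia.org "United States" "Canada"` would be expected to write `United_States.txt` and `Canada.txt` to the current directory.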
The `-h` option specifies the hostname of the wiki, and the `-t`
option gives the title to convert. The content will be fetched
from the Wikimedia REST API and converted, with output to standard
out (unless the `-o` option is given).

## Other ideas
This backend should implement the [Unicode Nearly Plain-Text Encoding of
Mathematics](http://unicode.org/notes/tn28/UTN28-PlainTextMath-v3.pdf)
to render math content.

## Related Projects
* [mw-ocg-bundler][]
* [mw-ocg-latexer][]

## License
GPLv2

(c) 2013-2014 by C. Scott Ananian

[mw-ocg-bundler]: https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-bundler
[mw-ocg-latexer]: https://github.com/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-latex_renderer

[NPM1]: https://nodei.co/npm/mw-ocg-texter.png
[NPM2]: https://nodei.co/npm/mw-ocg-texter/

[1]: https://travis-ci.org/cscott/mw-ocg-texter.svg
[2]: https://travis-ci.org/cscott/mw-ocg-texter
[3]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer.svg
[4]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer
[5]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer/dev-status.svg
[6]: https://david-dm.org/wikimedia/mediawiki-extensions-Collection-OfflineContentGenerator-text_renderer#info=devDependencies