https://github.com/danmichaelo/mwtextextractor
Extracts unformatted body text from MediaWiki wikitext
- Host: GitHub
- URL: https://github.com/danmichaelo/mwtextextractor
- Owner: danmichaelo
- Created: 2013-05-12T13:27:35.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2019-02-20T07:05:07.000Z (over 7 years ago)
- Last Synced: 2025-05-06T07:03:58.444Z (about 1 year ago)
- Language: Python
- Size: 17.6 KB
- Stars: 2
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.rst
mwtextextractor
===================

.. image:: https://travis-ci.org/danmichaelo/mwtextextractor.png?branch=master
   :target: https://travis-ci.org/danmichaelo/mwtextextractor

.. image:: https://coveralls.io/repos/danmichaelo/mwtextextractor/badge.png
   :target: https://coveralls.io/r/danmichaelo/mwtextextractor

mwtextextractor extracts plain body text from MediaWiki wikitext by stripping templates, HTML tags, tables, headers, etc.
The extracted text can then be used for tasks such as word counting.

Example:

.. code-block:: python

    from mwtextextractor import get_body_text

    # The call form print(...) works on both Python 2 and Python 3
    print(get_body_text('Lorem {{ipsum}} dolor'))
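
The word-counting use case mentioned above could look roughly like the following sketch. The helper ``count_words`` is hypothetical and not part of mwtextextractor, and the sample string only stands in for whatever ``get_body_text`` returns; the tokenisation rule (runs of word characters, lowercased) is an assumption that may not suit every language.

```python
import re
from collections import Counter


def count_words(body_text):
    """Return (total_words, per_word_counts) for a plain-text string.

    Hypothetical helper, not part of mwtextextractor. Words are taken
    to be runs of word characters and are compared case-insensitively.
    """
    words = re.findall(r"\w+", body_text.lower())
    return len(words), Counter(words)


# Stand-in for the plain text returned by get_body_text(wikitext).
body_text = "Lorem dolor sit amet, lorem ipsum"

total, freq = count_words(body_text)
print(total)                # 6
print(freq.most_common(1))  # [('lorem', 2)]
```

In real use, ``body_text`` would be replaced by the result of ``get_body_text`` applied to a page's wikitext.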