https://github.com/cneud/page-to-text

extract text from PAGE file

# page-to-text
Extracts the text from a [PAGE](http://primaresearch.org/publications/ICPR2010_Pletschacher_PAGE) file and writes it to `stdout`.
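For readers unfamiliar with the format, the following is a minimal sketch of how text can be pulled out of a PAGE file in XML tree order. It is not the code of `page_to_text.py`. It assumes that the region-level transcription is stored in `TextRegion/TextEquiv/Unicode`, as in the PAGE schema, and it reads the namespace from the root element so that it is not tied to one schema version. The function name `page_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def page_text(xml_source):
    """Return the text of every TextRegion, joined by newlines, in XML tree order.

    Hypothetical helper, not part of page-to-text. Assumes the region-level
    transcription lives in TextRegion/TextEquiv/Unicode.
    """
    root = ET.fromstring(xml_source)
    # The root tag looks like "{namespace-uri}PcGts"; take the URI if present.
    ns = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""

    def q(name):
        # Qualify a tag name with the document's namespace, if any.
        return f"{{{ns}}}{name}" if ns else name

    texts = []
    # iter() walks the tree depth-first, i.e. in document order.
    for region in root.iter(q("TextRegion")):
        # find() only looks at direct children, so the TextEquiv elements
        # of nested TextLine/Word elements are skipped.
        equiv = region.find(q("TextEquiv"))
        if equiv is None:
            continue
        unicode_el = equiv.find(q("Unicode"))
        if unicode_el is not None and unicode_el.text:
            texts.append(unicode_el.text)
    return "\n".join(texts)
```

Because `ET.fromstring` also accepts bytes, a file on disk could be handled with something like `print(page_text(open(path, "rb").read()))`, where `path` is a placeholder.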

Note that this tool does not take `ReadingOrder` into account, even when the PAGE-XML defines one. Instead, it writes the output in the order in which the elements appear in the XML tree.
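To illustrate the difference the note describes, the sketch below (not part of page-to-text) lists `TextRegion` ids in XML tree order and, separately, in the order given by a `ReadingOrder` element. It makes a simplifying assumption: only the `RegionRefIndexed` children directly inside the first `OrderedGroup` are used, and nested groups and `UnorderedGroup` are ignored. The function name `region_orders` is hypothetical.

```python
import xml.etree.ElementTree as ET


def region_orders(xml_source):
    """Return (document_order, reading_order) as lists of TextRegion ids.

    Hypothetical helper. Only RegionRefIndexed entries directly inside the
    first ReadingOrder/OrderedGroup are considered.
    """
    root = ET.fromstring(xml_source)
    ns = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""

    def q(name):
        # Qualify a tag name with the document's namespace, if any.
        return f"{{{ns}}}{name}" if ns else name

    # The order in which TextRegion elements appear in the XML tree.
    document_order = [r.get("id") for r in root.iter(q("TextRegion"))]

    reading_order = []
    group = root.find(f".//{q('ReadingOrder')}/{q('OrderedGroup')}")
    if group is not None:
        refs = group.findall(q("RegionRefIndexed"))
        # Sort by the numeric index attribute instead of relying on XML order.
        refs.sort(key=lambda ref: int(ref.get("index")))
        reading_order = [ref.get("regionRef") for ref in refs]
    return document_order, reading_order
```

When the two lists differ, the output of a tool that follows tree order, as page-to-text does, will not match the reading order declared in the file.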

Usage:

`python page_to_text.py`