Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/cneud/page-to-text
extract text from PAGE file
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/cneud/page-to-text
- Owner: cneud
- Created: 2015-10-02T13:17:36.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2022-11-10T16:56:04.000Z (about 2 years ago)
- Last Synced: 2024-08-04T13:06:50.835Z (5 months ago)
- Language: Python
- Size: 5.86 KB
- Stars: 3
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# page-to-text
Extracts the text from a [PAGE](http://primaresearch.org/publications/ICPR2010_Pletschacher_PAGE) file and writes it to `stdout`. Note that this tool does not consider `ReadingOrder`, even if it is available in the PAGE-XML; the output instead follows the order of elements in the XML tree.
Use like:
python page_to_text.py
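
To illustrate what "order of the XML tree" means in practice, here is a minimal sketch of tree-order extraction. It is not the code of `page_to_text.py`; the function names `local_name` and `page_lines` are hypothetical, and the sketch assumes that transcriptions are stored at the `TextLine` level as `TextEquiv/Unicode` children, which not every PAGE file guarantees.

```python
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def page_lines(xml_data):
    """Return TextLine transcriptions in XML tree order (ReadingOrder is ignored).

    Hypothetical helper: matches elements by local name so that it works
    regardless of which PAGE schema version (namespace) the file declares.
    """
    root = ET.fromstring(xml_data)
    lines = []
    for elem in root.iter():  # iter() walks the tree in document order
        if local_name(elem.tag) != "TextLine":
            continue
        # Look only at the line's own TextEquiv children, so that the
        # transcriptions of nested Word elements are not emitted twice.
        for equiv in elem:
            if local_name(equiv.tag) != "TextEquiv":
                continue
            for uni in equiv:
                if local_name(uni.tag) == "Unicode":
                    lines.append(uni.text or "")
    return lines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed invocation for this sketch only: the first argument is a PAGE file.
    with open(sys.argv[1], "rb") as fh:
        print("\n".join(page_lines(fh.read())))
```

Because the sketch never consults `ReadingOrder`, a region that appears first in the file is printed first, even when the PAGE-XML declares a different reading sequence.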