https://github.com/dhlab-epfl/dhSegment
Generic framework for historical document processing
document-processing historical-data python3 segmentation tensorflow
- Host: GitHub
- URL: https://github.com/dhlab-epfl/dhSegment
- Owner: dhlab-epfl
- License: gpl-3.0
- Created: 2017-07-13T14:08:24.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2021-07-09T16:14:24.000Z (almost 3 years ago)
- Last Synced: 2024-01-16T02:48:10.583Z (5 months ago)
- Topics: document-processing, historical-data, python3, segmentation, tensorflow
- Language: Python
- Homepage: https://dhlab-epfl.github.com/dhSegment
- Size: 5.89 MB
- Stars: 360
- Watchers: 29
- Forks: 114
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-digitization - GitHub src
- awesome-ocr - dhlab-epfl/dhSegment - Generic framework for historical document processing (Optical Character Recognition Engines and Frameworks / Ground Truth)
- awesome-historical-newspaper-analysis - dhSegment
- awesome-ocr - dhlab-epfl/dhSegment - Generic framework for historical document processing (Optical Character Recognition Engines and Frameworks / CTPN [paper:2016](https://arxiv.org/pdf/1609.03605.pdf))
README
# dhSegment
[![Documentation Status](https://readthedocs.org/projects/dhsegment/badge/?version=latest)](https://dhsegment.readthedocs.io/en/latest/?badge=latest)
**dhSegment** is a tool for Historical Document Processing. Its generic approach makes it possible to segment regions and
extract content from different types of documents. See
[some examples here](https://dhsegment.readthedocs.io/en/latest/intro/intro.html#use-cases).
The complete description of the system can be found in the corresponding [paper](https://arxiv.org/abs/1804.10371).
It was created by [Benoit Seguin](https://twitter.com/Seguin_Be) and Sofia Ares Oliveira at DHLAB, EPFL.
## Installation and usage
The [installation procedure](https://dhsegment.readthedocs.io/en/latest/start/install.html)
and examples of usage can be found in the documentation (see section below).
## Demo
Try the [demo](https://dhsegment.readthedocs.io/en/latest/start/demo.html) to train (optionally) and apply dhSegment to page extraction using the `demo.py` script.
## Documentation
The documentation is available on [readthedocs](https://dhsegment.readthedocs.io/).
## Citation
If you are using this code for your research, you can cite the corresponding paper as:
```
@inproceedings{oliveiraseguinkaplan2018dhsegment,
title={dhSegment: A generic deep-learning approach for document segmentation},
author={Ares Oliveira, Sofia and Seguin, Benoit and Kaplan, Frederic},
booktitle={Frontiers in Handwriting Recognition (ICFHR), 2018 16th International Conference on},
pages={7--12},
year={2018},
organization={IEEE}
}
```