https://github.com/dhlab-epfl/dhSegment
Generic framework for historical document processing
document-processing historical-data python3 segmentation tensorflow
- Host: GitHub
- URL: https://github.com/dhlab-epfl/dhSegment
- Owner: dhlab-epfl
- License: gpl-3.0
- Created: 2017-07-13T14:08:24.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2021-07-09T16:14:24.000Z (over 4 years ago)
- Last Synced: 2024-11-16T00:01:54.870Z (11 months ago)
- Topics: document-processing, historical-data, python3, segmentation, tensorflow
- Language: Python
- Homepage: https://dhlab-epfl.github.com/dhSegment
- Size: 5.89 MB
- Stars: 372
- Watchers: 28
- Forks: 116
- Open Issues: 13
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-digitization
README
# dhSegment
**dhSegment** is a tool for Historical Document Processing. Its generic approach allows it to segment regions and
extract content from different types of documents. See
[some examples here](https://dhsegment.readthedocs.io/en/latest/intro/intro.html#use-cases). The complete description of the system can be found in the corresponding [paper](https://arxiv.org/abs/1804.10371).
It was created by [Benoit Seguin](https://twitter.com/Seguin_Be) and Sofia Ares Oliveira at DHLAB, EPFL.
## Installation and usage
The [installation procedure](https://dhsegment.readthedocs.io/en/latest/start/install.html)
and examples of usage can be found in the documentation (see section below).
## Demo
Try the [demo](https://dhsegment.readthedocs.io/en/latest/start/demo.html) to train (optionally) and apply dhSegment for page extraction using the `demo.py` script.
## Documentation
The documentation is available on [readthedocs](https://dhsegment.readthedocs.io/).
## Citation
If you are using this code for your research, you can cite the corresponding paper as:
```
@inproceedings{oliveiraseguinkaplan2018dhsegment,
title={dhSegment: A generic deep-learning approach for document segmentation},
author={Ares Oliveira, Sofia and Seguin, Benoit and Kaplan, Frederic},
booktitle={Frontiers in Handwriting Recognition (ICFHR), 2018 16th International Conference on},
pages={7--12},
year={2018},
organization={IEEE}
}
```