{"id":13423831,"url":"https://github.com/dhlab-epfl/dhSegment","last_synced_at":"2025-03-15T17:32:25.899Z","repository":{"id":48828692,"uuid":"97129580","full_name":"dhlab-epfl/dhSegment","owner":"dhlab-epfl","description":"Generic framework for historical document processing","archived":false,"fork":false,"pushed_at":"2021-07-09T16:14:24.000Z","size":6179,"stargazers_count":372,"open_issues_count":13,"forks_count":116,"subscribers_count":28,"default_branch":"master","last_synced_at":"2024-11-16T00:01:54.870Z","etag":null,"topics":["document-processing","historical-data","python3","segmentation","tensorflow"],"latest_commit_sha":null,"homepage":"https://dhlab-epfl.github.com/dhSegment","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dhlab-epfl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-13T14:08:24.000Z","updated_at":"2024-11-02T11:42:46.000Z","dependencies_parsed_at":"2022-09-23T22:30:27.453Z","dependency_job_id":null,"html_url":"https://github.com/dhlab-epfl/dhSegment","commit_stats":null,"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhlab-epfl%2FdhSegment","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhlab-epfl%2FdhSegment/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhlab-epfl%2FdhSegment/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dhlab-epfl%2FdhSegment/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dhlab-epfl","download_url":"https://codeload.
github.com/dhlab-epfl/dhSegment/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243767333,"owners_count":20344909,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["document-processing","historical-data","python3","segmentation","tensorflow"],"created_at":"2024-07-31T00:00:43.453Z","updated_at":"2025-03-15T17:32:25.892Z","avatar_url":"https://github.com/dhlab-epfl.png","language":"Python","funding_links":[],"categories":["Document layout analysis, text enrichment and semantic segmentation","Segmentation","Pipelines"],"sub_categories":["hOCR","Document Segmentation"],"readme":"# dhSegment\n\n[![Documentation Status](https://readthedocs.org/projects/dhsegment/badge/?version=latest)](https://dhsegment.readthedocs.io/en/latest/?badge=latest)\n\n**dhSegment** is a tool for Historical Document Processing. Its generic approach makes it possible to segment regions and\nextract content from different types of documents. 
See \n[some examples here](https://dhsegment.readthedocs.io/en/latest/intro/intro.html#use-cases).\n\nThe complete description of the system can be found in the corresponding [paper](https://arxiv.org/abs/1804.10371).\n\nIt was created by [Benoit Seguin](https://twitter.com/Seguin_Be) and Sofia Ares Oliveira at DHLAB, EPFL.\n\n## Installation and usage\nThe [installation procedure](https://dhsegment.readthedocs.io/en/latest/start/install.html) \nand examples of usage can be found in the documentation (see section below).\n\n## Demo\nTry the [demo](https://dhsegment.readthedocs.io/en/latest/start/demo.html) to train (optionally) and apply dhSegment to page extraction using the `demo.py` script.\n\n## Documentation\nThe documentation is available on [readthedocs](https://dhsegment.readthedocs.io/).\n\n##\nIf you are using this code for your research, you can cite the corresponding paper as:\n```\n@inproceedings{oliveiraseguinkaplan2018dhsegment,\n  title={dhSegment: A generic deep-learning approach for document segmentation},\n  author={Ares Oliveira, Sofia and Seguin, Benoit and Kaplan, Frederic},\n  booktitle={Frontiers in Handwriting Recognition (ICFHR), 2018 16th International Conference on},\n  pages={7--12},\n  year={2018},\n  organization={IEEE}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhlab-epfl%2FdhSegment","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdhlab-epfl%2FdhSegment","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdhlab-epfl%2FdhSegment/lists"}