{"id":16916515,"url":"https://github.com/ternaus/midv-500-models","last_synced_at":"2025-09-18T08:57:22.041Z","repository":{"id":56423851,"uuid":"265323358","full_name":"ternaus/midv-500-models","owner":"ternaus","description":"Model for document segmentation trained on the midv-500-models dataset.","archived":false,"fork":false,"pushed_at":"2020-11-09T02:24:25.000Z","size":45013,"stargazers_count":73,"open_issues_count":0,"forks_count":11,"subscribers_count":4,"default_branch":"master","last_synced_at":"2024-12-28T14:26:46.577Z","etag":null,"topics":["computer-vision","deep-learning","document-scanner","image-segmentation","python","pytorch","semantic-segmentation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ternaus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":null,"patreon":"ternaus","open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"custom":null}},"created_at":"2020-05-19T18:00:02.000Z","updated_at":"2024-11-18T13:05:04.000Z","dependencies_parsed_at":"2022-08-15T18:30:31.905Z","dependency_job_id":null,"html_url":"https://github.com/ternaus/midv-500-models","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ternaus%2Fmidv-500-models","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ternaus%2Fmidv-500-models/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ternaus%2Fmidv-500-models/rele
ases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ternaus%2Fmidv-500-models/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ternaus","download_url":"https://codeload.github.com/ternaus/midv-500-models/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":232814633,"owners_count":18580503,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","deep-learning","document-scanner","image-segmentation","python","pytorch","semantic-segmentation"],"created_at":"2024-10-13T19:28:15.563Z","updated_at":"2025-09-18T08:57:16.970Z","avatar_url":"https://github.com/ternaus.png","language":"Python","funding_links":["https://patreon.com/ternaus"],"categories":[],"sub_categories":[],"readme":"# midv-500-models\n[![DOI](https://zenodo.org/badge/265323358.svg)](https://zenodo.org/badge/latestdoi/265323358)\n\nThe repository contains a model for binary semantic segmentation of the documents.\n\n![](https://habrastorage.org/webt/gy/-t/xn/gy-txnzezlnurcwwlv7q5vs77x4.jpeg)\n\n* **Left**: input.\n* **Center**: prediction.\n* **Right**: overlay of the image and predicted mask.\n\n\n## Installation\n\n`pip install -U midv500models`\n\n### Example inference\n\nJupyter notebook with an example: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1lNv88MJOKgc-50XeYcHlJODpvT2JF9ru?usp=sharing)\n\n## Dataset\nModel is trained on [MIDV-500: A Dataset for Identity Documents Analysis and Recognition on Mobile 
Devices in Video Stream](https://arxiv.org/abs/1807.05786).\n\n### Preparation\n\nDownload the dataset from the FTP server:\n```bash\nwget -r ftp://smartengines.com/midv-500/\n```\n\nUnpack the archives:\n```bash\ncd smartengines.com/midv-500/dataset/\nunzip \\*.zip\n```\n\nThe resulting folder structure will be\n\n```bash\nsmartengines.com\n    midv-500\n        dataset\n            01_alb_id\n                ground_truth\n                    CA\n                        CA01_01.json\n                    ...\n                images\n                    CA\n                        CA01_01.tif\n                    ...\n                ...\n            ...\n        ...\n    ...\n```\n\nTo preprocess the data, run the script\n```bash\npython midv500models/preprocess_data.py -i \u003cinput_folder\u003e \\\n                                          -o \u003coutput_folder\u003e\n```\n\nwhere `input_folder` is the folder with the unpacked dataset. The output folder will look like this:\n\n```bash\nimages\n    CA01_01.jpg\n    ...\nmasks\n    CA01_01.png\n```\n\nTarget binary masks take the values 0 and 255, where 0 is background and 255 is the document.\n\n## Training\n\n```bash\npython midv500models/train.py -c midv500models/configs/2020-05-19.yaml \\\n                              -i \u003cpath to train\u003e\n```\n\n## Inference\n\n```bash\npython midv500models/inference.py -c midv500models/configs/2020-05-19.yaml \\\n                                  -i \u003cpath to images\u003e \\\n                                  -o \u003cpath to save predictions\u003e \\\n                                  -w \u003cpath to weights\u003e\n```\n\n## Weights\nUnet with Resnet34 backbone: [Config](midv500models/configs/2020-05-19.yaml) 
[Weights](Unet_Resnet34.pth)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fternaus%2Fmidv-500-models","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fternaus%2Fmidv-500-models","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fternaus%2Fmidv-500-models/lists"}