https://github.com/ternaus/midv-500-models
Model for document segmentation trained on the MIDV-500 dataset.
computer-vision deep-learning document-scanner image-segmentation python pytorch semantic-segmentation
- Host: GitHub
- URL: https://github.com/ternaus/midv-500-models
- Owner: ternaus
- License: mit
- Created: 2020-05-19T18:00:02.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2020-11-09T02:24:25.000Z (over 5 years ago)
- Last Synced: 2024-12-28T14:26:46.577Z (over 1 year ago)
- Topics: computer-vision, deep-learning, document-scanner, image-segmentation, python, pytorch, semantic-segmentation
- Language: Python
- Homepage:
- Size: 42.9 MB
- Stars: 73
- Watchers: 4
- Forks: 11
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Funding: .github/FUNDING.yml
- License: LICENSE
# midv-500-models
[DOI](https://zenodo.org/badge/latestdoi/265323358)
This repository contains a model for binary semantic segmentation of documents.

* **Left**: input.
* **Center**: prediction.
* **Right**: overlay of the image and predicted mask.
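The README does not say how the overlay is produced. One common approach, shown here as a dependency-free sketch, is to alpha-blend a highlight color into every pixel where the mask is non-zero. The function name `overlay_mask`, the green default color, and the `alpha` value are illustrative assumptions, not part of this repository.

```python
def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.5):
    """Blend `color` into the RGB pixels of `image` wherever `mask` is non-zero.

    `image` is a list of rows of (r, g, b) tuples; `mask` is a list of rows of ints.
    Hypothetical helper for illustration only.
    """
    blended = []
    for image_row, mask_row in zip(image, mask):
        row = []
        for pixel, mask_value in zip(image_row, mask_row):
            if mask_value:
                # Weighted average of the original pixel and the highlight color.
                pixel = tuple(
                    round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color)
                )
            row.append(tuple(pixel))
        blended.append(row)
    return blended
```

With real images the same blend would normally be done with an array library rather than nested lists.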
## Installation
`pip install -U midv500models`
### Example inference
Jupyter notebook with an example: [Open in Colab](https://colab.research.google.com/drive/1lNv88MJOKgc-50XeYcHlJODpvT2JF9ru?usp=sharing)
## Dataset
The model is trained on [MIDV-500: A Dataset for Identity Documents Analysis and Recognition on Mobile Devices in Video Stream](https://arxiv.org/abs/1807.05786).
### Preparation
Download the dataset from the FTP server:
```bash
wget -r ftp://smartengines.com/midv-500/
```
Unpack the dataset:
```bash
cd smartengines.com/midv-500/dataset/
unzip \*.zip
```
The resulting folder structure will be:
```bash
smartengines.com
    midv-500
        dataset
            01_alb_id
                ground_truth
                    CA
                        CA01_01.json
                        ...
                images
                    CA
                        CA01_01.tif
                        ...
                    ...
                ...
            ...
        ...
```
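As a rough illustration of how this layout can be traversed, the sketch below pairs every file under `images` with the same-named file under `ground_truth` in each document folder. It matches on clip folder and file stem only, so it makes no assumption about file extensions. The function name `pair_images_with_ground_truth` is hypothetical and is not part of `midv500models`.

```python
from pathlib import Path


def pair_images_with_ground_truth(dataset_dir):
    """Return (image_path, ground_truth_path) pairs for every document folder.

    Files are matched by clip folder and file stem, e.g. ("CA", "CA01_01").
    Hypothetical helper for illustration only.
    """
    pairs = []
    for doc_dir in sorted(Path(dataset_dir).iterdir()):
        images_dir = doc_dir / "images"
        truth_dir = doc_dir / "ground_truth"
        if not (images_dir.is_dir() and truth_dir.is_dir()):
            continue  # skip zip archives and anything that is not a document folder
        # Index ground-truth files by (clip folder name, file stem).
        truth_by_key = {
            (p.parent.name, p.stem): p for p in truth_dir.glob("*/*") if p.is_file()
        }
        for image_path in sorted(images_dir.glob("*/*")):
            key = (image_path.parent.name, image_path.stem)
            if image_path.is_file() and key in truth_by_key:
                pairs.append((image_path, truth_by_key[key]))
    return pairs
```

Calling `pair_images_with_ground_truth("smartengines.com/midv-500/dataset")` after unpacking should yield one pair per annotated frame. Images without a matching annotation are silently skipped.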
To preprocess the data, run the script:
```bash
python midv500models/preprocess_data.py -i <input_folder> \
                                        -o <output_folder>
```
where `input_folder` is the folder with the unpacked dataset. The output folder will look like this:
```bash
images
    CA01_01.jpg
    ...
masks
    CA01_01.png
```
Target binary masks take the values 0 and 255, where 0 is background and 255 is the document.
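To make the mask convention concrete, here is a minimal dependency-free sketch that rasterizes a convex quadrilateral into a mask with values 0 and 255. It assumes the annotation supplies the four document corners in order around the shape. That input format and the function name `quad_to_mask` are assumptions for illustration, and this is not the code in `preprocess_data.py`.

```python
def quad_to_mask(quad, width, height):
    """Rasterize a convex quadrilateral into a mask of 0 (background) and 255 (document).

    `quad` holds four (x, y) corners listed in order around the shape.
    Hypothetical helper for illustration only.
    """
    quad = list(quad)
    edges = list(zip(quad, quad[1:] + quad[:1]))

    def cross(ax, ay, bx, by, px, py):
        # The sign tells on which side of the edge a->b the point p lies.
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Test the pixel centre against all four edges.
            signs = [
                cross(ax, ay, bx, by, x + 0.5, y + 0.5)
                for (ax, ay), (bx, by) in edges
            ]
            # Inside a convex shape, the point is on the same side of every edge.
            inside = all(s >= 0 for s in signs) or all(s <= 0 for s in signs)
            row.append(255 if inside else 0)
        mask.append(row)
    return mask
```

In practice a polygon-fill routine from an image library would be far faster. The nested loops here only show which pixels end up as 255.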
## Training
```bash
python midv500models/train.py -c midv500models/configs/2020-05-19.yaml \
                              -i <path_to_data>
```
## Inference
```bash
python midv500models/inference.py -c midv500models/configs/2020-05-19.yaml \
                                  -i <path_to_images> \
                                  -o <path_to_save_predictions> \
                                  -w <path_to_weights>
```
## Weights
Unet with Resnet34 backbone: [Config](midv500models/configs/2020-05-19.yaml) [Weights](Unet_Resnet34.pth)