{"id":13415679,"url":"https://github.com/thomasjhuang/deep-learning-for-document-dewarping","last_synced_at":"2025-03-14T23:31:00.215Z","repository":{"id":63725220,"uuid":"191257265","full_name":"thomasjhuang/deep-learning-for-document-dewarping","owner":"thomasjhuang","description":"An application of high resolution GANs to dewarp images of perturbed documents","archived":false,"fork":false,"pushed_at":"2021-10-18T17:25:42.000Z","size":182,"stargazers_count":120,"open_issues_count":1,"forks_count":25,"subscribers_count":8,"default_branch":"master","last_synced_at":"2024-07-31T21:54:22.206Z","etag":null,"topics":["document","gan","ocr","ocr-recognition","pix2pix"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thomasjhuang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-06-10T23:01:49.000Z","updated_at":"2024-06-28T05:43:42.000Z","dependencies_parsed_at":"2022-11-24T18:40:17.964Z","dependency_job_id":null,"html_url":"https://github.com/thomasjhuang/deep-learning-for-document-dewarping","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjhuang%2Fdeep-learning-for-document-dewarping","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjhuang%2Fdeep-learning-for-document-dewarping/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjhuang%2Fdeep-learning-for-document-dewarping/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thomasjhuang%2
Fdeep-learning-for-document-dewarping/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/thomasjhuang","download_url":"https://codeload.github.com/thomasjhuang/deep-learning-for-document-dewarping/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243663382,"owners_count":20327299,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["document","gan","ocr","ocr-recognition","pix2pix"],"created_at":"2024-07-30T21:00:51.279Z","updated_at":"2025-03-14T23:30:59.904Z","avatar_url":"https://github.com/thomasjhuang.png","language":"Python","funding_links":[],"categories":["2. \u003ca name='DeskewingandDewarping'\u003e\u003c/a\u003eDeskewing and Dewarping","Deskewing and Dewarping"],"sub_categories":["1.4. \u003ca name='OCRCLI'\u003e\u003c/a\u003eOCR CLI"],"readme":"# Docuwarp\n[![Codacy Badge](https://app.codacy.com/project/badge/Grade/e8bf67a83de04872aecd3c09f11b6389)](https://www.codacy.com/gh/thomasjhuang/deep-learning-for-document-dewarping/dashboard?utm_source=github.com\u0026amp;utm_medium=referral\u0026amp;utm_content=thomasjhuang/deep-learning-for-document-dewarping\u0026amp;utm_campaign=Badge_Grade)\n![Python version](https://img.shields.io/pypi/pyversions/dominate.svg?style=flat)\n\nThis project dewarps document images using pix2pixHD, a GAN for general image-to-image translation. The objective is to take images of documents that are warped, folded, crumpled, etc. 
and convert them to a \"dewarped\" state, using [pix2pixHD](https://github.com/NVIDIA/pix2pixHD) for both training and inference. All of the model code is taken directly from the official pix2pixHD repository.\n\nSome of the intuition behind this approach comes from these two papers:\n1. [DocUNet: Document Image Unwarping via A Stacked U-Net (Ma et al.)](https://www.juew.org/publication/DocUNet.pdf)\n2. [Document Image Dewarping using Deep Learning (Ramanna et al.)](http://www.insticc.org/Primoris/Resources/PaperPdf.ashx?idPaper=73684)\n\n## Prerequisites\n\nThis project requires **Python** and the following platform, hardware, and Python libraries:\n\n-   Linux or OSX\n-   [scikit-learn](http://scikit-learn.org/stable/)\n-   NVIDIA GPU (11 GB of memory or more) + CUDA and cuDNN\n-   [PyTorch](https://pytorch.org/get-started/locally/)\n-   [Pillow](https://pillow.readthedocs.io/en/stable/installation.html)\n-   [OpenCV](https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_setup/py_table_of_contents_setup/py_table_of_contents_setup.html)\n\n## Getting Started\n### Installation\n-   Install PyTorch and dependencies from \u003chttp://pytorch.org\u003e\n-   Install the Python library [dominate](https://github.com/Knio/dominate):\n```bash\npip install dominate\n```\n-   Clone this repo:\n```bash\ngit clone https://github.com/thomasjhuang/deep-learning-for-document-dewarping\ncd deep-learning-for-document-dewarping\n```\n\n### Training\n-   Train the Kaggle model with 256x256 crops:\n```bash\npython train.py --name kaggle --label_nc 0 --no_instance --no_flip --netG local --ngf 32 --fineSize 256\n```\n-   To view training results, check out the intermediate results in `./checkpoints/kaggle/web/index.html`.\nIf you have TensorFlow installed, you can view TensorBoard logs in `./checkpoints/kaggle/logs` by adding `--tf_log` to the training command.\n\n### Training with your own dataset\n-   If you want to train with your own dataset, please generate label maps which are 
single-channel images whose pixel values correspond to the object labels (i.e. 0, 1, ..., N-1, where N is the number of labels). This is required because one-hot vectors are generated from the label maps. Also specify `--label_nc N` during both training and testing.\n-   If your input is not a label map, specify `--label_nc 0`, which uses the RGB colors directly as input. The folders should then be named `train_A` and `train_B` instead of `train_label` and `train_img`, where the goal is to translate images from A to B.\n-   If you don't have instance maps or don't want to use them, specify `--no_instance`.\n-   The default preprocessing setting is `scale_width`, which scales the width of all training images to `opt.loadSize` (1024) while keeping the aspect ratio. To use a different setting, pass the `--resize_or_crop` option. For example, `scale_width_and_crop` first resizes the image to width `opt.loadSize` and then randomly crops a patch of size `(opt.fineSize, opt.fineSize)`. `crop` skips the resizing step and only performs random cropping. If you don't want any preprocessing, specify `none`, which does nothing other than making sure the image dimensions are divisible by 32.\n\n### Testing\n-   Test the model:\n```bash\npython test.py --name kaggle --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256\n```\nThe test results are saved to `./results/kaggle/test_latest/`.\n\n### Dataset\n-   I use the Kaggle Denoising Dirty Documents dataset. To train a model on the full dataset, download it from the [official website](https://www.kaggle.com/c/denoising-dirty-documents/data).\nAfter downloading, place it under the `datasets` folder, with warped images in the `train_A` directory and unwarped images in the `train_B` directory. Test images are also warped images and should go in `test_A`. 
Below is an example dataset directory structure.\n\n            .\n            ├── ...\n            ├── datasets                  \n            │   ├── train_A               # warped images\n            │   ├── train_B               # unwarped, \"ground truth\" images\n            │   └── test_A                # warped images used for testing\n            └── ...\n     \n### Multi-GPU training\n-   Train a model using multiple GPUs (`bash ./scripts/train_kaggle_256_multigpu.sh`):\n```bash\n#!./scripts/train_kaggle_256_multigpu.sh\npython train.py --name kaggle_256_multigpu --label_nc 0 --netG local --ngf 32 --resize_or_crop crop --no_instance --no_flip --fineSize 256 --batchSize 32 --gpu_ids 0,1,2,3,4,5,6,7\n```\n\n### Training with Automatic Mixed Precision (AMP) for faster speed\n-   To train with mixed precision support, please first install apex from: \u003chttps://github.com/NVIDIA/apex\u003e\n-   You can then train the model by adding `--fp16`. For example,\n```bash\n#!./scripts/train_512p_fp16.sh\npython -m torch.distributed.launch train.py --name label2city_512p --fp16\n```\nIn my test case, it trains about 80% faster with AMP on a Volta machine.\n\n## More Training/Test Details\n-   Flags: see `options/train_options.py` and `options/base_options.py` for all the training flags; see `options/test_options.py` and `options/base_options.py` for all the test flags.\n-   Instance map: we take in both label maps and instance maps as input. If you don't want to use instance maps, please specify the flag `--no_instance`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasjhuang%2Fdeep-learning-for-document-dewarping","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthomasjhuang%2Fdeep-learning-for-document-dewarping","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthomasjhuang%2Fdeep-learning-for-document-dewarping/lists"}