Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kayoyin/dirtydocuments
https://github.com/kayoyin/dirtydocuments
Last synced: 8 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/kayoyin/dirtydocuments
- Owner: kayoyin
- Created: 2019-07-31T09:31:37.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-08-01T12:37:57.000Z (over 5 years ago)
- Last Synced: 2023-08-23T15:14:41.298Z (about 1 year ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 29.2 MB
- Stars: 19
- Watchers: 5
- Forks: 10
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Scanned Document Denoiser
Implementation of an ensemble model with CNN autoencoder and image processing as base learners, and CNN or XGB meta learner, and CycleGAN for cleaning scanned documents, to faciliate OCR subsequently.
For further details of the approach, my [Medium article](https://medium.com/illuin/cleaning-up-dirty-scanned-documents-with-deep-learning-2e8e6de6cfa6) explains the full project and results.
## To use the ensemble model (CNN or XGB):
* Put dirty images in `data/x_train`, associated target clean images in `data/y_train`, dirty test images in `data/x_test`.
* Run `data_augmentation.ipynb`
* Run `base_learners.ipynb`
* Run `stacker_models.ipynb`## To use the CycleGAN model:
* Put dirty images for training/testing in `gan_data/trainA` / `gan_data/testA`
* Put clean images in `gan_data/trainB` / `gan_data/testB`
* The dirty and clean images do not need to be paired
* Run `CycleGAN.ipynb`## References
* [Denoising Dirty Documents](https://www.kaggle.com/c/denoising-dirty-documents/overview) on Kaggle
* [Image Processing in OpenCV](https://docs.opencv.org/2.4/doc/tutorials/imgproc/table_of_content_imgproc/table_of_content_imgproc.html)
* [Image Processing + Machine Learning in R: Denoising Dirty Documents Tutorial Series](http://blog.kaggle.com/2015/12/04/image-processing-machine-learning-in-r-denoising-dirty-documents-tutorial-series/) by Colin Priest
* [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/pdf/1703.10593.pdf), by Zhu et al.
* [Keras Implementation](https://github.com/eriklindernoren/Keras-GAN/tree/master/cyclegan) of CycleGAN