toolkit for learning efficient document image skew estimation (DISE)
- Host: GitHub
- URL: https://github.com/dito97/neural-deskew
- Owner: DiTo97
- License: mit
- Created: 2023-07-10T22:44:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-03T10:36:47.000Z (2 months ago)
- Last Synced: 2024-10-11T19:12:14.167Z (27 days ago)
- Topics: deskewing, document-analysis, pytorch-2, self-supervised-learning
- Language: Python
- Homepage:
- Size: 46.9 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
# neural-deskew
A shallow multi-layer perceptron (MLP) for document image deskewing, built on top of three classical deskew algorithms.
## Usage
TODO
## Abstract
This project focuses on developing a neural network model for document image skew estimation using a Multi-Layer Perceptron (MLP) architecture and the Albumentations library for data augmentation. The goal is to accurately estimate the skew angle of document images.
## Dataset
- [document image skew estimation (DISE) 2021](https://drive.google.com/file/d/1a-a6aOqdsghjeHGLnCLsDs7NoJIus-Pw/view?usp=sharing)
A custom dataset of 2000 document images with associated ground-truth skew angles is prepared, split into 1500 images for training and validation and 500 images for testing. Each image is preprocessed to restore vertical alignment, then augmented to build robustness to different sizes, occlusions, rotations, and lighting conditions.
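The 1500/500 split described above can be sketched with a seeded shuffle of image indices (the function name and seed are illustrative, not from the project):

```python
import random


def split_dise(n_images: int = 2000, n_trainval: int = 1500, seed: int = 42):
    """Shuffle image indices and split them into train+validation and test sets."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    return indices[:n_trainval], indices[n_trainval:]


trainval, test = split_dise()
```

Fixing the seed keeps the split reproducible across runs.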
## Model
The proposed MLP model takes three confidence vectors generated by different deskewing techniques as input. These vectors represent the likelihood of the document being rotated at various angles. The MLP processes these vectors and produces a unified confidence vector spanning the entire angle space. The model architecture includes convolutional layers to process the confidence vectors, followed by fully connected layers and dropout regularization to enhance generalization.
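A minimal sketch of the fusion model described above, assuming a 360-bin angle space (1° resolution) and hypothetical layer sizes; the real architecture and dimensions live in model_config.yaml:

```python
import torch
import torch.nn as nn


class DeskewFusion(nn.Module):
    """Fuses three per-angle confidence vectors into one (hypothetical layer sizes)."""

    def __init__(self, num_angles: int = 360, hidden_dim: int = 128, dropout: float = 0.2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=3, padding=1),  # one channel per classical deskew algorithm
            nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * num_angles, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, num_angles),  # unified confidence over the full angle space
        )

    def forward(self, confidences: torch.Tensor) -> torch.Tensor:
        # confidences: (batch, 3, num_angles), one row per classical estimator
        return self.head(self.conv(confidences)).softmax(dim=-1)


model = DeskewFusion()
fused = model(torch.rand(4, 3, 360))
```

The softmax makes the output interpretable as a distribution over candidate skew angles, so the estimate is its argmax.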
## Training
The training process is managed using PyTorch Lightning, which simplifies the training loop and provides features such as early stopping. The PyTorch Lightning Trainer is configured with early stopping using a patience of three to prevent overfitting. The training progress and metrics are logged using the Weights & Biases (W&B) library, enabling comprehensive experiment tracking and visualization.
To train the model, a training.py script is provided. It takes arguments for the dataset directory, model configuration YAML file, training hyperparameters, and data split ratios. The script loads the data, initializes the model and Trainer, and begins the training process. Additionally, a run_training.sh script is available to launch training using default configurations.
## Checkpoint
TODO
The model weights and architecture are checkpointed using the [checkpoint]()
## Configuration
The project includes configuration files config.yaml and model_config.yaml for easy customization of hyperparameters such as learning rate, batch size, hidden dimension, and number of epochs. These files allow seamless adaptation of the training process to specific requirements.
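Loading such a file reduces to a single `yaml.safe_load` call; the keys and values below are hypothetical stand-ins for the hyperparameters named above, not the repository's actual config.yaml:

```python
import yaml

# Hypothetical contents mirroring the hyperparameters named above.
raw = """
learning_rate: 1.0e-3
batch_size: 32
hidden_dim: 128
num_epochs: 50
"""

config = yaml.safe_load(raw)
```

In practice the same call reads the file from disk, e.g. `yaml.safe_load(open("config.yaml"))`.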
## Resources
- [YAML serialization for augmentation pipelines](https://albumentations.ai/docs/examples/serialization/)
- [download and use W&B artifacts](https://docs.wandb.ai/guides/artifacts/download-and-use-an-artifact)