https://github.com/fahmiaziz98/receipt_parsing

receipt parsing using donut model, next we will add using LLM + OCR or VLM
https://github.com/fahmiaziz98/receipt_parsing

donut flask image-to-text parsing transformer

Last synced: 7 months ago
JSON representation

receipt parsing using donut model, next we will add using LLM + OCR or VLM

Host: GitHub
URL: https://github.com/fahmiaziz98/receipt_parsing
Owner: fahmiaziz98
Created: 2023-09-24T22:43:39.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-06-21T01:03:11.000Z (about 1 year ago)
Last Synced: 2024-06-22T10:32:33.807Z (about 1 year ago)
Topics: donut, flask, image-to-text, parsing, transformer
Language: Jupyter Notebook
Homepage: https://arxiv.org/abs/2111.15664
Size: 6.51 MB
Stars: 3
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Receipt Parsing

This project demonstrates how to use the Donut model with CORD dataset to perform receipt parsing. Receipt parsing involves extracting structured information from photographed receipts, such as itemized lists and totals.

## Overview

- The [CORD](https://huggingface.co/datasets/naver-clova-ix/cord-v2) dataset is a collection of receipts and invoices with ground truth annotations.
- We leverage the power of [Hugging Face Transformers](https://huggingface.co/transformers/) to fine-tune a pre-trained [Donut](https://huggingface.co/naver-clova-ix/donut-base) model for receipt parsing.
- The model used in this project is `fahmiaziz/finetune-donut-cord-v2.5`, which is adapted to the CORD-V2 dataset you can see it [here](https://huggingface.co/fahmiaziz/finetune-donut-cord-v2.5).
## Requirements

To run this project, you need:

- Python 3.7+
- PyTorch
- PyTorch-Lightning
- [Hugging Face Transformers](https://huggingface.co/transformers/)
- [Pillow](https://pillow.readthedocs.io/en/stable/)
- [flask](https://flask.palletsprojects.com/en/2.3.x/installation/#install-flask)

## Model Training and Evaluation

We trained and evaluated our receipt parsing model using the Donut model with CORD-V2 dataset. The goal was to achieve a high accuracy of 90% or above. You can access the detailed training and evaluation results on Weights & Biases (WandB):

- [Model Training and Evaluation Dashboard](https://wandb.ai/fahmiazizfadhil09/Donut-hpo)

![Model Training](evaluation.png)

Here are some highlights of the training and evaluation process:

- **Dataset**: We used the CORD dataset, which includes a diverse collection of receipts and invoices.

- **Model**: Our model is based on the [fahmiaziz/finetune-donut-cord-v2.5](https://huggingface.co/fahmiaziz/finetune-donut-cord-v2.5) architecture that has been fine-tuned from [donut-base](https://huggingface.co/naver-clova-ix/donut-base) and customized specifically for receipt parsing.

- **Training Metrics**: During training, we monitor various metrics, including accuracy, Tree Edit Distance (Tree ED) to ensure model performance.

- **Evaluation**: Our model achieved over 90% accuracy on the test dataset, demonstrating its effectiveness in parsing receipts.

- **Visualization**: The training and evaluation process can be visualized through the WandB dashboard linked above.

Feel free to explore the details of the training and evaluation results in WandB to gain more insight into our model's performance.

## Demonstration
[](https://youtu.be/4dclAXt4EQw "Receipt Parsing")

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/fahmiaziz98/receipt_parsing

Awesome Lists containing this project

README