https://github.com/pranav-0309/ocr_model_dc

OCR model to extract a primary and a secondary ID, for each image-insurance type pair.

jupyter-notebook ocr ocr-python ocr-recognition python3 pytorch


Host: GitHub
URL: https://github.com/pranav-0309/ocr_model_dc
Owner: pranav-0309
Created: 2024-12-08T13:24:53.000Z (6 months ago)
Default Branch: main
Last Pushed: 2024-12-08T13:45:03.000Z (6 months ago)
Last Synced: 2025-03-29T07:44:57.318Z (3 months ago)
Topics: jupyter-notebook, ocr, ocr-python, ocr-recognition, python3, pytorch
Language: Jupyter Notebook
Homepage:
Size: 2.55 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

![digitizing_team](digitizing_team.png)

This project takes a hypothetical scenario: an insurance company has launched an initiative to digitize all of its historical insurance claim documents. Part of that work is improving the labeling of IDs scanned from paper documents and identifying each one as a primary or a secondary ID.

To help with that effort, I've used multi-modal learning to train an Optical Character Recognition (OCR) model. To improve the classification, the model takes two inputs: **images** of the scanned documents and each document's **insurance type** (home, life, auto, health, or other).
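The insurance type is a categorical value, so it has to be turned into numbers before a model can use it. One common option is one-hot encoding, sketched below. The names `INSURANCE_TYPES` and `one_hot_insurance_type` are hypothetical, and the category order is an assumption; the notebook may encode the type differently.

```python
# Assumed category order; only the five category names come from the README.
INSURANCE_TYPES = ["home", "life", "auto", "health", "other"]


def one_hot_insurance_type(insurance_type: str) -> list[float]:
    """Return a vector with 1.0 at the position of the given insurance type."""
    # list.index raises ValueError for an unknown type, which surfaces bad labels early.
    index = INSURANCE_TYPES.index(insurance_type)
    return [1.0 if i == index else 0.0 for i in range(len(INSURANCE_TYPES))]


print(one_hot_insurance_type("auto"))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```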

Integrating different data modalities (such as image and text) enables the model to perform better in complex scenarios, helping to capture more nuanced information.

The **labels** that the model will be trained to identify are of two types: a primary and a secondary ID, for each image-insurance type pair.
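As a rough illustration of how an image and an insurance type might be combined to predict one of the two labels, here is a minimal PyTorch sketch. It is not the architecture from the notebook: the class name `MultiModalOCR`, the 64x64 grayscale input size, the layer widths, and the encoding of the labels as two classes (0 = primary, 1 = secondary) are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiModalOCR(nn.Module):
    """Hypothetical two-branch model: image features + insurance-type features."""

    def __init__(self, num_types: int = 5, num_classes: int = 2):
        super().__init__()
        # Image branch: assumes 1-channel 64x64 inputs (an assumption, not from the README).
        self.image_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # -> 16 x 64 x 64
            nn.ReLU(),
            nn.MaxPool2d(2),                             # -> 16 x 32 x 32
            nn.Flatten(),
            nn.Linear(16 * 32 * 32, 64),
            nn.ReLU(),
        )
        # Type branch: takes a one-hot vector over the insurance types.
        self.type_branch = nn.Sequential(nn.Linear(num_types, 8), nn.ReLU())
        # Classifier over the concatenated features from both branches.
        self.classifier = nn.Linear(64 + 8, num_classes)

    def forward(self, image: torch.Tensor, insurance_type: torch.Tensor) -> torch.Tensor:
        image_features = self.image_branch(image)
        type_features = self.type_branch(insurance_type)
        fused = torch.cat([image_features, type_features], dim=1)
        return self.classifier(fused)  # raw logits, one per label class


model = MultiModalOCR()
images = torch.randn(4, 1, 64, 64)                   # random stand-in for scanned images
type_ids = torch.tensor([0, 2, 3, 4])                # indices into the five insurance types
types = F.one_hot(type_ids, num_classes=5).float()   # one-hot vectors, shape (4, 5)
logits = model(images, types)
print(logits.shape)
```

Concatenating the two feature vectors before the final layer is one simple way to fuse modalities; it lets the classifier weigh the insurance type alongside what it sees in the image.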

To have a look at my code, open the `notebook.ipynb` file!
