https://github.com/403errors/ai-docparser

An application framework that uses recent AI techniques to extract the values of specific pre-defined keys from a given PDF document. It also generates a document summary from the extracted keys and values.

automation csv-export nlp pdf-files python3 regex reinforcement-learning spacy

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/403errors/ai-docparser
Owner: 403errors
Created: 2024-11-30T18:53:01.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-01-18T16:27:12.000Z (5 months ago)
Last Synced: 2025-03-14T09:43:20.041Z (4 months ago)
Topics: automation, csv-export, nlp, pdf-files, python3, regex, reinforcement-learning, spacy
Language: Jupyter Notebook
Homepage: https://www.kaggle.com/code/sitama/ai-docparser
Size: 289 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# AI DocParser

**AI DocParser** is an AI-powered document parsing tool designed to extract, process, and analyze data from various document formats. It leverages state-of-the-art machine learning models to **automate** the processing of structured and unstructured data.

[![Kaggle](https://img.shields.io/badge/Kaggle-Visit%20Project-blue?logo=kaggle)](https://www.kaggle.com/code/sitama/ai-docparser)

## Example

### Input:
![input image](imgs/input_sample.png)

### Output:
![output image](imgs/output_sample.png)

## Features

- **Document Parsing**: Extract data from PDFs, images, and other document types.
- **AI-Powered Analysis**: Use machine learning models to understand and process text.
- **Customizable Workflows**: Easily adapt to different use cases by modifying parameters or integrating additional models.
- **Model Retraining**: Fine-tune the parsing model with custom datasets for improved accuracy.

## Tech Stack

- spaCy for Named Entity Recognition and fitz (the import name of PyMuPDF) for text extraction, with a reported accuracy of 99.22%.
- Regular expressions for special value types, such as dates in legal documents.
- Reinforcement learning to optimize data extraction, aimed at strong performance on dynamic PDFs.
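
The regular-expression step and the CSV export can be illustrated with a minimal sketch. It is not the project's actual code: the date formats, the `Key: value` line layout, the sample text, and the names `DATE_PATTERN`, `extract_dates`, `extract_fields`, and `to_csv` are all assumptions made for illustration. It uses only the Python standard library and works on text that is assumed to have already been extracted from a PDF.

```python
import csv
import io
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Assumed date formats: 2024-03-11, 05/06/2022 or 05-06-2022, and 12 March 2021.
DATE_PATTERN = re.compile(
    r"\b(?:\d{4}-\d{2}-\d{2}"
    r"|\d{1,2}[/-]\d{1,2}[/-]\d{4}"
    r"|\d{1,2}\s+(?:" + MONTHS + r")\s+\d{4})\b"
)


def extract_dates(text):
    """Return every date-like substring, in order of appearance."""
    return DATE_PATTERN.findall(text)


def extract_fields(text, keys):
    """Return {key: value} for each 'Key: value' line; missing keys map to ''."""
    fields = {}
    for key in keys:
        pattern = rf"^{re.escape(key)}\s*:\s*(.+)$"
        match = re.search(pattern, text, re.MULTILINE | re.IGNORECASE)
        fields[key] = match.group(1).strip() if match else ""
    return fields


def to_csv(fields):
    """Serialize the extracted fields as two-column CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["key", "value"])
    for key, value in fields.items():
        writer.writerow([key, value])
    return buffer.getvalue()


# Hypothetical text standing in for the output of a PDF text extractor.
SAMPLE_TEXT = """Tenant: Jane Doe
Landlord: Acme Estates
This agreement is dated 12 March 2021 and expires on 2024-03-11.
Notice was served on 05/06/2022.
"""

if __name__ == "__main__":
    print(extract_dates(SAMPLE_TEXT))
    print(to_csv(extract_fields(SAMPLE_TEXT, ["Tenant", "Landlord", "Rent"])))
```

A key that is absent from the text (here `Rent`) is written with an empty value rather than dropped, so every exported CSV keeps the same rows for the same list of keys.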
