https://github.com/jwinman91/ai-ocr
An AI-powered, but model-agnostic (Optical-Character-Recognition) OCR tool
https://github.com/jwinman91/ai-ocr
genai image-to-plot-generation image-to-text-generation llama-cpp ocr-python ocr-recognition python3
Last synced: 3 months ago
JSON representation
An AI-powered, but model-agnostic (Optical-Character-Recognition) OCR tool
- Host: GitHub
- URL: https://github.com/jwinman91/ai-ocr
- Owner: jWinman91
- License: apache-2.0
- Created: 2024-09-04T19:51:09.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-03-01T23:10:10.000Z (4 months ago)
- Last Synced: 2025-03-02T00:19:32.161Z (4 months ago)
- Topics: genai, image-to-plot-generation, image-to-text-generation, llama-cpp, ocr-python, ocr-recognition, python3
- Language: Python
- Homepage:
- Size: 77.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# AI-Optical-Character-Recognition (AI-OCR): Extracting data from images
During my undergrad and postgrad Physics labs, I often had to manually read measurements from instruments, jot them down on paper, transfer them to a spreadsheet, and then generate plots—an inefficient and tedious process. Since I work in GenAI now after leaving Academia, I realized: Wait a minute... this can be automated! AI-OCR does exactly that. Simply take pictures of your measurements (or upload PDFs containing standardized numerical data, like financial reports), specify what numbers to extract, and let the AI generate insightful plots.
This tool also helps break free from proprietary software silos in Academia, where measurement data is often locked into vendor-specific formats. To showcase its capabilities, I attached a demo video below where I measured my blood pressure throughout the day, uploaded the images, and effortlessly plotted the results. It can even be applied to financial reports. I’ve used it on my business accounting PDFs to generate histograms of stock buy-in values—showing how AI-OCR can unlock valuable insights from structured financial data.
This repository is the backend code for a tool with which you can extract data from images using visual LLMs.
The frontend code (using streamlit) can be found here: [AI-OCR-Frotend](https://github.com/jWinman91/AI-OCR-Frontend).## Table of Contents
- [Installation](#Installation)
- [Usage](#Usage)
- [Example](#Example)
- [License](#license)## Installation
To use the AI-OCR tool, it is best if you install this repository for the backend, as well as the [frontend repository](https://github.com/jWinman91/AI-OCR-Frontend), i.e. follow these steps:
1. Clone this repository for the backend
```bash
git clone https://github.com/jWinman91/AI-OCR.git
cd ai-ocr
```
2. Install the required dependencies for the backend:
```bash
pip install -r requirements.txt
```
On Linux or MacOS you can also simply run the install.sh script:
```bash
chmod +x install.sh && ./install.sh
```
3. Clone the frontend repository
```bash
git clone https://github.com/jWinman91/AI-OCR-Frontend.git
cd ai-ocr-frondend
```
3. Install the required dependencies for the frontend:
```bash
pip install -r requirements.txt
```## Usage
You can then start the backend by running:
```bash
python app.py $IP_ADDRESS
```Since, the backend uses fastapi, you could now try it out via the fastapi docs by going to ```$IP_ADDRESS:5000/docs```.
But you can also start the frontend now by running
``` bash
chmod +x start_up.sh
./start_up.sh
```
from within the cloned frontend repository.A streamlit window will automaticall open in your browser.
Within the web application you'll then find two pages on the sidebar:
* AI-OCR: Webpage for running the actual optical character recognition
* Model Configurations: Subpage for configuring the models (e.g. ChatGPT, Llava, ...)## Example
Here is an example on how to use the streamlit frontend with ChatGPT configure as a model:
[](https://youtu.be/IHEpVTO-K3I)## Acknowledgments
- [Hugging Face](https://huggingface.co/) - Framework for working with state-of-the-art natural language processing models.