https://github.com/sushant1827/gemma-3-ocr
A Streamlit web application that leverages the Gemma-3 Vision model to extract and structure text from images
https://github.com/sushant1827/gemma-3-ocr
gemma3 image-analysis ocr ollama streamlit-webapp
Last synced: 6 months ago
JSON representation
A Streamlit web application that leverages the Gemma-3 Vision model to extract and structure text from images
- Host: GitHub
- URL: https://github.com/sushant1827/gemma-3-ocr
- Owner: sushant1827
- Created: 2025-03-20T18:32:22.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-22T06:17:09.000Z (7 months ago)
- Last Synced: 2025-04-06T22:13:18.270Z (6 months ago)
- Topics: gemma3, image-analysis, ocr, ollama, streamlit-webapp
- Language: Python
- Homepage:
- Size: 50.8 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Gemma-3 OCR

## Overview
Gemma-3 OCR is a Streamlit web application that leverages the Gemma-3 Vision model to extract and structure text from images. This tool makes it easy to convert text in images to well-formatted, editable content with a simple and intuitive user interface.
## Features
- **Simple Image Upload**: Easily upload images containing text using the file picker
- **Powerful Text Extraction**: Uses Gemma-3 Vision model to extract text with high accuracy
- **Structured Output**: Returns extracted text in a well-organized Markdown format
- **Clean Interface**: Intuitive design with clear separation of input and results
- **Instant Processing**: Get results within seconds of uploading an image## Requirements
- Python 3.7+
- Streamlit
- Ollama
- PIL/Pillow
- An Ollama-compatible system with the Gemma-3 model installed## Installation
1. Clone this repository:
```
git clone https://github.com/yourusername/gemma3-ocr.git
cd gemma3-ocr
```2. Install the required dependencies:
```
pip install streamlit ollama pillow
```3. Make sure Ollama is installed and running with the Gemma-3 model:
```
ollama pull gemma3:12b
```## Usage
1. Start the application:
```
streamlit run app.py
```2. Open your browser and navigate to the provided URL (typically http://localhost:8501)
3. Upload an image containing text using the sidebar upload button
4. Click "Extract Text" to process the image
5. View the extracted text in the main panel
6. Use the "Clear" button to reset results when needed
## Example Use Cases
- Digitizing printed documents
- Extracting text from screenshots
- Converting handwritten notes to digital text
- Capturing text from presentation slides
- Extracting content from diagrams and infographics## Acknowledgments
This project uses the Gemma-3 Vision model, developed by Google DeepMind and made available through Ollama.
---
## Output

---