https://github.com/haripasapuleti/text-extractor
A Flask-based web app that uses OCR technology to extract text from uploaded images. Users can upload images, preview them, and download the extracted text as a file.
file-upload flask image-preprocessing ocr-text-reader python
- Host: GitHub
- URL: https://github.com/haripasapuleti/text-extractor
- Owner: HariPasapuleti
- Created: 2025-02-20T06:07:53.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-04-10T12:35:09.000Z (about 1 year ago)
- Last Synced: 2025-04-10T13:54:28.417Z (about 1 year ago)
- Topics: file-upload, flask, image-preprocessing, ocr-text-reader, python
- Language: HTML
- Homepage:
- Size: 430 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🖼️ Image Text Extractor
A Flask-powered web application that extracts text from images using **Tesseract OCR**. The app supports image preprocessing for improved accuracy, an intuitive interface, and the ability to download extracted text.
## 🌟 Features
- **OCR-Based Text Extraction**: Leverages Tesseract OCR for reliable text recognition.
- **Preprocessing for Accuracy**: Automatically preprocesses images (grayscale and thresholding).
- **Image Preview**: See your uploaded image before processing.
- **Text Export**: Download extracted text as a `.txt` file with a single click.
- **Responsive UI**: A mobile-friendly, modern interface.
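
To make the "grayscale and thresholding" preprocessing concrete, here is a minimal, dependency-free sketch on a tiny RGB pixel grid. It is an illustration only, not the repository's code: the app itself lists OpenCV in its stack, where the same two steps would more likely be `cv2.cvtColor` and `cv2.threshold`. The function names, the fixed cutoff of 128, and the sample pixels are assumptions made for this example.

```python
# Illustration of grayscale conversion followed by binary thresholding.
# An image is a list of rows; each pixel is an (R, G, B) tuple in 0-255.
# All names and the cutoff value are hypothetical, chosen for this sketch.

def to_grayscale(rgb_rows):
    """Convert RGB pixels to single luma values (ITU-R BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray_rows, cutoff=128):
    """Map pixels above the cutoff to white (255) and the rest to black (0)."""
    return [[255 if value > cutoff else 0 for value in row] for row in gray_rows]


# A 2x2 sample: dark "ink" pixels on a light, slightly tinted background.
sample = [
    [(250, 245, 230), (30, 30, 40)],
    [(20, 25, 20), (240, 240, 235)],
]

gray = to_grayscale(sample)   # tint is discarded, only brightness remains
binary = binarize(gray)       # ink becomes 0, background becomes 255
print(binary)
```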
---
## 🚀 Live Demo
You can view a live demo of the app [here](https://your-demo-link.com).
---
## 🛠️ Tech Stack
- **Backend**: Flask, Python
- **Frontend**: HTML, CSS, JavaScript
- **Image Processing**: OpenCV, Pillow (PIL)
- **OCR**: Tesseract OCR
---
## 🧑‍💻 Installation Guide
### Prerequisites
1. **Python 3.7+**
2. **Tesseract OCR**:
- Windows: [Download Tesseract](https://github.com/tesseract-ocr/tesseract).
- Linux (Debian/Ubuntu) or macOS (Homebrew):
```bash
sudo apt install tesseract-ocr   # Debian/Ubuntu
brew install tesseract           # macOS (Homebrew)
```
### Steps to Run Locally
1. Clone the repository:
```bash
git clone https://github.com/HariPasapuleti/Text-Extractor.git
cd Text-Extractor
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
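3. Configure the Tesseract path in the code that calls `pytesseract`:
- Windows:
```python
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```
- Linux/Mac: No change is needed as long as `tesseract` is on your `PATH`, which is the case for a package-manager install.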
4. Start the Flask server:
```bash
python text_extractor.py
```
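
The path configuration in step 3 could also be resolved at runtime instead of being edited by hand. The sketch below is one possible approach, not code from the repository: `find_tesseract`, `configure_pytesseract`, and the `TESSERACT_CMD` environment variable are hypothetical names. It checks an explicit override first, then the `PATH` (via the standard-library `shutil.which`), then the Windows default location shown above. Calling `configure_pytesseract()` once at startup, before any OCR call, would apply the result.

```python
import os
import shutil

# Default Windows install location, as given in step 3.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(env=None, which=shutil.which, exists=os.path.isfile):
    """Return a path to the tesseract executable, or None if none is found.

    `env`, `which`, and `exists` are injectable so the lookup order can be
    exercised without Tesseract being installed.
    """
    env = os.environ if env is None else env
    override = env.get("TESSERACT_CMD")  # assumed variable name, not a standard one
    if override:
        return override
    on_path = which("tesseract")  # typical for apt or Homebrew installs
    if on_path:
        return on_path
    if exists(WINDOWS_DEFAULT):
        return WINDOWS_DEFAULT
    return None


def configure_pytesseract():
    """Point pytesseract at the detected executable; raise if none is found."""
    cmd = find_tesseract()
    if cmd is None:
        raise RuntimeError("Tesseract not found; install it or set TESSERACT_CMD")
    import pytesseract  # third-party; imported only when actually configuring
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```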
---
## 🖥️ Usage
1. **Upload an Image**: Accepts `.jpg` or `.png` formats.
2. **Preview**: Check the image preview before processing.
3. **Extract Text**: View extracted text directly on the page.
4. **Download**: Save the extracted text as a `.txt` file.
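
The upload and download steps above imply two small checks on the server side: accepting only `.jpg` and `.png` uploads, and naming the downloaded `.txt` file. The following is a minimal sketch of how those checks might look using only the standard library. The helper names, the fallback name `extracted`, and the character whitelist are assumptions for illustration; the repository may handle this differently. In a Flask view, the headers from `download_headers` could be attached to the response that carries the extracted text.

```python
import re
from pathlib import Path

# The two formats listed under Usage.
ALLOWED_EXTENSIONS = {".jpg", ".png"}


def is_allowed_image(filename):
    """Accept a file only if its final extension is .jpg or .png (any case)."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


def text_filename(image_filename):
    """Derive a download name such as 'receipt.txt' from 'receipt.png'.

    Characters outside a conservative whitelist are replaced so the name is
    safe to place inside a Content-Disposition header.
    """
    stem = re.sub(r"[^A-Za-z0-9._-]", "_", Path(image_filename).stem)
    return (stem or "extracted") + ".txt"


def download_headers(image_filename):
    """Build response headers that make a browser save the text as a file."""
    return {
        "Content-Type": "text/plain; charset=utf-8",
        "Content-Disposition": f'attachment; filename="{text_filename(image_filename)}"',
    }
```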
---