https://github.com/komed3/img2txt

Precise text extraction from images and PDF documents

image-processing ocr pdf-processing tesseract-ocr text-extraction

# img2txt

A free, web-based tool for precise text extraction from images and PDF documents, with all OCR processing done locally via Tesseract WebAssembly.

## Features

- **Precise text extraction**: Uses [Tesseract WebAssembly](https://github.com/tesseract-ocr) for accurate OCR processing.
- **Local processing**: All OCR runs locally in the browser, so documents do not have to be uploaded to a server, which protects your data.
- **Image and PDF support**: Supports both image and PDF documents.
- **Interactive region selection**: Allows users to select specific regions for text extraction.
- **Zoom and pan**: Zoom and pan the view to select regions more precisely.
- **Rotation**: Rotate images and PDF documents to improve OCR results, for example when a scan is sideways.
- **Multi-language support**: Supports multiple languages for OCR processing.
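Because the view can be zoomed and panned, a region drawn on screen does not line up one-to-one with image pixels, so the selection has to be mapped back to image coordinates before OCR. This README does not describe how img2txt does that; the TypeScript below is a minimal sketch under assumed names (`Rect`, `ViewState` and `screenRectToImageRect` are all hypothetical). It covers zoom and pan only, not rotation.

```typescript
// Hypothetical types: img2txt's internals are not documented in this README.
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

interface ViewState {
  scale: number;   // zoom factor: 2 means one image pixel spans two screen pixels
  offsetX: number; // pan: screen x-coordinate of the image's left edge
  offsetY: number; // pan: screen y-coordinate of the image's top edge
}

// Convert a rectangle drawn in screen (viewport) pixels into image pixels.
// Works for selections dragged in any direction, clamps the result to the
// image bounds, and returns null if the selection misses the image entirely.
function screenRectToImageRect(
  sel: Rect,
  view: ViewState,
  imageWidth: number,
  imageHeight: number,
): Rect | null {
  // Undo the pan offset, then undo the zoom factor, for both corners.
  const x0 = (sel.left - view.offsetX) / view.scale;
  const y0 = (sel.top - view.offsetY) / view.scale;
  const x1 = (sel.left + sel.width - view.offsetX) / view.scale;
  const y1 = (sel.top + sel.height - view.offsetY) / view.scale;

  // Normalize corner order, round outward to whole pixels, clamp to the image.
  const left = Math.max(0, Math.floor(Math.min(x0, x1)));
  const top = Math.max(0, Math.floor(Math.min(y0, y1)));
  const right = Math.min(imageWidth, Math.ceil(Math.max(x0, x1)));
  const bottom = Math.min(imageHeight, Math.ceil(Math.max(y0, y1)));

  if (right <= left || bottom <= top) return null; // nothing left to recognize
  return { left, top, width: right - left, height: bottom - top };
}
```

If the OCR step is built on tesseract.js (an assumption; the README only says "Tesseract WebAssembly"), a rectangle in this `{ left, top, width, height }` shape is what its `worker.recognize` method accepts as the `rectangle` option for recognizing part of an image.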

## Usage

1. Upload an image or PDF document.
2. Select the regions to extract text from. They are processed in the order in which they were selected.
3. Click the "Extract Text" button to extract text from the selected regions.
4. The extracted text will be formatted and displayed, ready to be copied to the clipboard.
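The steps above state that regions are processed in selection order and that the output is formatted before display. One plausible way to guarantee that ordering is to keep selections in an array and await each OCR call in turn. In the sketch below, `recognize` is a hypothetical callback standing in for the real OCR call, and the whitespace cleanup in `tidy` is an illustrative guess at what "formatted" means, not img2txt's actual rules.

```typescript
// Hypothetical region shape, in image pixel coordinates.
interface Region {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Hypothetical OCR callback: resolves to the raw text found in one region.
// Injecting it keeps the ordering and formatting logic independent of the engine.
type RecognizeFn = (region: Region) => Promise<string>;

// Illustrative cleanup: collapse runs of spaces/tabs, trim each line,
// and drop lines that end up empty.
function tidy(raw: string): string {
  return raw
    .split(/\r?\n/)
    .map((line) => line.replace(/[ \t]+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

// Run OCR on each region sequentially, so the output order always matches
// the selection order regardless of how long each region takes, then join
// the non-empty results with a blank line between regions.
async function extractInOrder(regions: Region[], recognize: RecognizeFn): Promise<string> {
  const parts: string[] = [];
  for (const region of regions) {
    const text = tidy(await recognize(region));
    if (text) parts.push(text);
  }
  return parts.join("\n\n");
}
```

Running the regions one after another trades some speed for a predictable result; a parallel version would need to reassemble results by index to keep the same order.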

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

(c) 2026 Paul Köhler (komed3). All rights reserved.