Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/xenoswarlocks/image_text_extractor
A Python-based tool for batch processing and extracting text from images using OCR (Tesseract). The extracted text is cleaned by removing unwanted terms, and potential names are identified and formatted. Results are saved in a structured text file for easy reference. Ideal for automating data extraction and preprocessing tasks.
Last synced: 12 days ago
- Host: GitHub
- URL: https://github.com/xenoswarlocks/image_text_extractor
- Owner: XenosWarlocks
- License: apache-2.0
- Created: 2024-12-01T05:42:31.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2024-12-01T06:25:12.000Z (about 1 month ago)
- Last Synced: 2024-12-01T06:36:59.522Z (about 1 month ago)
- Language: Python
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Image_Text_Extractor
A Python-based tool for batch processing and extracting text from images using OCR (Tesseract). The extracted text is cleaned by removing unwanted terms, and potential names are identified and formatted. Results are saved in a structured text file for easy reference. Ideal for automating data extraction and preprocessing tasks.

## Project Structure
```
ImageTextExtractor/
├── config.json              # Configuration file
├── main.py                  # Entry point
├── modules/
│   ├── extractor.py         # OCR and file processing logic
│   ├── filters.py           # Text filtering and name extraction
│   ├── utils.py             # Utilities for logging and progress tracking
│   ├── db_handler.py        # Database interaction
│   └── ner_model.py         # NER-based name extraction
├── tests/
│   ├── test_extractor.py    # Unit tests for extractor
│   ├── test_filters.py      # Unit tests for filtering
│   └── test_db.py           # Unit tests for database
├── requirements.txt         # Python dependencies
└── README.md                # Documentation
```
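
The pipeline described at the top of this README (OCR, removal of unwanted terms, name detection, structured text output) could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names, the `UNWANTED_TERMS` set, the capitalized-word heuristic for names, and the output layout are all assumptions made for the example. The repository's own logic lives in `modules/extractor.py`, `modules/filters.py`, and `modules/ner_model.py`. The OCR step assumes `pytesseract` and Pillow are installed along with the Tesseract binary.

```python
import re
from pathlib import Path

# Hypothetical unwanted terms; the real project presumably configures its own list.
UNWANTED_TERMS = {"confidential", "draft"}

# Heuristic (an assumption, not the project's NER model):
# two or more consecutive capitalized words count as a candidate name.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def ocr_image(path):
    """Run Tesseract on one image. Imports are local so the rest works without OCR installed."""
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(path))


def clean_text(text, unwanted=UNWANTED_TERMS):
    """Drop unwanted terms (case-insensitive, ignoring trailing punctuation) and collapse whitespace."""
    kept = [w for w in text.split() if w.strip(".,;:!?").lower() not in unwanted]
    return " ".join(kept)


def find_names(text):
    """Return candidate names found by the capitalized-word heuristic."""
    return NAME_PATTERN.findall(text)


def save_results(results, out_path):
    """Write one block per image: the file name in brackets, then one indented name per line."""
    lines = []
    for image_name, names in results.items():
        lines.append(f"[{image_name}]")
        lines.extend(f"  {name}" for name in names)
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

A batch run would then loop over an image folder, call `ocr_image`, pass the text through `clean_text` and `find_names`, and hand the collected dictionary to `save_results`. A regex heuristic like this one misses single-word names and flags any capitalized phrase, which is likely why the project also lists an NER-based module.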