Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/t0mer/ocr-docker
ocr-docker is a small, Flask-powered web app that extracts text from images and PDF documents using OCR.
docker flask ocr python tesseract
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/t0mer/ocr-docker
- Owner: t0mer
- License: apache-2.0
- Created: 2021-04-03T07:40:55.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-06-22T19:24:11.000Z (over 1 year ago)
- Last Synced: 2024-11-01T07:51:50.679Z (about 2 months ago)
- Topics: docker, flask, ocr, python, tesseract
- Language: CSS
- Homepage:
- Size: 96.1 MB
- Stars: 22
- Watchers: 4
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-opensource-israel - ocr-docker - a small, Flask-powered web app that extracts text from images and PDF documents using OCR. ![GitHub last commit](https://img.shields.io/github/last-commit/t0mer/ocr-docker?style=flat-square "GitHub last commit") ![GitHub top language](https://img.shields.io/github/languages/top/t0mer/ocr-docker?style=flat-square) (Projects by main language / css)
README
# OCR-Docker
## Extract text from images & PDF files

OCR-Docker is a Python and [Flask](https://flask.palletsprojects.com/en/1.1.x/) powered, easy-to-use system that extracts text from images and PDF files in multiple languages.
## Features
- Extract text from images (png, jpg, tiff).
- Extract text from PDF files (single or multiple pages).

## Components and Frameworks used in OCR-Docker
* [tesseract-ocr](https://github.com/tesseract-ocr/) - open source ocr
* [tessdata](https://github.com/tesseract-ocr/tessdata) - tesseract-ocr data models
* [ghostscript](https://www.ghostscript.com/)
* [imagemagick](https://imagemagick.org/index.php)
* [pytesseract](https://pypi.org/project/pytesseract/)
* [Pillow](https://pypi.org/project/Pillow/)
* [Image](https://pypi.org/project/image/)
* [Flask](https://flask.palletsprojects.com/en/1.1.x/)
* [Loguru](https://pypi.org/project/loguru/)
* [PyYAML](https://pypi.org/project/PyYAML/)

The OCR (Optical Character Recognition) feature is free thanks to [tesseract-ocr](https://github.com/tesseract-ocr/), which is an open source OCR project.
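
As a rough illustration of how some of the components listed above can fit together, the sketch below classifies an uploaded file by extension and runs OCR on images through pytesseract and Pillow. It is not taken from the project's source: the names `classify_upload` and `extract_text` are hypothetical, and the `eng` default language is an assumption. PDF input is only detected here, not processed.

```python
from pathlib import Path

# Extensions taken from the Features list; anything else is rejected.
IMAGE_EXTENSIONS = {".png", ".jpg", ".tiff"}


def classify_upload(filename):
    """Return "image", "pdf", or None for an unsupported file type."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix == ".pdf":
        return "pdf"
    return None


def extract_text(path, lang="eng"):
    """OCR a single image file (hypothetical helper, not the project's code)."""
    if classify_upload(path) != "image":
        # PDFs would need rasterising to images first (for example with
        # ghostscript or imagemagick); that step is left out of this sketch.
        raise ValueError(f"unsupported input for this sketch: {path}")
    # Imported lazily so the classifier above works without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling `extract_text("scan.png", lang="eng")` requires the tesseract binary and the matching tessdata language model to be installed, which is what the Docker image is meant to bundle.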
## Installation
#### docker-compose from hub
```yaml
version: "3.7"
services:
  ocr:
    image: techblog/ocr-docker:latest
    ports:
      - "8080:8080"
    container_name: tts-stt
    labels:
      - "com.ouroboros.enable=true"
    networks:
      - default
    restart: unless-stopped
```
Now, run ```docker-compose up -d``` to pull and run your container.
Open your browser and navigate to your container's IP address on port 8080. You should see the following screen:

[![OCR](https://github.com/t0mer/ocr-docker/blob/main/screenshot/ocr.png?raw=true "OCR")](https://github.com/t0mer/ocr-docker/blob/main/screenshot/ocr.png?raw=true "OCR")
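
If you prefer to check from a script that the container is answering before opening the browser, a minimal sketch using only the Python standard library could look like this. The helper names `service_url` and `is_up` are hypothetical, and the only thing assumed about the app is that it serves a page at the root path on port 8080, as described above.

```python
import urllib.error
import urllib.request


def service_url(host, port=8080):
    """Build the root URL of the web UI for a given host."""
    return f"http://{host}:{port}/"


def is_up(url, timeout=5.0):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

For example, `is_up(service_url("192.168.1.10"))` checks a container host at the placeholder address `192.168.1.10`; substitute your own host's address.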