https://github.com/t0mer/ocr-docker

ocr-docker is a small, Flask-powered web app that extracts text from images and PDF documents using OCR.

Host: GitHub
URL: https://github.com/t0mer/ocr-docker
Owner: t0mer
License: apache-2.0
Created: 2021-04-03T07:40:55.000Z (over 4 years ago)
Default Branch: main
Last Pushed: 2025-01-27T00:14:37.000Z (5 months ago)
Last Synced: 2025-03-02T13:11:18.392Z (4 months ago)
Topics: docker, flask, ocr, python, tesseract
Language: CSS
Homepage:
Size: 96.1 MB
Stars: 51
Watchers: 4
Forks: 14
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-opensource-israel - ocr-docker - a small, Flask-powered web app that extracts text from images and PDF documents using OCR. ![GitHub last commit](https://img.shields.io/github/last-commit/t0mer/ocr-docker?style=flat-square "GitHub last commit") ![GitHub top language](https://img.shields.io/github/languages/top/t0mer/ocr-docker?style=flat-square) (Projects by main language / css)

README

# OCR-Docker

## Extract text from images & pdf files

OCR-Docker is an easy-to-use web app, built with Python and [Flask](https://flask.palletsprojects.com/en/1.1.x/), that extracts text from images and PDF files in multiple languages.

## Features

- Extract text from images (PNG, JPG, TIFF).

- Extract text from PDF files (single or multiple pages).
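The two features above imply that an uploaded file is first sorted by type and then passed to the OCR engine with a language setting. The sketch below illustrates that idea only; it is not the project's actual code. The function names, the extension sets (including the extra `.jpeg` and `.tif` spellings) and the example language codes are assumptions made for illustration. `pytesseract.image_to_string` and the `eng+heb` style of combining languages are standard pytesseract/Tesseract usage.

```python
from pathlib import Path

# Extensions based on the feature list; ".jpeg" and ".tif" are assumed aliases.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
PDF_EXTENSIONS = {".pdf"}


def classify_upload(filename: str) -> str:
    """Return "image", "pdf", or "unsupported" based on the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in PDF_EXTENSIONS:
        return "pdf"
    return "unsupported"


def tesseract_lang(codes: list[str]) -> str:
    """Join language codes the way Tesseract expects, e.g. "eng+heb"."""
    return "+".join(codes)


def ocr_image(path: str, codes: list[str]) -> str:
    """Run OCR on one image file.

    Needs pytesseract, Pillow and the tesseract binary with the matching
    tessdata models installed; imported lazily so the helpers above work
    without them.
    """
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=tesseract_lang(codes))
```

With those dependencies installed, `ocr_image("scan.png", ["eng", "heb"])` would return the recognized text as a string (the file name is hypothetical).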

## Components and Frameworks used in OCR-Docker

* [tesseract-ocr](https://github.com/tesseract-ocr/) - open source OCR engine

* [tessdata](https://github.com/tesseract-ocr/tessdata) - tesseract-ocr data models

* [ghostscript](https://www.ghostscript.com/)

* [imagemagick](https://imagemagick.org/index.php)

* [pytesseract](https://pypi.org/project/pytesseract/)

* [Pillow](https://pypi.org/project/Pillow/)

* [Image](https://pypi.org/project/image/)

* [Flask](https://flask.palletsprojects.com/en/1.1.x/)

* [Loguru](https://pypi.org/project/loguru/)

* [PyYAML](https://pypi.org/project/PyYAML/)

The OCR (Optical Character Recognition) feature is free thanks to [tesseract-ocr](https://github.com/tesseract-ocr/), an open source OCR project.
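One plausible way the listed components fit together for PDF input is to rasterize each page with Ghostscript and then run Tesseract on the resulting images. The README does not describe the actual pipeline (it may well use ImageMagick instead), so treat the following as a sketch under that assumption. The function names, output file pattern and default 300 DPI are hypothetical; the Ghostscript switches and `pytesseract.image_to_string` are existing options and calls.

```python
import subprocess
from pathlib import Path


def ghostscript_command(pdf_path: str, out_dir: str, dpi: int = 300) -> list[str]:
    """Build a Ghostscript command that writes one PNG per PDF page."""
    pattern = str(Path(out_dir) / "page-%03d.png")  # page-001.png, page-002.png, ...
    return [
        "gs",
        "-dNOPAUSE", "-dBATCH", "-dSAFER",  # non-interactive, sandboxed run
        "-sDEVICE=png16m",                  # 24-bit colour PNG output
        f"-r{dpi}",                         # rendering resolution
        f"-sOutputFile={pattern}",
        pdf_path,
    ]


def pdf_to_text(pdf_path: str, out_dir: str, lang: str = "eng") -> str:
    """Rasterize a PDF and OCR every page (needs gs, tesseract, pytesseract)."""
    import pytesseract  # lazy import: only required when OCR actually runs

    subprocess.run(ghostscript_command(pdf_path, out_dir), check=True)
    pages = sorted(Path(out_dir).glob("page-*.png"))
    # pytesseract accepts a file path as well as a PIL image.
    return "\n".join(pytesseract.image_to_string(str(p), lang=lang) for p in pages)
```

Keeping the command builder separate from the call that runs it makes the rasterization settings easy to inspect or change without invoking Ghostscript.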

## Installation

#### docker-compose from hub

```yaml
version: "3.7"

services:
  ocr:
    image: techblog/ocr-docker:latest
    ports:
      - "8080:8080"
    container_name: tts-stt
    labels:
      - "com.ouroboros.enable=true"
    networks:
      - default
    restart: unless-stopped
```

Now run ```docker-compose up -d``` to pull the image and start the container.

Open your browser and navigate to your container's IP address on port 8080. You should see the following screen.

[![OCR](https://github.com/t0mer/ocr-docker/blob/main/screenshot/ocr.png?raw=true "OCR")](https://github.com/t0mer/ocr-docker/blob/main/screenshot/ocr.png?raw=true "OCR")
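If you prefer to confirm from a script that the container is serving the page, a small standard-library check such as the following may help. It is a sketch, not part of the project: the function names are hypothetical, and it assumes the web UI answers plain HTTP on `/` at the port mapped in the compose file above.

```python
import time
import urllib.error
import urllib.request


def service_url(host: str, port: int = 8080) -> str:
    """Build the base URL of the web UI for a given host and port."""
    return f"http://{host}:{port}/"


def wait_until_up(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the URL until it answers with HTTP 200 or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=3) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # container not ready yet (or not reachable); retry
        time.sleep(delay)
    return False
```

For example, `wait_until_up(service_url("localhost"))` returns `True` once the page loads and `False` if it never responds within the given attempts.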
