https://github.com/justmars/start-ocr

Applying pdfplumber + opencv + pytesseract to extract content and metadata from formal PDF files.

# start-ocr

![GitHub CI](https://github.com/justmars/start-ocr/actions/workflows/ci.yml/badge.svg)

1. Applies pdfplumber + opencv + pytesseract to extract content and metadata from formal PDF files.
2. pdfplumber's `page.extract_text_lines()` is experimental, so it may or may not work depending on the PDF file.
3. See [documentation](https://justmars.github.io/start-ocr).
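
Because `page.extract_text_lines()` may not work on every file, one way to cope is to fall back to OCR when it yields nothing. The following is only a minimal sketch under stated assumptions, not start-ocr's actual API: `page_lines` and `tesseract_fallback` are hypothetical helper names, and the 300 dpi resolution is an arbitrary choice.

```python
from typing import Any, Callable, Optional


def page_lines(
    page: Any, ocr_fallback: Optional[Callable[[Any], str]] = None
) -> list:
    """Return text lines from a pdfplumber-style page (hypothetical helper).

    Tries page.extract_text_lines() first; if it raises or returns nothing,
    uses the optional OCR fallback instead.
    """
    try:
        # Each row is expected to be a dict with a "text" key.
        lines = [row["text"] for row in page.extract_text_lines()]
    except Exception:
        # The method is experimental, so any failure is treated as "no lines".
        lines = []
    if lines or ocr_fallback is None:
        return lines
    # Keep only the non-blank lines of the OCR output.
    return [ln for ln in ocr_fallback(page).splitlines() if ln.strip()]


def tesseract_fallback(page: Any) -> str:
    """OCR a pdfplumber page with pytesseract (hypothetical helper)."""
    # Imported lazily so page_lines() works without pytesseract installed.
    import pytesseract

    # Render the page to a PIL image; 300 dpi is an assumed resolution.
    image = page.to_image(resolution=300).original
    return pytesseract.image_to_string(image)
```

With pdfplumber installed, this could be called as `page_lines(pdf.pages[0], tesseract_fallback)` inside a `with pdfplumber.open(path) as pdf:` block, where `path` points to a PDF file of your own.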

## Installation

```sh
just start
```