https://github.com/syaagalib/project_ocr
This code can do OCR.
- Host: GitHub
- URL: https://github.com/syaagalib/project_ocr
- Owner: SYAAGalib
- Created: 2024-09-19T07:46:32.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-12T14:20:30.000Z (7 months ago)
- Last Synced: 2025-06-12T15:32:03.453Z (7 months ago)
- Topics: ocr-python, python, streamlit, ui
- Language: Python
- Homepage:
- Size: 1.23 MB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# PDF to Text
[Streamlit app](https://share.streamlit.io/nainiayoub/pdf-text-data-extractor/main/app.py)



A PDF text extraction app that takes a PDF document as input and returns either a single txt file containing all pages or a compressed folder of txt files, one per page. OCR can also be enabled for scanned documents.
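
The two output formats can be illustrated with a minimal sketch that uses only the Python standard library. It assumes the page texts have already been extracted. The function name `package_pages` and the file names `output.txt`, `output.zip`, and `page_N.txt` are hypothetical and not taken from the app's source.

```python
import io
import zipfile


def package_pages(pages, as_zip=False):
    """Return (filename, data) for a list of page texts.

    pages  -- list of strings, one per PDF page (assumed already extracted)
    as_zip -- False: one txt file with all pages; True: ZIP with one txt per page
    """
    if not as_zip:
        # Single text file: pages joined in order, separated by a blank line.
        combined = "\n\n".join(pages)
        return "output.txt", combined.encode("utf-8")

    # In-memory ZIP archive holding one text file per page.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number, text in enumerate(pages, start=1):
            archive.writestr(f"page_{number}.txt", text)
    return "output.zip", buffer.getvalue()
```

Because the result is plain bytes, it could be passed to a download widget such as Streamlit's `st.download_button` without writing anything to disk.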

## How does it work?
```mermaid
flowchart LR
A[PDF] --> |text conversion / OCR| B(Text)
B --> |Option 1| D[txt file]
B --> |Option 2| E[ZIP folder of txt files for pages]
```
1. Upload your PDF.
2. Enable OCR (for scanned documents).
3. Select the PDF language.
4. Download your output file (zip/txt).
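
Steps 1 to 3 above can be sketched as a small dispatch function. This is only an illustration under stated assumptions: the README does not say which extraction or OCR libraries the app uses, so the two extractors are passed in as callables. The names `pdf_to_pages` and `LANGUAGE_CODES` are hypothetical, and the language codes follow the Tesseract convention (`eng`, `fra`, `ara`) purely as an example.

```python
# Hypothetical mapping from a language name shown in the UI to an OCR code.
LANGUAGE_CODES = {"English": "eng", "French": "fra", "Arabic": "ara"}


def pdf_to_pages(pdf_bytes, use_ocr, language, extract_text, extract_ocr):
    """Return a list of page texts for an uploaded PDF.

    pdf_bytes    -- raw bytes of the uploaded PDF (step 1)
    use_ocr      -- True for scanned documents (step 2)
    language     -- language name chosen by the user (step 3)
    extract_text -- callable(pdf_bytes) -> list of page strings (assumed)
    extract_ocr  -- callable(pdf_bytes, code) -> list of page strings (assumed)
    """
    if language not in LANGUAGE_CODES:
        raise ValueError(f"Unsupported language: {language}")

    if use_ocr:
        # Scanned pages carry no embedded text, so the OCR path needs the language.
        return extract_ocr(pdf_bytes, LANGUAGE_CODES[language])

    # Digital PDFs already contain text, so no language hint is passed.
    return extract_text(pdf_bytes)
```

Passing the extractors as arguments keeps the sketch independent of any particular PDF or OCR library, and lets the OCR toggle be tested with simple stand-in functions.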
## How to support the project
You can support the project by giving feedback or by [buying me a coffee](https://www.buymeacoffee.com/nainiayoub).