Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ds4v/nomnasite
Deployment of NomNaOCR: https://youtu.be/o5xpfwalEWw
- Host: GitHub
- URL: https://github.com/ds4v/nomnasite
- Owner: ds4v
- Created: 2022-07-09T20:56:27.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-07T22:52:57.000Z (6 months ago)
- Last Synced: 2024-12-11T00:46:18.563Z (23 days ago)
- Topics: digitization, history, vietnamese
- Language: Python
- Homepage: https://nomnasite.streamlit.app
- Size: 88.5 MB
- Stars: 2
- Watchers: 1
- Forks: 5
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Web application for Sino-Nôm digitalization
[![demo](./imgs/demo.png)](https://youtu.be/o5xpfwalEWw)
> Demo: https://share.streamlit.io/ds4v/nomnasite/main/app.py
## Usage
```bash
# Install the dependencies listed in requirements.txt
pip install -r requirements.txt
# Launch the Streamlit app
streamlit run app.py
```

👉 Check out the [YouTube demo](https://youtu.be/o5xpfwalEWw)
## Features
1. Load an input image from a local file or a URL.
2. Use deep learning models to extract text from the image:
   - Use [VNPF's site](https://www.nomfoundation.org) as the source of collected data.
   - Apply models based on the results of [NomNaOCR](https://github.com/ds4v/NomNaOCR).
3. Interactive mode using [streamlit-drawable-canvas](https://github.com/andfanilo/streamlit-drawable-canvas):
   - **Drawing** mode: draw rectangular boxes around image regions containing characters.
   - **Editing** mode: rotate, skew, scale, or move any box on the canvas on demand.
   - Undo, redo, or delete canvas contents.
4. Save OCR results:
   - Export detection, recognition, and translation results to [CSV](data/data.csv) or [JSON](data/data.json).
   - Download [patches](data/patches.zip) cropped from detected bounding boxes.
5. Translate using APIs from:
   - VNUHCM University of Science: https://www.clc.hcmus.edu.vn/?page_id=3039
   - Sino-Nôm dictionary: https://hvdic.thivien.net/transcript.php#trans

**(\*)** Note: In **Editing** mode, double-click a box to remove it.
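
As a rough illustration of how boxes drawn on the canvas could become exportable rows, here is a minimal sketch using only the standard library. It is not the code used in `app.py`. It assumes each rectangle arrives as a fabric.js-style dictionary with `left`, `top`, `width`, `height`, `scaleX`, and `scaleY` keys (the shape streamlit-drawable-canvas reports in its JSON output), that boxes are unrotated, and that stroke width is ignored. The helper names and the column names (`id`, `x_min`, `y_min`, `x_max`, `y_max`, `text`) are hypothetical, so the real `data.csv` and `data.json` layouts may differ.

```python
import csv
import io
import json

# Hypothetical column layout; the project's real export schema may differ.
FIELDS = ["id", "x_min", "y_min", "x_max", "y_max", "text"]


def rect_to_bbox(obj):
    """Convert one serialized rectangle to (x_min, y_min, x_max, y_max).

    Assumes an unrotated box whose origin is its top-left corner.
    """
    width = obj["width"] * obj.get("scaleX", 1.0)
    height = obj["height"] * obj.get("scaleY", 1.0)
    x_min, y_min = obj["left"], obj["top"]
    return (round(x_min), round(y_min),
            round(x_min + width), round(y_min + height))


def build_records(objects, texts):
    """Pair each rectangle with its recognized text (supplied by the caller)."""
    records = []
    for index, (obj, text) in enumerate(zip(objects, texts)):
        x_min, y_min, x_max, y_max = rect_to_bbox(obj)
        records.append({"id": index, "x_min": x_min, "y_min": y_min,
                        "x_max": x_max, "y_max": y_max, "text": text})
    return records


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the records to JSON, keeping Sino-Nôm characters readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Made-up sample box and text, for illustration only.
    sample = [{"type": "rect", "left": 10, "top": 20,
               "width": 30, "height": 40, "scaleX": 2, "scaleY": 1.5}]
    rows = build_records(sample, ["南"])
    print(to_csv(rows))
    print(to_json(rows))
```

Passing `ensure_ascii=False` keeps the recognized characters as-is in the JSON output instead of `\uXXXX` escapes, which makes the exported file easier to inspect by hand.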
## Reference
My Vietnamese Sino-Nôm digitalization series:
- [NomNaOCR](https://github.com/ds4v/NomNaOCR): Optical Character Recognition.
- [NomNaNMT](https://github.com/ds4v/NomNaNMT): Neural Machine Translation.
- [NomNaSite](https://github.com/ds4v/NomNaSite): Web Application.