https://github.com/ryanlinjui/menu-text-detection
Extract structured menu information from images into JSON using an end-to-end (E2E) vision-language model fine-tuning pipeline or an LLM.
- Host: GitHub
- URL: https://github.com/ryanlinjui/menu-text-detection
- Owner: ryanlinjui
- License: mit
- Created: 2023-09-13T13:40:46.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2026-04-12T06:18:32.000Z (3 months ago)
- Last Synced: 2026-04-12T08:15:14.071Z (3 months ago)
- Topics: document-understanding, donut, fine-tuning, image-text-to-text, transformer
- Language: Python
- Homepage: https://huggingface.co/spaces/ryanlinjui/menu-text-detection
- Size: 5.43 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Menu Text Detection System
Extract structured menu information from images into JSON using a fine-tuned E2E model or LLM.
[Hugging Face Space](https://huggingface.co/spaces/ryanlinjui/menu-text-detection)
[Hugging Face Collection](https://huggingface.co/collections/ryanlinjui/menu-text-detection-670ccf527626bb004bbfb39b)
https://github.com/user-attachments/assets/80e5d54c-f2c8-4593-ad9b-499e5b71d8f6
## 🚀 Features
### Overview
Currently supports the following information from menu images:
- **Restaurant Name**
- **Business Hours**
- **Address**
- **Phone Number**
- **Dish Information**
  - Name
  - Price
> For the JSON schema, see the [tools directory](./tools).
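As a rough illustration of the kind of structure listed above, here is a minimal sketch in Python. The class and field names (`Menu`, `Dish`, `restaurant_name`, and so on) are assumptions made for this example only; the actual schema lives in the tools directory and may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Dish:
    # Hypothetical field names; check the tools directory for the real schema.
    name: str
    price: Optional[float] = None


@dataclass
class Menu:
    # Every field is optional because a menu image may not show all of them.
    restaurant_name: Optional[str] = None
    business_hours: Optional[str] = None
    address: Optional[str] = None
    phone_number: Optional[str] = None
    dishes: List[Dish] = field(default_factory=list)


# Made-up sample values, only to show the shape of the JSON output.
menu = Menu(
    restaurant_name="Example Diner",
    phone_number="02-1234-5678",
    dishes=[Dish("Beef Noodles", 150), Dish("Iced Tea", 30)],
)

# asdict() converts nested dataclasses into plain dicts that json can serialize.
menu_json = json.dumps(asdict(menu), ensure_ascii=False, indent=2)
print(menu_json)
```

Fields missing from the image are serialized as `null` in this sketch, which keeps the output shape stable across menus.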
### Supported Methods to Extract Menu Information
#### Fine-tuned E2E model and Training metrics
- [**Donut (Document Parsing Task)**](https://huggingface.co/ryanlinjui/donut-base-finetuned-menu) - Base model by [Clova AI (ECCV ’22)](https://github.com/clovaai/donut)
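For readers who want to call the fine-tuned checkpoint directly, the sketch below follows the usual Donut inference pattern from the Hugging Face `transformers` library. It is not taken from this repository: the function names `extract_menu` and `clean_donut_sequence` are hypothetical, and `task_prompt` is left as a required argument because the start token depends on how the model was fine-tuned (see `train.ipynb` or the model card).

```python
import re


def clean_donut_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Drop EOS/PAD tokens and the leading task-prompt tag from decoded text."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first <...> tag, which is the task prompt itself.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_menu(
    image_path: str,
    task_prompt: str,  # assumption: must match the token used during fine-tuning
    model_id: str = "ryanlinjui/donut-base-finetuned-menu",
) -> dict:
    # Heavy dependencies are imported lazily so the helper above works alone.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    outputs = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
    )

    sequence = processor.batch_decode(outputs)[0]
    sequence = clean_donut_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json turns Donut's tag-style output into a nested dict.
    return processor.token2json(sequence)
```

The first call downloads the model weights, so expect a delay and make sure `transformers`, `torch`, and `Pillow` are installed.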
#### LLM Function Calling
- Google Gemini API
- OpenAI GPT API
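To show what function calling could look like for this task, here is a minimal sketch of a tool definition in the shape accepted by the OpenAI Chat Completions `tools` parameter. The function name `extract_menu` and all property names are assumptions for illustration; the project's real schema is in the tools directory. Gemini function declarations use a similar name, description, and parameters layout, though the exact wrapper differs.

```python
import json

# Hypothetical JSON Schema for the extracted menu; the real one may differ.
MENU_PARAMETERS = {
    "type": "object",
    "properties": {
        "restaurant_name": {"type": "string"},
        "business_hours": {"type": "string"},
        "address": {"type": "string"},
        "phone_number": {"type": "string"},
        "dishes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "number"},
                },
                "required": ["name"],
            },
        },
    },
    "required": ["dishes"],
}

# Tool definition: the model is asked to "call" this function, and its
# arguments become the structured menu data.
extract_menu_tool = {
    "type": "function",
    "function": {
        "name": "extract_menu",  # hypothetical function name
        "description": "Return structured menu data read from the image.",
        "parameters": MENU_PARAMETERS,
    },
}


def parse_tool_arguments(arguments: str) -> dict:
    """Parse the JSON string of arguments returned in a tool call."""
    return json.loads(arguments)


# Example with a made-up arguments string, as a model might return it.
sample = parse_tool_arguments('{"dishes": [{"name": "Iced Tea", "price": 30}]}')
print(sample["dishes"][0]["name"])
```

Forcing the output through a function schema means the response arrives as JSON arguments rather than free text, so it can be parsed without extra prompt engineering.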
## 💻 Training / Fine-Tuning
### Setup
Use [uv](https://github.com/astral-sh/uv) to set up the development environment:
```bash
uv sync
```
> If `uv sync` runs into problems, use `pip install -r requirements.txt` instead.
### Training Script (Dataset Collection, Fine-Tuning)
Please refer to [`train.ipynb`](./train.ipynb). Use Jupyter Notebook for training:
```bash
uv run jupyter-notebook
```
> For VS Code users: install the Jupyter extension, then select `.venv/bin/python` as your kernel.
### Run Demo Locally
```bash
uv run python app.py
```