https://github.com/meangrinch/mangatranslator
Manga translation app powered by AI
- Host: GitHub
- URL: https://github.com/meangrinch/mangatranslator
- Owner: meangrinch
- License: apache-2.0
- Created: 2025-04-08T02:00:30.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-05-14T01:35:32.000Z (24 days ago)
- Last Synced: 2026-05-14T03:28:58.499Z (24 days ago)
- Topics: ai, auto-translation, comics, inpainting, manga, manga-translator, manhua, manhwa, ocr, segmentation, text-detection, translation
- Language: Python
- Homepage:
- Size: 2.66 MB
- Stars: 208
- Watchers: 4
- Forks: 36
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[English](README.md) | [简体中文](docs/translations/README_zh.md) | [한국어](docs/translations/README_ko.md) | [日本語](docs/translations/README_ja.md)
## MangaTranslator
Gradio-based web application that automates the translation of manga/comic page images using AI. It handles both speech bubbles and text outside speech bubbles (OSB text). Supports 59 languages and custom font packs.
Example images: _Original_ and _Translated (w/ a single click)_.
## Table of Contents
- [Features](#features)
- [Requirements](#requirements)
- [Install](#install)
- [Post-Install Setup](#post-install-setup)
- [Run](#run)
- [Documentation](#documentation)
- [Updating](#updating)
- [License & Credits](#license--credits)
## Features
- **Detection**: Speech bubble detection & segmentation (YOLO + SAM 2.1/3)
- **Cleaning**: Inpaint speech bubbles and OSB text (Flux.2 Klein, Flux.1 Kontext, or OpenCV)
- **Translation**: LLM-powered OCR & translation (59 languages)
- **Rendering**: Text rendering with alignment and custom font packs
- **Upscaling**: 2x-AnimeSharpV4 for enhanced output quality
- **Processing**: Single/batch processing with directory preservation and ZIP support
- **Interfaces**: Web UI (Gradio) and CLI
- **Automation**: One-click translation; no intervention required
## Requirements
- Python 3.10+
- PyTorch (CPU, CUDA, ROCm, XPU, MPS)
- Font pack with `.ttf`/`.otf` files; included with portable package
- LLM for Japanese source text; VLM for other languages (API or local)
## Install
### Portable Package (Recommended)
Download the standalone zip from the releases page: [Portable Build](https://github.com/meangrinch/MangaTranslator/releases/tag/portable)
**Requirements:**
- **Windows:** Bundled Python/Git included; no additional requirements
- **Linux/macOS:** Python 3.10+ and Git must be installed on your system
**Setup:**
1. Extract the zip file
2. Run the setup script for your platform:
   - **Windows:** Double-click `setup.bat`
   - **Linux/macOS:** Run `./setup.sh` in a terminal
3. The appropriate PyTorch build is detected and installed automatically for your system
4. Open the launcher script created in `./MangaTranslator/`:
   - **Windows:** `start-webui.bat`
   - **Linux/macOS:** `start-webui.sh`
Included font packs:
- _Komika_ (normal text)
- _Cookies_ (OSB text)
- _Comicka_ (either)
- _Roboto_ (supports accents)
- _Noto Sans SC_ (supports Simplified Chinese)
> [!TIP]
> If you need to migrate to a fresh portable package:
>
> - You can safely move the `fonts`, `models`, and `output` directories to the new portable package
> - You may also be able to move the `runtime` directory, provided you want the same setup configuration
### Manual Install
1. Clone and enter the repo
```bash
git clone https://github.com/meangrinch/MangaTranslator.git
cd MangaTranslator
```
2. Create and activate a virtual environment (recommended)
```bash
python -m venv venv
# Windows PowerShell/CMD
.\venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
```
3. Install PyTorch (see: [PyTorch Install](https://pytorch.org/get-started/locally/))
```bash
# Example (CUDA 13.0)
pip install torch==2.10.0+cu130 torchvision==0.25.0+cu130 --extra-index-url https://download.pytorch.org/whl/cu130
# Example (ROCm 7.1)
pip install torch==2.10.0+rocm7.1 torchvision==0.25.0+rocm7.1 --extra-index-url https://download.pytorch.org/whl/rocm7.1
# Example (XPU)
pip install torch==2.10.0+xpu torchvision==0.25.0+xpu --extra-index-url https://download.pytorch.org/whl/xpu
# Example (MPS/CPU)
pip install torch==2.10.0 torchvision==0.25.0
```
4. Install Nunchaku (optional, for Flux.1 Kontext Nunchaku backend)
   - Nunchaku wheels are not on PyPI. Install directly from the v1.2.1 GitHub release URL, choosing the wheel that matches your OS, Python, PyTorch, and CUDA versions. Nunchaku is CUDA-only and requires a 2000-series card or newer.
```bash
# Example (Windows, Python 3.13, PyTorch 2.10.0, CUDA 13.0)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/v1.2.1/nunchaku-1.2.1+cu13.0torch2.10-cp313-cp313-win_amd64.whl
```
> [!NOTE]
> Nunchaku is not necessary for the use of Flux models via the SDNQ backend.
5. Install dependencies
```bash
pip install -r requirements.txt
```
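After a manual install, it can help to confirm that the interpreter meets the Python 3.10+ requirement and to see which PyTorch backend is visible. The snippet below is a minimal sketch, not part of MangaTranslator; the function names `python_version_ok` and `detect_torch_backend` are hypothetical, and the check order (CUDA/ROCm, then XPU, then MPS, then CPU) is an assumption.

```python
import sys


def python_version_ok(minimum=(3, 10)):
    """Return True if the running interpreter meets the stated minimum."""
    return sys.version_info >= minimum


def detect_torch_backend():
    """Return a short label for the first available PyTorch backend.

    Hypothetical helper: the application may select its device differently.
    """
    try:
        import torch
    except ImportError:
        return "missing"

    if torch.cuda.is_available():
        # ROCm builds also report through torch.cuda; torch.version.hip is
        # set only on those builds.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"

    # Guard with getattr because older PyTorch builds may lack these modules.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"
```

Calling `print(python_version_ok(), detect_torch_backend())` inside the activated virtual environment shows whether the install from step 3 is the one being picked up.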
## Post-Install Setup
### Models
- The application will automatically download and use all required models
### Fonts
- Put font packs as subfolders in `fonts/` with `.otf`/`.ttf` files
- Prefer filenames that include `italic`/`bold` or both so variants are detected
- Example structure:
```text
fonts/
├─ CC Wild Words/
│  ├─ CCWildWords-Regular.otf
│  ├─ CCWildWords-Italic.otf
│  ├─ CCWildWords-Bold.otf
│  └─ CCWildWords-BoldItalic.otf
└─ Komika/
   ├─ KOMIKA-HAND.ttf
   └─ KOMIKA-HANDBOLD.ttf
```
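To preview how a font pack's filenames might be read, the sketch below groups the `.ttf`/`.otf` files in one pack folder by whether the filename contains `bold`, `italic`, or both. This is an illustration of the naming convention only: the function name `classify_font_pack` is hypothetical, and the case-insensitive substring match is an assumption about how variants are detected, not the application's actual logic.

```python
from pathlib import Path

FONT_EXTENSIONS = {".ttf", ".otf"}


def classify_font_pack(pack_dir):
    """Group font files in one pack folder by the variant their name suggests.

    Assumption: a variant is recognised by a case-insensitive substring
    match on "bold" and/or "italic" in the filename.
    """
    variants = {}
    for path in sorted(Path(pack_dir).iterdir()):
        if path.suffix.lower() not in FONT_EXTENSIONS:
            continue  # ignore non-font files such as licenses or readmes
        name = path.stem.lower()
        bold = "bold" in name
        italic = "italic" in name
        if bold and italic:
            key = "bold_italic"
        elif bold:
            key = "bold"
        elif italic:
            key = "italic"
        else:
            key = "regular"
        variants.setdefault(key, []).append(path.name)
    return variants
```

Running `classify_font_pack("fonts/Komika")` on the example structure would place `KOMIKA-HANDBOLD.ttf` under `bold` and `KOMIKA-HAND.ttf` under `regular`.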
### LLM setup
- Providers: Google, OpenAI, Anthropic, xAI, DeepSeek, Z.ai, Moonshot AI, Xiaomi MiMo, OpenRouter, OpenAI-Compatible
- Web UI: configure provider/model/key in the Config tab (stored locally)
- CLI: pass keys/URLs as flags or via env vars
- Env vars: `GOOGLE_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`, `DEEPSEEK_API_KEY`, `ZAI_API_KEY`, `MOONSHOT_API_KEY`, `MIMO_API_KEY`, `OPENROUTER_API_KEY`, `OPENAI_COMPATIBLE_API_KEY`
- OpenAI-compatible default URL: `http://localhost:8080/v1`
> [!NOTE]
> YanoljaNEXT-Rosetta models (e.g., `yanolja/YanoljaNEXT-Rosetta-4B-2511-GGUF`) are automatically detected when used via the OpenAI-Compatible provider and receive optimized prompting. These are text-only models, so they require the two-step mode together with a local OCR model. The Special Instructions field is mapped to Rosetta's translation glossary (one entry per line, e.g., `Yanolja NEXT -> 야놀자넥스트`).
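When scripting around the CLI, it can be convenient to check up front whether a key is available for the chosen provider. The sketch below is a hypothetical helper, not the application's code: `resolve_api_key` is an invented name, the pairing of provider names with environment variables is inferred from the order of the two lists above, and the rule that an explicitly passed key takes precedence over the environment is an assumption.

```python
import os

# Assumed pairing, inferred from the order of the provider and env var lists.
PROVIDER_ENV_VARS = {
    "Google": "GOOGLE_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "xAI": "XAI_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "Z.ai": "ZAI_API_KEY",
    "Moonshot AI": "MOONSHOT_API_KEY",
    "Xiaomi MiMo": "MIMO_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
    "OpenAI-Compatible": "OPENAI_COMPATIBLE_API_KEY",
}


def resolve_api_key(provider, explicit_key=None, env=None):
    """Return the key to use for a provider, or None if none is configured.

    Assumption: an explicitly supplied key wins over the environment.
    """
    if provider not in PROVIDER_ENV_VARS:
        raise ValueError(f"Unknown provider: {provider!r}")
    if explicit_key:
        return explicit_key
    env = os.environ if env is None else env
    return env.get(PROVIDER_ENV_VARS[provider]) or None
```

A wrapper script could call `resolve_api_key("Google")` and stop with a clear message when it returns `None`, rather than starting a run that fails at the first request.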
### OSB text setup (optional)
If you want to use the OSB text pipeline, you need a Hugging Face token with access to the following repositories:
- `deepghs/AnimeText_yolo`
#### Steps to create a token:
1. Sign in or create a Hugging Face account
2. Visit and accept the terms on:
   - [AnimeText_yolo](https://huggingface.co/deepghs/AnimeText_yolo)
   - [FLUX.1 Kontext (dev)](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) (optional, if using Kontext with Nunchaku)
   - [SAM 3](https://huggingface.co/facebook/sam3) (optional, if using SAM 3)
3. Create a new access token in your Hugging Face settings with read access to gated repos ("Read access to contents of public gated repos")
4. Add the token to the app:
   - Web UI: set `hf_token` in Config
   - Env var (alternative): set `HUGGINGFACE_TOKEN`
5. Save config to preserve the token across sessions
## Run
### Web UI (Gradio)
- **Portable package:**
  - Windows: Double-click `start-webui.bat` inside the `MangaTranslator` folder
  - Linux/macOS: Run `./start-webui.sh` inside the `MangaTranslator` folder
- **Manual install:**
  - Windows: Run `python app.py --open-browser`
Options: `--models` (default `./models`), `--fonts` (default `./fonts`), `--port` (default `7676`), `--cpu`.
First launch can take ~1–2 minutes.
Once launched, configure your LLM provider in the Config tab, then upload images and click Translate.
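Because the first launch can take a minute or two, a script that starts the Web UI may want to wait until the port accepts connections before continuing. The following is a minimal sketch using only the standard library. The function name `wait_for_port` is hypothetical, the host `127.0.0.1` is an assumption about where the server listens, and the port defaults to the documented `7676`.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=7676, timeout=180.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or the timeout expires.

    Returns True once a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out; wait briefly and retry.
            time.sleep(interval)
    return False
```

A successful result only shows that something is listening on the port, not that the models have finished loading, so treat it as a coarse readiness signal.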
### CLI
Examples:
```bash
# Replace the <...> placeholders with your own values
# Single image, Japanese → English, Google provider
python main.py --input <image_path> \
  --font-dir "fonts/Komika" --provider Google --google-api-key <api_key>
# Batch folder, custom source/target languages, OpenAI-Compatible provider (llama.cpp)
python main.py --input <folder_path> --batch \
  --font-dir "fonts/Komika" \
  --input-language <source_language> --output-language <target_language> \
  --provider OpenAI-Compatible --openai-compatible-url http://localhost:8080/v1 \
  --output ./output
# Single image, Japanese → English (Google), OSB text pipeline, custom OSB text font
python main.py --input <image_path> \
  --font-dir "fonts/Komika" --provider Google --google-api-key <api_key> \
  --osb-enable --osb-font-dir "fonts/Clementine"
# Cleaning-only mode (no translation/text rendering)
python main.py --input <image_path> --cleaning-only
# Upscaling-only mode (no detection/translation, only upscale)
python main.py --input <image_path> --upscaling-only --image-upscale-mode final --image-upscale-factor 2.0
# Test mode (no translation; render placeholder text)
python main.py --input <image_path> --test-mode
# Full options
python main.py --help
```
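For several chapter folders, the batch example can be driven from a small script. The sketch below only builds the argument lists, using flags that appear in the examples above (`--input`, `--batch`, `--font-dir`, `--provider`, `--openai-compatible-url`, `--output`). The function name `build_batch_commands`, the one-subfolder-per-chapter layout, and the per-chapter output folders are assumptions made for illustration. Batch mode may already handle nested folders on its own.

```python
import sys
from pathlib import Path


def build_batch_commands(
    chapters_root,
    output_root,
    font_dir="fonts/Komika",
    url="http://localhost:8080/v1",
):
    """Build one `main.py --batch` command per chapter subfolder.

    Hypothetical helper: assumes each immediate subfolder of chapters_root
    is a chapter, and targets the OpenAI-Compatible provider.
    """
    commands = []
    chapters = sorted(p for p in Path(chapters_root).iterdir() if p.is_dir())
    for chapter in chapters:
        commands.append([
            sys.executable, "main.py",
            "--input", str(chapter), "--batch",
            "--font-dir", font_dir,
            "--provider", "OpenAI-Compatible",
            "--openai-compatible-url", url,
            "--output", str(Path(output_root) / chapter.name),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repo root. Printing the lists first gives a dry run of what would be executed.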
## Documentation
- [Hardware Requirements](docs/HARDWARE_REQUIREMENTS.md)
- [Recommended Fonts](docs/FONTS.md)
- [Troubleshooting](docs/TROUBLESHOOTING.md)
## Updating
### Portable Package
- Windows: Run `update.bat` from the portable package root
- Linux/macOS: Run `./update.sh` from the portable package root
### Manual Install
From the repo root:
```bash
git pull
pip install -r requirements.txt # Activate your venv first if you use one
```
## License & Credits
- License: Apache-2.0 (see [LICENSE](LICENSE))
- Author: [grinnch](https://github.com/meangrinch)
ML Models & Libraries
- YOLOv8m Speech Bubble Detector: [kitsumed](https://huggingface.co/kitsumed/yolov8m_seg-speech-bubble)
- Manga109 Speech Bubble Detector: [huyvux3005](https://huggingface.co/huyvux3005/manga109-segmentation-bubble)
- Comic Speech Bubble Detector YOLOv8m: [ogkalu](https://huggingface.co/ogkalu/comic-speech-bubble-detector-yolov8m)
- Manga109 YOLO: [deepghs](https://huggingface.co/deepghs/manga109_yolo)
- AnimeText YOLO: [deepghs](https://huggingface.co/deepghs/AnimeText_yolo)
- SAM 2.1: Segment Anything in Images and Videos: [Meta AI](https://huggingface.co/facebook/sam2.1-hiera-large)
- SAM 3: [Meta AI](https://huggingface.co/facebook/sam3)
- FLUX.1 Kontext: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev)
- FLUX.2 Klein 4B: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B)
- FLUX.2 Klein 9B: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B)
- Nunchaku: [Nunchaku AI](https://github.com/nunchaku-ai/nunchaku)
- SDNQ Quants: [Disty0](https://huggingface.co/Disty0)
- 2x-AnimeSharpV4: [Kim2091](https://huggingface.co/Kim2091/2x-AnimeSharpV4)
- Manga OCR: [kha-white](https://github.com/kha-white/manga-ocr)
- PaddleOCR-VL-1.5: [PaddlePaddle](https://github.com/PaddlePaddle/PaddleOCR-VL-1.5)