An open API service indexing awesome lists of open source software.

https://github.com/meangrinch/mangatranslator

Manga translation app powered by AI
https://github.com/meangrinch/mangatranslator

ai auto-translation comics inpainting manga manga-translator manhua manhwa ocr segmentation text-detection translation

Last synced: 6 days ago
JSON representation

Manga translation app powered by AI

Awesome Lists containing this project

README

          

[English](README.md) | [简体中文](docs/translations/README_zh.md) | [한국어](docs/translations/README_ko.md) | [日本語](docs/translations/README_ja.md)

## MangaTranslator

Gradio-based web application for automating the translation of manga/comic page images using AI. Targets speech bubbles and text outside of speech bubbles. Supports 59 languages and custom font pack usage.




Original
Translated (w/ a single click)






## Table of Contents

- [Features](#features)
- [Requirements](#requirements)
- [Install](#install)
- [Post-Install Setup](#post-install-setup)
- [Run](#run)
- [Documentation](#documentation)
- [Updating](#updating)
- [License & Credits](#license--credits)

## Features

- **Detection**: Speech bubble detection & segmentation (YOLO + SAM 2.1/3)
- **Cleaning**: Inpaint speech bubbles and OSB text (Flux.2 Klein, Flux.1 Kontext, or OpenCV)
- **Translation**: LLM-powered OCR & translation (59 languages)
- **Rendering**: Text rendering with alignment and custom font packs
- **Upscaling**: 2x-AnimeSharpV4 for enhanced output quality
- **Processing**: Single/batch processing with directory preservation and ZIP support
- **Interfaces**: Web UI (Gradio) and CLI
- **Automation**: One-click translation; no intervention required

## Requirements

- Python 3.10+
- PyTorch (CPU, CUDA, ROCm, XPU, MPS)
- Font pack with `.ttf`/`.otf` files; included with portable package
- LLM for Japanese source text; VLM for other languages (API or local)

## Install

### Portable Package (Recommended)

Download the standalone zip from the releases page: [Portable Build](https://github.com/meangrinch/MangaTranslator/releases/tag/portable)

**Requirements:**

- **Windows:** Bundled Python/Git included; no additional requirements
- **Linux/macOS:** Python 3.10+ and Git must be installed on your system

**Setup:**

1. Extract the zip file
2. Run the setup script for your platform:
- **Windows:** Double-click `setup.bat`
- **Linux/macOS:** Run `./setup.sh` in terminal
3. PyTorch version is automatically detected and installed based on your system
4. Open the launcher script created in `./MangaTranslator/`:
- **Windows:** `start-webui.bat`
- **Linux/macOS:** `start-webui.sh`

Included font packs:

- _Komika_ (normal text)
- _Cookies_ (OSB text)
- _Comicka_ (either)
- _Roboto_ (supports accents)
- _Noto Sans SC_ (supports Simplified Chinese)

> [!TIP]
> In the event that you need to transfer to a fresh portable package:
>
> - You can safely move the `fonts`, `models`, and `output` directories to the new portable package
> - You might be able to move the `runtime` directory over, assuming the same setup configuration is wanted

### Manual install

1. Clone and enter the repo

```bash
git clone https://github.com/meangrinch/MangaTranslator.git
cd MangaTranslator
```

2. Create and activate a virtual environment (recommended)

```bash
python -m venv venv
# Windows PowerShell/CMD
.\venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
```

3. Install PyTorch (see: [PyTorch Install](https://pytorch.org/get-started/locally/))

```bash
# Example (CUDA 13.0)
pip install torch==2.10.0+cu130 torchvision==0.25.0+cu130 --extra-index-url https://download.pytorch.org/whl/cu130
# Example (ROCm 7.1)
pip install torch==2.10.0+rocm7.1 torchvision==0.25.0+rocm7.1 --extra-index-url https://download.pytorch.org/whl/rocm7.1
# Example (XPU)
pip install torch==2.10.0+xpu torchvision==0.25.0+xpu --extra-index-url https://download.pytorch.org/whl/xpu
# Example (MPS/CPU)
pip install torch==2.10.0 torchvision==0.25.0
```

4. Install Nunchaku (optional, for Flux.1 Kontext Nunchaku backend)

- Nunchaku wheels are not on PyPI. Install directly from the v1.2.1 GitHub release URL, matching your OS and Python version. CUDA only, and requires a 2000-series card or newer.

```bash
# Example (Windows, Python 3.13, PyTorch 2.10.0, CUDA 13.0)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/v1.2.1/nunchaku-1.2.1+cu13.0torch2.10-cp313-cp313-win_amd64.whl
```

> [!NOTE]
> Nunchaku is not necessary for the use of Flux models via the SDNQ backend.

5. Install dependencies

```bash
pip install -r requirements.txt
```

## Post-Install Setup

### Models

- The application will automatically download and use all required models

### Fonts

- Put font packs as subfolders in `fonts/` with `.otf`/`.ttf` files
- Prefer filenames that include `italic`/`bold` or both so variants are detected
- Example structure:

```text
fonts/
├─ CC Wild Words/
│ ├─ CCWildWords-Regular.otf
│ ├─ CCWildWords-Italic.otf
│ ├─ CCWildWords-Bold.otf
│ └─ CCWildWords-BoldItalic.otf
└─ Komika/
├─ KOMIKA-HAND.ttf
└─ KOMIKA-HANDBOLD.ttf
```

### LLM setup

- Providers: Google, OpenAI, Anthropic, xAI, DeepSeek, Z.ai, Moonshot AI, Xiaomi MiMo, OpenRouter, OpenAI-Compatible
- Web UI: configure provider/model/key in the Config tab (stored locally)
- CLI: pass keys/URLs as flags or via env vars
- Env vars: `GOOGLE_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`, `DEEPSEEK_API_KEY`, `ZAI_API_KEY`, `MOONSHOT_API_KEY`, `MIMO_API_KEY`, `OPENROUTER_API_KEY`, `OPENAI_COMPATIBLE_API_KEY`
- OpenAI-compatible default URL: `http://localhost:8080/v1`

> [!NOTE]
> YanoljaNEXT-Rosetta models (e.g., `yanolja/YanoljaNEXT-Rosetta-4B-2511-GGUF`) are automatically detected when used via the OpenAI-Compatible provider and receive optimized prompting. These are text-only models and require two-step + local OCR model. The Special Instructions field is mapped to Rosetta's translation glossary (one entry per line, e.g., `Yanolja NEXT -> 야놀자넥스트`).

### OSB text setup (optional)

If you want to use the OSB text pipeline, you need a Hugging Face token with access to the following repositories:

- `deepghs/AnimeText_yolo`

#### Steps to create a token:

1. Sign in or create a Hugging Face account
2. Visit and accept the terms on:
- [AnimeText_yolo](https://huggingface.co/deepghs/AnimeText_yolo)
- [FLUX.1 Kontext (dev)](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) (optional, if using Kontext with Nunchaku)
- [SAM 3](https://huggingface.co/facebook/sam3) (optional, if using SAM 3)
3. Create a new access token in your Hugging Face settings with read access to gated repos ("Read access to contents of public gated repos")
4. Add the token to the app:
- Web UI: set `hf_token` in Config
- Env var (alternative): set `HUGGINGFACE_TOKEN`
5. Save config to preserve the token across sessions

## Run

### Web UI (Gradio)

- **Portable package:**
- Windows: Double-click `start-webui.bat` inside the `MangaTranslator` folder
- Linux/macOS: Run `./start-webui.sh` inside the `MangaTranslator` folder
- **Manual install:**
- Windows: Run `python app.py --open-browser`

Options: `--models` (default `./models`), `--fonts` (default `./fonts`), `--port` (default `7676`), `--cpu`.
First launch can take ~1–2 minutes.

Once launched, configure your LLM provider in the Config tab, then upload images and click Translate.

### CLI

Examples:

```bash
# Single image, Japanese → English, Google provider
python main.py --input \
--font-dir "fonts/Komika" --provider Google --google-api-key

# Batch folder, custom source/target languages, OpenAI-Compatible provider (llama.cpp)
python main.py --input --batch \
--font-dir "fonts/Komika" \
--input-language --output-language \
--provider OpenAI-Compatible --openai-compatible-url http://localhost:8080/v1 \
--output ./output

# Single Image, Japanese → English (Google), OSB text pipeline, custom OSB text font
python main.py --input \
--font-dir "fonts/Komika" --provider Google --google-api-key \
--osb-enable --osb-font-dir "fonts/Clementine"

# Cleaning-only mode (no translation/text rendering)
python main.py --input --cleaning-only

# Upscaling-only mode (no detection/translation, only upscale)
python main.py --input --upscaling-only --image-upscale-mode final --image-upscale-factor 2.0

# Test mode (no translation; render placeholder text)
python main.py --input --test-mode

# Full options
python main.py --help
```

## Documentation

- [Hardware Requirements](docs/HARDWARE_REQUIREMENTS.md)
- [Recommended Fonts](docs/FONTS.md)
- [Troubleshooting](docs/TROUBLESHOOTING.md)

## Updating

### Portable Package

- Windows: Run `update.bat` from the portable package root
- Linux/macOS: Run `./update.sh` from the portable package root

### Manual Install

From the repo root:

```bash
git pull
pip install -r requirements.txt # Or activate venv first if present
```

## License & credits

- License: Apache-2.0 (see [LICENSE](LICENSE))
- Author: [grinnch](https://github.com/meangrinch)

ML Models & Libraries

- YOLOv8m Speech Bubble Detector: [kitsumed](https://huggingface.co/kitsumed/yolov8m_seg-speech-bubble)
- Manga109 Speech Bubble Detector: [huyvux3005](https://huggingface.co/huyvux3005/manga109-segmentation-bubble)
- Comic Speech Bubble Detector YOLOv8m: [ogkalu](https://huggingface.co/ogkalu/comic-speech-bubble-detector-yolov8m)
- Manga109 YOLO: [deepghs](https://huggingface.co/deepghs/manga109_yolo)
- AnimeText YOLO: [deepghs](https://huggingface.co/deepghs/AnimeText_yolo)
- SAM 2.1: Segment Anything in Images and Videos: [Meta AI](https://huggingface.co/facebook/sam2.1-hiera-large)
- SAM 3: [Meta AI](https://huggingface.co/facebook/sam3)
- FLUX.1 Kontext: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev)
- FLUX.2 Klein 4B: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B)
- FLUX.2 Klein 9B: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B)
- Nunchaku: [Nunchaku AI](https://github.com/nunchaku-ai/nunchaku)
- SDNQ Quants: [Disty0](https://huggingface.co/Disty0)
- 2x-AnimeSharpV4: [Kim2091](https://huggingface.co/Kim2091/2x-AnimeSharpV4)
- Manga OCR: [kha-white](https://github.com/kha-white/manga-ocr)
- PaddleOCR-VL-1.5: [PaddlePaddle](https://github.com/PaddlePaddle/PaddleOCR-VL-1.5)