https://github.com/meangrinch/mangatranslator

Manga translation app powered by AI
Host: GitHub
URL: https://github.com/meangrinch/mangatranslator
Owner: meangrinch
License: apache-2.0
Created: 2025-04-08T02:00:30.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2026-05-14T01:35:32.000Z (about 2 months ago)
Last Synced: 2026-05-14T03:28:58.499Z (about 2 months ago)
Topics: ai, auto-translation, comics, inpainting, manga, manga-translator, manhua, manhwa, ocr, segmentation, text-detection, translation
Language: Python
Homepage:
Size: 2.66 MB
Stars: 208
Watchers: 4
Forks: 36
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

[English](README.md) | [简体中文](docs/translations/README_zh.md) | [한국어](docs/translations/README_ko.md) | [日本語](docs/translations/README_ja.md)

## MangaTranslator

Gradio-based web application for automating the translation of manga/comic page images using AI. Targets speech bubbles and text outside of speech bubbles. Supports 59 languages and custom font pack usage.



  

    

_[Example images: original page and translated page (w/ a single click)]_

## Table of Contents

- [Features](#features)

- [Requirements](#requirements)

- [Install](#install)

- [Post-Install Setup](#post-install-setup)

- [Run](#run)

- [Documentation](#documentation)

- [Updating](#updating)

- [License & Credits](#license--credits)

## Features

- **Detection**: Speech bubble detection & segmentation (YOLO + SAM 2.1/3)

- **Cleaning**: Inpaint speech bubbles and OSB text (Flux.2 Klein, Flux.1 Kontext, or OpenCV)

- **Translation**: LLM-powered OCR & translation (59 languages)

- **Rendering**: Text rendering with alignment and custom font packs

- **Upscaling**: 2x-AnimeSharpV4 for enhanced output quality

- **Processing**: Single/batch processing with directory preservation and ZIP support

- **Interfaces**: Web UI (Gradio) and CLI

- **Automation**: One-click translation; no intervention required

## Requirements

- Python 3.10+

- PyTorch (CPU, CUDA, ROCm, XPU, MPS)

- Font pack with `.ttf`/`.otf` files; included with portable package

- LLM for Japanese source text; VLM for other languages (API or local)

## Install

### Portable Package (Recommended)

Download the standalone zip from the releases page: [Portable Build](https://github.com/meangrinch/MangaTranslator/releases/tag/portable)

**Requirements:**

- **Windows:** Bundled Python/Git included; no additional requirements

- **Linux/macOS:** Python 3.10+ and Git must be installed on your system

**Setup:**

1. Extract the zip file

2. Run the setup script for your platform:

   - **Windows:** Double-click `setup.bat`

   - **Linux/macOS:** Run `./setup.sh` in terminal

3. PyTorch version is automatically detected and installed based on your system

4. Open the launcher script created in `./MangaTranslator/`:

   - **Windows:** `start-webui.bat`

   - **Linux/macOS:** `start-webui.sh`

Included font packs:

- _Komika_ (normal text)

- _Cookies_ (OSB text)

- _Comicka_ (either)

- _Roboto_ (supports accents)

- _Noto Sans SC_ (supports Simplified Chinese)

> [!TIP]
> If you need to move to a fresh portable package:
>
> - You can safely move the `fonts`, `models`, and `output` directories to the new portable package
> - You may also be able to move the `runtime` directory, provided you want the same setup configuration

### Manual Install

1. Clone and enter the repo

```bash
git clone https://github.com/meangrinch/MangaTranslator.git
cd MangaTranslator
```

2. Create and activate a virtual environment (recommended)

```bash
python -m venv venv

# Windows PowerShell/CMD
.\venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
```

3. Install PyTorch (see: [PyTorch Install](https://pytorch.org/get-started/locally/))

```bash
# Example (CUDA 13.0)
pip install torch==2.10.0+cu130 torchvision==0.25.0+cu130 --extra-index-url https://download.pytorch.org/whl/cu130

# Example (ROCm 7.1)
pip install torch==2.10.0+rocm7.1 torchvision==0.25.0+rocm7.1 --extra-index-url https://download.pytorch.org/whl/rocm7.1

# Example (XPU)
pip install torch==2.10.0+xpu torchvision==0.25.0+xpu --extra-index-url https://download.pytorch.org/whl/xpu

# Example (MPS/CPU)
pip install torch==2.10.0 torchvision==0.25.0
```

4. Install Nunchaku (optional, for Flux.1 Kontext Nunchaku backend)

- Nunchaku wheels are not on PyPI. Install directly from the v1.2.1 GitHub release URL, choosing the wheel that matches your OS, Python, PyTorch, and CUDA versions. Nunchaku supports CUDA only and requires an NVIDIA 2000-series card or newer.

```bash
# Example (Windows, Python 3.13, PyTorch 2.10.0, CUDA 13.0)
pip install https://github.com/nunchaku-ai/nunchaku/releases/download/v1.2.1/nunchaku-1.2.1+cu13.0torch2.10-cp313-cp313-win_amd64.whl
```

> [!NOTE]
> Nunchaku is not necessary for the use of Flux models via the SDNQ backend.

5. Install dependencies

```bash
pip install -r requirements.txt
```
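
Once the steps above are complete, it can help to confirm which backend PyTorch actually sees. The following is a minimal sketch and not part of MangaTranslator: the helper name `detect_backend` and its return labels are hypothetical, and it relies only on PyTorch's standard availability checks (ROCm builds report through the CUDA API with `torch.version.hip` set).

```python
def detect_backend() -> str:
    """Return a short label for the accelerator PyTorch can use."""
    try:
        import torch
    except ImportError:
        return "torch-not-installed"

    if torch.cuda.is_available():
        # ROCm builds expose the CUDA API; torch.version.hip is set only there.
        return "rocm" if getattr(torch.version, "hip", None) else "cuda"

    xpu = getattr(torch, "xpu", None)  # absent in older PyTorch releases
    if xpu is not None and xpu.is_available():
        return "xpu"

    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(detect_backend())
```

If this prints `cpu` on a machine with a supported GPU, the installed wheel most likely does not match the intended backend.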

## Post-Install Setup

### Models

- The application will automatically download and use all required models

### Fonts

- Put font packs as subfolders in `fonts/` with `.otf`/`.ttf` files

- Prefer filenames that include `italic`/`bold` or both so variants are detected

- Example structure:

```text
fonts/
├─ CC Wild Words/
│  ├─ CCWildWords-Regular.otf
│  ├─ CCWildWords-Italic.otf
│  ├─ CCWildWords-Bold.otf
│  └─ CCWildWords-BoldItalic.otf
└─ Komika/
   ├─ KOMIKA-HAND.ttf
   └─ KOMIKA-HANDBOLD.ttf
```
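
To check a font pack before using it, the naming convention above can be approximated with a short script. This is only a sketch under the assumption that variants are recognised by the substrings `bold` and `italic` in the filename; the application's real detection logic may differ, and `classify_variant` and `scan_font_pack` are hypothetical names.

```python
from pathlib import Path

FONT_EXTENSIONS = {".ttf", ".otf"}


def classify_variant(filename: str) -> str:
    """Guess a font variant from keywords in the filename (case-insensitive)."""
    name = Path(filename).stem.lower()
    bold = "bold" in name
    italic = "italic" in name
    if bold and italic:
        return "bold_italic"
    if bold:
        return "bold"
    if italic:
        return "italic"
    return "regular"


def scan_font_pack(pack_dir: str) -> dict[str, list[str]]:
    """Group the .ttf/.otf files of one font pack folder by guessed variant."""
    variants: dict[str, list[str]] = {}
    for path in sorted(Path(pack_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in FONT_EXTENSIONS:
            variants.setdefault(classify_variant(path.name), []).append(path.name)
    return variants
```

For example, `scan_font_pack("fonts/Komika")` groups the two Komika files from the structure above under `regular` and `bold`.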

### LLM setup

- Providers: Google, OpenAI, Anthropic, xAI, DeepSeek, Z.ai, Moonshot AI, Xiaomi MiMo, OpenRouter, OpenAI-Compatible

- Web UI: configure provider/model/key in the Config tab (stored locally)

- CLI: pass keys/URLs as flags or via env vars

- Env vars: `GOOGLE_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`, `DEEPSEEK_API_KEY`, `ZAI_API_KEY`, `MOONSHOT_API_KEY`, `MIMO_API_KEY`, `OPENROUTER_API_KEY`, `OPENAI_COMPATIBLE_API_KEY`

- OpenAI-compatible default URL: `http://localhost:8080/v1`
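
When scripting around the CLI, it can be useful to see which providers already have a key in the environment. The sketch below is an illustration only: the pairing of provider names with variables follows the order of the two lists above and is an assumption, and `configured_providers` is a hypothetical helper rather than part of the application.

```python
import os

# Assumed pairing of provider names and environment variables,
# taken from the order in which both are listed above.
PROVIDER_ENV_VARS = {
    "Google": "GOOGLE_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "xAI": "XAI_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
    "Z.ai": "ZAI_API_KEY",
    "Moonshot AI": "MOONSHOT_API_KEY",
    "Xiaomi MiMo": "MIMO_API_KEY",
    "OpenRouter": "OPENROUTER_API_KEY",
    "OpenAI-Compatible": "OPENAI_COMPATIBLE_API_KEY",
}


def configured_providers(env=None) -> list[str]:
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items() if env.get(var)]


if __name__ == "__main__":
    print(configured_providers())
```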

> [!NOTE]
> YanoljaNEXT-Rosetta models (e.g., `yanolja/YanoljaNEXT-Rosetta-4B-2511-GGUF`) are automatically detected when used via the OpenAI-Compatible provider and receive optimized prompting. These are text-only models and require two-step + local OCR model. The Special Instructions field is mapped to Rosetta's translation glossary (one entry per line, e.g., `Yanolja NEXT -> 야놀자넥스트`).
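
Glossary entries in the `source -> target` form shown in the note can be sanity-checked before pasting them into the Special Instructions field. How the application itself parses the field is not documented here, so the following is only a sketch; `parse_glossary` is a hypothetical helper that keeps well-formed lines and skips the rest.

```python
def parse_glossary(text: str) -> dict[str, str]:
    """Parse 'source -> target' lines, skipping blank or malformed lines."""
    glossary: dict[str, str] = {}
    for line in text.splitlines():
        # partition splits on the first "->" only; sep is "" when it is missing.
        source, sep, target = line.partition("->")
        source, target = source.strip(), target.strip()
        if sep and source and target:
            glossary[source] = target
    return glossary


if __name__ == "__main__":
    print(parse_glossary("Yanolja NEXT -> 야놀자넥스트"))
```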

### OSB text setup (optional)

If you want to use the OSB text pipeline, you need a Hugging Face token with access to the following repositories:

- `deepghs/AnimeText_yolo`

#### Steps to create a token:

1. Sign in or create a Hugging Face account

2. Visit and accept the terms on:

   - [AnimeText_yolo](https://huggingface.co/deepghs/AnimeText_yolo)

   - [FLUX.1 Kontext (dev)](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev) (optional, if using Kontext with Nunchaku)

   - [SAM 3](https://huggingface.co/facebook/sam3) (optional, if using SAM 3)

3. Create a new access token in your Hugging Face settings with read access to gated repos ("Read access to contents of public gated repos")

4. Add the token to the app:

   - Web UI: set `hf_token` in Config

   - Env var (alternative): set `HUGGINGFACE_TOKEN`

5. Save config to preserve the token across sessions

## Run

### Web UI (Gradio)

- **Portable package:**

  - Windows: Double-click `start-webui.bat` inside the `MangaTranslator` folder

  - Linux/macOS: Run `./start-webui.sh` inside the `MangaTranslator` folder

- **Manual install:**

  - Windows: Run `python app.py --open-browser`

Options: `--models` (default `./models`), `--fonts` (default `./fonts`), `--port` (default `7676`), `--cpu`.

First launch can take ~1–2 minutes.

Once launched, configure your LLM provider in the Config tab, then upload images and click Translate.

### CLI

Examples:

```bash
# Single image, Japanese → English, Google provider
python main.py --input <image_path> \
  --font-dir "fonts/Komika" --provider Google --google-api-key <YOUR_API_KEY>

# Batch folder, custom source/target languages, OpenAI-Compatible provider (llama.cpp)
python main.py --input <folder_path> --batch \
  --font-dir "fonts/Komika" \
  --input-language <src_lang> --output-language <tgt_lang> \
  --provider OpenAI-Compatible --openai-compatible-url http://localhost:8080/v1 \
  --output ./output

# Single image, Japanese → English (Google), OSB text pipeline, custom OSB text font
python main.py --input <image_path> \
  --font-dir "fonts/Komika" --provider Google --google-api-key <YOUR_API_KEY> \
  --osb-enable --osb-font-dir "fonts/Clementine"

# Cleaning-only mode (no translation/text rendering)
python main.py --input <image_path> --cleaning-only

# Upscaling-only mode (no detection/translation, only upscale)
python main.py --input <image_path> --upscaling-only --image-upscale-mode final --image-upscale-factor 2.0

# Test mode (no translation; render placeholder text)
python main.py --input <image_path> --test-mode

# Full options
python main.py --help
```
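
The batch example above can also be driven from Python, for instance to process several chapter folders in a row. This is a minimal sketch that uses only flags shown in the examples. It assumes the script runs from the repo root and that the API key comes from an environment variable such as `GOOGLE_API_KEY`; `build_command` and `translate_folders` are hypothetical helpers.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_path, font_dir, provider, output_dir=None,
                  batch=False, extra_args=()):
    """Assemble a main.py invocation from flags shown in the CLI examples."""
    cmd = [sys.executable, "main.py", "--input", str(input_path),
           "--font-dir", str(font_dir), "--provider", provider]
    if batch:
        cmd.append("--batch")
    if output_dir is not None:
        cmd += ["--output", str(output_dir)]
    cmd += list(extra_args)
    return cmd


def translate_folders(folders, font_dir, provider, output_root):
    """Run one batch job per folder; raises CalledProcessError on failure."""
    for folder in folders:
        out = Path(output_root) / Path(folder).name
        subprocess.run(build_command(folder, font_dir, provider, out, batch=True),
                       check=True)
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces, such as `fonts/CC Wild Words`.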

## Documentation

- [Hardware Requirements](docs/HARDWARE_REQUIREMENTS.md)

- [Recommended Fonts](docs/FONTS.md)

- [Troubleshooting](docs/TROUBLESHOOTING.md)

## Updating

### Portable Package

- Windows: Run `update.bat` from the portable package root

- Linux/macOS: Run `./update.sh` from the portable package root

### Manual Install

From the repo root:

```bash
git pull
pip install -r requirements.txt  # Activate the venv first if you use one
```

## License & Credits

- License: Apache-2.0 (see [LICENSE](LICENSE))

- Author: [grinnch](https://github.com/meangrinch)

ML Models & Libraries

- YOLOv8m Speech Bubble Detector: [kitsumed](https://huggingface.co/kitsumed/yolov8m_seg-speech-bubble)

- Manga109 Speech Bubble Detector: [huyvux3005](https://huggingface.co/huyvux3005/manga109-segmentation-bubble)

- Comic Speech Bubble Detector YOLOv8m: [ogkalu](https://huggingface.co/ogkalu/comic-speech-bubble-detector-yolov8m)

- Manga109 YOLO: [deepghs](https://huggingface.co/deepghs/manga109_yolo)

- AnimeText YOLO: [deepghs](https://huggingface.co/deepghs/AnimeText_yolo)

- SAM 2.1: Segment Anything in Images and Videos: [Meta AI](https://huggingface.co/facebook/sam2.1-hiera-large)

- SAM 3: [Meta AI](https://huggingface.co/facebook/sam3)

- FLUX.1 Kontext: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev)

- FLUX.2 Klein 4B: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-klein-4B)

- FLUX.2 Klein 9B: [Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B)

- Nunchaku: [Nunchaku AI](https://github.com/nunchaku-ai/nunchaku)

- SDNQ Quants: [Disty0](https://huggingface.co/Disty0)

- 2x-AnimeSharpV4: [Kim2091](https://huggingface.co/Kim2091/2x-AnimeSharpV4)

- Manga OCR: [kha-white](https://github.com/kha-white/manga-ocr)

- PaddleOCR-VL-1.5: [PaddlePaddle](https://github.com/PaddlePaddle/PaddleOCR-VL-1.5)