An open API service indexing awesome lists of open source software.

https://github.com/ogkalu2/comic-translate

Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.
https://github.com/ogkalu2/comic-translate

anime comics computer-vision deep-learning gui inpainting machine-translation manga manhua manhwa neural-network ocr pyside6 python pytorch segmentation text-detection text-segmentation translation webtoons

Last synced: 22 days ago
JSON representation

Desktop app for automatically translating comics - BDs, Manga, Manhwa, Fumetti and more in a variety of formats (Image, Pdf, Epub, cbr, cbz, etc) and in multiple languages.

Awesome Lists containing this project

README

          

# Comic Translate
English | [한국어](docs/README_ko.md) | [Français](docs/README_fr.md) | [简体中文](docs/README_zh-CN.md) | [日本語](docs/README_ja.md) | [Português Brasileiro](docs/README_pt-BR.md)

## Intro
Many Automatic Manga Translators exist. Very few properly support comics of other kinds in other languages.
This project was created to utilize the ability of State of the Art (SOTA) Large Language Models (LLMs) like GPT-4 and translate comics from all over the world. Currently, it supports translating to and from English, Korean, Japanese, French, Simplified Chinese, Traditional Chinese, Russian, German, Dutch, Spanish and Italian. It can translate to (but not from) Turkish, Polish, Portuguese and Brazillian Portuguese.

- [The State of Machine Translation](#the-state-of-machine-translation)
- [Preview](#comic-samples)
- [Getting Started](#installation)
- [Installation](#installation)
- [Python](#python)
- [Usage](#usage)
- [Tips](#tips)
- [API keys](#api-keys)
- [Getting API Keys](#getting-api-keys)
- [Open AI](#open-ai-gpt)
- [Google Cloud Vision](#google-cloud-vision)

- [How it works](#how-it-works)
- [Text Detection](#text-detection)
- [OCR](#OCR)
- [Inpainting](#inpainting)
- [Translation](#translation)
- [Text Rendering](#text-rendering)

- [Acknowledgements](#acknowledgements)

## The State of Machine Translation
For a couple dozen languages, the best Machine Translator is not Google Translate, Papago or even DeepL, but a SOTA LLM like GPT-4o, and by far.
This is very apparent for distant language pairs (Korean<->English, Japanese<->English etc) where other translators still often devolve into gibberish.
Excerpt from "The Walking Practice"(보행 연습) by Dolki Min(돌기민)
![Model](https://i.imgur.com/72jvLBa.png)

## Comic Samples
GPT-4 as Translator.
Note: Some of these also have Official English Translations

[The Wretched of the High Seas](https://www.drakoo.fr/bd/drakoo/les_damnes_du_grand_large/les_damnes_du_grand_large_-_histoire_complete/9782382330128)

[Journey to the West](https://ac.qq.com/Comic/comicInfo/id/541812)

[The Wormworld Saga](https://wormworldsaga.com/index.php)

[Frieren: Beyond Journey's End](https://renta.papy.co.jp/renta/sc/frm/item/220775/title/742932/)

[Days of Sand](https://9ekunst.nl/2021/05/20/nieuw-album-van-aimee-de-jongh-is-benauwend-als-een-zandstorm/)

[Player (OH Hyeon-Jun)](https://comic.naver.com/webtoon/list?titleId=745876&page=1&sort=ASC&tab=fri)

[Carbon & Silicon](https://www.amazon.com/Carbone-Silicium-French-Mathieu-Bablet-ebook/dp/B0C1LTGZ85/)

## Installation
### Python
Install Python 3.12. Tick "Add python.exe to PATH" during the setup.
```bash
https://www.python.org/downloads/
```
Install git
```bash
https://git-scm.com/
```
Install uv
```
https://docs.astral.sh/uv/getting-started/installation/
```

Then, in the command line
```bash
git clone https://github.com/ogkalu2/comic-translate
cd comic-translate
uv init --python 3.12
```
and install the requirements
```bash
uv add -r requirements.txt --compile-bytecode
```

To Update, run this in the comic-translate folder
```bash
git pull
uv init --python 3.12 (Note: only run this line if you did not use uv for the first time installation)
uv add -r requirements.txt --compile-bytecode
```

If you have an NVIDIA GPU, then it is recommended to run
```bash
uv pip install onnxruntime-gpu
```

## Usage
In the comic-translate directory, run
```bash
uv run comic.py
```
This will launch the GUI

### Tips
* If you have a CBR file, you'll need to install Winrar or 7-Zip then add the folder it's installed to (e.g "C:\Program Files\WinRAR" for Windows) to Path. If it's installed but not to Path, you may get the error,
```bash
raise RarCannotExec("Cannot find working tool")
```
In that case, Instructions for [Windows](https://www.windowsdigitals.com/add-folder-to-path-environment-variable-in-windows-11-10/), [Linux](https://linuxize.com/post/how-to-add-directory-to-path-in-linux/), [Mac](https://techpp.com/2021/09/08/set-path-variable-in-macos-guide/)

* Make sure the selected Font supports characters of the target language
* v2.0 introduces a Manual Mode. When you run into issues with Automatic Mode (No text detected, Incorrect OCR, Insufficient Cleaning etc), you are now able to make corrections. Simply Undo the Image and toggle Manual Mode.
* In Automatic Mode, Once an Image has been processed, it is loaded in the Viewer or stored to be loaded on switch so you can keep reading in the app as the other Images are being translated.
* Ctrl + Mouse Wheel to Zoom otherwise Vertical Scrolling
* The Usual Trackpad Gestures work for viewing the Image
* Right, Left Keys to Navigate Between Images

## API Keys
To following selections will require access to closed resources and subsequently, API Keys:
* GPT-4o or 4o-mini for Translation (Paid, about $0.01 USD/Page for 4o)
* DeepL Translator (Free for 500,000 characters/month)
* GPT-4o for OCR (Default Option for French, Russian, German, Dutch, Spanish, Italian) (Paid, about $0.02 USD/Page)
* Microsoft Azure Vision for OCR (Free for 5000 images/month)
* Google Cloud Vision for OCR (Free for 1000 images/month)
You can set your API Keys by going to Settings > Credentials

### Getting API Keys
#### Open AI (GPT)
* Go to OpenAI's Platform website at [platform.openai.com](https://platform.openai.com/) and sign in with (or create) an OpenAI account.
* Hover your Mouse over the right taskbar of the page and select "API Keys."
* Click "Create New Secret Key" to generate a new API key. Copy and store it.

#### Google Cloud Vision
* Sign in/Create a [Google Cloud](https://cloud.google.com/) account. Go to [Cloud Resource Manager](https://console.cloud.google.com/cloud-resource-manager) and click "Create Project". Set your project name.
* [Select your project here](https://console.cloud.google.com/welcome) then select "Billing" then "Create Account". In the pop-up, "Enable billing account", and accept the offer of a free trial account. Your "Account type" should be individual. Fill in a valid credit card.
* Enable Google Cloud Vison for your project [here](https://console.cloud.google.com/apis/library/vision.googleapis.com)
* In the [Google Cloud Credentials](https://console.cloud.google.com/apis/credentials) page, click "Create Credentials" then API Key. Copy and store it.

## How it works
### Speech Bubble Detection and Text Segmentation
[bubble-and-text-detector](https://huggingface.co/ogkalu/comic-text-and-bubble-detector). RT-DETR-v2 model trained on 11k images of comics (Manga, Webtoons, Western).
Algorithmic segmentation based on the boxes provided from the detection model.

### OCR
By Default:
* [manga-ocr](https://github.com/kha-white/manga-ocr) for Japanese
* [Pororo](https://github.com/yunwoong7/korean_ocr_using_pororo) for Korean
* [PPOCRv5](https://www.paddleocr.ai/main/en/version3.x/algorithm/PP-OCRv5/PP-OCRv5.html) for Everything Else

Optional:

These can be used for any of the supported languages. An API Key is required.

* [Google Cloud Vision](https://cloud.google.com/vision/docs/ocr)
* [Microsoft Azure Vision](https://learn.microsoft.com/en-us/azure/ai-services/computer-vision/overview-ocr)

### Inpainting
To remove the segmented text
* A [Manga/Anime finetuned](https://huggingface.co/dreMaz/AnimeMangaInpainting) [lama](https://github.com/advimman/lama) checkpoint. Implementation courtsey of [lama-cleaner](https://github.com/Sanster/lama-cleaner)
* [AOT-GAN](https://arxiv.org/abs/2104.01431) based model by [zyddnys](https://github.com/zyddnys)

### Translation
Currently, this supports using GPT-4.1, DeepL, Claude-3,
Gemini-2.5, Yandex, Google Translate and Microsoft Azure Translator.

All LLMs are fed the entire page text to aid translations.
There is also the Option to provide the Image itself for further context.

### Text Rendering
Wrapped text in bounding boxes obtained from bubbles and text.

## Acknowledgements

* [https://github.com/Sanster/lama-cleaner](https://github.com/Sanster/lama-cleaner)
* [https://huggingface.co/dreMaz](https://huggingface.co/dreMaz)
* [https://github.com/yunwoong7/korean_ocr_using_pororo](https://github.com/yunwoong7/korean_ocr_using_pororo)
* [https://github.com/kha-white/manga-ocr](https://github.com/kha-white/manga-ocr)
* [https://github.com/JaidedAI/EasyOCR](https://github.com/JaidedAI/EasyOCR)
* [https://github.com/PaddlePaddle/PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)
* [https://github.com/RapidAI/RapidOCR](https://github.com/RapidAI/RapidOCR)
* [https://github.com/phenom-films/dayu_widgets](https://github.com/phenom-films/dayu_widgets)