https://github.com/piotrminkina/epub-deepl
Round-trip EPUB ↔ HTML translation via DeepL — bundles all content into one HTML to conserve your monthly quota, then restores a structurally-identical EPUB with TOC, OPF, NCX, SVG, and Unicode integrity preserved.
automation cli deepl ebooks epub epubcheck html localization lxml python translation
- Host: GitHub
- URL: https://github.com/piotrminkina/epub-deepl
- Owner: piotrminkina
- License: mit
- Created: 2026-06-10T00:48:16.000Z (15 days ago)
- Default Branch: master
- Last Pushed: 2026-06-10T01:39:29.000Z (15 days ago)
- Last Synced: 2026-06-10T03:14:29.223Z (14 days ago)
- Topics: automation, cli, deepl, ebooks, epub, epubcheck, html, localization, lxml, python, translation
- Language: Python
- Size: 384 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# EPUB DeepL
A Python CLI that translates an EPUB through DeepL with **maximum
structural fidelity to the original**. Apart from the translated text,
the result reads in any e-reader exactly like the source: TOC labels
match chapter headings, manifest and spine are byte-for-byte equivalent,
embedded SVG attributes survive, and non-ASCII characters round-trip
cleanly through Unicode.
The naive alternative — unzip the EPUB, translate each XHTML separately,
repackage by hand — is expensive on three axes that this tool collapses
into a single upload/download cycle per book:
1. **Structural fragility.** Manual reassembly drops the TOC,
mis-orders the spine, breaks cross-file links, mangles OPF metadata
or NCX navigation. Producing a valid EPUB by hand is error-prone
and slow.
2. **Operator time.** Tens of file-by-file upload/download cycles
per book.
3. **Translation-job count.** Per-document translation services
(e.g. DeepL Pro Starter, with its 5-documents-per-month limit)
charge once per file. An EPUB with 10–50 XHTMLs exhausts the
monthly quota on one book; this tool spends one document per book.
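The quota arithmetic behind the third point can be sketched as follows. The 5-document limit comes from the text above; the 30-file book is just an illustrative value inside the stated 10–50 range, and both function names are hypothetical.

```python
import math


def months_needed(documents_per_book: int, monthly_quota: int) -> int:
    """Months of quota required to push one whole book through the service."""
    return math.ceil(documents_per_book / monthly_quota)


def books_per_month(documents_per_book: int, monthly_quota: int) -> int:
    """Whole books that fit into a single month's document quota."""
    return monthly_quota // documents_per_book


QUOTA = 5  # documents per month, as quoted above for DeepL Pro Starter

# File-by-file workflow: one document per XHTML (30 is an assumed example).
print(months_needed(30, QUOTA), books_per_month(30, QUOTA))

# Bundled workflow: one document per book.
print(months_needed(1, QUOTA), books_per_month(1, QUOTA))
```

Under these assumptions the file-by-file route needs six months of quota for one book, while the bundled route fits five books into one month.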
**Status:** working MVP, no versioned release cut yet. Targets EPUB 2.0
with NCX-based navigation. EPUB 3 + `nav.xhtml` is out of scope for now.
## Install
The tool is a standard Python package. Any environment with Python 3.11+ and
the system libraries for `lxml` (typically present, or installable via
`apt install libxml2 libxslt1.1`) is sufficient.
```bash
git clone https://github.com/piotrminkina/epub-deepl epub-deepl
cd epub-deepl
# Per ADR-0004 the venv is named after the host's Python minor so it
# coexists with venvs from other interpreters (e.g. a Dev Container's).
PY_MINOR="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
python3 -m venv ".venv-${PY_MINOR}"
source ".venv-${PY_MINOR}/bin/activate"
pip install -e .
epub-deepl --help
```
To skip activating the virtualenv each session, use the `bin/` launcher
(see below) or symlink it into a directory on your `PATH`.
> **Contributing or developing the tool?** See
> [CONTRIBUTING.md](CONTRIBUTING.md) for the recommended Dev Container
> workflow, test commands, and code style.
## Usage
The CLI has two subcommands, designed around a manual DeepL upload/download
step.
```bash
# 1. Bundle the EPUB into a single HTML for DeepL
epub-deepl prepare path/to/book.epub
# → produces path/to/book.prepare.html
# 2. Upload book.prepare.html to https://www.deepl.com/translator/files,
# choose target language, download the translated HTML.
# 3. Reassemble the translated EPUB
epub-deepl restore path/to/book.epub path/to/book.translated.html
# → produces path/to/book.translated.epub
```
The target language is auto-detected from the `lang` attribute on the
translated HTML's `<html>` element (DeepL sets it correctly). Pass
`--lang <code>` to override the detection, which is useful when the
translator left the source language tag in place or when you want a
specific BCP 47 variant (e.g. `--lang pt-BR`).
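A minimal sketch of detection with an explicit override, assuming the language is carried by the `lang` attribute of the root `<html>` element. It uses only the standard library's `html.parser` rather than whatever parser the tool itself uses, and the name `detect_target_lang` is hypothetical.

```python
from html.parser import HTMLParser


class _LangSniffer(HTMLParser):
    """Records the lang attribute of the first <html> start tag that has one."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def detect_target_lang(html_text, override=None):
    """Return the override if given, else the <html lang> value."""
    if override:
        return override
    sniffer = _LangSniffer()
    sniffer.feed(html_text)
    if not sniffer.lang:
        raise ValueError("no lang attribute on <html>; pass an explicit override")
    return sniffer.lang
```

An explicit override wins unconditionally here, which mirrors the `--lang` behaviour described above; whether the real tool also validates the tag is not stated in this README.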
The original EPUB is read-only during `restore` and acts as the structural
template; only translated body content, OPF metadata (`dc:title`,
`dc:description`, `dc:subject`, `dc:language`), and NCX navigation labels
are mutated.
### `bin/` launcher (no venv activation)
`bin/epub-deepl` is a thin Bash wrapper that self-locates the project
root and execs the matching venv's Python with the CLI module. It
picks `.venv-${PY_MINOR}/` for the current `python3`, falling back to
legacy `.venv/` only if its `pyvenv.cfg` declares the matching minor
(see [ADR-0004](docs/adr/0004-per-python-minor-venv.md)). Use it for
shell aliases, cron jobs, or editor integrations where activating a
virtualenv first is awkward:
```bash
# Run from any directory
/path/to/repo/bin/epub-deepl prepare book.epub
# Or place on PATH
ln -s "$(pwd)/bin/epub-deepl" ~/.local/bin/
epub-deepl prepare book.epub
```
The wrapper fails fast with a concrete creation recipe when no
compatible venv exists.
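The selection rule can be sketched in Python for illustration (the real launcher is a Bash script, and this is not its code). The sketch assumes `pyvenv.cfg` carries a `version = X.Y.Z` line, as written by the standard `venv` module, or a `version_info = ...` line, as written by `virtualenv`; both function names are hypothetical.

```python
from pathlib import Path


def declared_minor(pyvenv_cfg: Path):
    """Return 'X.Y' from the version line of a pyvenv.cfg, or None."""
    for line in pyvenv_cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("version", "version_info"):
            parts = value.strip().split(".")
            if len(parts) >= 2:
                return f"{parts[0]}.{parts[1]}"
    return None


def pick_venv(project_root: Path, py_minor: str):
    """Prefer .venv-<minor>; accept legacy .venv only if its minor matches."""
    preferred = project_root / f".venv-{py_minor}"
    if preferred.is_dir():
        return preferred
    legacy = project_root / ".venv"
    cfg = legacy / "pyvenv.cfg"
    if cfg.is_file() and declared_minor(cfg) == py_minor:
        return legacy
    # Caller is expected to fail fast and print a creation recipe.
    return None
```

Returning `None` instead of raising keeps the error message, including the creation recipe, in the caller's hands.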
## Commands
| Command | Description |
|---|---|
| `epub-deepl prepare <book.epub>` | Validate input and emit `<book>.prepare.html` |
| `epub-deepl restore <book.epub> <translated.html> [--lang <code>]` | Validate translated HTML against the input EPUB and emit `<book>.translated.epub`. `--lang` is optional (auto-detected from `<html lang>`). |
| `epub-deepl --help` | Top-level usage |
| `<subcommand> --help` | Flags for a specific subcommand |
Common flags on both subcommands:
| Flag | Effect |
|---|---|
| `--output FILE` | Override the default output path |
| `--force` | Overwrite existing output (does NOT bypass input-equals-output guard) |
| `--verbose` | Per-file progress to stderr |
Exit codes: `0` success, `1` user error (bad input / validation failure /
output collision), `2` internal error.
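One way such an exit-code contract is commonly implemented is a single wrapper that maps exception types to codes. This is a sketch under that assumption; `UserError` and `run` are hypothetical names, not the tool's API.

```python
import sys


class UserError(Exception):
    """Bad input, validation failure, or output collision (exit code 1)."""


def run(action) -> int:
    """Call action() and translate its outcome into a process exit code."""
    try:
        action()
    except UserError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    except Exception as exc:  # anything unexpected counts as an internal error
        print(f"internal error: {exc!r}", file=sys.stderr)
        return 2
    return 0
```

A CLI entry point would then end with something like `sys.exit(run(main))`, so scripts and cron jobs can distinguish user mistakes from bugs.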
## How It Works
`prepare` walks the input EPUB's spine in reading order and emits a single
HTML5 document. Each source XHTML becomes its own wrapper element, tagged
with the `data-source-href` attribute that `restore` later looks up. OPF
metadata is exposed as visible content in a dedicated block. NCX entries
are serialised as a flat block with `data-*` attributes preserving `src`
and `playOrder` for restore.
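The spine walk that `prepare` starts from can be sketched with the standard library alone. This follows the EPUB 2 container and OPF layout and is not the tool's own code (which, judging by the project topics, is built on `lxml`); `spine_hrefs` is a hypothetical name, and percent-encoded hrefs are not handled.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_hrefs(epub: zipfile.ZipFile) -> list[str]:
    """Return archive paths of spine documents in reading order."""
    # META-INF/container.xml points at the OPF package document.
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    opf_path = container.find("c:rootfiles/c:rootfile", NS).attrib["full-path"]
    opf = ET.fromstring(epub.read(opf_path))

    # The manifest maps item ids to hrefs relative to the OPF's directory.
    manifest = {
        item.attrib["id"]: item.attrib["href"]
        for item in opf.findall("opf:manifest/opf:item", NS)
    }
    base = posixpath.dirname(opf_path)

    # The spine lists item ids; its order is the reading order.
    return [
        posixpath.join(base, manifest[ref.attrib["idref"]])
        for ref in opf.findall("opf:spine/opf:itemref", NS)
    ]
```

Reading order comes from the spine, not the manifest, which is why the manifest is only used as a lookup table here.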
`restore` parses the translated HTML, locates every `data-source-href`,
and rebuilds each XHTML by replacing only the `<body>` content of the
original. The OPF and NCX trees are mutated in place; manifest, spine,
identifiers, and namespace structure pass through unchanged. NCX
`<navLabel>` text is recomputed via **anchor resolution**: for each
navigation target whose `src` carries a fragment identifier, the
algorithm locates the element with that fragment ID in the restored
XHTML and uses its translated heading text. This keeps TOC labels and
chapter headings consistent without translating the labels twice.
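Anchor resolution as described above can be sketched like this. It assumes the standard NCX layout (`navPoint` with `navLabel/text` and `content src`), uses `xml.etree.ElementTree` instead of the tool's own parser, and expects `xhtml_docs` to be keyed by the same relative hrefs the NCX uses; `resolve_label` and `relabel` are hypothetical names.

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"ncx": NCX_URI}


def resolve_label(nav_point, xhtml_docs):
    """Return the translated text of the nav point's target element, or None."""
    src = nav_point.find("ncx:content", NCX_NS).attrib["src"]
    href, _, fragment = src.partition("#")
    root = xhtml_docs.get(href)
    if root is None or not fragment:
        return None
    for element in root.iter():
        if element.get("id") == fragment:
            # Flatten inline markup and normalise whitespace.
            return " ".join("".join(element.itertext()).split())
    return None


def relabel(ncx_root, xhtml_docs):
    """Overwrite navLabel text in place wherever an anchor resolves."""
    for nav_point in ncx_root.iter(f"{{{NCX_URI}}}navPoint"):
        label = resolve_label(nav_point, xhtml_docs)
        if label:
            nav_point.find("ncx:navLabel/ncx:text", NCX_NS).text = label
```

In this sketch, entries without a fragment or without a matching element keep their existing label; how the tool treats those cases is not covered in this README.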
Detailed architecture and edge cases:
[`docs/plans/tech-spec.md`](docs/plans/tech-spec.md).
## Scope
### In scope (MVP)
- EPUB 2.0.1 with NCX-based navigation
- Round-trip preservation of all human-visible content + OPF / NCX
structural metadata required by e-readers
- DeepL HTML document compatibility (HTML5 self-contained payload)
- Solo-user CLI workflow with manual upload / download to DeepL
- Pre-flight validation of the input EPUB (fail-fast on DRM, broken
manifest, broken spine, non-XHTML spine items, missing NCX)
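The pre-flight checks listed above might look roughly like the sketch below. It is not the tool's validator: the DRM test is a crude heuristic (the mere presence of `META-INF/encryption.xml`, which font obfuscation also produces), and `preflight` is a hypothetical name.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

OPF = "{http://www.idpf.org/2007/opf}"
CONTAINER = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def preflight(epub: zipfile.ZipFile) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    names = set(epub.namelist())
    problems = []

    # Heuristic only: encryption.xml may also describe obfuscated fonts.
    if "META-INF/encryption.xml" in names:
        problems.append("encryption.xml present (possible DRM)")

    container = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = container.find(f"{CONTAINER}rootfiles/{CONTAINER}rootfile")
    opf_path = rootfile.attrib["full-path"]
    opf = ET.fromstring(epub.read(opf_path))
    base = posixpath.dirname(opf_path)

    # Broken manifest: an item whose file is absent from the archive.
    manifest = {item.attrib["id"]: item for item in opf.iter(f"{OPF}item")}
    for item in manifest.values():
        if posixpath.join(base, item.attrib["href"]) not in names:
            problems.append(f"manifest item not in archive: {item.attrib['href']}")

    spine = opf.find(f"{OPF}spine")
    if spine is None:
        return problems + ["package has no spine"]

    for ref in spine.iter(f"{OPF}itemref"):
        item = manifest.get(ref.attrib["idref"])
        if item is None:
            problems.append(f"spine idref not in manifest: {ref.attrib['idref']}")
        elif item.attrib.get("media-type") != "application/xhtml+xml":
            problems.append(f"non-XHTML spine item: {item.attrib['href']}")

    # EPUB 2 names the NCX through the spine's toc attribute.
    if spine.get("toc") not in manifest:
        problems.append("missing NCX (spine toc does not resolve)")
    return problems
```

Collecting every problem before reporting is a choice made for this sketch; a fail-fast validator could equally stop at the first one.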
### Out of scope
- EPUB 3 with `nav.xhtml` navigation (deferred — post-MVP)
- DRM-protected EPUBs (detected and rejected; never supported)
- Automated DeepL API integration (user uploads manually)
- Automated `epubcheck` invocation (manual user step)
- Books exceeding DeepL's per-document character limit
- GUI, web interface, daemon mode, multi-user features
- Translation memory, caching, or glossary support
Full requirements with user stories: [`docs/plans/prd.md`](docs/plans/prd.md).
## Project Status
**MVP working set, no versioned release yet.** Validated against a
diverse EPUB 2.0 + NCX corpus (technical, novel, workbook genres).
Full corpus round-trip preserves the `epubcheck` baseline (zero new
errors introduced by the tool). Real-DeepL spike completed: one full
Polish translation round-tripped cleanly, R-8 (DeepL preserves
`data-*` attributes) empirically validated.
CI matrix tests Python 3.11 / 3.12 / 3.13 on every push and PR; a
dedicated CI job re-runs the synthetic `epubcheck` zero-drift tests
with a JRE installed.
Per-release notes in [`CHANGELOG.md`](CHANGELOG.md);
empirical operational gotchas in
[`docs/lessons-learned.md`](docs/lessons-learned.md);
architecture decisions in [`docs/adr/`](docs/adr/).
Known limitations:
- EPUB 3 + `nav.xhtml` support — deferred to post-MVP
- Apple Books / Calibre-specific metadata quirks — observed but not
specially handled
- Books exceeding DeepL's per-document character limit (~1 MB+) — no
automatic chunking; user falls back to per-chapter workflow
## License
MIT — see [LICENSE](LICENSE).
---
*A 1 MB book translated as one DeepL document instead of 30 chapters: the
math works out to 30× the books you can translate per month, with a TOC
that actually matches the chapter headings.*