{"id":50932760,"url":"https://github.com/piotrminkina/epub-deepl","last_synced_at":"2026-06-17T06:02:31.094Z","repository":{"id":363703465,"uuid":"1264524093","full_name":"piotrminkina/epub-deepl","owner":"piotrminkina","description":"Round-trip EPUB ↔ HTML translation via DeepL — bundles all content into one HTML to conserve your monthly quota, then restores a structurally-identical EPUB with TOC, OPF, NCX, SVG, and Unicode integrity preserved.","archived":false,"fork":false,"pushed_at":"2026-06-10T01:39:29.000Z","size":393,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-06-10T03:14:29.223Z","etag":null,"topics":["automation","cli","deepl","ebooks","epub","epubcheck","html","localization","lxml","python","translation"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/piotrminkina.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-10T00:48:16.000Z","updated_at":"2026-06-10T01:39:32.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/piotrminkina/epub-deepl","commit_stats":null,"previous_names":["piotrminkina/epub-deepl"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/piotrminkina/epub-deepl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piotrminkina%2Fepub-deepl","
tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piotrminkina%2Fepub-deepl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piotrminkina%2Fepub-deepl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piotrminkina%2Fepub-deepl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/piotrminkina","download_url":"https://codeload.github.com/piotrminkina/epub-deepl/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/piotrminkina%2Fepub-deepl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34435981,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-17T02:00:05.408Z","response_time":127,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","cli","deepl","ebooks","epub","epubcheck","html","localization","lxml","python","translation"],"created_at":"2026-06-17T06:02:30.176Z","updated_at":"2026-06-17T06:02:31.072Z","avatar_url":"https://github.com/piotrminkina.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# EPUB DeepL\n\n[![CI](https://github.com/piotrminkina/epub-deepl/actions/workflows/ci.yml/badge.svg)](https://github.com/piotrminkina/epub-deepl/actions/workflows/ci.yml)\n[![Python 
3.11+](https://img.shields.io/badge/python-3.11%20%7C%203.12%20%7C%203.13-blue)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-d7ff64)](https://docs.astral.sh/ruff/)\n\nA Python CLI that translates an EPUB through DeepL with **maximum\nstructural fidelity to the original**. The translated book reads in any\ne-reader exactly like the source apart from the translated text: TOC labels\nmatch chapter headings, manifest and spine are byte-for-byte equivalent,\nembedded SVG attributes survive, and non-ASCII characters round-trip cleanly\nthrough Unicode.\n\nThe naive alternative (unzip the EPUB, translate each XHTML separately,\nrepackage by hand) is expensive on three axes that this tool collapses\ninto a single upload/download cycle per book:\n\n1. **Structural fragility.** Manual reassembly drops the TOC,\n   mis-orders the spine, breaks cross-file links, or mangles OPF metadata\n   and NCX navigation. Producing a valid EPUB by hand is error-prone\n   and slow.\n2. **Operator time.** Dozens of file-by-file upload/download cycles\n   per book.\n3. **Translation-job count.** Per-document translation services\n   (e.g. DeepL Pro Starter, with its 5-documents-per-month limit)\n   charge once per file. An EPUB with 10–50 XHTMLs exhausts the\n   monthly quota on one book; this tool spends one document per book.\n\n**Status:** working MVP, no versioned release cut yet. Targets EPUB 2.0\nwith NCX-based navigation. EPUB 3 + `nav.xhtml` is out of scope for now.\n\n## Install\n\nThe tool is a standard Python package. 
Any environment with Python 3.11+ and\nthe system libraries for `lxml` (typically present, or installable via\n`apt install libxml2 libxslt1.1`) is sufficient.\n\n```bash\ngit clone \u003cyour-fork\u003e epub-deepl\ncd epub-deepl\n\n# Per ADR-0004 the venv is named after the host's Python minor so it\n# coexists with venvs from other interpreters (e.g. a Dev Container's).\nPY_MINOR=\"$(python3 -c 'import sys; print(f\"{sys.version_info.major}.{sys.version_info.minor}\")')\"\npython3 -m venv \".venv-${PY_MINOR}\"\nsource \".venv-${PY_MINOR}/bin/activate\"\npip install -e .\nepub-deepl --help\n```\n\nTo skip activating the virtualenv each session, use the `bin/` launcher\n(see below) or symlink it into a directory on your `PATH`.\n\n\u003e **Contributing or developing the tool?** See\n\u003e [CONTRIBUTING.md](CONTRIBUTING.md) for the recommended Dev Container\n\u003e workflow, test commands, and code style.\n\n## Usage\n\nThe CLI has two subcommands, designed around a manual DeepL upload/download\nstep.\n\n```bash\n# 1. Bundle the EPUB into a single HTML for DeepL\nepub-deepl prepare path/to/book.epub\n#   → produces path/to/book.prepare.html\n\n# 2. Upload book.prepare.html to https://www.deepl.com/translator/files,\n#    choose target language, download the translated HTML.\n\n# 3. Reassemble the translated EPUB\nepub-deepl restore path/to/book.epub path/to/book.translated.html\n#   → produces path/to/book.translated.epub\n```\n\nThe target language is auto-detected from the translated HTML's\n`\u003chtml lang\u003e` attribute (DeepL sets it correctly). Pass\n`--lang \u003ccode\u003e` to override the detection — useful when the\ntranslator left the source language tag in place or when you want a\nspecific BCP 47 variant (e.g. 
`--lang pt-BR`).\n\nThe original EPUB is read-only during `restore` and acts as the structural\ntemplate; only translated body content, OPF metadata (`dc:title`,\n`dc:description`, `dc:subject`, `dc:language`), and NCX navigation labels\nare mutated.\n\n### `bin/` launcher (no venv activation)\n\n`bin/epub-deepl` is a thin Bash wrapper that self-locates the project\nroot and execs the matching venv's Python with the CLI module. It\npicks `.venv-${PY_MINOR}/` for the current `python3`, falling back to\nlegacy `.venv/` only if its `pyvenv.cfg` declares the matching minor\n(see [ADR-0004](docs/adr/0004-per-python-minor-venv.md)). Use it for\nshell aliases, cron jobs, or editor integrations where activating a\nvirtualenv first is awkward:\n\n```bash\n# Run from any directory\n/path/to/repo/bin/epub-deepl prepare book.epub\n\n# Or place on PATH\nln -s \"$(pwd)/bin/epub-deepl\" ~/.local/bin/\nepub-deepl prepare book.epub\n```\n\nThe wrapper fails fast with a concrete creation recipe when no\ncompatible venv exists.\n\n## Commands\n\n| Command | Description |\n|---|---|\n| `epub-deepl prepare \u003cinput.epub\u003e` | Validate input and emit `\u003cstem\u003e.prepare.html` |\n| `epub-deepl restore \u003cinput.epub\u003e \u003ctranslated.html\u003e [--lang \u003ccode\u003e]` | Validate translated HTML against the input EPUB and emit `\u003cstem\u003e.translated.epub`. `--lang` is optional (auto-detected from `\u003chtml lang\u003e`). 
|\n| `epub-deepl --help` | Top-level usage |\n| `\u003csubcommand\u003e --help` | Flags for a specific subcommand |\n\nCommon flags on both subcommands:\n\n| Flag | Effect |\n|---|---|\n| `--output FILE` | Override the default output path |\n| `--force` | Overwrite existing output (does NOT bypass input-equals-output guard) |\n| `--verbose` | Per-file progress to stderr |\n\nExit codes: `0` success, `1` user error (bad input / validation failure /\noutput collision), `2` internal error.\n\n## How It Works\n\n`prepare` walks the input EPUB's spine in reading order and emits a single\nHTML5 document. Each source XHTML becomes a `\u003csection\ndata-source-href=\"…\" data-spine-idx=\"N\"\u003e`. OPF metadata is exposed as\nvisible content under `\u003cheader data-source=\"opf-metadata\"\u003e`. NCX entries\nare serialised as a flat `\u003cnav data-source=\"ncx\"\u003e` block with `data-*`\nattributes preserving `src` and `playOrder` for restore.\n\n`restore` parses the translated HTML, locates every `data-source-href`,\nand rebuilds each XHTML by replacing only the `\u003cbody\u003e` content of the\noriginal. The OPF and NCX trees are mutated in-place — manifest, spine,\nidentifiers, and namespace structure pass through unchanged. 
NCX\n`\u003cnavLabel\u003e` text is recomputed via **anchor resolution**: for each\n`\u003ccontent src=\"path#fragment\"/\u003e`, the algorithm locates the element with\nthat fragment ID in the restored XHTML and uses its translated heading\ntext — guaranteeing TOC ↔ chapter-heading consistency without translating\nthe labels twice.\n\nDetailed architecture and edge cases:\n[`docs/plans/tech-spec.md`](docs/plans/tech-spec.md).\n\n## Scope\n\n### In scope (MVP)\n\n- EPUB 2.0.1 with NCX-based navigation\n- Round-trip preservation of all human-visible content + OPF / NCX\n  structural metadata required by e-readers\n- DeepL HTML document compatibility (HTML5 self-contained payload)\n- Solo-user CLI workflow with manual upload / download to DeepL\n- Pre-flight validation of the input EPUB (fail-fast on DRM, broken\n  manifest, broken spine, non-XHTML spine items, missing NCX)\n\n### Out of scope\n\n- EPUB 3 with `nav.xhtml` navigation (deferred — post-MVP)\n- DRM-protected EPUBs (detected and rejected; never supported)\n- Automated DeepL API integration (user uploads manually)\n- Automated `epubcheck` invocation (manual user step)\n- Books exceeding DeepL's per-document character limit\n- GUI, web interface, daemon mode, multi-user features\n- Translation memory, caching, or glossary support\n\nFull requirements with user stories: [`docs/plans/prd.md`](docs/plans/prd.md).\n\n## Project Status\n\n**MVP working set, no versioned release yet.** Validated against a\ndiverse EPUB 2.0 + NCX corpus (technical, novel, workbook genres).\nFull corpus round-trip preserves the `epubcheck` baseline (zero new\nerrors introduced by the tool). 
Real-DeepL spike completed: one full\nPolish translation round-tripped cleanly, R-8 (DeepL preserves\n`data-*` attributes) empirically validated.\n\nCI matrix tests Python 3.11 / 3.12 / 3.13 on every push and PR; a\ndedicated CI job re-runs the synthetic `epubcheck` zero-drift tests\nwith a JRE installed.\n\nPer-release notes in [`CHANGELOG.md`](CHANGELOG.md);\nempirical operational gotchas in\n[`docs/lessons-learned.md`](docs/lessons-learned.md);\narchitecture decisions in [`docs/adr/`](docs/adr/).\n\nKnown limitations:\n\n- EPUB 3 + `nav.xhtml` support — deferred to post-MVP\n- Apple Books / Calibre-specific metadata quirks — observed but not\n  specially handled\n- Books exceeding DeepL's per-document character limit (~1 MB+) — no\n  automatic chunking; user falls back to per-chapter workflow\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n\n---\n\n*A 1 MB book translated as one DeepL document instead of 30 chapters: the\nmath works out to 30× the books you can translate per month, with a TOC\nthat actually matches the chapter headings.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiotrminkina%2Fepub-deepl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpiotrminkina%2Fepub-deepl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpiotrminkina%2Fepub-deepl/lists"}