# RetainPDF: Layout-Preserving PDF Translation

<p align="center">
  <img src="image/RetainPDF-github.svg" alt="RetainPDF" width="320" />
</p>

PDF translation that preserves layout, formulas, and structure, built for scientific and technical documents.

The open-source community has plenty of layout-preserving translation projects. Almost all of them target PDFs with copyable, editable text and documents whose inline formulas are simple.

RetainPDF was designed from the start to handle layout-preserving translation for every kind of PDF. Its main targets are image-based/scanned PDFs and the correct rendering of inline formulas.

In layout-preserving translation, it competes head-on with closed-source models. In some areas it does better, such as the size of the translated PDF, overall speed, and font-size control.

RetainPDF is also a full-stack project with a separate frontend and backend. It connects OCR, translation, typesetting, and delivery end to end. The architecture is kept as decoupled as possible, so you can use it as-is, or extend it, swap out modules, and build on top of it.

Quick comparison:

| Project | Scanned PDFs | Complex inline formulas | Code left untranslated | Table control | Custom translation strategy | Layout preservation | PDF compression optimization | API automation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PDFMathTranslate | ❌ | ❌ | ❌ | Weak | Weak | Average | Average | ✅ |
| PolyglotPDF | ❌ | ❌ | ❌ | Weak | Weak | Average | Average | ✅ |
| Doc2X | ✅ | ✅ | ❌ | Medium | Weak | Strong | Weak | ❌ Not public |
| RetainPDF | ✅ | ✅ | ✅ | ✅ Can be toggled | ✅ Rule-based configuration | Strong | ✅ Ongoing optimization | ✅ |

## Screenshots

### Scientific (SCI) papers

<p align="center">
  <img src="image/image%201.png" alt="SCI example 1" width="860" />
</p>

<p align="center">
  <img src="image/image%202.png" alt="SCI example 2" width="860" />
</p>

### Image-based / scanned PDFs

<p align="center">
  <img src="image/image%203.png" alt="Scanned PDF example 1" width="860" />
</p>

<p align="center">
  <img src="image/image%207.png" alt="Scanned PDF example 2" width="860" />
</p>

### Books

<p align="center">
  <img src="image/image%204.png" alt="Book example 1" width="860" />
</p>

<p align="center">
  <img src="image/image%205.png" alt="Book example 2" width="860" />
</p>

<p align="center">
  <img src="image/image%206.png" alt="Book example 3" width="860" />
</p>

## Quick start

If you just want to use RetainPDF, download the package for your platform from [GitHub Releases](https://github.com/wxyhgk/retain-pdf/releases):

- Windows: prefer `Setup.exe`
- macOS: download the `.dmg`
- Linux: download the `.deb`

If you want to share it across a LAN, a team, or several devices, Docker deployment is the recommended option.

### Windows desktop app

<p align="center">
  <img src="image/RetainPDF-desktop.png" alt="RetainPDF Windows desktop app" width="860" />
</p>

### macOS note

The project does not currently have an Apple Developer account. Because of this, macOS may report that the app is "damaged" the first time you open it. The file is not actually damaged; the warning comes from the system's signature check. After dragging the app into `/Applications`, run the following command. It removes the `com.apple.quarantine` attribute that triggers the check.

```bash
sudo xattr -r -d com.apple.quarantine /Applications/RetainPDF.app
```

Then open the app again.

### Docker deployment

The repository includes a Docker delivery directory:

- [docker/delivery/README.md](docker/delivery/README.md)
- [docker/delivery/docker-compose.yml](docker/delivery/docker-compose.yml)

Basic steps:

```bash
git clone https://github.com/wxyhgk/retain-pdf.git
cd retain-pdf/docker/delivery
docker compose up -d
```

Once the services are running, open:

```text
http://127.0.0.1:40001
```

Default ports:

- `40001`: web frontend
- `41000`: Rust API
- `42000`: simplified synchronous API

### Updating the Docker deployment

To update to the latest image versions:

```bash
cd retain-pdf/docker/delivery
docker compose pull
docker compose up -d
```

To switch to a specific image version, set the images explicitly. Replace `latest` with the tag you want:

```bash
cd retain-pdf/docker/delivery
APP_IMAGE=wxyhgk/retainpdf-app:latest \
WEB_IMAGE=wxyhgk/retainpdf-web:latest \
docker compose up -d
```

After updating, check the service status from the same directory:

```bash
docker compose ps
```

Current images:

- [wxyhgk/retainpdf-app](https://hub.docker.com/r/wxyhgk/retainpdf-app)
- [wxyhgk/retainpdf-web](https://hub.docker.com/r/wxyhgk/retainpdf-web)

## Developers

### Documentation

Suggested reading order:

- [Current API documentation](doc/API.md)
- [Documentation index](doc/README.md)
- [Engineering assessment and execution plan](doc/工程评价与后续执行计划.md)
- [Service overview](doc/api-overview.md)
- [Local setup and configuration](doc/api-dev.md)
- [Endpoint reference](doc/api-endpoints.md)
- [Storage layout](doc/api-storage.md)
- [Troubleshooting](doc/api-troubleshooting.md)

### Code and submodules

- [Backend scripts](backend/scripts/README.md)
- [Legacy FastAPI wrapper](backend/Fast_API/README.md)
- `frontend/`: static assets for the current browser frontend, also used as the input directory for desktop packaging

### Directory layout

- `frontend/`
  Browser frontend, desktop shell, and experimental preview pages.
- `backend/`
  Rust API, Python scripts, embedded Python, the legacy FastAPI wrapper, and historical workspaces.
- `docker/`
  Dockerfiles, release scripts, and the compose configuration used for delivery.
- `data/`
  Local run output, job directories, and historical sample data.

### Project status

RetainPDF already covers the full pipeline: PDF upload, OCR, translation, layout reconstruction, and download of the results.

Rather than piling on features, my next priority is to make the following solid:

- Engineering consistency
- Stability of the API and output contracts
- Reproducible builds
- Translation stability for long text blocks and formula-heavy content

For details on how I plan to proceed, see:

- [Engineering assessment and execution plan](doc/工程评价与后续执行计划.md)

### Contributing

If any of the following areas interest you, you are welcome to help move the project forward:

- High-accuracy OCR and parsing of difficult layouts
- Translation stability for long text blocks and formula-heavy content
- Layout backfilling, adaptive font sizing, and PDF rendering
- The desktop app, Docker delivery, and engineering polish

Your strength might be algorithms, frontend, backend, or deployment. If you want to push "layout-preserving PDF translation that actually works" further, come join in.

## License

This project is distributed under the MIT License. See [LICENSE](LICENSE) for the full text.
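This is a supplement to the Docker deployment steps. After `docker compose up -d`, you can confirm that the three default ports (`40001`, `41000`, `42000`) are accepting connections with a plain TCP check. The sketch below is a minimal illustration under a few assumptions:

- It assumes the default host `127.0.0.1` and the default port mapping.
- It only tests that something is listening on each port, not that the RetainPDF services behind those ports are healthy.
- The names `port_is_open` and `check_ports` are hypothetical helpers, not part of RetainPDF.

```python
import socket
from collections.abc import Iterable

# Default ports as listed in the Docker deployment section (assumed unchanged in docker-compose.yml).
DEFAULT_PORTS = {
    40001: "web frontend",
    41000: "Rust API",
    42000: "simplified synchronous API",
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # The connection is closed immediately; only the handshake matters here.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError), and unreachable hosts.
        return False


def check_ports(host: str = "127.0.0.1", ports: Iterable[int] = tuple(DEFAULT_PORTS)) -> dict[int, bool]:
    """Map each port to whether it is currently accepting TCP connections on `host`."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, ok in check_ports().items():
        label = DEFAULT_PORTS.get(port, "custom port")
        print(f"{port} ({label}): {'listening' if ok else 'not reachable'}")
```

If a port reports "not reachable", `docker compose ps` in `docker/delivery` is the next thing to check. A TCP check is used rather than HTTP requests because this README does not document specific health endpoints for the APIs.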