Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Byaidu/PDFMathTranslate
PDF scientific paper translation and bilingual comparison - full-text bilingual translation of PDF documents with the layout fully preserved
- Host: GitHub
- URL: https://github.com/Byaidu/PDFMathTranslate
- Owner: Byaidu
- License: agpl-3.0
- Created: 2024-09-06T06:56:03.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2024-10-28T06:58:34.000Z (about 2 months ago)
- Last Synced: 2024-10-29T22:56:24.488Z (about 2 months ago)
- Topics: chinese, english, japanese, korean, latex, pdf, translation
- Language: Python
- Homepage:
- Size: 40.6 MB
- Stars: 78
- Watchers: 3
- Forks: 12
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- StarryDivineSky - Byaidu/PDFMathTranslate - YOLO and other open-source projects. (A01_Text Generation_Text Dialogue / Other_Text Generation_Text Dialogue)
- AiTreasureBox - Byaidu/PDFMathTranslate - PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/Docker (Repos)
- awesome - Byaidu/PDFMathTranslate - PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports services such as Google/DeepL/Ollama/OpenAI and provides CLI/GUI/Docker (Python)
README
# PDFMathTranslate
PDF scientific paper translation and bilingual comparison based on font rules and deep learning, preserving formula and figure layout.
![image](https://github.com/user-attachments/assets/57e1cde6-c647-4af8-8f8f-587a40050dde)
![image](https://github.com/user-attachments/assets/0e6d7e44-18cd-443a-8a84-db99edf2c268)
## Installation
```bash
pip install pdf2zh
```

## Usage
Run the translation command from the command line to generate the translated document `example-zh.pdf` and the bilingual document `example-dual.pdf` in the current directory.
### Translate the entire document
```bash
pdf2zh example.pdf
```

### Translate part of the document
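The `-p` option used in this section takes a comma-separated page selection such as `1-3,5`. As an illustration only, the sketch below expands such a string into individual page numbers, assuming 1-based pages and inclusive ranges. `parse_page_selection` is a hypothetical helper written for this example and is not part of `pdf2zh`.

```python
from typing import List


def parse_page_selection(spec: str) -> List[int]:
    """Expand a selection like "1-3,5" into a list of page numbers.

    Assumption (not confirmed by the README): pages are 1-based and
    ranges include both endpoints.
    """
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


if __name__ == "__main__":
    print(parse_page_selection("1-3,5"))
```

Under these assumptions, `1-3,5` selects pages 1, 2, 3 and 5.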
```bash
pdf2zh example.pdf -p 1-3,5
```

### Translate with the specified language
```bash
pdf2zh example.pdf -li en -lo ja
```

### Use regex to specify formula fonts and characters that need to be preserved
Hint: English-style ligatures start at `\ufb00`, which is why the preserved character range in the example ends at `\ufaff`.
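To see what the two patterns in this section's example command accept, they can be tried out with Python's `re` module. This is a rough sketch, not the tool's actual matching code. It assumes the patterns are applied with `re.match`, that is, anchored at the start of the font name or character. The helper names and the sample font names are illustrative only.

```python
import re

# Patterns copied from the example command in this section.
FONT_PATTERN = r"(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)"
CHAR_PATTERN = r"(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"


def is_formula_font(font_name: str) -> bool:
    """Return True if the font name matches the -f pattern.

    Assumption: matching is anchored at the start of the name (re.match).
    """
    return re.match(FONT_PATTERN, font_name) is not None


def is_preserved_char(char: str) -> bool:
    """Return True if the character matches the -c pattern."""
    return re.match(CHAR_PATTERN, char) is not None


if __name__ == "__main__":
    # Hypothetical font names chosen for illustration.
    for name in ["CMMI10", "CMR10", "Times-Italic", "Arial"]:
        print(name, is_formula_font(name))
    # "\u03b1" is Greek alpha; "\ufb00" is the "ff" ligature.
    for ch in ["=", "5", "a", "\u03b1", "\ufb00"]:
        print(repr(ch), is_preserved_char(ch))
```

With these patterns, a Greek letter such as `\u03b1` falls inside the preserved range, while the ligature `\ufb00` lies just above it and is therefore treated as ordinary text.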
```bash
pdf2zh BDA3.pdf -f "(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
```

## Acknowledgement
Document merging: [PyMuPDF](https://github.com/pymupdf/PyMuPDF)
Document parsing: [Pdfminer.six](https://github.com/pdfminer/pdfminer.six)
Document extraction: [MinerU](https://github.com/opendatalab/MinerU)
Multi-threaded translation: [MathTranslate](https://github.com/SUSYUSTC/MathTranslate)
Layout parsing: [DocLayout-YOLO](https://github.com/opendatalab/DocLayout-YOLO)