https://github.com/saddam-94/docx-translator
docx translator for multiple language support.
https://github.com/saddam-94/docx-translator
docker docker-compose machine-translation nlp
Last synced: 2 months ago
JSON representation
docx translator for multiple language support.
- Host: GitHub
- URL: https://github.com/saddam-94/docx-translator
- Owner: saddam-94
- Created: 2025-03-07T10:24:16.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-08T05:18:18.000Z (over 1 year ago)
- Last Synced: 2025-03-27T18:50:30.856Z (over 1 year ago)
- Topics: docker, docker-compose, machine-translation, nlp
- Language: Python
- Homepage:
- Size: 1.56 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Document Translation Project
This project provides a script to translate DOCX documents from one language to another using the Argos Translate library.
## Prerequisites
- Python
- Docker
- Docker Compose
## Setup
1. Clone the repository to your local machine.
2. Navigate to the project directory.
run requirements.txt file
```
pip install -r requirements.txt
```
## Docker Setup
### Build the Docker Image
To build the Docker image, run the following command:
```sh
docker-compose build
```
Run the Docker Container
To run the Docker container, use the following command:
```sh
docker-compose up
```
Usage
The main script for translating documents is translate.py. This script reads the source and target languages from a YAML file (lang_code.yml) and translates all DOCX files in the specified input directory.
Command Line Arguments
--input_path: The path to the directory containing the DOCX files to be translated.
--output_path: The path to the directory where the translated DOCX files will be saved.
--src_lang: The source language code (e.g., English for 'en').
--tar_lang: The target language code (e.g., French for 'fr').
Example
To translate documents from English to French, you can run the following command:
```cmd
python translate.py --input-path input_docx --output_path output_docx --src-lang 'English' --tar-lang 'Bengali'
```
Below are list of language use can use
```
English: en
Afrikaans: af
Albanian: sq
Amharic: am
Arabic: ar
Armenian: hy
Basque: eu
Bengali: bn
Bosnian: bs
Bulgarian: bg
Catalan: ca
Chinese (Simplified): zh
Chinese (Traditional): zh-TW
Croatian: hr
Czech: cs
Danish: da
Dutch: nl
Estonian: et
Finnish: fi
French: fr
Georgian: ka
German: de
Greek: el
Gujarati: gu
Haitian Creole: ht
Hebrew: he
Hindi: hi
Hungarian: hu
Icelandic: is
Indonesian: id
Italian: it
Japanese: ja
Kannada: kn
Kazakh: kk
Khmer: km
Korean: ko
Latvian: lv
Lithuanian: lt
Macedonian: mk
Malay: ms
Mongolian: mn
Nepali: ne
Norwegian: no
Persian: fa
Polish: pl
Portuguese: pt
Punjabi: pa
Romanian: ro
Russian: ru
Serbian: sr
Slovak: sk
Slovenian: sl
Somali: so
Spanish: es
Swahili: sw
Swedish: sv
Tamil: ta
Telugu: te
Turkish: tr
Ukrainian: uk
Urdu: ur
Vietnamese: vi
Welsh: cy
Yiddish: yi
Zulu: zu
```
For using as an python api, call below method,
```python
from translate import translate
translated_txt = translate(input_path, output_path, src_lang, tar_lang)
translated_tx.save(output_path)
```
```File Structure
translate.py: The main script for translating DOCX documents.
lang_code.yml: A YAML file containing language codes.
Dockerfile: The Dockerfile for building the Docker image.
docker-compose.yml: The Docker Compose file for setting up the Docker environment.
requirements.txt: A file listing the Python dependencies for the project.
```
Example output
-------------------
please refer to the [documentation file](./output_docx/reference_translated_brochure.docx) as input docx.
Here is the corresponding [output docx](./output_docx/brochure_fr.docx)
## Possible improvement and speedup
Currently it can run 135.8s. If we can utilize multiprocesing or concurrent strategy by only spliting the txt as sentence, and execute all the cpu core then easily reduce the latency. Also we can use GPU machine if avalilabe which also reduce the execution time greatly.