{"id":50480957,"url":"https://github.com/derekslinz/recursive-local-translator","last_synced_at":"2026-06-01T17:30:41.700Z","repository":{"id":354345606,"uuid":"1223070808","full_name":"derekslinz/recursive-local-translator","owner":"derekslinz","description":"A recursive document translator tool that leverages argostranslate/ctranslate2 and cuda/mps acceleration when possible.","archived":false,"fork":false,"pushed_at":"2026-04-28T06:17:47.000Z","size":75,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-28T08:05:01.067Z","etag":null,"topics":["argostranslate","ctranslate2","document-translation","localization-tool","osint-tool","translation-tool"],"latest_commit_sha":null,"homepage":"https://www.linzalytics.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/derekslinz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-28T01:32:43.000Z","updated_at":"2026-04-28T06:17:48.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/derekslinz/recursive-local-translator","commit_stats":null,"previous_names":["derekslinz/recursive-local-translator"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/derekslinz/recursive-local-translator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/derekslinz%2Frecursive-local-translator","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/derekslinz%2Frecursive-local-translator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/derekslinz%2Frecursive-local-translator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/derekslinz%2Frecursive-local-translator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/derekslinz","download_url":"https://codeload.github.com/derekslinz/recursive-local-translator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/derekslinz%2Frecursive-local-translator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33786894,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-01T02:00:06.963Z","response_time":115,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["argostranslate","ctranslate2","document-translation","localization-tool","osint-tool","translation-tool"],"created_at":"2026-06-01T17:30:40.690Z","updated_at":"2026-06-01T17:30:41.687Z","avatar_url":"https://github.com/derekslinz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Recursive Local Translator\n\nA high-performance, modular workspace translator designed to handle massive directories with mixed file formats. 
It leverages **CTranslate2** for ultra-fast inference and supports **CUDA** (NVIDIA) and **MPS** (Apple Silicon) hardware acceleration.\n\nUnlike simple wrappers, this tool drives the underlying translation models directly, providing significant speed improvements and better control over memory and device utilization.\n\n---\n\n## Key Capabilities\n\n### 1. Multi-Pass Workspace Processing\nThe tool operates in three distinct phases to ensure consistency and safety:\n- **Pass 1: Renaming**: Recursively translates directory names and filenames.\n- **Pass 2: Upgrading**: Converts legacy Office formats (e.g., `.doc`) to modern OpenXML (`.docx`) using LibreOffice.\n- **Pass 3: Content**: Performs in-place translation of file contents or generates sidecar text extracts.\n\n### 2. Comprehensive Format Support\n| Category | Supported Extensions |\n| :--- | :--- |\n| **Documentation** | `.txt`, `.log`, `.nfo`, `.md`, `.mdx`, `.rmd`, `.rst`, `.adoc`, `.org`, `.wiki`, `.rtx`, `.tex` |\n| **Config / Web** | `.cfg`, `.conf`, `.toml`, `.properties`, `.mak`, `.cmake`, `.yaml`, `.yml`, `.xml`, `.html`, `.htm`, `.xhtml`, `.shtml`, `.json`, `.json5`, `.jsonc`, `.jsonl`, `.svg`, `.resx`, `.xliff`, `.xlf`, `.tmx` |\n| **Subtitles** | `.srt`, `.vtt`, `.ass`, `.ssa`, `.sub`, `.sbv`, `.po`, `.pot` |\n| **Additional Text** | `.lrc`, `.info`, `.textile`, `.strings`, `.arb`, `.fb2`, `.ts` (Qt XML autodetected only) |\n| **Office** | `.docx`, `.xlsx`, `.pptx` (formatting preserved) |\n| **OpenDocument** | `.odt`, `.ods`, `.odp` (native in-place translation) |\n| **Email / Ebook** | `.eml` (subject/body + Base64 text-like attachments), `.epub` |\n| **Sidecars** | `.pdf`, `.vsd`, `.vsdx`, `.msg`, `.djvu`, `.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp` (generates `.en.txt`) |\n\n### 3. Intelligent Content Handling\n- **Mixed Language Support**: Automatically detects language per paragraph/chunk. 
English text within a Russian document is preserved untouched.\n- **Transliteration Mode**: High-speed mode to convert Cyrillic script to Latin script without full semantic translation.\n- **Content-Based Renaming**: Can automatically rename generic filenames (like `scan_001.jpg` or `Untitled.txt`) based on the translated content found inside the file.\n\n---\n\n## Installation\n\n### 1. Prerequisites\n- **Python 3.8 - 3.12** (Python 3.13+ currently has limited ML library support).\n- **LibreOffice**: Required for format upgrades (`.doc` → `.docx`).\n- **Tesseract OCR**: Required for image and PDF OCR fallback.\n\n### 2. Setup\nIt is highly recommended to use a virtual environment to manage dependencies:\n\n```bash\n# Clone the repository\ngit clone https://github.com/derekslinz/recursive-local-translator.git\ncd recursive-local-translator\n\n# Create and activate virtual environment\npython3 -m venv .venv\nsource .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n\n# Install dependencies\npip install -r requirements.txt\n\n# Install the Russian-English translation model\nargospm install translate-ru_en\n```\n\n\u003e [!TIP]\n\u003e **Hardware Acceleration**: Manual device selection via `--device` is typically **not necessary**. The tool automatically detects and utilizes the best available accelerator (CUDA on NVIDIA GPUs, MPS on Apple Silicon) and falls back to CPU only if needed.\n\n---\n\n## Usage Guide\n\n### Basic Command\nTranslates a specific directory from Russian to English (use `.` for the current directory):\n```bash\npython3 translate_all.py /path/to/workspace\n```\n\n### Advanced Flags\n| Argument | Description |\n| :--- | :--- |\n| `root_path` | **Required**: The target directory to process recursively. |\n| `--transliterate` | **Fast Mode**: Only converts Cyrillic to Latin (e.g., `Папка` → `Papka`). |\n| `--auto-detect` | Detects language per file. Skips files that are already in the target language. 
|\n| `--rename-only` | Only renames files and folders; skips content translation. |\n| `--upgrade-only` | Only converts legacy formats; skips all translation. |\n| `--sidecars` | Enables generating `.en.txt` extracts for binary formats (PDF, Images). |\n| `--device [auto/cuda/mps/cpu]` | Forces a specific hardware accelerator (typically auto-detected). |\n| `--workers [N]` | Sets concurrency level (default: 5). |\n\n---\n\n## Important Notes\n- **Irreversible**: Content translation is performed **in-place**. ALWAYS work on a copy/mirror of your data.\n- **Encoding**: The tool assumes UTF-8 encoding. Non-compliant files are processed with \"surrogateescape\" error handling to prevent crashes.\n\n---\n\n## Robustness \u0026 Reliability\n\n### Cascading Fallbacks\nTo ensure maximum extraction success, the tool uses a cascading fallback strategy:\n- **PDF Extraction**: `PyMuPDF` → `PyPDF2` → `pdfminer.six` → `pdftotext` (CLI).\n- **OCR Engine**: `Tesseract` → `EasyOCR` → `pytesseract`.\n\n### Email Attachment Handling (`.eml`)\n\n- Base64-encoded attachments are decoded before processing.\n- Text-like attachments (for example `.txt`, `.csv`, `.json`, `.xml`) are translated and re-encoded.\n- Binary attachments are preserved untouched to avoid corruption.\n\n### Filesystem Safety\n- **Path Sanitization**: Automatically removes invalid characters and collapses excessive repetitions.\n- **Length Enforcement**: Filenames are strictly limited to **255 bytes** using multi-byte safe truncation, preventing OS \"path too long\" errors.\n- **Configuration Protection**: `.ini` and `.sys` files are automatically excluded from inline translation to prevent system 
corruption.\n\n---\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderekslinz%2Frecursive-local-translator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fderekslinz%2Frecursive-local-translator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fderekslinz%2Frecursive-local-translator/lists"}