{"id":23236987,"url":"https://github.com/nicolay-r/bulk-translate","last_synced_at":"2025-06-14T09:06:30.769Z","repository":{"id":262136851,"uuid":"886209496","full_name":"nicolay-r/bulk-translate","owner":"nicolay-r","description":"A tiny Python no-string package for performing translation of a massive stream of texts with native support of pre-annotated fixed-spans that are invariant for translator.","archived":false,"fork":false,"pushed_at":"2025-03-10T13:19:28.000Z","size":399,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-25T22:17:48.133Z","etag":null,"topics":["arekit","framework","googletrans","iterator","pipeline","span","span-based","spreadsheet","spreadsheets","translate"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nicolay-r.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-10T13:44:50.000Z","updated_at":"2025-03-16T01:20:27.000Z","dependencies_parsed_at":"2024-11-10T19:26:41.377Z","dependency_job_id":"766381d3-f813-4384-b3bd-db6f4e3c189a","html_url":"https://github.com/nicolay-r/bulk-translate","commit_stats":null,"previous_names":["nicolay-r/bulk-translate"],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolay-r%2Fbulk-translate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolay-r%2Fbulk-translate/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/nicolay-r%2Fbulk-translate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolay-r%2Fbulk-translate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nicolay-r","download_url":"https://codeload.github.com/nicolay-r/bulk-translate/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicolay-r%2Fbulk-translate/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":258850658,"owners_count":22767828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arekit","framework","googletrans","iterator","pipeline","span","span-based","spreadsheet","spreadsheets","translate"],"created_at":"2024-12-19T04:13:23.434Z","updated_at":"2025-06-14T09:06:30.746Z","avatar_url":"https://github.com/nicolay-r.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# bulk-translate 0.25.2\n![](https://img.shields.io/badge/Python-3.9-brightgreen.svg)\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nicolay-r/bulk-translate/blob/master/bulk_translate_demo.ipynb)\n[![twitter](https://img.shields.io/twitter/url/https/shields.io.svg?style=social)](https://x.com/nicolayr_/status/1871218031709323461)\n[![PyPI downloads](https://img.shields.io/pypi/dm/bulk-translate.svg)](https://pypistats.org/packages/bulk-translate)\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"logo.png\"/\u003e\n\u003c/p\u003e\n\u003cp 
align=\"center\"\u003e\n  \u003ca href=\"https://github.com/nicolay-r/nlp-thirdgate?tab=readme-ov-file#text-translation\"\u003e\u003cb\u003eThird-party providers hosting\u003c/b\u003e↗️\u003c/a\u003e\n\u003c/p\u003e\n\nA tiny Python no-string package for translating massive `CSV`/`JSONL` files, with \nnative support for pre-annotated **fixed-spans** that are invariant for the translator.\n\n## Description\n  \n\u003cdetails\u003e\n\u003csummary\u003e\n  \n### 📘 More on spans\n\u003c/summary\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"example.png\"  width=\"600\"/\u003e\n\u003c/p\u003e\n\n\u003c/details\u003e\n\u003cdetails\u003e\n\u003csummary\u003e\n\n### 📘 `bulk-translate` features\n\u003c/summary\u003e\n\nThe out-of-the-box features of `bulk-translate` are:\n* ✅ Support for `spans` for annotation / optional translation.\n* ✅ Native implementation of two translation modes:\n  - `fast-mode`: uses extra characters to group all the text parts into a single batch, which is deconstructed after translation.\n  - `accurate`: translates each text part individually.\n* ✅ No strings: you're free to adopt any LM / LLM backend.\n  - Supports `googletrans` by default.\n \n\u003c/details\u003e\n\n## Installation\n\nFrom PyPI: \n```bash\npip install bulk-translate\n```\n\nor the latest version from the repository:\n```bash\npip install git+https://github.com/nicolay-r/bulk-translate\n```\n\n## Usage\n\n### API\n\n### 👉 [Follow this notebook tutorial at `nlp-thirdgate`](https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/translate_texts_with_spans_via_googletrans.ipynb)\n\n\n## Command Line / Shell \n\n\u003e **NOTE:** Spans are supported only in JSON-lines format.\n \n\u003e **NOTE:** Requires the `source_iter` package to be installed.\n\nFor the [`test.tsv` example data](/test/data/test.tsv), in which annotated entities are enclosed in square brackets, run:\n\n```bash\npython -m bulk_translate.translate \\\n    --src 
\"test/data/test.tsv\" \\\n    --schema '{\"translated\":\"{text}\"}' \\\n    --adapter \"dynamic:models/googletrans_310a.py:GoogleTranslateModel\" \\\n    --output \"test-translated.jsonl\" \\\n    --batch-size 10 \\\n    %%m \\\n    --src \"auto\" \\\n    --dest \"ru\"\n```\n\n## Powered by\n\nThe pipeline construction components were taken from AREkit [[github]](https://github.com/nicolay-r/AREkit)\n\n\u003cp float=\"left\"\u003e\n\u003ca href=\"https://github.com/nicolay-r/AREkit\"\u003e\u003cimg src=\"https://github.com/nicolay-r/ARElight/assets/14871187/01232f7a-970f-416c-b7a4-1cda48506afe\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicolay-r%2Fbulk-translate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnicolay-r%2Fbulk-translate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicolay-r%2Fbulk-translate/lists"}