{"id":30892312,"url":"https://github.com/evilfreelancer/datasets-translator","last_synced_at":"2025-09-08T19:43:51.781Z","repository":{"id":270405923,"uuid":"910285274","full_name":"EvilFreelancer/datasets-translator","owner":"EvilFreelancer","description":"Collection of scripts for translating datasets from one language to another","archived":false,"fork":false,"pushed_at":"2024-12-30T22:24:27.000Z","size":6,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-30T23:19:55.234Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EvilFreelancer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-30T22:08:15.000Z","updated_at":"2024-12-30T22:24:30.000Z","dependencies_parsed_at":"2024-12-30T23:19:57.087Z","dependency_job_id":"c7d0475c-1357-420d-b5ab-7f0895a7232e","html_url":"https://github.com/EvilFreelancer/datasets-translator","commit_stats":null,"previous_names":["evilfreelancer/datasets-translator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/EvilFreelancer/datasets-translator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdatasets-translator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdatasets-translator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdatasets-translator/releases","manifests_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdatasets-translator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EvilFreelancer","download_url":"https://codeload.github.com/EvilFreelancer/datasets-translator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EvilFreelancer%2Fdatasets-translator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274231432,"owners_count":25245585,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-08T02:00:09.813Z","response_time":121,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-08T19:43:49.686Z","updated_at":"2025-09-08T19:43:51.774Z","avatar_url":"https://github.com/EvilFreelancer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Translator\n\nA simple script for automating dataset translation. Translation is currently implemented for:\n\n- [isaiahbjork/chain-of-thought-sharegpt](https://huggingface.co/datasets/isaiahbjork/chain-of-thought-sharegpt)\n- [HuggingFaceH4/MATH-500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500)\n\n## translator.py\n\nA simple machine translation script that uses the following models:\n\n- [utrobinmv/t5_translate_en_ru_zh_small_1024](https://huggingface.co/utrobinmv/t5_translate_en_ru_zh_small_1024) -\n  small 
model (the fastest)\n- [utrobinmv/t5_translate_en_ru_zh_large_1024](https://huggingface.co/utrobinmv/t5_translate_en_ru_zh_large_1024) -\n  large model (the highest-quality translation)\n- [utrobinmv/t5_translate_en_ru_zh_base_200](https://huggingface.co/utrobinmv/t5_translate_en_ru_zh_base_200) - base\n  model (on par with the large model in quality), but faster and intended for shorter texts.\n\nUseful links:\n\n- [Comparison of local machine translation models for English, Chinese, and Russian](https://habr.com/ru/articles/791522/)\n- [New argos model en_ru for add argospm-index](https://community.libretranslate.com/t/new-argos-model-en-ru-for-add-argospm-index/311)\n\n## ollama_translator.py\n\nA version of the dataset translation script built to work through the Ollama API.\n\nUsage example:\n\n```shell\npython ollama_translator.py ./MATH-500/test.jsonl ./MATH-500-Russian.jsonl --fields_to_translate=problem,solution,answer\n```\n\n## Links\n\n- https://github.com/EvilFreelancer/impruver - an application for training LLMs\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilfreelancer%2Fdatasets-translator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fevilfreelancer%2Fdatasets-translator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fevilfreelancer%2Fdatasets-translator/lists"}