{"id":27879475,"url":"https://github.com/mcmonkeyprojects/translate-tool","last_synced_at":"2025-05-05T03:22:16.157Z","repository":{"id":217157932,"uuid":"743205265","full_name":"mcmonkeyprojects/translate-tool","owner":"mcmonkeyprojects","description":"AI translation tool","archived":false,"fork":false,"pushed_at":"2024-01-29T00:10:36.000Z","size":49,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-03-26T22:53:41.270Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mcmonkeyprojects.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2024-01-14T16:41:36.000Z","updated_at":"2024-01-14T16:41:41.000Z","dependencies_parsed_at":"2024-01-29T01:54:55.945Z","dependency_job_id":"281c2f0b-974d-4c91-939c-851273a995e0","html_url":"https://github.com/mcmonkeyprojects/translate-tool","commit_stats":null,"previous_names":["mcmonkeyprojects/translate-tool"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcmonkeyprojects%2Ftranslate-tool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcmonkeyprojects%2Ftranslate-tool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcmonkeyprojects%2Ftranslate-tool/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mcmonkeyprojects%2Ftranslate-tool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/mcmonkeyprojects","download_url":"https://codeload.github.com/mcmonkeyprojects/translate-tool/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252430325,"owners_count":21746639,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-05T03:22:14.149Z","updated_at":"2025-05-05T03:22:16.146Z","avatar_url":"https://github.com/mcmonkeyprojects.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mcmonkey's AI Translation Tool\n\nBulk translates everything in a reference file using local translation AI.\n\nBuilt using https://github.com/huggingface/candle and by default uses this model: https://huggingface.co/jbochi/madlad400-7b-mt-bt in GGUF-q4 which is derived from https://huggingface.co/google/madlad400-7b-mt-bt\n\n## Usage\n\n- Install rust: https://www.rust-lang.org/tools/install\n    - Expect silly rust errors that you have to google (eg requiring Visual Studio on Windows for some reason)\n\n#### Compile:\n```sh\ncargo build --release\n```\n\n#### Run:\n```sh\n./target/release/translate-tool.exe --in-json \"data/test-in.json\" --out-json \"data/test-out.json\" --language de\n```\n\nLanguage should be a standard language code - if in doubt, see list at https://arxiv.org/pdf/2309.04662.pdf Appendix A.1\n\nTack `--verbose` onto the end to get some live debug output as it goes.\n\nUse `--model-id jbochi/madlad400-3b-mt` if you're impatient and want a smaller model.\n\nAdd `--max-ratio 10` to automatically stop the model if its output is 10x longer than input 
(defaults to 5). This usually indicates an AI breakdown.\n\nAdd `--add-json \"data/other-file.json\"` to also append new keys in a secondary key file.\n\nAdd `--max-tokens 60` to set the split length. This depends on when/how the model breaks down. Set it *BELOW* the model's sequence length.\n\nSpeed comparison (extremely variable with prompt):\n| CPU | 7B-MT-BT | 3B-MT |\n| --- | -------- | ----- |\n| Intel i7-12700KF (12 p-core) | 7 tok/s | 15 tok/s |\n| AMD Ryzen 5 3600 (6 core) | 4 tok/s | 8 tok/s |\n\n#### Example input JSON file:\n```json\n{\n    \"keys\": {\n        \"This key needs translation\": \"\",\n        \"This key doesn't\": \"cause it has a value\"\n    }\n}\n```\n\n#### Explanation\n\nThis will translate keys and store the result in the value, skipping any keys that already have a value.\n\nFirst run will automatically download the model; subsequent runs will load from HF cache (in user dir -\u003e `.cache/huggingface/hub`)\n\nNote that this runs entirely on CPU, because the Transformers GPU version needs too much VRAM to work and GGUF doesn't want to work on GPU within candle I guess? \"Oh but why not use regular GGML to run it then\" because GGML doesn't support T5??? Idk why candle supports GGML-formatted T5 but GGML itself doesn't. AI tech is a mess. If you're reading this after year 2024 when this was made there are hopefully less dumb ways to do what is currently cutting edge AI stuff.\n\nThis will burn your CPU and take forever.\n\nNote that I'm not experienced in Rust and the lifetime syntax is painful so I might've screwed something up.\n\n## Legal Stuff\n\nThis project depends on Candle which is either MIT or Apache2. 
Both licenses are in their repo; don't ask me what that means, idek.\n\nSections of source code are copied from Candle examples.\n\nThis project depends on MADLAD models that Google Research released under Apache2. I'm not entirely clear why a software license is on model weights, but again, idek.\n\nAnything unique to this project is yeeted out freely under the MIT license.\n\nI have no idea whether any legal restrictions apply to the resultant translated text but you're probably fine probably (if you have rights to use the source text at least)\n\n## License\n\nThe MIT License (MIT)\n\nCopyright (c) 2024 Alex \"mcmonkey\" Goodwin\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcmonkeyprojects%2Ftranslate-tool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmcmonkeyprojects%2Ftranslate-tool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmcmonkeyprojects%2Ftranslate-tool/lists"}