{"id":18930406,"url":"https://github.com/richawo/llm-translator","last_synced_at":"2025-04-15T15:31:24.252Z","repository":{"id":240328721,"uuid":"696519473","full_name":"richawo/llm-translator","owner":"richawo","description":"Translate Markdown files from one language to another using OpenAI's API while retaining original formatting. This Jupyter notebook tokenizes input text, splits into chunks, translates with OpenAI, and reconstructs output to preserve Markdown structure. Useful for localizing documentation, articles, books, and other long-form Markdown content.","archived":false,"fork":false,"pushed_at":"2023-10-15T22:12:57.000Z","size":24,"stargazers_count":17,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T22:51:12.739Z","etag":null,"topics":["ai","artificial-intelligence","gpt-4","jupyter-notebook","llm","localisation","localization","openai","openai-api","translation","translator","translator-app"],"latest_commit_sha":null,"homepage":"",
"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/richawo.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-25T23:02:51.000Z","updated_at":"2025-03-20T09:34:35.000Z","dependencies_parsed_at":"2024-05-18T03:57:33.685Z","dependency_job_id":null,"html_url":"https://github.com/richawo/llm-translator","commit_stats":null,"previous_names":["richawo/llm-translator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richawo%2Fllm-translator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richawo%2Fllm-translator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richawo%2Fllm-translator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/richawo%2Fllm-translator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/richawo","download_url":"https://codeload.github.com/richawo/llm-translator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249097964,"owners_count":21212387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names",
"owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","artificial-intelligence","gpt-4","jupyter-notebook","llm","localisation","localization","openai","openai-api","translation","translator","translator-app"],"created_at":"2024-11-08T11:37:33.373Z","updated_at":"2025-04-15T15:31:23.803Z","avatar_url":"https://github.com/richawo.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],
"readme":"# OpenAI Translator / Localisation Tool\n\nThis project provides a tool for translating Markdown documents from one language to another using OpenAI's API. It tokenizes the input document, splits it into chunks, translates each chunk, and stitches the output back together to retain the original formatting.\n\n![image](https://github.com/richawo/llm-translator/assets/35015261/fd801bc1-b802-4b5e-a772-586bd2c57699)\n\n## Features\n\n- Accepts a plain-text or Markdown file as input\n- Tokenizes input text using tiktoken\n- Splits the input into chunks at runs of multiple newlines\n- Sends each chunk to OpenAI for translation\n- Reconstructs the translated output with the original formatting\n\n## Usage\n\nTo use this translation workflow:\n\n1. Clone this repository\n2. Install the requirements\n   ```\n   pip install -r requirements.txt\n   ```\n3. Set your OpenAI API key\n4. Run the Jupyter notebook\n   - Set the `input_path` variable to the path of the input file\n   - Set `input_language` and `output_language`\n   - Execute the notebook cells\n5. The translated output is printed in the final cell\n\n## Configuration\n\nThe main configuration options are:\n\n- `input_path` - Path to the input file\n- `input_language` - Source language code\n- `output_language` - Target language code\n- `split_string` - String used to split the input into chunks\n\n## Examples\n\nThe tool can be used to translate plain-text or Markdown documents such as:\n\n- READMEs\n- Wikis/documentation\n- Articles/blog posts\n- Books\n\n## Limitations\n\n- Only tested with Markdown and plain-text formatting\n- Translation accuracy depends on the OpenAI model used\n- Currently supports only OpenAI's GPT models\n- Does not support queuing multiple files for sequential translation; only one file is processed at a time\n- Does not translate multiple chunks concurrently\n\n## Credits\n\n- [tiktoken](https://github.com/openai/tiktoken) for fast encoding/tokenization\n- [OpenAI API](https://openai.com/api/) for translation\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichawo%2Fllm-translator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frichawo%2Fllm-translator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frichawo%2Fllm-translator/lists"}