Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


https://github.com/richawo/llm-translator

Translate Markdown files from one language to another using OpenAI's API while retaining original formatting. This Jupyter notebook tokenizes input text, splits into chunks, translates with OpenAI, and reconstructs output to preserve Markdown structure. Useful for localizing documentation, articles, books, and other long-form Markdown content.

ai artificial-intelligence gpt-4 jupyter-notebook llm localisation localization openai openai-api translation translator translator-app

Last synced: about 2 months ago


# OpenAI Translator / Localisation Tool

This project provides a tool for translating Markdown documents from one language to another using OpenAI's API. It tokenizes the input document, splits it into chunks, translates each chunk, and stitches the output back together to retain the original formatting.

![image](https://github.com/richawo/llm-translator/assets/35015261/fd801bc1-b802-4b5e-a772-586bd2c57699)

## Features

- Accepts a plain text or Markdown file as input
- Tokenizes input text using tiktoken
- Splits input into chunks at multiple newlines
- Sends each chunk to OpenAI for translation
- Reconstructs translated output with original formatting
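
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names `split_into_chunks` and `translate_document`, the `max_tokens` budget, and the pluggable `count_tokens` and `translate_chunk` callables are all assumptions made for this example. In practice `count_tokens` could wrap tiktoken (for instance `len(encoding.encode(text))`) and `translate_chunk` could wrap a call to OpenAI's API.

```python
from typing import Callable, List


def split_into_chunks(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 1000,      # hypothetical per-request token budget
    split_string: str = "\n\n",  # assumed separator: a blank line
) -> List[str]:
    """Greedily pack separator-delimited pieces into chunks within max_tokens."""
    chunks: List[str] = []
    current: List[str] = []
    for piece in text.split(split_string):
        candidate = split_string.join(current + [piece])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this piece would exceed the budget: close the current chunk.
            chunks.append(split_string.join(current))
            current = [piece]
        else:
            current.append(piece)
    chunks.append(split_string.join(current))
    return chunks


def translate_document(
    text: str,
    translate_chunk: Callable[[str], str],
    count_tokens: Callable[[str], int],
    max_tokens: int = 1000,
    split_string: str = "\n\n",
) -> str:
    """Translate each chunk, then rejoin with the same separator."""
    chunks = split_into_chunks(text, count_tokens, max_tokens, split_string)
    return split_string.join(translate_chunk(chunk) for chunk in chunks)
```

Because the chunks are rejoined with the same separator they were split on, the paragraph breaks of the input carry over to the output. A single piece that is larger than the budget is still sent as one chunk in this sketch.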

## Usage

To use this translation workflow:

1. Clone this repository
2. Install the requirements
   ```
   pip install -r requirements.txt
   ```
3. Set your OpenAI API key
4. Run the Jupyter notebook
   - Pass the file path to the `input_path` variable
   - Set `input_language` and `output_language`
   - Execute the notebook cells
5. The translated text is printed in the final cell
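
For step 3, one common approach is to expose the key through the `OPENAI_API_KEY` environment variable, which the official OpenAI Python library reads by default. The helper below is a hypothetical sketch (the name `require_api_key` does not come from the notebook) that fails early with a clear message if the key is missing:

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or raise if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return key
```

Calling `require_api_key()` in the first notebook cell surfaces a missing key before any translation request is made.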

## Configuration

The main configuration options are:

- `input_path` - Path to input file
- `input_language` - Source language code
- `output_language` - Target language code
- `split_string` - String used to split input into chunks
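
As an illustration, a configuration cell might look like the sketch below. The variable names come from the list above, but every value shown is an assumption chosen for the example; the notebook's real defaults and its expected language-code format may differ.

```python
# Illustrative values only; check the notebook for its actual defaults.
input_path = "docs/README.md"  # hypothetical path to the file to translate
input_language = "en"          # assumed source language code
output_language = "fr"         # assumed target language code
split_string = "\n\n"          # assumed separator: split on blank lines

# Quick sanity check of how the separator divides a document.
sample = "# Heading\n\nFirst paragraph.\n\nSecond paragraph."
pieces = sample.split(split_string)
print(len(pieces), "pieces")
```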

## Examples

This tool can translate plain text and Markdown documents such as:

- READMEs
- Wikis/documentation
- Articles/blog posts
- Books

## Limitations

- Only tested with Markdown and plain text formatting
- Accuracy depends on the OpenAI model used for translation
- Currently supports only OpenAI's GPT models
- Does not support queuing multiple files; only one file can be translated at a time
- Does not support translating multiple segments of a document in parallel

## Credits

- [tiktoken](https://github.com/openai/tiktoken) for fast encoding/tokenization
- [OpenAI API](https://openai.com/api/) for translation

## License

MIT