An open API service indexing awesome lists of open source software.

https://github.com/adam-pawelek/llmtranslate

Python library for text translation and language detection using OpenAI's GPT-4o.
https://github.com/adam-pawelek/llmtranslate

ai-translator genai gpt4o gpt4o-mini lama language-detection llm mistral multilingual multilingual-translations openai python3 translation

Last synced: 6 months ago
JSON representation

Python library for text translation and language detection using OpenAI's GPT-4o.

Awesome Lists containing this project

README

          

# llmtranslate
[![Test](https://github.com/adam-pawelek/llmtranslate/actions/workflows/test.yml/badge.svg)](https://github.com/adam-pawelek/llmtranslate/actions/workflows/test.yml)
[![Python package - Publish](https://github.com/adam-pawelek/llmtranslate/actions/workflows/publish.yml/badge.svg)](https://github.com/adam-pawelek/llmtranslate/actions/workflows/publish.yml)
[![Docs](https://img.shields.io/badge/docs-llmtranslate-brightgreen)](https://llm-translate.com/)
[![PyPI version](https://img.shields.io/pypi/v/llmtranslate?color=brightgreen)](https://pypi.org/project/llmtranslate/)
[![License: MIT](https://img.shields.io/badge/License-MIT-brightgreen.svg)](https://github.com/adam-pawelek/llmtranslate?tab=MIT-1-ov-file)
[![codecov](https://codecov.io/github/adam-pawelek/llmtranslate/graph/badge.svg?token=WCQOJC032S)](https://codecov.io/github/adam-pawelek/llmtranslate)
[![Downloads](https://static.pepy.tech/badge/llmtranslate)](https://pepy.tech/project/llmtranslate)

The `llmtranslate` library is a Python tool that leverages LangChain's [ChatModels 🦜🔗](https://python.langchain.com/docs/integrations/chat/) to translate text between languages and recognize the language of a given text. This library provides both synchronous and asynchronous translators to accommodate various use cases.

📖 **Documentation:** [llm-translate.com](https://llm-translate.com/)

## Installation

To ensure proper dependency management, install the required libraries **in the following order**:

1. Install LangChain chat libraries for your preferred LLM. For example:

```bash
pip install langchain-openai
```
For other LLM integrations, refer to [LangChain ChatModels documentation.](https://python.langchain.com/docs/integrations/chat/)
2. Install the `llmtranslate` library:

```bash
pip install llmtranslate
```

### Poetry problem
If you’re using Poetry, you must set a Python requirement capped below 4.0 to match langchain-openai (which requires <4.0). So, if your pyproject.toml has:
```toml
python = ">=3.12"
```
replace it with:
```toml
python = "^3.12"
```
or:
```toml
python = ">=3.12,<4.0"
```
Otherwise, Poetry sees Python 4.x as valid, which conflicts with langchain-openai’s <4.0 requirement.

---

## Example Using OpenAI's GPT

### Using `Translator` (Synchronous)
```python
from llmtranslate import Translator
from langchain_openai import ChatOpenAI

# Initialize the LLM and Translator
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")
translator = Translator(llm=llm)

# Detect the language of the text
text_language = translator.get_text_language("Hi how are you?")
if text_language:
print(text_language.ISO_639_1_code) # Output: en
print(text_language.ISO_639_2_code) # Output: eng
print(text_language.ISO_639_3_code) # Output: eng
print(text_language.language_name) # Output: English

# Translate text
translated_text = translator.translate("Bonjour tout le monde", "English")
print(translated_text) # Output: Hello everyone
```

---
## Using AsyncTranslator (Asynchronous)

This example demonstrates how to use the `AsyncTranslator` class to asynchronously detect the language of a text and translate it into another language. The `AsyncTranslator` is designed to work with asynchronous operations by splitting the input text into manageable chunks and by handling multiple concurrent requests to your language model.

### Key Configuration Options

- **`max_translation_chunk_length`**: Sets the maximum length for each text chunk that is translated.
- **`max_translation_chunk_length_multilang`**: Sets the maximum length for text chunks when dealing with multiple languages.
- **`max_concurrent_llm_calls`**: Limits the number of concurrent calls made to the underlying language model.
- **`max_concurrent_time_period`**: (New) Limits the maximum number of concurrent LLM calls within a specific time period (in seconds). For instance, if you want to allow up to 50 requests per minute, set:
- `max_concurrent_llm_calls` to **50**
- `max_concurrent_time_period` to **60**

If you only need to limit the number of concurrent calls without a time constraint, you can leave `max_concurrent_time_period` at its default value (typically `0`).

### Code Example

```python
import asyncio
from llmtranslate import AsyncTranslator
from langchain_openai import ChatOpenAI

# Initialize the language model (LLM) with your API key
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")

async def translate_text():
# Initialize AsyncTranslator with appropriate parameters
translator = AsyncTranslator(
llm=llm,
max_translation_chunk_length=100, # Maximum chunk length for translation
max_translation_chunk_length_multilang=50, # Maximum chunk length for multi-language texts
max_concurrent_llm_calls=10, # Maximum number of concurrent LLM calls
# To enforce a rate limit (e.g., 50 calls per minute), uncomment and adjust the following:
# max_concurrent_llm_calls=50,
# max_concurrent_time_period=60,
)

# Prepare tasks: one to detect language and one to translate the text
tasks = [
translator.get_text_language("Hi how are you?"),
translator.translate("Hi how are you?", "Spanish")
]

# Run the tasks concurrently
results = await asyncio.gather(*tasks)

# Retrieve and display detected language information
text_language = results[0]
if text_language:
print(text_language.ISO_639_1_code) # Expected Output: en
print(text_language.ISO_639_2_code) # Expected Output: eng
print(text_language.ISO_639_3_code) # Expected Output: eng
print(text_language.language_name) # Expected Output: English

# Display the translated text
print(results[1]) # Expected Output: Hola, ¿cómo estás?

# Run the asynchronous translation process
asyncio.run(translate_text())
```

### Summary

- **Asynchronous Execution:** Uses `asyncio` to concurrently run language detection and translation tasks.
- **Language Detection:** The `get_text_language` function returns language details (e.g., ISO 639-1, ISO 639-2, ISO 639-3 codes and language name).
- **Translation:** The `translate` function converts the input text to the desired target language.
- **Concurrency Control:**
- **Without Time Limit:** Simply set `max_concurrent_llm_calls` to limit concurrent calls.
- **With Time Limit:** Specify both `max_concurrent_llm_calls` and `max_concurrent_time_period` to control the number of calls within a specific time window (e.g., 50 calls per 60 seconds).

This example should help you understand how to integrate and configure all available parameters in the `AsyncTranslator`, including the new time-based rate limiting feature.

---
## Key Parameters

### `max_translation_chunk_length`
- **Description**: Defines the maximum length (in characters) of a text chunk to be translated in a single call when the text is in one language.
- **Recommendation**: If translations are not accurate, try reducing this value as weaker LLMs struggle with large chunks of text.

### `max_translation_chunk_length_multilang`
- **Description**: Defines the maximum length (in characters) of a text chunk when the text contains multiple languages.
- **Recommendation**: Reduce this value for better accuracy with multi-language inputs.

### `max_concurrent_llm_calls`
- **Description**: Limits the number of concurrent calls made to the LLM.
- **Default**: 10
- **Usage**: Adjust this parameter to align with your LLM provider's concurrency limits.

---

## Other Supported LLMs Examples
You can find more examples of LangChain Chat models in the documentation at:
- [llm-translate.com](https://llm-translate.com/)
- [LangChain Documentation](https://python.langchain.com/docs/integrations/chat/)

### Anthropic's Claude
To create an instance of an Anthropic-based Chat LLM:
```bash
pip install langchain-anthropic
pip install llmtranslate
```
```python
from langchain_anthropic import Anthropic
from llmtranslate import Translator
llm = Anthropic(model="claude-2", anthropic_api_key="your_anthropic_api_key")
translator = Translator(llm=llm)
```

### Mistral via API
To create an instance of a Chat LLM using the Mistral API:
```bash
pip install -qU langchain_mistralai
pip install llmtranslate
```
```python
from langchain_mistral import MistralChat
from llmtranslate import Translator

llm = MistralChat(api_key="your_mistral_api_key")
translator = Translator(llm=llm)
```

---

## Supported Languages

llmtranslate supports all languages supported by GPT-4o. For a complete list of language codes, you can visit the [ISO 639-1 website](https://localizely.com/iso-639-1-list/).

Here is a table showing which languages are supported by gpt-4o and gpt4o-mini:

| Language Name | Language Code | Supported by gpt-4o | Supported by gpt4o-mini |
|-------------------|---------------|---------------------|-------------------------|
| English | en | Yes | Yes |
| Mandarin Chinese | zh | Yes | Yes |
| Hindi | hi | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| German | de | Yes | Yes |
| Russian | ru | Yes | Yes |
| Arabic | ar | Yes | Yes |
| Italian | it | Yes | Yes |
| Korean | ko | Yes | Yes |
| Punjabi | pa | Yes | Yes |
| Bengali | bn | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Indonesian | id | Yes | Yes |
| Urdu | ur | Yes | Yes |
| Persian (Farsi) | fa | Yes | Yes |
| Vietnamese | vi | Yes | Yes |
| Polish | pl | Yes | Yes |
| Samoan | sm | Yes | Yes |
| Thai | th | Yes | Yes |
| Ukrainian | uk | Yes | Yes |
| Turkish | tr | Yes | Yes |
| Maori | mi | **No** | **No** |
| Norwegian | no | Yes | Yes |
| Dutch | nl | Yes | Yes |
| Greek | el | Yes | Yes |
| Romanian | ro | Yes | Yes |
| Swahili | sw | Yes | Yes |
| Hungarian | hu | Yes | Yes |
| Hebrew | he | Yes | Yes |
| Swedish | sv | Yes | Yes |
| Czech | cs | Yes | Yes |
| Finnish | fi | Yes | Yes |
| Amharic | am | **No** | **No** |
| Tagalog | tl | Yes | Yes |
| Burmese | my | Yes | Yes |
| Tamil | ta | Yes | Yes |
| Kannada | kn | Yes | Yes |
| Pashto | ps | Yes | Yes |
| Yoruba | yo | Yes | Yes |
| Malay | ms | Yes | Yes |
| Haitian Creole | ht | Yes | Yes |
| Nepali | ne | Yes | Yes |
| Sinhala | si | Yes | Yes |
| Catalan | ca | Yes | Yes |
| Malagasy | mg | Yes | Yes |
| Latvian | lv | Yes | Yes |
| Lithuanian | lt | Yes | Yes |
| Estonian | et | Yes | Yes |
| Somali | so | Yes | Yes |
| Tigrinya | ti | **No** | **No** |
| Breton | br | **No** | **No** |
| Fijian | fj | Yes | **No** |
| Maltese | mt | Yes | Yes |
| Corsican | co | Yes | Yes |
| Luxembourgish | lb | Yes | Yes |
| Occitan | oc | Yes | Yes |
| Welsh | cy | Yes | Yes |
| Albanian | sq | Yes | Yes |
| Macedonian | mk | Yes | Yes |
| Icelandic | is | Yes | Yes |
| Slovenian | sl | Yes | Yes |
| Galician | gl | Yes | Yes |
| Basque | eu | Yes | Yes |
| Azerbaijani | az | Yes | Yes |
| Uzbek | uz | Yes | Yes |
| Kazakh | kk | Yes | Yes |
| Mongolian | mn | Yes | Yes |
| Tibetan | bo | **No** | **No** |
| Khmer | km | Yes | **No** |
| Lao | lo | Yes | Yes |
| Telugu | te | Yes | Yes |
| Marathi | mr | Yes | Yes |
| Chichewa | ny | Yes | Yes |
| Esperanto | eo | Yes | Yes |
| Kurdish | ku | **No** | **No** |
| Tajik | tg | Yes | Yes |
| Xhosa | xh | Yes | **No** |
| Yiddish | yi | Yes | Yes |
| Zulu | zu | Yes | Yes |
| Sundanese | su | Yes | Yes |
| Tatar | tt | Yes | Yes |
| Quechua | qu | **No** | **No** |
| Uighur | ug | **No** | **No** |
| Wolof | wo | **No** | **No** |
| Tswana | tn | Yes | Yes |

## Additional Resources

- [PyPI page](https://pypi.org/project/llmtranslate/)
- [ISO 639-1 Codes](https://localizely.com/iso-639-1-list/)
- [Github project repository](https://github.com/adam-pawelek/llmtranslate)
- [Documentation](https://llm-translate.com/)

## Authors
- Adam Pawełek
- [LinkedIn](https://www.linkedin.com/in/adam-roman-pawelek/)
- [Email](mailto:adam.pwk@outlook.com)

## License

llmtranslate is licensed under the MIT License. See the LICENSE file for more details.