https://github.com/adam-pawelek/llmtranslate
Python library for text translation and language detection using OpenAI's GPT-4o.
ai-translator genai gpt4o gpt4o-mini lama language-detection llm mistral multilingual multilingual-translations openai python3 translation
- Host: GitHub
- URL: https://github.com/adam-pawelek/llmtranslate
- Owner: adam-pawelek
- License: mit
- Created: 2024-08-27T18:37:50.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-10-24T21:42:17.000Z (12 months ago)
- Last Synced: 2024-10-25T01:12:16.060Z (12 months ago)
- Topics: ai-translator, genai, gpt4o, gpt4o-mini, lama, language-detection, llm, mistral, multilingual, multilingual-translations, openai, python3, translation
- Language: Python
- Homepage:
- Size: 386 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# llmtranslate
The `llmtranslate` library is a Python tool that leverages LangChain's [ChatModels 🦜🔗](https://python.langchain.com/docs/integrations/chat/) to translate text between languages and recognize the language of a given text. This library provides both synchronous and asynchronous translators to accommodate various use cases.
📖 **Documentation:** [llm-translate.com](https://llm-translate.com/)
## Installation
To ensure proper dependency management, install the required libraries **in the following order**:
1. Install LangChain chat libraries for your preferred LLM. For example:
```bash
pip install langchain-openai
```
For other LLM integrations, refer to the [LangChain ChatModels documentation](https://python.langchain.com/docs/integrations/chat/).
2. Install the `llmtranslate` library:
```bash
pip install llmtranslate
```
### Poetry problem
If you’re using Poetry, you must set a Python requirement capped below 4.0 to match langchain-openai (which requires <4.0). So, if your pyproject.toml has:
```toml
python = ">=3.12"
```
replace it with:
```toml
python = "^3.12"
```
or:
```toml
python = ">=3.12,<4.0"
```
Otherwise, Poetry sees Python 4.x as valid, which conflicts with langchain-openai’s <4.0 requirement.
---
## Example Using OpenAI's GPT
### Using `Translator` (Synchronous)
```python
from llmtranslate import Translator
from langchain_openai import ChatOpenAI

# Initialize the LLM and Translator
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")
translator = Translator(llm=llm)

# Detect the language of the text
text_language = translator.get_text_language("Hi how are you?")
if text_language:
    print(text_language.ISO_639_1_code)  # Output: en
    print(text_language.ISO_639_2_code)  # Output: eng
    print(text_language.ISO_639_3_code)  # Output: eng
    print(text_language.language_name)  # Output: English

# Translate text
translated_text = translator.translate("Bonjour tout le monde", "English")
print(translated_text)  # Output: Hello everyone
```
---
## Using AsyncTranslator (Asynchronous)
This example demonstrates how to use the `AsyncTranslator` class to asynchronously detect the language of a text and translate it into another language. The `AsyncTranslator` is designed to work with asynchronous operations by splitting the input text into manageable chunks and by handling multiple concurrent requests to your language model.
### Key Configuration Options
- **`max_translation_chunk_length`**: Sets the maximum length for each text chunk that is translated.
- **`max_translation_chunk_length_multilang`**: Sets the maximum length for text chunks when dealing with multiple languages.
- **`max_concurrent_llm_calls`**: Limits the number of concurrent calls made to the underlying language model.
- **`max_concurrent_time_period`**: (New) Sets the length, in seconds, of the time window to which the `max_concurrent_llm_calls` limit applies. For instance, to allow up to 50 requests per minute, set:
  - `max_concurrent_llm_calls` to **50**
  - `max_concurrent_time_period` to **60**

If you only need to limit the number of concurrent calls without a time constraint, you can leave `max_concurrent_time_period` at its default value (typically `0`).
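
To illustrate what the two chunk-length options control, here is a minimal sketch of length-bounded chunking. It is not taken from `llmtranslate`: the function name `split_into_chunks` and the sentence-boundary strategy are assumptions made for illustration, and the library may split text differently.

```python
import re


def split_into_chunks(text: str, max_chunk_length: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chunk_length characters.

    Hypothetical helper for illustration only; not llmtranslate's splitter.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split by character count.
        while len(sentence) > max_chunk_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chunk_length])
            sentence = sentence[max_chunk_length:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chunk_length:
            current = candidate  # the sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for chunk in split_into_chunks("Hello there. How are you today? I am fine! Thanks.", 30):
        print(repr(chunk))
```

A smaller limit produces more, shorter requests, which is the trade-off behind lowering the chunk length when a weaker model mistranslates long inputs.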
### Code Example
```python
import asyncio
from llmtranslate import AsyncTranslator
from langchain_openai import ChatOpenAI

# Initialize the language model (LLM) with your API key
llm = ChatOpenAI(model_name="gpt-4o", openai_api_key="your_openai_api_key")


async def translate_text():
    # Initialize AsyncTranslator with appropriate parameters
    translator = AsyncTranslator(
        llm=llm,
        max_translation_chunk_length=100,  # Maximum chunk length for translation
        max_translation_chunk_length_multilang=50,  # Maximum chunk length for multi-language texts
        max_concurrent_llm_calls=10,  # Maximum number of concurrent LLM calls
        # To enforce a rate limit (e.g., 50 calls per minute), uncomment and adjust the following:
        # max_concurrent_llm_calls=50,
        # max_concurrent_time_period=60,
    )
    # Prepare tasks: one to detect language and one to translate the text
    tasks = [
        translator.get_text_language("Hi how are you?"),
        translator.translate("Hi how are you?", "Spanish")
    ]
    # Run the tasks concurrently
    results = await asyncio.gather(*tasks)
    # Retrieve and display detected language information
    text_language = results[0]
    if text_language:
        print(text_language.ISO_639_1_code)  # Expected Output: en
        print(text_language.ISO_639_2_code)  # Expected Output: eng
        print(text_language.ISO_639_3_code)  # Expected Output: eng
        print(text_language.language_name)  # Expected Output: English

    # Display the translated text
    print(results[1])  # Expected Output: Hola, ¿cómo estás?


# Run the asynchronous translation process
asyncio.run(translate_text())
```
### Summary
- **Asynchronous Execution:** Uses `asyncio` to concurrently run language detection and translation tasks.
- **Language Detection:** The `get_text_language` method returns language details (ISO 639-1, ISO 639-2, and ISO 639-3 codes, plus the language name).
- **Translation:** The `translate` method converts the input text to the desired target language.
- **Concurrency Control:**
  - **Without Time Limit:** Set only `max_concurrent_llm_calls` to limit concurrent calls.
  - **With Time Limit:** Specify both `max_concurrent_llm_calls` and `max_concurrent_time_period` to control the number of calls within a specific time window (e.g., 50 calls per 60 seconds).

This example should help you understand how to integrate and configure all available parameters in the `AsyncTranslator`, including the new time-based rate limiting feature.
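
To make the "N calls per time window" idea concrete, the following is a rough sketch of a sliding-window limiter built on `asyncio`. It is an assumption about how such a limit can work in general, not the library's actual implementation; `WindowRateLimiter` and `fake_llm_call` are hypothetical names, and the tiny window in the demo stands in for something like 50 calls per 60 seconds.

```python
import asyncio
import time
from collections import deque


class WindowRateLimiter:
    """Allow at most max_calls call starts within any period-second window.

    Hypothetical illustration; llmtranslate may enforce its limit differently.
    """

    def __init__(self, max_calls: int, period: float) -> None:
        self.max_calls = max_calls
        self.period = period
        self._starts = deque()  # start times of calls still inside the window
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:
            while True:
                now = time.monotonic()
                # Forget calls that have left the window.
                while self._starts and now - self._starts[0] >= self.period:
                    self._starts.popleft()
                if len(self._starts) < self.max_calls:
                    self._starts.append(now)
                    return
                # Sleep until the oldest recorded call leaves the window.
                await asyncio.sleep(self.period - (now - self._starts[0]))


async def demo() -> list[float]:
    # 2 calls per 0.2 s here; the same shape applies to 50 calls per 60 s.
    limiter = WindowRateLimiter(max_calls=2, period=0.2)
    t0 = time.monotonic()
    stamps: list[float] = []

    async def fake_llm_call() -> None:
        await limiter.acquire()
        stamps.append(time.monotonic() - t0)  # record when the call was allowed to start

    await asyncio.gather(*(fake_llm_call() for _ in range(5)))
    return stamps


if __name__ == "__main__":
    print([round(s, 2) for s in asyncio.run(demo())])
```

The limiter is created inside the running coroutine so that its lock belongs to the active event loop on older Python versions as well.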
---
## Key Parameters
### `max_translation_chunk_length`
- **Description**: Defines the maximum length (in characters) of a text chunk to be translated in a single call when the text is in one language.
- **Recommendation**: If translations are not accurate, try reducing this value, as weaker LLMs struggle with large chunks of text.
### `max_translation_chunk_length_multilang`
- **Description**: Defines the maximum length (in characters) of a text chunk when the text contains multiple languages.
- **Recommendation**: Reduce this value for better accuracy with multi-language inputs.
### `max_concurrent_llm_calls`
- **Description**: Limits the number of concurrent calls made to the LLM.
- **Default**: 10
- **Usage**: Adjust this parameter to align with your LLM provider's concurrency limits.
---
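
As a general illustration of what a cap on concurrent calls means in `asyncio` terms, the sketch below bounds the number of in-flight coroutines with a semaphore. The names `run_with_concurrency_cap` and `fake_llm_call` are hypothetical, and this is an assumed pattern rather than a description of how `llmtranslate` enforces `max_concurrent_llm_calls` internally.

```python
import asyncio


async def run_with_concurrency_cap(coros, max_concurrent: int):
    """Run coroutines concurrently with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(coro):
        # Each coroutine starts only once a semaphore slot is free.
        async with semaphore:
            return await coro

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(c) for c in coros))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_llm_call(i: int) -> int:
        # Stand-in for a real LLM request; tracks how many calls overlap.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulated network latency
        in_flight -= 1
        return i * 2

    results = await run_with_concurrency_cap(
        [fake_llm_call(i) for i in range(25)], max_concurrent=10
    )
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "calls finished; peak concurrency:", peak)
```

With a cap of 10, the 25 simulated calls still all complete, but no more than 10 overlap at any moment, which is the behaviour a provider's concurrency limit requires.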
## Examples with Other Supported LLMs
You can find more examples of LangChain Chat models in the documentation at:
- [llm-translate.com](https://llm-translate.com/)
- [LangChain Documentation](https://python.langchain.com/docs/integrations/chat/)
### Anthropic's Claude
To create an instance of an Anthropic-based Chat LLM:
```bash
pip install langchain-anthropic
pip install llmtranslate
```
```python
from langchain_anthropic import ChatAnthropic
from llmtranslate import Translator

# ChatAnthropic is the chat-model class; use a model name currently offered by Anthropic
llm = ChatAnthropic(model="claude-2", anthropic_api_key="your_anthropic_api_key")
translator = Translator(llm=llm)
```
### Mistral via API
To create an instance of a Chat LLM using the Mistral API:
```bash
pip install -qU langchain_mistralai
pip install llmtranslate
```
```python
from langchain_mistralai import ChatMistralAI
from llmtranslate import Translator

llm = ChatMistralAI(api_key="your_mistral_api_key")
translator = Translator(llm=llm)
```
---
## Supported Languages
llmtranslate supports all languages supported by GPT-4o. For a complete list of language codes, you can visit the [ISO 639-1 website](https://localizely.com/iso-639-1-list/).
The following table shows which languages are supported by gpt-4o and gpt-4o-mini:
| Language Name     | Language Code | Supported by gpt-4o | Supported by gpt-4o-mini |
|-------------------|---------------|---------------------|--------------------------|
| English | en | Yes | Yes |
| Mandarin Chinese | zh | Yes | Yes |
| Hindi | hi | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| German | de | Yes | Yes |
| Russian | ru | Yes | Yes |
| Arabic | ar | Yes | Yes |
| Italian | it | Yes | Yes |
| Korean | ko | Yes | Yes |
| Punjabi | pa | Yes | Yes |
| Bengali | bn | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Indonesian | id | Yes | Yes |
| Urdu | ur | Yes | Yes |
| Persian (Farsi) | fa | Yes | Yes |
| Vietnamese | vi | Yes | Yes |
| Polish | pl | Yes | Yes |
| Samoan | sm | Yes | Yes |
| Thai | th | Yes | Yes |
| Ukrainian | uk | Yes | Yes |
| Turkish | tr | Yes | Yes |
| Maori | mi | **No** | **No** |
| Norwegian | no | Yes | Yes |
| Dutch | nl | Yes | Yes |
| Greek | el | Yes | Yes |
| Romanian | ro | Yes | Yes |
| Swahili | sw | Yes | Yes |
| Hungarian | hu | Yes | Yes |
| Hebrew | he | Yes | Yes |
| Swedish | sv | Yes | Yes |
| Czech | cs | Yes | Yes |
| Finnish | fi | Yes | Yes |
| Amharic | am | **No** | **No** |
| Tagalog | tl | Yes | Yes |
| Burmese | my | Yes | Yes |
| Tamil | ta | Yes | Yes |
| Kannada | kn | Yes | Yes |
| Pashto | ps | Yes | Yes |
| Yoruba | yo | Yes | Yes |
| Malay | ms | Yes | Yes |
| Haitian Creole | ht | Yes | Yes |
| Nepali | ne | Yes | Yes |
| Sinhala | si | Yes | Yes |
| Catalan | ca | Yes | Yes |
| Malagasy | mg | Yes | Yes |
| Latvian | lv | Yes | Yes |
| Lithuanian | lt | Yes | Yes |
| Estonian | et | Yes | Yes |
| Somali | so | Yes | Yes |
| Tigrinya | ti | **No** | **No** |
| Breton | br | **No** | **No** |
| Fijian | fj | Yes | **No** |
| Maltese | mt | Yes | Yes |
| Corsican | co | Yes | Yes |
| Luxembourgish | lb | Yes | Yes |
| Occitan | oc | Yes | Yes |
| Welsh | cy | Yes | Yes |
| Albanian | sq | Yes | Yes |
| Macedonian | mk | Yes | Yes |
| Icelandic | is | Yes | Yes |
| Slovenian | sl | Yes | Yes |
| Galician | gl | Yes | Yes |
| Basque | eu | Yes | Yes |
| Azerbaijani | az | Yes | Yes |
| Uzbek | uz | Yes | Yes |
| Kazakh | kk | Yes | Yes |
| Mongolian | mn | Yes | Yes |
| Tibetan | bo | **No** | **No** |
| Khmer | km | Yes | **No** |
| Lao | lo | Yes | Yes |
| Telugu | te | Yes | Yes |
| Marathi | mr | Yes | Yes |
| Chichewa | ny | Yes | Yes |
| Esperanto | eo | Yes | Yes |
| Kurdish | ku | **No** | **No** |
| Tajik | tg | Yes | Yes |
| Xhosa | xh | Yes | **No** |
| Yiddish | yi | Yes | Yes |
| Zulu | zu | Yes | Yes |
| Sundanese | su | Yes | Yes |
| Tatar | tt | Yes | Yes |
| Quechua | qu | **No** | **No** |
| Uighur | ug | **No** | **No** |
| Wolof | wo | **No** | **No** |
| Tswana | tn | Yes | Yes |
## Additional Resources
- [PyPI page](https://pypi.org/project/llmtranslate/)
- [ISO 639-1 Codes](https://localizely.com/iso-639-1-list/)
- [GitHub project repository](https://github.com/adam-pawelek/llmtranslate)
- [Documentation](https://llm-translate.com/)
## Authors
- Adam Pawełek
  - [LinkedIn](https://www.linkedin.com/in/adam-roman-pawelek/)
  - [Email](mailto:adam.pwk@outlook.com)
## License
llmtranslate is licensed under the MIT License. See the LICENSE file for more details.