https://github.com/addex12/oromo-language-model
Afaan Oromo Translation Model: a machine translation model that translates between English and Afaan Oromo using deep learning. This repository contains code for training, fine-tuning, and deploying a neural machine translation (NMT) model using Hugging Face Transformers and datasets.
Last synced: 2 months ago
- Host: GitHub
- URL: https://github.com/addex12/oromo-language-model
- Owner: addex12
- Created: 2025-02-24T20:21:34.000Z (3 months ago)
- Default Branch: develop
- Last Pushed: 2025-02-24T21:26:57.000Z (3 months ago)
- Last Synced: 2025-02-24T21:28:33.430Z (3 months ago)
- Language: Python
- Size: 6.84 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Afaan Oromo-Language-Model
## Overview
This repository provides a **machine translation model** that translates between **English** and **Afaan Oromo**. The model is based on the Hugging Face **transformers** library and uses pre-trained models that are fine-tuned on English ↔ Afaan Oromo datasets.

### Key Features:
- **Translation** between English and Afaan Oromo
- Fine-tuning of pre-trained models for improved accuracy
- Preprocessing and tokenization for datasets
- Easy-to-use interface for translation tasks

---
## Installation
To get started with this project, follow the steps below:
1. Clone this repository:
   ```bash
   git clone https://github.com/addex12/oromo-language-model.git
   cd oromo-language-model
   ```
2. Create and activate a Python virtual environment:
   ```bash
   python -m venv venv
   source venv/bin/activate  # For Linux/Mac
   venv\Scripts\activate     # For Windows
   ```
3. Install the required libraries:
   ```bash
   pip install -r requirements.txt
   ```
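
After installing, a quick check can confirm that the libraries used in the examples are importable. This is a minimal sketch: the package list is an assumption based on the imports used in this README (the contents of `requirements.txt` are not shown here), and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed package names, inferred from the examples in this README;
# adjust the list to match the actual requirements.txt.
EXPECTED_PACKAGES = ["transformers", "datasets", "torch", "sentencepiece"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(EXPECTED_PACKAGES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All expected packages are importable.")
```

Run it inside the activated virtual environment so that it inspects the same interpreter that `pip` installed into.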

## How to Use

### Translating Text

You can use the pre-trained model for quick translation:

```python
from transformers import MarianMTModel, MarianTokenizer

# Load the pre-trained model and tokenizer.
# Model name as given in this README; verify it on the Hugging Face Hub before running
# (Oromo's ISO 639-1 code is "om", so the checkpoint may be listed under a different name).
model_name = "Helsinki-NLP/opus-mt-en-orm"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Translate text
text = "Hello, how are you?"
inputs = tokenizer(text, return_tensors="pt")
translated = model.generate(**inputs)
output = tokenizer.decode(translated[0], skip_special_tokens=True)

print(output)  # Output: Afaan Oromo translation
```
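
To translate several sentences at once, the same tokenizer and model can be called on a list of strings. The sketch below is one possible way to do that in fixed-size batches. `chunked` and `translate_batch` are hypothetical helper names, and the default batch size of 16 is an arbitrary assumption; the tokenizer and model are passed in already loaded, as in the example above.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts: Sequence[str], tokenizer, model, batch_size: int = 16) -> List[str]:
    """Translate `texts` in batches using an already loaded tokenizer and model."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        # padding=True pads every sentence to the longest one in the batch;
        # truncation=True cuts inputs that exceed the model's maximum length.
        inputs = tokenizer(list(batch), return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results
```

For example, `translate_batch(["Good morning.", "Thank you."], tokenizer, model)` returns a list with one translated string per input sentence, in the same order.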

### Fine-tuning the Model

If you want to fine-tune the model with your custom dataset, follow these steps:

Prepare your English ↔ Afaan Oromo dataset in CSV format (with columns `english_text` and `afaan_oromo_text`).
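
As an illustration of the expected CSV layout, the sketch below writes sentence pairs under the two column names above and checks a file before training. It uses only the standard library. The helper names `write_parallel_csv` and `validate_parallel_csv` are hypothetical, and the sample rows are placeholders, not real translations.

```python
import csv

REQUIRED_COLUMNS = ("english_text", "afaan_oromo_text")


def write_parallel_csv(path, pairs):
    """Write (english, afaan_oromo) pairs to `path` with the expected header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(REQUIRED_COLUMNS)
        writer.writerows(pairs)


def validate_parallel_csv(path):
    """Return the number of data rows; raise ValueError on a missing column or empty cell."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        missing = [column for column in REQUIRED_COLUMNS if column not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        count = 0
        for row_number, row in enumerate(reader, start=1):
            for column in REQUIRED_COLUMNS:
                # A short row yields None for absent cells, so treat None as empty.
                if not (row[column] or "").strip():
                    raise ValueError(f"empty {column} in data row {row_number}")
            count += 1
    return count


if __name__ == "__main__":
    # Placeholder rows only; replace them with real sentence pairs.
    sample = [
        ("Hello, how are you?", "<Afaan Oromo translation 1>"),
        ("Thank you.", "<Afaan Oromo translation 2>"),
    ]
    write_parallel_csv("your_dataset.csv", sample)
    print(validate_parallel_csv("your_dataset.csv"), "rows look usable")
```

Validating up front is a design choice: an empty cell or a misspelled header otherwise tends to surface only later, during tokenization or training.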
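
Load and preprocess the dataset:

```python
from datasets import load_dataset

dataset = load_dataset("csv", data_files={"train": "path/to/your_dataset.csv"})
# Hold out 10% of the rows so the per-epoch evaluation has data to run on
splits = dataset["train"].train_test_split(test_size=0.1, seed=42)

def preprocess(batch):
    # English is the source text; Afaan Oromo is the target (labels)
    return tokenizer(
        batch["english_text"],
        text_target=batch["afaan_oromo_text"],
        truncation=True,
        max_length=128,
    )

tokenized = splits.map(preprocess, batched=True, remove_columns=splits["train"].column_names)
train_data, eval_data = tokenized["train"], tokenized["test"]
```

Train the model:

```python
from transformers import DataCollatorForSeq2Seq, Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./afaan_oromo_translator",
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_data,
    eval_dataset=eval_data,  # required because evaluation runs every epoch
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),  # pads inputs and labels per batch
    tokenizer=tokenizer,  # newer releases accept processing_class instead
)

trainer.train()
```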
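
To get a feel for how long a run with these settings will take, the number of optimizer steps can be estimated from the dataset size. The sketch below is a rough estimate under stated assumptions: a single device, no gradient accumulation, and the last partial batch kept. The function names are hypothetical and the 9,000-example figure is purely illustrative.

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of batches in one pass over the data, counting a final partial batch."""
    if num_examples < 0 or batch_size < 1:
        raise ValueError("num_examples must be >= 0 and batch_size must be >= 1")
    return math.ceil(num_examples / batch_size)


def total_training_steps(num_examples: int, batch_size: int = 8, epochs: int = 3) -> int:
    """Total optimizer steps, assuming one device and no gradient accumulation."""
    return steps_per_epoch(num_examples, batch_size) * epochs


if __name__ == "__main__":
    # Illustrative size only: 9,000 training pairs with batch size 8 and 3 epochs.
    print(total_training_steps(9_000))  # 3375
```

The defaults mirror `per_device_train_batch_size=8` and `num_train_epochs=3` from the training arguments in this README.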

## Contributing

Contributions are welcome! If you'd like to contribute to the project, feel free to fork the repository and submit a pull request.

## License

This project is licensed under the MIT License.

## Contact

If you have any questions, feel free to contact me:

- Name: Adugna Gizaw
- Email: [email protected]

## Acknowledgements

- Hugging Face: for providing pre-trained models and the Transformers library
- TensorFlow & PyTorch: for the machine learning frameworks