https://github.com/ahmedheakl/arazn-llm
Code-Switched translations with Large Language models
- Host: GitHub
- URL: https://github.com/ahmedheakl/arazn-llm
- Owner: ahmedheakl
- License: mit
- Created: 2024-03-04T15:55:36.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-07-10T00:54:01.000Z (6 months ago)
- Last Synced: 2024-07-10T04:02:32.344Z (6 months ago)
- Language: Python
- Size: 5.42 MB
- Stars: 5
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs 🇪🇬🇬🇧
[![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-yellow)](http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e)
[![arXiv](https://img.shields.io/badge/arXiv-2406.18120-b31b1b.svg)](https://arxiv.org/abs/2406.18120)
[![Speech Dataset](https://img.shields.io/badge/🗣️%20Speech%20Dataset-Hugging%20Face-blue)](https://huggingface.co/datasets/ahmedheakl/arzen-llm-speech-ds)
[![Translation Dataset](https://img.shields.io/badge/🔤%20Translation%20Dataset-Hugging%20Face-blue)](https://huggingface.co/datasets/ahmedheakl/arzen-llm-dataset)

## Introduction
In recent times, code-switching between Egyptian Arabic and English has become increasingly prevalent. This repository presents our work on developing advanced machine translation (MT) and automatic speech recognition (ASR) systems specifically designed to handle this linguistic phenomenon.
### 🎥 Demo
Check out our demo to see ARZEN-LLM in action!
https://github.com/ahmedheakl/arazn-llm/assets/52796111/f8d0e8af-5444-4664-b653-7401578e2069
### 🎯 Our Goal
Our primary objective is to translate code-switched Egyptian Arabic-English into either English or Egyptian Arabic. To do this, we build on state-of-the-art large language models such as Llama and Gemma.
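The README does not publish the prompt format used with Llama and Gemma. Purely as an illustration, here is a minimal sketch of how a code-switched sentence could be wrapped in an instruction-style prompt for either target direction. The template wording, the `TARGETS` language codes, `TranslationExample`, and `build_prompt` are all hypothetical and are not part of this repository.

```python
from dataclasses import dataclass

# Hypothetical language codes for the two translation directions described in
# the README: code-switched input -> English, or -> Egyptian Arabic.
TARGETS = {"en": "English", "arz": "Egyptian Arabic"}


@dataclass
class TranslationExample:
    source: str       # code-switched Egyptian Arabic-English sentence
    target: str = ""  # reference translation; left empty at inference time


def build_prompt(example: TranslationExample, target_lang: str) -> str:
    """Wrap one example in an illustrative instruction-style prompt."""
    if target_lang not in TARGETS:
        raise ValueError(f"unsupported target language: {target_lang!r}")
    prompt = (
        "Translate the following code-switched Egyptian Arabic-English "
        f"sentence into {TARGETS[target_lang]}.\n"
        f"Sentence: {example.source}\n"
        "Translation:"
    )
    # During supervised fine-tuning the reference is appended so the model
    # learns to continue the prompt with it; at inference time the model
    # generates this continuation itself.
    if example.target:
        prompt += f" {example.target}"
    return prompt


if __name__ == "__main__":
    ex = TranslationExample("انا عندي meeting بكرة", "I have a meeting tomorrow")
    print(build_prompt(ex, "en"))
```

Using the same template for training and inference, differing only in whether the reference is appended, keeps the model's input distribution identical in both settings.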
### 🔊 ASR Integration
For ASR, we use the Whisper model to recognize code-switched Egyptian Arabic speech. Our experiments cover:
- Data preprocessing techniques
- Advanced training methodologies

We have also implemented a consecutive speech-to-text translation system that passes the ASR transcript to the MT model. This design addresses the challenges posed by limited resources and by the unique characteristics of the Egyptian Arabic dialect.
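A consecutive (cascaded) system runs ASR first and then translates its transcript. Here is a minimal sketch of that control flow, assuming each stage is exposed as a plain Python callable. `cascade_speech_translation`, `demo_asr`, and `demo_mt` are hypothetical names, and the stand-in stages only make the example runnable. They are not the repository's models.

```python
from typing import Callable, Dict

# Stage signatures assumed for this sketch: ASR maps raw audio bytes to a
# code-switched transcript; MT maps (transcript, target language) to text.
ASRFn = Callable[[bytes], str]
MTFn = Callable[[str, str], str]


def cascade_speech_translation(
    audio: bytes, asr: ASRFn, mt: MTFn, target_lang: str = "en"
) -> Dict[str, str]:
    """Run ASR, then feed its transcript into MT (a consecutive cascade)."""
    transcript = asr(audio)
    # Minimal cleanup between the stages: collapse runs of whitespace so the
    # MT model receives a single-spaced sentence.
    transcript = " ".join(transcript.split())
    translation = mt(transcript, target_lang)
    return {"transcript": transcript, "translation": translation}


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any model weights.
    def demo_asr(audio: bytes) -> str:
        return "انا عندي   meeting بكرة "

    def demo_mt(text: str, target_lang: str) -> str:
        table = {("انا عندي meeting بكرة", "en"): "I have a meeting tomorrow"}
        return table.get((text, target_lang), text)

    print(cascade_speech_translation(b"\x00\x01", demo_asr, demo_mt))
```

In practice, `asr` could wrap a Whisper checkpoint, for example through the Hugging Face `transformers` `pipeline("automatic-speech-recognition", model=...)` helper, and `mt` could wrap one of the fine-tuned LLMs. Because each stage is a plain callable, either side can be swapped independently. The trade-off of any cascade is that recognition errors flow straight into translation.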
### 📊 Performance
Our evaluation against established metrics demonstrates promising results:
- **English Translation**: Significant improvement of X% over the state-of-the-art
- **Arabic Translation**: Y% improvement in performance

### 🌟 Why It Matters
Code-switching is pervasive in spoken language, so ASR systems must handle it well. This capability enables seamless interaction across many domains, including:
- Business negotiations
- Cultural exchanges
- Academic discourse

## Open-Source Resources
We're committed to advancing research in this field. Our models and code are available as open-source resources:
- 🤗 **Models**: [Hugging Face Collection](http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e)
- 🗣️ **Speech Dataset**: [ARZEN-LLM Speech Dataset](https://huggingface.co/datasets/ahmedheakl/arzen-llm-speech-ds)
- 🔤 **Translation Dataset**: [ARZEN-LLM Translation Dataset](https://huggingface.co/datasets/ahmedheakl/arzen-llm-dataset)
- 📄 **Research Paper**: [arXiv:2406.18120](https://arxiv.org/abs/2406.18120)

Feel free to explore, contribute, and build upon our work!
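Here is a minimal sketch of pulling the two datasets listed above from the Hugging Face Hub. It assumes the `datasets` library is installed (`pip install datasets`) and that a split named `train` exists. This README does not state the split names, so check each dataset card. `load_arzen` is a hypothetical helper, not part of the repository.

```python
# Dataset repository IDs as linked in this README.
SPEECH_DS = "ahmedheakl/arzen-llm-speech-ds"
TRANSLATION_DS = "ahmedheakl/arzen-llm-dataset"


def load_arzen(repo_id: str, split: str = "train"):
    """Load one split of an ArzEn-LLM dataset from the Hugging Face Hub.

    Assumption: a split named "train" exists; check the dataset card for the
    actual split names before relying on the default.
    """
    # Imported inside the function so this module can be imported even when
    # `datasets` is not installed; the first call downloads the data.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)
```

For example, `load_arzen(TRANSLATION_DS).column_names` lists the fields of the translation data before any further processing. Loading `SPEECH_DS` and decoding its audio may additionally require the audio extras (`pip install "datasets[audio]"`).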
```bibtex
@article{heakl2024arzen,
title={ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs},
author={Heakl, Ahmed and Zaghloul, Youssef and Ali, Mennatullah and Hossam, Rania and Gomaa, Walid},
journal={arXiv preprint arXiv:2406.18120},
year={2024}
}
```