https://github.com/allwin107/ai-voice-agent
🎙️ AI Voice Agent is a multi-language voice assistant that transcribes speech, generates intelligent responses using LLaMA 3 via Groq, and replies with realistic synthesized speech using ElevenLabs.
- Host: GitHub
- URL: https://github.com/allwin107/ai-voice-agent
- Owner: allwin107
- Created: 2025-06-10T06:38:21.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-10T07:17:17.000Z (about 1 year ago)
- Last Synced: 2025-06-10T07:33:47.693Z (about 1 year ago)
- Topics: agent, ai, ai-agent, elevenlabs, gorq, llm, python, speech-to-text, text-to-speech, voice-agent
- Language: Python
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# AI Voice Agent - proPAL AI - Assignment
A real-time voice interaction system built with LiveKit that combines Speech-to-Text, Large Language Model, and Text-to-Speech capabilities to create an interactive voice agent.
## Features
- Speech-to-Text (STT) using OpenAI's Whisper
- Large Language Model (LLM) integration with Groq (Llama3-70B model)
- Text-to-Speech (TTS) using ElevenLabs
- Real-time streaming support via LiveKit
- Latency metrics tracking (EOU delay, TTFT, TTFB, total latency) with logging to Excel
- Multi-language support
## Quick Start
### 1. Prerequisites
- Python 3.8 or higher
- Virtual environment (recommended)
### 2. Installation
```bash
# Clone or download the project files
git clone https://github.com/allwin107/ai-voice-agent.git
cd ai-voice-agent
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
### 3. Install Dependencies
With the virtual environment activated, install the required dependencies:
```bash
pip install -r requirements.txt
```
## Configuration
Create a `.env` file with your API keys:
```bash
GROQ_API_KEY=your_groq_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
# LiveKit Configuration
LIVEKIT_URL=wss://your-project.livekit.cloud
LIVEKIT_API_KEY=your_api_key
LIVEKIT_API_SECRET=your_api_secret
```
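A minimal sketch of reading these values and failing early when one is missing, assuming the variable names from the example above. The parser is a simplified stand-in for a package such as python-dotenv (it ignores quoting, inline comments, and `export` prefixes), and `load_env_text` / `missing_keys` are hypothetical helpers, not the contents of the project's `app/config.py`.

```python
import os
from typing import Dict, List, Mapping

# Keys listed in the .env example above (assumed to be the ones the app needs).
REQUIRED_KEYS = [
    "GROQ_API_KEY",
    "ELEVENLABS_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
]


def load_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines (hypothetical helper; no quoting or escapes)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments such as "# LiveKit Configuration".
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Mapping[str, str], required: List[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or empty, in declaration order."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    try:
        with open(".env", encoding="utf-8") as fh:
            file_values = load_env_text(fh.read())
    except FileNotFoundError:
        file_values = {}
    # Real environment variables take precedence over values from the file.
    merged = {**file_values, **os.environ}
    missing = missing_keys(merged)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Checking configuration at startup turns a confusing authentication error from Groq, ElevenLabs, or LiveKit into a clear message naming the missing key.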
### Getting API Keys
1. **Groq** (free tier LLM): [console.groq.com](https://console.groq.com) - Fast LLM inference
2. **ElevenLabs**: [elevenlabs.io](https://elevenlabs.io/) - Text-to-Speech
3. **LiveKit**: [LiveKit Cloud](https://cloud.livekit.io/) - Real-time communication
## Project Structure
- `app/pipeline/` - Core pipeline components
  - `stt.py` - Speech-to-Text using Whisper
  - `llm.py` - Language model integration using Groq
  - `tts.py` - Text-to-Speech using ElevenLabs
  - `voice_agent.py` - Main voice agent pipeline
  - `livekit_backend.py` - LiveKit integration
- `app/test/` - Test scripts
  - `test_stt.py` - Tests the transcription functionality of the STT (Speech-to-Text) module
  - `test_llm.py` - Tests LLM response generation
  - `test_tts.py` - Tests the text-to-speech functionality
  - `test_agent.py` - Tests the full voice agent pipeline
  - `test_audio` - Test `.wav` audio file
- `app/config.py` - Configuration settings for the application
- `.env` - Environment variables
- `README.md`
- `requirements.txt`
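The modules above form a three-stage flow (STT, then LLM, then TTS). The sketch below shows one way such stages could be chained and timed. It is a non-streaming simplification, and `VoiceAgent`, the `Transcriber` / `Responder` / `Synthesizer` protocols, and the stub classes are hypothetical stand-ins for the Whisper, Groq, and ElevenLabs wrappers, not the project's actual `voice_agent.py`. The stubs exist only so the example runs without API keys.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Protocol


class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Responder(Protocol):
    def respond(self, text: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class VoiceAgent:
    stt: Transcriber
    llm: Responder
    tts: Synthesizer
    timings: Dict[str, float] = field(default_factory=dict)

    def handle_utterance(self, audio: bytes) -> bytes:
        """Run one user turn through STT -> LLM -> TTS, recording per-stage durations."""
        start = time.perf_counter()
        text = self.stt.transcribe(audio)
        after_stt = time.perf_counter()
        reply = self.llm.respond(text)
        after_llm = time.perf_counter()
        speech = self.tts.synthesize(reply)
        end = time.perf_counter()
        self.timings = {
            "stt_s": after_stt - start,
            "llm_s": after_llm - after_stt,
            "tts_s": end - after_llm,
            "total_s": end - start,
        }
        return speech


# Offline stubs; real implementations would call Whisper, Groq and ElevenLabs.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperLLM:
    def respond(self, text: str) -> str:
        return f"You said: {text}".upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


agent = VoiceAgent(EchoSTT(), UpperLLM(), BytesTTS())
print(agent.handle_utterance(b"hello"))  # b'YOU SAID: HELLO'
```

Typing the stages as protocols keeps each one swappable, which is also what makes per-component tests like `test_stt.py` possible: a real STT can be paired with stubbed LLM and TTS stages.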
## Usage
### Running Tests
Test individual components:
```bash
python app/pipeline/test_stt.py
python app/pipeline/test_llm.py
python app/pipeline/test_tts.py
python app/pipeline/test_agent.py
```
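The tests rely on a `.wav` file in `test_audio`. If you need a placeholder clip, the standard library `wave` module can generate one. This is a minimal sketch: the 16 kHz, mono, 16-bit format and the helper name `write_tone` are assumptions, and a pure tone will not produce a meaningful transcript, so use a real speech recording to check STT accuracy.

```python
import io
import math
import struct
import wave


def write_tone(target, seconds: float = 1.0, freq_hz: float = 440.0,
               sample_rate: int = 16_000, amplitude: float = 0.3) -> None:
    """Write a mono 16-bit PCM sine tone to a file path or binary file object."""
    n_samples = int(seconds * sample_rate)
    peak = int(amplitude * 32767)  # stay well inside the signed 16-bit range
    frames = b"".join(
        struct.pack("<h", int(peak * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_samples)
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_tone(buf)
    print(len(buf.getvalue()), "bytes written")
```

Passing a path such as `"test_audio/tone.wav"` (hypothetical name) instead of a `BytesIO` writes the clip to disk.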
### Running the Voice Agent
```bash
python app/pipeline/voice_agent.py
```
## Test Your Agent with LiveKit
### 1. Prerequisites Check
Make sure you have:
1. Activated a LiveKit Cloud instance
2. Added the following values to `.env`:
   - `LIVEKIT_WS_URL=wss://your-project.livekit.cloud`
   - `LIVEKIT_API_KEY=...`
   - `LIVEKIT_API_SECRET=...`
### 2. Start Your Voice Agent Locally
Run the `livekit_backend.py` script from the terminal:
```bash
python app/pipeline/livekit_backend.py
```
If everything is working, the logs will show:
Connected to room `your-livekit-room` as `your-participant-name`
This means the agent is live and ready to receive audio.
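Joining a room requires an access token signed with your API secret. LiveKit access tokens are HS256 JWTs whose `iss` claim is the API key, whose `sub` claim is the participant identity, and whose `video` grant names the room. In practice the `AccessToken` helper from the official `livekit-api` package builds these for you. The standard-library sketch below only illustrates the token structure, and `make_join_token` is a hypothetical helper, not part of the project.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_join_token(api_key: str, api_secret: str, identity: str, room: str,
                    ttl_s: int = 3600) -> str:
    """Build an HS256 JWT letting `identity` join `room` (illustrative only)."""
    now = int(time.time())
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {
        "iss": api_key,    # LiveKit API key
        "sub": identity,   # participant identity shown in the room
        "nbf": now,
        "exp": now + ttl_s,
        "video": {"roomJoin": True, "room": room},
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    # Sign "header.payload" with the API secret using HMAC-SHA256.
    signature = hmac.new(api_secret.encode("utf-8"), signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

For example, `make_join_token(key, secret, "VoiceAgentBot", "your-livekit-room")` would produce a token for the agent's identity. Never ship the API secret to a browser; mint tokens on a server.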
### 3. Join the Same Room as a Human User
Use the LiveKit Agent Playground:
https://agent.livekit.io
You need it to join the room as the "other participant" so the agent has someone to talk to.
Steps:
1. Go to the Playground URL
2. Enter the same room name (`your-livekit-room`)
3. Enter your LiveKit credentials (API Key and API Secret)
4. Click **Join Room**
🎙️ Now when you speak, your local VoiceAgentBot should:
1. Detect your voice
2. Transcribe it
3. Send it to the LLM
4. Reply back via audio in real-time
5. Log metrics
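Steps 1 and 2 depend on deciding when you have stopped speaking, which is also what the EOU delay metric measures. The project may rely on a different voice activity detector. As an illustration only, a simple energy-threshold detector treats a frame as speech when its RMS amplitude exceeds a threshold and declares end of utterance after a run of quiet frames. `threshold` and `hangover_frames` are hypothetical tuning values.

```python
import math
from typing import Iterable, List, Sequence


def frame_rms(samples: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of 16-bit PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def end_of_utterance_frames(frames: Iterable[Sequence[int]], threshold: float = 500.0,
                            hangover_frames: int = 3) -> List[int]:
    """Return indices of frames at which an utterance is judged to have ended.

    An utterance starts at the first frame whose RMS exceeds `threshold` and ends
    once `hangover_frames` consecutive frames fall at or below it.
    """
    ends: List[int] = []
    in_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            in_speech = True
            silent_run = 0
        elif in_speech:
            silent_run += 1
            if silent_run >= hangover_frames:
                ends.append(i)  # EOU declared roughly hangover_frames frames after speech stopped
                in_speech = False
                silent_run = 0
    return ends
```

With 20 ms frames and `hangover_frames=3`, end of utterance is declared about 60 ms after speech stops. A longer hangover avoids cutting users off during mid-sentence pauses at the cost of a higher EOU delay, which is the trade-off behind the "improved EOU timing" item below.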
## Performance Metrics
The system tracks several key metrics:
- EOU (End of Utterance) Delay
- TTFT (Time to First Token)
- TTFB (Time to First Byte)
- Total Latency
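The README does not define exactly where each timer starts and stops. Under one common set of definitions, which is an assumption and not necessarily the project's: EOU delay runs from the end of the user's speech until the pipeline detects it, TTFT from sending the transcript to the LLM until the first streamed token, and TTFB from sending the reply text to TTS until the first audio byte. Total latency is taken here as the sum of the three. `TurnTimestamps` and `append_metrics_row` are hypothetical, and the sketch writes CSV (which Excel can open) because writing `.xlsx` needs a third-party library such as openpyxl.

```python
import csv
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Dict


@dataclass
class TurnTimestamps:
    """Monotonic timestamps in seconds for one conversational turn (hypothetical schema)."""
    speech_end: float        # user stopped speaking
    eou_detected: float      # pipeline decided the turn is over
    llm_request: float       # transcript sent to the LLM
    llm_first_token: float   # first streamed token received
    tts_request: float       # reply text sent to TTS
    tts_first_byte: float    # first audio byte received

    def metrics(self) -> Dict[str, float]:
        eou = self.eou_detected - self.speech_end
        ttft = self.llm_first_token - self.llm_request
        ttfb = self.tts_first_byte - self.tts_request
        return {
            "eou_delay_s": eou,
            "ttft_s": ttft,
            "ttfb_s": ttfb,
            "total_latency_s": eou + ttft + ttfb,
        }


def append_metrics_row(path: Path, turn: TurnTimestamps) -> None:
    """Append raw timestamps plus derived metrics, writing a header for a new file."""
    row = {**asdict(turn), **turn.metrics()}
    new_file = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(row))
        if new_file:
            writer.writeheader()
        writer.writerow(row)
```

Logging raw timestamps alongside the derived values makes it possible to recompute the metrics later if the definitions change.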
## Future Improvements
1. Smarter language detection
2. Improved end-of-utterance (EOU) timing
3. Web or mobile interface integration
## 📜 License
This project was created for the proPAL AI Backend Engineering Internship assignment.
**Built with ❤️ for proPAL**