https://github.com/tarun7r/vocal-agent

A cutting-edge voice assistant combining real-time speech recognition, AI reasoning, and neural text-to-speech capabilities.

kokoro llama speech-to-speech speech-to-text text-to-speech whisper


Host: GitHub
URL: https://github.com/tarun7r/vocal-agent
Owner: tarun7r
License: mit
Created: 2025-04-02T08:43:35.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-04-02T09:19:23.000Z (7 months ago)
Last Synced: 2025-04-02T10:23:17.489Z (7 months ago)
Topics: kokoro, llama, speech-to-speech, speech-to-text, text-to-speech, whisper
Language: Python
Homepage:
Size: 11.7 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Real-Time Speech-to-Speech Chatbot 🤖

A cutting-edge voice assistant combining state-of-the-art speech recognition, AI reasoning, and neural text-to-speech capabilities. Built for real-time interaction, with support for LLM tool calls.

## Features ✨

- 🎙️ Real-time speech recognition using Whisper + Silero VAD
- 🤖 Multimodal reasoning with Llama 3.1 8B through Agno agent
- 🌐 Web integration (Google Search, Wikipedia, Arxiv)
- 🗣️ Natural voice synthesis with Kokoro-82M ONNX
- ⚡ Low-latency audio processing pipeline
- 🔧 Extensible tool system for agent capabilities

## Tech Stack 🛠️

| Component | Technology |
|------------------------|-------------------------------------|
| Speech-to-Text | Whisper (large-v1) + Silero VAD |
| Language Model | Llama 3.1 8B via Ollama |
| Text-to-Speech | Kokoro-82M ONNX |
| Agent Framework | Agno LLM Agent |
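
The components in the table form a three-stage loop: speech-to-text, language model, text-to-speech. The sketch below illustrates that flow with plain callables standing in for each stage. `VoicePipeline` and its field names are hypothetical and are not taken from `main.py`; the real project wires these stages up with Whisper, Agno/Ollama, and Kokoro.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Illustrative three-stage voice loop (hypothetical, not the project's class)."""

    transcribe: Callable[[bytes], str]   # speech-to-text stage (e.g. Whisper)
    respond: Callable[[str], str]        # language-model stage (e.g. Llama 3.1 via an agent)
    synthesize: Callable[[str], bytes]   # text-to-speech stage (e.g. Kokoro)

    def run_turn(self, audio: bytes) -> bytes:
        """Process one user utterance and return the audio of the reply."""
        text = self.transcribe(audio)
        if not text.strip():
            # Nothing was said; skip the LLM and TTS stages.
            return b""
        reply = self.respond(text)
        return self.synthesize(reply)
```

Keeping each stage behind a simple callable makes it possible to swap a model or test the loop with stubs, without touching the other stages.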

## Installation 📦

### Prerequisites
- Python 3.9+
- [Ollama](https://ollama.com/) running locally

### Install Ollama
#### On Mac:
Download and install Ollama from [Ollama Mac download page](https://ollama.com/download/mac).

#### On Linux:
Run the following command in your terminal:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

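### Install Vocal-Agent
```bash
# Clone the repository and enter it
git clone https://github.com/tarun7r/Vocal-Agent.git
cd Vocal-Agent

# Install Python dependencies
pip3 install -r requirements.txt

# Install kokoro-onnx separately, without pulling in its dependencies
pip3 install --no-deps kokoro-onnx==0.4.7

# Install the espeak-ng system dependency (run only the line for your platform)
# Linux:
sudo apt-get install espeak-ng

# macOS (Homebrew):
brew install espeak-ng
```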

## Models Setup 🧠

### Llama 3.1 8B:
```bash
ollama pull llama3.1:8b
```

### Kokoro Models:
- Download `kokoro-v1.0.onnx` and `voices-v1.0.bin` from [kokoro-onnx releases](https://github.com/thewh1teagle/kokoro-onnx/releases/tag/model-files-v1.0).
- Place them in the project root directory (see the Project Structure section).

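
Because a missing model file typically only surfaces as an error once synthesis starts, a quick check before launching can save time. The following is a minimal sketch, assuming both files sit directly in the project directory; `missing_model_files` is a hypothetical helper and not part of the repository.

```python
from pathlib import Path

# File names as published on the kokoro-onnx model-files-v1.0 release.
REQUIRED_MODEL_FILES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def missing_model_files(project_dir: str = ".") -> list[str]:
    """Return the required model files that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]
```

An empty list means both files are in place; otherwise the returned names indicate what still needs to be downloaded.
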
## Usage 🚀

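Start the Ollama service, then load the model:
```bash
# Terminal 1: start the Ollama server (it keeps running in the foreground)
ollama serve

# Terminal 2: load the model
ollama run llama3.1:8b
```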

In a separate terminal:
```bash
python3 main.py
```
**Important**: Ensure `ollama serve` is running before executing [main.py](https://github.com/tarun7r/Vocal-Agent/blob/main/main.py)
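
One way to verify that the server is reachable before starting the app is to query Ollama's HTTP API. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`, and uses the `/api/tags` endpoint, which lists locally available models. `is_ollama_running` is a hypothetical helper, not something `main.py` provides.

```python
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address (assumed unchanged)


def is_ollama_running(base_url: str = OLLAMA_URL, timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        # /api/tags lists local models; a 200 response means the server is up.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar errors.
        return False
```

Calling `is_ollama_running()` returns `False` when nothing is listening, which is a cue to run `ollama serve` first.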

### Flow after running `main.py`:
```plaintext
Listening... Press Ctrl+C to exit ⠋
speak now - Recording started ⠸
recording - Recording stopped

Transcribed: Who won the 2022 FIFA World Cup?
LLM Tool calls...

Response from the knowledge agent: The 2022 FIFA World Cup was won by Argentina, led by Lionel Messi. They defeated France in the final on December 18, 2022.

[Audio starts playing]
```
![Chatbot Demo](demo.png)

## Configuration ⚙️

Key settings in `main.py`:
```python
# Audio processing
SAMPLE_RATE = 16000
MAX_PHONEME_LENGTH = 500

# Voice synthesis
SPEED = 1.2 # Adjust speech rate
VOICE_PROFILE = "af_heart" # Choose from voices-v1.0.bin

# Agent settings
MAX_THREADS = 2 # Parallel processing threads
```
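
A length limit such as `MAX_PHONEME_LENGTH` implies that long responses need to be split before synthesis. How `main.py` actually does this is not shown here; the sketch below is one possible approach that uses character count as a rough proxy for phoneme count and prefers sentence boundaries. `split_for_tts` is a hypothetical name.

```python
import re

MAX_PHONEME_LENGTH = 500  # value from the configuration above


def split_for_tts(text: str, limit: int = MAX_PHONEME_LENGTH) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate      # sentence still fits in the current chunk
        else:
            chunks.append(current)   # flush and start a new chunk
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be synthesized separately and played back in order.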

## Project Structure 📂
```
.
├── main.py # Core application logic
├── agent_client.py # LLM agent integration
├── kokoro-v1.0.onnx # TTS model
├── voices-v1.0.bin # Voice profiles
├── requirements.txt # Python dependencies
└── README.md
```

# Vocal-Agent Setup Script for macOS

The `vocal_agent_mac.sh` script automates the setup and execution of the Vocal-Agent application on macOS. It ensures all dependencies are installed, sets up the environment, and starts the required services.

## Prerequisites

Before running the script, ensure the following are installed on your system:

1. **Homebrew**: Install Homebrew from [https://brew.sh/](https://brew.sh/).
2. **espeak-ng**: The script will install this using Homebrew if it's not already installed.
3. **Ollama**: Download and install Ollama from [https://ollama.com/download/mac](https://ollama.com/download/mac)
4. **Kokoro models**: The script downloads the ONNX model and the voices file using `curl`.

## How to Use the Script

1. Clone this repository:
```bash
git clone https://github.com/tarun7r/Vocal-Agent.git
cd Vocal-Agent
```

## License 📄

MIT License - See [LICENSE](https://github.com/tarun7r/Vocal-Agent/blob/main/LICENSE) for details

## Acknowledgements

- [RealtimeSTT](https://github.com/KoljaB/RealtimeSTT) for STT + VAD integration
- [Kokoro-ONNX](https://github.com/thewh1teagle/kokoro-onnx) for efficient TTS
- [Agno](https://docs.agno.com/introduction) for agent framework
- [Ollama](https://ollama.ai/) for local LLM serving
- Project inspiration from [Weebo](https://github.com/amanvirparhar/weebo)
- You can add more tools to the agent using [Agno Toolkits](https://docs.agno.com/tools/toolkits/toolkits)
