FastAPI for LLM Inference with Qwen2.5, Whisper AI and VITS TTS
https://github.com/mohammad-nour-alawad/voice-to-pandas-llm-backend
- Host: GitHub
- URL: https://github.com/mohammad-nour-alawad/voice-to-pandas-llm-backend
- Owner: mohammad-nour-alawad
- Created: 2024-11-24T03:49:21.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2025-05-17T22:42:55.000Z (5 months ago)
- Last Synced: 2025-05-17T23:25:39.181Z (5 months ago)
- Topics: agentic-ai, agents, code-generation, langgraph, qwen, speach-to-text, text-to-speech, tts, vits, vllm, whisper-ai
- Language: Python
- Homepage:
- Size: 252 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# AI Voice Assistant for Data Visualization & Manipulation



A voice-enabled AI assistant for data visualization and manipulation through natural-language commands, powered by state-of-the-art language models and **LangGraph** workflows.
## 🛠️ Technical Stack
### Core Components
- **Backend**: `FastAPI` (Python)
- **Workflow Engine**: `LangGraph`
- **LLM Serving**: `vLLM`
- **Speech-to-Text**: `Whisper` (medium.en)
- **Text-to-Speech**: `VITS` (VCTK voices)

### Models
| Component | Model | Specification |
|-----------|-------|---------------|
| STT | Whisper | `medium.en` |
| LLM | Qwen2.5-Coder | `32B-Instruct-AWQ` |
| TTS | VITS | `tts_models/en/vctk/vits` (speaker p225) |
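
For orientation, here is a minimal sketch of how these three models could be loaded and chained. The package choices (`openai-whisper`, `vllm`, Coqui `TTS`) and the exact wiring are assumptions based on the table above, not code from the repo's `models.py`:

```python
import whisper                        # openai-whisper
from vllm import LLM, SamplingParams  # vLLM serving engine
from TTS.api import TTS               # Coqui TTS

# Load the three models from the table above
stt = whisper.load_model("medium.en")
llm = LLM(model="Qwen/Qwen2.5-Coder-32B-Instruct-AWQ", quantization="awq")
tts = TTS("tts_models/en/vctk/vits")

# Voice in -> text -> LLM -> voice out
text = stt.transcribe("question.wav")["text"]
reply = llm.generate([text], SamplingParams(max_tokens=256))[0].outputs[0].text
tts.tts_to_file(text=reply, speaker="p225", file_path="reply.wav")
```
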
## ✨ Key Features

- **Voice Interface**: Speech-to-text and text-to-speech capabilities
- **Intelligent Workflows**: LangGraph-powered decision making
- **Code Generation**: Automatic Python code generation for data tasks
- **Conversational AI**: Context-aware chat responses
- **High Performance**: Optimized inference with vLLM

## 🗂 Project Structure
```bash
ai-voice-assistant/
├── api.py # FastAPI endpoints
├── models.py # Model loading and inference
├── prompts.py # Prompt templates
├── schemas.py # Type definitions and Pydantic models
├── workflow.py # LangGraph workflow definition
└── README.md
```
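
As a rough illustration of how `api.py` and `schemas.py` fit together, a hedged sketch of the `/transcribe` endpoint; the repo's actual schemas and file handling may differ:

```python
import tempfile
import whisper
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()
stt = whisper.load_model("medium.en")

class TranscribeResponse(BaseModel):  # the kind of schema schemas.py might define
    text: str

@app.post("/transcribe", response_model=TranscribeResponse)
async def transcribe(file: UploadFile):
    # Save the uploaded audio to disk, then run Whisper STT on it
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
        tmp.write(await file.read())
        path = tmp.name
    return TranscribeResponse(text=stt.transcribe(path)["text"])
```
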
## 🛠️ Workflow Graph

```mermaid
graph TD
A[Client] -->|POST /converse| B(API: converse)
A -->|POST /transcribe| C(API: transcribe)
subgraph Conversational Workflow
B --> D[Initialize State]
D -->|user_input, metadata, history| E[decide_action]
E -->|LLM decision| F{Action?}
F -->|code_generation| G[generate_code]
F -->|chat_response| H[generate_chat_response]
G --> I[Update State with Code]
H --> J[Generate TTS Audio]
J --> K[Update State with Message+Audio]
I & K --> L[Return Response]
end
subgraph Transcription Flow
C --> M[Save Audio File]
M --> N[Whisper STT]
N --> O[Return Text]
end
style B stroke:#4a90e2
style C stroke:#50e3c2
style E stroke:#f5a623
style G stroke:#7ed321
style H stroke:#bd10e0
style N stroke:#ff6b6b
```
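
A minimal sketch of the LangGraph wiring this diagram implies. The node bodies below are placeholders (the real nodes call the LLM and TTS), and the state fields are assumptions rather than the repo's `workflow.py`:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    user_input: str
    action: str
    output: str

def decide_action(state: State) -> State:
    # In the real app an LLM decides the branch; a keyword stands in here
    wants_code = "plot" in state["user_input"].lower()
    return {**state, "action": "code_generation" if wants_code else "chat_response"}

def generate_code(state: State) -> State:
    return {**state, "output": "df.plot()  # placeholder for generated pandas code"}

def generate_chat_response(state: State) -> State:
    return {**state, "output": "Here's a conversational answer."}

builder = StateGraph(State)
builder.add_node("decide_action", decide_action)
builder.add_node("generate_code", generate_code)
builder.add_node("generate_chat_response", generate_chat_response)
builder.set_entry_point("decide_action")
builder.add_conditional_edges("decide_action", lambda s: s["action"],
                              {"code_generation": "generate_code",
                               "chat_response": "generate_chat_response"})
builder.add_edge("generate_code", END)
builder.add_edge("generate_chat_response", END)
graph = builder.compile()

print(graph.invoke({"user_input": "plot sales by month"})["output"])
```
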
## 🚀 Getting Started

### Prerequisites
- Python 3.9+
- NVIDIA GPU with CUDA support and at least 30 GB of VRAM (I used an `NVIDIA RTX 6000 Ada Generation` with 48 GB of VRAM)
- Docker (recommended)

### Installation
```bash
git clone https://github.com/mohammad-nour-alawad/voice-to-pandas-llm-backend.git
cd voice-to-pandas-llm-backend
pip install -r requirements.txt
```

Then run the API with:
```bash
CUDA_VISIBLE_DEVICES=0 uvicorn api:app --host 0.0.0.0 --port 6000
```