https://github.com/navjotdhanawat/py-ai-voice-agent

PipeCat Voice Agent is an AI-powered voice communication system that enables intelligent, real-time phone conversations through WebSocket connections. It combines multiple technologies, including speech recognition (Deepgram), natural language processing (GPT-4), text-to-speech (Cartesia), and telephony (Plivo), to create seamless voice inte

Host: GitHub
URL: https://github.com/navjotdhanawat/py-ai-voice-agent
Owner: navjotdhanawat
Created: 2025-01-30T02:16:06.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-03-01T22:19:23.000Z (8 months ago)
Last Synced: 2025-04-14T10:05:59.670Z (6 months ago)
Topics: agent, ai, deepgram, openai, pipecat-ai, plivo, voice
Language: Python
Homepage:
Size: 9.04 MB
Stars: 3
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

# PipeCat Voice Agent

PipeCat Voice Agent is an AI-powered voice communication system that enables intelligent, real-time phone conversations. It integrates speech recognition, a large language model, speech synthesis, and telephony into a single voice interaction pipeline.

## Core Features

- **Real-time Voice Processing**: Utilizes WebSocket connections for bidirectional audio streaming
- **Speech Recognition**: Integrates with Deepgram for accurate speech-to-text conversion
- **Natural Language Processing**: Leverages OpenAI's GPT-4 for intelligent conversation handling
- **Text-to-Speech**: Uses Cartesia for high-quality voice synthesis
- **Telephony Integration**: Built on Plivo's platform for reliable call handling
- **Call Recording**: Supports automatic call recording and storage
- **Voice Activity Detection**: Implements Silero VAD for precise speech detection

## Technical Stack

- **Backend**: FastAPI (Python)
- **Voice Processing**: PipeCat Pipeline System
- **External Services**:
  - Plivo (Telephony)
  - OpenAI GPT-4 (LLM)
  - Deepgram (STT)
  - Cartesia (TTS)
  - S3 (Recording Storage)
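
The stack above suggests a staged flow: caller audio goes to speech-to-text, the transcript goes to the LLM, and the reply goes to text-to-speech before being streamed back. The following is only a conceptual sketch of that ordering in plain Python. It does not use the PipeCat API, and every function name in it (`fake_stt`, `fake_llm`, `fake_tts`, `run_pipeline`) is a hypothetical stand-in rather than code from this repository.

```python
from typing import Callable, List

# Each stage is a plain function standing in for a real service.
Stage = Callable[[str], str]


def fake_stt(audio: str) -> str:
    # Stand-in for Deepgram: treat the "audio" as if it were its transcript.
    return audio.strip().lower()


def fake_llm(transcript: str) -> str:
    # Stand-in for GPT-4: produce a canned reply from the transcript.
    return f"You said: {transcript}"


def fake_tts(reply: str) -> str:
    # Stand-in for Cartesia: tag the reply text as synthesized audio.
    return f"<audio:{reply}>"


def run_pipeline(stages: List[Stage], frame: str) -> str:
    """Pass one frame through every stage, in order."""
    for stage in stages:
        frame = stage(frame)
    return frame


if __name__ == "__main__":
    print(run_pipeline([fake_stt, fake_llm, fake_tts], "  Hello Agent "))
```

In the real system each stage would process streaming audio frames rather than whole strings, so treat this only as a picture of the ordering.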

## Local Development with Plivo and ngrok

To enable Plivo to reach your local development server, follow these steps:

1. Install ngrok:

```zsh
npm install -g ngrok
# or
brew install ngrok
```

2. Start your FastAPI server:

```zsh
uvicorn app.main:app --reload
```

3. Start ngrok tunnel (in a new terminal):

```zsh
ngrok http 8000
```

4. Copy the HTTPS URL provided by ngrok (e.g., https://your-tunnel.ngrok.io)

5. Update your environment variables:

```zsh
export BASE_URL=your-ngrok-url
```

6. Configure Plivo:
   - Log into your Plivo dashboard
   - Update your application's Answer URL to: `{ngrok-url}/api/v1/calls/answer`
   - Update your application's Hangup URL to: `{ngrok-url}/api/v1/calls/hangup`

Now Plivo will be able to reach your local server through the secure ngrok tunnel for both HTTP callbacks and WebSocket connections.
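
As a rough illustration of how the two callback URLs and the WebSocket address relate to `BASE_URL`, here is a minimal sketch using only the standard library. It assumes `BASE_URL` holds the full ngrok URL including the `https://` scheme. The `/ws` path and the helper names are hypothetical, and the `<Stream>` attributes reflect my understanding of Plivo's audio streaming XML, so check them against the Plivo documentation and the repository's own answer handler.

```python
import os
import xml.etree.ElementTree as ET


def callback_urls(base_url: str) -> dict:
    """Build the Answer and Hangup URLs to paste into the Plivo dashboard."""
    base = base_url.rstrip("/")
    return {
        "answer": f"{base}/api/v1/calls/answer",
        "hangup": f"{base}/api/v1/calls/hangup",
    }


def websocket_url(base_url: str, path: str = "/ws") -> str:
    """Swap the HTTP scheme for its WebSocket equivalent.

    The default "/ws" path is a placeholder, not the repository's real route.
    """
    base = base_url.rstrip("/")
    if base.startswith("https://"):
        return "wss://" + base[len("https://"):] + path
    if base.startswith("http://"):
        return "ws://" + base[len("http://"):] + path
    raise ValueError("BASE_URL must start with http:// or https://")


def answer_xml(base_url: str) -> str:
    """Return an answer document asking Plivo to stream call audio."""
    response = ET.Element("Response")
    stream = ET.SubElement(
        response, "Stream", {"bidirectional": "true", "keepCallAlive": "true"}
    )
    stream.text = websocket_url(base_url)
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    base = os.environ.get("BASE_URL", "https://your-tunnel.ngrok.io")
    print(callback_urls(base))
    print(answer_xml(base))
```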

## Important Notes

- The ngrok URL changes each time you restart ngrok (unless you have a paid plan)
- Each time the ngrok URL changes, update your `BASE_URL` environment variable and the Answer and Hangup URLs in the Plivo dashboard
- For production, replace the ngrok URL with your actual domain name
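
Because the tunnel URL changes between restarts, it can help to read the current one programmatically instead of copying it by hand. The sketch below queries the ngrok agent's local inspection API, which by default listens on `http://127.0.0.1:4040`. It assumes the agent is running with default settings, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Default address of the local ngrok agent API (assumes default agent settings).
NGROK_API = "http://127.0.0.1:4040/api/tunnels"


def pick_https_url(payload: dict) -> Optional[str]:
    """Return the first HTTPS public URL in a tunnels payload, if any."""
    for tunnel in payload.get("tunnels", []):
        url = tunnel.get("public_url", "")
        if url.startswith("https://"):
            return url
    return None


def current_ngrok_url(api_url: str = NGROK_API) -> Optional[str]:
    """Ask the running ngrok agent for its current HTTPS tunnel URL."""
    with urllib.request.urlopen(api_url, timeout=5) as resp:
        return pick_https_url(json.load(resp))


if __name__ == "__main__":
    url = current_ngrok_url()
    # Print a line that can be pasted into the shell, or a hint if no tunnel is up.
    print(f"export BASE_URL={url}" if url else "No HTTPS tunnel found")
```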

## Testing

To test the integration:

1. Make sure your server is running and ngrok is active
2. Try making an outbound call using the API endpoint:
   ```zsh
   curl -X POST "http://localhost:8000/api/v1/calls/outbound/{phone-number}"
   ```
3. The call should connect and establish a WebSocket connection for real-time voice processing
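
The same outbound request can be issued from Python, which is convenient in a test script. This is a minimal sketch using only the standard library. The helper names and the example number are hypothetical, the number is URL-encoded on the assumption that it may contain a leading `+`, and the response is returned as raw text because its format is not documented here.

```python
import urllib.parse
import urllib.request
from typing import Tuple


def outbound_call_url(phone_number: str, server: str = "http://localhost:8000") -> str:
    """Build the outbound-call endpoint URL, percent-encoding the number."""
    number = urllib.parse.quote(phone_number, safe="")
    return f"{server.rstrip('/')}/api/v1/calls/outbound/{number}"


def place_outbound_call(
    phone_number: str, server: str = "http://localhost:8000"
) -> Tuple[int, str]:
    """POST to the endpoint and return (HTTP status, raw response body)."""
    request = urllib.request.Request(
        outbound_call_url(phone_number, server), method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Fictional number; replace it with a number you are allowed to call.
    status, body = place_outbound_call("+14155550100")
    print(status, body)
```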
