https://github.com/burnycoder/syri
AI voice assistant that uses web browser use with Claude 3.7 Sonnet via Portkey, AssemblyAI for speech-to-text and ElevenLabs for text-to-speech
https://github.com/burnycoder/syri
Last synced: 9 months ago
JSON representation
AI voice assistant that uses web browser use with Claude 3.7 Sonnet via Portkey, AssemblyAI for speech-to-text and ElevenLabs for text-to-speech
- Host: GitHub
- URL: https://github.com/burnycoder/syri
- Owner: BurnyCoder
- Created: 2025-03-18T19:53:47.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-06-15T11:49:51.000Z (about 1 year ago)
- Last Synced: 2025-09-08T20:56:30.706Z (10 months ago)
- Language: Python
- Homepage:
- Size: 421 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Syri - AI Voice Assistant
This is snapshot of early development. Newer versions are private.
An open-source AI voice assistant that uses:
- OpenAI Whisper for speech-to-text
- Web browser-based agent for AI response generation (using Claude 3.7 Sonnet)
- OpenAI TTS for text-to-speech
This project enables a fully conversational AI experience similar to Siri, but using powerful AI models, a web browser agent, and high-quality audio APIs.
## Setup Instructions
### Step 1: Prerequisites
1. **API Keys:**
- Sign up for [OpenAI](https://platform.openai.com) to get an API key (used for both STT and TTS)
- Sign up for [Portkey](https://portkey.ai) to get API keys for Claude 3.7 Sonnet
2. **Install PortAudio** (required for audio recording):
- Debian/Ubuntu: `apt install portaudio19-dev`
- MacOS: `brew install portaudio sox`
3. **For MacOS users only:** Install MPV for audio streaming
- `brew install mpv`
4. **Chrome Browser:**
- Google Chrome must be installed as the web agent will launch and control Chrome
### Step 2: Install Python Dependencies
```bash
uv sync
```
### Step 3: Configure Environment Variables
1. Copy the template file to create your own environment file:
```bash
cp .envtemplate .env
```
2. Edit the `.env` file and replace the placeholder values with your actual API keys:
```
OPENAI_API_KEY=your_actual_openai_key_here
PORTKEY_API_KEY=your_actual_portkey_key_here
PORTKEY_VIRTUAL_KEY_ANTHROPIC=your_actual_portkey_virtual_key_here
```
Optional TTS configuration:
```
SYRI_TTS_VOICE=coral # Options: alloy, echo, fable, onyx, nova, shimmer
SYRI_TTS_SPEED=1.2 # Speech speed multiplier
```
## Usage
You can run the assistant using either of these methods:
### Method 1: Using the runner script (recommended)
```bash
uv run run.py
```
This script performs pre-checks and starts the assistant in an inactive listening state.
1. To start listening, press Enter or run `./scripts/start_listening.sh` (useful for automation)
2. Describe your request
3. Press Enter again or run `./scripts/stop_listening.sh` when done
4. The AI will transcribe your speech, process it through the web agent, and respond both in text (console) and through speech
## Context
This is PoC sketch. Newer version uses OpenAI voice agents and was moved to different repo.
## How It Works
When you speak to Syri:
1. Your voice is recorded using PyAudio
2. The recording is transcribed to text using OpenAI Whisper
3. The transcribed text is sent to a web agent that runs Chrome browser automation
4. The web agent uses Claude 3.7 Sonnet through Portkey to generate responses
5. The response is converted to speech using OpenAI TTS
## License
MIT