{"id":50409449,"url":"https://github.com/Voxray-AI/Voxray","last_synced_at":"2026-06-11T17:00:42.717Z","repository":{"id":341409768,"uuid":"1154619863","full_name":"Voxray-AI/Voxray","owner":"Voxray-AI","description":"Open-source real-time Voice AI infrastructure in Go. Stream audio via WebRTC or WebSocket, connect STT → LLM → TTS pipelines, and build scalable voice agents and conversational AI applications.","archived":false,"fork":false,"pushed_at":"2026-05-16T16:10:03.000Z","size":6192,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-16T17:31:13.337Z","etag":null,"topics":["ai-ivr","audio-streaming","conversational-ai","generative-ai","ivr-application","ivr-system","llm","real-time","real-time-audio","sdp","speech-ai","speech-recognition","speech-to-text","text-to-speech","turn-server","voice-agent","voice-agents","voice-ai","webrtc","websocket"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Voxray-AI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-10T15:41:37.000Z","updated_at":"2026-05-16T16:10:07.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/Voxray-AI/Voxray","commit_stats":null,"previous_names":["voxray-ai/voxray-ai"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/Voxray-AI/Voxray","r
epository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Voxray-AI%2FVoxray","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Voxray-AI%2FVoxray/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Voxray-AI%2FVoxray/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Voxray-AI%2FVoxray/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Voxray-AI","download_url":"https://codeload.github.com/Voxray-AI/Voxray/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Voxray-AI%2FVoxray/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34208761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-ivr","audio-streaming","conversational-ai","generative-ai","ivr-application","ivr-system","llm","real-time","real-time-audio","sdp","speech-ai","speech-recognition","speech-to-text","text-to-speech","turn-server","voice-agent","voice-agents","voice-ai","webrtc","websocket"],"created_at":"2026-05-31T03:00:25.607Z","updated_at":"2026-06-11T17:00:42.710Z","avatar_url":"https://github.com/Voxray-AI.png","language":"Go","funding_links":[],"categories":["Audio and 
Music"],"sub_categories":[],"readme":"# Voxray-AI\n\n[![Go](https://img.shields.io/badge/Go-1.25+-00ADD8?logo=go)](https://go.dev/)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)\n[![Go Reference](https://pkg.go.dev/badge/github.com/Voxray-AI/Voxray.svg)](https://pkg.go.dev/github.com/Voxray-AI/Voxray)\n[![Go Report Card](https://goreportcard.com/badge/github.com/Voxray-AI/Voxray)](https://goreportcard.com/report/github.com/Voxray-AI/Voxray)\n[![codecov](https://codecov.io/gh/Voxray-AI/Voxray/branch/main/graph/badge.svg)](https://codecov.io/gh/Voxray-AI/Voxray)\n[![Docs](https://img.shields.io/badge/docs-online-blue)](https://voxray-cac3ed72.mintlify.app/get-started/introduction)\n\u003e Build production-ready AI voice agents with a single JSON config.\n\u003e WebSocket \u0026 WebRTC · STT → LLM → TTS · Low-latency · Self-hostable\n\nConfig-driven Go server for building real-time voice agents. Wire together speech-to-text, LLM, and text-to-speech providers into low-latency streaming pipelines — no audio plumbing required.\n\n---\n\n## Table of Contents\n\n- [Overview](#overview)\n- [Quick Start](#quick-start)\n- [Features](#features)\n- [Supported Providers](#supported-providers)\n- [Architecture](#architecture)\n- [Requirements](#requirements)\n- [Installation](#installation)\n- [Configuration](#configuration)\n- [Environment Variables](#environment-variables)\n- [Examples](#examples)\n- [Use Cases](#use-cases)\n- [Roadmap](#roadmap)\n- [Documentation](#documentation)\n- [License](#license)\n- [Contributing](#contributing)\n\n---\n\n## Overview\n\nVoxray-AI (`github.com/Voxray-AI/Voxray`) is a **config-driven Go server** for building **real-time voice agents** over **WebSocket** and **WebRTC**. It wires together **STT**, **LLM**, and **TTS** providers into low-latency streaming pipelines. 
Pipelines, providers, and transports are defined via JSON config, making it easy to swap services and deploy to your own infrastructure.\n\nFor architecture and pipeline details, see [Architecture](docs/ARCHITECTURE.md).\n\n---\n\n## Quick Start\n\nGet the server running end-to-end in under 5 minutes.\n\n**1. Prerequisites**\n\n```bash\ngo version    # Go 1.25+ required (see go.mod)\ngcc --version # only needed for WebRTC/Opus — see Requirements\n```\n\n**2. Clone and build**\n\n```bash\ngit clone https://github.com/Voxray-AI/Voxray.git\ncd Voxray\ngo build -o voxray ./cmd/voxray\n# or: make build\n```\n\n**3. Configure**\n\n```bash\ncp config.example.json config.json\n# Set your API keys in config.json or via environment variables (e.g. OPENAI_API_KEY)\n```\n\n**4. Run**\n\n```bash\n./voxray -config config.json\n# Windows: .\\voxray.exe -config config.json\n```\n\nYou can override config with flags: `-config`, `-transport` (webrtc, daily, twilio, telnyx, plivo, exotel), `-port`, `-proxy` (public hostname for telephony webhooks), `-dialin` (Daily PSTN; requires transport=daily). Use `-init` to scaffold `config.json` and dirs then exit, or run `voxray init [config-path]`.\n\n**5. Connect**\n\n| Endpoint | Method | Description |\n|----------|--------|-------------|\n| `/ws` | GET | WebSocket transport (upgrade) |\n| `/webrtc/offer` | POST | WebRTC signaling (SDP offer/answer) |\n| `/health` | GET | Liveness |\n| `/ready` | GET | Readiness |\n| `/start` | POST | Create session (runner-style WebRTC) |\n| `/sessions/:id/offer`, `/api/v1/sessions/:id/offer` | POST, PATCH | Session SDP offer (after `/start`) |\n| `/telephony/ws` | GET | Telephony media WebSocket (when `runner_transport` is Twilio/Telnyx/Plivo/Exotel) |\n| `/swagger/` | GET | Swagger UI (when built with swag) |\n| `/metrics` | GET | Prometheus metrics |\n\nRunner and telephony behavior are detailed in [docs/CONNECTIVITY.md](docs/CONNECTIVITY.md).\n\n**6. 
Try the WebRTC browser client (optional)**\n\n```bash\ncd tests/frontend \u0026\u0026 python -m http.server 3000\n# Open http://localhost:3000/webrtc-voice.html, set Server URL to http://localhost:8080, click Start\n```\n\nSee [tests/frontend/README.md](tests/frontend/README.md) for details.\n\n---\n\n## Features\n\n- **Low-latency pipelines** — STT → LLM → TTS with configurable providers and models\n- **Dual transports** — WebSocket (`/ws`) and WebRTC via SmallWebRTC (`/webrtc/offer`)\n- **Telephony \u0026 Daily.co** — Twilio, Telnyx, Plivo, Exotel, and Daily.co (rooms + optional PSTN dial-in); media over WebSocket after provider webhooks\n- **MCP tool integration** — optional MCP server (configurable command/args) so the LLM can call tools\n- **Wide provider support** — OpenAI, Anthropic, Groq, Sarvam, AWS, Google, ElevenLabs, and more\n- **Plugin system** — custom processors and aggregators via an extensible framework\n- **Config-driven** — JSON configuration for all pipeline stages; API keys via config or environment variables\n- **Conversation recording** — mixed audio per session, uploaded asynchronously to S3\n- **Transcript logging** — per-message text logs to Postgres or MySQL\n- **Observability** — Prometheus metrics at `/metrics`\n- **Voice over WebRTC** — optional CGO/Opus build for real-time TTS audio delivery\n\n---\n\n## Supported Providers\n\nProvider sets and capability matrix are defined in [pkg/services](pkg/services/README.md) (`SupportedSTTProviders`, `SupportedLLMProviders`, `SupportedTTSProviders` in `factory.go`). Summary:\n\n| Stage   | Provider      | Notes                                      |\n| :------ | :------------- | :----------------------------------------- |\n| **STT** | OpenAI        | Whisper via OpenAI API (e.g. 
`gpt-4o-mini-transcribe`) |\n|         | Groq          | —                                          |\n|         | Sarvam        | Indian languages                           |\n|         | ElevenLabs    | —                                          |\n|         | AWS           | Amazon Transcribe                          |\n|         | Google        | Cloud Speech-to-Text                       |\n|         | Whisper       | Direct Whisper integration                  |\n|         | Camb          | —                                          |\n|         | Gradium       | —                                          |\n|         | Soniox        | —                                          |\n| **LLM** | OpenAI        | GPT-4.1, GPT-4o, etc.                      |\n|         | Groq          | —                                          |\n|         | Grok          | —                                          |\n|         | Cerebras      | —                                          |\n|         | AWS           | Amazon Bedrock                             |\n|         | Mistral       | —                                          |\n|         | DeepSeek      | —                                          |\n|         | Anthropic     | Claude                                     |\n|         | Google        | Gemini                                     |\n|         | Google Vertex | ADC-based authentication                   |\n|         | Ollama        | Local/self-hosted models                   |\n|         | Qwen          | —                                          |\n|         | AsyncAI       | —                                          |\n|         | Fish          | —                                          |\n|         | Inworld       | —                                          |\n|         | Minimax       | —                                          |\n|         | Moondream     | —                                          |\n|         | OpenPipe      | —                 
                         |\n| **TTS** | OpenAI        | `alloy`, `nova`, etc.                      |\n|         | Groq          | —                                          |\n|         | Sarvam        | Indian languages                           |\n|         | ElevenLabs    | —                                          |\n|         | AWS           | Amazon Polly                               |\n|         | Google        | Cloud Text-to-Speech                       |\n|         | Hume          | —                                          |\n|         | Inworld       | —                                          |\n|         | Minimax       | —                                          |\n|         | Neuphonic     | —                                          |\n|         | XTTS          | Self-hosted Coqui XTTS                     |\n\n---\n\n## Architecture\n\nAudio is received from web or native clients over **WebSocket** or **WebRTC**, processed through a configurable **STT → LLM → TTS** pipeline, and streamed back over the same transport. 
Each stage is pluggable — mix and match providers while keeping a consistent, low-latency pipeline.\n\n```mermaid\nflowchart TB\n  subgraph Client[\"Client\"]\n    Browser[\"Browser / Native app\"]\n  end\n  subgraph Server[\"Server\"]\n    HTTP[\"HTTP\\n/ws, /webrtc/offer\\n/metrics\"]\n  end\n  subgraph Transport[\"Transport\"]\n    WS[\"WebSocket\"]\n    WebRTC[\"SmallWebRTC\"]\n  end\n  subgraph Pipeline[\"Pipeline\"]\n    Runner[\"Runner\"]\n    Chain[\"Processors\\nVAD → STT → LLM → TTS → Sink\"]\n  end\n  subgraph Providers[\"External providers\"]\n    STT_API[\"STT API\"]\n    LLM_API[\"LLM API\"]\n    TTS_API[\"TTS API\"]\n  end\n  Browser --\u003e WS\n  Browser --\u003e WebRTC\n  WS --\u003e HTTP\n  WebRTC --\u003e HTTP\n  HTTP --\u003e Runner\n  Runner --\u003e Chain\n  Chain --\u003e STT_API\n  Chain --\u003e LLM_API\n  Chain --\u003e TTS_API\n  Chain --\u003e WS\n  Chain --\u003e WebRTC\n```\n\n\u003e Audio flows from clients (browser, runner, telephony, or Daily.co) into the server via WebSocket, SmallWebRTC, or telephony WebSocket. The runner wires each transport to the same pipeline (VAD → STT → LLM → TTS); external STT/LLM/TTS are called from [pkg/services](pkg/services/README.md). See [docs/CONNECTIVITY.md](docs/CONNECTIVITY.md) and [docs/SYSTEM_ARCHITECTURE.md](docs/SYSTEM_ARCHITECTURE.md).\n\nFor a deeper dive, see [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) and [docs/SYSTEM_ARCHITECTURE.md](docs/SYSTEM_ARCHITECTURE.md).\n\n---\n\n## Requirements\n\n**Go 1.25+** is the only hard requirement for the default (WebSocket-only) build.\n\n```bash\ngo version    # should be 1.25+ (see go.mod)\n```\n\nFor **voice over WebRTC (TTS audio via Opus)**, CGO and a C compiler (`gcc`) must also be on your PATH:\n\n```bash\ngcc --version # only needed for WebRTC/Opus builds\n```\n\n### C compiler on Windows\n\nCGO requires `gcc` on your PATH. 
Two options:\n\n**WinLibs (winget):**\n```powershell\nwinget install BrechtSanders.WinLibs.POSIX.UCRT --accept-package-agreements\n# Restart terminal, then verify:\ngcc --version\n```\n\n**MSYS2:**\n\nInstall [MSYS2](https://www.msys2.org/), open **MSYS2 UCRT64**, then:\n```bash\npacman -S mingw-w64-ucrt-x86_64-toolchain\n```\nAdd `C:\\msys64\\ucrt64\\bin` to PATH and verify with `gcc --version`.\n\n\u003e Without CGO, WebRTC TTS will report *opus encoder unavailable (build without cgo)* and the server returns **503** for WebRTC offers.\n\n---\n\n## Installation\n\nThe default build has no external dependencies. The voice/WebRTC build requires CGO and gcc (see [Requirements](#requirements)).\n\n### Default build (WebSocket only, no Opus)\n\n```bash\ngo build -o voxray ./cmd/voxray\n# or:\nmake build \u0026\u0026 make run\n```\n\n### Build with voice (WebRTC TTS + Opus)\n\n**Linux / macOS:**\n```bash\nmake build-voice\n./voxray -config config.json\n# or in one step:\nmake run-voice ARGS=\"-config config.json\"\n```\n\n**Windows (PowerShell):**\n```powershell\n# Build once, then run:\n.\\scripts\\build-voice.ps1\n.\\voxray.exe -config config.json\n\n# Or build and run in one step:\n.\\scripts\\run-voice.ps1 -config config.json\n```\n\n**Manual (any OS):**\n```bash\nCGO_ENABLED=1 go build -o voxray ./cmd/voxray\n./voxray -config config.json\n# or:\nCGO_ENABLED=1 go run ./cmd/voxray -config config.json\n```\n\nAfter a voice build, WebRTC offers succeed and TTS audio is delivered over the peer connection.\n\n---\n\n## Configuration\n\nSet the config path via the `-config` flag or the `VOXRAY_CONFIG` environment variable. 
Copy [config.example.json](config.example.json) to `config.json` to get started.\n\n### Top-level keys\n\n| Key | Type | Default | Description |\n|-----|------|---------|-------------|\n| `transport` | string | `\"websocket\"` | `\"websocket\"`, `\"smallwebrtc\"`, or `\"both\"` |\n| `host` | string | `\"0.0.0.0\"` | Bind host |\n| `port` | int | `8080` | Bind port |\n| `stt_provider` | string | — | STT provider name (e.g. `\"openai\"`) |\n| `llm_provider` | string | — | LLM provider name (e.g. `\"openai\"`) |\n| `tts_provider` | string | — | TTS provider name (e.g. `\"openai\"`) |\n| `api_keys` | object | — | Map of provider → API key |\n| `metrics_enabled` | bool | `true` | Expose Prometheus `/metrics` |\n| `webrtc_ice_servers` | array | — | ICE server config for WebRTC |\n| `rtc_max_duration_secs` | float | `0` | Max lifetime for RTC/WebSocket voice sessions after first inbound audio; `0` disables |\n| `recording` | object | — | S3 conversation recording (see below) |\n| `transcripts` | object | — | Database transcript logging (see below) |\n| `mcp` | object | — | MCP server: `command`, `args`, `tools_filter` (see [pkg/config/README.md](pkg/config/README.md)) |\n\n### Additional config\n\n| Key | Description |\n|-----|-------------|\n| `provider` | Default provider for STT/LLM/TTS when task-specific (`stt_provider`, etc.) not set |\n| `runner_transport` | `webrtc` \\| `daily` \\| `twilio` \\| `telnyx` \\| `plivo` \\| `exotel` \\| `livekit` \\| `\"\"` |\n| `runner_port`, `proxy_host`, `dialin` | Runner and telephony (e.g. 
public hostname for webhooks; Daily PSTN dial-in) |\n| `plugins`, `plugin_options` | Pipeline plugins and options (see [docs/EXTENSIONS.md](docs/EXTENSIONS.md)) |\n| `turn_detection`, `turn_stop_secs`, `turn_pre_speech_ms`, `turn_max_duration_secs`, `vad_*`, `user_turn_stop_timeout_secs`, `user_idle_timeout_secs`, `turn_async` | Turn detection and VAD |\n| `allow_interruptions`, `interruption_strategy`, `min_words` | Barge-in / interruption behavior |\n| `cors_allowed_origins`, `max_request_body_bytes`, `server_api_key` | Server and optional API key auth |\n| `legacy_errors`, `shutdown_upload_timeout_secs` | Compatibility and shutdown tuning |\n\nSee [config.example.json](config.example.json) and [examples/voice/README.md](examples/voice/README.md) for all options.\n\n### Recording (S3)\n\nVoxray can record the full mixed conversation audio per session and upload it asynchronously to S3.\n\n```json\n\"recording\": {\n  \"enable\": true,\n  \"bucket\": \"your-recordings-bucket\",\n  \"base_path\": \"recordings/\",\n  \"format\": \"wav\",\n  \"worker_count\": 4\n}\n```\n\n| Field | Description |\n|-------|-------------|\n| `enable` | Turn recording on for all sessions |\n| `bucket` | S3 bucket name |\n| `base_path` | Key prefix inside the bucket (default: `recordings/`) |\n| `format` | File format — currently `wav` (16-bit PCM mono) |\n| `worker_count` | Background uploader thread pool size |\n\nEach session is written locally and, on session end, a background job uploads it to:\n```\n\u003cbase_path\u003e/yyyy/mm/dd/\u003csession-id\u003e.wav\n```\n\nAWS credentials are resolved via the standard AWS SDK v2 chain (env vars, shared config, IAM role, etc.).\n\n### Transcripts (Postgres / MySQL)\n\nPersist per-message text transcripts (user and assistant) to a relational database.\n\n**Postgres:**\n```json\n\"transcripts\": {\n  \"enable\": true,\n  \"driver\": \"postgres\",\n  \"dsn\": \"postgres://user:pass@localhost:5432/voxray?sslmode=disable\",\n  \"table_name\": 
\"call_transcripts\"\n}\n```\n\n**MySQL:**\n```json\n\"transcripts\": {\n  \"enable\": true,\n  \"driver\": \"mysql\",\n  \"dsn\": \"user:pass@tcp(localhost:3306)/voxray?parseTime=true\",\n  \"table_name\": \"call_transcripts\"\n}\n```\n\n**Expected schema (Postgres):**\n```sql\nCREATE TABLE call_transcripts (\n  id          BIGSERIAL PRIMARY KEY,\n  session_id  TEXT NOT NULL,\n  role        TEXT NOT NULL,   -- \"user\" or \"assistant\"\n  text        TEXT NOT NULL,\n  seq         BIGINT NOT NULL,\n  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()\n);\n```\n\n### Prometheus metrics\n\nThe server exposes a Prometheus-compatible scrape endpoint at `/metrics` on the same host/port as `/ws` and `/webrtc/offer`.\n\n- `\"metrics_enabled\": true` (default) — records HTTP, WebRTC, STT, LLM, and TTS metrics.\n- `\"metrics_enabled\": false` — disables recording; `/metrics` returns `204 No Content` so Prometheus scrape configs don't break.\n\nMetrics are process-local; Prometheus aggregates across instances using `instance`/`pod` labels.\n\n---\n\n## Environment Variables\n\nAll config values can be overridden via environment variables. 
Unknown keys in config JSON are silently ignored.\n\n### Server\n\n| Variable | Description |\n|----------|-------------|\n| `VOXRAY_CONFIG` | Path to config file (alternative to `-config` flag) |\n| `VOXRAY_HOST` | Bind host |\n| `VOXRAY_PORT` / `PORT` | Bind port |\n| `VOXRAY_LOG_LEVEL` | Log level (`debug`, `info`, `warn`, `error`) |\n| `VOXRAY_JSON_LOGS` | `true` to emit structured JSON logs |\n| `VOXRAY_CORS_ORIGINS` | Comma-separated allowed CORS origins |\n| `VOXRAY_MAX_BODY_BYTES` | Max HTTP request body size in bytes |\n| `VOXRAY_SERVER_API_KEY` | Server-level API key for auth |\n| `VOXRAY_PIPELINE_INPUT_QUEUE_CAP` | Input queue capacity for pipeline |\n| `VOXRAY_WS_WRITE_COALESCE_*` | WebSocket write coalescing settings |\n| `VOXRAY_VAD_BATCH_SIZE` | VAD processor batch size |\n| `VOXRAY_DAILY_DIALIN_WEBHOOK_SECRET` | Daily.co dial-in webhook secret |\n\n### Recording\n\n| Variable | Description |\n|----------|-------------|\n| `VOXRAY_RECORDING_ENABLE` | `true` to enable S3 recording |\n| `VOXRAY_RECORDING_BUCKET` | S3 bucket name |\n| `VOXRAY_RECORDING_BASE_PATH` | Key prefix inside the bucket |\n| `VOXRAY_RECORDING_FORMAT` | File format (e.g. 
`wav`) |\n| `VOXRAY_RECORDING_WORKER_COUNT` | Uploader thread pool size |\n| `VOXRAY_RECORDING_QUEUE_CAP` | Upload job queue capacity |\n| `VOXRAY_RECORDING_MAX_RETRIES` | Max upload retry attempts |\n\n### Transcripts\n\n| Variable | Description |\n|----------|-------------|\n| `VOXRAY_TRANSCRIPTS_ENABLE` | `true` to enable transcript logging |\n| `VOXRAY_TRANSCRIPTS_DRIVER` | `postgres` or `mysql` |\n| `VOXRAY_TRANSCRIPTS_DSN` | Database connection string |\n| `VOXRAY_TRANSCRIPTS_TABLE` | Target table name |\n\n---\n\n## Examples\n\nFor provider/model-specific examples, see [examples/voice/README.md](examples/voice/README.md).\nFor the browser-based WebRTC client, see [tests/frontend/README.md](tests/frontend/README.md).\n\n### Complete example `config.json`\n\nCopy this, fill in your API keys, and run:\n\n```json\n{\n  \"transport\": \"both\",\n  \"host\": \"0.0.0.0\",\n  \"port\": 8080,\n  \"metrics_enabled\": true,\n\n  \"stt_provider\": \"openai\",\n  \"stt_model\": \"gpt-4o-mini-transcribe\",\n\n  \"llm_provider\": \"openai\",\n  \"model\": \"gpt-4.1-mini\",\n\n  \"tts_provider\": \"openai\",\n  \"tts_voice\": \"alloy\",\n\n  \"api_keys\": {\n    \"openai\": \"YOUR_OPENAI_API_KEY\"\n  },\n\n  \"webrtc_ice_servers\": [\n    \"stun:stun.l.google.com:19302\"\n  ]\n}\n```\n\nRun with:\n```bash\n./voxray -config config.json\n```\n\nThen open a WebSocket connection to `ws://localhost:8080/ws`, or POST an SDP offer to `http://localhost:8080/webrtc/offer` (WebRTC).\n\n---\n\n## Use Cases\n\n- **AI call centers / IVR** — conversational agents for inbound and outbound calls with low latency\n- **In-app voice copilots** — embed voice agents inside SaaS or productivity apps via WebSocket or WebRTC\n- **Operations and support bots** — voicebots for support, ops, and internal tooling on your own infrastructure\n- **Realtime monitoring and control** — voice interfaces for dashboards, observability tools, and control systems\n- **On-prem / VPC assistants** — self-hosted voice-AI stacks where data must 
stay within your cloud or datacenter\n\n---\n\n## Roadmap\n\n**Near-term**\n- [ ] More built-in STT/LLM/TTS providers and opinionated presets for common stacks\n- [ ] Deeper observability, tracing, and debugging tools for real-time pipelines\n\n**Planned**\n- [ ] Deployment templates (Docker, Kubernetes)\n- [ ] Additional starter agent examples for popular voice-agent scenarios\n- [ ] Expanded documentation on scaling, deployment patterns, and production hardening\n\n---\n\n## Documentation\n\n### Repository layout\n\n| Package | README |\n|---------|--------|\n| `pkg/pipeline` | [Pipeline, runner, source/sink, task, registry](pkg/pipeline/README.md) |\n| `pkg/transport` | [WebSocket, WebRTC, in-memory transports](pkg/transport/README.md) |\n| `pkg/services` | [LLM, STT, TTS interfaces and provider factory](pkg/services/README.md) |\n| `pkg/recording` | [Conversation recording and S3 upload](pkg/recording/README.md) |\n| `pkg/metrics` | [Prometheus metrics](pkg/metrics/README.md) |\n| `pkg/config` | [Configuration and env overrides](pkg/config/README.md) |\n| `pkg/processors` | [Voice, echo, filters, aggregators](pkg/processors/README.md) |\n| `pkg/runner` | [Session store and runner args](pkg/runner/README.md) |\n| `pkg/utils` | [Backoff, notifier, sentence, aggregators](pkg/utils/README.md) |\n| `pkg/frames` | [Frame types and serialization](pkg/frames/README.md) |\n| `pkg/audio` | [VAD, turn detection, codecs, resample](pkg/audio/README.md) |\n| `scripts` | [Build, run, and maintenance scripts](scripts/README.md) |\n\n### Docs\n\n- [docs/README.md](docs/README.md) — documentation index and reading order\n- [docs/API_CLIENT.md](docs/API_CLIENT.md) — client integration (REST, WebSocket, auth, WebRTC)\n- [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) — high-level architecture and pipeline\n- [docs/SYSTEM_ARCHITECTURE.md](docs/SYSTEM_ARCHITECTURE.md) — system view and entry points\n- [docs/CONNECTIVITY.md](docs/CONNECTIVITY.md) — connectivity and transports\n- 
[docs/DEPLOYMENT.md](docs/DEPLOYMENT.md) — deployment notes\n- [docs/EXTENSIONS.md](docs/EXTENSIONS.md) — extensions and plugins\n- [docs/FRAMEWORKS.md](docs/FRAMEWORKS.md) — framework integration\n- [docs/WEBSOCKET_SERVICES.md](docs/WEBSOCKET_SERVICES.md) — WebSocket service reconnection\n- [examples/voice/README.md](examples/voice/README.md) — minimal voice pipeline and config samples\n- [tests/frontend/README.md](tests/frontend/README.md) — WebRTC voice client\n\nThe OpenAPI spec is generated from the codebase (`make swagger`); Swagger UI is served at `/swagger/` when available.\n\n---\n\n## License\n\nThis project is licensed under the [Apache License 2.0](LICENSE).\nAttribution details for distribution are provided in [NOTICE](NOTICE).\n\n---\n\n## Contributing\n\nContributions are welcome! Quick development setup:\n\n```bash\ngo test ./...          # run all tests\nmake lint              # lint (or: ./scripts/pre-commit.sh)\nmake swagger           # regenerate API docs (requires swag)\nmake evals             # run eval scenarios (optional)\n```\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for full setup, testing, style, and pull request guidelines.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FVoxray-AI%2FVoxray","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FVoxray-AI%2FVoxray","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FVoxray-AI%2FVoxray/lists"}