{"id":32815758,"url":"https://github.com/rostwal95/media-ui","last_synced_at":"2025-11-07T06:02:05.461Z","repository":{"id":322116418,"uuid":"1085573534","full_name":"rostwal95/media-ui","owner":"rostwal95","description":"Real-time voice agent testing platform with STT→LLM→TTS debugging, latency analytics, and conversation export","archived":false,"fork":false,"pushed_at":"2025-11-02T16:23:03.000Z","size":1105,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-11-02T18:13:08.072Z","etag":null,"topics":["autonomous-agent","grpc","nextjs","reactjs","voice-ai","websocket"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/rostwal95.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-29T08:22:49.000Z","updated_at":"2025-11-02T16:23:06.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/rostwal95/media-ui","commit_stats":null,"previous_names":["rostwal95/media-ui"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/rostwal95/media-ui","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rostwal95%2Fmedia-ui","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rostwal95%2Fmedia-ui/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rostwal95%2Fmedia-ui/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rostwal95%2Fmedia-ui/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/rostwal95","download_url":"https://codeload.github.com/rostwal95/media-ui/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/rostwal95%2Fmedia-ui/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":283136761,"owners_count":26785489,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-07T02:00:06.343Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autonomous-agent","grpc","nextjs","reactjs","voice-ai","websocket"],"created_at":"2025-11-07T06:01:09.373Z","updated_at":"2025-11-07T06:02:05.448Z","avatar_url":"https://github.com/rostwal95.png","language":"TypeScript","readme":"# Media-UI — Real-Time Voice Agent Testing Platform\n\n\u003cdiv align=\"center\"\u003e\n  \u003cbr /\u003e\n  \u003cdiv\u003e\n    \u003cimg src=\"https://img.shields.io/badge/-Next.js_15-000000?style=for-the-badge\u0026logo=next.js\u0026logoColor=white\" alt=\"Next.js\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/-React_19-61DAFB?style=for-the-badge\u0026logo=react\u0026logoColor=black\" alt=\"React\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/-TypeScript-3178C6?style=for-the-badge\u0026logo=typescript\u0026logoColor=white\" alt=\"TypeScript\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/-Node.js-339933?style=for-the-badge\u0026logo=node.js\u0026logoColor=white\" alt=\"Node.js\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/-WebSocket-010101?style=for-the-badge\u0026logo=socket.io\u0026logoColor=white\" alt=\"WebSocket\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/-gRPC-244C5A?style=for-the-badge\u0026logo=grpc\u0026logoColor=white\" alt=\"gRPC\" /\u003e\n    \u003cimg src=\"https://img.shields.io/badge/-TailwindCSS-38B2AC?style=for-the-badge\u0026logo=tailwindcss\u0026logoColor=white\" alt=\"Tailwind CSS\" /\u003e\n  \u003c/div\u003e\n  \u003ch3 align=\"center\"\u003eDebug \u0026 Test Voice-Based Autonomous Agents\u003c/h3\u003e\n  \u003cdiv align=\"center\"\u003e\n     Real-time STT → LLM → TTS testing with latency analytics, barge-in support, and conversation export\n  \u003c/div\u003e\n  \u003cbr /\u003e\n  \n  \u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"docs/tool.png\" alt=\"UI Screenshot\" width=\"100%\" style=\"max-width: 900px; border-radius: 8px; box-shadow: 0 4px 6px rgba(0,0,0,0.1);\"\u003e\n  \u003c/div\u003e\n  \u003cbr /\u003e\n\u003c/div\u003e\n\n## 📋 Table of Contents\n\n- [Introduction](#-introduction)\n- [Tech Stack](#️-tech-stack)\n- [Features](#-features)\n- [Quick Start](#-quick-start)\n- [Architecture](#-architecture)\n- [Configuration](#️-configuration)\n- [Project Structure](#-project-structure)\n- [Development Guide](#-development-guide)\n\n## 🚀 Introduction\n\n**Media-UI** is a full-featured testing platform for voice-based autonomous agents, providing real-time audio streaming, speech recognition debugging, and comprehensive latency analytics.\n\n**Built for:**\n\n- ✅ **QA \u0026 Testing** – Validate STT accuracy, TTS quality, and agent responses\n- ✅ **Performance Analysis** – Track latency metrics, silence gaps, and barge-in behavior\n- ✅ **Debugging** – Export full conversation logs, recordings, and metrics\n- ✅ **Demos \u0026 Presentations** – Clean chat UI with real-time agent interaction\n\n\u003e **⚠️ Note:** This is a **testing/debugging tool**, not a production voice application. Focus is on observability and developer experience.\n\n## ⚙️ Tech Stack\n\n### **Frontend** (Next.js App)\n\n- **Next.js 15** – React framework with App Router\n- **React 19** – Latest features with concurrent rendering\n- **TypeScript 5** – Full type safety\n- **Tailwind CSS 4** – Utility-first styling\n- **Web Audio API** – AudioWorklet for microphone capture \u0026 TTS playback\n- **Radix UI** – Accessible dialog, tooltip, switch components\n- **Lucide Icons** – Clean, consistent iconography\n\n### **Backend** (Node.js WebSocket Bridge)\n\n- **WebSocket (ws)** – Real-time bidirectional communication\n- **ConnectRPC** – gRPC-web protocol over WebSocket\n- **Protocol Buffers** – Type-safe message serialization\n- **ts-node** – Direct TypeScript execution for server\n\n### **Audio Processing**\n\n- **AudioWorklet** – Low-latency PCM capture (`pcm-processor.js`)\n- **16-bit LINEAR16** @ 16kHz – High-quality audio encoding\n- **µ-law decoding** – TTS playback from backend\n- **WAV export** – Mixed recordings with real-time sync\n\n### **Infrastructure**\n\n- **Docker** – Multi-stage production builds\n- **PM2** – Process management for Next.js + WebSocket server\n- **Protocol Buffers** – Generated TypeScript types from `.proto` files\n\n## ⚡ Features\n\n### 🎤 **Real-Time Audio Streaming**\n\n- Microphone capture via AudioWorklet (128-sample quantum)\n- Buffered streaming with 40ms intervals\n- Automatic AudioContext resume handling\n- Device selection support\n\n### 🧠 **Speech Recognition**\n\n- Interim and final transcription results\n- Start-of-input (SOI) and end-of-input (EOI) events\n- Barge-in detection and handling\n- Live text updates during speech\n\n### 🔊 **Text-to-Speech Playback**\n\n- Queue-based audio playback\n- Interruptible during barge-in\n- µ-law and WAV format support\n- Chunk-level playback tracking\n\n### 💬 **Chat Interface**\n\n- Real-time message bubbles (user + agent)\n- Millisecond-precision timestamps\n- Connection status indicator\n- Call duration timer\n\n### 📊 **Latency Metrics**\n\n- **Call-level**: Start latency, greeting playback time\n- **Per-dialogue**:\n  - First interim result latency\n  - Customer utterance length\n  - Prompt playback time\n  - Silence gaps (pre/post agent response)\n  - Barge-in latency\n  - Audio chunks sent\n- Expandable metrics panel with visual indicators\n\n### 📤 **Export Capabilities**\n\n- **Mixed Recording**: Caller + Agent audio synchronized\n- **Backend Logs**: Full conversation with scrubbed audio payloads\n- **Transcript**: HTML export with timestamps\n- **Kibana Link**: Direct link to orchestrator logs\n\n### 🛡️ **Error Handling**\n\n- WebSocket reconnection logic\n- gRPC stream error recovery\n- User-friendly error messages\n- Comprehensive client-side logging\n\n## 🚀 Quick Start\n\n### Prerequisites\n\n- **Node.js 22.x** ([nvm](https://github.com/nvm-sh/nvm))\n- **pnpm** (enable with `corepack enable`)\n\n### Local Development\n\n```bash\n# 1. Install dependencies\npnpm install\n\n# 2. Start WebSocket server (terminal 1)\npnpm dev:server\n# Runs on ws://localhost:3001/ws\n\n# 3. Start Next.js frontend (terminal 2)\npnpm dev\n# Runs on http://localhost:3000\n\n# Or start both concurrently:\npnpm dev:all\n```\n\nVisit **http://localhost:3000** → Configure connection → Start call\n\n### Docker Deployment\n\n```bash\n# Build image\ndocker build -t media-ui .\n\n# Run container\ndocker run -d \\\n  -p 3000:3000 \\\n  -p 3001:3001 \\\n  --name media-ui \\\n  media-ui\n\n# Check logs\ndocker logs -f media-ui\n```\n\n**Services:**\n\n- Frontend: http://localhost:3000\n- WebSocket: ws://localhost:3001/ws\n\n### Available Scripts\n\n```bash\n# Development\npnpm dev              # Next.js dev server (port 3000)\npnpm dev:server       # WebSocket server (port 3001)\npnpm dev:all          # Start both with concurrently\n\n# Production\npnpm build            # Build Next.js app\npnpm start            # Start production server\n\n# Utilities\npnpm lint             # ESLint checks\npnpm typecheck        # TypeScript validation\n```\n\n## 🏗️ Architecture\n\n### High-Level Flow\n\n```\n                        WebSocket (JSON/Protobuf)           gRPC (Protobuf)\n    ┌──────────────────────────────────────────┐    ┌──────────────────────────┐\n    │                                          │    │                          │\n    │                                          ▼    ▼                          │\n┌───┴────────────┐                      ┌─────────────────┐              ┌────┴─────────────┐\n│                │                      │                 │              │                  │\n│    Next.js     │◀────────────────────▶│    Node.js      │◀────────────▶│    Universal     │\n│    Frontend    │                      │  WebSocket      │              │     Harness      │\n│                │   Bidirectional      │    Bridge       │ Bidirectional│    (Backend)     │\n│  (Port 3000)   │   Streaming          │  (Port 3001)    │  Streaming   │                  │\n│                │                      │                 │              │                  │\n└────────┬───────┘                      └────────┬────────┘              └──────────────────┘\n         │                                       │\n         │ ┌─────────────────────────────────────┘\n         │ │\n         │ │  • Bearer Token (JWT)\n         │ │  • Orchestrator Host URL\n         │ │  • Org ID / Conversation ID\n         │ │  • Language \u0026 Agent Config\n         │ │\n         ▼ ▼\n    ┌─────────────────┐\n    │  AudioWorklet   │\n    │  PCM Processor  │\n    ├─────────────────┤\n    │  • 16-bit PCM   │\n    │  • 16 kHz       │\n    │  • 128 samples  │\n    │  • 40ms buffer  │\n    └─────────────────┘\n         │\n         ▼\n    ┌─────────────────┐\n    │   Microphone    │\n    │   Hardware      │\n    └─────────────────┘\n```\n\n### Call State Machine\n\n```\nIDLE\n  ↓ startCall()\nCALL_START (greeting)\n  ↓ greeting received + played\nAUDIO_STREAMING (duplex)\n  ↓ user speaks → ASR → VA response\n  ↓ loop until endCall()\nCALL_END\n  ↓ cleanup\nENDED\n```\n\n### Data Flow: Voice Interaction\n\n```\n1. User speaks → AudioWorklet captures PCM\n2. UseMicrophone hook → sendAudioChunk()\n3. CallStateMachine → buffers 40ms chunks\n4. WebSocket → sends to Node.js bridge\n5. Bridge → forwards to gRPC backend\n6. Backend → ASR (interim/final) + VA response\n7. WebSocket ← receives response with TTS audio\n8. TTSPlayer → decodes µ-law → plays via Web Audio\n9. UI updates with transcript + metrics\n```\n\n## ⚙️ Configuration\n\n### Environment Variables\n\nCreate `.env.local`:\n\n```bash\n# WebSocket URL (auto-detected if not set)\nNEXT_PUBLIC_WS_URL=ws://localhost:3001/ws\n```\n\n### Connection Settings\n\nConfigure via UI (stored in `localStorage`):\n\n| Field              | Description                 | Example                                   |\n| ------------------ | --------------------------- | ----------------------------------------- |\n| **Host**           | Orchestrator gRPC endpoint  | `https://orchestrator.example.com`        |\n| **Bearer Token**   | Authentication JWT          | `eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...` |\n| **Language**       | Speech recognition language | `en-US`, `en-IN`, `fr-FR`                 |\n| **OrgId**          | Organization UUID           | `12345678-1234-1234-1234-123456789abc`    |\n| **ConversationId** | Unique conversation UUID    | Auto-generated or manual                  |\n| **VirtualAgentId** | Agent configuration ID      | `agent-abc123`                            |\n| **WxCC ClusterId** | Cluster routing identifier  | `intgus1`                                 |\n| **User Agent**     | Client identifier           | `web-ui`                                  |\n| **Microphone**     | Audio input device          | Selected from browser enumeration         |\n\n## 📁 Project Structure\n\n```\nmedia-ui/\n├── src/\n│   ├── app/\n│   │   ├── page.tsx              # Main entry (ChatApp wrapper)\n│   │   ├── layout.tsx            # Root layout with fonts\n│   │   └── globals.css           # Tailwind directives\n│   │\n│   ├── components/\n│   │   ├── ChatApp.tsx           # Top-level config + chat manager\n│   │   ├── ChatBotUI.tsx         # Main chat interface\n│   │   ├── ChatBubble.tsx        # Message display component\n│   │   ├── ChatControls.tsx      # Start/stop/mic buttons\n│   │   ├── ChatMetricsPanel.tsx  # Metrics sidebar\n│   │   ├── ConfigScreen.tsx      # Connection configuration form\n│   │   ├── ConnectionIndicator.tsx\n│   │   ├── LatencyMetricsDisplay.tsx\n│   │   ├── TranscriptExporter.tsx\n│   │\n│   │\n│   ├── state/\n│   │   ├── CallStateMachine.ts   # FSM orchestration\n│   │   └── types.ts              # CallState enum + types\n│   │\n│   ├── grpc/\n│   │   ├── bridgingClient.ts     # WebSocket ↔ gRPC bridge\n│   │   ├── generated/            # Protobuf TypeScript files\n│   │   │   ├── InsightInfer_pb.ts\n│   │   │   ├── InsightInfer_connect.ts\n│   │   │   ├── virtualagent_pb.ts\n│   │   │\n│   │   └── protos/               # .proto source files\n│   │\n│   ├── lib/\n│   │   └── audio/\n│   │       ├── TTSPlayer.ts      # TTS playback queue\n│   │       ├── wavRecorder.ts    # WAV export utilities\n│   │       ├── recordingBuilder.ts # Mixed audio timeline\n│   │       └── recStore.ts       # IndexedDB storage\n│   │\n│   ├── hooks/\n│   │   └── UseMicrophone.ts      # AudioWorklet integration\n│   │\n│   ├── server/\n│   │   ├── wsServer.ts           # WebSocket server (port 3001)\n│   │   ├── grpcTransport.ts      # gRPC client setup\n│   │   ├── enumMapper.ts         # Protobuf enum conversions\n│   │   ├── PushableStream.ts     # Async iterable stream\n│   │   ├── utils.ts              # Base64 + logging helpers\n│   │   └── logger.ts             # Structured logging\n│   │\n│   ├── config/\n│   │   └── appProperties.ts      # Audio constants\n│   │\n│   └── scripts/\n│       └── generate_protos.sh    # Protobuf codegen\n│\n├── public/\n│   └── pcm-processor.js          # AudioWorklet processor\n│\n├── docs/\n│   ├── tool.png                  # UI screenshot\n│   ├── Class Diagram.png         # Architecture diagram\n│   └── media-ui-sequence-diagram.png\n│\n├── Dockerfile                    # Multi-stage production build\n├── ecosystem.config.js           # PM2 configuration\n├── next.config.ts                # Next.js configuration\n├── tsconfig.json                 # TypeScript config\n├── tailwind.config.ts            # Tailwind setup\n└── package.json                  # Dependencies + scripts\n```\n\n## 🛠️ Development Guide\n\n### Generating Protobuf Files\n\n```bash\n# Install buf CLI (first time)\nbrew install bufbuild/buf/buf\n\n# Generate TypeScript files from .proto\ncd src/scripts\nbash generate_protos.sh\n\n# Or manually:\nnpx buf generate --path src/grpc/protos\n```\n\n### Adding a New Feature\n\n**Example: Add \"Call Recording Export to S3\"**\n\n```typescript\n// 1. Update CallStateMachine.ts\npublic async endCall() {\n  const recordings = await this.getRecordings();\n\n  // New: Upload to S3\n  if (recordings.mixed) {\n    await uploadToS3(recordings.mixed, this.config.conversationId);\n  }\n\n  return recordings;\n}\n\n// 2. Create upload utility (lib/storage/s3.ts)\nexport async function uploadToS3(blob: Blob, convId: string) {\n  const formData = new FormData();\n  formData.append('file', blob, `${convId}.wav`);\n\n  await fetch('/api/upload', {\n    method: 'POST',\n    body: formData\n  });\n}\n\n// 3. Add API route (app/api/upload/route.ts)\nimport { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';\n\nexport async function POST(req: Request) {\n  const formData = await req.formData();\n  const file = formData.get('file') as File;\n\n  // Upload to S3...\n  return Response.json({ url: s3Url });\n}\n```\n\n### Debugging Tips\n\n#### WebSocket connection issues\n\n```bash\n# Check server is running\ncurl http://localhost:3001\n\n# Test WebSocket with wscat\nnpm install -g wscat\nwscat -c ws://localhost:3001/ws\n\u003e {\"ping\":1}\n\n# Check browser console for connection errors\n```\n\n#### Audio not capturing\n\n```bash\n# Verify microphone permissions in browser\n# Chrome: Settings → Privacy → Microphone\n\n# Check AudioWorklet loading\n# Browser console should show: \"Microphone: Loaded PCM processor\"\n\n# Test with different sample rate\n# Edit src/config/appProperties.ts:\nFIXED_SAMPLE_RATE: 8000  # Try 8kHz instead of 16kHz\n```\n\n#### gRPC errors\n\n```bash\n# Check token expiration\n# JWT decode: https://jwt.io\n\n# Verify host URL format\n# Must include https:// protocol\n\n# Check backend logs for auth failures\n```\n\n### Common Issues\n\n| Issue                    | Solution                                       |\n| ------------------------ | ---------------------------------------------- |\n| \"No token provided\"      | Enter valid bearer token in config screen      |\n| \"AudioContext suspended\" | Click anywhere on page to trigger user gesture |\n| \"WebSocket closed\"       | Restart ws-server: `pnpm dev:server`           |\n| \"VA greeting timeout\"    | Check virtualAgentId is valid in config        |\n| Choppy audio playback    | Reduce network latency or increase buffer size |\n| Recording export fails   | Check browser IndexedDB quota (clear if full)  |\n\n---\n\n\u003cdiv align=\"center\"\u003e\n  \u003cbr /\u003e\n  \u003cp\u003e\u003cstrong\u003e⚠️ Testing Tool Disclaimer\u003c/strong\u003e\u003c/p\u003e\n  \u003cp\u003eThis is a debugging and testing platform. For production voice applications:\u003c/p\u003e\n  \u003cp\u003e✓ Implement proper authentication \u0026nbsp; ✓ Add rate limiting \u0026nbsp; ✓ Secure WebSocket connections (WSS) \u0026nbsp; ✓ Add monitoring/alerting\u003c/p\u003e\n  \u003cbr /\u003e\n  \u003cp\u003eFor architecture details and flow diagrams, see the \u003ccode\u003edocs/\u003c/code\u003e folder\u003c/p\u003e\n\u003c/div\u003e\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frostwal95%2Fmedia-ui","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frostwal95%2Fmedia-ui","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frostwal95%2Fmedia-ui/lists"}