{"id":50754910,"url":"https://github.com/moda20/speaches_ui","last_synced_at":"2026-06-11T04:01:15.000Z","repository":{"id":351620495,"uuid":"1206229723","full_name":"moda20/speaches_ui","owner":"moda20","description":"A vibecoded application to try and test speaches server, an openAI compatible endpoint and proxy for a multitude of audio models and audio features like diarization, speech detection, STT, TTS, and realtime webrtc \u0026 chat","archived":false,"fork":false,"pushed_at":"2026-04-15T18:43:28.000Z","size":254,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-15T20:30:25.966Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://speachesui.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/moda20.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-09T17:52:03.000Z","updated_at":"2026-04-15T18:43:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/moda20/speaches_ui","commit_stats":null,"previous_names":["moda20/speaches_ui"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/moda20/speaches_ui","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moda20%2Fspeaches_ui","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moda20%2Fspeaches_ui/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moda20%2Fspeaches_ui/releases",
"manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moda20%2Fspeaches_ui/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/moda20","download_url":"https://codeload.github.com/moda20/speaches_ui/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/moda20%2Fspeaches_ui/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34181555,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-11T02:00:06.485Z","response_time":57,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-06-11T04:01:12.883Z","updated_at":"2026-06-11T04:01:14.964Z","avatar_url":"https://github.com/moda20.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],
"readme":"# Speaches UI\n\nA modern web interface for the [Speaches](https://github.com/speaches-ai/speaches) API - an OpenAI-compatible speech AI server. 
This dashboard provides a comprehensive UI for speech recognition, synthesis, and analysis features.\n\n## Features\n\n### 🎤 Audio Transcription \u0026 Translation\n\n- **Speech-to-Text**: Transcribe audio files with support for multiple formats (MP3, WAV, FLAC, M4A)\n- **Translation**: Translate audio content between languages\n- **Multiple Response Formats**: Text, JSON, verbose JSON, SRT, and VTT\n- **Timestamp Support**: Segment-level and word-level timestamps\n- **Speaker Diarization**: Identify and track different speakers in audio\n\n### 🔊 Text-to-Speech Synthesis\n\n- **Natural Voice Generation**: Synthesize speech from text using AI models\n- **Multiple Output Formats**: MP3, WAV, FLAC, Opus, PCM, AAC\n- **Voice Selection**: Choose from available voice profiles\n- **Speed \u0026 Sample Rate Control**: Customize audio output characteristics\n- **Batch Processing**: Synthesize multiple text inputs at once\n- **Streaming Support**: Real-time audio streaming with SSE\n\n### 💬 Voice Chat\n\n- **Real-time Audio Chat**: Interactive voice conversations with AI\n- **Dual Input Support**: Both audio (microphone) and text input\n- **Streaming Responses**: Real-time response streaming\n- **Chat History**: Maintain conversation context\n- **System Prompts**: Configure AI behavior and personality\n- **Temperature Control**: Adjust response randomness\n\n### 📊 Model Management\n\n- **Local Models**: View, manage, and control local AI models\n- **Remote Registry**: Browse and download models from the registry\n- **Model Types**: Support for multiple model categories:\n  - Automatic Speech Recognition (ASR)\n  - Text-to-Speech (TTS)\n  - Speaker Embedding\n  - Voice Activity Detection (VAD)\n- **Load/Unload Control**: Manage model memory usage\n- **Model Details**: View model specifications and metadata\n\n### 🎯 Voice Activity Detection\n\n- **Silence Detection**: Identify speech and silence segments in audio\n- **Adjustable Thresholds**: Fine-tune detection sensitivity\n- **Segment Analysis**: View detailed timestamp information\n- **Visual Timeline**: Interactive visualization of speech segments\n- **Export Options**: Save detection results\n\n### 👥 Speaker Diarization\n\n- **Speaker Identification**: Distinguish between different speakers\n- **Speaker Labeling**: Assign names to detected speakers\n- **Timeline Visualization**: Visual representation of speaker changes\n- **Known Speakers**: Provide reference samples for better accuracy\n- **Multiple Formats**: Export in JSON or RTTM format\n\n### 🔍 Speaker Embeddings\n\n- **Voice Fingerprinting**: Generate unique voice embeddings\n- **Embedding Visualization**: Visual representation of voice characteristics\n- **Comparison Tools**: Compare embeddings for speaker identification\n- **Vector Export**: Export embedding data for further analysis\n\n### 🌐 Realtime API\n\n- **WebRTC Support**: Real-time audio streaming via WebRTC\n- **Low Latency**: Minimal delay for interactive applications\n- **Connection Monitoring**: Track connection status and quality\n- **Audio Level Visualization**: Real-time audio level indicators\n\n### ⚙️ System Health\n\n- **Health Monitoring**: System status and API connectivity\n- **Version Information**: Track server and model versions\n- **Diagnostics**: Troubleshoot and monitor system performance\n\n## API Overview\n\nThis UI is a thin client for the [Speaches server](https://github.com/speaches-ai/speaches) and its [API](https://speaches.ai/api/).\n\n## Installation\n\n### Prerequisites\n\n- **Bun**: Latest stable version (recommended)\n- **Speaches Server**: Running instance of [Speaches](https://github.com/speaches-ai/speaches)\n\n```bash\n# Clone the repository\ngit clone https://github.com/moda20/speaches_ui.git\ncd speaches_ui\n\n# Install dependencies\nbun install\n\n# Start development server\nbun run dev\n```\n\n## Development\n\n### Available Scripts\n\n```bash\n# Start development server\nbun run dev\n\n# Create production build\nbun run build\n\n# Preview production build\nbun run preview\n\n# Run linter\nbun run lint\n\n# Format code\nbun run format\n\n# Type check\nbun run type-check\n\n# Run tests\nbun run test\n\n# Run tests in watch mode\nbun run test:watch\n\n# Run tests with UI\nbun run test:ui\n```\n\n## Deployment\n\n### Docker Deployment (Recommended)\n\nBuild and run the application using Docker:\n\n```bash\n# Build the Docker image\ndocker build -t speaches-ui .\n\n# Run the container\ndocker run -d -p 80:80 --name speaches-ui speaches-ui\n```\n\nThe Dockerfile uses a multi-stage build:\n\n- **Build stage**: Uses `oven/bun:1` to build the React application\n- **Production stage**: Uses `nginx:alpine` to serve the static files\n\n**Custom Configuration:**\n\nTo customize the API URL at runtime:\n\n```bash\ndocker run -d -p 80:80 \\\n  -e VITE_API_URL=https://your-speaches-api.com/api \\\n  --name speaches-ui \\\n  speaches-ui\n```\n\nNote that Vite inlines `VITE_*` variables into the JavaScript bundle at build time. A value passed with `-e` therefore only takes effect if the image injects it when the container starts.\n\n### Platform Deployment\n\nThe application can be deployed to various platforms:\n\n- **Vercel**: Automatic deployment from Git repository\n- **Netlify**: Drag-and-drop or Git deployment\n- **Docker**: Containerized deployment (recommended)\n- **Static Hosting**: Deploy the `dist/` folder to any static host\n\n**Vercel Deployment:**\n\n```bash\n# Install Vercel CLI\nnpm i -g vercel\n\n# Deploy\nvercel\n```\n\n**Netlify Deployment:**\n\n```bash\n# Install Netlify CLI\nnpm i -g netlify-cli\n\n# Build, then deploy the dist/ folder\nbun run build\nnetlify deploy --prod --dir=dist\n```\n\n## Technical Stack\n\n### Core Framework\n\n- **React 19**: Latest React with concurrent features and automatic batching\n- **Vite 8**: Lightning-fast build tool with HMR and optimized bundling\n- **TypeScript**: Full type safety across the application\n- **Bun**: Ultra-fast package manager and runtime\n\n### Routing \u0026 Navigation\n\n- **React Router v7**: Client-side routing with lazy loading\n- **Route-based code splitting**: Automatic code splitting for optimal performance\n\n### State Management\n\n- **Zustand**: Lightweight, 
performant state management\n- **TanStack Query v5**: Powerful data fetching, caching, and synchronization\n- **React Hook Form**: Efficient form state management\n- **Zod**: Runtime type and schema validation\n\n### UI Components \u0026 Styling\n\n- **shadcn/ui**: Beautiful, accessible component library built on Radix UI\n- **Tailwind CSS**: Utility-first CSS framework for rapid styling\n- **Radix UI**: Unstyled, accessible component primitives\n- **Lucide React**: Consistent icon library\n- **class-variance-authority**: Component variant management\n- **clsx \u0026 tailwind-merge**: Conditional class name utilities\n\n### Data Fetching \u0026 API\n\n- **Axios**: Promise-based HTTP client with interceptors\n- **OpenAPI Types**: Auto-generated TypeScript types from the OpenAPI spec\n- **Bearer Token Authentication**: Secure API communication\n\n### Audio Processing\n\n- **react-h5-audio-player**: Customizable HTML5 audio player\n- **react-dropzone**: Drag-and-drop file upload\n- **wavesurfer.js**: Audio waveform visualization\n- **@wavesurfer/react**: React wrapper for wavesurfer.js\n\n### Data Visualization\n\n- **Recharts**: Declarative charting library for React\n- **Interactive charts**: Line, bar, area, and pie charts\n\n### Form Handling\n\n- **React Hook Form**: Performant form library with minimal re-renders\n- **Zod**: Schema-first validation\n- **@hookform/resolvers**: Seamless Zod integration\n\n### Development Tools\n\n- **ESLint**: Code linting with TypeScript support\n- **Prettier**: Code formatting with consistent style\n- **Vitest**: Fast unit testing framework\n- **React Testing Library**: Component testing utilities\n- **TypeScript ESLint**: TypeScript-specific linting rules\n- **PostCSS**: CSS transformation with autoprefixer\n- **Tailwind CSS**: Utility-first CSS framework\n\n### Performance Optimizations\n\n- **Code Splitting**: Route-based code splitting with React.lazy()\n- **Lazy Loading**: On-demand component loading\n- **Bundle Analysis**: Visualizer for bundle size optimization\n- **React.memo**: Component memoization to prevent unnecessary re-renders\n- **useMemo \u0026 useCallback**: Hook optimizations for expensive computations\n- **TanStack Query Caching**: Automatic data caching and background refetching\n- **Virtualization**: Support for large lists with @tanstack/react-virtual\n\n## Project Structure\n\n```\nspeaches_ui/\n├── src/\n│   ├── components/\n│   │   ├── ui/                    # shadcn/ui components\n│   │   ├── features/\n│   │   │   ├── audio/             # Audio player \u0026 visualizer components\n│   │   │   ├── upload/            # File upload components\n│   │   │   ├── chat/              # Voice chat components\n│   │   │   ├── models/            # Model management components\n│   │   │   └── transcription/     # Transcription components\n│   │   └── layout/                # Layout components (Sidebar, Header)\n│   ├── pages/\n│   │   ├── Models.tsx             # Model management\n│   │   ├── Transcription.tsx      # Audio transcription\n│   │   ├── Synthesis.tsx          # Text-to-speech\n│   │   ├── VoiceChat.tsx          # Voice chat\n│   │   ├── Embeddings.tsx         # Speaker embeddings\n│   │   ├── VAD.tsx                # Voice activity detection\n│   │   ├── Diarization.tsx        # Speaker diarization\n│   │   ├── Realtime.tsx           # WebRTC realtime\n│   │   ├── Dashboard.tsx          # Main dashboard\n│   │   ├── Analytics.tsx          # Analytics\n│   │   ├── Reports.tsx            # Reports\n│   │   └── Settings.tsx           # Settings\n│   ├── services/\n│   │   ├── models.ts              # Model API calls\n│   │   ├── transcription.ts       # Transcription API calls\n│   │   ├── synthesis.ts           # Synthesis API calls\n│   │   ├── voiceChat.ts           # Voice chat API calls\n│   │   ├── embeddings.ts          # Embedding API calls\n│   │   ├── vad.ts                 # VAD API calls\n│   │   ├── diarization.ts         # Diarization API calls\n│   │   ├── realtime.ts            # WebRTC API calls\n│   │   └── health.ts              # Health check API calls\n│   ├── hooks/\n│   │   ├── useAudioRecorder.ts    # Audio recording hook\n│   │   ├── useFileUpload.ts       # File upload hook\n│   │   └── useAudioPlayer.ts      # Audio player hook\n│   ├── stores/\n│   │   ├── modelsStore.ts         # Model state\n│   │   ├── audioStore.ts          # Audio state\n│   │   └── chatStore.ts           # Chat state\n│   ├── types/\n│   │   ├── api.ts                 # API types (from OpenAPI)\n│   │   ├── models.ts              # Model types\n│   │   ├── transcription.ts       # Transcription types\n│   │   └── synthesis.ts           # Synthesis types\n│   ├── lib/\n│   │   ├── api.ts                 # Axios instance configuration\n│   │   ├── env.ts                 # Environment variables\n│   │   ├── utils.ts               # Utility functions\n│   │   └── audio-utils.ts         # Audio utilities\n│   ├── App.tsx                    # Application root\n│   └── main.tsx                   # Entry point\n├── public/                        # Static assets\n├── Dockerfile                     # Docker configuration\n├── nginx.conf                     # Nginx configuration\n├── package.json                   # Dependencies\n├── tsconfig.json                  # TypeScript config\n├── vite.config.ts                 # Vite config\n├── tailwind.config.ts             # Tailwind config\n├── openapi.json                   # OpenAPI specification\n└── IMPLEMENTATION_PLAN.md         # Implementation guide\n```\n\n## Contributing\n\nContributions are welcome! Please read our contributing guidelines before submitting pull requests.\n\n## License\n\nThis project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.\n\n## Links\n\n- [Speaches API Documentation](https://speaches.ai/api/)\n- [Speaches GitHub Repository](https://github.com/speaches-ai/speaches)\n\n## Support\n\nFor issues, questions, or contributions, please visit our [GitHub repository](https://github.com/moda20/speaches_ui).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoda20%2Fspeaches_ui","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmoda20%2Fspeaches_ui","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmoda20%2Fspeaches_ui/lists"}