{"id":31559548,"url":"https://github.com/designbyjr/whisper-compression-server","last_synced_at":"2026-04-14T18:34:00.972Z","repository":{"id":317759282,"uuid":"1068106297","full_name":"designbyjr/whisper-compression-server","owner":"designbyjr","description":"Optimized Whisper speech-to-text with advanced model compression (LZMA/ZSTD), local serving, and persistent browser caching. Achieves 69% size reduction with 4-bit quantization.","archived":false,"fork":false,"pushed_at":"2025-10-02T20:24:18.000Z","size":1105,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-02T22:23:03.830Z","etag":null,"topics":["browser-cache","compression","indexeddb","lzma","machine-learning","onnx","quantization","speech-to-text","whisper","zstd"],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/designbyjr.png","metadata":{"files":{"readme":"README-old.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-01T21:34:54.000Z","updated_at":"2025-10-01T21:35:41.000Z","dependencies_parsed_at":"2025-10-06T07:32:43.126Z","dependency_job_id":null,"html_url":"https://github.com/designbyjr/whisper-compression-server","commit_stats":null,"previous_names":["designbyjr/whisper-compression-server"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/designbyjr/whisper-compression-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/designbyjr%2Fwhisper-compression-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/designbyjr%2Fwhisper-compression-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/designbyjr%2Fwhisper-compression-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/designbyjr%2Fwhisper-compression-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/designbyjr","download_url":"https://codeload.github.com/designbyjr/whisper-compression-server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/designbyjr%2Fwhisper-compression-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31810737,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-14T18:05:02.291Z","status":"ssl_error","status_checked_at":"2026-04-14T18:05:01.765Z","response_time":153,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["browser-cache","compression","indexeddb","lzma","machine-learning","onnx","quantization","speech-to-text","whisper","zstd"],"created_at":"2025-10-05T01:53:56.218Z","updated_at":"2026-04-14T18:34:00.952Z","avatar_url":"https://github.com/designbyjr.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Whisper Web Speech-to-Text\n\nA 
real-time speech-to-text application using OpenAI's Whisper model running locally in the browser with WebGPU acceleration.\n\n## Features\n\n- 🎤 **Real-time recording** with visual waveform feedback\n- 🧠 **Local AI processing** - Whisper model runs entirely in your browser\n- ⚡ **WebGPU acceleration** for faster transcription\n- 🔒 **Privacy-first** - no data sent to external servers\n- 📱 **Responsive design** works on desktop and mobile\n- 📊 **Progress tracking** with download percentage and file details\n\n## Setup\n\n### 1. Install Dependencies\n\n```bash\nnpm install\n```\n\n### 2. Download Whisper Model\n\n**Option A: Whisper Small (Recommended - Better Accuracy)**\n```bash\n./download-whisper-small.sh\n```\nDownloads ~200MB to `public/models-small/`. Better transcription accuracy.\n\n**Option B: Whisper Tiny (Faster)**\n```bash\n./download-models.sh\n```\nDownloads ~131MB to `public/models/`. Faster but less accurate.\n\n**Model Comparison:**\n- **Small**: ~244M parameters, ~200MB, better accuracy, slightly slower\n- **Tiny**: ~40M parameters, ~131MB, faster, lower accuracy\n\nBoth download:\n- Configuration files (JSON)\n- Tokenizer and vocabulary\n- ONNX model files (encoder and decoder)\n\n### 3. Run Development Server\n\n**Option 1: Smart Start (Recommended)**\n```bash\nnpm start\n```\nThis checks for model files and starts both the local model server (port 3001) and the Vite dev server (port 5173).\n\n**Option 2: Manual Dev**\n```bash\nnpm run dev\n```\nStarts both servers simultaneously using concurrently.\n\n### 4. 
Model Switching (Optional)\n\n**Switch Between Models:**\n```bash\n# Switch to Whisper Small (better accuracy)\nnpm run switch small\n\n# Switch to Whisper Tiny (faster)\nnpm run switch tiny\n\n# Show current model and help\nnpm run switch\n```\n\n**Quick Download:**\n```bash\n# Download Whisper Small\nnpm run download-small\n\n# Download Whisper Tiny  \nnpm run download-tiny\n```\n\nOpen [http://localhost:5173](http://localhost:5173) in your browser.\n\n## Browser Compatibility\n\n- **Chrome/Edge 140+** (WebGPU support)\n- **Firefox 120+** (WebAssembly fallback)\n- **Safari 17+** (WebAssembly fallback)\n\n**Note:** Chrome provides the best performance with WebGPU acceleration.\n\n## How It Works\n\n1. **Model Preloading**: Whisper model loads automatically when you open the app\n2. **Audio Recording**: Uses MediaRecorder API to capture audio in WebM format\n3. **Audio Processing**: Converts to 16kHz mono PCM data that Whisper expects\n4. **Local Transcription**: Whisper model runs in a Web Worker with WebGPU acceleration\n5. 
**Real-time Results**: Transcribed text appears as you speak\n\n## Model Comparison\n\n| Model | Parameters | Size | Speed | Accuracy | Use Case |\n|-------|------------|------|-------|----------|----------|\n| **Whisper Tiny** | 40M | ~131MB | Faster | Good | Quick transcription, testing |\n| **Whisper Small** | 244M | ~200MB | Slower | Better | Production, high accuracy needed |\n\n**Recommendations:**\n- **Development/Testing**: Use Tiny for faster iteration\n- **Production**: Use Small for better user experience\n- **Mobile/Low-end**: Use Tiny for performance\n- **Desktop/High-end**: Use Small for accuracy\n\n## Project Structure\n\n```\nsrc/\n├── components/\n│   ├── SpeechToText.tsx      # Main recording interface\n│   ├── ModelProgress.tsx     # Download progress display\n│   └── AnimatedWaveform.tsx  # Visual audio feedback\n├── hooks/\n│   ├── useWhisper.ts         # Whisper model integration\n│   ├── useSpeechRecording.ts # Audio recording logic\n│   └── useWorker.ts          # Web Worker management\n└── worker.js                 # Whisper processing worker\n\npublic/\n└── models/                   # Local Whisper model files\n    ├── config.json\n    ├── tokenizer.json\n    └── onnx/\n        ├── encoder_model.onnx\n        └── decoder_model_merged_q4.onnx\n```\n\n## Performance\n\n- **Model Loading**: ~0.5-2 seconds (from local server) / ~2-5 seconds (online fallback)\n- **Transcription**: ~1-3 seconds for 10-second audio clips\n- **Memory Usage**: ~200-400MB (model loaded in browser)\n- **Network**: Models served from localhost:3001 (no internet required after download)\n\n## Technical Details\n\n- **Framework**: React + TypeScript + Vite\n- **AI Model**: OpenAI Whisper Small (244M parameters) for better accuracy\n- **Acceleration**: WebGPU (Chrome) / WebAssembly (other browsers)\n- **Audio Processing**: 16kHz mono PCM, automatic resampling\n- **Cross-Origin Isolation**: Enabled for SharedArrayBuffer support\n\n## Troubleshooting\n\n### Model Not 
Loading\n- **For Small model**: run `./download-whisper-small.sh`\n- **For Tiny model**: run `./download-models.sh`\n- Check the browser console for download errors\n- Verify that the Vite dev server is running on port 5173 and the local model server on port 3001\n\n### Poor Audio Quality\n- Use Chrome for best WebGPU performance\n- Ensure microphone permissions are granted\n- Try recording in a quiet environment\n\n### Transcription Errors\n- Speak clearly and at a normal pace\n- Check that audio is being recorded (the waveform should animate)\n- Verify model files are complete:\n  - Small model: ~200MB total in `public/models-small/`\n  - Tiny model: ~131MB total in `public/models/`\n\n## License\n\nMIT License - see LICENSE file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdesignbyjr%2Fwhisper-compression-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdesignbyjr%2Fwhisper-compression-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdesignbyjr%2Fwhisper-compression-server/lists"}