https://github.com/sirmews/textcast
Browser audio recorder with Whisper transcription
- Host: GitHub
- URL: https://github.com/sirmews/textcast
- Owner: sirmews
- License: mit
- Created: 2026-04-14T22:36:38.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2026-04-23T21:44:37.000Z (about 2 months ago)
- Last Synced: 2026-04-23T22:20:54.690Z (about 2 months ago)
- Topics: silero, transformers-js, vad, webaudio, webgpu, whisper
- Language: TypeScript
- Homepage: https://textcast-six.vercel.app
- Size: 169 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Agents: AGENTS.md
# TextCast

Edit audio by editing text. A local-first, browser-based audio editor for podcasters.
## Features
- **Record** - Capture studio-quality PCM audio directly to disk via AudioWorklet + OPFS streaming
- **Clean** - Reduce noise, normalize volume, trim silence with Web Audio API preprocessing
- **Transcribe** - Local Whisper transcription (WebGPU-accelerated, falls back to WASM)
- **Edit** - Non-destructive text-based editing using a Piece Table engine with 10ms crossfades
- **Save** - Persistent recording to Origin Private File System (OPFS)
- **Export** - "Bake" your edits into a final WAV file using OfflineAudioContext
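The Export step ends in a WAV file. As a rough sketch of what that final encoding involves, the function below wraps mono Float32 samples (such as one channel of a rendered `AudioBuffer`) in a 16-bit PCM WAV container. The name `encodeWav` and the mono, 16-bit choices are assumptions for illustration; this is not taken from the project's `offlineRender.ts`.

```typescript
// Hypothetical helper, not TextCast's actual exporter.
// Encodes mono Float32 samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2; // 16-bit PCM
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count (mono)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the asymmetric int16 range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return buffer;
}
```

In a browser, the returned buffer could be wrapped in a `Blob` with type `audio/wav` to offer it as a download.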
## How It Works
### Two-Stage Transcription Pipeline
TextCast uses a two-stage architecture for frame-accurate word timestamps:
1. **Whisper Transcription** - Uses `@huggingface/transformers` to generate accurate text
2. **CTC Forced Alignment** - Uses the MMS forced aligner with Viterbi decoding to map words to exact audio frames
This approach targets the timestamp drift and hallucinations common in seq2seq models such as Whisper. For more on this problem, see [WhisperX](https://github.com/m-bain/whisperX).
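To make the second stage more concrete, here is a toy sketch of CTC forced alignment with Viterbi decoding. It assumes the aligner model has already produced per-frame log-probabilities over a small vocabulary and that the transcript has been converted to token ids. The `forcedAlign` function and `TokenSpan` type are hypothetical names for illustration, not the contents of the project's `aligner.ts`.

```typescript
// Toy CTC forced alignment (Viterbi). Hypothetical sketch, not TextCast's aligner.
interface TokenSpan {
  token: number;
  startFrame: number; // first frame assigned to the token
  endFrame: number; // last frame assigned to the token (inclusive)
}

function forcedAlign(logProbs: number[][], tokens: number[], blank = 0): TokenSpan[] {
  const T = logProbs.length;
  if (T === 0) throw new Error("No frames to align");
  // CTC state sequence: blank, tok0, blank, tok1, ..., blank
  const S = 2 * tokens.length + 1;
  const labelOf = (s: number): number => (s % 2 === 0 ? blank : tokens[(s - 1) / 2]);

  const score = Array.from({ length: T }, () => new Array<number>(S).fill(-Infinity));
  const back = Array.from({ length: T }, () => new Array<number>(S).fill(-1));
  score[0][0] = logProbs[0][blank];
  if (S > 1) score[0][1] = logProbs[0][tokens[0]];

  for (let t = 1; t < T; t++) {
    for (let s = 0; s < S; s++) {
      // Allowed moves: stay, advance by one, or skip a blank between different tokens.
      let best = score[t - 1][s];
      let from = s;
      if (s >= 1 && score[t - 1][s - 1] > best) {
        best = score[t - 1][s - 1];
        from = s - 1;
      }
      const canSkip = s >= 2 && s % 2 === 1 && labelOf(s) !== labelOf(s - 2);
      if (canSkip && score[t - 1][s - 2] > best) {
        best = score[t - 1][s - 2];
        from = s - 2;
      }
      if (best === -Infinity) continue; // state not reachable yet
      score[t][s] = best + logProbs[t][labelOf(s)];
      back[t][s] = from;
    }
  }

  // A valid path ends in the final blank or the final token.
  let s = S - 1;
  if (S > 1 && score[T - 1][S - 2] > score[T - 1][S - 1]) s = S - 2;
  if (score[T - 1][s] === -Infinity) throw new Error("Too few frames for this transcript");

  // Walk the best path backwards and record the frames each token occupies.
  const spans: TokenSpan[] = tokens.map((token) => ({ token, startFrame: -1, endFrame: -1 }));
  for (let t = T - 1; t >= 0; t--) {
    if (s % 2 === 1) {
      const span = spans[(s - 1) / 2];
      if (span.endFrame === -1) span.endFrame = t;
      span.startFrame = t;
    }
    s = back[t][s];
  }
  return spans;
}
```

Frame indices become timestamps by multiplying by the aligner's frame duration, and word boundaries follow from grouping the spans of the tokens that make up each word.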
### Non-Destructive Editing
Edits don't modify the original audio. Instead, a [Piece Table](https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation) maintains an Edit Decision List (EDL) of which segments to play. The `PlaylistPlayer` schedules Web Audio nodes with sample-accurate timing and applies 10ms crossfades at each cut for seamless playback.
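As a rough illustration of the idea, the sketch below models the EDL as a list of pieces that point into the untouched source recording. The `Piece` shape and the `deleteRange` and `editedLength` helpers are assumptions made for this example, not the API of the project's `PieceTable.ts`. Positions are in sample frames.

```typescript
// Hypothetical EDL sketch: each piece references a range of the original recording.
interface Piece {
  sourceStart: number; // inclusive, in sample frames of the source audio
  sourceEnd: number; // exclusive
}

// Total length of the edited timeline.
function editedLength(pieces: readonly Piece[]): number {
  return pieces.reduce((sum, p) => sum + (p.sourceEnd - p.sourceStart), 0);
}

// Remove [from, to) on the edited timeline. The source audio is never touched;
// only the list of pieces changes.
function deleteRange(pieces: readonly Piece[], from: number, to: number): Piece[] {
  const result: Piece[] = [];
  let cursor = 0; // where the current piece starts on the edited timeline
  for (const piece of pieces) {
    const pieceFrom = cursor;
    const pieceTo = cursor + (piece.sourceEnd - piece.sourceStart);
    cursor = pieceTo;

    if (to <= pieceFrom || from >= pieceTo) {
      result.push({ ...piece }); // no overlap with the cut
      continue;
    }
    if (from > pieceFrom) {
      // Keep the part of this piece before the cut.
      result.push({
        sourceStart: piece.sourceStart,
        sourceEnd: piece.sourceStart + (from - pieceFrom),
      });
    }
    if (to < pieceTo) {
      // Keep the part of this piece after the cut.
      result.push({
        sourceStart: piece.sourceStart + (to - pieceFrom),
        sourceEnd: piece.sourceEnd,
      });
    }
  }
  return result;
}
```

Deleting a word in the transcript then amounts to calling `deleteRange` with that word's aligned start and end frames, and undoing the edit means restoring the previous piece list.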
## Quick Start
```bash
npm install
npm run dev
```
Open http://localhost:4002 in Chrome (WebGPU recommended for faster transcription).
## Deployment
Configured for Vercel with required COOP/COEP headers for SharedArrayBuffer:
```bash
npm run build
vercel --prod
```
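For context, a page only gets `SharedArrayBuffer` when it is cross-origin isolated, which browsers grant when the document is served with `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` (or `credentialless`). The small checker below is a hypothetical helper for verifying a deployment's response headers; it is not part of the repository, and the project's actual Vercel configuration is not shown here.

```typescript
// Hypothetical helper: reports which cross-origin isolation headers are missing or wrong.
const ISOLATION_HEADERS: Record<string, string[]> = {
  "cross-origin-opener-policy": ["same-origin"],
  "cross-origin-embedder-policy": ["require-corp", "credentialless"],
};

function missingIsolationHeaders(headers: Record<string, string>): string[] {
  // Header names are case-insensitive, so normalize before comparing.
  const normalized: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    normalized[name.toLowerCase()] = headers[name].trim().toLowerCase();
  }
  return Object.keys(ISOLATION_HEADERS).filter((name) => {
    const value = normalized[name];
    return value === undefined || ISOLATION_HEADERS[name].indexOf(value) === -1;
  });
}
```

At runtime, the global `crossOriginIsolated` flag in the browser reports whether isolation actually took effect.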
## Browser Requirements
- Chrome 113+ (required for OPFS and WebGPU)
- Requires a secure context (HTTPS or `localhost`) for microphone capture, AudioWorklet, and OPFS
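A quick way to check these requirements at startup is plain feature detection. The sketch below is a hypothetical helper (the `BrowserLike` and `checkSupport` names are not from the repository) that takes a global-like object, so it can be exercised outside a browser. It treats WebGPU as optional because transcription falls back to WASM.

```typescript
// Minimal structural view of the browser globals this check needs.
interface BrowserLike {
  isSecureContext?: boolean;
  AudioWorkletNode?: unknown;
  navigator?: {
    gpu?: unknown; // WebGPU entry point
    storage?: { getDirectory?: unknown }; // OPFS entry point
    mediaDevices?: { getUserMedia?: unknown }; // microphone capture
  };
}

interface SupportReport {
  missing: string[]; // required features that were not found
  webgpu: boolean; // optional: WASM is used when this is false
}

function checkSupport(env: BrowserLike): SupportReport {
  const missing: string[] = [];
  if (env.isSecureContext !== true) missing.push("secure context");
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") missing.push("getUserMedia");
  if (typeof env.AudioWorkletNode !== "function") missing.push("AudioWorklet");
  if (typeof env.navigator?.storage?.getDirectory !== "function") missing.push("OPFS");
  return { missing, webgpu: env.navigator?.gpu !== undefined };
}
```

In a page this could be called as `checkSupport(globalThis as unknown as BrowserLike)` before showing the recorder UI.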
## Known Limitations
- **Mobile Support** - OPFS and Web Audio worklets have limited support on mobile browsers
## Project Structure
```
src/
├── components/
│   ├── editor/                      # TranscriptEditor with Piece Table visualization
│   ├── recorder/                    # Recording UI with OPFS streaming
│   └── ui/                          # Shared UI components
├── hooks/
│   └── useAudioPlayer.ts            # Piece Table-backed playback hook
├── lib/
│   ├── audio/
│   │   ├── PieceTable.ts            # Non-destructive EDL engine
│   │   ├── PlaylistPlayer.ts        # Web Audio scheduler
│   │   ├── RecorderEngine.ts        # AudioWorklet + OPFS streaming
│   │   └── offlineRender.ts         # WAV export renderer
│   ├── db/                          # IndexedDB for project metadata
│   └── transcription/
│       ├── transformers-whisper.ts  # Whisper integration
│       ├── aligner.ts               # Viterbi CTC forced alignment
│       └── vad.ts                   # Voice Activity Detection
└── types/
```
## Tech Stack
| Feature | Technology |
|---------|------------|
| Frontend | React 19 + TypeScript + Vite |
| Styling | Tailwind CSS v4 |
| Storage | OPFS (audio) + IndexedDB (metadata) |
| Transcription | [@huggingface/transformers](https://huggingface.co/docs/transformers.js) (WebGPU/WASM) |
| Forced Alignment | [MMS-300M Forced Aligner](https://huggingface.co/onnx-community/mms-300m-1130-forced-aligner-ONNX) |
| VAD | [@ricky0123/vad-web](https://github.com/ricky0123/vad) (Silero v5) |
## References & Inspiration
- [WhisperX](https://github.com/m-bain/whisperX) - Two-stage transcription + alignment architecture
- [VS Code Piece Table](https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation) - Non-destructive editing data structure
- [Silero VAD](https://github.com/snakers4/silero-vad) - Voice Activity Detection
- [Transformers.js](https://huggingface.co/docs/transformers.js) - Running ML models in the browser
- [OPFS](https://developer.mozilla.org/en-US/docs/Web/API/File_System_API/Origin_private_file_system) - Origin Private File System for high-performance storage
- [AudioWorklet](https://developer.mozilla.org/en-US/docs/Web/API/AudioWorklet) - Low-latency audio processing in a separate thread
## License
MIT