https://github.com/qduc/whisper_client_extension
- Host: GitHub
- URL: https://github.com/qduc/whisper_client_extension
- Owner: qduc
- Created: 2025-03-10T06:28:14.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-03-10T06:33:04.000Z (over 1 year ago)
- Last Synced: 2025-03-10T07:29:41.452Z (over 1 year ago)
- Language: JavaScript
- Size: 70.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Whisper Live Transcription Chrome Extension
This Chrome extension captures audio from a browser tab and sends it to a WebSocket server for real-time speech transcription. The transcription is then displayed as an overlay on the web page.
## Features
- Real-time speech transcription displayed as an overlay on any web page
- Customizable overlay appearance (text size, opacity)
- Draggable overlay that remembers position
- Automatic text buffering system that displays accumulated text
- Auto-hide overlay after configurable idle time
- Tab audio capture with background processing
## Installation
### Development Mode
1. Clone the repository
2. Open Chrome and navigate to `chrome://extensions`
3. Enable "Developer mode" in the top-right corner
4. Click "Load unpacked" and select the `src` directory from this project
### Build for Production
```
# Future build steps will go here
```
## Usage
1. Click the extension icon in your browser toolbar
2. Enter the WebSocket server URL (default: `ws://localhost:43007`)
3. Click "Start Transcription" to begin capturing audio
4. Play audio containing speech in the tab; the transcription will appear as an overlay on the current tab
5. Drag the overlay to reposition it if needed
6. Click "Stop Transcription" when done
## Configuration
Access the extension options page to configure:
- Text size (small, medium, large)
- Overlay opacity
- Number of text lines to display in buffer
- Minimum text length before displaying
- Maximum idle time before displaying accumulated text
- Overlay hide timeout after inactivity
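The buffering options above can be illustrated with a small sketch. It is not the extension's actual code: the class `TextBuffer` and the option names `maxLines`, `minLength`, and `maxIdleMs` are hypothetical, and "idle time" is assumed to mean the time since the last text fragment arrived.

```javascript
// Sketch only: names and defaults are assumptions, not the extension's real settings keys.
class TextBuffer {
  constructor({ maxLines = 3, minLength = 20, maxIdleMs = 2000 } = {}) {
    this.maxLines = maxLines;   // number of text lines kept for display
    this.minLength = minLength; // minimum accumulated length before displaying
    this.maxIdleMs = maxIdleMs; // idle time after which short text is displayed anyway
    this.pending = '';          // text accumulated but not yet displayed
    this.lastInput = 0;         // timestamp of the most recent fragment
    this.lines = [];            // lines currently visible
  }

  // Add a transcription fragment and return a copy of the visible lines.
  push(fragment, now = Date.now()) {
    this.pending = (this.pending + ' ' + fragment).trim();
    this.lastInput = now;
    if (this.pending.length >= this.minLength) this.flush();
    return [...this.lines];
  }

  // Call periodically so short text still appears once the idle time has passed.
  tick(now = Date.now()) {
    if (this.pending !== '' && now - this.lastInput >= this.maxIdleMs) this.flush();
    return [...this.lines];
  }

  // Move pending text into the visible lines, keeping only the newest maxLines.
  flush() {
    this.lines.push(this.pending);
    this.lines = this.lines.slice(-this.maxLines);
    this.pending = '';
  }
}
```

Passing the timestamp explicitly, as in `buffer.push('hello', 0)` followed by `buffer.tick(2000)`, keeps the sketch easy to test without real timers.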
## Architecture
The extension consists of several components:
- **Background Script**: Manages WebSocket connections and coordinates audio processing
- **Content Script**: Creates and controls the transcription overlay on web pages
- **Offscreen Document**: Handles audio capture and processing (using Chrome's offscreen API)
- **WebSocket Client**: Communicates with the transcription server
- **Audio Processor**: Converts and resamples audio for the Whisper API
For more details about the architecture, see [ARCHITECTURE.md](ARCHITECTURE.md).
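The Audio Processor's job of converting and resampling can be sketched as follows. This is a minimal illustration, not the extension's actual code: `resampleLinear` and `floatTo16BitPCM` are hypothetical names, linear interpolation is an assumed technique chosen for brevity, and the 16 kHz, 16-bit target comes from the Server Requirements section.

```javascript
// Sketch only: function names and the interpolation method are assumptions.

// Resample mono Float32 samples to targetRate using linear interpolation.
// No low-pass filter is applied, so downsampling may introduce some aliasing.
function resampleLinear(input, inputRate, targetRate = 16000) {
  if (inputRate === targetRate) return Float32Array.from(input);
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const output = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM, clamping out-of-range values.
function floatTo16BitPCM(samples) {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  }
  return pcm;
}
```

For example, `floatTo16BitPCM(resampleLinear(samples, 48000))` would turn one chunk of 48 kHz tab audio into 16 kHz PCM, assuming `samples` is a mono `Float32Array`.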
## Server Requirements
This extension requires a compatible WebSocket server running the Whisper speech-to-text model. The server should:
1. Accept WebSocket connections
2. Receive audio data as 16-bit PCM at 16 kHz
3. Return JSON messages with transcription text
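The exchange described above could look roughly like this from the client side. The sketch makes several assumptions that this README does not confirm: that the transcription arrives in a JSON field named `text`, that audio is sent as binary frames, and the helper names `parseTranscriptionMessage` and `connectTranscription` are hypothetical. Check the server you use for its actual message format.

```javascript
// Sketch only: the `text` field name is an assumption, not a documented schema.
function parseTranscriptionMessage(data) {
  try {
    const message = JSON.parse(data);
    return typeof message.text === 'string' ? message.text : null;
  } catch {
    return null; // Ignore frames that are not valid JSON objects.
  }
}

// Open a connection and forward each transcription string to onText.
// WebSocketImpl is injectable so the logic can be exercised without a real server.
function connectTranscription(url, onText, WebSocketImpl = globalThis.WebSocket) {
  const socket = new WebSocketImpl(url);
  socket.binaryType = 'arraybuffer';
  socket.onmessage = (event) => {
    const text = parseTranscriptionMessage(event.data);
    if (text !== null) onText(text);
  };
  return {
    // Send one chunk of 16-bit PCM samples (an Int16Array) as a binary frame.
    sendAudio(pcm) {
      if (socket.readyState === 1 /* OPEN */) socket.send(pcm);
    },
    close() {
      socket.close();
    },
  };
}
```

A call such as `connectTranscription('ws://localhost:43007', console.log)` would print each received transcription, using the default URL from the Usage section.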
## Development
### Testing
```
npm test
```
The project uses Jest for testing components. Test files are located in the `__tests__` directory.
## Browser Compatibility
- Chrome 116+
- Edge (Chromium-based)
## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.