https://github.com/diegodscamara/whisperlive
🎙️ WhisperLive: Real-time audio transcription powered by OpenAI's Whisper model. Convert live speech to text with high accuracy, supporting multiple languages and real-time processing. Perfect for accessibility, content creation, and live captioning.
- Host: GitHub
- URL: https://github.com/diegodscamara/whisperlive
- Owner: diegodscamara
- License: MIT
- Created: 2025-03-31T12:24:35.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-03-31T12:31:00.000Z (7 months ago)
- Last Synced: 2025-03-31T13:50:44.522Z (7 months ago)
- Topics: ai, blackhole, numpy, python, wave, whisper
- Language: Python
- Homepage:
- Size: 5.86 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# WhisperLive 🎙️
Real-time audio transcription powered by OpenAI's Whisper model.
## Features ✨
- Real-time audio capture and transcription
- Support for multiple languages (auto-detection)
- Continuous streaming transcription
- Automatic sentence detection and formatting
- Save transcriptions to text files
- Hallucination prevention mechanisms
- Clean, formatted output with timestamps

## Requirements 📋
- Python 3.8+
- BlackHole 2ch (or similar virtual audio device)
- Required Python packages:
  - sounddevice
  - numpy
  - openai-whisper (imported as `whisper`)
  - wave (part of the Python standard library)

## Installation 🚀
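The repository ships its own `requirements.txt` (installed in step 2 below); a hypothetical version covering the packages listed above might look like:

```text
sounddevice>=0.4
numpy>=1.24
openai-whisper
```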
1. Clone the repository:
```bash
git clone https://github.com/diegodscamara/whisperlive.git
cd whisperlive
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Install BlackHole 2ch (macOS):
```bash
brew install blackhole-2ch
```

For other operating systems, use an equivalent virtual audio device.
## Models 🤖
WhisperLive uses OpenAI's Whisper models for transcription. The models are automatically downloaded when you first run the application. By default, it uses the "base" model, which offers a good balance between accuracy and performance.
The models are stored in the `models` directory but are not included in the repository due to their size. They will be downloaded automatically when needed.
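As a hypothetical illustration of the size/accuracy trade-off between models, a small helper (not part of WhisperLive; sizes taken from the list below) could pick the most accurate model whose download fits a given budget:

```python
# Hypothetical helper -- not part of WhisperLive itself.
# Download sizes (MB) match the model list in this README.
MODEL_SIZES_MB = {
    "tiny": 74,
    "base": 142,
    "small": 466,
    "medium": 1500,
    "large": 2900,
}

def pick_model(budget_mb: int) -> str:
    """Return the largest (most accurate) model whose download fits budget_mb."""
    fitting = [(size, name) for name, size in MODEL_SIZES_MB.items() if size <= budget_mb]
    if not fitting:
        raise ValueError(f"no model fits a {budget_mb} MB budget")
    return max(fitting)[1]  # largest size wins; its name is the answer
```

For example, `pick_model(500)` returns `"small"`, since 466 MB is the largest download under that budget.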
Available models:
- `tiny` (74MB) - Fastest, least accurate
- `base` (142MB) - Good balance for most uses
- `small` (466MB) - More accurate but slower
- `medium` (1.5GB) - Even more accurate
- `large` (2.9GB) - Most accurate, slowest

To change the model, modify the `model_name` parameter in `main.py`:
```python
transcriber = WhisperTranscriber(model_name="base") # Change "base" to your preferred model
```

## Usage 💡
1. Set up your virtual audio device (BlackHole 2ch) as your system's audio output.
2. Run the transcription:
```bash
python main.py
```

3. Start speaking or playing audio. The transcription will appear in real-time.
4. Press `Ctrl+C` to stop the transcription.
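The continuous transcription behind step 3 depends on a buffering policy. Below is a minimal standard-library sketch of one plausible version, using this README's default settings (16 kHz sample rate, 5-second buffer, 2-second minimum before processing); the class and method names are hypothetical, not WhisperLive's actual code:

```python
from collections import deque

# Defaults from this README's Configuration section.
SAMPLE_RATE = 16_000   # Hz
MAX_BUFFER_S = 5       # keep at most 5 seconds of audio
MIN_PROCESS_S = 2      # transcribe only once 2 seconds have accumulated

class AudioBuffer:
    """Accumulates raw samples; signals when enough audio is ready to transcribe."""

    def __init__(self):
        # deque with maxlen silently discards the oldest samples past 5 s
        self.samples = deque(maxlen=MAX_BUFFER_S * SAMPLE_RATE)

    def add_chunk(self, chunk):
        self.samples.extend(chunk)

    def ready(self):
        return len(self.samples) >= MIN_PROCESS_S * SAMPLE_RATE

    def drain(self):
        """Hand the buffered audio to the transcriber and start over."""
        out = list(self.samples)
        self.samples.clear()
        return out
```

Feeding one-second chunks, `ready()` first becomes true after the second chunk, at which point `drain()` would pass the accumulated samples to Whisper.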
## Output Files 📁
The app creates two types of files in the `recordings` directory:
- `recording_YYYYMMDD_HHMMSS.wav` - Audio recording
- `transcript_YYYYMMDD_HHMMSS.txt` - Text transcription

## Configuration ⚙️
Default settings in `main.py`:
- Sample rate: 16000 Hz
- Buffer size: 5 seconds
- Minimum process size: 2 seconds
- Model: "base" (can be changed to other Whisper models)## Contributing 🤝
Contributions are welcome! Please feel free to submit a Pull Request.
## License 📄
MIT License - feel free to use this project for any purpose.
## Acknowledgments 🙏
- [OpenAI Whisper](https://github.com/openai/whisper) for the amazing speech recognition model
- [sounddevice](https://python-sounddevice.readthedocs.io/) for audio handling
- [BlackHole](https://github.com/ExistentialAudio/BlackHole) for virtual audio routing