# Whisper Playground


Instantly build real-time speech-to-text apps in 99 languages using faster-whisper, Diart, and Pyannote

Try it via the online demo

[![visitors](https://hits.sh/github.com/saharmor/whisper-playground.svg?style=plastic&label=visitors&extraCount=55288)](https://hits.sh/github.com/saharmor/whisper-playground/)

https://github.com/ethanzrd/whisper-playground/assets/79014814/44a9bcf0-e374-4c71-8189-1d99824fbdc5

# Setup
1. Have [`Conda`](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) and [`Yarn`](https://classic.yarnpkg.com/lang/en/docs/install/#mac-stable) installed on your device
2. Clone or fork this repository
3. Install the backend and frontend environments: `sh install_playground.sh`
4. Review `config.py` to make sure the transcription device and compute type match your setup, and review `config.js` to make sure it conforms to the backend config and that the backend address is correct (see the sketch after this list)
5. Run the backend: `cd backend && python server.py`
6. In a different terminal, run the React frontend: `cd interface && yarn start`
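
As a rough illustration of step 4, the backend settings mostly come down to a device and compute type for faster-whisper plus the address the frontend connects to. The variable names below are hypothetical, not necessarily those used in `config.py`; treat this as a sketch of what to check rather than the file's actual contents.

```python
# Hypothetical backend config sketch -- names are illustrative only.
import torch

# Use the GPU when available, otherwise fall back to CPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# float16 is a common choice on GPU; int8 keeps memory low on CPU.
COMPUTE_TYPE = "float16" if DEVICE == "cuda" else "int8"

# Address the React frontend (config.js) should point at.
HOST = "0.0.0.0"
PORT = 8000
```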

### Access to Pyannote Models

This repository uses libraries based on pyannote.audio models, which are hosted on the Hugging Face Hub. You must accept their terms of use before using them.
Note: you need a Hugging Face account to use pyannote.

1. Accept terms for the [`pyannote/segmentation`](https://huggingface.co/pyannote/segmentation) model
2. Accept terms for the [`pyannote/embedding`](https://huggingface.co/pyannote/embedding) model
3. Accept terms for the [`pyannote/speaker-diarization`](https://huggingface.co/pyannote/speaker-diarization) model
4. Install [huggingface-cli](https://huggingface.co/docs/huggingface_hub/quick-start#install-the-hub-library) and [log in](https://huggingface.co/docs/huggingface_hub/quick-start#login) with your user access token (can be found in Settings -> Access Tokens)
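
After accepting the model terms, you can authenticate either with `huggingface-cli login` or programmatically. The snippet below is a minimal sketch using the `huggingface_hub` library; the token value is a placeholder for your own access token.

```python
# Minimal sketch: authenticate with the Hugging Face Hub so the pyannote
# models can be downloaded. Replace the placeholder with your own token
# from Settings -> Access Tokens.
from huggingface_hub import login

login(token="hf_xxx")  # or call login() with no arguments for an interactive prompt
```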

# Parameters

- Model Size: Choose the model size, from tiny to large-v2.
- Language: Select the language you will be speaking in.
- Transcription Timeout: Set the number of seconds the application will wait before transcribing the current audio data.
- Beam Size: Set the number of candidate transcriptions generated and considered; larger values can improve accuracy at the cost of longer transcription time.
- Transcription Method: Choose "real-time" for real-time diarization and transcriptions, or "sequential" for periodic transcriptions with more context.
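
To make these parameters concrete, here is a minimal sketch of how they typically map onto a faster-whisper call. The playground adds streaming and diarization on top of this, so treat it as an illustration rather than the app's actual pipeline; the audio path is a placeholder.

```python
# Minimal sketch of how the UI parameters map to faster-whisper.
from faster_whisper import WhisperModel

# "Model Size" option, plus device/compute type from the backend config.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe(
    "audio.wav",      # placeholder input; the playground streams microphone audio
    language="en",    # "Language" option
    beam_size=5,      # "Beam Size" option
)

for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```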

## Troubleshooting

- On macOS, if building the wheel for safetensors fails, install Rust with `brew install rust` and try again.

## Known Bugs

1. [In the sequential mode, there may be uncontrolled speaker swapping.](https://github.com/saharmor/whisper-playground/issues/27)
2. [In real-time mode, audio data not meeting the transcription timeout won't be transcribed.](https://github.com/saharmor/whisper-playground/issues/28)

This repository hasn't been tested for all languages; please create an issue if you encounter any problems.

## License

This repository is released under the MIT License, as are Whisper's code and model weights.