Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mallorbc/whisper_mic
Project that allows one to use a microphone with OpenAI whisper.
microphone speech-recognition speech-to-text whisper whisper-ai whisper-api
- Host: GitHub
- URL: https://github.com/mallorbc/whisper_mic
- Owner: mallorbc
- License: mit
- Created: 2022-09-23T14:05:52.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-04T07:19:05.000Z (6 months ago)
- Last Synced: 2025-01-03T10:06:32.305Z (9 days ago)
- Topics: microphone, speech-recognition, speech-to-text, whisper, whisper-ai, whisper-api
- Language: Python
- Homepage:
- Size: 54.7 KB
- Stars: 728
- Watchers: 21
- Forks: 160
- Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-openai-whisper - Whisper Mic - Project that allows one to use a microphone with OpenAI whisper
README
# Whisper Mic
This repo is based on the work done [here](https://github.com/openai/whisper) by OpenAI. It allows you to use a microphone as a demo. Parts of this README are copied from the original project.
## Video Tutorial
The latest video tutorial for this repo can be seen [here](https://youtu.be/S58MGCU7Wgg)
An older video tutorial for this repo can be seen [here](https://www.youtube.com/watch?v=nwPaRSlDSaY)
### Professional Assistance
If you are in need of paid professional help, it is available through this [email](mailto:[email protected])
## Setup
Now a pip package!
1. Create a venv of your choice.
2. Run ```pip install whisper-mic```
## Available models and languages
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
| tiny | 39 M | `tiny.en` | `tiny` | ~1 GB | ~32x |
| base | 74 M | `base.en` | `base` | ~1 GB | ~16x |
| small | 244 M | `small.en` | `small` | ~2 GB | ~6x |
| medium | 769 M | `medium.en` | `medium` | ~5 GB | ~2x |
| large | 1550 M | N/A | `large` | ~10 GB | 1x |

For English-only applications, the `.en` models tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
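As a rough illustration of how the table might guide a choice, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The helper `pick_model` and both constants are hypothetical names introduced here, not part of `whisper_mic`; the numbers are the approximate values from the table.

```python
# Approximate VRAM needs in GB, copied from the table above (smallest to largest).
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}

# Sizes that have an English-only ".en" variant according to the table.
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def pick_model(available_vram_gb: float, english_only: bool = False) -> str:
    """Hypothetical helper: return the largest model that fits the VRAM budget."""
    fitting = [size for size, need in MODEL_VRAM_GB.items() if need <= available_vram_gb]
    if not fitting:
        raise ValueError("No model fits in the given VRAM budget")
    # Dicts keep insertion order, so the last fitting entry is the largest one.
    size = fitting[-1]
    if english_only and size in ENGLISH_ONLY_SIZES:
        return f"{size}.en"
    return size


print(pick_model(4, english_only=True))   # small.en
print(pick_model(12, english_only=True))  # large (no English-only variant exists)
```

The resulting name is the kind of value you would pass to the ```--model``` flag. Falling back to `large` when no `.en` variant exists is one possible design choice; preferring `medium.en` would be equally defensible.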
## Microphone Demo
You can use the model with a microphone using the ```whisper_mic``` program. Use ```-h``` to see flag options.
Some of the more important flags are the ```--model``` and ```--english``` flags.
## Transcribing To A File
Running ```whisper_mic --loop --dictate``` types the words you say at your active cursor.
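If you want to launch the program from a script, the flags mentioned in this README can be assembled into an argument list. This is only a sketch: `build_whisper_mic_command` is a hypothetical helper, and it assumes that ```--model``` takes a model name while ```--english```, ```--loop``` and ```--dictate``` are plain switches. Confirm with ```whisper_mic -h``` before relying on it.

```python
import shlex


def build_whisper_mic_command(model=None, english=False, loop=False, dictate=False):
    """Hypothetical helper: assemble an argument list for the whisper_mic CLI."""
    command = ["whisper_mic"]
    if model is not None:
        # Assumption: --model expects a model name such as "base".
        command += ["--model", model]
    if english:
        # Assumption: --english is a switch with no value.
        command.append("--english")
    if loop:
        command.append("--loop")
    if dictate:
        command.append("--dictate")
    return command


command = build_whisper_mic_command(loop=True, dictate=True)
print(shlex.join(command))  # whisper_mic --loop --dictate
```

The list can then be handed to `subprocess.run(command, check=True)` on a machine where the package and a microphone are available.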
## Usage In Other Projects
You can use this code in other projects rather than just use it for a demo. You can do this with the ```listen``` method.
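```python
from whisper_mic import WhisperMic

# Create the microphone wrapper with default settings.
mic = WhisperMic()
# Record from the microphone and return the transcription.
result = mic.listen()
print(result)
```
Check out what the possible arguments are by looking at the ```cli.py``` file.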
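Building on the ```listen``` method, the sketch below gathers several utterances until a stop word is heard. `collect_utterances` is a hypothetical helper, and it assumes that ```listen``` returns the transcription as a string (or nothing when no speech is recognized). It accepts any callable, so a stand-in is used here to make the example run without a microphone.

```python
from typing import Callable, List, Optional


def collect_utterances(
    listen: Callable[[], Optional[str]],
    stop_word: str = "stop",
    max_utterances: int = 10,
) -> List[str]:
    """Hypothetical helper: call listen() repeatedly until stop_word is heard."""
    transcripts = []
    for _ in range(max_utterances):
        # Assumption: listen() returns a string; treat None as an empty result.
        text = (listen() or "").strip()
        if not text:
            continue  # skip empty transcriptions
        if stop_word in text.lower():
            break  # simple substring match on the stop word
        transcripts.append(text)
    return transcripts


# Stand-in for mic.listen so the sketch runs without a microphone.
fake_results = iter(["hello world", "", "please stop now", "never reached"])
print(collect_utterances(lambda: next(fake_results)))  # ['hello world']
```

With the package installed, passing `mic.listen` from the example above in place of the stand-in should behave the same way, subject to the assumption about its return value.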
## Troubleshooting
If you are having issues, try the following:
```
sudo apt install portaudio19-dev python3-pyaudio
```
## Contributing
Some ideas that you can add are:
1. Supporting different implementations of Whisper
2. Adding additional optional functionality
3. Adding tests
## License
The model weights of Whisper are released under the MIT License. See their repo for more information.
The code in this repo is released under the MIT license. See [LICENSE](LICENSE) for further details.
## Thanks
Until recently, access to high-performing speech-to-text models was only available through paid services. With this release, I am excited for the many applications that will come.