https://github.com/josefalbers/whisper-turbo-mlx
Blazing fast whisper turbo for ASR (speech-to-text) tasks
asr deep-learning mlx speech-recognition speech-to-text whisper whisper-turbo
- Host: GitHub
- URL: https://github.com/josefalbers/whisper-turbo-mlx
- Owner: JosefAlbers
- License: mit
- Created: 2024-10-17T09:31:49.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-10-20T04:46:00.000Z (2 months ago)
- Last Synced: 2024-12-16T07:03:55.607Z (7 days ago)
- Topics: asr, deep-learning, mlx, speech-recognition, speech-to-text, whisper, whisper-turbo
- Language: Python
- Homepage:
- Size: 527 KB
- Stars: 176
- Watchers: 7
- Forks: 9
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# WTM (Whisper Turbo MLX)
This repository provides a fast and lightweight implementation of the [Whisper](https://github.com/openai/whisper/discussions/2363) model using [MLX](https://github.com/ml-explore/mlx-examples/tree/main/whisper), all contained within a single file of under 300 lines, designed for efficient audio transcription.
![Benchmark](https://raw.githubusercontent.com/JosefAlbers/whisper-turbo-mlx/main/assets/benchmark.png)
## Installation
```zsh
brew install ffmpeg
git clone https://github.com/JosefAlbers/whisper-turbo-mlx.git
cd whisper-turbo-mlx
pip install -e .
```
## Quick Start
To transcribe an audio file:
```zsh
wtm test.wav
```
To use the library in a Python script:
```python
>>> from whisper_turbo import transcribe
>>> transcribe('test.wav', any_lang=True)
```
## Quick Parameter
The `quick` parameter allows you to choose between two transcription methods:
- **`quick=True`**: Uses a parallel processing method for faster transcription. The output may be choppier, but the method is significantly quicker, which suits cases where speed is the priority (e.g., feeding the generated transcripts into an LLM to get quick summaries of many audio recordings).
- **`quick=False`** (default): Engages a recurrent processing method that is slower but yields more faithful and coherent transcriptions (still faster than other reference implementations).

You can specify this parameter when calling the `transcribe` function:
```zsh
wtm --quick=True
```
```python
>>> transcribe('test.wav', quick=True)
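
# Illustrative sketch, not part of the original README: running quick mode over
# several recordings, e.g. before handing the transcripts to an LLM for summaries.
# `transcribe_all` and the file names below are hypothetical. The transcriber is
# passed in as an argument, so the helper assumes nothing about whisper_turbo
# beyond the `transcribe(path, quick=...)` call shown above.
def transcribe_all(paths, transcribe_fn, quick=True):
    """Return a dict mapping each audio path to whatever transcribe_fn returns."""
    return {path: transcribe_fn(path, quick=quick) for path in paths}

# Assumed usage (file names are placeholders):
# results = transcribe_all(['meeting1.wav', 'meeting2.wav'], transcribe)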
```
## Acknowledgements
This project builds upon the reference [MLX implementation](https://github.com/ml-explore/mlx-examples/tree/main/whisper) of the Whisper model. Many thanks to the contributors of the MLX project for their exceptional work and inspiration.
## Contributing
Contributions are welcome! Feel free to submit issues or pull requests.