Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mustafaaljadery/lightning-whisper-mlx
An extremely fast implementation of whisper optimized for Apple Silicon using MLX.
- Host: GitHub
- URL: https://github.com/mustafaaljadery/lightning-whisper-mlx
- Owner: mustafaaljadery
- Created: 2024-03-23T18:41:23.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-05-08T21:55:26.000Z (6 months ago)
- Last Synced: 2024-08-02T13:19:19.141Z (3 months ago)
- Language: Python
- Homepage: https://mustafaaljadery.github.io/lightning-whisper-mlx/
- Size: 2.87 MB
- Stars: 476
- Watchers: 8
- Forks: 21
- Open Issues: 9
Metadata Files:
- Readme: README.md
README
# Lightning Whisper MLX
An incredibly fast implementation of Whisper optimized for Apple Silicon.
![Whisper Decoding Speed](./speed_image.png)
10x faster than Whisper CPP and 4x faster than the current MLX Whisper implementation.
## Features
- **Batched Decoding** -> Higher Throughput
- **Distilled Models** -> Faster Decoding (less layers)
- **Quantized Models** -> Faster Memory Movement
- _Coming Soon: Speculative Decoding -> Faster Decoding with Assistant Model_

## Installation
Install Lightning Whisper MLX using pip:
```bash
pip install lightning-whisper-mlx
```

## Usage
### Models
```
["tiny", "small", "distil-small.en", "base", "medium", distil-medium.en", "large", "large-v2", "distil-large-v2", "large-v3", "distil-large-v3"]
```

### Quantization
```
[None, "4bit", "8bit"]
```

#### Example
```python
from lightning_whisper_mlx import LightningWhisperMLX

whisper = LightningWhisperMLX(model="distil-medium.en", batch_size=12, quant=None)
text = whisper.transcribe(audio_path="/audio.mp3")['text']
print(text)
```

## Notes
- The default `batch_size` is `12`. A higher batch size gives better throughput, but you may run into memory issues. As a heuristic, it depends on the size of the model: smaller models can handle a higher batch size, while larger models need a lower one. Also keep your unified memory in mind! See the sketch below for one way to configure this.
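For instance, here is a minimal sketch of running a larger model with a reduced batch size and 4-bit quantization. The specific model choice, batch size, quantization setting, and audio path are illustrative; only the `LightningWhisperMLX` constructor arguments and the `transcribe` call shown in the example above are assumed.

```python
from lightning_whisper_mlx import LightningWhisperMLX

# Illustrative settings: a larger model paired with a lower batch size and
# 4-bit quantization to reduce pressure on unified memory.
whisper = LightningWhisperMLX(model="distil-large-v3", batch_size=6, quant="4bit")

# Transcribe an audio file (hypothetical path) and print the recognized text.
text = whisper.transcribe(audio_path="./audio.mp3")['text']
print(text)
```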
## Credits
- [Mustafa](https://github.com/mustafaaljadery) - Creator of Lightning Whisper MLX
- [Awni](https://github.com/awni) - Implementation of Whisper MLX (I built on top of this)
- [Vaibhav](https://github.com/Vaibhavs10) - Inspired me to build this (he created a version optimized for CUDA)