https://github.com/kristofferv98/whisper_turboapi
An optimized FastAPI server for OpenAI's Whisper whisper-large-v3-turbo model using MLX turbo optimization
ai api asynchronous audio audio-processing fastapi huggingface machine-learning macos mlx model-serving nlp openai optimization python speech-to-text synchronous transcription whisper whisper-turbo
- Host: GitHub
- URL: https://github.com/kristofferv98/whisper_turboapi
- Owner: kristofferv98
- License: other
- Created: 2024-10-23T22:25:58.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-10-24T15:06:27.000Z (3 months ago)
- Last Synced: 2024-10-25T19:39:26.588Z (3 months ago)
- Topics: ai, api, asynchronous, audio, audio-processing, fastapi, huggingface, machine-learning, macos, mlx, model-serving, nlp, openai, optimization, python, speech-to-text, synchronous, transcription, whisper, whisper-turbo
- Language: Python
- Homepage:
- Size: 377 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# WhisperTurboAPI
An optimized FastAPI server implementation of OpenAI's Whisper large-v3-turbo model using MLX optimization, designed for low-latency, asynchronous and synchronous audio transcription on macOS.
## Features
- **Fast Audio Transcription:** Leverage the turbocharged, MLX-optimized Whisper large-v3-turbo model for quick and accurate transcriptions.
- **RESTful API Access:** Easily integrate with any environment that supports HTTP requests.
- **Async/Sync Support:** Seamlessly handle both asynchronous and synchronous transcription requests.
- **Low Latency:** Optimized for minimal delay by preloading models and efficient processing.
- **High Throughput:** Capable of handling multiple concurrent transcription requests.
- **Easy Setup:** Simple setup process.
## System Requirements
- **Operating System:** macOS with Apple Silicon (recommended for optimal performance due to MLX optimizations)
- **Python Version:** 3.12.3 (required for MLX optimization)
- **Python Version Manager:** [pyenv](https://github.com/pyenv/pyenv) (recommended but optional)
## Installation
### 1. Clone the Repository
```bash
git clone https://github.com/kristofferv98/whisper_turboapi.git
cd whisper_turboapi
```
### 2. Run the Setup Script
The `setup.sh` script will handle the environment setup, including checking for the required Python version, creating a virtual environment, installing dependencies, and verifying the installation.
```bash
./setup.sh
```
Optional Flags:
- `-y` or `--yes`: Run in headless mode (assume 'yes' to all prompts)
- `-p` or `--python-version`: Specify the required Python version (default: 3.12.3)
- `-f` or `--force`: Force actions like recreating the virtual environment
- `-h` or `--help`: Display help message

Example:
```bash
./setup.sh -y
```
This will run the setup in headless mode, assuming 'yes' to all prompts.
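The requirements listed under System Requirements can also be checked by hand before running the script. The following is a minimal sketch, not what `setup.sh` itself runs: `preflight` and `REQUIRED_PYTHON` are hypothetical names, and the sketch assumes that Apple Silicon shows up as `Darwin`/`arm64` in Python's `platform` module. It treats the platform as a warning (the README only recommends it) and the Python version as an error.

```python
import platform
import sys

# Default Python version named in this README (hypothetical constant name).
REQUIRED_PYTHON = (3, 12, 3)


def preflight(system=None, machine=None, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Arguments default to the values of the running interpreter and are
    injectable so the function can be exercised on any machine.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    version = tuple(version or sys.version_info[:3])

    problems = []
    # Apple Silicon is recommended, not required, so this is only a warning.
    if system != "Darwin" or machine != "arm64":
        problems.append(f"warning: {system}/{machine} is not macOS on Apple Silicon")
    # The README states that this exact Python version is required.
    if version != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in version)
        wanted = ".".join(str(part) for part in REQUIRED_PYTHON)
        problems.append(f"error: Python {wanted} required, found {found}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```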
### 3. Start the Server
After the setup completes, you can start the server using:
```bash
./start_server.sh
```
The server supports several command-line options:
```bash
./start_server.sh --host=0.0.0.0 # Custom host (default: 0.0.0.0)
./start_server.sh --port=8080 # Custom port (default: 8000)
./start_server.sh --help # Show help message
```
You can combine options:
```bash
./start_server.sh --host=127.0.0.1 --port=8080
```
The server will automatically:
- Activate the virtual environment if not already active
- Verify all required packages are installed
- Start the FastAPI server with the specified host and port

**Note**: The `start_server.sh` script ensures that the virtual environment is activated and that all required packages are installed before starting the server.
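A client launched right after `start_server.sh` may try to connect before the server accepts requests. One way to handle this is to poll the `GET /health` endpoint described in the API Reference until it reports `"status": "healthy"`. The sketch below uses only the standard library. `check_health` and `wait_for_server` are hypothetical helper names that are not part of the repository, and the retry count and delay are arbitrary assumptions.

```python
import json
import time
import urllib.request


def check_health(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET /health answers 200 with {"status": "healthy"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            if resp.status != 200:
                return False
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors raised by urllib;
        # ValueError covers a body that is not valid UTF-8 JSON.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_server(base_url="http://localhost:8000", attempts=30, delay=1.0,
                    probe=check_health):
    """Call `probe` up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if probe(base_url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    if wait_for_server():
        print("Server is ready")
    else:
        print("Server did not become healthy in time")
```

The `probe` argument is injectable so the retry loop can be tested without a running server.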
## Usage Examples
### Simple Python Client
Here's a basic synchronous client example using `requests`.
```python
import os

import requests


def transcribe_audio(file_path, quick=True, any_lang=True, server_url="http://localhost:8000"):
    """
    Simplified function to transcribe an audio file using the WhisperTurboAPI server.

    Args:
        file_path (str): Path to the audio file.
        quick (bool, optional): Whether to use quick mode. Default is True.
        any_lang (bool, optional): Whether to allow any language detection. Default is True.
        server_url (str, optional): URL of the transcription server.

    Returns:
        str: The transcribed text.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    if not file_path.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise ValueError("Unsupported audio format. Use WAV, MP3, M4A, or FLAC")

    # Keep the file open until the upload has finished.
    with open(file_path, 'rb') as f:
        files = {'file': (os.path.basename(file_path), f, 'audio/wav')}
        params = {
            'quick': str(quick).lower(),
            'any_lang': str(any_lang).lower()
        }
        response = requests.post(f"{server_url}/transcribe", files=files, params=params)

    if response.status_code == 200:
        return response.json()['text']
    else:
        raise Exception(f"Request failed: {response.status_code}, {response.text}")


# Example usage
if __name__ == "__main__":
    # Replace 'path/to/audio.wav' with the path to your audio file
    audio_file = 'path/to/audio.wav'
    try:
        transcription = transcribe_audio(audio_file, quick=True, any_lang=False)
        print("Transcription:")
        print(transcription)
    except Exception as e:
        print(f"An error occurred: {e}")
```
### Asynchronous Client
An asynchronous client example using `aiohttp`.
```python
import asyncio
import os

import aiohttp


async def transcribe_async(file_path, quick=True, any_lang=True, server_url="http://localhost:8000"):
    """
    Asynchronous function to transcribe an audio file using the WhisperTurboAPI server.

    Args:
        file_path (str): Path to the audio file.
        quick (bool, optional): Whether to use quick mode. Default is True.
        any_lang (bool, optional): Whether to allow any language detection. Default is True.
        server_url (str, optional): URL of the transcription server.

    Returns:
        str: The transcribed text.
    """
    if not os.path.exists(file_path):
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    if not file_path.lower().endswith(('.wav', '.mp3', '.m4a', '.flac')):
        raise ValueError("Unsupported audio format. Use WAV, MP3, M4A, or FLAC")

    async with aiohttp.ClientSession() as session:
        # Keep the file open until the upload has finished.
        with open(file_path, 'rb') as f:
            data = aiohttp.FormData()
            data.add_field('file', f, filename=os.path.basename(file_path))
            params = {
                'quick': str(quick).lower(),
                'any_lang': str(any_lang).lower()
            }

            async with session.post(f"{server_url}/transcribe", data=data, params=params) as response:
                if response.status == 200:
                    result = await response.json()
                    return result['text']
                else:
                    error_detail = await response.text()
                    raise Exception(f"Error: {response.status} - {error_detail}")


# Example usage
if __name__ == "__main__":
    async def main():
        audio_file = 'path/to/audio.wav'
        try:
            transcription = await transcribe_async(audio_file, quick=False, any_lang=True)
            print("Transcription:")
            print(transcription)
        except Exception as e:
            print(f"An error occurred: {e}")

    asyncio.run(main())
```
### CURL
You can use `curl` to test the API. Here's a working example:
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "file=@path/to/your/audio.wav" \
http://localhost:8000/transcribe
```
For example, with a sample audio file:
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "file=@sample_audio.wav" \
http://localhost:8000/transcribe
```
The response will be JSON formatted:
```json
{
"text": "Transcribed text here...",
"elapsed_time": 1.44,
"quick_mode": true,
"any_lang": true
}
```
## API Reference
### POST /transcribe
Transcribe an audio file.
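The client examples in this README also send `quick` and `any_lang` as lowercase query parameters, and the response echoes them as `quick_mode` and `any_lang`. The sketch below shows one way to compose the request URL and to interpret a response using the fields and status codes documented in this section. `build_transcribe_url`, `parse_transcribe_response`, and `Transcription` are hypothetical names introduced for illustration, and the mapping of status codes to exception types is an assumption, not something the server prescribes.

```python
import json
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class Transcription:
    """Fields of a successful /transcribe response as documented in this README."""
    text: str
    elapsed_time: float
    quick_mode: bool
    any_lang: bool


def build_transcribe_url(server_url="http://localhost:8000", quick=True, any_lang=True):
    """Compose the endpoint URL with booleans encoded as 'true'/'false'."""
    query = urlencode({
        "quick": str(quick).lower(),
        "any_lang": str(any_lang).lower(),
    })
    return f"{server_url.rstrip('/')}/transcribe?{query}"


def parse_transcribe_response(status, body):
    """Turn a status code and response body (str) into a Transcription or raise."""
    if status == 200:
        payload = json.loads(body)
        return Transcription(
            text=payload["text"],
            elapsed_time=float(payload["elapsed_time"]),
            quick_mode=bool(payload["quick_mode"]),
            any_lang=bool(payload["any_lang"]),
        )
    if status == 400:
        # Documented as: invalid file format or missing file.
        raise ValueError(f"Request rejected (400): {body}")
    # 500 and anything unexpected are treated as server-side failures.
    raise RuntimeError(f"Transcription failed ({status}): {body}")
```

Separating URL construction and response parsing from the HTTP call keeps both usable with `requests`, `aiohttp`, or any other client.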
**Headers**:
- `Content-Type: multipart/form-data` (required)

**Parameters**:
- `file` (form data, required): Audio file (WAV, MP3, M4A, FLAC)

**Example Request**:
```bash
curl -X POST \
-H "Content-Type: multipart/form-data" \
-F "file=@path/to/your/audio.wav" \
http://localhost:8000/transcribe
```
**Response**:
```json
{
"text": "The transcribed text will appear here...",
"elapsed_time": 1.44,
"quick_mode": true,
"any_lang": true
}
```
**Status Codes**:
- `200 OK`: Successful transcription
- `400 Bad Request`: Invalid file format or missing file
- `500 Internal Server Error`: Server processing error
### GET /health
Check server status.
**Response**:
```json
{
"status": "healthy",
"version": "1.0.0"
}
```
## Demo
For testing, example clients are provided in:
- `examples/demo.py`: Demonstrates both the synchronous and the asynchronous client
- `examples/simple_demo.py`: Basic synchronous client
## License
MIT License
## Acknowledgements
Based on `whisper-turbo-mlx` by JosefAlbers, which provides a fast and lightweight implementation of the Whisper model using MLX.