https://github.com/kurkigal/speech-to-text-service
Speech-to-text service powered by Faster-Whisper, FastAPI, and Typer.
- Host: GitHub
- URL: https://github.com/kurkigal/speech-to-text-service
- Owner: KurKigal
- License: mit
- Created: 2025-12-22T13:12:25.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2025-12-22T13:33:57.000Z (6 months ago)
- Last Synced: 2025-12-24T00:30:15.240Z (6 months ago)
- Topics: fastapi, python, python-programming, ruff, speech-to-text, stt
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Speech-to-Text Service
A high-performance, asynchronous speech-to-text service powered by **Faster-Whisper**, **FastAPI**, and **Typer**.
This project provides a robust architecture for audio transcription, exposing both a RESTful API for microservices integration and a CLI for offline batch processing. It features modular design, lazy model loading, and rigorous configuration management, making it suitable for both research and production environments.
## Project Status (Demo / MVP)
This repository is currently a **demo / MVP** implementation intended to showcase a clean, production-oriented architecture for speech-to-text services (FastAPI + Faster-Whisper + CLI).
While the core API/CLI workflow is functional, the project is **actively evolving** and several capabilities are still planned or experimental. Expect breaking changes, refactors, and incremental improvements as the roadmap items are implemented.
Contributions, suggestions, and issue reports are welcome.
## Key Features
- **High Performance:** Utilizes `faster-whisper` (CTranslate2) for up to 4x faster inference than OpenAI's original implementation.
- **Production Ready:** FastAPI factory pattern with health checks (`/v1/health`) and efficient resource management.
- **Dual Interface:**
  - **REST API:** Fully typed endpoints for seamless integration.
  - **CLI Tool:** Typer-based command-line interface for local testing and automation.
- **Robust Engineering:**
  - Lazy-loading strategies to optimize memory usage.
  - Audio validation and normalization (16 kHz resampling) utilities.
  - Centralized configuration via Pydantic Settings (environment variable driven).
- **Quality Assurance:** Comprehensive `pytest` suite with async HTTP fixtures and service mocking.
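
The lazy-loading idea listed above can be illustrated with a minimal sketch: the expensive object is built on first access and reused afterwards. The names `LazyResource` and `fake_load_model` are hypothetical and are not taken from this repository; `fake_load_model` stands in for whatever actually constructs the Faster-Whisper model.

```python
import threading
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Build an expensive object on first access and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    @property
    def loaded(self) -> bool:
        """True once the factory has run (useful for a health endpoint)."""
        return self._instance is not None

    def get(self) -> T:
        # Double-checked locking: only take the lock while nothing is loaded yet.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


# Hypothetical stand-in for the real model constructor (assumption).
calls: List[str] = []


def fake_load_model() -> dict:
    calls.append("load")
    return {"model_size": "base"}


model = LazyResource(fake_load_model)
```

Nothing is loaded when `model` is created; the first `model.get()` runs the factory, and later calls return the same object. The lock keeps concurrent first requests from loading the model twice.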
## Project Structure
The project uses a `src` layout, which avoids accidental imports from the working directory and keeps source code separate from tests and scripts.
```text
.
├── pyproject.toml                  # Build metadata, dependencies, and tool configs
├── README.md                       # Project documentation
├── scripts/
│   └── download_model.py           # Helper script to pre-fetch model weights
├── src/
│   └── stt_service/
│       ├── app.py                  # FastAPI application factory
│       ├── cli.py                  # CLI entrypoint (Typer)
│       ├── config.py               # Environment & Settings management
│       ├── models.py               # Shared Data Transfer Objects (DTOs)
│       ├── api/
│       │   └── routes.py           # API Route definitions
│       ├── services/
│       │   └── transcription.py    # Core business logic adapter
│       └── utils/
│           └── audio.py            # Audio processing utilities
└── tests/
    ├── conftest.py                 # Pytest fixtures
    ├── test_api.py                 # Integration tests
    └── test_transcription_service.py
```
## Getting Started
### Prerequisites
* Python 3.9 or higher
* FFmpeg (required for audio processing)
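
To confirm the FFmpeg prerequisite before installing, a small check like the following may help. It is a sketch that only looks for an `ffmpeg` executable on `PATH`; the function names are illustrative and not part of this project.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the FFmpeg binary, or None if it is not on PATH."""
    return shutil.which(executable)


def describe_ffmpeg(path: Optional[str]) -> str:
    """Turn the lookup result into a human-readable message."""
    if path is None:
        return "FFmpeg not found on PATH; install it before running the service."
    return f"FFmpeg found at {path}"


if __name__ == "__main__":
    print(describe_ffmpeg(find_ffmpeg()))
```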
### Installation
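1. **Clone and Set Up a Virtual Environment**
```powershell
# Clone the repository and enter it
git clone https://github.com/kurkigal/speech-to-text-service
cd speech-to-text-service
# Create a virtual environment
python -m venv .venv
# Activate environment (Windows)
.\.venv\Scripts\activate
# For Linux/Mac: source .venv/bin/activate
```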
2. **Install Dependencies**
```powershell
# Install the project in editable mode
pip install -e .
# Or include the development dependencies (testing, linting);
# the quotes keep shells such as zsh from expanding the brackets
pip install -e ".[dev]"
```
3. **Configuration (Optional)**
The service is configured via environment variables. You can create a `.env` file in the root directory.
| Variable | Default | Description |
| --- | --- | --- |
| `STT_WHISPER_MODEL_SIZE` | `base` | Model size (tiny, base, small, medium, large-v2) |
| `STT_WHISPER_COMPUTE_TYPE` | `int8` | Quantization type (`float16` for GPU, `int8` for CPU) |
| `STT_DEVICE` | `auto` | Inference device (`auto`, `cuda`, or `cpu`) |
4. **Download Model Weights**
Run this before starting the service so that the first request does not time out while the weights are downloaded.
```powershell
python scripts/download_model.py
```
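
The environment-driven configuration from step 3 can be sketched as follows. The project itself uses Pydantic Settings; this illustration deliberately uses only the standard library, so the `SttSettings` class and `load_settings` function are hypothetical and are not the contents of `config.py`. Only the variable names and defaults come from the table above, and this sketch does not read a `.env` file.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SttSettings:
    """Defaults mirror the configuration table in this README."""

    whisper_model_size: str = "base"
    whisper_compute_type: str = "int8"
    device: str = "auto"


def load_settings(env: Optional[Mapping[str, str]] = None) -> SttSettings:
    """Read the STT_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    defaults = SttSettings()
    return SttSettings(
        whisper_model_size=env.get("STT_WHISPER_MODEL_SIZE", defaults.whisper_model_size),
        whisper_compute_type=env.get("STT_WHISPER_COMPUTE_TYPE", defaults.whisper_compute_type),
        device=env.get("STT_DEVICE", defaults.device),
    )
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to exercise in tests, for example `load_settings({"STT_DEVICE": "cuda"})`.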
## Usage
### 1. Running the API Server
Start the server using the CLI wrapper or Uvicorn directly.
```powershell
# Using the CLI wrapper
stt-service serve --host 0.0.0.0 --port 8000
# OR using Uvicorn directly (--reload is meant for development; omit it in production)
uvicorn stt_service.app:create_app --factory --reload
```
*Swagger UI will be available at: `http://localhost:8000/docs`*
### 2. Using the CLI
Transcribe audio files directly from your terminal.
```powershell
stt-service transcribe path/to/audio.wav --language en
```
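
For batch processing, the same command can be driven from a short script. This is a sketch under the assumption that `stt-service` is on `PATH` after installation; it uses only the `transcribe` subcommand and `--language` option shown above, and the helper names are illustrative.

```python
import subprocess
from pathlib import Path
from typing import List


def build_transcribe_command(audio_path: Path, language: str = "en") -> List[str]:
    """Build the argument list for one `stt-service transcribe` call."""
    return ["stt-service", "transcribe", str(audio_path), "--language", language]


def transcribe_directory(directory: Path, pattern: str = "*.wav", language: str = "en") -> None:
    """Run the CLI once per matching file, in sorted order."""
    for audio_path in sorted(directory.glob(pattern)):
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(build_transcribe_command(audio_path, language), check=True)


if __name__ == "__main__":
    transcribe_directory(Path("recordings"))  # hypothetical folder name
```

Each file starts a new process, so the model is loaded again for every call; for large batches the API server may be the better fit.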
## API Reference
* **GET** `/v1/health`
Returns service status and loaded model information.
* **POST** `/v1/transcribe`
Upload an audio file for transcription.
Supports parameters for language and beam size.
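
A client for the upload endpoint might look like the sketch below, which uses only the standard library. The form field names `file` and `language`, and the assumption that the response body is JSON, are guesses for illustration; check the Swagger UI at `/docs` for the actual schema.

```python
import json
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], filename: str, file_bytes: bytes,
                    file_field: str = "file") -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        f"\r\n\r\n".encode() + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(base_url: str, audio_path: str, language: str = "en") -> dict:
    """POST an audio file to /v1/transcribe (field names are assumptions)."""
    with open(audio_path, "rb") as handle:
        body, content_type = build_multipart(
            {"language": language}, os.path.basename(audio_path), handle.read()
        )
    request = urllib.request.Request(
        f"{base_url}/v1/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running locally, `transcribe("http://localhost:8000", "path/to/audio.wav")` would send the request, provided the assumed field names match the real endpoint.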
## Development
To maintain code quality, we use `pytest` for tests and `ruff` for linting and formatting.
```powershell
# Run tests
pytest
# Run linter
ruff check .
# Format code
ruff format .
```
## Roadmap (Planned Improvements)
The following items are not yet implemented and represent the next iterations for this demo/MVP:
* [ ] Speaker diarization.
* [ ] WebSocket support for real-time streaming transcription.
* [ ] Database integration for persistent transcript storage.
* [ ] Web-based UI for easier file uploads.
---