{"id":34501327,"url":"https://github.com/kurkigal/speech-to-text-service","last_synced_at":"2026-04-20T08:31:05.511Z","repository":{"id":329954709,"uuid":"1121115595","full_name":"KurKigal/speech-to-text-service","owner":"KurKigal","description":"Speech-to-text service powered by Faster-Whisper, FastAPI, and Typer.","archived":false,"fork":false,"pushed_at":"2025-12-22T13:33:57.000Z","size":18,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-24T00:30:15.240Z","etag":null,"topics":["fastapi","python","python-programming","ruff","speech-to-text","stt"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KurKigal.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-22T13:12:25.000Z","updated_at":"2025-12-23T08:26:03.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/KurKigal/speech-to-text-service","commit_stats":null,"previous_names":["kurkigal/speech-to-text-service"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/KurKigal/speech-to-text-service","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurKigal%2Fspeech-to-text-service","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurKigal%2Fspeech-to-text-service/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/repositories/KurKigal%2Fspeech-to-text-service/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurKigal%2Fspeech-to-text-service/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KurKigal","download_url":"https://codeload.github.com/KurKigal/speech-to-text-service/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KurKigal%2Fspeech-to-text-service/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27992996,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-24T02:00:07.193Z","response_time":83,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","python","python-programming","ruff","speech-to-text","stt"],"created_at":"2025-12-24T02:00:55.519Z","updated_at":"2025-12-24T02:01:44.677Z","avatar_url":"https://github.com/KurKigal.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Speech-to-Text Service\n\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n[![Code Style: 
Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)\n[![FastAPI](https://img.shields.io/badge/FastAPI-0.109+-009688.svg?style=flat\u0026logo=fastapi\u0026logoColor=white)](https://fastapi.tiangolo.com)\n\nA high-performance, asynchronous speech-to-text service powered by **Faster-Whisper**, **FastAPI**, and **Typer**.\n\nThis project provides a modular architecture for audio transcription, exposing both a RESTful API for microservice integration and a CLI for offline batch processing. It uses lazy model loading and centralized, environment-driven configuration, and is designed with both research and production use in mind.\n\n## Project Status (Demo / MVP)\n\nThis repository is currently a **demo / MVP** implementation intended to showcase a clean, production-oriented architecture for speech-to-text services (FastAPI + Faster-Whisper + CLI).\n\nWhile the core API/CLI workflow is functional, the project is **actively evolving** and several capabilities are still planned or experimental. Expect breaking changes, refactors, and incremental improvements as the roadmap items are implemented.\n\nContributions, suggestions, and issue reports are welcome.\n\n## Key Features\n\n- **High Performance:** Uses `faster-whisper` (CTranslate2) for up to 4x faster inference than OpenAI's original Whisper implementation.\n- **Production Ready:** FastAPI application factory with a health check endpoint (`/v1/health`) and efficient resource management.\n- **Dual Interface:**\n  - **REST API:** Fully typed endpoints for seamless integration.\n  - **CLI Tool:** Typer-based command-line interface for local testing and automation.\n- **Robust Engineering:**\n  - Lazy model loading to optimize memory usage.\n  - Audio validation and normalization utilities (including resampling to 16 kHz).\n  - Centralized configuration via Pydantic Settings (driven by environment variables).\n- **Quality Assurance:** `pytest` suite with async HTTP fixtures and service mocking.\n\n## Project Structure\n\nThe project follows the `src` layout, which keeps source code separate from tests and scripts and ensures that tests run against the installed package rather than against files that happen to be importable from the working directory.\n\n```text\n.\n├── pyproject.toml              # Build metadata, dependencies, and tool configs\n├── README.md                   # Project documentation\n├── scripts/\n│   └── download_model.py       # Helper script to pre-fetch model weights\n├── src/\n│   └── stt_service/\n│       ├── app.py              # FastAPI application factory\n│       ├── cli.py              # CLI entrypoint (Typer)\n│       ├── config.py           # Environment \u0026 settings management\n│       ├── models.py           # Shared Data Transfer Objects (DTOs)\n│       ├── api/\n│       │   └── routes.py       # API route definitions\n│       ├── services/\n│       │   └── transcription.py # Core business logic adapter\n│       └── utils/\n│           └── audio.py        # Audio processing utilities\n└── tests/\n    ├── conftest.py             # Pytest fixtures\n    ├── test_api.py             # Integration tests\n    └── test_transcription_service.py\n```\n\n## Getting Started\n\n### Prerequisites\n\n* Python 3.9 or higher\n* FFmpeg (required for audio processing)\n\n### Installation\n\n1. **Clone the Repository and Set Up a Virtual Environment**\n```powershell\n# Clone the repository\ngit clone https://github.com/KurKigal/speech-to-text-service.git\ncd speech-to-text-service\n\n# Create a virtual environment\npython -m venv .venv\n\n# Activate the environment (Windows)\n.\\.venv\\Scripts\\activate\n# For Linux/macOS: source .venv/bin/activate\n```\n\n2. **Install Dependencies**\n```powershell\n# Install the project in editable mode\npip install -e .\n\n# Install development dependencies (testing, linting)\n# The quotes stop shells such as zsh from expanding the brackets\npip install -e \".[dev]\"\n```\n\n3. **Configuration (Optional)**\n\nThe service is configured via environment variables, which can also be placed in a `.env` file in the project root.\n\n| Variable | Default | Description |\n| --- | --- | --- |\n| `STT_WHISPER_MODEL_SIZE` | `base` | Whisper model size (`tiny`, `base`, `small`, `medium`, `large-v2`) |\n| `STT_WHISPER_COMPUTE_TYPE` | `int8` | Quantization type (e.g. `float16` for GPU, `int8` for CPU) |\n| `STT_DEVICE` | `auto` | Inference device: `auto`, `cuda`, or `cpu` |\n\n4. **Download Model Weights**\n\nRunning this step before starting the service is recommended, so that the first request does not time out while the weights are downloaded.\n\n```powershell\npython scripts/download_model.py\n```\n\n## Usage\n\n### 1. Running the API Server\n\nStart the server using the CLI wrapper or Uvicorn directly.\n\n```powershell\n# Using the CLI wrapper\nstt-service serve --host 0.0.0.0 --port 8000\n\n# OR using Uvicorn directly (--reload restarts on code changes; use it for development only)\nuvicorn stt_service.app:create_app --factory --reload\n```\n\n*Swagger UI is available at `http://localhost:8000/docs`.*\n\n### 2. Using the CLI\n\nTranscribe audio files directly from your terminal.\n\n```powershell\nstt-service transcribe path/to/audio.wav --language en\n```\n\n## API Reference\n\n* **GET** `/v1/health`  \n  Returns service status and loaded model information.\n\n* **POST** `/v1/transcribe`  \n  Upload an audio file for transcription.  \n  Supports language and beam size parameters.\n\n## Development\n\nTo maintain code quality, we use `pytest` for testing and `ruff` for linting and formatting.\n\n```powershell\n# Run tests\npytest\n\n# Run linter\nruff check .\n\n# Format code\nruff format .\n```\n\n## Roadmap (Planned Improvements)\n\nThe following items are not yet implemented and represent the next iterations for this demo/MVP:\n\n* [ ] Speaker diarization.\n* [ ] WebSocket support for real-time streaming transcription.\n* [ ] Database integration for persistent transcript storage.\n* [ ] Web-based UI for easier file uploads.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkurkigal%2Fspeech-to-text-service","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkurkigal%2Fspeech-to-text-service","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkurkigal%2Fspeech-to-text-service/lists"}