{"id":29212169,"url":"https://github.com/dwain-barnes/vui-fastapi-server","last_synced_at":"2026-04-21T22:35:45.267Z","repository":{"id":300816119,"uuid":"1007266777","full_name":"dwain-barnes/vui-fastapi-server","owner":"dwain-barnes","description":"An OpenAI-compatible Text-to-Speech API server powered by VUI - a small conversational speech model","archived":false,"fork":false,"pushed_at":"2025-06-23T18:21:14.000Z","size":12,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-02T22:36:18.121Z","etag":null,"topics":["fastapi","openai-compatible","text-to-speech","tts","vui"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dwain-barnes.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-23T18:11:05.000Z","updated_at":"2025-06-23T18:21:17.000Z","dependencies_parsed_at":"2025-06-23T19:28:46.639Z","dependency_job_id":"48faa5e3-c870-4a81-8e8a-890765efea17","html_url":"https://github.com/dwain-barnes/vui-fastapi-server","commit_stats":null,"previous_names":["dwain-barnes/vui-fastapi-server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dwain-barnes/vui-fastapi-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwain-barnes%2Fvui-fastapi-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwain-barnes%2Fvui-fastapi-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/re
positories/dwain-barnes%2Fvui-fastapi-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwain-barnes%2Fvui-fastapi-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dwain-barnes","download_url":"https://codeload.github.com/dwain-barnes/vui-fastapi-server/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dwain-barnes%2Fvui-fastapi-server/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32113308,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-21T11:25:29.218Z","status":"ssl_error","status_checked_at":"2026-04-21T11:25:28.499Z","response_time":128,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["fastapi","openai-compatible","text-to-speech","tts","vui"],"created_at":"2025-07-02T22:30:38.143Z","updated_at":"2026-04-21T22:35:45.259Z","avatar_url":"https://github.com/dwain-barnes.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VUI FastAPI Server\n\nA high-performance OpenAI-compatible Text-to-Speech API server powered by [VUI](https://github.com/fluxions-ai/vui) - a small conversational speech model that runs on device.\n\n## Features\n\n- 🎯 **OpenAI-compatible API** - Drop-in replacement for OpenAI's TTS API\n- 🚀 **High 
Performance** - GPU acceleration with optional `torch.compile()` optimization\n- 🎵 **Multiple Audio Formats** - WAV, MP3, Opus, FLAC, AAC, PCM support\n- 📡 **Streaming Support** - Real-time audio streaming capabilities\n- 🐳 **Docker Ready** - Easy deployment with CUDA support\n- 💬 **Conversational Quality** - Natural speech with human-like characteristics\n\n## Quick Start\n\n### Prerequisites\n\n- NVIDIA GPU with CUDA support\n- Docker with NVIDIA Container Toolkit\n- Hugging Face account with accepted terms for:\n  - [pyannote/voice-activity-detection](https://huggingface.co/pyannote/voice-activity-detection)\n  - [pyannote/segmentation](https://huggingface.co/pyannote/segmentation)\n\n### Installation\n\n1. **Clone the repository**\n```bash\ngit clone https://github.com/dwain-barnes/vui-fastapi-server.git\ncd vui-fastapi-server\n```\n\n2. **Get your Hugging Face token**\n   - Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)\n   - Create a new token with read permissions\n   - Accept the terms for the required models (links above)\n\n3. **Build the Docker image**\n```bash\ndocker build --build-arg HUGGING_FACE_HUB_TOKEN=your_hf_token_here -t vui-fastapi .\n```\n\n4. **Run the server**\n```bash\ndocker run --gpus all -p 8000:8000 -e USE_GPU=1 vui-fastapi\n```\n\nThe server will be available at `http://localhost:8000`\n\n## API Usage\n\n### OpenAI-Compatible Endpoint\n\nThe server implements the OpenAI Text-to-Speech API specification:\n\n```bash\ncurl -X POST http://localhost:8000/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\n    \"model\": \"vui\",\n    \"input\": \"Hello world! 
This is a test of the VUI text-to-speech system.\",\n    \"voice\": \"default\",\n    \"response_format\": \"wav\"\n  }' \\\n  --output speech.wav\n```\n\n### Parameters\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `model` | string | `\"vui\"` | Model identifier (currently only \"vui\" supported) |\n| `input` | string | **required** | Text to convert to speech (max 4096 characters) |\n| `voice` | string | `null` | Voice selection (currently ignored) |\n| `response_format` | string | `\"wav\"` | Audio format: `wav`, `mp3`, `opus`, `flac`, `aac`, `pcm` |\n| `speed` | float | `1.0` | Speech speed (currently ignored) |\n| `stream` | boolean | `false` | Enable streaming response |\n\n### Examples\n\n**Basic usage:**\n```bash\ncurl -X POST http://localhost:8000/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"input\": \"Hello world!\", \"response_format\": \"mp3\"}' \\\n  --output hello.mp3\n```\n\n**Streaming response:**\n```bash\ncurl -X POST http://localhost:8000/v1/audio/speech \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"input\": \"This is a streaming test\", \"stream\": true}' \\\n  --output stream.wav\n```\n\n**Python example:**\n```python\nimport requests\n\nresponse = requests.post(\n    \"http://localhost:8000/v1/audio/speech\",\n    json={\n        \"model\": \"vui\",\n        \"input\": \"Hello from Python!\",\n        \"response_format\": \"wav\"\n    }\n)\n\nwith open(\"output.wav\", \"wb\") as f:\n    f.write(response.content)\n```\n\n## Development\n\n### Project Structure\n\n```\nvui-fastapi-server/\n├── main.py                 # FastAPI application\n├── requirements.txt        # Python dependencies\n├── download_pyannote.py    # PyAnnote model downloader\n├── Dockerfile             # Docker configuration\n└── README.md              # This file\n```\n\n### Local Development\n\n1. **Install dependencies**\n```bash\npip install -r requirements.txt\n```\n\n2. 
**Install VUI**\n```bash\ngit clone https://github.com/fluxions-ai/vui.git\npip install -e ./vui\n```\n\n3. **Set environment variables**\n```bash\nexport HUGGING_FACE_HUB_TOKEN=your_token_here\nexport USE_GPU=1  # Optional: force GPU usage\n```\n\n4. **Run the server**\n```bash\nuvicorn main:app --host 0.0.0.0 --port 8000\n```\n\n### Configuration\n\nEnvironment variables:\n- `USE_GPU`: Set to `1` to force GPU usage, `0` for CPU\n- `HUGGING_FACE_HUB_TOKEN`: Required for downloading gated models\n\n## Performance\n\nThe server includes several optimizations:\n\n- **GPU Acceleration**: Automatic CUDA detection and usage\n- **Model Compilation**: `torch.compile()` for improved inference speed\n- **Model Warmup**: Pre-loads and warms up the model during startup\n- **Streaming**: Chunked transfer encoding for real-time audio streaming\n\nTypical performance on modern GPUs:\n- **Generation Speed**: 1-5x real-time depending on text length\n- **Latency**: ~1-3 seconds for short phrases\n- **Quality**: High-quality conversational speech with natural characteristics\n\n## API Documentation\n\nOnce the server is running, visit:\n- **Interactive API docs**: http://localhost:8000/docs\n- **OpenAPI spec**: http://localhost:8000/v1/openapi.json\n\n## Troubleshooting\n\n### Common Issues\n\n**Model compilation errors:**\n```\nRuntimeError: Failed to find C compiler\n```\nThe Dockerfile includes build tools to resolve this. If building locally, install:\n```bash\n# Ubuntu/Debian\nsudo apt-get install build-essential gcc g++\n\n# macOS\nxcode-select --install\n```\n\n**GPU not detected:**\n```\nVUI model loaded on cpu\n```\nEnsure NVIDIA Container Toolkit is installed and `--gpus all` flag is used.\n\n**Hugging Face token errors:**\n```\nToken is required but no token found\n```\nMake sure you've accepted the terms for the required models and provided a valid token.\n\n## Contributing\n\nContributions are welcome! 
Please feel free to submit a Pull Request.\n\n## License\n\nThis project is built on top of [VUI](https://github.com/fluxions-ai/vui) by Fluxions AI. Please refer to the original VUI repository for licensing information.\n\n## Acknowledgments\n\n- [Fluxions AI](https://github.com/fluxions-ai) for the VUI model\n- [pyannote.audio](https://github.com/pyannote/pyannote-audio) for voice activity detection\n- OpenAI for the TTS API specification\n\n## Links\n\n- [VUI Model](https://huggingface.co/fluxions/vui)\n- [VUI Demo](https://huggingface.co/spaces/fluxions/vui-space)\n- [Original VUI Repository](https://github.com/fluxions-ai/vui)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdwain-barnes%2Fvui-fastapi-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdwain-barnes%2Fvui-fastapi-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdwain-barnes%2Fvui-fastapi-server/lists"}