{"id":47716870,"url":"https://github.com/alez007/yasha","last_synced_at":"2026-04-07T10:00:52.346Z","repository":{"id":309344627,"uuid":"1032537538","full_name":"alez007/yasha","owner":"alez007","description":"Yet another self-hosted agent","archived":false,"fork":false,"pushed_at":"2026-04-05T05:32:24.000Z","size":2357,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-05T07:52:32.684Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alez007.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-08-05T13:01:43.000Z","updated_at":"2026-04-05T05:32:32.000Z","dependencies_parsed_at":"2025-08-11T11:56:26.721Z","dependency_job_id":"06c95635-409c-43d9-839e-dec1354b7cfe","html_url":"https://github.com/alez007/yasha","commit_stats":null,"previous_names":["alez007/yasha"],"tags_count":11,"template":false,"template_full_name":null,"purl":"pkg:github/alez007/yasha","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alez007%2Fyasha","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alez007%2Fyasha/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alez007%2Fyasha/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alez007%2Fyasha/manifests","ow
ner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alez007","download_url":"https://codeload.github.com/alez007/yasha/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alez007%2Fyasha/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31508282,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-07T03:10:19.677Z","status":"ssl_error","status_checked_at":"2026-04-07T03:10:13.982Z","response_time":105,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-04-02T19:01:05.664Z","updated_at":"2026-04-07T10:00:52.305Z","avatar_url":"https://github.com/alez007.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Yasha\n\nSelf-hosted, multi-model AI inference server. Runs LLMs alongside specialized models (TTS, speech-to-text, embeddings) on a single GPU, exposing an OpenAI-compatible API. 
Built on [vLLM](https://github.com/vllm-project/vllm) and [Ray](https://github.com/ray-project/ray).\n\n## Requirements\n\n- **NVIDIA GPU** — 16 GB+ VRAM recommended for a full stack (LLM + TTS + STT + embeddings); 8 GB is sufficient for lighter setups\n- **Docker** with [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)\n- **HuggingFace token** for gated models\n\n## Features\n\n- **Multi-model on a single GPU** — run chat, embedding, STT, and TTS models simultaneously with tunable per-model GPU memory allocation\n- **Per-model isolated deployments** — each model runs in its own Ray Serve deployment with independent lifecycle, health checks, and failure isolation\n- **OpenAI-compatible API** — drop-in replacement for any OpenAI SDK client\n- **Streaming** — SSE streaming for chat completions and TTS audio\n- **Tool/function calling** — auto tool choice with configurable parsers\n- **Plugin system** — opt-in TTS backends installed as isolated uv workspace packages\n- **Multi-GPU support** — assign models to specific GPUs by index or named Ray resource, with full tensor parallelism support\n- **Client disconnect detection** — cancels in-flight inference when the client disconnects, freeing GPU resources immediately\n- **Ray dashboard** — monitor deployments, resources, and request logs\n\n## Supported OpenAI Endpoints\n\n| Endpoint | Use case |\n|---|---|\n| `POST /v1/chat/completions` | Chat / text generation (streaming and non-streaming) |\n| `POST /v1/embeddings` | Text embeddings |\n| `POST /v1/audio/transcriptions` | Speech-to-text |\n| `POST /v1/audio/translations` | Audio translation |\n| `POST /v1/audio/speech` | Text-to-speech (SSE streaming or single-response) |\n| `GET /v1/models` | List available models |\n\n## Plugin Support\n\nYasha's TTS system is built around a plugin architecture — each TTS backend is an opt-in package with its own isolated dependencies. 
Plugins ship inside this repo (`plugins/`) or can be installed from PyPI.\n\nTo enable plugins, pass them as extras at sync time:\n\n```bash\nuv sync --extra kokoro\nuv sync --extra kokoro --extra orpheus  # multiple plugins\n```\n\nWhen using Docker, set the `YASHA_PLUGINS` environment variable:\n\n```\nYASHA_PLUGINS=kokoro,orpheus\n```\n\nFor a full guide on writing your own plugin, see [Plugin Development](docs/plugins.md).\n\n## Getting Started\n\nPull the latest image from GHCR:\n\n```bash\ndocker pull ghcr.io/alez007/yasha:latest\n```\n\nGrab an example config for your GPU and edit it to your liking:\n\n```bash\ndocker run --rm ghcr.io/alez007/yasha:latest cat /yasha/config/models.example.16GB.yaml \u003e models.yaml\n```\n\nStart the server, mounting your config and a cache directory so models are only downloaded once:\n\n```bash\ndocker run --rm --shm-size=8g --gpus all \\\n  -e HF_TOKEN=your_token_here \\\n  -e YASHA_PLUGINS=your_plugins_here \\\n  -v ./models.yaml:/yasha/config/models.yaml \\\n  -v ./models-cache:/yasha/.cache/models \\\n  -p 8265:8265 -p 8000:8000 ghcr.io/alez007/yasha:latest\n```\n\n- API: `http://localhost:8000`\n- Ray dashboard: `http://localhost:8265`\n\nExample configs are included for 8 GB, 16 GB, 24 GB, and 2×16 GB GPU setups.\n\n## Development\n\nSee [Development](docs/development.md) for instructions on building the dev image, running with live source mounting, and attaching to the container.\n\n## Model Configuration\n\nSee [Model Configuration](docs/model-configuration.md) for a full reference of all `models.yaml` fields, GPU pinning options, and environment variables.\n\n## Home Assistant Integration\n\nYasha can serve as a voice backend for [Home Assistant](https://www.home-assistant.io/) via the Wyoming protocol. 
See [Home Assistant Integration](docs/home-assistant.md) for setup instructions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falez007%2Fyasha","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falez007%2Fyasha","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falez007%2Fyasha/lists"}