{"id":49760309,"url":"https://github.com/michaelasper/kir-ai","last_synced_at":"2026-05-24T05:04:20.004Z","repository":{"id":356809323,"uuid":"1232525153","full_name":"michaelasper/kir-ai","owner":"michaelasper","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-11T03:45:08.000Z","size":6976,"stargazers_count":0,"open_issues_count":69,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-11T05:15:34.366Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/michaelasper.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-08T02:42:01.000Z","updated_at":"2026-05-11T03:45:12.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/michaelasper/kir-ai","commit_stats":null,"previous_names":["michaelasper/kir-ai"],"tags_count":54,"template":false,"template_full_name":null,"purl":"pkg:github/michaelasper/kir-ai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelasper%2Fkir-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelasper%2Fkir-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelasper%2Fkir-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelasper%2Fkir-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/michaelasper","download_url":"https://codeload.github.com/michaelasper/kir-ai/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/michaelasper%2Fkir-ai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32979305,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-13T06:31:55.726Z","status":"ssl_error","status_checked_at":"2026-05-13T06:31:51.336Z","response_time":115,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-11T05:10:38.964Z","updated_at":"2026-05-16T02:23:48.121Z","avatar_url":"https://github.com/michaelasper.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"assets/kir-ai.png\"\u003e\n    \u003csource media=\"(prefers-color-scheme: light)\" srcset=\"assets/kir-ai.png\"\u003e\n    \u003cimg alt=\"kir-ai\" src=\"assets/kir-ai.png\" width=\"120\"\u003e\n  \u003c/picture\u003e\n\n  \u003ch1\u003ekir-ai\u003c/h1\u003e\n  \u003cp\u003eRust-first local inference on Apple Silicon with explicit, OpenAI-compatible runtime boundaries.\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\n[![License][license-shield]][license-url]\n[![CI][ci-shield]][ci-url]\n[![Release][release-shield]][release-url]\n[![Rust][rust-shield]][rust-url]\n[![Apple Metal][metal-shield]][metal-url]\n[![Local Inference][inference-shield]][docs-setup]\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e ·\n  \u003ca href=\"#features--highlights\"\u003eFeatures\u003c/a\u003e ·\n  \u003ca href=\"#usage\"\u003eUsage\u003c/a\u003e ·\n  \u003ca href=\"#documentation-map\"\u003eDocs\u003c/a\u003e ·\n  \u003ca href=\"https://github.com/michaelasper/kir-ai/issues\"\u003eReport Bug\u003c/a\u003e\n\u003c/div\u003e\n\n**kir-ai** is an OpenAI-shaped local inference workspace for Apple Silicon that keeps core inference, request contracts, and safety checks in Rust. The project is built around explicit runtime selection: protocol verification, native Metal execution, and MLX sidecar interop all live behind the same CLI/server surface with strict capability boundaries.\n\n## Why / The Problem\n\nMany local inference stacks are easiest to ship with ad-hoc Python glue, but that coupling makes behaviour harder to audit and scale. `kir-ai` addresses this by making protocol handling and runtime orchestration explicit in a Rust workspace while preserving the API shape your clients already expect.\n\nYou get an engine that:\n- exposes OpenAI-style endpoints consistently,\n- fails closed for unsupported request features,\n- separates testing pathways from model-serving pathways,\n- and keeps model lifecycle (plan/pull/verify/serve) under explicit commands.\n\n## Features / Highlights\n\n- **OpenAI-compatible edge** for `/v1/chat/completions`, `/v1/completions`, streaming SSE, and model listing.\n- **Strict capability gating** in request validation and runtime mapping; unsupported features return stable errors instead of silent fallback behaviour.\n- **Two serving modes**: protocol-test mode for client contract work and snapshot-backed serving for native Metal/MLX paths.\n- **Native Metal first-class support** for Qwen and Gemma text pipelines with bounded prefill and typed cache identities.\n- **Model lifecycle tooling** in `llm-engine`: `model plan`, `model list`, `model inspect`, `model verify`, and `model pull`.\n- **Operational controls** with admin endpoints for metrics, snapshot verification/pull, lane-level request cancellation, and model metadata.\n- **Failure-safe semantics** including request validation for unsafe fields (`max_tokens`, sampling controls, stop sequences, tool schemas, malformed JSON, and token budgets).\n\n## When to Use\n\nUse `kir-ai` when you want a local inference server that is explicit about execution mode and protocol behaviour. If you are iterating on client integration, choose protocol-test mode first. If you are preparing model-backed inference runs, switch to snapshot-based serving.\n\nAvoid `kir-ai` as a first step if your immediate need is a managed multi-user cloud inference platform.\n\n## Quick Start\n\n1. Install and prepare the workspace.\n\n   ```sh\n   curl -fsSL https://raw.githubusercontent.com/michaelasper/kir-ai/main/scripts/install-macos.sh | bash\n   ```\n\n2. Start the protocol test backend.\n\n   ```sh\n   kirai\n   ```\n\n3. Send a smoke request.\n\n   ```sh\n   curl -s http://127.0.0.1:3000/v1/chat/completions \\\n     -H 'content-type: application/json' \\\n     -d '{\n       \"model\": \"local-qwen36\",\n       \"messages\": [{\"role\": \"user\", \"content\": \"hello\"}],\n       \"max_tokens\": 8\n     }' | jq\n   ```\n\nExpected response: OpenAI-shaped `chat.completion` JSON with `local-qwen36`.\n\n### Install and Runtime Options\n\n- `KIR_AI_DIR`, `KIR_AI_REF` choose install location and revision.\n- `KIR_AI_SKIP_BUILD=1` for dependency setup without compile.\n- `KIR_AI_SKIP_PYTHON=1` for Rust-only install paths.\n- `KIR_AI_FORCE_CLONE=1` to force a fresh checkout path.\n\nFor full script controls, see [`docs/ci-and-release.md`][docs-setup].\n\n### Serve with a Snapshot\n\n```sh\nkirai serve \\\n  --snapshot .llm-models/\u003cmanifest-snapshot-path\u003e \\\n  --model-id local-qwen36 \\\n  --max-new-tokens 256 \\\n  --max-prefill-tokens 2048\n```\n\nFor MLX manifests, set the loopback endpoint:\n\n```sh\nkirai serve \\\n  --snapshot .llm-models/\u003cmlx-snapshot-path\u003e \\\n  --loader mlx \\\n  --family qwen \\\n  --model-id local-qwen35-4b \\\n  --mlx-endpoint http://127.0.0.1:8080/v1\n```\n\n## Usage\n\n### Core Endpoints\n\n- `GET /health`\n- `GET /v1/models`\n- `GET /admin/models` and `/admin/models/{alias}`\n- `POST /v1/chat/completions` and `POST /v1/completions`\n- `POST /admin/models/{alias}/verify`\n- `POST /admin/models/{alias}/plan`\n- `POST /admin/models/{alias}/pull`\n- `POST /admin/requests/{request_id}/cancel`\n- `GET /admin/metrics`\n\nFor request and response examples, see [`docs/getting-started.md`][docs-getting-started].\nFor the full HTTP contract, see [`docs/http-api-reference.md`][http-api-doc].\n\n## Native Text Snapshot Flow\n\nUse `kirai` model commands to plan, inspect, verify, and pull profiles before serving.\n\n```sh\nkirai model plan Qwen/Qwen3-0.6B \\\n  --revision main \\\n  --profile qwen3-dense-safetensors-bf16\n\nkirai model pull Qwen/Qwen3.6-35B-A3B \\\n  --metadata-only \\\n  --model-home .llm-models\n\nkirai model inspect .llm-models/\u003csnapshot-path\u003e\n```\n\nWant direct source commands? Use `cargo run -p llm-engine -- ...` from a local checkout (development mode).\n\n## Documentation Map\n\n| Need | Document |\n| --- | --- |\n| Start with a working response | [`docs/getting-started.md`][docs-getting-started] |\n| Developer machine setup | [`docs/setup.md`][docs-setup] |\n| Run server and native text paths | [`docs/how-to-run-server.md`][docs-run-server] |\n| Model snapshot lifecycle | [`docs/how-to-manage-models.md`][docs-models] |\n| CLI reference | [`docs/cli-reference.md`][docs-cli] |\n| HTTP API reference | [`docs/http-api-reference.md`][http-api-doc] |\n| Configuration and formats | [`docs/configuration-reference.md`][docs-config] |\n| Project architecture | [`docs/architecture.md`][docs-architecture] |\n| CI and release details | [`docs/ci-and-release.md`][docs-ci-release] |\n| Development guide | [`docs/development.md`][docs-dev] |\n\nThe product direction and implementation milestones are tracked in [`rust-metal-inference-engine-north-star.md`][north-star].\n\n## Current Limitations\n\n- Native Metal text execution currently covers dense Qwen, Qwen3/Qwen3.6 MoE, and Gemma 4 paths.\n- Native paths are correctness-first and intentionally conservative for sampling and throughput.\n- The server does not execute `generation_config.json` or downloaded chat templates (`chat_template.jinja`) as runtime config.\n- Tool-call and JSON-object validation paths may buffer to preserve fail-closed semantics.\n- Snapshot serving requires explicit backend mode; implicit no-snapshot stub serving is not supported.\n\n## Compatibility\n\n- Rust workspace version: `1.95`\n- Runtime target profile: Apple Silicon first-class, macOS-first CI.\n\n## License\n\nThis project is licensed under MIT. See upstream license terms at the official MIT license text.\n\n[ci-shield]: https://img.shields.io/github/actions/workflow/status/michaelasper/kir-ai/ci.yml?branch=main\u0026style=flat-square\u0026label=ci\n[ci-url]: https://github.com/michaelasper/kir-ai/actions/workflows/ci.yml\n[release-shield]: https://img.shields.io/github/actions/workflow/status/michaelasper/kir-ai/release.yml?label=release\u0026style=flat-square\n[release-url]: https://github.com/michaelasper/kir-ai/actions/workflows/release.yml\n[rust-shield]: https://img.shields.io/badge/rust-1.95-f5a97f?style=flat-square\u0026logo=rust\u0026logoColor=white\n[rust-url]: https://www.rust-lang.org/\n[metal-shield]: https://img.shields.io/badge/apple%20metal-native-c6a0f6?style=flat-square\u0026logo=apple\u0026logoColor=white\n[metal-url]: https://developer.apple.com/metal/\n[license-shield]: https://img.shields.io/badge/license-MIT-a6da95?style=flat-square\u0026logo=opensourceinitiative\u0026logoColor=white\n[license-url]: https://opensource.org/licenses/MIT\n[inference-shield]: https://img.shields.io/badge/local-inference-91d7e3?style=flat-square\n[docs-getting-started]: docs/getting-started.md\n[docs-setup]: docs/setup.md\n[docs-run-server]: docs/how-to-run-server.md\n[docs-models]: docs/how-to-manage-models.md\n[docs-cli]: docs/cli-reference.md\n[http-api-doc]: docs/http-api-reference.md\n[docs-config]: docs/configuration-reference.md\n[docs-architecture]: docs/architecture.md\n[docs-ci-release]: docs/ci-and-release.md\n[docs-dev]: docs/development.md\n[north-star]: rust-metal-inference-engine-north-star.md\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaelasper%2Fkir-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmichaelasper%2Fkir-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmichaelasper%2Fkir-ai/lists"}