{"id":51308790,"url":"https://github.com/redhillsmediafl/caix","last_synced_at":"2026-07-04T11:00:46.116Z","repository":{"id":367773986,"uuid":"1281935256","full_name":"RedHillsMediaFL/caix","owner":"RedHillsMediaFL","description":"caix — native Apple Core AI inference server for Apple silicon (beta): OpenAI/Anthropic API, dashboard, streaming chat with tools/skills/MCP, MTP speculative decoding.","archived":false,"fork":false,"pushed_at":"2026-07-01T02:12:04.000Z","size":1068,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-07-01T02:21:27.898Z","etag":null,"topics":["anthropic-api","apple-silicon","coreml","inference-server","llm","llm-inference","local-llm","macos","mlx","neural-engine","ollama-alternative","on-device-ai","openai-api","speculative-decoding","swift"],"latest_commit_sha":null,"homepage":"https://huggingface.co/redhillsmediafl","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RedHillsMediaFL.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"docs/ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-06-27T05:24:54.000Z","updated_at":"2026-07-01T01:58:06.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/RedHillsMediaFL/caix","commit_stats":null,"previous_names":["redhillsmediafl/caix"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/RedHillsMediaFL/caix","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RedHillsMediaFL%2Fcaix","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RedHillsMediaFL%2Fcaix/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RedHillsMediaFL%2Fcaix/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RedHillsMediaFL%2Fcaix/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RedHillsMediaFL","download_url":"https://codeload.github.com/RedHillsMediaFL/caix/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RedHillsMediaFL%2Fcaix/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35118971,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["anthropic-api","apple-silicon","coreml","inference-server","llm","llm-inference","local-llm","macos","mlx","neural-engine","ollama-alternative","on-device-ai","openai-api","speculative-decoding","swift"],"created_at":"2026-07-01T02:00:43.770Z","updated_at":"2026-07-04T11:00:46.090Z","avatar_url":"https://github.com/RedHillsMediaFL.png","language":"Swift","funding_links":[],"categories":[],"sub_categories":[],"readme":"# caix\n\n[![Release](https://img.shields.io/github/v/release/RedHillsMediaFL/caix?include_prereleases\u0026sort=semver\u0026color=E5484D)](https://github.com/RedHillsMediaFL/caix/releases)\n[![Apple silicon · Core AI](https://img.shields.io/badge/Apple%20silicon-Core%20AI-black?logo=apple\u0026logoColor=white)](https://github.com/RedHillsMediaFL/caix)\n[![Swift](https://img.shields.io/badge/Swift-F05138?logo=swift\u0026logoColor=white)](https://github.com/RedHillsMediaFL/caix)\n[![HF models](https://img.shields.io/badge/HF%20models-redhillsmediafl-ffcc4d)](https://huggingface.co/redhillsmediafl)\n\nLocal Core AI inference server for Apple silicon — private, on-device LLMs behind OpenAI- and\nAnthropic-compatible APIs. Free and open, no paywall.\n\n\u003e **Beta.** caix depends on Apple's beta **Core AI** runtime. Expect breakage across macOS/Xcode\n\u003e beta lines. Inference is local; downloads still use the network.\n\ncaix serves Core AI `.aimodel` language bundles on Apple silicon (Neural Engine + GPU). It exposes:\n\n- a **web dashboard** for machine stats, model management, and live usage,\n- a **chat** view with markdown, streaming, tools, skills, and MCP,\n- **OpenAI-compatible** (`/v1/chat/completions`) and **Anthropic-compatible** (`/v1/messages`) APIs.\n\nOpenAI and Anthropic clients can point at the local server.\n\ncaix supports classic speculative packages and EAGLE target+draft packages. Use the model-card flags\nfor each package.\n\nMultimodal status: caix parses OpenAI and Anthropic image/audio/video content blocks, but no verified\nruntime bundle consumes them yet. Multimodal requests return a clear 400 until that changes.\n\n---\n\n## Testing and support\n\ncaix is free and open. No paywall.\n\nBenchmark procedure is in [docs/BENCHMARKS.md](docs/BENCHMARKS.md) and [docs/TESTING.md](docs/TESTING.md).\n\nSupport conversions, benchmark time, hosting, and tester access:\n[redhillsmediafl.com/open-source](https://redhillsmediafl.com/open-source).\n\nSupport is optional and never gates features.\n\n---\n\n## Performance\n\nNo public speed table is current.\n\nBenchmark rules and raw-log requirements are in [docs/BENCHMARKS.md](docs/BENCHMARKS.md). Do not\npublish speed numbers without the raw run directory, exact model repo revision, caix commit,\nhardware, OS build, and exact command.\n\nPlanned work is tracked in [docs/ROADMAP.md](docs/ROADMAP.md).\nThe dry-run staged-execution planner surface is documented in [docs/CLUSTER.md](docs/CLUSTER.md).\n\n---\n\n## Requirements\n\n| | |\n|---|---|\n| **Mac** | Apple silicon (M1 or newer). Unified memory limits model size. A 26B 4-bit model needs ~20 GB free. |\n| **macOS** | macOS 27 beta, or another beta line that ships Apple's Core AI runtime. |\n| **Xcode** | Matching Core AI-capable Xcode beta with `swift` and `CoreAI.framework`. |\n| **Disk** | A few GB for the build + the size of each model (e.g. ~13 GB for the 26B 4-bit bundle). |\n| **Internet** | The build fetches Swift packages. Model downloads use the network. Inference is offline. |\n\ncaix links directly against Core AI. The OS, Xcode, and `apple/coreai-models` revision need to match\nthe active beta line. Stable toolchain support waits on stable Core AI.\n\nSwift Package Manager fetches the HTTP server, tokenizer, and other dependencies. No manual\nthird-party installs.\n\n---\n\n## Install\n\n### Option A — Homebrew\n\nHomebrew is the intended default install path after tap testing and Core AI stabilization.\n\nCurrent beta path:\n\n```bash\nbrew install redhillsmediafl/caix/caix\ncaix doctor\n```\n\nAfter macOS 27/Core AI are stable and the release formula passes audit/test:\n\n```bash\nbrew tap RedHillsMediaFL/caix\nbrew install caix\ncaix doctor\n```\n\nHomebrew/core comes later. See [Homebrew notes](docs/HOMEBREW.md).\nReleases stay below `1.0.0` while Core AI is beta; see [release notes](docs/RELEASES.md).\n\n### Option B — source install\n\nRequires the Core AI-capable macOS/Xcode beta from [Requirements](#requirements):\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/RedHillsMediaFL/caix/main/scripts/install.sh | bash\n```\n\nThis clones or updates `~/caix`, builds the Core AI runtime binary, and links the `caix` launcher into\n`~/.local/bin` when possible.\n\n### Option C — download a prebuilt binary\n\n1. Go to the [**Releases**](https://github.com/redhillsmediafl/caix/releases) page and download\n   `caix-\u003cversion\u003e-macos-arm64.tar.gz`.\n2. Extract and run:\n\n```bash\ntar -xzf caix-*-macos-arm64.tar.gz\ncd caix-*-macos-arm64\n./caix serve\n```\n\nThe first downloaded run may be blocked by macOS Gatekeeper. Allow it once:\n**System Settings → Privacy \u0026 Security → \"Open Anyway\"**, or:\n\n```bash\nxattr -dr com.apple.quarantine ./bin/caix\n```\n\n\u003e The prebuilt binary needs a Mac whose macOS includes the matching **Core AI** runtime (same beta\n\u003e line it was built against). If it won't launch (`CoreAI.framework` load error), use Option D.\n\n### Option D — build from source\n\n```bash\ngit clone https://github.com/redhillsmediafl/caix.git\ncd caix\nsudo xcode-select -s /Applications/Xcode-beta.app    # your Core AI–capable Xcode (once)\n./scripts/install.sh                                 # = COREAI_RUNTIME=1 swift build -c release\n```\n\nThe first build downloads Swift dependencies and compiles. It takes a few minutes.\n\n\u003e **Core AI dependency (beta).** Runtime builds track `apple/coreai-models` on `main` so caix follows\n\u003e Apple's active Core AI work. `Package.resolved` records the exact SHA used by each checkout; keep\n\u003e that SHA in benchmark and release evidence, and snapshot a known-good revision for a release train\n\u003e if upstream `main` regresses.\n\n---\n\n## Get a model\n\ncaix serves Core AI `.aimodel` bundles. Get one of these two ways:\n\n**A. Download a converted bundle.** The default install path is `~/.caix/models/exports`:\n\n```bash\n# list converted Qwen repos and exact install commands\ncaix catalog redhillsmediafl/qwen\n# prompt through families and models, then download:\ncaix catalog install\n# or install one repo directly:\ncaix catalog install redhillsmediafl/rhm-qwen2.5-0.5b-instruct-caix\n```\n\n`caix catalog install` wraps `hf download`, uses local Hugging Face credentials/cache when present,\nand runs a disk preflight first.\n\n### Model catalog\n\nConverted bundles are in the [redhillsmediafl Hugging Face org](https://huggingface.co/redhillsmediafl?search=caix)\nand in family collections:\n\n- [Qwen caix](https://huggingface.co/collections/redhillsmediafl/qwen-caix-6a3fd5f6d272c154dbfcda67)\n- [Gemma caix](https://huggingface.co/collections/redhillsmediafl/gemma-caix-6a3fd5f57b67589b85e6eac6)\n- [GLM caix](https://huggingface.co/collections/redhillsmediafl/glm-caix-6a40604dfb87315cc99a558e)\n- [Mistral caix](https://huggingface.co/collections/redhillsmediafl/mistral-caix-6a404e43a972c2f2621a000e)\n- [Ornith caix](https://huggingface.co/collections/redhillsmediafl/ornith-caix-6a3ff0de0d269f65f53ef064)\n- [Qwythos caix](https://huggingface.co/collections/redhillsmediafl/qwythos-caix-6a409e86fb87315cc9a2d69f)\n- [gpt-oss caix](https://huggingface.co/collections/redhillsmediafl/gpt-oss-caix-6a404e428791003d8e6e79fc)\n\nUse the catalog for current install commands:\n\n```bash\ncaix catalog redhillsmediafl\ncaix catalog redhillsmediafl/qwen\ncaix catalog install \u003crepo\u003e\n```\n\nModel cards list exact revisions, licenses, storage, RAM notes, and any speculative or EAGLE flags.\nCatalog entries are published only when they are backed by open-source/open-weight models and either\nhave runtime verification, or are converted artifacts that our current hardware cannot fully smoke.\nThose cards must say what passed and what still needs tester hardware. Blocked and component-only\npackages must be labeled that way.\n\nTo submit a model or tester result, see [docs/CATALOG_SUBMISSIONS.md](docs/CATALOG_SUBMISSIONS.md).\n\n**B. Convert a bundle.** See [Converting models](#converting-models-advanced).\n\nA bundle is a folder containing a `*.aimodel/` directory, `metadata.json`, and `tokenizer/`. Drop it\nunder `models/exports/` and caix will list it.\n\n---\n\n## Run\n\n```bash\ncaix serve\ncaix dashboard\ncaix chat\n```\n\nFrom a source checkout, `./caix serve` also works.\n\nOpen:\n\n- **http://localhost:1237** — the dashboard (machine stats, models, live usage)\n- **http://localhost:1237/chat** — the chat view (markdown, streaming, tools, skills, MCP)\n\n`caix serve` prewarms the smallest chat-suitable installed model before it starts listening. Use\n`--prewarm \u003cmodel|all|off\u003e` or `--no-prewarm` to override.\n\nFor OpenAI-compatible clients, use `http://localhost:1237/v1` with any API key:\n\n```bash\ncurl http://localhost:1237/v1/chat/completions -H 'content-type: application/json' -d '{\n  \"model\": \"\u003cyour-model-name\u003e\",\n  \"messages\": [{\"role\":\"user\",\"content\":\"Hello!\"}]\n}'\n```\n\n### OpenCode\n\nThe repo includes `opencode.json` with a local `caix` OpenAI-compatible provider pointed at\n`http://127.0.0.1:1237/v1` and known RHM/caix model IDs. Copy or symlink it into\n`~/.config/opencode/opencode.json`, start caix, then verify OpenCode can see the provider:\n\n```bash\nopencode models caix\n```\n\nOpenCode reads the static provider map; the live server list is the source of truth for installed\nbundles:\n\n```bash\ncurl http://127.0.0.1:1237/v1/models\n```\n\nWhen OpenCode sends an installed model ID to `/v1/chat/completions`, caix hot-loads the matching\nlocal bundle from the exports directory. If the ID is not installed yet, use the dashboard's RHM\ninstaller or `caix catalog install` to add the converted bundle first.\n\n### Other devices\n\ncaix binds to `127.0.0.1` by default. To reach it from other devices, put it behind\n[Tailscale](https://tailscale.com):\n\n```bash\ntailscale serve --bg http://127.0.0.1:1237\n```\n\nThis gives you a private HTTPS URL on your tailnet without opening public ports.\n\n---\n\n## What is included\n\n- Core AI inference on Neural Engine + GPU. This is not llama.cpp or MLX.\n- MTP/speculative decoding for supported pairs; caix auto-tunes the draft window up to the safe cap.\n- Dashboard: RAM/GPU stats, load/unload/convert/delete, per-model usage, rolling/lifetime tok/s.\n  Usage stats persist across restarts.\n- Chat: markdown, streaming, code blocks with copy, tables, and a reasoning panel when a model emits\n  `reasoning_content`. The chat page includes a redacted session log for debugging.\n- Terminal chat: `caix chat` / `caix tui` talks to a local caix server and includes shell-tool\n  controls (`ask`, `on`, `off`).\n- Skills: choose or write a system prompt.\n- Tools: built-in `calculator`, `clock`, and `fetch_url`. Use qwen-family models for tool-heavy\n  chats; the gemma MTP model is weak at tool calls.\n- MCP: connect a streamable-HTTP MCP server and use its tools in chat.\n- APIs: OpenAI (`/v1/chat/completions`, `/v1/models`) and Anthropic (`/v1/messages`), streaming.\n- UI model management: download and convert from a Hugging Face repo. New architectures are flagged\n  with the Core AI authoring work they need.\n\n---\n\n## Known limitations\n\n- **Beta toolchain required.** See [Requirements](#requirements).\n- **Cold loads are slow.** Core AI compiles the model before first use (~30-60 s for a 13 GB model).\n  `caix serve` prewarms by default; `--no-prewarm` accepts traffic immediately.\n- **Server fast path.** Standard language bundles use Apple's kept-hot CoreAILanguageModels engine\n  by default after `/api/load`. `COREAI_LEGACY_ENGINE=1` uses the older sequential path for\n  debugging. Hybrid qwen3_5/qwen3_5_moe bundles with packed recurrent state use the explicit Core\n  AI sequential engine unless `COREAI_FAST_HYBRID_ENGINE=1` is set; the current fast warmup cache\n  shape is too small for those fixed state prefixes.\n- **Background services and external volumes.** If you run caix from a `launchd` agent, Apple's\n  loader needs file-access permission and cannot read external/USB volumes without it. Keep the\n  binary and models on the internal disk, or run `caix serve` from a normal user session. Full Disk\n  Access on the binary fixes the `launchd` case. If exports stay on an external SSD, generate a\n  local model-list index and pass it via `caix_export_index`:\n\n```bash\nscripts/refresh-export-index.sh /Volumes/SSD/ai-dev/coreai-pipeline/exports \\\n  ~/coreai-server/export-index.json\n```\n\n- **Greedy tool-calling.** The MTP/speculative path is greedy, so lower-parameter models may not\n  reliably emit tool calls. Use a qwen-family model for tool-heavy chats.\n- **Authored architectures.** Conversion support exists for gemma3/gemma4,\n  qwen2/qwen3/qwen3_moe/qwen3_5/qwen3_5_moe, glm4, mistral, mixtral, and gpt_oss. New model types\n  are flagged with their required Core AI authoring steps in the UI and support logs.\n- **Ornith-1.0-35B lane.** `ornith-1.0-35b` is registered for local conversion. The authored int4\n  path exports a 17 GB bundle, but runtime warmup is blocked by an MPS reshape failure, so it is not\n  published to HF. The next fix is graph segmentation or a decode-shape workaround. The 397B variant\n  uses the same authored `qwen3_5_moe` path and needs the full 122-shard source download first.\n- **Diffusion models** are not in this beta.\n\n---\n\n## Converting models (advanced)\n\nConversion wraps Apple's `coreai.llm.export` and needs Apple's `coreai-models` Python checkout and\n[`uv`](https://github.com/astral-sh/uv):\n\n```bash\n# point caix at your coreai-models python dir and keep conversion IO on the SSD:\nexport caix_coreai_models=/path/to/coreai-models/python\nexport HF_HOME=${HF_HOME:-/Volumes/SSD/hf-cache}\nexport caix_tmpdir=${caix_tmpdir:-/Volumes/SSD/coreai-tmp}\nexport caix_exports=${caix_exports:-$PWD/models/exports}\nscripts/check-disk-pressure.sh --path /Volumes/SSD --floor-gib 500\n\n# check support, then convert:\npython3 python/converter/convert.py --check --hf-id Qwen/Qwen2-0.5B\npython3 python/converter/convert.py --hf-id Qwen/Qwen2-0.5B --compression 4bit --compute-precision float16\n```\n\nRun only one heavy Core AI export, HF transfer, CLI verification, build, or benchmark at a time.\nFor local queues, gate each job with:\n\n```bash\nscripts/conversion-guard.sh --wait\n```\n\nDashboard path: paste a HF repo in **Add model → HuggingFace → Core AI**, pick\ncompression/precision, and click Convert. New architectures are flagged with the required authoring\nwork.\n\n### GGUF repos (llama.cpp models)\n\ncaix can convert GGUF repos. If a repo ships only `.gguf` files, caix dequantizes the GGUF back to\nan HF checkpoint, then runs the normal export.\n\n```bash\n# repo: caix picks the least-compressed quant present (F16 \u003e Q8_0 \u003e Q6_K \u003e ...)\npython3 python/converter/convert.py --gguf unsloth/Qwen3-0.6B-GGUF --name qwen3-0.6b-coreai\n# or pin a specific file:\npython3 python/converter/convert.py --gguf unsloth/Qwen3-0.6B-GGUF --gguf-file Qwen3-0.6B-BF16.gguf\n```\n\nDashboard path: paste a GGUF-only repo. It is routed through the dequant step; the support check\nconfirms the architecture after dequant.\n\n\u003e **Quality caveat.** A GGUF is already quantized. Dequantizing and re-exporting, often to 4-bit, is\n\u003e quant-on-quant and loses quality versus converting the original safetensors. Prefer the original\n\u003e release when it exists. GGUF does not add architecture support.\n\u003e Needs `gguf\u003e=0.10.0` (added automatically for the dequant run). Note: `gemma4` GGUFs are **not**\n\u003e convertible — `transformers` has no GGUF dequantizer for that architecture yet.\n\n---\n\n## Paths\n\n| | |\n|---|---|\n| Models | `~/.caix/models/exports/\u003cname\u003e/` (override with `caix_exports` or `--exports`) |\n| HF cache | `$HF_HOME`; otherwise Hugging Face uses its local default |\n| Converter tmp | `$caix_tmpdir`; converter default is `/Volumes/SSD/coreai-tmp` |\n| Export index | optional JSON from `scripts/refresh-export-index.sh` (set `caix_export_index`) |\n| Web UI | `web/` (served at `/` and `/chat`; restart `caix serve` after editing these files) |\n| Usage stats | `~/.caix/usage.json` (override with `--stats-file`) — survives restarts |\n| Converter | `python/converter/` |\n\n---\n\n## Troubleshooting\n\n- **`swift: command not found`** → install Xcode (beta) and `sudo xcode-select -s /Applications/Xcode-beta.app`.\n- **Build fails fetching `coreai-models`** → that package requires your Core AI–capable Xcode; confirm\n  `xcode-select -p` points at it.\n- **`/v1` returns 503** → the binary was built **without** the runtime. Rebuild with\n  `COREAI_RUNTIME=1 swift build -c release` (the `./caix` launcher and `install.sh` do this for you).\n- **No models listed** → put a bundle under `models/exports/` (a folder with a `*.aimodel/` inside).\n- **First message hangs ~1 min** → expected on cold load; watch the spinner.\n\n---\n\n## License\n\nSee [LICENSE](LICENSE). caix bundles `marked` and `DOMPurify` (their licenses in `web/assets/`).\n\n---\n\n*caix is beta and unaffiliated with Apple. \"Core AI\" is Apple's runtime.*\n\n\u003csub\u003eMore open-source work: [redhillsmediafl.com/open-source](https://redhillsmediafl.com/open-source).\u003c/sub\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredhillsmediafl%2Fcaix","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fredhillsmediafl%2Fcaix","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredhillsmediafl%2Fcaix/lists"}