https://github.com/stampby/halo-ai-core

Bare-metal AI platform for AMD Strix Halo. One script. Everything works. Lego blocks: snap in what you need.

agent-framework ai amd arch-linux bare-metal caddy gaia gpu inference lemonade llama-cpp local-ai privacy rocm ryzen-ai self-hosted strix-halo systemd

🌐 **English** | [Français](README.fr.md) | [Español](README.es.md) | [Deutsch](README.de.md) | [Português](README.pt.md) | [日本語](README.ja.md) | [中文](README.zh.md) | [한국어](README.ko.md) | [Русский](README.ru.md) | [हिन्दी](README.hi.md) | [العربية](README.ar.md)

# halo-ai core

### the 1-bit monster: local ai inference, bare metal, no python at runtime

**rocm c++ · ternary weights (.h1b) · fused HIP kernels · wave32 wmma · 17 c++ specialists · zero telemetry · zero cloud**

*stamped by the architect*

[![License](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![ROCm](https://img.shields.io/badge/ROCm_7.13-ED1C24?style=flat&logo=amd&logoColor=white)](https://github.com/ROCm/TheRock)
[![rocm-cpp](https://img.shields.io/badge/rocm--cpp-inference_engine-00d4ff?style=flat)](https://github.com/stampby/rocm-cpp)
[![agent-cpp](https://img.shields.io/badge/agent--cpp-17_specialists-00d4ff?style=flat)](https://github.com/stampby/agent-cpp)
[![halo-1bit](https://img.shields.io/badge/halo--1bit-.h1b_format-00d4ff?style=flat)](https://github.com/stampby/halo-1bit)
[![Discord](https://img.shields.io/badge/Discord-halo--ai-5865F2?style=flat&logo=discord&logoColor=white)](https://discord.gg/dSyV646eBs)
[![Reddit](https://img.shields.io/badge/Reddit-r/MidlifeCrisisAI-FF4500?style=flat&logo=reddit&logoColor=white)](https://www.reddit.com/r/MidlifeCrisisAI/)
[![Self Hosted](https://img.shields.io/badge/Self_Hosted-100%25_Local-purple?style=flat)](https://github.com/stampby/halo-ai-core)

---

## what is this

halo-ai core is the **install script for the 1-bit monster**, a full local AI stack that runs entirely in C++ on AMD Strix Halo hardware. no python at runtime. no cloud. no telemetry. no subscriptions.

one script, three engineering repos:

| repo | what it is |
|------|-----------|
| [**rocm-cpp**](https://github.com/stampby/rocm-cpp) | the inference engine. pure HIP, fused ternary kernels, OpenAI-compatible server with SSE streaming. |
| [**agent-cpp**](https://github.com/stampby/agent-cpp) | the agent framework. 17 single-purpose specialists on a message bus, hash-chained audit log, consent-verification gate. |
| [**halo-1bit**](https://github.com/stampby/halo-1bit) | the model format (.h1b) + training pipeline. absmean ternary, QAT with straight-through estimator, distillation from bf16 teachers. |

halo-ai core clones them, builds them from source, wires them into systemd, and points a caddy reverse proxy at the result. one command, you get a running LLM, a voice loop, a discord bot, a CI runner, and an audit trail. everything local.

*"I know kung fu."*

## install

two paths. the script auto-detects your GPU and picks the right one.

```bash
git clone https://github.com/stampby/halo-ai-core.git
cd halo-ai-core
./install.sh  # auto-dispatch: strixhalo → fast; else → source
```

| path | who it's for | time | what it does |
|------|--------|------|------|
| [`./install-strixhalo.sh`](install-strixhalo.sh) | **gfx1151** (Strix Halo) | ~5 min | downloads pre-built binaries from GH Releases, verifies SHA256 + GPG, wires systemd |
| [`./install-source.sh`](install-source.sh) | any other AMD GPU | ~4 hrs | builds TheRock + rocm-cpp + agent-cpp + halo-1bit from source for your arch |

why two scripts: every Strix Halo is the same silicon (gfx1151, wave32, up to 128 GB unified memory). one build produces a binary that runs bit-identically on every such box, so there is no reason to rebuild from source every time. for anything else (gfx1030, gfx1100, gfx1201, CDNA), the wave32 WMMA kernels don't port 1:1, so a source build with arch-specific codegen is the safe option.
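
a minimal sketch of that dispatch rule, written in C++ to match the rest of the stack. `pick_installer` is a hypothetical name, and the real `install.sh` is a shell script that may detect the GPU differently; only the gfx1151-versus-everything-else split comes from the table above.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the actual install.sh logic): maps a GPU arch
// string such as "gfx1151" to the installer that the table above assigns it.
std::string pick_installer(const std::string& gfx_arch) {
    assert(!gfx_arch.empty());  // caller must have detected an arch first
    // Strix Halo is the only arch with pre-built binaries.
    if (gfx_arch == "gfx1151") {
        return "install-strixhalo.sh";
    }
    // Everything else builds from source with arch-specific codegen.
    return "install-source.sh";
}
```

under these assumptions, `pick_installer("gfx1100")` falls through to the source build, which is the conservative default for any arch the fast path does not know about.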

**running something other than a strix halo and want the kernels built for your GPU?** see [`release/KERNELS.md`](release/KERNELS.md) for arch coverage, how to build your own, and how to share community builds back.

[![Install Demo](https://img.shields.io/badge/asciinema-watch_install_demo-d40000?style=flat&logo=asciinema&logoColor=white)](docs/install-rocmpp.cast)

## the stack

```
┌──────────────────────────────────────────────────────────┐
│  agent-cpp - 17 C++ specialists                          │
│  muse · planner · forge · warden (CVG) · scribe          │
│  sommelier · herald · sentinel · carpenter · anvil       │
│  quartermaster · magistrate · librarian · cartograph     │
│  echo_ear · echo_mouth · stdout_sink                     │
├──────────────────────────────────────────────────────────┤
│  rocm-cpp server (:8080) - OpenAI-compat, SSE streaming  │
├──────────────────────────────────────────────────────────┤
│  librocm_cpp - HIP kernels · WMMA wave32 · KV cache      │
├──────────────────────────────────────────────────────────┤
│  ternary model (.h1b v2) · halo-1bit tokenizer (.htok)   │
├──────────────────────────────────────────────────────────┤
│  whisper-server (STT) · kokoro (TTS)                     │
├──────────────────────────────────────────────────────────┤
│  ROCm 7.13.0 · gfx1151 wave32                            │
├──────────────────────────────────────────────────────────┤
│  Arch Linux · systemd · btrfs                            │
└──────────────────────────────────────────────────────────┘
```

> *every layer is someone else's lego block if they want it. take the whole monster or take one piece.*

## numbers that matter

| metric | value | note |
|---|---|---|
| **decode speed** | 85 tok/s | BitNet-b1.58-2B, greedy, Strix Halo |
| **model size** | 1.1 GiB | TQ1_0 format, 4× smaller than F16 |
| **KLD vs F16** | 0.0023 | mean bits/token, indistinguishable in practice |
| **top-1 agreement** | 96.3% | vs F16 reference, same argmax token |
| **agent binary** | 1.3 MB | agent_cpp, statically linked |
| **cold start** | < 2s | bitnet_decode --server |
| **runtime deps** | 0 python | libc, pthreads, httplib, nlohmann-json, OpenSSL |

details and methodology: [docs/benchmark-comparison.md](docs/benchmark-comparison.md) · [docs/replicate.md](docs/replicate.md)
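
for readers who want to sanity-check the KLD and top-1 rows, here is a sketch of how such numbers are commonly computed from per-token probability distributions. the function names and the `EvalSummary` struct are illustrative assumptions; the project's actual methodology is in the docs linked above and may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// KL(p || q) in bits for one token position.
// p: reference (F16) next-token distribution, q: quantized model's distribution.
// If q[i] is 0 where p[i] > 0 the result is infinite; real evals smooth or clamp q.
double kl_bits(const std::vector<double>& p, const std::vector<double>& q) {
    assert(p.size() == q.size());
    double kl = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        if (p[i] > 0.0) {  // the term 0 * log(0 / q) is taken as 0
            kl += p[i] * std::log2(p[i] / q[i]);
        }
    }
    return kl;
}

// Index of the most likely token (first index wins on ties).
std::size_t argmax(const std::vector<double>& v) {
    assert(!v.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (v[i] > v[best]) best = i;
    }
    return best;
}

struct EvalSummary {
    double mean_kl_bits;    // average KL per token position
    double top1_agreement;  // fraction of positions with the same argmax
};

// Averages both metrics over all token positions.
EvalSummary summarize(const std::vector<std::vector<double>>& ref,
                      const std::vector<std::vector<double>>& quant) {
    assert(ref.size() == quant.size() && !ref.empty());
    double kl_sum = 0.0;
    std::size_t agree = 0;
    for (std::size_t t = 0; t < ref.size(); ++t) {
        kl_sum += kl_bits(ref[t], quant[t]);
        if (argmax(ref[t]) == argmax(quant[t])) ++agree;
    }
    const double n = static_cast<double>(ref.size());
    return {kl_sum / n, static_cast<double>(agree) / n};
}
```

using `log2` rather than the natural log is what makes the result come out in bits, matching the "mean bits/token" unit in the table.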

## what you get

### the engine
- **bitnet_decode**: OpenAI-compatible HTTP server on :8080 with SSE streaming. chat completions, models list, bearer auth optional.
- **`.h1b` loader**: ternary weights, magic `H1B`, 9 int32 config + 2 float32 params. memory-mapped, zero-copy.
- **HIP kernels**: fused ternary MatMul, RMSNorm, SiLU, RoPE. wave32 WMMA. no CK, no hipBLAS at runtime.
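
a client talking to the streaming endpoint has to pull the JSON chunks out of the SSE body. the sketch below does only that, with the standard library. it assumes the server follows the OpenAI streaming convention of one JSON object per `data:` line and a final `data: [DONE]` sentinel; whether `bitnet_decode` emits that exact sentinel is an assumption, and `sse_payloads` is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Collects the payload of every `data:` line in an SSE body, stopping at the
// OpenAI-style `[DONE]` sentinel (assumed here, not confirmed by the README).
std::vector<std::string> sse_payloads(const std::string& body) {
    std::vector<std::string> out;
    std::istringstream in(body);
    std::string line;
    const std::string prefix = "data:";
    while (std::getline(in, line)) {
        // Tolerate CRLF line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        // Skip blank separators, comments, and other SSE fields.
        if (line.compare(0, prefix.size(), prefix) != 0) continue;
        std::string payload = line.substr(prefix.size());
        // The SSE format allows one optional space after the colon.
        if (!payload.empty() && payload.front() == ' ') payload.erase(0, 1);
        if (payload == "[DONE]") break;
        out.push_back(payload);
    }
    return out;
}
```

each returned string can then be handed to a JSON parser such as nlohmann/json, which the stack already lists as a runtime dependency.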

### the agents
- **17 specialists**: each one job, each one thread. message bus with tamper-evident journal.
- **consent-verification gate**: warden enforces policy/intent/consent/bounds. structural, not advisory.
- **hash-chained audit log**: every inbound and outbound message SHA-256 chained, genesis-seeded per session.
- **optional plugins**: Discord read (sentinel) + write (herald), GitHub triage/PR-review/docs (quartermaster/magistrate/librarian), CI runner (anvil), install-help (carpenter).
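
the idea behind a hash-chained log is that each entry's hash covers the previous entry's hash, so editing any message breaks every link after it. the sketch below shows the chaining and verification logic only. to stay dependency-free it swaps in FNV-1a (64-bit) where the real log uses SHA-256; FNV-1a is not cryptographically secure, and the names `Entry`, `append`, and `first_broken` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Stand-in hash: FNV-1a style mixing, seeded with the previous link.
// The README says the real log uses SHA-256; this is NOT tamper-resistant.
std::uint64_t fnv1a(const std::string& data, std::uint64_t seed) {
    std::uint64_t h = seed;
    for (unsigned char c : data) {
        h ^= c;
        h *= 1099511628211ULL;  // 64-bit FNV prime
    }
    return h;
}

struct Entry {
    std::string message;
    std::uint64_t link;  // hash over (previous link, message)
};

// Appends a message, chaining it to the previous entry or to the
// per-session genesis seed if the journal is empty.
void append(std::vector<Entry>& journal, std::uint64_t genesis,
            const std::string& msg) {
    const std::uint64_t prev = journal.empty() ? genesis : journal.back().link;
    journal.push_back({msg, fnv1a(msg, prev)});
}

// Recomputes every link. Returns the index of the first entry that does not
// verify, or journal.size() if the whole chain is intact.
std::size_t first_broken(const std::vector<Entry>& journal,
                         std::uint64_t genesis) {
    std::uint64_t prev = genesis;
    for (std::size_t i = 0; i < journal.size(); ++i) {
        if (fnv1a(journal[i].message, prev) != journal[i].link) return i;
        prev = journal[i].link;
    }
    return journal.size();
}
```

seeding the first link with a per-session genesis value means two sessions with identical messages still produce different chains.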

### the model + training
- **halo-1bit**: absmean quantization, QAT with STE, distillation from Qwen3-32B bf16 teacher.
- **.h1b v2 format**: production artifacts shipped with each release.
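
absmean quantization, as described for BitNet b1.58, scales a weight tensor by the mean of its absolute values, then rounds and clips each weight to -1, 0, or +1. the sketch below shows that forward step for a single tensor. `TernaryTensor` and `absmean_quantize` are hypothetical names, the epsilon value is an assumption, and the training-time parts (STE, distillation) are not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct TernaryTensor {
    std::vector<std::int8_t> q;  // each value is -1, 0, or +1
    float scale;                 // absmean of the original weights
};

// Per-tensor absmean quantization: q = clip(round(w / (scale + eps)), -1, 1).
TernaryTensor absmean_quantize(const std::vector<float>& w) {
    assert(!w.empty());
    float sum_abs = 0.0f;
    for (float x : w) sum_abs += std::fabs(x);
    const float scale = sum_abs / static_cast<float>(w.size());
    const float eps = 1e-5f;  // assumed value; guards against an all-zero tensor

    TernaryTensor t;
    t.scale = scale;
    t.q.reserve(w.size());
    for (float x : w) {
        const float r = std::round(x / (scale + eps));
        const float clipped = std::max(-1.0f, std::min(1.0f, r));
        t.q.push_back(static_cast<std::int8_t>(clipped));
    }
    return t;
}
```

an approximate weight is recovered as `t.scale * t.q[i]`, so small weights collapse to zero and large ones saturate at plus or minus the scale.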

## lego blocks

pick what you want. drop the rest.

| block | what it does | status |
|-------|------|--------|
| **bitnet_decode** | inference server | required |
| **agent_cpp** | agent framework | required |
| **agent_cpp → sentinel+herald** | Discord bot | optional (set DISCORD_TOKEN) |
| **agent_cpp → echo_ear+echo_mouth** | voice loop | optional (whisper + kokoro services) |
| **agent_cpp → quartermaster/magistrate/librarian** | GitHub automation | optional (set GH_TOKEN) |
| **agent_cpp → anvil** | CI runner | optional |
| **caddy** | reverse proxy + bearer auth | optional |
| **man-cave TUI** | FTXUI dashboard over SSH | optional (v2) |
| **orchestrator** | systemd unit wiring | included |
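
the token-gated rows in the table suggest a simple rule: a block is on when its environment variable is set. a sketch of that rule, assuming an empty value counts as unset. `enabled_optional_blocks` is a hypothetical helper, not a function from agent_cpp, and it takes a map instead of calling `getenv` so it is easy to test.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: lists the token-gated blocks that a given environment
// would switch on, following the "set DISCORD_TOKEN / GH_TOKEN" notes above.
std::vector<std::string> enabled_optional_blocks(
        const std::map<std::string, std::string>& env) {
    // A variable counts as set only if it exists and is non-empty (assumption).
    auto is_set = [&env](const std::string& key) {
        auto it = env.find(key);
        return it != env.end() && !it->second.empty();
    };
    std::vector<std::string> blocks;
    if (is_set("DISCORD_TOKEN")) blocks.push_back("sentinel+herald");
    if (is_set("GH_TOKEN")) blocks.push_back("quartermaster/magistrate/librarian");
    return blocks;
}
```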

## philosophy

> every piece snaps in and snaps out. no hard dependencies. no vendor lock-in. no cloud tethers.

python shipped the LLM era. C++ owns the next one. python at training time is fine; python at runtime is a liability on hardware you own. **halo-ai core has zero python at runtime.**

the AI industry wants you renting someone else's computer. we think you should own the whole stack: the hardware, the models, the weights, the pipeline. when you control your own software, you control your own destiny.

*"they get the kingdom. they forge their own keys."*

## privacy

**zero telemetry. zero tracking. zero data collection.** nothing phones home. your data stays on your machine.

paid API providers (OpenAI, Anthropic, Groq, DeepSeek, xAI, OpenRouter) are supported through sommelier with your own keys, but that's your choice, not our default. local-first means local-first.

*"there is no cloud. there is only zuul."*

## docs

| doc | what it covers |
|-----|---|
| [docs/INTEGRATIONS.md](docs/INTEGRATIONS.md) | **point your apps at the stack: openai sdk, curl, python, node, c++, webui, mobile** |
| [docs/benchmark-comparison.md](docs/benchmark-comparison.md) | reproducible numbers vs llama.cpp / vLLM / MLX |
| [docs/replicate.md](docs/replicate.md) | step-by-step: build the monster on your box |
| [docs/mlx-setup-guide.md](docs/mlx-setup-guide.md) | the MLX path (comparison / optional) |
| [orchestrator/README.md](orchestrator/README.md) | systemd unit wiring |
| [prototypes/](prototypes/) | next-rung experiments (ternary dequant, etc.) |
| [docs/archive/](docs/archive/) | legacy wiki + pre-monster docs |

## options

```
./install.sh --dry-run    preview without installing
./install.sh --yes-all    install everything
./install.sh --status     check what's running
./install.sh --skip-      skip any optional block
./install.sh --help       all options
```

## requirements

- Arch Linux (bare metal preferred; podman works for headless)
- AMD Ryzen AI hardware β€” Strix Halo (gfx1151) or Strix Point (gfx1150)
- passwordless sudo
- ~20 GiB free disk (build artifacts, kernels, models)

## credits

this project stands on the shoulders of the people who ship open source.

built on [llama.cpp](https://github.com/ggml-org/llama.cpp) (for eval tooling), [TheRock](https://github.com/ROCm/TheRock) (ROCm distribution), [httplib](https://github.com/yhirose/cpp-httplib), [nlohmann/json](https://github.com/nlohmann/json), [usearch](https://github.com/unum-cloud/usearch), [FTXUI](https://github.com/ArthurSonzogni/FTXUI), [whisper.cpp](https://github.com/ggerganov/whisper.cpp), [Kokoro TTS](https://github.com/remsky/Kokoro-FastAPI), [microsoft/bitnet-b1.58-2B-4T](https://huggingface.co/microsoft/bitnet-b1.58-2B-4T) (reference model).

special thanks to the `r/MidlifeCrisisAI` community for hard questions, especially the ones about PPL and KLD.

---

*"the 1-bit monster is already here. it just had to learn to count."* - **stamped by the architect**

MIT