https://github.com/hankthebldr/local-ai-platform

Self-hosted LLM infrastructure with OpenAI-compatible API. CPU-optimized. Source-available.
Host: GitHub
URL: https://github.com/hankthebldr/local-ai-platform
Owner: hankthebldr
Created: 2025-12-08T02:11:36.000Z (7 months ago)
Default Branch: master
Last Pushed: 2026-05-27T16:50:09.000Z (about 1 month ago)
Last Synced: 2026-05-27T17:13:32.657Z (about 1 month ago)
Topics: ai-inference, cpu-inference, enclave, llm, local-ai, macos, ollama, openai-compatible, privacy, self-hosted
Language: Python
Homepage: https://hankthebldr.github.io/local-ai-platform/
Size: 30.1 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README

          


  



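# Enclave

Self-hosted LLM infrastructure with OpenAI-compatible API. CPU-optimized. Source-available.

[Product page](https://hankthebldr.github.io/local-ai-platform/) · [Wiki](https://github.com/hankthebldr/local-ai-platform/wiki) · [Latest release](https://github.com/hankthebldr/local-ai-platform/releases/latest) · [Changelog](CHANGELOG.md)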



---

Enclave runs LLMs on your hardware. OpenAI-compatible API, Ollama backend, zero cloud dependencies.

> **What's new** — Architecture-aware orchestration (Phases 1–6): per-host detection of memory + deployment topology, four-tier `keep_alive` resolver with arch-detected defaults, scheduler facade with feasibility validation, and tick-based **parallel DAG dispatch** that uses the arch to decide what to run concurrently. Plus: installable Python wheel + sdist, mirrored Docker image on GHCR, Linux source tarball with SHA256/SHA512, n8n release-update workflow, and a curated Wiki seed. See the [CHANGELOG](CHANGELOG.md) for the full PR-by-PR detail.

## What it does

- **OpenAI-compatible API** — drop-in replacement. Point your existing code at `localhost:8000`

- **CPU-optimized inference** — GGUF quantized models via Ollama. 7B at 40-50 tok/s, 13B at 25-30 tok/s

- **Model management** — download, configure, and switch between 18+ models from the registry

- **Multi-agent workflows** — YAML-defined step pipelines with role-based model selection

- **Web dashboard** — monitor models, system health, and API status

- **macOS app** — native desktop wrapper with setup wizard

- **No telemetry by default** — no data leaves your machine unless you opt in; optional, operator-owned error reporting (your own sink, redaction mandatory — see [docs/deployment/error-reporting.md](docs/deployment/error-reporting.md)). No internet required for inference
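The "tick-based parallel DAG dispatch" mentioned in the What's new note can be pictured with a small, generic sketch. This is not Enclave's scheduler: the function name `run_dag`, the step names, and the `max_parallel` cap (standing in for whatever the architecture detection would decide) are all hypothetical. It only illustrates the idea that, on each tick, every step whose prerequisites are finished may run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_dag(steps, deps, max_parallel=2):
    """Run `steps` (name -> callable) while respecting `deps` (name -> prerequisite names).

    Returns the ticks that were executed; each tick is the list of step
    names dispatched together.
    """
    done, ticks = set(), []
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        while len(done) < len(steps):
            # A step is ready once all of its prerequisites have finished.
            ready = sorted(
                name for name in steps
                if name not in done and deps.get(name, set()) <= done
            )
            if not ready:
                raise ValueError("cycle or missing dependency in workflow")
            batch = ready[:max_parallel]  # hypothetical per-tick concurrency cap
            # list() blocks until every step in this tick has completed.
            list(pool.map(lambda name: steps[name](), batch))
            done.update(batch)
            ticks.append(batch)
    return ticks


# Hypothetical four-step pipeline: two independent steps feed a draft, then a review.
steps = {name: (lambda name=name: print("running", name))
         for name in ("research", "outline", "draft", "review")}
deps = {"draft": {"research", "outline"}, "review": {"draft"}}
print(run_dag(steps, deps))
```

With `max_parallel=2` the returned ticks are `[['outline', 'research'], ['draft'], ['review']]`: the two independent steps share a tick, and the dependent steps wait for theirs.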

## Quick start

Four paths; pick one:

### macOS app (DMG) — for end users

1. Download **Enclave.dmg** from the [latest release](https://github.com/hankthebldr/local-ai-platform/releases/latest)

   *(Or grab the rolling [nightly build](https://github.com/hankthebldr/local-ai-platform/releases/tag/nightly) for the freshest master.)*

2. Open the DMG and drag **Enclave.app** to `/Applications`.

3. First launch: macOS Gatekeeper will warn — the app is currently **not signed/notarized**. Bypass once with:

   ```bash
   xattr -dr com.apple.quarantine /Applications/Enclave.app
   ```

   Then double-click **Enclave** in Launchpad.

4. The native window opens the **first-run setup wizard** (`/setup`) which installs Ollama if needed and pulls a starter model. After that you land on the dashboard.

> **Requirements:** macOS 12.0 (Monterey) or later. ~6 GB free disk for the bundled runtime + a small starter model. Ollama is installed automatically by the wizard if missing.

### Docker — any platform with Docker Desktop

For non-developers on Linux / Windows, or anyone who wants Enclave fully isolated in containers. No Python, no virtualenv, no manual Ollama install.

1. Install [Docker Desktop](https://www.docker.com/products/docker-desktop/) (or Docker Engine on Linux) and make sure the whale icon is running.

2. Clone or download this repo, open a terminal in the project folder, and run:

   ```bash
   ./run.sh
   ```

3. The script verifies Docker, brings up the stack (`ollama` + `api`), pulls a small starter model on first run (`llama3.2:3b`, ~2 GB), and opens the dashboard in your browser.

| Service | URL | Notes |
|---|---|---|
| **Enclave SPA** (the application) | `http://localhost:8000` | |
| API docs | `http://localhost:8000/docs` | |
| Open WebUI (opt-in) | `http://localhost:8081` | Start with `docker compose -f docker-compose.yml -f docker-compose.webui.yml up -d` |

To stop: `./stop.sh` (data preserved) — or `./stop.sh --reset` to wipe models and chat history.

> **Requirements:** ~4 GB free RAM and ~3 GB free disk for the starter model. Pick a different starter with `ENCLAVE_DEFAULT_MODEL=qwen2.5:3b ./run.sh`.
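If you script around `./run.sh` (for example in CI), you may want to wait until the API answers before sending requests. The sketch below polls the API docs URL from the table above. The helper names `http_ok` and `wait_until_up`, the attempt count, and the delay are illustrative assumptions, not part of Enclave.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if `url` answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_up(url, attempts=30, delay=2.0, probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the number of attempts used, or None if the service never came up.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return None
```

Typical use would be `wait_until_up("http://localhost:8000/docs")` right after starting the stack. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.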

Prefer to pull the published image directly? (Substitute `<version>` with the latest tag.)

```bash
# Docker Hub (canonical)
docker pull hankthebldrr/local-ai-platfrom:<version>

# GHCR mirror (same digest, no Hub account required)
docker pull ghcr.io/hankthebldr/enclave:<version>
```

### pip install — embed in an existing Python app

For developers who want to use the Enclave engine inside another Python service. Bundles the FastAPI app, workflow engine, RAG pipeline, and CLI dispatcher.

```bash
# From a GitHub Release asset (no PyPI required); replace <version> with a release version
pip install https://github.com/hankthebldr/local-ai-platform/releases/download/v<version>/enclave-<version>-py3-none-any.whl

# Then run the API server with the same uvicorn settings the DMG uses:
enclave-api                 # starts FastAPI on 127.0.0.1:8000
enclave --help              # CLI dispatcher (chat, workflow, query, api)
```

You still need an Ollama runtime reachable at `OLLAMA_URL` (defaults to `http://localhost:11434`). The Python package does **not** install Ollama for you — see the [Wiki › Deployment](https://github.com/hankthebldr/local-ai-platform/wiki/Deployment) page for production setups.
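Before starting `enclave-api`, it can help to confirm that Ollama is actually reachable. The sketch below resolves `OLLAMA_URL` with the default stated above and lists installed models through Ollama's `GET /api/tags` endpoint. The function names are hypothetical, and treating an empty `OLLAMA_URL` as unset is an assumption; Enclave's own resolution logic may differ.

```python
import json
import os
import urllib.request

DEFAULT_OLLAMA_URL = "http://localhost:11434"  # default stated in this README


def resolve_ollama_url(env=os.environ):
    """Return OLLAMA_URL from `env`, falling back to the documented default."""
    # Assumption: an empty value is treated the same as an unset one.
    return (env.get("OLLAMA_URL") or DEFAULT_OLLAMA_URL).rstrip("/")


def installed_models(base_url, timeout=5.0):
    """List installed model names via Ollama's GET /api/tags endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]
```

Calling `installed_models(resolve_ollama_url())` raises an error if nothing is listening at the resolved URL, which is a quick way to catch a missing runtime.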

### From source — for developers

```bash
# Install (creates ./venv, installs core+dev deps, sets up systemd unit on Linux)
./setup/install.sh

# Boot Ollama + API + auto-open the dashboard in your browser
./scripts/start.sh

# Or, on macOS, exercise the same native pywebview window the DMG ships
./scripts/start_desktop.sh

# Verify everything boots and every UX route renders
./scripts/verify_local.sh
```

API at `http://localhost:8000` · Dashboard at `http://localhost:8000/` · Docs at `http://localhost:8000/docs` · First-run wizard at `http://localhost:8000/setup`.

## Models

```bash
# List available models
python models/download.py --list

# Download a model
python models/download.py dolphin-mixtral

# List installed
ollama list
```

Default quantization: Q4_K_M (best quality/speed balance). See [MODELS.md](MODELS.md) for the full registry.
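For a rough sense of what Q4_K_M means for memory, the weights alone can be estimated as parameters × bits per weight ÷ 8. The sketch below assumes an average of about 4.85 bits per weight for Q4_K_M. That figure is an approximation, not a value from this repo; actual GGUF file sizes vary by architecture, and the KV cache and runtime overhead come on top.

```python
BITS_PER_WEIGHT_Q4_K_M = 4.85  # assumed average; real GGUF files vary by architecture


def approx_weights_gb(params_billions, bits_per_weight=BITS_PER_WEIGHT_Q4_K_M):
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    # params_billions * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billions * bits_per_weight / 8


for size in (7, 13, 34, 70):
    print(f"{size}B -> ~{approx_weights_gb(size):.1f} GB")
```

Under that assumption a 7B model needs roughly 4.2 GB for weights and a 70B model roughly 42.4 GB, which is why the 70B-class work is assigned to the host with the most RAM.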

## API usage

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Compatible with any OpenAI SDK client.
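The same request can be made from Python without extra dependencies. This is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative, and the response parsing assumes the server returns the usual OpenAI chat completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(prompt, model="mistral", base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-style chat completions endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as resp:
        reply = json.load(resp)
    # Assumed response shape: the OpenAI chat completions schema.
    return reply["choices"][0]["message"]["content"]
```

With the server running, `chat("Hello")` mirrors the curl call above. If you already use an OpenAI SDK client, pointing its base URL at `http://localhost:8000/v1` should serve the same purpose.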

## Code-level artifacts

What ships in this repo, and where to find it:

| Surface | Path | Notes |
|---|---|---|
| FastAPI server (OpenAI-compatible) | [api/main.py](api/main.py) | 16 routers under `api/routers/`, services under `api/services/` |
| Web dashboard + setup wizard | [api/static/](api/static/) | Served at `/` and `/setup` by the FastAPI app |
| CLI chat / query / workflow | [cli/](cli/) | Rich-formatted; `python -m cli.chat`, `cli/workflow.py` |
| Multi-agent workflow engine | [api/services/workflow_engine.py](api/services/workflow_engine.py) | YAML pipelines under [workflows/](workflows/) |
| Custom agents (Gems) | [agents/](agents/) + [api/routers/agents.py](api/routers/agents.py) | YAML-defined personas with pinned context |
| Model registry | [models/download.py](models/download.py) | 18+ models; see [MODELS.md](MODELS.md) |
| macOS desktop wrapper | [desktop/app.py](desktop/app.py) | pywebview window around the FastAPI server |
| DMG builder | [scripts/build_mac.sh](scripts/build_mac.sh) | Bundles a self-contained `.app` + dmg |
| Local dev scripts | [scripts/](scripts/) | `start.sh`, `start_desktop.sh`, `verify_local.sh`, `status.sh`, `test.sh` |

### Build the DMG yourself

The same script CI uses on tag pushes:

```bash
brew install librsvg create-dmg     # one-time
./scripts/generate-icons.sh         # regenerate icns from SVG
./scripts/build_mac.sh              # produces dist/Enclave.app + dist/Enclave.dmg
open dist/Enclave.app               # smoke-test the bundle
```

The build script reads `ENCLAVE_VERSION` (or falls back to `git describe`) and stamps it into `Info.plist`. Override for a one-off custom build:

```bash
ENCLAVE_VERSION=v1.2.3-local ./scripts/build_mac.sh
```
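The version fallback described above can be sketched as follows. This is an illustration in Python, not the contents of `scripts/build_mac.sh`: the function names are hypothetical, and the exact `git describe` flags the build script passes are an assumption.

```python
import os
import subprocess


def git_describe():
    """Ask git for a description of HEAD (the flags used here are an assumption)."""
    result = subprocess.run(
        ["git", "describe", "--tags", "--always"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def resolve_version(env=os.environ, describe=git_describe):
    """Prefer an explicit ENCLAVE_VERSION; otherwise fall back to `git describe`."""
    explicit = env.get("ENCLAVE_VERSION")
    return explicit if explicit else describe()
```

An explicit `ENCLAVE_VERSION` always wins, so a one-off local build never depends on the state of the git checkout.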

### Release pipeline

| Trigger | Workflow | Artifact |
|---|---|---|
| Tag push `v*.*.*` | [release.yml](.github/workflows/release.yml) | Stable GitHub Release with signed DMG |
| Push to `master` | [release.yml](.github/workflows/release.yml) | Rolling `nightly` pre-release (replaced each merge) |
| PR / push to `master` | [ci.yml](.github/workflows/ci.yml) | pytest + lint + macOS `.app` smoke build (boots and probes UX routes) |
| Tag push or release publish | [pages.yml](.github/workflows/pages.yml) | Updates [hankthebldr.github.io/local-ai-platform](https://hankthebldr.github.io/local-ai-platform/) with the latest release version |

Every master merge re-publishes a freshly smoke-tested DMG to the [`nightly`](https://github.com/hankthebldr/local-ai-platform/releases/tag/nightly) release. Stable releases are cut by pushing a `vX.Y.Z` tag.

## Hardware targets

| Machine | RAM | Role | Throughput |
|---------|-----|------|------------|
| Mac M4 Pro | 48 GB | Development | 7B @ 50 tok/s |
| MS-01 (Ryzen 9 7945HX) | 64 GB | API serving | 34B @ 12 tok/s |
| BD790i (Ryzen 9 7945HX) | 96 GB | Research / 70B-class workflows | 70B @ 5 tok/s |

The BD790i is the only host in the fleet that can exercise the full 1.3.0 MCP & Skills co-scheduler against 70B-class models and multi-GB MCP RSS simultaneously. Bring-up and benchmark recipes: [docs/deployment/bd790i-testing.md](docs/deployment/bd790i-testing.md).
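To turn the throughput column into wall-clock expectations, divide the number of tokens you expect by the tok/s figure. The sketch below uses the table's numbers as-is and ignores prompt processing time, so treat the results as a lower bound. The function name is illustrative.

```python
# Throughput figures copied from the hardware table (tokens per second).
THROUGHPUT_TOK_S = {"7B": 50, "34B": 12, "70B": 5}


def seconds_to_generate(tokens, model_class):
    """Estimate generation time for `tokens` output tokens, ignoring prompt processing."""
    return tokens / THROUGHPUT_TOK_S[model_class]


for model_class in THROUGHPUT_TOK_S:
    print(f"{model_class}: ~{seconds_to_generate(500, model_class):.0f} s for 500 tokens")
```

A 500-token answer is about 10 s on the 7B development setup and about 100 s on the 70B host, which is worth keeping in mind when sizing workflow timeouts.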

## Documentation

The canonical operator-facing docs live on the [GitHub Wiki](https://github.com/hankthebldr/local-ai-platform/wiki) (sourced from [docs/wiki/](docs/wiki/) on every tag). Highlights:

- [Quickstart](https://github.com/hankthebldr/local-ai-platform/wiki/Quickstart) — first 60 seconds

- [Architecture](https://github.com/hankthebldr/local-ai-platform/wiki/Architecture) — request flow, services, workflow engine, arch-aware dispatch

- [Workflows](https://github.com/hankthebldr/local-ai-platform/wiki/Workflows) — authoring YAML pipelines + composite step kinds

- [Agents](https://github.com/hankthebldr/local-ai-platform/wiki/Agents) — Gems-style YAML personas

- [Models](https://github.com/hankthebldr/local-ai-platform/wiki/Models) — registry, quantization, throughput

- [Deployment](https://github.com/hankthebldr/local-ai-platform/wiki/Deployment) — DMG · Docker · pip · source · systemd

- [Configuration](https://github.com/hankthebldr/local-ai-platform/wiki/Configuration) — env vars, auth, CORS, perf knobs

- [Troubleshooting](https://github.com/hankthebldr/local-ai-platform/wiki/Troubleshooting) — common failure modes

- [Release notes](https://github.com/hankthebldr/local-ai-platform/wiki/Release-Notes)

Source-of-truth references inside the repo:

- [MODELS.md](MODELS.md) — model registry and selection

- [CLAUDE.md](CLAUDE.md) — developer guide

- [CHANGELOG.md](CHANGELOG.md) — every release, every PR

- [docs/](docs/) — design docs, plans, deployment guides

- Product page: [hankthebldr.github.io/local-ai-platform](https://hankthebldr.github.io/local-ai-platform/)