https://github.com/BingoWon/orchardgrid-apple
OpenAI-Compatible API Server for Apple Intelligence - Structured Output, Streaming, Multi-turn Conversations
https://github.com/BingoWon/orchardgrid-apple
api-server apple-intelligence foundation-models json-schema macos openai-api structured-output swift swiftui
Last synced: 2 months ago
JSON representation
OpenAI-Compatible API Server for Apple Intelligence - Structured Output, Streaming, Multi-turn Conversations
- Host: GitHub
- URL: https://github.com/BingoWon/orchardgrid-apple
- Owner: BingoWon
- License: mit
- Created: 2025-10-01T17:31:50.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-10-02T23:38:31.000Z (9 months ago)
- Last Synced: 2025-10-03T01:24:48.755Z (9 months ago)
- Topics: api-server, apple-intelligence, foundation-models, json-schema, macos, openai-api, structured-output, swift, swiftui
- Language: Swift
- Size: 43.9 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
Apple Intelligence — from every device, for every app.
A distributed compute pool of Apple devices, six on-device capabilities,
one OpenAI-compatible API. Menu-bar app, CLI, or HTTP — your choice.
App Store ·
Website ·
API Docs ·
Dashboard
---
Apple Intelligence runs **only on Apple's Neural Engine** — it cannot be shipped to traditional cloud GPUs. OrchardGrid turns that constraint into a feature: install the app on any Mac, iPhone, or iPad; your device becomes a node in a **programmable pool of on-device AI**, reachable over your LAN or through OrchardGrid's cloud relay. Every byte of inference happens on the devices you own.
## Screenshots
## ✨ Why OrchardGrid?
On-device AI projects usually pick one trade-off. OrchardGrid picks none of them.
| | Cloud LLM APIs | [Ollama](https://ollama.com) | [apfel](https://github.com/Arthur-Ficial/apfel) | **OrchardGrid** |
|---|:---:|:---:|:---:|:---:|
| On-device inference | ❌ | ✅ | ✅ | ✅ |
| Uses Apple's Neural Engine / foundation model | ❌ | ❌ | ✅ | ✅ |
| Capabilities beyond chat (image · vision · speech · sound · NLP) | varies | ❌ | ❌ | **✅ all six** |
| iOS and iPadOS | browser only | ❌ | ❌ | ✅ |
| Menu-bar app (not just a CLI) | — | — | — | ✅ |
| Pool many devices as one API | ❌ | ❌ | ❌ | ✅ |
| Reachable from anywhere (your phone, CI, a teammate) | ✅ | localhost only | localhost only | ✅ |
| OpenAI-compatible `/v1/*` | ✅ | ✅ | ✅ | ✅ |
| MCP tool calling | varies | ❌ | ✅ | ✅ |
| Free | ❌ | ✅ | ✅ | ✅ |
**The short version.** If you have one Mac and want a single-binary CLI for Apple Intelligence, `apfel` is excellent — use it. If you want the same model **also from your iPhone**, **also via Image / Vision / Speech / Sound / NLP**, **also sharable with your team as a unified API**, **also packaged as a menu-bar app your family members can install from the App Store** — that's OrchardGrid.
## 🚀 Quick Start
### 1 · Install
**Homebrew (macOS)** — one command, installs the app **and** the `og` CLI:
```bash
brew install --cask bingowon/orchardgrid/orchardgrid
```
The cask symlinks `/opt/homebrew/bin/og` into `OrchardGrid.app/Contents/Resources/og`. App and CLI share runtime state via the macOS App Group `group.com.orchardgrid.shared`.
**App Store (iOS · iPadOS · macOS)**:
**From source (macOS, development)** — clone, open `orchardgrid-app.xcodeproj`, build. Requires Xcode 26+.
### 2 · First prompt
```bash
og "What is the capital of Austria?"
```
That's it — the prompt runs **in-process** against `SystemLanguageModel.default`, with no local HTTP hop and no cloud round-trip.
### 3 · Share or consume — pick any combination
| You want to… | Do this |
|---|---|
| Use Apple Intelligence from the shell | `og "..."` (or `og --chat`) |
| Call it from another app via OpenAI SDK | Enable **Share Locally** → hit `http://.local:8888/v1/chat/completions` |
| Reach your Mac's AI from your iPhone / CI / a laptop | Enable **Share to Cloud** → `https://orchardgrid.com/v1/chat/completions` with your API key |
| Do image / vision / speech / sound / NLP | Hit the corresponding `/v1/*` endpoint (see [Capabilities](#-capabilities)) |
## ⌨️ The `og` CLI
`og` is a thin, opinionated shell wrapper around Apple Intelligence. By default it runs on-device in the same process. Pass `--host` to talk to a peer or to OrchardGrid cloud instead.
### Inference
```bash
og "prompt" # single-shot, streamed to stdout
og --chat # interactive REPL, Ctrl-C to quit
og --model-info # model availability, source, context size
og -s "You are a pirate." "explain TCP" # system prompt
og --system-file persona.txt "..." # system prompt from file
og --permissive "creative writing..." # relax safety guardrails
```
### Files and stdin
```bash
og -f README.md "summarise this" # attach a file
og -f old.swift -f new.swift "what changed?" # multiple files
git diff HEAD~1 | og "review this diff" # pipe stdin
cat notes.txt | og -f extra.md "merge and distil" # stdin + files mixed
```
### Output
```bash
og -o json "one word capital of France" | jq .content
og --quiet "one word capital of France" # no chrome, just the answer
og --no-color "..." # disable ANSI colour
```
### Generation options
```bash
og --temperature 0.2 "..." # deterministic-ish sampling
og --max-tokens 100 "..." # cap completion length
og --seed 42 "..." # reproducible runs
```
### Context strategy — five ways to trim long conversations
```bash
og --chat --context-strategy newest-first # default: keep the most recent turns
og --chat --context-strategy oldest-first # keep the earliest turns
og --chat --context-strategy sliding-window --context-max-turns 20
og --chat --context-strategy summarize # compress old turns via a side model call
og --chat --context-strategy strict # refuse to trim — throw on overflow
```
### Tool calling via MCP (Model Context Protocol)
Attach any stdio MCP server; `og` discovers its tools, registers them natively with `LanguageModelSession`, and Apple Intelligence decides when to call them.
```bash
og --mcp ./server.py "what is 41 + 1?" # on-device tool call
og --mcp ./a.py --mcp ./b.py --chat # multiple servers
og --mcp-timeout 30 --mcp ./slow.py "..." # per-call timeout
og mcp list ./server.py # introspect a server's tools
og mcp list ./server.py -o json # same, as JSON
```
MCP requires on-device inference — combining `--mcp` with `--host` is rejected at parse time.
### Benchmark
```bash
og benchmark # 5 runs against the local model
og benchmark --runs 20 --bench-prompt "Tell me a joke"
og benchmark --host http://mac.local:8888 # benchmark a peer
og benchmark -o json --quiet | jq .tokensPerSec # scriptable
```
Reports **min / median / p95 / max / mean** for time-to-first-token, total latency, tokens/sec, and output tokens. Respects `--temperature` and `--max-tokens`.
### Cloud account (after `og login`)
```bash
og login # OAuth loopback; opens browser, issues a management key
og logout # drop local creds
og logout --revoke # also revoke the key server-side
og me # account info
og keys # list API keys
og keys create --name "my-bot" # new inference-scope key, printed once
og keys delete # revoke an inference key
og devices # list your devices
og logs --role self --limit 10 # recent usage
og logs --role consumer --status failed --offset 20
```
### Remote endpoints
```bash
og --host https://orchardgrid.com --token sk-… "hi" # cloud
og --host http://mac.local:8888 "hi" # LAN peer
ORCHARDGRID_HOST=https://orchardgrid.com og "hi" # via env
```
### Diagnostics
```bash
og status # local server state, sharing toggles, login state
og --version # version
og --help # full flag reference
```
Deep-dive docs: [`docs/cli-reference.md`](docs/cli-reference.md) · [`docs/openai-api-compatibility.md`](docs/openai-api-compatibility.md) · [`docs/context-strategies.md`](docs/context-strategies.md) · [`demo/`](demo/) (capability-combining shell scripts).
### Environment variables
| Variable | Meaning |
|---|---|
| `ORCHARDGRID_HOST` | Default remote host (if unset, CLI runs on-device) |
| `ORCHARDGRID_TOKEN` | Default bearer token |
| `OG_NO_BROWSER` | Suppress `og login`'s auto browser launch (for SSH / CI) |
| `NO_COLOR` | Disable ANSI colour output |
### Exit codes
| Code | Meaning |
|:---:|---|
| `0` | Success |
| `1` | Runtime error (network, auth, unreachable) |
| `2` | Usage error (bad flag, conflicting options) |
| `3` | Guardrail blocked |
| `4` | Context overflow |
| `5` | Model unavailable (Apple Intelligence not enabled) |
| `6` | Rate limited |
## 🧠 Capabilities
Six on-device Apple frameworks, one consistent OpenAI-flavoured interface. Every capability is reachable through the local API (`:8888` on your LAN) and the cloud relay (`https://orchardgrid.com`).
| Capability | Framework | Endpoint | What it does |
|---|---|---|---|
| **Chat** | FoundationModels | `/v1/chat/completions` | LLM text generation, streaming, structured output, MCP tools |
| **Image** | ImagePlayground | `/v1/images/generations` | Text-to-image (illustration and sketch styles) |
| **NLP** | NaturalLanguage | `/v1/nlp/analyze` | Language detection, NER, tokenisation, embeddings |
| **Vision** | Vision | `/v1/vision/analyze` | OCR, classification, face and barcode detection |
| **Speech** | Speech | `/v1/audio/transcriptions` | Speech-to-text, 50+ languages |
| **Sound** | SoundAnalysis | `/v1/audio/classify` | Environmental sound classification, ~300 categories |
## 🌐 API access
### Local (same LAN)
Enable **Share Locally** in the app. The device listens on `:8888`:
```bash
curl http://.local:8888/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model":"apple-foundationmodel","messages":[{"role":"user","content":"hi"}]}'
```
```python
from openai import OpenAI
client = OpenAI(base_url="http://mac.local:8888/v1", api_key="unused")
client.chat.completions.create(
model="apple-foundationmodel",
messages=[{"role": "user", "content": "Hello"}],
)
```
### Cloud (reach your devices from anywhere)
Enable **Share to Cloud** in the app, sign in, create an API key at [orchardgrid.com/dashboard/api-keys](https://orchardgrid.com/dashboard/api-keys):
```bash
curl https://orchardgrid.com/v1/chat/completions \
-H "Authorization: Bearer sk-…" \
-H "Content-Type: application/json" \
-d '{"model":"apple-foundationmodel","messages":[{"role":"user","content":"hi"}]}'
```
The cloud **never sees your prompt or response** — it routes a task id to an online device with the right capability, forwards the streamed bytes through, and logs the usage counters. Zero content storage by design.
## 🏗 Architecture
```mermaid
flowchart LR
Client([Any OpenAI-compatible client])
subgraph Cloud["OrchardGrid Cloud · Cloudflare Workers"]
direction TB
Auth[Auth & API keys]
Pool["DevicePoolManager
Durable Object"]
DB[(D1 · SQLite)]
Auth --> Pool
Pool <--> DB
end
subgraph Device["Your Apple devices · macOS · iOS · iPadOS"]
direction TB
WS[WebSocket worker]
API[Local API server
:8888]
Cap["Chat · Image · NLP
Vision · Speech · Sound"]
WS --> Cap
API --> Cap
end
Client -->|HTTPS · SSE| Auth
Pool <-->|WebSocket| WS
Client -.->|Direct LAN| API
```
**Reverse inference.** Unlike traditional AI services where the server owns the GPU, OrchardGrid's cloud has **zero compute** — it's a task router. Your devices sit behind NATs and firewalls; the server pushes tasks out over WebSocket and pipes results back. External clients see a plain HTTP + SSE API. Internally, every token is generated on hardware you own.
## 🔒 Privacy
- **100% on-device inference.** Apple's Neural Engine runs the model. Nothing leaves your device except the answer.
- **Zero content storage in the cloud.** The relay forwards bytes and counts tokens — it does not persist prompts, completions, images, or audio.
- **No telemetry.** No analytics, no crash-reporter uploading your queries.
- **Audit the code.** The app, the CLI, the worker, and the database schema are all in this organisation's repos.
- **Uninstall wipes state.** `brew uninstall --cask --zap` removes the App Group container and `~/.config/orchardgrid`.
## 📊 Honest limits
| Constraint | Detail |
|---|---|
| Context window | 4096 tokens (Apple Intelligence hard limit) |
| Platform | Apple Silicon Macs (M1+) and Apple Intelligence-capable iPhone / iPad |
| Model | One model per modality — whatever Apple ships |
| Streaming | Chat and vision stream; image generation is one-shot |
| Guardrails | Apple's safety system may refuse benign prompts; `--permissive` helps creative tasks |
| Latency | On-device — single-digit seconds per response, no rate-limit but also no cloud GPU scale |
| MCP | Stdio transport only; remote HTTP MCP servers are on the roadmap |
## 🛠 Tech stack
| Layer | Technology |
|---|---|
| Language | Swift 6 · strict concurrency · `@MainActor` managers |
| UI | SwiftUI · macOS menu-bar + iOS navigation |
| Networking | Apple Network framework (`NWListener`) · URLSession · WebSocket |
| AI | FoundationModels · ImagePlayground · NaturalLanguage · Vision · Speech · SoundAnalysis |
| Cloud backend | Cloudflare Workers · Durable Objects · D1 (SQLite) · Hono |
| Auth | Clerk · Apple Sign-In · Bearer API keys (scoped: inference / management) |
| Distribution | Homebrew cask (app + CLI) · App Store (iOS / iPadOS / macOS) |
| CLI | Swift Package · 127 Swift Testing unit tests + 96 pytest integration tests |
| Quality gate | GitHub Actions · CLI + Xcode app run on every push and PR |
## 🤝 Contributing
Bug reports, feature ideas, pull requests — all welcome.
1. Fork and branch from `main`.
2. Commit in [Conventional Commits](https://www.conventionalcommits.org) style (`feat:` / `fix:` / `perf:` / `refactor:`) — these drive the automated release pipeline.
3. Run `make format` (Swift) and `make test` before opening the PR.
4. Describe the **why**, not just the what.
New to the codebase? Start with [CLAUDE.md](CLAUDE.md) — the operator guide — then grep the repo.
Built with Swift 6 and Apple Silicon. No cloud GPUs were harmed in the making of this AI.
Website ·
API Docs ·
App Store ·
Dashboard ·
CLI reference