{"id":47863913,"url":"https://github.com/christopherkarani/Espresso","last_synced_at":"2026-04-04T03:01:05.938Z","repository":{"id":343621758,"uuid":"1173023018","full_name":"christopherkarani/Espresso","owner":"christopherkarani","description":"Train and run transformers directly on Apple's Neural Engine — 4.76x faster than CoreML.","archived":false,"fork":false,"pushed_at":"2026-03-26T19:48:52.000Z","size":20010,"stargazers_count":81,"open_issues_count":1,"forks_count":4,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-03-26T23:35:57.671Z","etag":null,"topics":["ane","apple","apple-neural-engine","neural-engine","swift"],"latest_commit_sha":null,"homepage":"","language":"Swift","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/christopherkarani.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-03-04T23:45:51.000Z","updated_at":"2026-03-26T19:48:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/christopherkarani/Espresso","commit_stats":null,"previous_names":["christopherkarani/swift-ane","christopherkarani/espresso"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/christopherkarani/Espresso","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christopherkarani%2FEspresso","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christopherkarani%2FEspresso/tags","releases_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/christopherkarani%2FEspresso/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christopherkarani%2FEspresso/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/christopherkarani","download_url":"https://codeload.github.com/christopherkarani/Espresso/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/christopherkarani%2FEspresso/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31385935,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-04T01:22:39.193Z","status":"online","status_checked_at":"2026-04-04T02:00:07.569Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ane","apple","apple-neural-engine","neural-engine","swift"],"created_at":"2026-04-04T00:00:29.921Z","updated_at":"2026-04-04T03:01:05.926Z","avatar_url":"https://github.com/christopherkarani.png","language":"Swift","funding_links":[],"categories":["Libs","Data and Storage"],"sub_categories":["AI"],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\".github/assets/banner.svg\" alt=\"Espresso\" width=\"800\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003eDirect Neural Engine inference for transformers on Apple Silicon — 4.76x faster than CoreML.\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  
\u003ca href=\"https://github.com/christopherkarani/Espresso/actions/workflows/ci.yml\"\u003e\u003cimg src=\"https://github.com/christopherkarani/Espresso/actions/workflows/ci.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/christopherkarani/Espresso/actions/workflows/phase8-matrix.yml\"\u003e\u003cimg src=\"https://github.com/christopherkarani/Espresso/actions/workflows/phase8-matrix.yml/badge.svg\" alt=\"ANE Matrix\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://swift.org\"\u003e\u003cimg src=\"https://img.shields.io/badge/Swift-6.2-orange.svg\" alt=\"Swift 6.2\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/christopherkarani/Espresso/blob/main/LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg\" alt=\"License: MIT\"\u003e\u003c/a\u003e\n  \u003cimg src=\"https://img.shields.io/badge/macOS-15+-lightgrey.svg\" alt=\"macOS 15+\"\u003e\n  \u003cimg src=\"https://img.shields.io/badge/Dependencies-0-brightgreen.svg\" alt=\"Zero Dependencies\"\u003e\n  \u003ca href=\"https://github.com/christopherkarani/Espresso/releases\"\u003e\u003cimg src=\"https://img.shields.io/github/v/release/christopherkarani/Espresso?color=purple\" alt=\"Latest Release\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\nEspresso compiles MIL programs straight to ANE silicon through reverse-engineered private APIs (`_ANEClient`, `_ANEInMemoryModel`). No CoreML. No per-token recompilation. 
Just IOSurface buffers, fused multi-layer kernels, and two verified tokens per decode step.\n\n- **4.76x faster decode** — 1.08 ms/token vs CoreML's 5.09 ms/token on the same 6-layer model\n- **Fused 3-layer kernels** — 6 transformer layers in 2 ANE dispatches, not 6\n- **Zero-copy I/O** — NEON-vectorized reads, vDSP argmax, no marshaling\n- **Full training on ANE** — forward + backward passes with gradient accumulation and Adam\n- **Pure Swift 6.2** — `~Copyable` move-only tensors, strict concurrency, typed throws, zero dependencies\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\".github/assets/demo.gif\" alt=\"Espresso generating tokens on ANE\" width=\"700\"\u003e\n\u003c/p\u003e\n\n## Quick Start\n\n```bash\ngit clone https://github.com/christopherkarani/Espresso.git\ncd Espresso\n./espresso          # builds, downloads demo weights, launches TUI\n```\n\nFive lines to first ANE inference in your own project:\n\n```swift\n// Package.swift — add the dependency\n.package(url: \"https://github.com/christopherkarani/Espresso.git\", from: \"1.0.0\")\n\nimport ANERuntime\n\nlet kernel = try ANEKernel(milText: myMIL, weights: blobs, inputSizes: [input], outputSizes: [output])\ntry kernel.eval()                          // runs on Neural Engine\nlet result = kernel.outputSurface(at: 0)  // zero-copy read\n```\n\nOther entry points:\n\n```bash\n./espresso \"Hello\"                          # generate text\n./espresso doctor                           # check host readiness\n./espresso compare --no-power \"Hello\"       # side-by-side vs CoreML\n./espresso install                          # install to ~/.local/bin\nswift run espresso-bench --ane-only --inference --layers 6\nswift run espc pack-native /path/to/model /tmp/model.esp --overwrite\nswift run esprun inspect /tmp/model.esp\nswift run esprun generate /tmp/model.esp \"Hello\" 32\n```\n\n## ESP Model Platform\n\nEspresso now ships a private-only model platform around portable `.esp` bundles and bundle-aware 
runtime selection.\n\n- `.esp` is the canonical portable model bundle\n- `.espc` is the derived compiled-cache layer\n- `espc` packs native model directories into `.esp`\n- `esprun` inspects, resolves, and runs bundle artifacts\n- `espresso-generate --bundle \u003cpath\u003e` runs the same bundle boundary used by the runtime\n\nCurrent public docs for this layer:\n\n- [Convert / Optimize / Native-Fast strategy](docs/platform/2026-03-26-convert-optimize-native-fast-plan.md)\n- [Stories Convert -\u003e Optimize execution plan](docs/platform/2026-03-26-stories-convert-optimize-execution-plan.md)\n- [Stories agent prompt](docs/platform/2026-03-26-stories-convert-optimize-agent-prompt.md)\n\n## Benchmark\n\n### Espresso vs CoreML vs llama.cpp\n\n| Backend | ms/token | tok/s | Notes |\n|---------|----------|-------|-------|\n| **Espresso ANE** (exact two-step) | **1.08** | **926** | Direct ANE, 2 dispatches / 6 layers |\n| CoreML `.cpuAndNeuralEngine` | 5.09 | 196 | Apple's standard ANE path |\n| llama.cpp Metal | ~12–20 | ~50–85 | GPU path, CPU-bound decode¹ |\n| llama.cpp CPU (`ggml`) | ~25–40 | ~25–40 | Pure CPU, no ANE¹ |\n| **Espresso speedup vs CoreML** | | **4.76x** | |\n| **Espresso speedup vs llama.cpp Metal** | | **~11x** | |\n\n\u003e ¹ llama.cpp has no ANE backend. 
Metal figures are representative for GPT-2 117M on M3 Max; actual performance varies by quantization and prompt length.\n\u003e All Espresso / CoreML numbers: 6-layer local artifact · dim=768 · 12 heads · 32k vocab · seqLen=256 · M3 Max · macOS 15.\n\n\u003cdetails\u003e\n\u003csummary\u003eReproduce Espresso benchmarks\u003c/summary\u003e\n\n```bash\nRESULTS_DIR=results/$(date +%Y%m%d-%H%M%S) \\\nREPEATS=5 WARMUP=3 ITERATIONS=20 \\\n./scripts/reproduce_local_real_artifact_claim.sh\n```\n\nMachine-readable output lands in `artifacts/benchmarks/` and is kept out of git.\n\n\u003c/details\u003e\n\n### Platform Compatibility\n\n| SoC | Neural Engine | Tested | Notes |\n|-----|---------------|--------|-------|\n| M1 / M1 Pro / M1 Max / M1 Ultra | 16-core ANE (32-core on Ultra) | ✅ | Full feature set |\n| M2 / M2 Pro / M2 Max / M2 Ultra | 16-core ANE (32-core on Ultra) | ✅ | Full feature set |\n| M3 / M3 Pro / M3 Max | 16-core ANE | ✅ | Reference hardware (M3 Max) |\n| M4 / M4 Pro / M4 Max | 16-core ANE | ✅ | Faster compile cache warm-up |\n| Intel Mac | — | ❌ | No Neural Engine |\n| Apple A-series (iOS) | ✅ | ⚠️ | Requires entitlement; not App Store safe |\n\nmacOS 15+ required. 
iOS / tvOS not supported out of the box (private API entitlements differ per platform).\n\n## How It Works\n\n```\n                    ┌─────────────────────┐\n                    │   MIL Program Text   │  Generated per-kernel\n                    └──────────┬──────────┘\n                               ▼\n                    ┌─────────────────────┐\n                    │  _ANEClient compile  │  Private API (dlopen)\n                    └──────────┬──────────┘\n                               ▼\n                    ┌─────────────────────┐\n                    │    ANE E5 Binary     │  Cached by system\n                    └──────────┬──────────┘\n                               ▼\n              ┌────────────────┼────────────────┐\n              ▼                ▼                ▼\n     ┌──────────────┐ ┌──────────────┐ ┌──────────────┐\n     │  IOSurface   │ │  IOSurface   │ │  IOSurface   │\n     │   (input)    │ │  (weights)   │ │  (output)    │\n     └──────┬───────┘ └──────────────┘ └──────┬───────┘\n            │          ANE Hardware            │\n            └──────────────eval───────────────┘\n```\n\nThe decode loop compiles once and reuses the program across all steps. KV cache lives in IOSurface buffers — not marshaled through CoreML. Each step produces two exact tokens with verified parity. 
Fused triplet kernels process 3 layers per dispatch, reducing 6 layers to 2 eval calls.\n\n## Architecture\n\n```\nANEInterop (ObjC/C — private API bridge)\n  └── ANETypes (~Copyable value types, IOSurface I/O)\n          ├── MILGenerator (28+ kernel variants)\n          │       └── ANERuntime (compile, eval, surface management)\n          │               └── Espresso (training, generation, decode)\n          │                       ├── EspressoTrain (CLI)\n          │                       └── EspressoBench (CLI)\n          └── CPUOps (Accelerate/vDSP kernels)\n                  └── Espresso\n```\n\n| Module | What it does |\n|--------|-------------|\n| **ANEInterop** | `dlopen` bridge to `_ANEClient` and `_ANEInMemoryModel`. NEON-vectorized I/O. |\n| **ANETypes** | `~Copyable` tensors, `SurfaceIO`, weight serialization, model config. |\n| **MILGenerator** | Generates MIL text for forward, backward, decode, and fused kernels. |\n| **CPUOps** | RMSNorm, RoPE, embedding, softmax, Adam via Accelerate/vDSP. |\n| **ANERuntime** | Compiles MIL to ANE E5 binaries. Manages IOSurface buffers and compile budget. |\n| **Espresso** | Transformer layers, generation harnesses, exact two-token decode, training loop. |\n\n## SPM Integration\n\n```swift\n// Package.swift\ndependencies: [\n    .package(url: \"https://github.com/christopherkarani/Espresso.git\", from: \"1.0.0\")\n],\ntargets: [\n    .target(name: \"MyApp\", dependencies: [\n        .product(name: \"ANERuntime\", package: \"Espresso\"),\n        .product(name: \"ANETypes\",   package: \"Espresso\"),\n    ])\n]\n```\n\n```swift\nimport ANERuntime\nimport ANETypes\n\n// 1. Define your kernel shape\nlet gen = MyMILGenerator(config: .init(dim: 768, heads: 12))\n\n// 2. Compile once to ANE E5 binary\nlet kernel = try ANEKernel(\n    milText: gen.milText,\n    weights: gen.weightBlobs,\n    inputSizes: [gen.inputSize],\n    outputSizes: [gen.outputSize]\n)\n\n// 3. 
Run inference — stays on ANE the whole time\ntry kernel.eval()\n\n// 4. Read results via zero-copy IOSurface\nlet output = kernel.outputSurface(at: 0)\n```\n\n## Requirements\n\n| | Minimum |\n|---|---|\n| Hardware | Apple Silicon (M1+) with Neural Engine |\n| macOS | 15.0+ |\n| Swift | 6.0+ |\n| Dependencies | None — only Apple system frameworks |\n\n## Testing\n\n```bash\nswift test                                                    # unit tests (no ANE needed)\nANE_HARDWARE_TESTS=1 swift test --filter \"ANERuntimeTests|EspressoTests\"  # hardware tests\nOBJC_CROSS_VALIDATION=1 ANE_HARDWARE_TESTS=1 swift test --filter CrossValidationTests  # parity\n```\n\n7 test suites cover MIL generation, tensor ops, CPU kernels, ANE compilation, hardware eval, cross-validation, and end-to-end generation.\n\n## Disclaimer\n\n\u003e **App Store**: Apps using private ANE APIs (`_ANEClient`, `_ANEInMemoryModel`) will be rejected.\n\u003e\n\u003e **Everywhere else**: Internal tools, research, sideloaded apps, enterprise distribution — all fine.\n\nThis project uses undocumented private Apple APIs discovered through runtime introspection. Results are hardware- and OS-dependent. Benchmarks run on a local artifact family built by this repo, not a pretrained production model. Not affiliated with or endorsed by Apple Inc.\n\n## Contributing\n\nContributions welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\nFile bugs and feature requests via [GitHub Issues](https://github.com/christopherkarani/Espresso/issues).\n\n## License\n\nMIT — see [LICENSE](LICENSE).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchristopherkarani%2FEspresso","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchristopherkarani%2FEspresso","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchristopherkarani%2FEspresso/lists"}