https://github.com/johnmai-dev/ane-lm
LLM inference on Apple Neural Engine (ANE)
- Host: GitHub
- URL: https://github.com/johnmai-dev/ane-lm
- Owner: johnmai-dev
- License: MIT
- Created: 2026-03-03T14:50:26.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-04T11:31:14.000Z (4 months ago)
- Last Synced: 2026-03-09T03:57:37.890Z (3 months ago)
- Language: C++
- Homepage:
- Size: 159 KB
- Stars: 115
- Watchers: 4
- Forks: 8
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# ANE-LM
LLM inference on Apple Neural Engine (ANE) using private `AppleNeuralEngine.framework` APIs.
## Supported Models
- Qwen3 (dense)
- Qwen3.5 (dense, text-only)
## Build
```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```
## Usage

Download a supported model (e.g. `Qwen3-0.6B` or `Qwen3.5-0.8B` in safetensors format), then:
```bash
# Single-shot generation
./build/ane-lm generate --model /path/to/Qwen3.5-0.8B --prompt "Hello"
# Interactive chat
./build/ane-lm chat --model /path/to/Qwen3.5-0.8B
# Pre-convert weights (BF16 -> FP16, speeds up subsequent loads)
./build/ane-lm convert --model /path/to/Qwen3.5-0.8B
```
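
The `convert` step above turns BF16 weights into FP16. As a rough illustration of what such a conversion involves, here is a minimal, portable bit-level sketch. It is not taken from the ane-lm source: the names `bf16_to_fp16` and `convert_bf16_to_fp16` are hypothetical, and mapping out-of-range values to infinity (rather than clamping) is an assumption. A real implementation on Apple Silicon may well use hardware half-precision types or vectorized routines instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from ane-lm): convert one BF16 bit pattern
// (1 sign, 8 exponent, 7 mantissa bits) to an IEEE 754 FP16 bit pattern
// (1 sign, 5 exponent, 10 mantissa bits).
std::uint16_t bf16_to_fp16(std::uint16_t bits) {
    const std::uint32_t sign = bits & 0x8000u;
    const int exp = (bits >> 7) & 0xFF;       // biased by 127
    const std::uint32_t man = bits & 0x7Fu;

    if (exp == 0xFF) {
        // Infinity stays infinity; NaN becomes a quiet NaN keeping its payload.
        const std::uint32_t nan_bits = man ? (0x0200u | (man << 3)) : 0u;
        return static_cast<std::uint16_t>(sign | 0x7C00u | nan_bits);
    }
    if (exp == 0) {
        // BF16 zero or subnormal: far below the smallest FP16 subnormal.
        return static_cast<std::uint16_t>(sign);
    }

    const int e = exp - 127 + 15;             // re-bias for FP16
    if (e >= 31) {
        // Too large for FP16. Assumption: overflow to infinity, no clamping.
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (e >= 1) {
        // Normal range: the 7-bit mantissa fits in 10 bits, so this is exact.
        return static_cast<std::uint16_t>(
            sign | (static_cast<std::uint32_t>(e) << 10) | (man << 3));
    }

    // FP16 subnormal range: shift the significand right, round to nearest even.
    const int shift = 1 - e;
    if (shift > 11) {
        return static_cast<std::uint16_t>(sign);  // rounds to signed zero
    }
    const std::uint32_t sig = (0x80u | man) << 3; // implicit 1 made explicit
    const std::uint32_t halfway = 1u << (shift - 1);
    const std::uint32_t rem = sig & ((1u << shift) - 1u);
    std::uint32_t m = sig >> shift;
    if (rem > halfway || (rem == halfway && (m & 1u))) {
        ++m;
    }
    return static_cast<std::uint16_t>(sign | m);
}

// Hypothetical helper: convert a whole tensor stored as raw BF16 words.
std::vector<std::uint16_t> convert_bf16_to_fp16(const std::vector<std::uint16_t>& src) {
    std::vector<std::uint16_t> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i) {
        dst[i] = bf16_to_fp16(src[i]);
    }
    return dst;
}
```

BF16 has a much wider exponent range than FP16, so the narrowing is lossy only at the extremes: very large magnitudes overflow and very small ones become subnormal or zero. Doing this once ahead of time and saving the result means the work does not have to be repeated on each load.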
### Options
```
--model              Path to model directory (required)
--prompt             Input prompt (generate mode, default: "Hello")
--max-tokens N       Max tokens to generate (default: unlimited)
--temp T             Temperature (default: 0.6)
--repeat-penalty P   Repetition penalty (default: 1.2, 1.0=off)
--enable-thinking    Enable thinking/reasoning mode
--no-ane-cache       Disable persistent ANE compile cache
-v, --verbose        Show detailed initialization info
```
```
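
To give a feel for what `--temp` and `--repeat-penalty` typically control, the sketch below shows one common way these two knobs are applied to a logit vector before sampling. It is an illustration under assumptions, not the ane-lm implementation: the function names are hypothetical, the penalty follows the widely used convention of dividing positive logits and multiplying negative ones, and treating a temperature of 0 as greedy decoding is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper: make tokens that already appear in `history` less
// likely. A penalty of 1.0 leaves the logits unchanged.
void apply_repeat_penalty(std::vector<float>& logits,
                          const std::vector<int>& history,
                          float penalty) {
    if (penalty == 1.0f) return;
    std::vector<bool> seen(logits.size(), false);
    for (int tok : history) {
        if (tok < 0 || static_cast<std::size_t>(tok) >= logits.size()) continue;
        if (seen[tok]) continue;              // penalize each token once
        seen[tok] = true;
        float& l = logits[tok];
        l = (l > 0.0f) ? l / penalty : l * penalty;
    }
}

// Softmax over logits / temp, shifted by the maximum for numerical stability.
std::vector<float> softmax_with_temperature(const std::vector<float>& logits,
                                            float temp) {
    assert(temp > 0.0f && !logits.empty());
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temp);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Hypothetical sampler: penalty first, then temperature, then a random draw.
int sample_token(std::vector<float> logits,
                 const std::vector<int>& history,
                 float temp, float penalty, std::mt19937& rng) {
    assert(!logits.empty());
    apply_repeat_penalty(logits, history, penalty);
    if (temp <= 0.0f) {
        // Assumption: non-positive temperature means greedy (argmax) decoding.
        return static_cast<int>(std::distance(
            logits.begin(), std::max_element(logits.begin(), logits.end())));
    }
    const std::vector<float> probs = softmax_with_temperature(logits, temp);
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}
```

With the defaults listed above, a call would look like `sample_token(logits, history, 0.6f, 1.2f, rng)`. Lower temperatures sharpen the distribution toward the highest-scoring token, while a penalty above 1.0 pushes down tokens that were already generated.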
## Requirements
- macOS 13.0+
- Apple Silicon (M1/M2/M3/M4/M5)
## Acknowledgments
- [maderix/ANE](https://github.com/maderix/ANE) - Training neural networks on Apple Neural Engine via reverse-engineered private APIs
- [llama.cpp](https://github.com/ggml-org/llama.cpp) - LLM inference in C/C++