https://github.com/johnmai-dev/ane-lm

LLM inference on Apple Neural Engine (ANE)

Host: GitHub
URL: https://github.com/johnmai-dev/ane-lm
Owner: johnmai-dev
License: mit
Created: 2026-03-03T14:50:26.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-03-04T11:31:14.000Z (4 months ago)
Last Synced: 2026-03-09T03:57:37.890Z (3 months ago)
Language: C++
Homepage:
Size: 159 KB
Stars: 115
Watchers: 4
Forks: 8
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

# ANE-LM

LLM inference on Apple Neural Engine (ANE) using private `AppleNeuralEngine.framework` APIs.

## Supported Models

- Qwen3 (dense)
- Qwen3.5 (dense, text-only)

## Build

```bash
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```

## Usage

![image](assets/image.png)

Download a supported model (e.g. `Qwen3-0.6B` or `Qwen3.5-0.8B` in safetensors format), then:

```bash
# Single-shot generation
./build/ane-lm generate --model /path/to/Qwen3.5-0.8B --prompt "Hello"

# Interactive chat
./build/ane-lm chat --model /path/to/Qwen3.5-0.8B

# Pre-convert weights (BF16 -> FP16, speeds up subsequent loads)
./build/ane-lm convert --model /path/to/Qwen3.5-0.8B
```
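
The `convert` command is described as pre-converting weights from BF16 to FP16. The README does not show how that conversion is implemented, so the following is only a minimal C++ sketch of one standard per-element approach. The helper names `bf16_to_f32`, `f32_to_f16`, and `bf16_to_f16` are hypothetical and are not taken from the ane-lm source.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: BF16 is the upper 16 bits of an IEEE-754 float32,
// so widening is a 16-bit left shift of the raw bits.
inline float bf16_to_f32(std::uint16_t b) {
    std::uint32_t bits = static_cast<std::uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: narrow float32 to IEEE-754 binary16 (FP16),
// rounding to nearest, ties to even.
inline std::uint16_t f32_to_f16(float f) {
    std::uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const std::uint32_t sign = (x >> 16) & 0x8000u;
    const std::uint32_t exp = (x >> 23) & 0xFFu;
    std::uint32_t man = x & 0x7FFFFFu;

    if (exp == 0xFFu) {  // infinity or NaN
        return static_cast<std::uint16_t>(sign | 0x7C00u | (man ? 0x0200u : 0u));
    }
    const int e = static_cast<int>(exp) - 127 + 15;  // rebias 8-bit -> 5-bit exponent
    if (e >= 0x1F) {  // too large for FP16: overflow to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (e <= 0) {  // result is an FP16 subnormal or zero
        if (e < -10) return static_cast<std::uint16_t>(sign);  // rounds to signed zero
        man |= 0x800000u;          // restore the implicit leading 1
        const int shift = 14 - e;  // between 14 and 24
        std::uint32_t half = man >> shift;
        const std::uint32_t rem = man & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (half & 1u))) ++half;
        return static_cast<std::uint16_t>(sign | half);
    }
    std::uint32_t half = (static_cast<std::uint32_t>(e) << 10) | (man >> 13);
    const std::uint32_t rem = man & 0x1FFFu;
    // A carry out of the mantissa correctly bumps the exponent (up to infinity).
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) ++half;
    return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical helper: convert one BF16 weight to FP16 via float32.
inline std::uint16_t bf16_to_f16(std::uint16_t b) {
    return f32_to_f16(bf16_to_f32(b));
}
```

BF16 has 8 exponent bits and 7 mantissa bits, while FP16 has 5 exponent bits and 10 mantissa bits. Values inside FP16's normal range therefore convert exactly, whereas very large magnitudes overflow to infinity and very small ones round into subnormals or zero.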

### Options

```
--model              Path to model directory (required)
--prompt             Input prompt (generate mode, default: "Hello")
--max-tokens N       Max tokens to generate (default: unlimited)
--temp T             Temperature (default: 0.6)
--repeat-penalty P   Repetition penalty (default: 1.2, 1.0=off)
--enable-thinking    Enable thinking/reasoning mode
--no-ane-cache       Disable persistent ANE compile cache
-v, --verbose        Show detailed initialization info
```
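
To make the `--temp` and `--repeat-penalty` options more concrete, here is a minimal C++ sketch of how these two knobs are commonly applied when sampling the next token. It follows a widely used convention (divide positive logits of already-seen tokens by the penalty, multiply negative ones, then apply a temperature-scaled softmax). Whether ane-lm implements sampling exactly this way is an assumption; the function names and the greedy fallback for a non-positive temperature are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: penalize every token that already appears in `history`.
// With penalty == 1.0 the logits are unchanged, matching "1.0=off".
inline void apply_repeat_penalty(std::vector<float>& logits,
                                 const std::vector<int>& history,
                                 float penalty) {
    std::vector<int> seen(history);
    std::sort(seen.begin(), seen.end());
    seen.erase(std::unique(seen.begin(), seen.end()), seen.end());  // penalize each token once
    for (int tok : seen) {
        if (tok < 0 || static_cast<std::size_t>(tok) >= logits.size()) continue;
        float& l = logits[static_cast<std::size_t>(tok)];
        l = (l > 0.0f) ? l / penalty : l * penalty;
    }
}

// Hypothetical helper: softmax over logits / temp.
// Lower temperatures sharpen the distribution; higher ones flatten it.
inline std::vector<float> softmax_with_temperature(const std::vector<float>& logits,
                                                   float temp) {
    std::vector<float> probs(logits.size());
    if (logits.empty()) return probs;
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temp);  // subtract max for stability
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Hypothetical helper: pick the next token id from non-empty `logits`.
// Assumption: temp <= 0 means greedy (argmax) decoding.
inline int sample_token(std::vector<float> logits,
                        const std::vector<int>& history,
                        float temp, float penalty, std::mt19937& rng) {
    apply_repeat_penalty(logits, history, penalty);
    if (temp <= 0.0f) {
        return static_cast<int>(
            std::max_element(logits.begin(), logits.end()) - logits.begin());
    }
    const std::vector<float> probs = softmax_with_temperature(logits, temp);
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return dist(rng);
}
```

With the README defaults, a call would look like `sample_token(logits, history, 0.6f, 1.2f, rng)`, where `logits` holds one score per vocabulary entry and `history` holds the token ids generated so far.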

## Requirements

- macOS 13.0+
- Apple Silicon (M1/M2/M3/M4/M5)

## Acknowledgments

- [maderix/ANE](https://github.com/maderix/ANE) - Training neural networks on Apple Neural Engine via reverse-engineered private APIs
- [llama.cpp](https://github.com/ggml-org/llama.cpp) - LLM inference in C/C++
