https://github.com/NexaAI/nexa-sdk
Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS).
- Host: GitHub
- URL: https://github.com/NexaAI/nexa-sdk
- Owner: NexaAI
- License: apache-2.0
- Created: 2024-08-16T20:13:07.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-03T21:26:12.000Z (11 months ago)
- Last Synced: 2025-02-04T15:01:04.629Z (11 months ago)
- Topics: asr, audio, edge-computing, language-model, llm, on-device-ai, on-device-ml, sdk, sdk-python, stable-diffusion, transformers, tts, vlm, whisper
- Language: Python
- Homepage: https://docs.nexa.ai/
- Size: 195 MB
- Stars: 4,315
- Watchers: 424
- Forks: 613
- Open Issues: 54
Metadata Files:
- Readme: README.md
- License: LICENSE
- Codeowners: .github/CODEOWNERS
Awesome Lists containing this project
- StarryDivineSky - NexaAI/nexa-sdk
README
# NexaSDK - Run any AI model on any backend
NexaSDK is an easy-to-use developer toolkit for running any AI model locally — across NPUs, GPUs, and CPUs — powered by our NexaML engine, built entirely from scratch for peak performance on every hardware stack. Unlike wrappers that depend on existing runtimes, NexaML is a unified inference engine built at the kernel level. It’s what lets NexaSDK achieve Day-0 support for new model architectures (LLMs, multimodal, audio, vision). NexaML supports 3 model formats: GGUF, MLX, and Nexa AI's own `.nexa` format.
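As a quick sketch of what "one line of code" means in practice, the same `nexa infer` command covers all three formats. The model names below all appear later in this README; which ones run depends on your hardware and platform:
```bash
# GGUF build (CPU/GPU on macOS, Linux, Windows)
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF

# MLX build (Apple Silicon macOS only)
nexa infer NexaAI/Qwen3-4B-4bit-MLX

# NPU build in the .nexa format (see the Qualcomm NPU setup below)
nexa infer NexaAI/OmniNeural-4B
```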
### ⚙️ Differentiation
| Features | **NexaSDK** | **Ollama** | **llama.cpp** | **LM Studio** |
|----------|--------------|-------------|----------------|----------------|
| NPU support | ✅ NPU-first | ❌ | ❌ | ❌ |
| Support any model in GGUF, MLX, NEXA format | ✅ Low-level Control | ❌ | ⚠️ | ❌ |
| Full multimodality support | ✅ Image, Audio, Text | ⚠️ | ⚠️ | ⚠️ |
| Cross-platform support | ✅ Desktop, Mobile, Automotive, IoT | ⚠️ | ⚠️ | ⚠️ |
| One line of code to run | ✅ | ✅ | ⚠️ | ✅ |
| OpenAI-compatible API + Function calling | ✅ | ✅ | ✅ | ✅ |
Legend: ✅ Supported | ⚠️ Partial or limited support | ❌ Not supported
## Recent Wins
- 📣 Day-0 Support for **Qwen3-VL-4B and 8B** in GGUF, MLX, and .nexa formats for NPU/GPU/CPU. We are the only framework that supports these models in GGUF format. [Featured in Qwen's post about our partnership](https://x.com/Alibaba_Qwen/status/1978154384098754943).
- 📣 Day-0 Support for **IBM Granite 4.0** on NPU/GPU/CPU. [The NexaML engine was featured right next to vLLM, llama.cpp, and MLX in IBM's blog](https://x.com/IBM/status/1978154384098754943).
- 📣 Day-0 Support for **Google EmbeddingGemma** on NPU. We are [featured in Google's social post](https://x.com/googleaidevs/status/1969188152049889511).
- 📣 Supported **vision capability for Gemma3n**: First-ever [Gemma-3n](https://sdk.nexa.ai/model/Gemma3n-E4B) **multimodal** inference for GPU & CPU, in GGUF format.
- 📣 AMD NPU Support for [SDXL](https://huggingface.co/NexaAI/sdxl-turbo-amd-npu) image generation
- 📣 Intel NPU support for [DeepSeek-R1-Distill-Qwen-1.5B](https://sdk.nexa.ai/model/DeepSeek-R1-Distill-Qwen-1.5B-Intel-NPU) and [Llama3.2-3B](https://sdk.nexa.ai/model/Llama3.2-3B-Intel-NPU)
- 📣 Apple Neural Engine Support for real-time speech recognition with [Parakeet v3 model](https://sdk.nexa.ai/model/parakeet-v3-ane)
# Quick Start
## Step 1: Download Nexa CLI with one click
### macOS
* [arm64 with Apple Neural Engine support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_macos_arm64.pkg)
* [x86_64](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_macos_x86_64.pkg)
### Windows
* [arm64 with Qualcomm NPU support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_arm64.exe)
* [x86_64 with Intel / AMD NPU support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_x86_64.exe)
### Linux
#### For x86_64:
```bash
curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_x86_64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh
```
#### For arm64:
```bash
curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_arm64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh
```
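Whichever platform you installed on, a quick sanity check confirms the CLI is available. Both commands are documented in the CLI Reference below:
```bash
# Print all available CLI commands; confirms `nexa` is on your PATH
nexa -h

# List cached models (empty on a fresh install)
nexa list
```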
## Step 2: Run models with one line of code
You can run any compatible GGUF, MLX, or `.nexa` model from 🤗 Hugging Face by using `nexa infer <model>`, where `<model>` is the Hugging Face repo name.
### GGUF models
> [!TIP]
> GGUF runs on macOS, Linux, and Windows on CPU/GPU. Note that certain GGUF models (e.g. Qwen3-VL-4B and 8B) are only supported by NexaSDK.
📝 Run and chat with LLMs, e.g. Qwen3:
```bash
nexa infer ggml-org/Qwen3-1.7B-GGUF
```
🖼️ Run and chat with Multimodal models, e.g. Qwen3-VL-4B:
```bash
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF
```
### MLX models
> [!TIP]
> MLX is macOS-only (Apple Silicon). Many MLX models in the Hugging Face mlx-community organization have quality issues and may not run reliably.
> We recommend starting with models from our curated [NexaAI Collection](https://huggingface.co/NexaAI/collections) for best results. For example:
📝 Run and chat with LLMs, e.g. Qwen3:
```bash
nexa infer NexaAI/Qwen3-4B-4bit-MLX
```
🖼️ Run and chat with Multimodal models, e.g. Gemma3n:
```bash
nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX
```
### Qualcomm NPU models
> [!TIP]
> You need to download the [arm64 build with Qualcomm NPU support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_arm64.exe) and make sure your laptop has a Snapdragon® X Elite chip.
#### Quick Start (Windows arm64, Snapdragon X Elite)
1. **Login & Get Access Token (required for Pro Models)**
- Create an account at [sdk.nexa.ai](https://sdk.nexa.ai)
- Go to **Deployment → Create Token**
- Run this once in your terminal (replace with your token):
```bash
nexa config set license '<your_token_here>'
```
2. Run and chat with our multimodal model, OmniNeural-4B, or other NPU-ready models:
```bash
nexa infer NexaAI/OmniNeural-4B
nexa infer NexaAI/Granite-4-Micro-NPU
nexa infer NexaAI/Qwen3-VL-4B-Instruct-NPU
```
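Once a few models are cached, the housekeeping commands from the CLI Reference below keep disk usage under control. A minimal example, reusing a model name from this section:
```bash
# Show all locally cached models with their sizes
nexa list

# Delete a single cached model, or wipe the entire cache
nexa remove NexaAI/OmniNeural-4B
nexa clean
```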
## CLI Reference
| Essential Command | What it does |
|----------------------------------|----------------------------------------------------------------------|
| `nexa -h` | Show all CLI commands |
| `nexa pull <model>` | Interactive download & cache of a model |
| `nexa infer <model>` | Run local inference |
| `nexa list` | Show all cached models with sizes |
| `nexa remove <model>` / `nexa clean` | Delete one / all cached models |
| `nexa serve --host 127.0.0.1:8080` | Launch an OpenAI‑compatible REST server |
| `nexa run <model>` | Chat with a model via an existing server |
👉 To interact with multimodal models, you can drag photos or audio clips directly into the CLI — you can even drop multiple images at once!
See [CLI Reference](https://nexaai.mintlify.app/nexa-sdk-go/NexaCLI) for full commands.
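Since `nexa serve` exposes an OpenAI-compatible REST server, any OpenAI-style client should work against it. A minimal sketch with curl, assuming the standard `/v1/chat/completions` route and request schema (check the CLI Reference if your build differs):
```bash
# Terminal 1: launch the server (documented in the table above)
nexa serve --host 127.0.0.1:8080

# Terminal 2: send a chat completion request in the OpenAI schema
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ggml-org/Qwen3-1.7B-GGUF",
    "messages": [{"role": "user", "content": "Hello! What can you do?"}]
  }'
```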
## Acknowledgements
We would like to thank the following projects:
- [ggml](https://github.com/ggml-org/ggml)
- [mlx-lm](https://github.com/ml-explore/mlx-lm)
- [mlx-vlm](https://github.com/Blaizzy/mlx-vlm)
- [mlx-audio](https://github.com/Blaizzy/mlx-audio)