{"id":16063047,"url":"https://github.com/NexaAI/nexa-sdk","last_synced_at":"2025-10-22T12:31:20.806Z","repository":{"id":254317731,"uuid":"843570824","full_name":"NexaAI/nexa-sdk","owner":"NexaAI","description":"Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), Audio Language Model, auto-speech-recognition (ASR), and text-to-speech (TTS) capabilities.","archived":false,"fork":false,"pushed_at":"2025-02-03T21:26:12.000Z","size":204290,"stargazers_count":4315,"open_issues_count":54,"forks_count":613,"subscribers_count":424,"default_branch":"main","last_synced_at":"2025-02-04T15:01:04.629Z","etag":null,"topics":["asr","audio","edge-computing","language-model","llm","on-device-ai","on-device-ml","sdk","sdk-python","stable-diffusion","transformers","tts","vlm","whisper"],"latest_commit_sha":null,"homepage":"https://docs.nexa.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NexaAI.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-16T20:13:07.000Z","updated_at":"2025-02-04T07:08:55.000Z","dependencies_parsed_at":"2024-08-26T19:35:59.933Z","dependency_job_id":"baccbc93-9ad7-45df-8c0c-ef75713646b8","html_url":"https://github.com/NexaAI/nexa-sdk","commit_stats":{"total_commits":500,"total_committers":30,"mean_commits":"16.666666666666668","dds":0.804,"last_synced_commit":"e98f3cdd243ecc82e515203381e7f65f2de80bf9"},"previous_names":["nexaai/nexa-sdk","nexaai/nexaai-sdk-cpp"],"tags_count":106,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexaAI%2Fnexa-sdk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexaAI%2Fnexa-sdk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexaAI%2Fnexa-sdk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexaAI%2Fnexa-sdk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NexaAI","download_url":"https://codeload.github.com/NexaAI/nexa-sdk/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237689169,"owners_count":19350904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["asr","audio","edge-computing","language-model","llm","on-device-ai","on-device-ml","sdk","sdk-python","stable-diffusion","transformers","tts","vlm","whisper"],"created_at":"2024-10-09T05:01:04.868Z","updated_at":"2025-10-22T12:31:20.801Z","avatar_url":"https://github.com/NexaAI.png","language":"Python","readme":"\u003cdiv align=\"center\"\u003e\n  \u003cp\u003e\n      \u003cimg width=\"100%\" src=\"assets/banner1.png\" alt=\"Nexa AI Banner\"\u003e\n      \u003cdiv align=\"center\"\u003e\n  \u003cp style=\"font-size: 1.3em; font-weight: 600; margin-bottom: 10px;\"\u003e🤝 Trusted by Partners\u003c/p\u003e\n  \u003cimg src=\"assets/qualcomm.png\" alt=\"Qualcomm\" height=\"40\" style=\"margin: 0 20px;\"\u003e\n  \u003cimg src=\"assets/nvidia.png\" alt=\"NVIDIA\" height=\"40\" style=\"margin: 0 20px;\"\u003e\n  \u003cimg src=\"assets/AMD.png\" alt=\"AMD\" height=\"42\" style=\"margin: 0 20px;\"\u003e\n  \u003cimg src=\"assets/Intel_logo.png\" alt=\"Intel\" height=\"45\" style=\"margin: 0 10px;\"\u003e\n\u003c/div\u003e\n  \u003c/p\u003e\n\n  \u003cp align=\"center\"\u003e\n    \u003ca href=\"https://docs.nexa.ai\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/docs-website-brightgreen?logo=readthedocs\" alt=\"Documentation\"\u003e\n    \u003c/a\u003e\n   \u003ca href=\"https://x.com/nexa_ai\"\u003e\u003cimg alt=\"X account\" src=\"https://img.shields.io/twitter/url/https/twitter.com/diffuserslib.svg?style=social\u0026label=Follow%20%40Nexa_AI\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://discord.com/invite/nexa-ai\"\u003e\n        \u003cimg src=\"https://img.shields.io/discord/1192186167391682711?color=5865F2\u0026logo=discord\u0026logoColor=white\u0026style=flat-square\" alt=\"Join us on Discord\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://join.slack.com/t/nexa-ai-community/shared_invite/zt-3837k9xpe-LEty0disTTUnTUQ4O3uuNw\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/slack-join%20chat-4A154B?logo=slack\u0026logoColor=white\" alt=\"Join us on Slack\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003c/div\u003e\n\n# NexaSDK - Run any AI model on any backend\n\nNexaSDK is an easy-to-use developer toolkit for running any AI model locally — across NPUs, GPUs, and CPUs — powered by our NexaML engine, built entirely from scratch for peak performance on every hardware stack. Unlike wrappers that depend on existing runtimes, NexaML is a unified inference engine built at the kernel level. It’s what lets NexaSDK achieve Day-0 support for new model architectures (LLMs, multimodal, audio, vision). NexaML supports 3 model formats: GGUF, MLX, and Nexa AI's own `.nexa` format.\n\n### ⚙️ Differentiation\n\n\u003cdiv align=\"center\"\u003e\n\n| Features | **NexaSDK** | **Ollama** | **llama.cpp** | **LM Studio** |\n|----------|--------------|-------------|----------------|----------------|\n| NPU support | ✅ NPU-first | ❌ | ❌ | ❌ |\n| Support any model in GGUF, MLX, NEXA format | ✅ Low-level Control | ❌ | ⚠️ | ❌ |\n| Full multimodality support | ✅ Image, Audio, Text | ⚠️ | ⚠️ | ⚠️ |\n| Cross-platform support | ✅ Desktop, Mobile, Automotive, IoT | ⚠️ | ⚠️ | ⚠️ |\n| One line of code to run | ✅ | ✅ | ⚠️ | ✅ |\n| OpenAI-compatible API + Function calling | ✅ | ✅ | ✅ | ✅ |\n\n\u003cp align=\"center\" style=\"margin-top:14px\"\u003e\n  \u003ci\u003e\n      \u003cb\u003eLegend:\u003c/b\u003e\n      \u003cspan title=\"Full support\"\u003e✅ Supported\u003c/span\u003e \u0026nbsp; | \u0026nbsp;\n      \u003cspan title=\"Partial or limited support\"\u003e⚠️ Partial or limited support \u003c/span\u003e \u0026nbsp; | \u0026nbsp;\n      \u003cspan title=\"Not Supported\"\u003e❌ No\u003c/span\u003e\n  \u003c/i\u003e\n\u003c/p\u003e\n\u003c/div\u003e\n\n\n## Recent Wins\n\n- 📣 Day-0 Support for **Qwen3-VL-4B and 8B** in GGUF, MLX, .nexa format for NPU/GPU/CPU. We are the only framework that supports the GGUF format. [Featured in Qwen's post about our partnership](https://x.com/Alibaba_Qwen/status/1978154384098754943).\n- 📣 Day-0 Support for **IBM Granite 4.0** on NPU/GPU/CPU. [NexaML engine were featured right next to vLLM, llama.cpp, and MLX in IBM's blog](https://x.com/IBM/status/1978154384098754943).\n- 📣 Day-0 Support for **Google EmbeddingGemma** on NPU. We are [featured in Google's social post](https://x.com/googleaidevs/status/1969188152049889511).\n- 📣 Supported **vision capability for Gemma3n**: First-ever [Gemma-3n](https://sdk.nexa.ai/model/Gemma3n-E4B) **multimodal** inference for GPU \u0026 CPU, in GGUF format.\n- 📣 AMD NPU Support for [SDXL](https://huggingface.co/NexaAI/sdxl-turbo-amd-npu) image generation\n- 📣 Intel NPU Support [DeepSeek-r1-distill-Qwen-1.5B](https://sdk.nexa.ai/model/DeepSeek-R1-Distill-Qwen-1.5B-Intel-NPU) and [Llama3.2-3B](https://sdk.nexa.ai/model/Llama3.2-3B-Intel-NPU)\n- 📣 Apple Neural Engine Support for real-time speech recognition with [Parakeet v3 model](https://sdk.nexa.ai/model/parakeet-v3-ane)\n  \n# Quick Start\n\n## Step 1: Download Nexa CLI with one click\n\n### macOS\n* [arm64 with Apple Neural Engine support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_macos_arm64.pkg)\n* [x86_64](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_macos_x86_64.pkg)\n\n### Windows\n* [arm64 with Qualcomm NPU support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_arm64.exe)\n* [x86_64 with Intel / AMD NPU support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_x86_64.exe)\n\n### Linux\n#### For x86_64:\n```bash\ncurl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_x86_64.sh -o install.sh \u0026\u0026 chmod +x install.sh \u0026\u0026 ./install.sh \u0026\u0026 rm install.sh\n```\n\n#### For arm64:\n```bash\ncurl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_arm64.sh -o install.sh \u0026\u0026 chmod +x install.sh \u0026\u0026 ./install.sh \u0026\u0026 rm install.sh\n```\n\n## Step 2: Run models with one line of code\n\nYou can run any compatible GGUF, MLX, or nexa model from 🤗 Hugging Face by using the `nexa infer \u003cfull repo name\u003e`.\n\n### GGUF models\n\n\u003e [!TIP]\n\u003e GGUF runs on macOS, Linux, and Windows on CPU/GPU. Note certain GGUF models are only supported by NexaSDK (e.g. Qwen3-VL-4B and 8B).\n\n📝 Run and chat with LLMs, e.g. Qwen3:\n\n```bash\nnexa infer ggml-org/Qwen3-1.7B-GGUF\n```\n\n🖼️ Run and chat with Multimodal models, e.g. Qwen3-VL-4B:\n\n```bash\nnexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF\n```\n\n### MLX models\n\u003e [!TIP]\n\u003e MLX is macOS-only (Apple Silicon). Many MLX models in the Hugging Face mlx-community organization have quality issues and may not run reliably.\n\u003e We recommend starting with models from our curated [NexaAI Collection](https://huggingface.co/NexaAI/collections) for best results. For example\n\n📝 Run and chat with LLMs, e.g. Qwen3:\n\n```bash\nnexa infer NexaAI/Qwen3-4B-4bit-MLX\n```\n\n🖼️ Run and chat with Multimodal models, e.g. Gemma3n:\n\n```bash\nnexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX\n```\n\n### Qualcomm NPU models\n\u003e [!TIP]\n\u003e You need to download the [arm64 with Qualcomm NPU support](https://public-storage.nexa4ai.com/nexa_sdk/downloads/nexa-cli_windows_arm64.exe) and make sure you have Snapdragon® X Elite chip on your laptop.\n\n#### Quick Start (Windows arm64, Snapdragon X Elite)\n\n1. **Login \u0026 Get Access Token (required for Pro Models)**  \n   - Create an account at [sdk.nexa.ai](https://sdk.nexa.ai)  \n   - Go to **Deployment → Create Token**  \n   - Run this once in your terminal (replace with your token):  \n     ```bash\n     nexa config set license '\u003cyour_token_here\u003e'\n     ```\n\n2. Run and chat with our multimodal model, OmniNeural-4B, or other models on NPU\n\n```bash\nnexa infer NexaAI/OmniNeural-4B\nnexa infer NexaAI/Granite-4-Micro-NPU\nnexa infer NexaAI/Qwen3-VL-4B-Instruct-NPU\n```\n\n## CLI Reference\n\n| Essential Command                          | What it does                                                        |\n|----------------------------------|----------------------------------------------------------------------|\n| `nexa -h`              | show all CLI commands                              |\n| `nexa pull \u003crepo\u003e`              | Interactive download \u0026 cache of a model                              |\n| `nexa infer \u003crepo\u003e`             | Local inference          |\n| `nexa list`                     | Show all cached models with sizes                                    |\n| `nexa remove \u003crepo\u003e` / `nexa clean` | Delete one / all cached models                                   |\n| `nexa serve --host 127.0.0.1:8080` | Launch OpenAI‑compatible REST server                            |\n| `nexa run \u003crepo\u003e`              | Chat with a model via an existing server                             |\n\n👉 To interact with multimodal models, you can drag photos or audio clips directly into the CLI — you can even drop multiple images at once!\n\nSee [CLI Reference](https://nexaai.mintlify.app/nexa-sdk-go/NexaCLI) for full commands.\n\n## Acknowledgements\n\nWe would like to thank the following projects:\n- [ggml](https://github.com/ggml-org/ggml)\n- [mlx-lm](https://github.com/ml-explore/mlx-lm)\n- [mlx-vlm](https://github.com/Blaizzy/mlx-vlm)\n- [mlx-audio](https://github.com/Blaizzy/mlx-audio)\n","funding_links":[],"categories":["\u003cimg src=\"./assets/cpu.svg\" width=\"16\" height=\"16\" style=\"vertical-align: middle;\"\u003e Backends","Music \u0026 Audio","Repos","其他_机器学习与深度学习","Go","Kotlin"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNexaAI%2Fnexa-sdk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FNexaAI%2Fnexa-sdk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FNexaAI%2Fnexa-sdk/lists"}