https://github.com/lemonade-sdk/lemonade
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk
- Host: GitHub
- URL: https://github.com/lemonade-sdk/lemonade
- Owner: lemonade-sdk
- License: apache-2.0
- Created: 2025-05-15T19:17:39.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2026-02-05T09:51:38.000Z (about 1 month ago)
- Last Synced: 2026-02-05T13:50:55.903Z (about 1 month ago)
- Topics: ai, amd, genai, gpu, llama, llm, llm-inference, local-server, mcp, mcp-server, mistral, npu, onnxruntime, openai-api, qwen, radeon, rocm, ryzen, vulkan
- Language: C++
- Homepage: https://lemonade-server.ai/
- Size: 9.47 MB
- Stars: 2,107
- Watchers: 19
- Forks: 178
- Open Issues: 118
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
  - Notice: NOTICE.md
Awesome Lists containing this project
- awesome - lemonade-sdk/lemonade - Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk (C++)
- awesome-repositories - lemonade-sdk/lemonade - Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs. Join our discord: https://discord.gg/5xXzkMu8Zk (C++)
- awesome-local-llm - lemonade - a local LLM server with GPU and NPU Acceleration (Inference platforms)
- awesome-mcp - lemonade-sdk/lemonade - Lemonade SDK is a toolkit for serving, benchmarking, and deploying large language models locally with NPU and GPU acceleration, supporting multiple frameworks and providing APIs and CLI tools for integration and evaluation. (MCP Frameworks and libraries / Python)
README
## 🍋 Lemonade: Local LLMs with GPU and NPU acceleration
Download | Documentation | Discord
Lemonade helps users discover and run local AI apps by serving optimized LLMs right from their own GPUs and NPUs.
Apps like [n8n](https://n8n.io/integrations/lemonade-model/), [VS Code Copilot](https://marketplace.visualstudio.com/items?itemName=lemonade-sdk.lemonade-sdk), [Morphik](https://www.morphik.ai/docs/local-inference#lemonade), and many more use Lemonade to seamlessly run LLMs on any PC.
## Getting Started
1. **Install**: [Windows](https://lemonade-server.ai/install_options.html#windows) · [Linux](https://lemonade-server.ai/install_options.html#linux) · [Docker](https://lemonade-server.ai/install_options.html#docker) · [Source](https://lemonade-server.ai/install_options.html)
2. **Get Models**: Browse and download with the [Model Manager](#model-library)
3. **Chat**: Try models with the built-in chat interface
4. **Mobile**: Take your lemonade to go: [iOS](https://apps.apple.com/us/app/lemonade-mobile/id6757372210) · Android (soon) · [Source](https://github.com/lemonade-sdk/lemonade-mobile)
5. **Connect**: Use Lemonade with your favorite apps:
Want your app featured here? Discord · GitHub Issue · Email · View all apps →
## Using the CLI
To run and chat with Gemma 3:
```bash
lemonade-server run Gemma-3-4b-it-GGUF
```
To install models ahead of time, use the `pull` command:
```bash
lemonade-server pull Gemma-3-4b-it-GGUF
```
To check all models available, use the `list` command:
```bash
lemonade-server list
```
> **Tip**: Use `--llamacpp vulkan` or `--llamacpp rocm` to select the backend when running GGUF models.
## Model Library

Lemonade supports **GGUF**, **FLM**, and **ONNX** models across CPU, GPU, and NPU (see [supported configurations](#supported-configurations)).
Use `lemonade-server pull` or the built-in **Model Manager** to download models. You can also import custom GGUF/ONNX models from Hugging Face.
**[Browse all built-in models →](https://lemonade-server.ai/docs/server/server_models/)**
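Because the server speaks the OpenAI API, installed models can also be listed programmatically. A minimal stdlib-only sketch, assuming a local server on the default port and the standard OpenAI-style `GET /models` endpoint and response shape:

```python
import json
import urllib.request

# Base URL for a locally running Lemonade Server (default port 8000).
LEMONADE_BASE_URL = "http://localhost:8000/api/v1"

def list_model_ids(base_url: str = LEMONADE_BASE_URL) -> list[str]:
    """Fetch the OpenAI-style model list and return the model ids."""
    with urllib.request.urlopen(f"{base_url}/models") as response:
        payload = json.load(response)
    # OpenAI-compatible servers return {"object": "list", "data": [{"id": ...}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]

if __name__ == "__main__":
    # Requires a running Lemonade Server; otherwise urlopen raises URLError.
    for model_id in list_model_ids():
        print(model_id)
```

The same call works from any HTTP client; `lemonade-server list` remains the authoritative view of what is installed.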
## Image Generation
Lemonade supports image generation using Stable Diffusion models via [stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp).
```bash
# Pull an image generation model
lemonade-server pull SD-Turbo
# Start the server
lemonade-server serve
```
Available models: **SD-Turbo** (fast, 4-step), **SDXL-Turbo**, **SD-1.5**, **SDXL-Base-1.0**
> See `examples/api_image_generation.py` for complete examples.
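Once the server is running with an image model pulled, generation can be driven over HTTP. The sketch below is an assumption-laden illustration: the `/images/generations` path, the `response_format: "b64_json"` field, and the response shape are borrowed from the OpenAI images API, which Lemonade mirrors; `examples/api_image_generation.py` in the repo is the authoritative version.

```python
import base64
import json
import urllib.request

# Base URL for a locally running Lemonade Server (default port 8000).
LEMONADE_BASE_URL = "http://localhost:8000/api/v1"

def build_image_request(prompt: str, model: str = "SD-Turbo") -> dict:
    """Build an OpenAI-style image generation payload."""
    return {"model": model, "prompt": prompt, "n": 1, "response_format": "b64_json"}

def generate_image(prompt: str, model: str = "SD-Turbo") -> bytes:
    """POST the request and decode the first returned image (assumed base64 PNG)."""
    body = json.dumps(build_image_request(prompt, model)).encode()
    request = urllib.request.Request(
        f"{LEMONADE_BASE_URL}/images/generations",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return base64.b64decode(payload["data"][0]["b64_json"])

if __name__ == "__main__":
    # Requires `lemonade-server pull SD-Turbo` and a running server.
    image_bytes = generate_image("a glass of lemonade on a sunny porch")
    with open("lemonade.png", "wb") as f:
        f.write(image_bytes)
```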
## Supported Configurations
Lemonade supports the following configurations and makes it easy to switch between them at runtime.
| Hardware | Engine: OGA | Engine: llamacpp | Engine: FLM | Windows | Linux |
|----------|-------------|------------------|-------------|---------|-------|
| **🧠 CPU** | All platforms | All platforms | — | ✅ | ✅ |
| **🎮 GPU** | — | Vulkan: all platforms<br>ROCm: selected AMD platforms*<br>Metal: Apple Silicon | — | ✅ | ✅ |
| **🤖 NPU** | AMD Ryzen™ AI 300 series | — | AMD Ryzen™ AI 300 series | ✅ | — |
\* Supported AMD ROCm platforms:

| Architecture | Platform Support | GPU Models |
|--------------|------------------|------------|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |
## Project Roadmap
| Under Development | Under Consideration | Recently Completed |
|---------------------------------------------------|------------------------------------------------|------------------------------------------|
| macOS | vLLM support | Image generation (stable-diffusion.cpp) |
| Apps marketplace | Text to speech | General speech-to-text support (whisper.cpp) |
| lemonade-eval CLI | MLX support | ROCm support for Ryzen AI 360-375 (Strix) APUs |
| | ryzenai-server dedicated repo | Lemonade desktop app |
| | Enhanced custom model support | |
## Integrate Lemonade Server with Your Application
You can use any OpenAI-compatible client library by configuring it to use `http://localhost:8000/api/v1` as the base URL. The table below lists official and popular OpenAI clients for several languages; pick whichever fits your stack.
| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|--------|-----|------|----|---------|----|-------|------|-----|
| [openai-python](https://github.com/openai/openai-python) | [openai-cpp](https://github.com/olrea/openai-cpp) | [openai-java](https://github.com/openai/openai-java) | [openai-dotnet](https://github.com/openai/openai-dotnet) | [openai-node](https://github.com/openai/openai-node) | [go-openai](https://github.com/sashabaranov/go-openai) | [ruby-openai](https://github.com/alexrudall/ruby-openai) | [async-openai](https://github.com/64bit/async-openai) | [openai-php](https://github.com/openai-php/client) |
### Python Client Example
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

# Print the response
print(completion.choices[0].message.content)
```
For more detailed integration instructions, see the [Integration Guide](./docs/server/server_integration.md).
## FAQ
To read our frequently asked questions, see our [FAQ Guide](./docs/faq.md).
## Contributing
We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our [contribution guide](./docs/contribute.md).
New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.
## Maintainers
This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @Geramy @ramkrishna2910 @siavashhub @sofiageo @vgodsoe, and sponsored by AMD. You can reach us by filing an [issue](https://github.com/lemonade-sdk/lemonade/issues), emailing [lemonade@amd.com](mailto:lemonade@amd.com), or joining our [Discord](https://discord.gg/5xXzkMu8Zk).
## Code Signing Policy
Free code signing provided by [SignPath.io](https://signpath.io), certificate by [SignPath Foundation](https://signpath.org).
- **Committers and reviewers**: [Maintainers](#maintainers) of this repo
- **Approvers**: [Owners](https://github.com/orgs/lemonade-sdk/people?query=role%3Aowner)
**Privacy policy**: This program will not transfer any information to other networked systems unless specifically requested by the user or the person installing or operating it. When the user requests it, Lemonade downloads AI models from [Hugging Face Hub](https://huggingface.co/) (see their [privacy policy](https://huggingface.co/privacy)).
## License and Attribution
This project is:
- Built with C++ (server) and Python (SDK) with ❤️ for the open source community,
- Standing on the shoulders of great tools from:
- [ggml/llama.cpp](https://github.com/ggml-org/llama.cpp)
- [OnnxRuntime GenAI](https://github.com/microsoft/onnxruntime-genai)
- [Hugging Face Hub](https://github.com/huggingface/huggingface_hub)
- [OpenAI API](https://github.com/openai/openai-python)
- [IRON/MLIR-AIE](https://github.com/Xilinx/mlir-aie)
- and more...
- Accelerated by mentorship from the OCV Catalyst program.
- Licensed under the [Apache 2.0 License](https://github.com/lemonade-sdk/lemonade/blob/main/LICENSE).
- Portions of the project are licensed as described in [NOTICE.md](./NOTICE.md).