
## 🍋 Lemonade: Refreshingly fast local LLMs, Image and Speech Generation



Download | Documentation | Discord

Lemonade helps users discover and run local AI apps by serving optimized LLMs, images, and speech right from their own GPUs and NPUs.

Apps like [n8n](https://n8n.io/integrations/lemonade-model/), [VS Code Copilot](https://marketplace.visualstudio.com/items?itemName=lemonade-sdk.lemonade-sdk), [Morphik](https://www.morphik.ai/docs/local-inference#lemonade), and many more use Lemonade to seamlessly run generative AI on any PC.

## Getting Started

1. **Install**: [Windows](https://lemonade-server.ai/install_options.html#windows) · [Linux](https://lemonade-server.ai/install_options.html#linux) · [macOS (beta)](https://lemonade-server.ai/install_options.html#macos) · [Docker](https://lemonade-server.ai/install_options.html#docker) · [Source](./docs/dev-getting-started.md)
2. **Get Models**: Browse and download with the [Model Manager](#model-library)
3. **Generate**: Try models with the built-in interfaces for chat, image gen, speech gen, and more
4. **Mobile**: Take your lemonade to go: [iOS](https://apps.apple.com/us/app/lemonade-mobile/id6757372210) · [Android](https://play.google.com/store/apps/details?id=com.lemonade.mobile.chat.ai&pli=1) · [Source](https://github.com/lemonade-sdk/lemonade-mobile)
5. **Connect**: Use Lemonade with your favorite apps:


Continue · Deep Tutor · Dify · Gaia · GitHub Copilot · Infinity Arcade · Iterate.ai · n8n · Open WebUI · OpenHands

View all apps →

Want your app featured here? Just submit a marketplace PR!

## Using the CLI

To run and chat with Gemma 3:

```
lemonade run Gemma-3-4b-it-GGUF
```

More modalities:

```
# image gen
lemonade run SDXL-Turbo

# speech gen
lemonade run kokoro-v1

# transcription
lemonade run Whisper-Large-v3-Turbo
```

To list the available models and download one:

```
lemonade list

lemonade pull Gemma-3-4b-it-GGUF
```

To see the backends available on your PC:

```
lemonade recipes
```

## Model Library


Lemonade supports a wide variety of models across CPU, GPU, and NPU: LLMs (**GGUF**, **FLM**, and **ONNX**), as well as Whisper and Stable Diffusion models.

Use `lemonade pull` or the built-in **Model Manager** to download models. You can also import custom GGUF/ONNX models from Hugging Face.

**[Browse all built-in models →](https://lemonade-server.ai/models.html)**
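Because Lemonade Server exposes an OpenAI-compatible API, the downloaded models can also be enumerated programmatically. The sketch below is a minimal example, assuming a server running at the default `http://localhost:8000/api/v1` and the standard OpenAI-style `GET /models` response shape; it uses only the Python standard library.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # default Lemonade Server address (assumption)


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /models response payload."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_server_models(base_url: str = BASE_URL) -> list[str]:
    """Ask a running Lemonade Server which models it can serve."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return parse_model_ids(json.load(resp))


# With a server running, e.g.:
#   print(list_server_models())
```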


## Supported Configurations

Lemonade supports multiple recipes (LLM, speech, TTS, and image generation), and each recipe has its own backend and hardware requirements.



| Modality | Recipe | Backend | Device | OS |
|----------|--------|---------|--------|----|
| Text generation | llamacpp | vulkan | x86_64 CPU, AMD iGPU, AMD dGPU | Windows, Linux |
| Text generation | llamacpp | rocm | Supported AMD ROCm iGPU/dGPU families* | Windows, Linux |
| Text generation | llamacpp | cpu | x86_64 CPU | Windows, Linux |
| Text generation | llamacpp | metal | Apple Silicon GPU | macOS (beta) |
| Text generation | llamacpp | system | x86_64 CPU, GPU | Linux |
| Text generation | flm | npu | XDNA2 NPU | Windows, Linux |
| Text generation | ryzenai-llm | npu | XDNA2 NPU | Windows |
| Speech-to-text | whispercpp | npu | XDNA2 NPU | Windows |
| Speech-to-text | whispercpp | vulkan | x86_64 CPU | Linux |
| Speech-to-text | whispercpp | cpu | x86_64 CPU | Windows, Linux |
| Text-to-speech | kokoro | cpu | x86_64 CPU | Windows, Linux |
| Image generation | sd-cpp | rocm | Supported AMD ROCm iGPU/dGPU families* | Windows, Linux |
| Image generation | sd-cpp | cpu | x86_64 CPU | Windows, Linux |
To check exactly which recipes/backends are supported on your own machine, run:

```
lemonade recipes
```

\* Supported AMD ROCm platforms:

| Architecture | Platform Support | GPU Models |
|--------------|------------------|------------|
| gfx1151 (STX Halo) | Windows, Ubuntu | Ryzen AI MAX+ Pro 395 |
| gfx120X (RDNA4) | Windows, Ubuntu | Radeon AI PRO R9700, RX 9070 XT/GRE/9070, RX 9060 XT |
| gfx110X (RDNA3) | Windows, Ubuntu | Radeon PRO W7900/W7800/W7700/V710, RX 7900 XTX/XT/GRE, RX 7800 XT, RX 7700 XT |

## Project Roadmap

| Under Development | Under Consideration | Recently Completed |
|---------------------------|-----------------------------|------------------------|
| MLX support | vLLM support | macOS (beta) |
| More whisper.cpp backends | Enhanced custom model usage | Image generation |
| More SD.cpp backends | | Speech-to-text |
| | | Text-to-speech |
| | | Apps marketplace |

## Integrate Lemonade Server with Your Application

You can use any OpenAI-compatible client library by configuring it to use `http://localhost:8000/api/v1` as the base URL. The table below lists official and popular OpenAI clients in different languages; pick whichever fits your stack.

| Python | C++ | Java | C# | Node.js | Go | Ruby | Rust | PHP |
|--------|-----|------|----|---------|----|-------|------|-----|
| [openai-python](https://github.com/openai/openai-python) | [openai-cpp](https://github.com/olrea/openai-cpp) | [openai-java](https://github.com/openai/openai-java) | [openai-dotnet](https://github.com/openai/openai-dotnet) | [openai-node](https://github.com/openai/openai-node) | [go-openai](https://github.com/sashabaranov/go-openai) | [ruby-openai](https://github.com/alexrudall/ruby-openai) | [async-openai](https://github.com/64bit/async-openai) | [openai-php](https://github.com/openai-php/client) |

### Python Client Example
```python
from openai import OpenAI

# Initialize the client to use Lemonade Server
client = OpenAI(
    base_url="http://localhost:8000/api/v1",
    api_key="lemonade",  # required but unused
)

# Create a chat completion
completion = client.chat.completions.create(
    model="Llama-3.2-1B-Instruct-Hybrid",  # or any other available model
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
)

# Print the response
print(completion.choices[0].message.content)
```
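The chat endpoint also supports streaming via the standard OpenAI `stream=True` flag. The helper below is a small sketch (not part of the Lemonade API) that concatenates the text deltas from any OpenAI-style streaming response; the commented usage assumes the `client` from the example above.

```python
def collect_stream(chunks) -> str:
    """Concatenate the text deltas from an OpenAI-style streaming response."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta.content:  # the final chunk's delta may carry no content
            parts.append(delta.content)
    return "".join(parts)


# With a live Lemonade Server and the openai package installed:
#
#   stream = client.chat.completions.create(
#       model="Llama-3.2-1B-Instruct-Hybrid",
#       messages=[{"role": "user", "content": "Tell me a joke."}],
#       stream=True,
#   )
#   print(collect_stream(stream))
```

In practice you would print each delta as it arrives for responsiveness; collecting them is just the simplest shape to show.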

For more detailed integration instructions, see the [Integration Guide](./docs/server/server_integration.md).

## FAQ

To read our frequently asked questions, see our [FAQ Guide](./docs/faq.md).

## Contributing

We are actively seeking collaborators from across the industry. If you would like to contribute to this project, please check out our [contribution guide](./docs/contribute.md).

New contributors can find beginner-friendly issues tagged with "Good First Issue" to get started.



## Maintainers

This is a community project maintained by @amd-pworfolk @bitgamma @danielholanda @jeremyfowers @Geramy @ramkrishna2910 @siavashhub @sofiageo @superm1 @vgodsoe, and sponsored by AMD. You can reach us by filing an [issue](https://github.com/lemonade-sdk/lemonade/issues), emailing [lemonade@amd.com](mailto:lemonade@amd.com), or joining our [Discord](https://discord.gg/5xXzkMu8Zk).

## Code Signing Policy

Free code signing provided by [SignPath.io](https://signpath.io), certificate by [SignPath Foundation](https://signpath.org).

- **Committers and reviewers**: [Maintainers](#maintainers) of this repo
- **Approvers**: [Owners](https://github.com/orgs/lemonade-sdk/people?query=role%3Aowner)

**Privacy policy**: This program will not transfer any information to other networked systems unless specifically requested by the user or the person installing or operating it. When the user requests it, Lemonade downloads AI models from [Hugging Face Hub](https://huggingface.co/) (see their [privacy policy](https://huggingface.co/privacy)).

## License and Attribution

This project is:
- Built with C++ (server) and React (app) with ❤️ for the open source community,
- Standing on the shoulders of great tools from:
  - [ggml/llama.cpp](https://github.com/ggml-org/llama.cpp)
  - [ggml/whisper.cpp](https://github.com/ggerganov/whisper.cpp)
  - [ggml/stable-diffusion.cpp](https://github.com/leejet/stable-diffusion.cpp)
  - [kokoros](https://github.com/lucasjinreal/Kokoros)
  - [OnnxRuntime GenAI](https://github.com/microsoft/onnxruntime-genai)
  - [Hugging Face Hub](https://github.com/huggingface/huggingface_hub)
  - [OpenAI API](https://github.com/openai/openai-python)
  - [IRON/MLIR-AIE](https://github.com/Xilinx/mlir-aie)
  - and more...
- Accelerated by mentorship from the OCV Catalyst program.
- Licensed under the [Apache 2.0 License](https://github.com/lemonade-sdk/lemonade/blob/main/LICENSE).
- Portions of the project are licensed as described in [NOTICE.md](./NOTICE.md).