https://github.com/sozercan/aikit

🏗ïļ Fine-tune, build, and deploy open-source LLMs easily!
# AIKit ✨




AIKit is a comprehensive platform for quickly getting started with hosting, deploying, building, and fine-tuning large language models (LLMs).

AIKit offers two main capabilities:

- **Inference**: AIKit uses [LocalAI](https://localai.io/), which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as [Kubectl AI](https://github.com/sozercan/kubectl-ai), [Chatbot-UI](https://github.com/sozercan/chatbot-ui) and many more, to send requests to open LLMs!

- **[Fine-Tuning](https://sozercan.github.io/aikit/docs/fine-tune)**: AIKit offers an extensible fine-tuning interface. It supports [Unsloth](https://github.com/unslothai/unsloth) for a fast, memory-efficient, and easy fine-tuning experience.

👉 For full documentation, please see [AIKit website](https://sozercan.github.io/aikit/)!

## Features

- ðŸģ No GPU, Internet access or additional tools needed except for [Docker](https://docs.docker.com/desktop/install/linux-install/)!
- ðŸĪ Minimal image size, resulting in less vulnerabilities and smaller attack surface with a custom [distroless](https://github.com/GoogleContainerTools/distroless)-based image
- ðŸŽĩ [Fine-tune support](https://sozercan.github.io/aikit/docs/fine-tune)
- 🚀 Easy to use declarative configuration for [inference](https://sozercan.github.io/aikit/docs/specs-inference) and [fine-tuning](https://sozercan.github.io/aikit/docs/specs-finetune)
- âœĻ OpenAI API compatible to use with any OpenAI API compatible client
- ðŸ“ļ [Multi-modal model support](https://sozercan.github.io/aikit/docs/vision)
- 🖞ïļ [Image generation support](https://sozercan.github.io/aikit/docs/diffusion)
- ðŸĶ™ Support for GGUF ([`llama`](https://github.com/ggerganov/llama.cpp)), GPTQ or EXL2 ([`exllama2`](https://github.com/turboderp/exllamav2)), and GGML ([`llama-ggml`](https://github.com/ggerganov/llama.cpp)) and [Mamba](https://github.com/state-spaces/mamba) models
- ðŸšĒ [Kubernetes deployment ready](https://sozercan.github.io/aikit/docs/kubernetes)
- ðŸ“Ķ Supports multiple models with a single image
- ðŸ–Ĩïļ Supports [AMD64 and ARM64](https://sozercan.github.io/aikit/docs/create-images#multi-platform-support) CPUs and [GPU-accelerated inferencing with NVIDIA GPUs](https://sozercan.github.io/aikit/docs/gpu)
- 🔐 Ensure [supply chain security](https://sozercan.github.io/aikit/docs/security) with SBOMs, Provenance attestations, and signed images
- 🌈 Supports air-gapped environments with self-hosted, local, or any remote container registries to store model images for inference on the edge.

## Quick Start

You can get started with AIKit quickly on your local machine without a GPU!

```bash
docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b
```

After running this, navigate to [http://localhost:8080/chat](http://localhost:8080/chat) to access the WebUI!
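
The container may need some time before it starts serving after `docker run -d` returns. A minimal sketch of waiting for it before sending requests, assuming the server answers the OpenAI-style `GET /v1/models` request once it is up (the `wait_for` and `aikit_ready` helper names are hypothetical, not part of AIKit):

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Assumption: the server responds to GET /v1/models once it is ready.
# The base URL defaults to the port mapping used in the command above.
aikit_ready() {
  curl -fsS "${1:-http://localhost:8080}/v1/models"
}

# Example: poll every 2 seconds, up to 60 times.
# wait_for 60 2 aikit_ready && echo "AIKit is ready"
```

The retry loop is kept separate from the readiness check so the same helper can wrap any other command.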

### API

AIKit provides an OpenAI API compatible endpoint, so you can use any OpenAI API compatible client to send requests to open LLMs!

```bash
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama-3.1-8b-instruct",
"messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]
}'
```

Output should be similar to:

```jsonc
{
// ...
"model": "llama-3.1-8b-instruct",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of applications and services, allowing developers to focus on writing code rather than managing infrastructure."
}
}
],
// ...
}
```

That's it! 🎉 The API is OpenAI-compatible, so it works as a drop-in replacement with any OpenAI API compatible client.
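
To send other prompts without hand-editing the JSON body, the request can be wrapped in a small shell function. This is only a sketch: `json_escape`, `chat_payload`, `chat`, and the `AIKIT_URL` variable are hypothetical names, and the escaping covers only backslashes and double quotes.

```bash
# Escape backslashes and double quotes so a value is safe inside a JSON string.
# Newlines and other control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Build the request body for /v1/chat/completions with a single user message.
chat_payload() {
  local model=$1 prompt=$2
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' \
    "$(json_escape "$model")" "$(json_escape "$prompt")"
}

# Send the request. AIKIT_URL is a hypothetical override for the base URL.
chat() {
  curl -sS "${AIKIT_URL:-http://localhost:8080}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2")"
}

# Example:
# chat llama-3.1-8b-instruct "explain kubernetes in a sentence"
```

The model name passed to `chat` should match the "Model Name" column of the image that is running.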

## Pre-made Models

AIKit comes with pre-made models that you can use out-of-the-box!

If a model you need is not included, you can always [create your own images](https://sozercan.github.io/aikit/docs/create-images) and host them in a container registry of your choice!

### CPU

> [!NOTE]
> AIKit supports both AMD64 and ARM64 CPUs. You can run the same command on either architecture, and Docker will automatically pull the correct image for your CPU.
>
> Depending on your CPU capabilities, AIKit will automatically select the most optimized instruction set.

| Model | Optimization | Parameters | Command | Model Name | License |
| --------------- | ------------ | ---------- | ---------------------------------------------------------------- | ------------------------ | ---------------------------------------------------------------------------------- |
| 🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:1b` | `llama-3.2-1b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:3b` | `llama-3.2-3b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b` | `llama-3.1-8b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.3 | Instruct | 70B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.3:70b` | `llama-3.3-70b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | [Apache](https://choosealicense.com/licenses/apache-2.0/) |
| 🅿️ Phi 3.5 | Instruct | 3.8B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | [MIT](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE) |
| 🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/gemma2:2b` | `gemma-2-2b-instruct` | [Gemma](https://ai.google.dev/gemma/terms) |
| âŒĻïļ Codestral 0.1 | Code | 22B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | [MNLP](https://mistral.ai/licenses/MNPL-0.1.md) |
| QwQ | | 32B | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/qwq:32b` | `qwq-32b-preview` | [Apache 2.0](https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/LICENSE) |
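
To see which of the two supported architectures a host falls under, the output of `uname -m` can be mapped to Docker's platform names. A small sketch follows; the `docker_platform` helper is hypothetical, and the explicit `--platform` flag in the usage comment is optional because Docker selects the matching image on its own.

```bash
# Map a machine name (default: this host's `uname -m`) to a Docker platform string.
docker_platform() {
  local machine=${1:-$(uname -m)}
  case "$machine" in
    x86_64 | amd64) echo "linux/amd64" ;;
    aarch64 | arm64) echo "linux/arm64" ;;
    *)
      echo "unsupported architecture: $machine" >&2
      return 1
      ;;
  esac
}

# Example: pull the image for the current host explicitly.
# docker pull --platform "$(docker_platform)" ghcr.io/sozercan/llama3.2:1b
```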

### NVIDIA CUDA

> [!NOTE]
> To enable GPU acceleration, please see [GPU Acceleration](https://sozercan.github.io/aikit/docs/gpu).
>
> Please note that the only difference between the commands in the CPU and GPU sections is the `--gpus all` flag, which enables GPU acceleration.

| Model | Optimization | Parameters | Command | Model Name | License |
| --------------- | ------------- | ---------- | --------------------------------------------------------------------------- | ------------------------ | --------------------------------------------------------------------------------------------------------------------------- |
| 🦙 Llama 3.2 | Instruct | 1B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:1b` | `llama-3.2-1b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.2 | Instruct | 3B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:3b` | `llama-3.2-3b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.1 | Instruct | 8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.1:8b` | `llama-3.1-8b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.3 | Instruct | 70B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.3:70b` | `llama-3.3-70b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| Ⓜ️ Mixtral | Instruct | 8x7B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b` | `mixtral-8x7b-instruct` | [Apache](https://choosealicense.com/licenses/apache-2.0/) |
| 🅿️ Phi 3.5 | Instruct | 3.8B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | [MIT](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE) |
| 🔡 Gemma 2 | Instruct | 2B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/gemma2:2b` | `gemma-2-2b-instruct` | [Gemma](https://ai.google.dev/gemma/terms) |
| âŒĻïļ Codestral 0.1 | Code | 22B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b` | [MNLP](https://mistral.ai/licenses/MNPL-0.1.md) |
| QwQ | | 32B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/qwq:32b` | `qwq-32b-preview` | [Apache 2.0](https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/LICENSE) |
| 📸 Flux 1 Dev | Text to image | 12B | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/flux1:dev` | `flux-1-dev` | [FLUX.1 [dev] Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev) |
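
Because the CPU and GPU commands differ only by `--gpus all`, one helper can produce either form. The sketch below only prints the command, so it can be inspected before running; `aikit_run_cmd` and its `gpu` argument are hypothetical names chosen for illustration.

```bash
# Print the docker command for a pre-made image tag such as "llama3.1:8b".
# Pass "gpu" as the second argument to add the --gpus all flag.
aikit_run_cmd() {
  local image=$1 mode=${2:-cpu}
  local cmd=(docker run -d --rm)
  if [ "$mode" = "gpu" ]; then
    cmd+=(--gpus all)
  fi
  cmd+=(-p 8080:8080 "ghcr.io/sozercan/$image")
  # Join the array elements with single spaces.
  printf '%s\n' "${cmd[*]}"
}

# Example:
# aikit_run_cmd llama3.1:8b gpu
```

Printing instead of executing keeps the helper free of side effects; the output can be run directly once it looks right.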

### Apple Silicon (experimental)

> [!NOTE]
> To enable GPU acceleration on Apple Silicon, please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu). For more information, please see [GPU Acceleration](https://sozercan.github.io/aikit/docs/gpu).
>
> Apple Silicon is an _experimental_ runtime and it may change in the future. This runtime is specific to Apple Silicon only, and it will not work as expected on other architectures, including Intel Macs.
>
> Only `gguf` models are supported on Apple Silicon.

| Model | Optimization | Parameters | Command | Model Name | License |
| ----------- | ------------ | ---------- | --------------------------------------------------------------------------------------------- | ----------------------- | ---------------------------------------------------------------------------------- |
| 🦙 Llama 3.2 | Instruct | 1B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.2:1b` | `llama-3.2-1b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.2 | Instruct | 3B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.2:3b` | `llama-3.2-3b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🦙 Llama 3.1 | Instruct | 8B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.1:8b` | `llama-3.1-8b-instruct` | [Llama](https://ai.meta.com/llama/license/) |
| 🅿️ Phi 3.5 | Instruct | 3.8B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | [MIT](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE) |
| 🔡 Gemma 2 | Instruct | 2B | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/gemma2:2b` | `gemma-2-2b-instruct` | [Gemma](https://ai.google.dev/gemma/terms) |

## What's next?

👉 For more information, including how to fine-tune models or create your own images, please see the [AIKit website](https://sozercan.github.io/aikit/)!