https://github.com/sozercan/aikit

🏗️ Fine-tune, build, and deploy open-source LLMs easily!
https://github.com/sozercan/aikit

ai buildkit chatgpt docker fine-tuning finetuning gemma gpt inference kubernetes large-language-models llama llm localllama mistral mixtral nvidia open-llm open-source-llm openai

Last synced: 28 days ago
JSON representation

🏗️ Fine-tune, build, and deploy open-source LLMs easily!

Host: GitHub
URL: https://github.com/sozercan/aikit
Owner: sozercan
License: mit
Created: 2023-09-20T02:31:11.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-10-28T12:20:22.000Z (6 months ago)
Last Synced: 2024-10-30T01:51:45.814Z (6 months ago)
Topics: ai, buildkit, chatgpt, docker, fine-tuning, finetuning, gemma, gpt, inference, kubernetes, large-language-models, llama, llm, localllama, mistral, mixtral, nvidia, open-llm, open-source-llm, openai
Language: Go
Homepage: https://sozercan.github.io/aikit/
Size: 4.79 MB
Stars: 380
Watchers: 6
Forks: 28
Open Issues: 20
Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: SECURITY.md

Awesome Lists containing this project

alan_awesome_llm - aikit - tune, build, and deploy open-source LLMs easily! (微调 Fine-Tuning)
awesome-ChatGPT-repositories - aikit - 🏗️ Fine-tune, build, and deploy open-source LLMs easily! (Langchain)
StarryDivineSky - sozercan/aikit
awesome-LLM-resourses - aikit - tune, build, and deploy open-source LLMs easily! (微调 Fine-Tuning)
alan_awesome_llm - aikit - tune, build, and deploy open-source LLMs easily! (微调 Fine-Tuning)
awesome-repositories - sozercan/aikit - 🏗️ Fine-tune, build, and deploy open-source LLMs easily! (Go)

README

        # AIKit ✨








AIKit is a comprehensive platform to quickly get started to host, deploy, build and fine-tune large language models (LLMs).

AIKit offers two main capabilities:

- **Inference**: AIKit uses [LocalAI](https://localai.io/), which supports a wide range of inference capabilities and formats. LocalAI provides a drop-in replacement REST API that is OpenAI API compatible, so you can use any OpenAI API compatible client, such as [Kubectl AI](https://github.com/sozercan/kubectl-ai), [Chatbot-UI](https://github.com/sozercan/chatbot-ui) and many more, to send requests to open LLMs!

- **[Fine-Tuning](https://sozercan.github.io/aikit/docs/fine-tune)**: AIKit offers an extensible fine-tuning interface. It supports [Unsloth](https://github.com/unslothai/unsloth) for fast, memory efficient, and easy fine-tuning experience.

👉 For full documentation, please see [AIKit website](https://sozercan.github.io/aikit/)!

## Features

- 🐳 No GPU, Internet access or additional tools needed except for [Docker](https://docs.docker.com/desktop/install/linux-install/)!

- 🤏 Minimal image size, resulting in less vulnerabilities and smaller attack surface with a custom [distroless](https://github.com/GoogleContainerTools/distroless)-based image

- 🎵 [Fine-tune support](https://sozercan.github.io/aikit/docs/fine-tune)

- 🚀 Easy to use declarative configuration for [inference](https://sozercan.github.io/aikit/docs/specs-inference) and [fine-tuning](https://sozercan.github.io/aikit/docs/specs-finetune)

- ✨ OpenAI API compatible to use with any OpenAI API compatible client

- 📸 [Multi-modal model support](https://sozercan.github.io/aikit/docs/vision)

- 🖼️ [Image generation support](https://sozercan.github.io/aikit/docs/diffusion)

- 🦙 Support for GGUF ([`llama`](https://github.com/ggerganov/llama.cpp)), GPTQ or EXL2 ([`exllama2`](https://github.com/turboderp/exllamav2)), and GGML ([`llama-ggml`](https://github.com/ggerganov/llama.cpp)) and [Mamba](https://github.com/state-spaces/mamba) models

- 🚢 [Kubernetes deployment ready](https://sozercan.github.io/aikit/docs/kubernetes)

- 📦 Supports multiple models with a single image

- 🖥️ Supports [AMD64 and ARM64](https://sozercan.github.io/aikit/docs/create-images#multi-platform-support) CPUs and [GPU-accelerated inferencing with NVIDIA GPUs](https://sozercan.github.io/aikit/docs/gpu)

- 🔐 Ensure [supply chain security](https://sozercan.github.io/aikit/docs/security) with SBOMs, Provenance attestations, and signed images

- 🌈 Supports air-gapped environments with self-hosted, local, or any remote container registries to store model images for inference on the edge.

## Quick Start

You can get started with AIKit quickly on your local machine without a GPU!

```bash

docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b

```

After running this, navigate to [http://localhost:8080/chat](http://localhost:8080/chat) to access the WebUI!

### API

AIKit provides an OpenAI API compatible endpoint, so you can use any OpenAI API compatible client to send requests to open LLMs!

```bash

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{

    "model": "llama-3.1-8b-instruct",

    "messages": [{"role": "user", "content": "explain kubernetes in a sentence"}]

  }'

```

Output should be similar to:

```jsonc

{

  // ...

    "model": "llama-3.1-8b-instruct",

    "choices": [

        {

            "index": 0,

            "finish_reason": "stop",

            "message": {

                "role": "assistant",

                "content": "Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of applications and services, allowing developers to focus on writing code rather than managing infrastructure."

            }

        }

    ],

  // ...

}

```

That's it! 🎉 API is OpenAI compatible so this is a drop-in replacement for any OpenAI API compatible client.

## Pre-made Models

AIKit comes with pre-made models that you can use out-of-the-box!

If it doesn't include a specific model, you can always [create your own images](https://sozercan.github.io/aikit/docs/create-images), and host in a container registry of your choice!

## CPU

> [!NOTE]

> AIKit supports both AMD64 and ARM64 CPUs. You can run the same command on either architecture, and Docker will automatically pull the correct image for your CPU.

>

> Depending on your CPU capabilities, AIKit will automatically select the most optimized instruction set.

| Model           | Optimization | Parameters | Command                                                          | Model Name               | License                                                                            |

| --------------- | ------------ | ---------- | ---------------------------------------------------------------- | ------------------------ | ---------------------------------------------------------------------------------- |

| 🦙 Llama 3.2     | Instruct     | 1B         | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:1b`   | `llama-3.2-1b-instruct`  | [Llama](https://ai.meta.com/llama/license/)                                        |

| 🦙 Llama 3.2     | Instruct     | 3B         | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.2:3b`   | `llama-3.2-3b-instruct`  | [Llama](https://ai.meta.com/llama/license/)                                        |

| 🦙 Llama 3.1     | Instruct     | 8B         | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.1:8b`   | `llama-3.1-8b-instruct`  | [Llama](https://ai.meta.com/llama/license/)                                        |

| 🦙 Llama 3.3     | Instruct     | 70B        | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/llama3.3:70b`  | `llama-3.3-70b-instruct` | [Llama](https://ai.meta.com/llama/license/)                                        |  |

| Ⓜ️ Mixtral       | Instruct     | 8x7B       | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b`  | `mixtral-8x7b-instruct`  | [Apache](https://choosealicense.com/licenses/apache-2.0/)                          |

| 🅿️ Phi 3.5       | Instruct     | 3.8B       | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b`   | `phi-3.5-3.8b-instruct`  | [MIT](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE) |

| 🔡 Gemma 2       | Instruct     | 2B         | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/gemma2:2b`     | `gemma-2-2b-instruct`    | [Gemma](https://ai.google.dev/gemma/terms)                                         |

| ⌨️ Codestral 0.1 | Code         | 22B        | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b`          | [MNLP](https://mistral.ai/licenses/MNPL-0.1.md)                                    |

| QwQ             |              | 32B        | `docker run -d --rm -p 8080:8080 ghcr.io/sozercan/qwq:32b`       | `qwq-32b-preview`        | [Apache 2.0](https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/LICENSE)        |

### NVIDIA CUDA

> [!NOTE]

> To enable GPU acceleration, please see [GPU Acceleration](https://sozercan.github.io/aikit/docs/gpu).

>

> Please note that only difference between CPU and GPU section is the `--gpus all` flag in the command to enable GPU acceleration.

| Model           | Optimization  | Parameters | Command                                                                     | Model Name               | License                                                                                                                     |

| --------------- | ------------- | ---------- | --------------------------------------------------------------------------- | ------------------------ | --------------------------------------------------------------------------------------------------------------------------- |

| 🦙 Llama 3.2     | Instruct      | 1B         | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:1b`   | `llama-3.2-1b-instruct`  | [Llama](https://ai.meta.com/llama/license/)                                                                                 |

| 🦙 Llama 3.2     | Instruct      | 3B         | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.2:3b`   | `llama-3.2-3b-instruct`  | [Llama](https://ai.meta.com/llama/license/)                                                                                 |

| 🦙 Llama 3.1     | Instruct      | 8B         | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.1:8b`   | `llama-3.1-8b-instruct`  | [Llama](https://ai.meta.com/llama/license/)                                                                                 |

| 🦙 Llama 3.3     | Instruct     | 70B        | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/llama3.3:70b`  | `llama-3.3-70b-instruct` | [Llama](https://ai.meta.com/llama/license/)                                        |  |

| Ⓜ️ Mixtral       | Instruct      | 8x7B       | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/mixtral:8x7b`  | `mixtral-8x7b-instruct`  | [Apache](https://choosealicense.com/licenses/apache-2.0/)                                                                   |

| 🅿️ Phi 3.5       | Instruct      | 3.8B       | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/phi3.5:3.8b`   | `phi-3.5-3.8b-instruct`  | [MIT](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE)                                          |

| 🔡 Gemma 2       | Instruct      | 2B         | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/gemma2:2b`     | `gemma-2-2b-instruct`    | [Gemma](https://ai.google.dev/gemma/terms)                                                                                  |

| ⌨️ Codestral 0.1 | Code          | 22B        | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/codestral:22b` | `codestral-22b`          | [MNLP](https://mistral.ai/licenses/MNPL-0.1.md)                                                                             |

| QwQ             |               | 32B        | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/qwq:32b`       | `qwq-32b-preview`        | [Apache 2.0](https://huggingface.co/Qwen/QwQ-32B-Preview/blob/main/LICENSE)                                                 |

| 📸 Flux 1 Dev    | Text to image | 12B        | `docker run -d --rm --gpus all -p 8080:8080 ghcr.io/sozercan/flux1:dev`     | `flux-1-dev`             | [FLUX.1 [dev] Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev) |

### Apple Silicon (experimental)

> [!NOTE]

> To enable GPU acceleration on Apple Silicon, please see [Podman Desktop documentation](https://podman-desktop.io/docs/podman/gpu). For more information, please see [GPU Acceleration](https://sozercan.github.io/aikit/docs/gpu).

>

> Apple Silicon is an _experimental_ runtime and it may change in the future. This runtime is specific to Apple Silicon only, and it will not work as expected on other architectures, including Intel Macs.

>

> Only `gguf` models are supported on Apple Silicon.

| Model       | Optimization | Parameters | Command                                                                                       | Model Name              | License                                                                            |

| ----------- | ------------ | ---------- | --------------------------------------------------------------------------------------------- | ----------------------- | ---------------------------------------------------------------------------------- |

| 🦙 Llama 3.2 | Instruct     | 1B         | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.2:1b` | `llama-3.2-1b-instruct` | [Llama](https://ai.meta.com/llama/license/)                                        |

| 🦙 Llama 3.2 | Instruct     | 3B         | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.2:3b` | `llama-3.2-3b-instruct` | [Llama](https://ai.meta.com/llama/license/)                                        |

| 🦙 Llama 3.1 | Instruct     | 8B         | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/llama3.1:8b` | `llama-3.1-8b-instruct` | [Llama](https://ai.meta.com/llama/license/)                                        |

| 🅿️ Phi 3.5   | Instruct     | 3.8B       | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/phi3.5:3.8b` | `phi-3.5-3.8b-instruct` | [MIT](https://huggingface.co/microsoft/Phi-3.5-mini-instruct/resolve/main/LICENSE) |

| 🔡 Gemma 2   | Instruct     | 2B         | `podman run -d --rm --device /dev/dri -p 8080:8080 ghcr.io/sozercan/applesilicon/gemma2:2b`   | `gemma-2-2b-instruct`   | [Gemma](https://ai.google.dev/gemma/terms)                                         |

## What's next?

👉 For more information and how to fine tune models or create your own images, please see [AIKit website](https://sozercan.github.io/aikit/)!

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/sozercan/aikit

Awesome Lists containing this project

README