(logo image: Powered by DALL·E 3)

# MaKllama
[![Go Report Card](https://goreportcard.com/badge/github.com/makllama/makllama)](https://goreportcard.com/report/github.com/makllama/makllama)


Running and orchestrating large language models (LLMs) on Kubernetes with macOS nodes.

## Table of Contents

- [MaKllama](#makllama)
  - [Table of Contents](#table-of-contents)
  - [Main Components](#main-components)
  - [Quick Start (~ 1 minute)](#quick-start--1-minute)
    - [1. Prerequisites](#1-prerequisites)
    - [2. Start Containerd + Virtual Kubelet + BW](#2-start-containerd--virtual-kubelet--bw)
    - [3. Deploy TinyLlama with 2 Replicas](#3-deploy-tinyllama-with-2-replicas)
    - [4. Deploy Mods](#4-deploy-mods)
    - [5. Access OpenAI API Compatible Endpoint through Mods](#5-access-openai-api-compatible-endpoint-through-mods)
    - [6. Stop Containerd + Virtual Kubelet + BW](#6-stop-containerd--virtual-kubelet--bw)
  - [Community](#community)
  - [Session Submissions](#session-submissions)
    - [Title](#title)
    - [Description](#description)
    - [Benefits to the Ecosystem](#benefits-to-the-ecosystem)

## Main Components

To run and orchestrate LLMs on Kubernetes with macOS nodes, we need the following components:

- [Virtual Kubelet](https://github.com/makllama/cri): For running `pods` on macOS nodes (forked from [virtual-kubelet/cri](https://github.com/virtual-kubelet/cri)).
- [Containerd](https://github.com/makllama/containerd): For pulling and running Ollama LLM images (forked from [containerd/containerd](https://github.com/containerd/containerd)).
- Runm: A lightweight runtime derived from [llama.cpp](https://github.com/ggerganov/llama.cpp) for running LLMs on macOS nodes (source code will be available soon).
- Bronze Willow (BW): A CNI plugin for macOS (source code will be available soon).

This project is inspired by [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://github.com/ollama/ollama) and [kind](https://kind.sigs.k8s.io/).

## Quick Start (~ 1 minute)

### 1. Prerequisites

* A Kubernetes cluster.
  * [kind](https://kind.sigs.k8s.io/) is not supported.
  * [Antrea](https://github.com/antrea-io/antrea) is the preferred CNI.
  * Your `kubeconfig` should be located at `~/.kube/config`.
* A Mac with an Apple Silicon chip.
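A quick, illustrative way to sanity-check these prerequisites (generic commands, not part of the repo's tooling):

```bash
# Illustrative sanity checks; not part of the repo's tooling.
kubectl --kubeconfig ~/.kube/config get nodes   # cluster reachable via the expected kubeconfig
uname -m                                        # prints "arm64" on Apple Silicon Macs
```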

### 2. Start Containerd + Virtual Kubelet + BW

```bash
$ make # optional
$ sudo ./bin/demo create
✓ Starting containerd 🚢
✓ Preparing virtual nodes 📦
✓ Creating network 🌐
$ kubectl get nodes
NAME            STATUS   ROLES           AGE    VERSION
bj-k8s01        Ready    control-plane   214d   v1.28.2
bj-k8s02        Ready    worker          214d   v1.28.2
bj-k8s03        Ready    worker          214d   v1.28.2
weiqiangt-mba   Ready    agent           23d    v1.15.2-vk-cri-fb9cc09-dev
xiaodong-m1     Ready    agent           23d    v1.15.2-vk-cri-fb9cc09-dev
```

After running the above commands, you should see the macOS nodes appear in the output of `kubectl get nodes`. In the example above, `weiqiangt-mba` and `xiaodong-m1` are the macOS nodes.
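To confirm which entries are the macOS nodes, one option is to print each node's reported OS and CPU architecture; `operatingSystem` and `architecture` are standard fields under the Node API's `nodeInfo`:

```bash
# List nodes with their reported OS and CPU architecture (standard Node API fields).
kubectl get nodes -o custom-columns=NAME:.metadata.name,OS:.status.nodeInfo.operatingSystem,ARCH:.status.nodeInfo.architecture
```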

### 3. Deploy TinyLlama with 2 Replicas

```bash
$ kubectl apply -f k8s/tinyllama.yml
```
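The manifest `k8s/tinyllama.yml` is not reproduced in this README. As a rough sketch only: the Service name `tinyllama-services` and port `11434` are taken from step 5 below, while the labels and image reference are hypothetical placeholders:

```bash
# Hypothetical sketch of what k8s/tinyllama.yml might contain; labels and
# image are placeholders, the Service name and port come from step 5.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tinyllama
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tinyllama
  template:
    metadata:
      labels:
        app: tinyllama
    spec:
      containers:
        - name: tinyllama
          image: tinyllama:latest   # placeholder image reference
          ports:
            - containerPort: 11434  # Ollama's default port, per step 5
---
apiVersion: v1
kind: Service
metadata:
  name: tinyllama-services
spec:
  selector:
    app: tinyllama
  ports:
    - port: 11434
      targetPort: 11434
EOF
```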

### 4. Deploy Mods

```bash
$ kubectl apply -f k8s/mods.yaml
```
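[Mods](https://github.com/charmbracelet/mods) is a CLI for talking to LLMs. `k8s/mods.yaml` is likewise not shown here; a minimal sketch, reusing the `app=mods` label and `mods-deployment` name that appear in step 5, with a placeholder image:

```bash
# Hypothetical sketch of k8s/mods.yaml; the label and Deployment name match
# step 5, the image is a placeholder for any image that bundles mods.
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mods-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mods
  template:
    metadata:
      labels:
        app: mods
    spec:
      containers:
        - name: mods
          image: debian:stable              # placeholder; a real image would include mods
          command: ["sleep", "infinity"]    # keep the pod alive for kubectl exec
EOF
```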

### 5. Access OpenAI API Compatible Endpoint through Mods

```bash
# Print the sed command that points the mods config at the in-cluster service.
$ echo "sed -i 's/localhost:11434/$(kubectl get svc -o json tinyllama-services | jq -r '.spec.clusterIP')/g' ~/.config/mods/mods.yml"
sed -i 's/localhost:11434/198.19.50.27/g' ~/.config/mods/mods.yml
# Copy the output.
$ kubectl exec -it $(kubectl get pods -l app=mods -o jsonpath='{.items[0].metadata.name}') -- bash
root@mods-deployment-77c464f4b8-zn6g5:/# echo "Execute the copied command."
root@mods-deployment-77c464f4b8-zn6g5:/# mods -f "What are some of the best ways to save money?"
```
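Mods is just one client; since the heading advertises an OpenAI-compatible endpoint, you could also probe the Service directly. The snippet below assumes the in-cluster server mirrors Ollama's OpenAI-compatible `/v1/chat/completions` route on port 11434:

```bash
# Assumption: the server exposes Ollama's OpenAI-compatible API on port 11434.
CLUSTER_IP=$(kubectl get svc tinyllama-services -o jsonpath='{.spec.clusterIP}')
kubectl run curl-probe --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s "http://${CLUSTER_IP}:11434/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d '{"model": "tinyllama", "messages": [{"role": "user", "content": "Hello"}]}'
```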

### 6. Stop Containerd + Virtual Kubelet + BW

```bash
$ sudo ./bin/demo delete
✓ Deleting demo 🧹
```

## Community

* [Open an issue](https://github.com/makllama/makllama/issues/new)

## Session Submissions

- KCD Shanghai 2024 (Accepted)
- KubeCon + CloudNativeCon + Open Source Summit + AI_Dev China 2024 (Under Evaluation)

### Title

Beyond Containers, Orchestrate LLMs with Kubernetes on macOS

### Description

With the growing popularity of generative AI, there is an increasing demand for large language model (LLM)
inference capabilities. Kubernetes, as the most popular orchestration platform, is a natural fit for these
inference needs. While GPUs are expensive and often in short supply, Apple Silicon M-series chips
(with their Unified Memory Architecture) have proven to be an effective alternative for running LLMs
(see the ggerganov/llama.cpp performance discussion). However, the Kubernetes ecosystem is predominantly
focused on Linux-based containers. In this presentation, we will showcase our efforts to facilitate LLM inference
on Kubernetes using macOS nodes. We will demonstrate how to employ Virtual Kubelet, Containerd, ShimV2, and runm
(derived from llama.cpp: ggerganov/llama.cpp) to deploy open-source foundation models such as gemma, llama2,
and mistral on Kubernetes. We will also discuss our motivation and the challenges encountered during
development. Our goal is to encourage the community to expand the Kubernetes ecosystem so that it also
supports running LLMs on macOS platforms.

### Benefits to the Ecosystem

- Enable running and orchestrating LLMs on Kubernetes with macOS nodes
- Provide an alternative solution for running LLMs on Kubernetes
- Inspire the community to build a more inclusive Kubernetes ecosystem that supports running LLMs on macOS