https://github.com/inftyai/manta
💫 A lightweight P2P-based cache system for model distribution on Kubernetes. Now being reframed as a unified cache system with a POSIX promise 🎯
cache distributed-systems kubernetes llm p2p-network
- Host: GitHub
- URL: https://github.com/inftyai/manta
- Owner: InftyAI
- License: apache-2.0
- Created: 2024-09-18T11:14:33.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-06T03:04:41.000Z (over 1 year ago)
- Last Synced: 2025-09-26T18:49:28.670Z (7 months ago)
- Topics: cache, distributed-systems, kubernetes, llm, p2p-network
- Language: Go
- Homepage:
- Size: 811 KB
- Stars: 24
- Watchers: 2
- Forks: 3
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
A lightweight P2P-based cache system for model distributions on Kubernetes.
[Stability: Alpha](https://github.com/mkenney/software-guides/blob/master/STABILITY-BADGES.md#alpha)
[![GoReport Widget]][GoReport Status]
[Latest Release](https://github.com/inftyai/manta/releases/latest)
[GoReport Widget]: https://goreportcard.com/badge/github.com/inftyai/manta
[GoReport Status]: https://goreportcard.com/report/github.com/inftyai/manta
_Name Story: the name `Manta` is inspired by the Dota 2 item [Manta Style](https://dota2.fandom.com/wiki/Manta_Style), which creates two images of your hero, just like peers in a P2P network._
**We're reframing Manta as a general distributed cache system with a POSIX promise**; the current capabilities remain available in the latest v0.0.4 release. Let's see what happens.
## Architecture

> Note: [llmaz](https://github.com/InftyAI/llmaz) is just one kind of integration; **Manta** can be deployed and used independently.
## Features Overview
- **Model Hub Support**: Models can be downloaded directly from model hubs (Huggingface etc.) or object storage, with no extra effort.
- **Model Preheat**: Models can be preloaded to the cluster, or to specified nodes, to accelerate model serving.
- **Model Cache**: Models are cached as chunks after downloading for faster model loading.
- **Model Lifecycle Management**: Model lifecycle is managed automatically with different strategies, like `Retain` or `Delete`.
- **Plugin Framework**: _Filter_ and _Score_ plugins can be extended to pick the best candidates.
- **Memory Management (WIP)**: Manages the memory reserved for caching, with an LRU algorithm for GC.
## You Should Know Before
- Manta is not an all-in-one solution for model management; instead, it offers a lightweight way to utilize idle bandwidth and cost-effective disks, helping you save money.
- It requires no additional components like databases or storage systems, simplifying setup and reducing effort.
- All models are stored under the host path `/mnt/models/`.
- After all, it's just a **cache system**.
## Quick Start
### Installation
Read the [Installation](./docs/installation.md) guide for instructions.
### Preheat Model
A sample to preload the `Qwen/Qwen2.5-0.5B-Instruct` model. Once preheated, models are served from the cache instead of being fetched cold:
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
```
If you want to preload the model to specified nodes, use the `nodeSelector`:
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    foo: bar
```
### Use Model
Once you have a Torrent, you can access the model directly from the host path `/mnt/models/`. All you need to do is set a Pod label like:
```yaml
metadata:
  labels:
    manta.io/torrent-name: "torrent-sample"
```
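For context, a fuller sketch of a Pod that consumes the cached weights might look like the following; the explicit `hostPath` volume and the image name are assumptions based on the `/mnt/models/` note above, so check the docs for how Manta actually wires the path into the Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server
  labels:
    manta.io/torrent-name: "torrent-sample"   # links the Pod to the Torrent
spec:
  containers:
  - name: server
    image: my-model-server:latest             # hypothetical serving image
    volumeMounts:
    - name: models
      mountPath: /mnt/models
  volumes:
  - name: models
    hostPath:                                 # assumption: mount the cache dir
      path: /mnt/models
```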
Note: you can make the Torrent `Standby` by setting `preheat` to false (true by default); preheating then happens at runtime, which will obviously slow down model loading.
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  preheat: false
```
### Delete Model
If you want the model weights removed once the `Torrent` is deleted, set `reclaimPolicy: Delete` (the default is `Retain`):
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete
```
For more details, refer to the [APIs](https://github.com/InftyAI/Manta/blob/main/api/v1alpha1/torrent_types.go).
## Roadmap
In the long term, we hope to make Manta **a unified cache system within MLOps**.
- Preloading datasets from model hubs
- RDMA support for faster model loading
- More integrations with MLOps system, including training and serving
## Community
Join us for more discussions:
* **Slack Channel**: [#manta](https://inftyai.slack.com/archives/C07SY8WS45U)
## Contributions
All kinds of contributions are welcome! Please follow [CONTRIBUTING.md](./CONTRIBUTING.md).