https://github.com/inftyai/manta
💫 A lightweight P2P-based cache system for model distribution on Kubernetes. Now being reframed as a unified cache system with a POSIX promise 🎯
cache distributed-systems kubernetes llm p2p-network
- Host: GitHub
- URL: https://github.com/inftyai/manta
- Owner: InftyAI
- License: apache-2.0
- Created: 2024-09-18T11:14:33.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-06T03:04:41.000Z (over 1 year ago)
- Last Synced: 2025-09-26T18:49:28.670Z (7 months ago)
- Topics: cache, distributed-systems, kubernetes, llm, p2p-network
- Language: Go
- Homepage:
- Size: 811 KB
- Stars: 24
- Watchers: 2
- Forks: 3
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
A lightweight P2P-based cache system for model distributions on Kubernetes.
[Stability: Alpha](https://github.com/mkenney/software-guides/blob/master/STABILITY-BADGES.md#alpha)
[![GoReport Widget]][GoReport Status]
[Latest Release](https://github.com/inftyai/manta/releases/latest)
[GoReport Widget]: https://goreportcard.com/badge/github.com/inftyai/manta
[GoReport Status]: https://goreportcard.com/report/github.com/inftyai/manta
_Name Story: the name `Manta` is inspired by the Dota 2 item [Manta Style](https://dota2.fandom.com/wiki/Manta_Style), which creates two images of your hero, just like peers in a P2P network._
**We're reframing Manta as a general distributed cache system with a POSIX promise**; the current capabilities remain available in the latest v0.0.4 release. Let's see what happens.
## Architecture

> Note: [llmaz](https://github.com/InftyAI/llmaz) is just one kind of integration; **Manta** can be deployed and used independently.
## Features Overview
- **Model Hub Support**: Models can be downloaded directly from model hubs (Huggingface etc.) or object storage, with no extra effort.
- **Model Preheat**: Models can be preloaded to the cluster, or to specified nodes, to accelerate model serving.
- **Model Cache**: Models are cached as chunks after downloading for faster model loading.
- **Model Lifecycle Management**: Model lifecycle is managed automatically with different strategies, like `Retain` or `Delete`.
- **Plugin Framework**: _Filter_ and _Score_ plugins can be extended to pick the best candidates.
- **Memory Management (WIP)**: Manages the memory reserved for caching, with an LRU algorithm for GC.
## You Should Know Before
- Manta is not an all-in-one solution for model management; instead, it offers a lightweight way to utilize idle bandwidth and cost-effective disks, helping you save money.
- It requires no additional components like databases or storage systems, simplifying setup and reducing effort.
- All models are stored under the host path `/mnt/models/`.
- After all, it's just a **cache system**.
## Quick Start
### Installation
Read the [Installation](./docs/installation.md) guide for instructions.
### Preheat Model
A sample to preload the `Qwen/Qwen2.5-0.5B-Instruct` model. Once preheated, models are served from the cache instead of being fetched cold:
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
```
If you want to preload the model to specified nodes, use the `nodeSelector`:
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  nodeSelector:
    foo: bar
```
### Use Model
Once you have a Torrent, you can access the model directly from the host path `/mnt/models/`. All you need to do is set a Pod label like:
```yaml
metadata:
  labels:
    manta.io/torrent-name: "torrent-sample"
```
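For context, a fuller sketch of a Pod that consumes the cached weights might look like the following; the explicit `hostPath` volume and the image name are assumptions based on the `/mnt/models/` note above, so check the docs for how Manta actually wires the path into the Pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: model-server
  labels:
    manta.io/torrent-name: "torrent-sample"   # links the Pod to the Torrent
spec:
  containers:
  - name: server
    image: my-model-server:latest             # hypothetical serving image
    volumeMounts:
    - name: models
      mountPath: /mnt/models
  volumes:
  - name: models
    hostPath:                                 # assumption: mount the cache dir
      path: /mnt/models
```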
Note: you can make the Torrent `Standby` by setting `preheat` to false (true by default); preheating then happens at runtime, which will obviously slow down model loading.
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  preheat: false
```
### Delete Model
If you want the model weights removed once the `Torrent` is deleted, set `reclaimPolicy: Delete` (the default is `Retain`):
```yaml
apiVersion: manta.io/v1alpha1
kind: Torrent
metadata:
  name: torrent-sample
spec:
  hub:
    name: Huggingface
    repoID: Qwen/Qwen2.5-0.5B-Instruct
  reclaimPolicy: Delete
```
For more details, refer to the [APIs](https://github.com/InftyAI/Manta/blob/main/api/v1alpha1/torrent_types.go).
## Roadmap
In the long term, we hope to make Manta **a unified cache system within MLOps**.
- Preloading datasets from model hubs
- RDMA support for faster model loading
- More integrations with MLOps system, including training and serving
## Community
Join us for more discussions:
* **Slack Channel**: [#manta](https://inftyai.slack.com/archives/C07SY8WS45U)
## Contributions
All kinds of contributions are welcome! Please follow [CONTRIBUTING.md](./CONTRIBUTING.md).