https://github.com/inftyai/puma

Aim to be a lightweight, high-performance inference engine for heterogeneous devices. WIP.

llm llm-inference rust

Host: GitHub
URL: https://github.com/inftyai/puma
Owner: InftyAI
License: apache-2.0
Created: 2024-09-15T08:12:38.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2025-02-25T08:28:50.000Z (about 1 year ago)
Last Synced: 2025-03-04T16:15:25.977Z (about 1 year ago)
Topics: llm, llm-inference, rust
Language: Rust
Homepage:
Size: 43.9 KB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

README

# PUMA

Puma aims to be a lightweight, high-performance inference engine for heterogeneous devices. *Currently under active development.*

## How to Run

### Build

Run `make build` to build the **puma** binary.

### Run

Run `./puma help` to see all available commands.

For example, you can run `./puma version` to see the binary version.

## Supported Backends

Puma currently uses [llama.cpp](https://github.com/ggerganov/llama.cpp) as the default backend for quick prototyping. We plan to implement our own backend in the future.
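To illustrate what "default backend" could mean in practice, here is a minimal sketch of backend selection with a fallback to llama.cpp. It is not taken from the Puma codebase: `BackendKind`, `BackendError`, `select_backend`, and the backend name strings are all hypothetical names chosen for this example, and the real project may structure this quite differently.

```rust
use std::fmt;

/// Hypothetical backend identifiers (illustrative only, not Puma's actual API).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendKind {
    /// llama.cpp, which the README names as the current default backend.
    LlamaCpp,
    /// Placeholder for the in-house backend that is planned but not yet written.
    Native,
}

impl Default for BackendKind {
    fn default() -> Self {
        BackendKind::LlamaCpp
    }
}

/// Errors that backend selection can produce in this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum BackendError {
    /// The requested name does not match any known backend.
    Unknown(String),
    /// The backend is known but has no implementation yet.
    NotImplemented(BackendKind),
}

impl fmt::Display for BackendError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BackendError::Unknown(name) => write!(f, "unknown backend: {}", name),
            BackendError::NotImplemented(kind) => {
                write!(f, "backend not implemented yet: {:?}", kind)
            }
        }
    }
}

/// Resolve an optional backend name (for example, from a command-line flag).
/// `None` falls back to the default backend. The accepted strings are assumptions.
pub fn select_backend(name: Option<&str>) -> Result<BackendKind, BackendError> {
    match name {
        None => Ok(BackendKind::default()),
        Some("llama.cpp") => Ok(BackendKind::LlamaCpp),
        Some("native") => Err(BackendError::NotImplemented(BackendKind::Native)),
        Some(other) => Err(BackendError::Unknown(other.to_string())),
    }
}
```

In this sketch, calling `select_backend(None)` yields the llama.cpp variant, while requesting the planned native backend returns an explicit "not implemented" error rather than silently falling back. Keeping the default in one `Default` impl means it could later be switched to the native backend in a single place.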