https://github.com/inftyai/puma
Aims to be a lightweight, high-performance inference engine for heterogeneous devices. WIP.
- Host: GitHub
- URL: https://github.com/inftyai/puma
- Owner: InftyAI
- License: apache-2.0
- Created: 2024-09-15T08:12:38.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-25T08:28:50.000Z (about 1 year ago)
- Last Synced: 2025-03-04T16:15:25.977Z (about 1 year ago)
- Topics: llm, llm-inference, rust
- Language: Rust
- Homepage:
- Size: 43.9 KB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
# PUMA
Puma aims to be a lightweight, high-performance inference engine for heterogeneous devices. *Currently under active development.*
## How to Run
### Build
Run `make build` to build the **puma** binary.
### Run
Run `./puma help` to see all available commands.
For example, you can run `./puma version` to see the binary version.
## Supported Backends
Puma uses [llama.cpp](https://github.com/ggerganov/llama.cpp) as its default backend for quick prototyping; a custom backend will be implemented in the future.
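Since the engine plans to support multiple backends (llama.cpp today, a native one later), the natural shape in Rust is a backend trait that callers program against. The sketch below is purely illustrative and is not Puma's actual API; the `Backend` trait, `EchoBackend` type, and method names are all assumptions for demonstration.

```rust
/// Hypothetical backend abstraction (illustration only, not Puma's real API).
/// Each inference backend (e.g. llama.cpp, a future native backend)
/// would implement this trait.
trait Backend {
    /// Human-readable backend name.
    fn name(&self) -> &str;
    /// Run inference on a prompt and return the generated text.
    fn generate(&self, prompt: &str) -> String;
}

/// Stand-in backend used here only so the example is runnable.
struct EchoBackend;

impl Backend for EchoBackend {
    fn name(&self) -> &str {
        "echo"
    }
    fn generate(&self, prompt: &str) -> String {
        // A real backend would call into the underlying inference engine;
        // this one just echoes the prompt back.
        format!("[echo] {prompt}")
    }
}

/// Engine code stays backend-agnostic by taking a trait object.
fn run(backend: &dyn Backend, prompt: &str) -> String {
    backend.generate(prompt)
}

fn main() {
    let backend = EchoBackend;
    println!("backend: {}", backend.name());
    println!("{}", run(&backend, "hello"));
}
```

Swapping in a different backend then only requires a new `impl Backend`, leaving the rest of the engine untouched.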