https://github.com/containers/ramalama

The goal of RamaLama is to make working with AI boring.
https://github.com/containers/ramalama

ai containers inference-server llamacpp llm podman vllm

Last synced: 4 months ago
JSON representation

The goal of RamaLama is to make working with AI boring.

Host: GitHub
URL: https://github.com/containers/ramalama
Owner: containers
License: mit
Created: 2024-07-24T19:09:58.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2025-04-28T11:24:17.000Z (about 1 year ago)
Last Synced: 2025-04-28T14:15:08.750Z (about 1 year ago)
Topics: ai, containers, inference-server, llamacpp, llm, podman, vllm
Language: Python
Homepage:
Size: 2.49 MB
Stars: 1,563
Watchers: 30
Forks: 170
Open Issues: 82
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE-OF-CONDUCT.md
- Codeowners: .github/CODEOWNERS
- Security: SECURITY.md

Awesome Lists containing this project

awesome-production-machine-learning - RamaLama - RamaLama is an open-source tool that simplifies the local use and serving of AI models for inference through OCI containers, eliminating the need to configure the host system. (Deployment and Serving)
awesome - containers/ramalama - RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers. (Python)
awesome-opensource-ai - RamaLama - Container-centric tool for simplifying local AI model serving. Automatically detects GPUs, pulls optimized container images, and runs models securely in rootless containers with enterprise-grade isolation. ![GitHub stars](https://img.shields.io/github/stars/containers/ramalama?style=social) (3. Inference Engines & Serving)

README

[RamaLama](https://ramalama.ai) strives to make working with AI simple, straightforward, and familiar by using OCI containers.

## Description
RamaLama is an open-source tool that simplifies the local use and serving of AI models for inference from any source through the familiar approach of containers. It allows engineers to use container-centric development patterns and benefits to extend to AI use cases.

RamaLama eliminates the need to configure the host system by instead pulling a container image specific to the GPUs discovered on the host system, and allowing you to work with various models and platforms.

- Eliminates the complexity for users to configure the host system for AI.
- Detects and pulls an [accelerated container image](#accelerated-images) specific to the GPUs on the host system, handling dependencies and hardware optimization.
- RamaLama supports multiple [AI model registries](#transports), including OCI Container Registries.
- Models are treated similarly to how Podman and Docker treat container images.
- Use common [container commands](#commands) to work with AI models.
- Run AI models [securely](#security) in rootless containers, isolating the model from the underlying host.
- Keep data secure by defaulting to no network access and removing all temporary data on application exits.
- Interact with models via REST API or as a chatbot.

## Install
### Install on macOS (Self-Contained Installer)
Download the self-contained macOS installer that includes Python and all dependencies:

1. Download the latest `.pkg` installer from [Releases](https://github.com/containers/ramalama/releases)
2. Double-click to install, or run: `sudo installer -pkg RamaLama-*-macOS-Installer.pkg -target /`

See [macOS Installation Guide](docs/MACOS_INSTALL.md) for detailed instructions.

### Install on Fedora
RamaLama is available in [Fedora](https://fedoraproject.org/) and later. To install it, run:
```
sudo dnf install ramalama
```

### Install via PyPI
RamaLama is available via PyPI at [https://pypi.org/project/ramalama](https://pypi.org/project/ramalama)
```
pip install ramalama
```

### Install script (Linux and macOS)
Install RamaLama by running:
```
curl -fsSL https://ramalama.ai/install.sh | bash
```

### Install on Windows
RamaLama supports Windows with Docker Desktop or Podman Desktop:
```powershell
pip install ramalama
```

**Requirements:**
- Python 3.10 or later
- Docker Desktop or Podman Desktop with WSL2 backend
- For GPU support, see [NVIDIA GPU Setup for WSL2](docs/readme/wsl2-docker-cuda.md)

**Note:** Windows support requires running containers via Docker/Podman. The model store uses hardlinks (no admin required) or falls back to file copies if hardlinks are unavailable.

## Uninstall

### Uninstall via pip
If you installed RamaLama using pip, you can uninstall it with:
```bash
pip uninstall ramalama
```

### Uninstall on Fedora
If you installed RamaLama using DNF:
```bash
sudo dnf remove ramalama
```

### Uninstall on macOS (Self-Contained Installer)
To remove RamaLama installed via the `.pkg` installer:
```bash
# Remove the executable
sudo rm /usr/local/bin/ramalama

# Remove configuration and data files
sudo rm -rf /usr/local/share/ramalama

# Remove man pages (optional)
sudo rm /usr/local/share/man/man1/ramalama*.1
sudo rm /usr/local/share/man/man5/ramalama*.5
sudo rm /usr/local/share/man/man7/ramalama*.7

# Remove shell completions (optional)
sudo rm /usr/local/share/bash-completion/completions/ramalama
sudo rm /usr/local/share/fish/vendor_completions.d/ramalama.fish
sudo rm /usr/local/share/zsh/site-functions/_ramalama
```

See the [macOS Installation Guide](docs/MACOS_INSTALL.md) for more details.

### Remove User Data and Configuration
After uninstalling RamaLama using any method above, you may want to remove downloaded models and configuration files from your home directory:

```bash
# Remove downloaded models and data (can be large)
rm -rf ~/.local/share/ramalama

# Remove configuration files
rm -rf ~/.config/ramalama

# If you ran RamaLama as root, also remove:
sudo rm -rf /var/lib/ramalama
```

**Note:** The model data directory (`~/.local/share/ramalama`) can be quite large depending on how many models you've downloaded. Make sure you want to remove these files before running the commands above.

## Accelerated images

| Accelerator | Image |
| :---------------------------------| :------------------------- |
| GGML_VK_VISIBLE_DEVICES (or CPU) | quay.io/ramalama/ramalama |
| HIP_VISIBLE_DEVICES | quay.io/ramalama/rocm |
| CUDA_VISIBLE_DEVICES | quay.io/ramalama/cuda |
| ASAHI_VISIBLE_DEVICES | quay.io/ramalama/asahi |
| INTEL_VISIBLE_DEVICES | quay.io/ramalama/intel-gpu |
| ASCEND_VISIBLE_DEVICES | quay.io/ramalama/cann |
| MUSA_VISIBLE_DEVICES | quay.io/ramalama/musa |

### GPU support inspection
On first run, RamaLama inspects your system for GPU support, falling back to CPU if none are present. RamaLama uses container engines like Podman or Docker to pull the appropriate OCI image with all necessary software to run an AI Model for your system setup.

How does RamaLama select the right image?

After initialization, RamaLama runs AI Models within a container based on the OCI image. RamaLama pulls container images specific to the GPUs discovered on your system. These images are tied to the minor version of RamaLama.
- For example, RamaLama version 1.2.3 on an NVIDIA system pulls quay.io/ramalama/cuda:1.2. To override the default image, use the `--image` option.

RamaLama then pulls AI Models from model registries, starting a chatbot or REST API service from a simple single command. Models are treated similarly to how Podman and Docker treat container images.

## Hardware Support

### Nvidia GPUs
On systems with NVIDIA GPUs, see [ramalama-cuda](docs/ramalama-cuda.7.md) documentation for the correct host system configuration.

### Intel GPUs
The following Intel GPUs are auto-detected by RamaLama:

| GPU ID | Description |
| :------ | :--------------------------------- |
|`0xe20b` | Intel® Arc™ B580 Graphics |
|`0xe20c` | Intel® Arc™ B570 Graphics |
|`0x7d51` | Intel® Graphics - Arrow Lake-H |
|`0x7dd5` | Intel® Graphics - Meteor Lake |
|`0x7d55` | Intel® Arc™ Graphics - Meteor Lake |

See the [Intel hardware table](https://dgpu-docs.intel.com/devices/hardware-table.html) for more information.

### Moore Threads GPUs
On systems with Moore Threads GPUs, see [ramalama-musa](docs/ramalama-musa.7.md) documentation for the correct host system configuration.

### MLX Runtime (macOS only)
The MLX runtime provides optimized inference for Apple Silicon Macs. MLX requires:
- macOS operating system
- Apple Silicon hardware (M1, M2, M3, or later)
- Usage with `--nocontainer` option (containers are not supported)
- The `mlx-lm` uv package installed on the host system as a uv tool

To install and run Phi-4 on MLX, use `uv`. If `uv` is not installed, you can install it with `curl -LsSf https://astral.sh/uv/install.sh | sh`:
```bash
uv tool install mlx-lm
# or upgrade to the latest version:
uv tool upgrade mlx-lm

ramalama --runtime=mlx serve hf://mlx-community/Unsloth-Phi-4-4bit
```

#### Default Container Engine
When both Podman and Docker are installed, RamaLama defaults to Podman. The `RAMALAMA_CONTAINER_ENGINE=docker` environment variable can override this behaviour. When neither are installed, RamaLama will attempt to run the model with software on the local system.

## Security

### Test and run your models more securely
Because RamaLama defaults to running AI models inside rootless containers using Podman or Docker, these containers isolate the AI models from information on the underlying host. With RamaLama containers, the AI model is mounted as a volume into the container in read-only mode.

This results in the process running the model (llama.cpp or vLLM) being isolated from the host. Additionally, since `ramalama run` uses the `--network=none` option, the container cannot reach the network and leak any information out of the system. Finally, containers are run with the `--rm` option, which means any content written during container execution is deleted when the application exits.

### Here’s how RamaLama delivers a robust security footprint:
- **Container Isolation** – AI models run within isolated containers, preventing direct access to the host system.
- **Read-Only Volume Mounts** – The AI model is mounted in read-only mode, which means that processes inside the container cannot modify the host files.
- **No Network Access** – ramalama run is executed with `--network=none`, meaning the model has no outbound connectivity for which information can be leaked.
- **Auto-Cleanup** – Containers run with `--rm`, wiping out any temporary data once the session ends.
- **Drop All Linux Capabilities** – No access to Linux capabilities to attack the underlying host.
- **No New Privileges** – Linux Kernel feature that disables container processes from gaining additional privileges.

## Transports
RamaLama supports multiple AI model registries types called transports.

### Supported transports

### Default Transport
RamaLama uses the Ollama registry transport by default

How to change transports.

Use the RAMALAMA_TRANSPORT environment variable to modify the default. `export RAMALAMA_TRANSPORT=huggingface` Changes RamaLama to use huggingface transport.

Individual model transports can be modified when specifying a model via the `huggingface://`, `oci://`, `modelscope://`, `ollama://`, or `rlcr://` prefix.

Example:
```
ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
```

### Transport shortnames
To make it easier for users, RamaLama uses shortname files, which contain alias names for fully specified AI Models, allowing users to refer to models using shorter names.

More information on shortnames.

```
$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
"merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
```

## Commands

### [`ramalama-bench`](https://github.com/containers/ramalama/blob/main/docs/ramalama-bench.1.md)
#### Benchmark specified AI Model.
-

Benchmark specified AI Model

```
$ ramalama bench granite3-moe
```

### [`ramalama-containers`](https://github.com/containers/ramalama/blob/main/docs/ramalama-containers.1.md)
#### List all RamaLama containers.
-

List all containers running AI Models

```
$ ramalama containers
```
Returns for example:
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
85ad75ecf866 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 5 hours ago Up 5 hours 0.0.0.0:8080->8080/tcp ramalama_s3Oh6oDfOP
85ad75ecf866 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 4 minutes ago Exited (0) 4 minutes ago granite-server
```

-

List all containers in a particular format

```
$ ramalama ps --noheading --format "{{ .Names }}"
```
Returns for example:

```
ramalama_s3Oh6oDfOP
granite-server
```

### [`ramalama-convert`](https://github.com/containers/ramalama/blob/main/docs/ramalama-convert.1.md)
#### Convert AI Model from local storage to OCI Image.
-

Generate an oci model out of an Ollama model.

```
$ ramalama convert ollama://tinyllama:latest oci://quay.io/rhatdan/tiny:latest
```
Returns for example:
```
Building quay.io/rhatdan/tiny:latest...
STEP 1/2: FROM scratch
STEP 2/2: COPY sha256:2af3b81862c6be03c769683af18efdadb2c33f60ff32ab6f83e42c043d6c7816 /model
--> Using cache 69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344
COMMIT quay.io/rhatdan/tiny:latest
--> 69db4a10191c
Successfully tagged quay.io/rhatdan/tiny:latest
69db4a10191c976d2c3c24da972a2a909adec45135a69dbb9daeaaf2a3a36344
```

-

Generate and run an OCI model with a quantized GGUF converted from Safetensors.

Generate OCI model
```
$ ramalama --image quay.io/ramalama/ramalama-rag convert --gguf Q4_K_M hf://ibm-granite/granite-3.2-2b-instruct oci://quay.io/kugupta/granite-3.2-q4-k-m:latest
```

Returns for example:
```
Converting /Users/kugupta/.local/share/ramalama/models/huggingface/ibm-granite/granite-3.2-2b-instruct to quay.io/kugupta/granite-3.2-q4-k-m:latest...
Building quay.io/kugupta/granite-3.2-q4-k-m:latest...
```

Run the generated model
```
$ ramalama run oci://quay.io/kugupta/granite-3.2-q4-k-m:latest
```

### [`ramalama-info`](https://github.com/containers/ramalama/blob/main/docs/ramalama-info.1.md)
#### Display RamaLama configuration information.
-

Info with no container engine.

```
$ ramalama info
```
Returns for example:
```
{
"Accelerator": "cuda",
"Engine": {
"Name": ""
},
"Image": "quay.io/ramalama/cuda:0.7",
"Inference": {
"Default": "llama.cpp",
"Engines": {
"llama.cpp": "/usr/share/ramalama/inference-spec/engines/llama.cpp.yaml",
"mlx": "/usr/share/ramalama/inference-spec/engines/mlx.yaml",
"vllm": "/usr/share/ramalama/inference-spec/engines/vllm.yaml"
},
"Schema": {
"1-0-0": "/usr/share/ramalama/inference-spec/schema/schema.1-0-0.json"
}
},
"Shortnames": {
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf",
"gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf",
"gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf",
"gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"granite": "ollama://granite3.1-dense",
"granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf",
"granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf",
"granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"smollm:135m": "ollama://smollm:135m",
"tiny": "ollama://tinyllama"
},
"Files": [
"/usr/share/ramalama/shortnames.conf",
"/home/dwalsh/.config/ramalama/shortnames.conf",
]
},
"Store": "/usr/share/ramalama",
"UseContainer": true,
"Version": "0.7.5"
}
```

-

Info with Podman engine.

```
$ ramalama info
```
Returns for example:
```
{
"Accelerator": "cuda",
"Engine": {
"Info": {
"host": {
"arch": "amd64",
"buildahVersion": "1.39.4",
"cgroupControllers": [
"cpu",
"io",
"memory",
"pids"
],
"cgroupManager": "systemd",
"cgroupVersion": "v2",
"conmon": {
"package": "conmon-2.1.13-1.fc42.x86_64",
"path": "/usr/bin/conmon",
"version": "conmon version 2.1.13, commit: "
},
"cpuUtilization": {
"idlePercent": 97.36,
"systemPercent": 0.64,
"userPercent": 2
},
"cpus": 32,
"databaseBackend": "sqlite",
"distribution": {
"distribution": "fedora",
"variant": "workstation",
"version": "42"
},
"eventLogger": "journald",
"freeLocks": 2043,
"hostname": "danslaptop",
"idMappings": {
"gidmap": [
{
"container_id": 0,
"host_id": 3267,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
],
"uidmap": [
{
"container_id": 0,
"host_id": 3267,
"size": 1
},
{
"container_id": 1,
"host_id": 524288,
"size": 65536
}
]
},
"kernel": "6.14.2-300.fc42.x86_64",
"linkmode": "dynamic",
"logDriver": "journald",
"memFree": 65281908736,
"memTotal": 134690979840,
"networkBackend": "netavark",
"networkBackendInfo": {
"backend": "netavark",
"dns": {
"package": "aardvark-dns-1.14.0-1.fc42.x86_64",
"path": "/usr/libexec/podman/aardvark-dns",
"version": "aardvark-dns 1.14.0"
},
"package": "netavark-1.14.1-1.fc42.x86_64",
"path": "/usr/libexec/podman/netavark",
"version": "netavark 1.14.1"
},
"ociRuntime": {
"name": "crun",
"package": "crun-1.21-1.fc42.x86_64",
"path": "/usr/bin/crun",
"version": "crun version 1.21\ncommit: 10269840aa07fb7e6b7e1acff6198692d8ff5c88\nrundir: /run/user/3267/crun\nspec: 1.0.0\n+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL"
},
"os": "linux",
"pasta": {
"executable": "/bin/pasta",
"package": "passt-0^20250415.g2340bbf-1.fc42.x86_64",
"version": ""
},
"remoteSocket": {
"exists": true,
"path": "/run/user/3267/podman/podman.sock"
},
"rootlessNetworkCmd": "pasta",
"security": {
"apparmorEnabled": false,
"capabilities": "CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT",
"rootless": true,
"seccompEnabled": true,
"seccompProfilePath": "/usr/share/containers/seccomp.json",
"selinuxEnabled": true
},
"serviceIsRemote": false,
"slirp4netns": {
"executable": "/bin/slirp4netns",
"package": "slirp4netns-1.3.1-2.fc42.x86_64",
"version": "slirp4netns version 1.3.1\ncommit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236\nlibslirp: 4.8.0\nSLIRP_CONFIG_VERSION_MAX: 5\nlibseccomp: 2.5.5"
},
"swapFree": 8589930496,
"swapTotal": 8589930496,
"uptime": "116h 35m 40.00s (Approximately 4.83 days)",
"variant": ""
},
"plugins": {
"authorization": null,
"log": [
"k8s-file",
"none",
"passthrough",
"journald"
],
"network": [
"bridge",
"macvlan",
"ipvlan"
],
"volume": [
"local"
]
},
"registries": {
"search": [
"registry.fedoraproject.org",
"registry.access.redhat.com",
"docker.io"
]
},
"store": {
"configFile": "/home/dwalsh/.config/containers/storage.conf",
"containerStore": {
"number": 5,
"paused": 0,
"running": 0,
"stopped": 5
},
"graphDriverName": "overlay",
"graphOptions": {},
"graphRoot": "/usr/share/containers/storage",
"graphRootAllocated": 2046687182848,
"graphRootUsed": 399990419456,
"graphStatus": {
"Backing Filesystem": "btrfs",
"Native Overlay Diff": "true",
"Supports d_type": "true",
"Supports shifting": "false",
"Supports volatile": "true",
"Using metacopy": "false"
},
"imageCopyTmpDir": "/var/tmp",
"imageStore": {
"number": 297
},
"runRoot": "/run/user/3267/containers",
"transientStore": false,
"volumePath": "/usr/share/containers/storage/volumes"
},
"version": {
"APIVersion": "5.4.2",
"BuildOrigin": "Fedora Project",
"Built": 1743552000,
"BuiltTime": "Tue Apr 1 19:00:00 2025",
"GitCommit": "be85287fcf4590961614ee37be65eeb315e5d9ff",
"GoVersion": "go1.24.1",
"Os": "linux",
"OsArch": "linux/amd64",
"Version": "5.4.2"
}
},
"Name": "podman"
},
"Image": "quay.io/ramalama/cuda:0.7",
"Inference": {
"Default": "llama.cpp",
"Engines": {
"llama.cpp": "/usr/share/ramalama/inference-spec/engines/llama.cpp.yaml",
"mlx": "/usr/share/ramalama/inference-spec/engines/mlx.yaml",
"vllm": "/usr/share/ramalama/inference-spec/engines/vllm.yaml"
},
"Schema": {
"1-0-0": "/usr/share/ramalama/inference-spec/schema/schema.1-0-0.json"
}
},
"Shortnames": {
"Names": {
"cerebrum": "huggingface://froggeric/Cerebrum-1.0-7b-GGUF/Cerebrum-1.0-7b-Q4_KS.gguf",
"deepseek": "ollama://deepseek-r1",
"dragon": "huggingface://llmware/dragon-mistral-7b-v0/dragon-mistral-7b-q4_k_m.gguf",
"gemma3": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"gemma3:12b": "hf://bartowski/google_gemma-3-12b-it-GGUF/google_gemma-3-12b-it-IQ2_M.gguf",
"gemma3:1b": "hf://bartowski/google_gemma-3-1b-it-GGUF/google_gemma-3-1b-it-IQ2_M.gguf",
"gemma3:27b": "hf://bartowski/google_gemma-3-27b-it-GGUF/google_gemma-3-27b-it-IQ2_M.gguf",
"gemma3:4b": "hf://bartowski/google_gemma-3-4b-it-GGUF/google_gemma-3-4b-it-IQ2_M.gguf",
"granite": "ollama://granite3.1-dense",
"granite-code": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:20b": "hf://ibm-granite/granite-20b-code-base-8k-GGUF/granite-20b-code-base.Q4_K_M.gguf",
"granite-code:34b": "hf://ibm-granite/granite-34b-code-base-8k-GGUF/granite-34b-code-base.Q4_K_M.gguf",
"granite-code:3b": "hf://ibm-granite/granite-3b-code-base-2k-GGUF/granite-3b-code-base.Q4_K_M.gguf",
"granite-code:8b": "hf://ibm-granite/granite-8b-code-base-4k-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab-7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite-lab-8b": "huggingface://ibm-granite/granite-8b-code-base-GGUF/granite-8b-code-base.Q4_K_M.gguf",
"granite-lab:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:2b": "ollama://granite3.1-dense:2b",
"granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"granite:8b": "ollama://granite3.1-dense:8b",
"hermes": "huggingface://NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"ibm/granite": "ollama://granite3.1-dense:8b",
"ibm/granite:2b": "ollama://granite3.1-dense:2b",
"ibm/granite:7b": "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf",
"ibm/granite:8b": "ollama://granite3.1-dense:8b",
"merlinite": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab-7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite-lab:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"merlinite:7b": "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf",
"mistral": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v1": "huggingface://TheBloke/Mistral-7B-Instruct-v0.1-GGUF/mistral-7b-instruct-v0.1.Q5_K_M.gguf",
"mistral:7b-v2": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
"mistral:7b-v3": "huggingface://MaziyarPanahi/Mistral-7B-Instruct-v0.3-GGUF/Mistral-7B-Instruct-v0.3.Q4_K_M.gguf",
"mistral_code_16k": "huggingface://TheBloke/Mistral-7B-Code-16K-qlora-GGUF/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"mistral_codealpaca": "huggingface://TheBloke/Mistral-7B-codealpaca-lora-GGUF/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"mixtao": "huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"openchat": "huggingface://TheBloke/openchat-3.5-0106-GGUF/openchat-3.5-0106.Q4_K_M.gguf",
"openorca": "huggingface://TheBloke/Mistral-7B-OpenOrca-GGUF/mistral-7b-openorca.Q4_K_M.gguf",
"phi2": "huggingface://MaziyarPanahi/phi-2-GGUF/phi-2.Q4_K_M.gguf",
"smollm:135m": "ollama://smollm:135m",
"tiny": "ollama://tinyllama"
},
"Files": [
"/usr/share/ramalama/shortnames.conf",
"/home/dwalsh/.config/ramalama/shortnames.conf",
]
},
"Store": "/usr/share/ramalama",
"UseContainer": true,
"Version": "0.7.5"
}
```

-

Using jq to print specific `ramalama info` content.

```
$ ramalama info | jq .Shortnames.Names.mixtao
```
Returns for example:
```
"huggingface://MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf"
```

### [`ramalama-inspect`](https://github.com/containers/ramalama/blob/main/docs/ramalama-inspect.1.md)
#### Inspect the specified AI Model.
-

Inspect the smollm:135m model for basic information.

```
$ ramalama inspect smollm:135m
```
Returns for example:
```
smollm:135m
Path: /var/lib/ramalama/models/ollama/smollm:135m
Registry: ollama
Format: GGUF
Version: 3
Endianness: little
Metadata: 39 entries
Tensors: 272 entries
```

-

Inspect the smollm:135m model for all information in json format.

```
$ ramalama inspect smollm:135m --all --json
```
Returns for example:
```
{
"Name": "smollm:135m",
"Path": "/home/mengel/.local/share/ramalama/models/ollama/smollm:135m",
"Registry": "ollama",
"Format": "GGUF",
"Version": 3,
"LittleEndian": true,
"Metadata": {
"general.architecture": "llama",
"general.base_model.0.name": "SmolLM 135M",
"general.base_model.0.organization": "HuggingFaceTB",
"general.base_model.0.repo_url": "https://huggingface.co/HuggingFaceTB/SmolLM-135M",
...
},
"Tensors": [
{
"dimensions": [
576,
49152
],
"n_dimensions": 2,
"name": "token_embd.weight",
"offset": 0,
"type": 8
},
...
]
}
```

### [`ramalama-list`](https://github.com/containers/ramalama/blob/main/docs/ramalama-list.1.md)
#### List all downloaded AI Models.
-

You can `list` all models pulled into local storage.

```
$ ramalama list
```
Returns for example:
```
NAME MODIFIED SIZE
ollama://smollm:135m 16 hours ago 5.5M
huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf 14 hours ago 460M
ollama://moondream:latest 6 days ago 791M
ollama://phi4:latest 6 days ago 8.43 GB
ollama://tinyllama:latest 1 week ago 608.16 MB
ollama://granite3-moe:3b 1 week ago 1.92 GB
ollama://granite3-moe:latest 3 months ago 1.92 GB
ollama://llama3.1:8b 2 months ago 4.34 GB
ollama://llama3.1:latest 2 months ago 4.34 GB
```

### [`ramalama-login`](https://github.com/containers/ramalama/blob/main/docs/ramalama-login.1.md)
#### Log in to a remote registry.
-

Log in to quay.io/username oci registry

```
$ export RAMALAMA_TRANSPORT=quay.io/username
$ ramalama login -u username
```

-

Log in to Ollama registry

```
$ export RAMALAMA_TRANSPORT=ollama
$ ramalama login
```

-

Log in to huggingface registry

```
$ export RAMALAMA_TRANSPORT=huggingface
$ ramalama login --token=XYZ
```

Logging in to Hugging Face requires the `hf tool`. For installation and usage instructions, see the documentation of the [Hugging Face command line interface](https://huggingface.co/docs/huggingface_hub/en/guides/cli).

### [`ramalama-logout`](https://github.com/containers/ramalama/blob/main/docs/ramalama-logout.1.md)
#### Log out of a remote registry.
-

Log out from quay.io/username oci repository

```
$ ramalama logout quay.io/username
```

-

Log out from Ollama registry

```
$ ramalama logout ollama
```

-

Log out from huggingface

```
$ ramalama logout huggingface
```

### [`ramalama-perplexity`](https://github.com/containers/ramalama/blob/main/docs/ramalama-perplexity.1.md)
#### Calculate perplexity for the specified AI Model.
-

Calculate the perplexity of an AI Model.

Perplexity measures how well the model can predict the next token with lower values being better
```
$ ramalama perplexity granite3-moe
```

### [`ramalama-pull`](https://github.com/containers/ramalama/blob/main/docs/ramalama-pull.1.md)
#### Pull the AI Model from the Model registry to local storage.
-

Pull a model

You can `pull` a model using the `pull` command. By default, it pulls from the Ollama registry.
```
$ ramalama pull granite3-moe
```

### [`ramalama-push`](https://github.com/containers/ramalama/blob/main/docs/ramalama-push.1.md)
#### Push the AI Model from local storage to a remote registry.
-

Push specified AI Model (OCI-only at present)

A model can from RamaLama model storage in Huggingface, Ollama, or OCI Model format. The model can also just be a model stored on disk
```
$ ramalama push oci://quay.io/rhatdan/tiny:latest
```

### [`ramalama-rag`](https://github.com/containers/ramalama/blob/main/docs/ramalama-rag.1.md)
#### Generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image.

>[!NOTE]
> this command does not work without a container engine.

-

Generate RAG data from provided documents and convert into an OCI Image.

This command uses a specific container image containing the docling tool to convert the specified content into a RAG vector database. If the image does not exist locally, RamaLama will pull the image down and launch a container to process the data.

**Positional arguments:**

PATH Files/Directory containing PDF, DOCX, PPTX, XLSX, HTML, AsciiDoc & Markdown formatted files to be processed. Can be specified multiple times.

IMAGE OCI Image name to contain processed rag data

```
ramalama rag ./README.md https://github.com/containers/podman/blob/main/README.md quay.io/ramalama/myrag
100% |███████████████████████████████████████████████████████| 114.00 KB/ 0.00 B 922.89 KB/s 59m 59s
Building quay.io/ramalama/myrag...
adding vectordb...
c857ebc65c641084b34e39b740fdb6a2d9d2d97be320e6aa9439ed0ab8780fe0
```

The image can then be used with:

```
ramalama run --rag quay.io/ramalama/myrag instructlab/merlinite-7b-lab
```

### [`ramalama-rm`](https://github.com/containers/ramalama/blob/main/docs/ramalama-rm.1.md)
#### Remove the AI Model from local storage.
-

Specify one or more AI Models to be removed from local storage.

```
$ ramalama rm ollama://tinyllama
```

-

Remove all AI Models from local storage.

```
$ ramalama rm --all
```

### [`ramalama-run`](https://github.com/containers/ramalama/blob/main/docs/ramalama-run.1.md)
#### Run the specified AI Model as a chatbot.

-

Run a chatbot on a model using the run command. By default, it pulls from the Ollama registry.

Note: RamaLama will inspect your machine for native GPU support and then will use a container engine like Podman to pull an OCI container image with the appropriate code and libraries to run the AI Model. This can take a long time to setup, but only on the first run.

```
$ ramalama run instructlab/merlinite-7b-lab
```

-

After the initial container image has been downloaded, you can interact with different models using the container image.

```
$ ramalama run granite3-moe
```
Returns for example:
```
> Write a hello world application in python

print("Hello World")
```

-

In a different terminal window see the running podman container.

```
$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
91df4a39a360 quay.io/ramalama/ramalama:latest /home/dwalsh/rama... 4 minutes ago Up 4 minutes gifted_volhard
```

### [`ramalama-serve`](https://github.com/containers/ramalama/blob/main/docs/ramalama-serve.1.md)
#### Serve REST API on the specified AI Model.
-

Serve a model and connect via a browser.

```
$ ramalama serve llama3
```
When the web UI is enabled, you can connect via your browser at: 127.0.0.1:< port >
The default serving port will be 8080 if available, otherwise a free random port in the range 8081-8090. If you wish, you can specify a port to use with --port/-p.

-

Run two AI Models at the same time. Notice both are running within Podman Containers.

```
$ ramalama serve -d -p 8080 --name mymodel ollama://smollm:135m
09b0e0d26ed28a8418fb5cd0da641376a08c435063317e89cf8f5336baf35cfa

$ ramalama serve -d -n example --port 8081 oci://quay.io/mmortari/gguf-py-example/v1/example.gguf
3f64927f11a5da5ded7048b226fbe1362ee399021f5e8058c73949a677b6ac9c

$ podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
09b0e0d26ed2 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 32 seconds ago Up 32 seconds 0.0.0.0:8081->8081/tcp ramalama_sTLNkijNNP
3f64927f11a5 quay.io/ramalama/ramalama:latest /usr/bin/ramalama... 17 seconds ago Up 17 seconds 0.0.0.0:8082->8082/tcp ramalama_YMPQvJxN97
```

-

To disable the web UI, use the `--webui` off flag.

```
$ ramalama serve --webui off llama3
```

### [`ramalama-stop`](https://github.com/containers/ramalama/blob/main/docs/ramalama-stop.1.md)
#### Stop the named container that is running the AI Model.
-

Stop a running model if it is running in a container.

```
$ ramalama stop mymodel
```

-

Stop all running models running in containers.

```
$ ramalama stop --all
```

### [`ramalama-version`](https://github.com/containers/ramalama/blob/main/docs/ramalama-version.1.md)
#### Display version of the AI Model.
-

Print the version of RamaLama.

```
$ ramalama version
```
Returns for example:
```
ramalama version 1.2.3
```

### Appendix

| Command | Description
| ------------------------------------------------------ | -------------------------------------
| [ramalama(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama.1.md)
| [ramalama-bench(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-bench.1.md)|
| [ramalama-chat(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-chat.1.md)|
| [ramalama-containers(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-containers.1.md)|
| [ramalama-convert(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-convert.1.md)
| [ramalama-info(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-info.1.md)
| [ramalama-inspect(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-inspect.1.md)
| [ramalama-list(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-list.1.md)
| [ramalama-login(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-login.1.md)
| [ramalama-logout(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-logout.1.md)
| [ramalama-perplexity(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-perplexity.1.md)|
| [ramalama-pull(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-pull.1.md)
| [ramalama-push(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-push.1.md)
| [ramalama-rag(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-rag.1.md)
| [ramalama-rm(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-rm.1.md)
| [ramalama-run(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-run.1.md)
| [ramalama-serve(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-serve.1.md)
| [ramalama-stop(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-stop.1.md)
| [ramalama-version(1)](https://github.com/containers/ramalama/blob/main/docs/ramalama-version.1.md)

| --------------------- | | primary RamaLama man page | benchmark specified AI Model | chat with specified OpenAI REST API | list all RamaLama containers | | convert AI Model from local storage to OCI Image | | display RamaLama configuration information | | inspect the specified AI Model | | list all downloaded AI Models | | login to remote registry | | logout from remote registry | calculate perplexity for specified AI Model | | pull AI Model from Model registry to local storage | | push AI Model from local storage to remote registry | | generate and convert Retrieval Augmented Generation (RAG) data from provided documents into an OCI Image| | remove AI Model from local storage | | run specified AI Model as a chatbot | | serve REST API on specified AI Model | | stop named container that is running AI Model | | display version of RamaLama

## Diagram

```
+---------------------------+
| |
| ramalama run granite3-moe |
| |
+-------+-------------------+
|
|
| +------------------+ +------------------+
| | Pull inferencing | | Pull model layer |
+-----------| runtime (cuda) |---------->| granite3-moe |
+------------------+ +------------------+
| Repo options: |
+-+-------+------+-+
| | |
v v v
+---------+ +------+ +----------+
| Hugging | | OCI | | Ollama |
| Face | | | | Registry |
+-------+-+ +---+--+ +-+--------+
| | |
v v v
+------------------+
| Start with |
| cuda runtime |
| and |
| granite3-moe |
+------------------+
```

## In development

Regarding this alpha, everything is under development, so expect breaking changes. If you need to reset your installation, see the [Uninstall](#uninstall) section above for instructions on removing RamaLama and cleaning up all data files, then reinstall.

## Known Issues

- On certain versions of Python on macOS, certificates may not installed correctly, potentially causing SSL errors (e.g., when accessing huggingface.co). To resolve this, run the `Install Certificates` command, typically as follows:

```
/Applications/Python 3.x/Install Certificates.command
```

## Credit where credit is due

This project wouldn't be possible without the help of other projects like:

- [llama.cpp](https://github.com/ggml-org/llama.cpp)
- [vllm](https://github.com/vllm-project/vllm)
- [mlx-lm](https://github.com/ml-explore/mlx-examples)
- [podman](https://github.com/containers/podman)
- [huggingface](https://github.com/huggingface)

so if you like this tool, give some of these repos a :star:, and hey, give us a :star: too while you are at it.

## Community

For general questions and discussion, please use RamaLama's

[`Matrix`](https://matrix.to/#/#ramalama:fedoraproject.org)

For discussions around issues/bugs and features, you can use the GitHub
[Issues](https://github.com/containers/ramalama/issues)
and
[PRs](https://github.com/containers/ramalama/pulls)
tracking system.

### Community / Developer Meetups

We host a public community and developer meetup on Discord every other week to discuss project direction and provide an open forum for users to get help, ask questions, and showcase new features.

[**Join on Discord**](https://discord.gg/MkCXuTRBUn)
[**Meeting Agenda**](https://docs.google.com/document/d/1wiqn7ItKgc8BgyTUQ46eeY23ms_hWbkhAoiP9D1ClfY/edit?tab=t.0#heading=h.b1x47hb6d0pt)

## Roadmap

See the full [Roadmap](./Roadmap.md).

## Contributors

Open to contributors

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/containers/ramalama

Awesome Lists containing this project

README