https://github.com/mmonteleone/corral
Ollama-shaped llama.cpp, MLX, and model management helper
https://github.com/mmonteleone/corral
cli gemma gguf huggingface lamma llm llms local-llm ollama phi qwen shell
Last synced: 2 months ago
JSON representation
Ollama-shaped llama.cpp, MLX, and model management helper
- Host: GitHub
- URL: https://github.com/mmonteleone/corral
- Owner: mmonteleone
- License: mit
- Created: 2026-04-10T00:49:11.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-17T20:40:07.000Z (2 months ago)
- Last Synced: 2026-04-17T22:36:00.194Z (2 months ago)
- Topics: cli, gemma, gguf, huggingface, lamma, llm, llms, local-llm, ollama, phi, qwen, shell
- Language: Shell
- Homepage:
- Size: 179 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Corral ✨🦙
Run local models with the ease of [Ollama](https://ollama.com) and full power of official [llama.cpp](https://github.com/ggml-org/llama.cpp) releases and [MLX](https://github.com/ml-explore/mlx-lm) on Apple Silicon.
**Corral is just a shell script**. It installs and updates official latest llama.cpp and MLX releases, uses the standard Hugging Face registry for models, and provides an Ollama-style CLI for running and managing local models: *search*, *pull*, *run*, *serve*, *launch*, *list*, *remove*, *update*, etc. along with templated usage profiles and tool launchers.
```sh
corral search gemma
corral run unsloth/gemma-4-26B-A4B-it-GGUF
corral launch pi
```
## Why Corral?
- Upstream, official llama.cpp and MLX builds, with their latest performance benefits and model support (*ahem*, [Gemma 4](https://deepmind.google/models/gemma/gemma-4/)) vs downstream integrations and forks
- Ollama-style ergonomics for running *and* managing local models, without an always-on daemon
- The full Hugging Face model registry, not just what Ollama ships
- Model search and discovery against Hugging Face from the command line
- Saved, templated, profiles for pinning a model with a specific set of flags
- Pre-configured launcher for tools including [OpenCode](https://opencode.ai) and [Pi](https://pi.dev)
- Command, model, profile, and quant shell completions for fish, zsh, and bash
- Standard HF cache. Downloaded models are visible to other tools
## Does the world really need this?
Not really.
## Install
```sh
# System-wide
sudo curl -fsSL https://github.com/mmonteleone/corral/releases/latest/download/corral \
-o /usr/local/bin/corral && sudo chmod +x /usr/local/bin/corral
# Or user-local (no sudo)
curl -fsSL https://github.com/mmonteleone/corral/releases/latest/download/corral \
-o ~/.local/bin/corral && chmod +x ~/.local/bin/corral
```
> [!NOTE]
> `~/.local/bin` may not be in `$PATH` by default on macOS. Add it: `export PATH="$HOME/.local/bin:$PATH"`
Then install a backend and set up shell completions:
```sh
corral install
```
On Apple Silicon this installs **both** llama.cpp (`llama-cli`, `llama-server`) and MLX (`mlx-lm`). On other platforms, llama.cpp only. Restrict with `--backend llama.cpp` or `--backend mlx`.
`corral install` downloads the latest official llama.cpp release and, after prompting, adds it to `$PATH` and installs shell completions. Pass `--shell-profile` to accept automatically, or `--no-shell-profile` to skip. For MLX, corral installs `mlx-lm` via `uv` (offering to install `uv` via Homebrew if needed).
## Quick start
```sh
corral search gemma # Find models on Hugging Face
corral run unsloth/gemma-4-26B-A4B-it-GGUF # Chat (downloads on first use)
corral run mlx-community/gemma-4-26b-a4b-it-6bit # MLX model (auto-detected)
corral serve unsloth/gemma-4-26B-A4B-it-GGUF # OpenAI-compatible API + web UI
corral run unsloth/gemma-4-26B-A4B-it-GGUF -- -ngl 999 -c 8192 # Extra flags
# Profiles: save a name + model + flags combo
corral profile set coder unsloth/gemma-4-26B-A4B-it-GGUF -- \
--ctx-size 65536 --temp 0.2 -ngl 999
corral run coder
# Or seed a profile from a built-in template
corral profile set mycoder code unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M
corral run mycoder
# Launch supported coding harnesses against a running server
corral launch pi
corral launch opencode
corral list # Models, profiles, templates
corral remove unsloth/gemma-4-26B-A4B-it-GGUF
corral remove coder
```
## Commands
| Command | Description |
|---|---|
| `install` | Install backend(s) and shell completions |
| `run MODEL\|PROFILE` | Interactive chat (`llama-cli` / `mlx_lm.chat`) |
| `serve MODEL\|PROFILE` | OpenAI-compatible server (`llama-server` / `mlx_lm.server`) |
| `launch TOOL` | Configure and launch `pi` or `opencode` against a running server |
| `pull MODEL` | Download model artifacts without running |
| `search [QUERY]` | Search Hugging Face for compatible models |
| `browse MODEL` | Open a model's Hugging Face page in the browser |
| `list` / `ls` | List cached models, profiles, and templates |
| `remove` / `rm` | Remove cached models or profiles |
| `profile set\|show\|duplicate` | Manage saved profiles |
| `template show\|set\|remove` | Manage flag templates |
| `status` | Platform info and installed backend status |
| `update` | Update backends to latest versions |
| `versions` | Show installed backend versions |
| `prune` | Remove old llama.cpp installs (keeps current) |
| `uninstall` | Remove backends and optionally clean up caches |
| `ps` | Show running model processes |
| `version` | Show the corral version |
Run `corral --help` for per-command flags.
## Models and quants
Models use standard Hugging Face `USER/MODEL` IDs. For llama.cpp, append `:QUANT` to pin a quantization:
```sh
corral run unsloth/gemma-4-26B-A4B-it-GGUF # default quant
corral run unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q6_K # specific quant
```
MLX models use plain IDs without `:QUANT` (e.g. `mlx-community/gemma-4-26b-a4b-it-6bit`).
All models are stored in the standard Hugging Face cache (`~/.cache/huggingface/hub/`).
### Search
```sh
corral search --backend llama.cpp gemma # GGUF-tagged results
corral search --backend mlx gemma # MLX-tagged results
corral search --backend llama.cpp qwen --quants # show GGUF quant variants
```
### List and remove
```sh
corral list # all models, profiles, templates
corral ls --models # only models
corral ls --backend mlx # only MLX models
corral remove USER/MODEL:QUANT # remove one quant (llama.cpp)
corral remove USER/MODEL # remove entire model
corral remove PROFILE_NAME # remove a profile
```
## Profiles and templates
A **profile** saves a model + flags under a name, usable anywhere a model is accepted:
```sh
corral profile set coder unsloth/gemma-4-26B-A4B-it-GGUF -- \
--ctx-size 65536 --temp 0.2 -ngl 999
corral run coder
corral serve coder
corral run coder -- --temp 0.5 # inline flags override profile flags
```
A **template** is a reusable set of flags that can seed profiles. Corral ships two:
| Template | Purpose | Key flags |
|---|---|---|
| `chat` | Conversational | `--temp 0.8` |
| `code` | Coding | `--temp 0.2` |
```sh
corral profile set mycoder code unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q4_K_M # from template
corral run mycoder
```
Create custom templates with `corral template set`. If a template includes a `model=` line, the model is optional when creating profiles from it:
```sh
corral template set work-chat user/our-llm:Q4_K -- --temp 0.6 --ctx-size 16384
corral profile set alice-chat work-chat # model comes from template
corral profile set test-chat work-chat user/new-llm:Q4_K # override model
```
```sh
corral profile show coder # inspect
corral profile duplicate coder coder2
corral template show code
corral template remove work-chat # delete user template
```
### Profile file format
Profiles are plain text in `~/.config/corral/profiles/` with a `model=` line and flags (one per line). Section headers scope flags to a backend, command, or both:
```
model=unsloth/gemma-4-26B-A4B-it-GGUF
--temp 0.2
[mlx]
--max-tokens 4096
[mlx.serve]
--top-k 20
[llama.cpp]
--top-k 20
--repeat-penalty 1.05
--ctx-size 65536
--n-predict 4096
--flash-attn on
-ngl 999
[llama.cpp.serve]
--cache-reuse 256
```
| Section | Scope |
|---|---|
| *(none)* | All backends and commands |
| `[run]` / `[serve]` | One command, any backend |
| `[llama.cpp]` / `[mlx]` | One backend, any command |
| `[llama.cpp.run]` / `[llama.cpp.serve]` / `[mlx.run]` / `[mlx.serve]` | One backend + one command |
`profile set` creates flat profiles. Section headers are added by editing the file directly or inherited from templates. Templates use the same format (`model=` optional) and live in `~/.config/corral/templates/`. A user-defined template with the same name as a built-in takes precedence.
## Launch coding harnesses
`corral launch` configures a supported coding harness to use a currently running `corral serve` instance, then launches the harness.
Supported harnesses currently include `pi` and `opencode`. Corral inspects running servers via `corral ps`, matches the server's local OpenAI-compatible endpoint and model name, and writes that into the harness config. Existing configs are preserved with a timestamped backup next to any modified config file
## Shell completions
Completions for commands, models, quants, and profiles are available for **fish**, **zsh**, and **bash**. They install automatically during `corral install` when shell profile edits are accepted. To add them later:
```sh
corral install --shell-profile
```
## Configuration
| Variable | Purpose |
|---|---|
| `CORRAL_INSTALL_ROOT` | Override llama.cpp install directory |
| `CORRAL_PROFILES_DIR` | Override profiles directory (default: `~/.config/corral/profiles`) |
| `CORRAL_TEMPLATES_DIR` | Override templates directory (default: `~/.config/corral/templates`) |
| `HF_TOKEN` | Authenticate for private/gated HF models (`HF_HUB_TOKEN` and `HUGGING_FACE_HUB_TOKEN` also work) |
## Uninstall
```sh
corral uninstall --self # remove all backends + corral itself
corral uninstall --backend mlx # remove one backend
corral uninstall --self --delete-hf-cache # also wipe downloaded models
```
All uninstall commands prompt for confirmation. Add `--force` to skip.
## Compatibility
| | Platforms |
|---|---|
| **llama.cpp** | macOS arm64/x86_64, Linux x86_64/arm64 |
| **MLX** | macOS arm64 only (Apple Silicon) |
Requires `curl`, `tar`, `jq`, and standard POSIX tools. MLX operations require `uv`. Shell completions support fish, zsh, and bash. `install` and `update` are atomic. `remove` refuses to delete models currently in use.
## Development
Source entry point is [src/corral.sh](src/corral.sh) with modules in [src/lib/](src/lib/). The standalone distributable is built by [tools/build.sh](tools/build.sh), which inlines modules and stamps the version from the current git tag.
Current module split is feature-oriented: helpers, cache, profiles, shell integration, runtime lifecycle, process discovery, inventory/removal, launch, search, and completions.
Within that split, public cross-module helpers are named without a leading underscore; underscore-prefixed helpers are intended to stay private to their defining module.
```sh
bash tools/build.sh # build standalone artifact
shellcheck src/corral.sh src/lib/*.sh # lint
bash tests/unit.sh # full unit suite
bash tests/smoke.sh # full smoke suite
```
```sh
bash tests/unit.sh test_parse_model_spec_without_quant # single unit test
bash tests/smoke.sh test_search_returns_results # single smoke test
```
`dist/corral` is generated output. Edit `src/corral.sh`, `src/lib/*.sh`, `src/templates/*.conf`, `src/launch/*.tmpl`, or `src/jq/search-quants.jq`, then rebuild.
## License
MIT License. Copyright (c) 2026 Michael Monteleone.
Corral is not affiliated with [Ollama](https://ollama.com), [llama.cpp](https://github.com/ggml-org/llama.cpp), [MLX](https://github.com/ml-explore/mlx-lm), or [Hugging Face](https://huggingface.co).