https://github.com/zerfoo/zonnx
ONNX-to-GGUF model converter CLI. Convert HuggingFace ONNX models to GGUF format for use with Zerfoo and llama.cpp. CGo-free, single static binary.
- Host: GitHub
- URL: https://github.com/zerfoo/zonnx
- Owner: zerfoo
- License: apache-2.0
- Created: 2025-08-21T01:30:37.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2026-03-26T15:36:47.000Z (about 2 months ago)
- Last Synced: 2026-03-27T00:33:29.429Z (about 2 months ago)
- Language: Go
- Size: 179 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# zonnx
Standalone CLI for converting ONNX and SafeTensors models to GGUF format. Ships as a single static binary — zero CGo.
Part of the [Zerfoo](https://github.com/zerfoo) ML ecosystem.
## Features
- **ONNX / SafeTensors to GGUF** — produce portable GGUF files compatible with [zerfoo](https://github.com/zerfoo/zerfoo) and llama.cpp
- **Optional quantization** — quantize weights to Q4_0 or Q8_0 as part of the `convert` command via `--quantize`
- **HuggingFace integration** — download ONNX models and tokenizer files in one step
- **Model inspection** — introspect metadata, IOs, nodes, and tensor stats for ONNX and GGUF files
- **Architecture-aware mappings** — tensor name and metadata mappings tuned per model family
- **CGo-free** — single static binary, easy to distribute and run in minimal containers
## Installation
```bash
go install github.com/zerfoo/zonnx/cmd/zonnx@latest
```
Or build from source:
```bash
go build -o zonnx ./cmd/zonnx
```
Requires Go 1.26 or later. Building with `CGO_ENABLED=0` is supported.
## Quick Start
```bash
# Download an ONNX model from HuggingFace
zonnx download --model google/gemma-2-2b-it --output ./models
# Convert ONNX to GGUF
zonnx convert --arch gemma --output ./models/model.gguf ./models/model.onnx
# Convert SafeTensors to GGUF
zonnx convert --format safetensors --arch bert --output ./models/model.gguf ./models/bert-dir/
# Convert with quantization
zonnx convert --quantize q4_0 --output ./models/model-q4.gguf ./models/model.onnx
# Inspect a model file
zonnx inspect --pretty ./models/model.gguf
```
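To sanity-check a converted file without any external tooling, the fixed-size GGUF header can be read directly. The following is a minimal sketch, not part of zonnx: it assumes a little-endian GGUF file of version 2 or later (where both counts are 64-bit), and the names `ggufHeader` and `parseGGUFHeader` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"os"
)

// headerSize is magic (4) + version (4) + tensor count (8) + metadata KV count (8).
const headerSize = 24

// ggufHeader holds the fixed-size fields at the start of a GGUF v2+ file.
type ggufHeader struct {
	Version     uint32
	TensorCount uint64
	KVCount     uint64
}

// parseGGUFHeader decodes the first 24 bytes of a GGUF file.
// It does not read metadata values or tensor data.
func parseGGUFHeader(b []byte) (ggufHeader, error) {
	if len(b) < headerSize {
		return ggufHeader{}, errors.New("input shorter than GGUF header")
	}
	if string(b[:4]) != "GGUF" {
		return ggufHeader{}, errors.New("bad magic: not a GGUF file")
	}
	return ggufHeader{
		Version:     binary.LittleEndian.Uint32(b[4:8]),
		TensorCount: binary.LittleEndian.Uint64(b[8:16]),
		KVCount:     binary.LittleEndian.Uint64(b[16:24]),
	}, nil
}

// run opens the file at path and prints its header fields.
func run(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, headerSize)
	if _, err := io.ReadFull(f, buf); err != nil {
		return err
	}
	h, err := parseGGUFHeader(buf)
	if err != nil {
		return err
	}
	fmt.Printf("version=%d tensors=%d metadata_kv=%d\n", h.Version, h.TensorCount, h.KVCount)
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: ggufhead <file.gguf>")
		os.Exit(2)
	}
	if err := run(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

For anything beyond the header (metadata values, tensor shapes), `zonnx inspect` is the intended tool.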
## Supported Architectures
| Architecture | `--arch` | Input Formats | Notes |
|-------------|----------|---------------|-------|
| Llama | `llama` (default) | ONNX | Llama 3, Code Llama |
| Gemma | `gemma` | ONNX | Gemma, Gemma 2, Gemma 3 |
| BERT | `bert` | ONNX, SafeTensors | Classification, embeddings |
| RoBERTa | `roberta` | ONNX, SafeTensors | Same layer structure as BERT |
Any architecture string can be passed via `--arch`. Metadata mapping is generic; tensor name mapping currently covers decoder (Llama-style) and encoder (BERT/RoBERTa) models.
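To illustrate what a decoder-style tensor name mapping involves, here is a small sketch. It is not zonnx's actual mapping table: the source names assume HuggingFace-style Llama parameter names, the target names follow the naming commonly used in llama.cpp GGUF files, and `mapTensorName` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// layerRe splits "model.layers.<N>.<rest>" into the layer index and the remainder.
var layerRe = regexp.MustCompile(`^model\.layers\.(\d+)\.(.+)$`)

// perLayer maps per-block parameter names to GGUF-style suffixes (illustrative subset).
var perLayer = map[string]string{
	"self_attn.q_proj.weight":         "attn_q.weight",
	"self_attn.k_proj.weight":         "attn_k.weight",
	"self_attn.v_proj.weight":         "attn_v.weight",
	"self_attn.o_proj.weight":         "attn_output.weight",
	"mlp.gate_proj.weight":            "ffn_gate.weight",
	"mlp.up_proj.weight":              "ffn_up.weight",
	"mlp.down_proj.weight":            "ffn_down.weight",
	"input_layernorm.weight":          "attn_norm.weight",
	"post_attention_layernorm.weight": "ffn_norm.weight",
}

// topLevel maps parameters that live outside the repeated blocks.
var topLevel = map[string]string{
	"model.embed_tokens.weight": "token_embd.weight",
	"model.norm.weight":         "output_norm.weight",
	"lm_head.weight":            "output.weight",
}

// mapTensorName returns the GGUF-style name and true, or "" and false
// when the input name is not covered by the tables above.
func mapTensorName(name string) (string, bool) {
	if out, ok := topLevel[name]; ok {
		return out, true
	}
	m := layerRe.FindStringSubmatch(name)
	if m == nil {
		return "", false
	}
	suffix, ok := perLayer[m[2]]
	if !ok {
		return "", false
	}
	return "blk." + m[1] + "." + suffix, true
}

func main() {
	for _, n := range []string{
		"model.embed_tokens.weight",
		"model.layers.0.self_attn.q_proj.weight",
		"model.layers.0.unknown.weight",
	} {
		out, ok := mapTensorName(n)
		fmt.Printf("%s -> %q (mapped=%v)\n", n, out, ok)
	}
}
```

Returning an explicit `false` for unmapped names lets a caller decide whether to skip the tensor, keep the original name, or fail the conversion.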
## Commands
### `convert`
```
zonnx convert [flags] <input>
```
| Flag | Default | Description |
|------|---------|-------------|
| `--output` | `.gguf` | Output GGUF file path |
| `--arch` | `llama` | Model architecture for metadata/tensor mapping |
| `--format` | `onnx` | Input format: `onnx` or `safetensors` |
| `--quantize` | (none) | Quantize weights: `q4_0` or `q8_0` |
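For readers unfamiliar with `q8_0`, the idea is block-wise quantization: weights are split into blocks of 32 values, and each block stores one scale plus 32 signed 8-bit integers. The sketch below shows the arithmetic only. It is not zonnx's implementation, it keeps the scale as `float32` (the GGUF on-disk format stores it as a 16-bit float), and the names `q8Block`, `quantizeQ8`, and `dequantize` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// blockSize is the number of weights that share one scale in Q8_0.
const blockSize = 32

// q8Block is one quantized block: a scale and 32 int8 values.
type q8Block struct {
	Scale  float32
	Quants [blockSize]int8
}

// quantizeQ8 quantizes x in blocks of 32 using scale = max|x| / 127.
// The input length must be a multiple of blockSize.
func quantizeQ8(x []float32) ([]q8Block, error) {
	if len(x)%blockSize != 0 {
		return nil, fmt.Errorf("length %d is not a multiple of %d", len(x), blockSize)
	}
	blocks := make([]q8Block, len(x)/blockSize)
	for i := range blocks {
		chunk := x[i*blockSize : (i+1)*blockSize]
		var amax float32
		for _, v := range chunk {
			if a := float32(math.Abs(float64(v))); a > amax {
				amax = a
			}
		}
		d := amax / 127
		blocks[i].Scale = d
		if d == 0 {
			continue // all-zero block: quants stay zero
		}
		for j, v := range chunk {
			blocks[i].Quants[j] = int8(math.Round(float64(v / d)))
		}
	}
	return blocks, nil
}

// dequantize reconstructs approximate float values from one block.
func dequantize(b q8Block) [blockSize]float32 {
	var out [blockSize]float32
	for j, q := range b.Quants {
		out[j] = float32(q) * b.Scale
	}
	return out
}

func main() {
	x := make([]float32, blockSize)
	for i := range x {
		x[i] = float32(i) - 16 // values from -16 to 15
	}
	blocks, err := quantizeQ8(x)
	if err != nil {
		panic(err)
	}
	back := dequantize(blocks[0])
	fmt.Printf("scale=%g first=%g restored=%g\n", blocks[0].Scale, x[0], back[0])
}
```

`q4_0` follows the same block-wise pattern with 4-bit values, trading more accuracy for a smaller file.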
### `download`
```
zonnx download --model <model-id> [--output <dir>] [--api-key <key>]
```
The `--api-key` flag takes precedence over the `HF_API_KEY` environment variable.
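The precedence rule can be sketched in a few lines of Go. This is an illustration of the documented behavior, not zonnx's source; `resolveAPIKey` is a hypothetical helper, and treating an empty flag value as "not set" is an assumption.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// resolveAPIKey returns the flag value when it is non-empty,
// and otherwise falls back to the environment value.
func resolveAPIKey(flagValue, envValue string) string {
	if flagValue != "" {
		return flagValue
	}
	return envValue
}

func main() {
	apiKey := flag.String("api-key", "", "HuggingFace API key (overrides HF_API_KEY)")
	flag.Parse()

	key := resolveAPIKey(*apiKey, os.Getenv("HF_API_KEY"))
	if key == "" {
		fmt.Println("no API key configured")
		return
	}
	fmt.Println("API key configured")
}
```

In practice this means `HF_API_KEY` can be exported once in the shell, with `--api-key` reserved for one-off overrides.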
### `inspect`
```
zonnx inspect [--type onnx|gguf] [--pretty] <file>
```
When `--type` is omitted, the file type is inferred from the file extension.
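Extension-based inference of this kind might look like the sketch below. The function name `inferType` and the case-insensitive match are assumptions made for illustration, not details taken from zonnx.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// inferType maps a file path to "onnx" or "gguf" based on its extension.
// It returns an error for any other extension so the caller can ask
// for an explicit --type instead of guessing.
func inferType(path string) (string, error) {
	switch ext := strings.ToLower(filepath.Ext(path)); ext {
	case ".onnx":
		return "onnx", nil
	case ".gguf":
		return "gguf", nil
	default:
		return "", fmt.Errorf("cannot infer type from extension %q; pass --type", ext)
	}
}

func main() {
	for _, p := range []string{"./models/model.gguf", "./models/model.onnx", "./models/model.bin"} {
		t, err := inferType(p)
		if err != nil {
			fmt.Println(p, "->", err)
			continue
		}
		fmt.Println(p, "->", t)
	}
}
```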
## Metadata Mapped
These HuggingFace `config.json` fields are mapped to GGUF metadata for all architectures:
| config.json field | GGUF key |
|-------------------|----------|
| `hidden_size` | `{arch}.embedding_length` |
| `num_hidden_layers` | `{arch}.block_count` |
| `num_attention_heads` | `{arch}.attention.head_count` |
| `num_key_value_heads` | `{arch}.attention.head_count_kv` |
| `intermediate_size` | `{arch}.feed_forward_length` |
| `vocab_size` | `{arch}.vocab_size` |
| `max_position_embeddings` | `{arch}.context_length` |
| `rms_norm_eps` | `{arch}.attention.layer_norm_rms_epsilon` |
| `rope_theta` | `{arch}.rope.freq_base` |
BERT/RoBERTa additionally map `layer_norm_eps`, `num_labels`, and `pooler_type`.
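The table above amounts to a lookup from `config.json` field names to GGUF key suffixes, prefixed with the architecture string. The following sketch shows that idea under stated assumptions: `mapMetadata` and `configToGGUF` are hypothetical names, only the fields listed in the table are handled, and values are passed through as decoded JSON rather than converted to GGUF's typed values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// configToGGUF mirrors the table above: config.json field -> GGUF key suffix.
var configToGGUF = map[string]string{
	"hidden_size":             "embedding_length",
	"num_hidden_layers":       "block_count",
	"num_attention_heads":     "attention.head_count",
	"num_key_value_heads":     "attention.head_count_kv",
	"intermediate_size":       "feed_forward_length",
	"vocab_size":              "vocab_size",
	"max_position_embeddings": "context_length",
	"rms_norm_eps":            "attention.layer_norm_rms_epsilon",
	"rope_theta":              "rope.freq_base",
}

// mapMetadata parses a config.json document and returns GGUF-style keys
// of the form "<arch>.<suffix>". Fields absent from the config are skipped,
// and fields not in the table are ignored.
func mapMetadata(arch string, configJSON []byte) (map[string]any, error) {
	var cfg map[string]any
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return nil, fmt.Errorf("parse config.json: %w", err)
	}
	out := make(map[string]any)
	for field, suffix := range configToGGUF {
		if v, ok := cfg[field]; ok {
			out[arch+"."+suffix] = v // JSON numbers arrive as float64
		}
	}
	return out, nil
}

func main() {
	cfg := []byte(`{"hidden_size": 2048, "num_hidden_layers": 16, "rope_theta": 10000.0}`)
	meta, err := mapMetadata("llama", cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(meta)
}
```

A real converter would additionally cast each value to the integer or float type that GGUF expects for that key.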
## Design Principles
- **GGUF-only output** — emits only GGUF files, no runtime code
- **No `zerfoo` imports** — strictly decoupled from the inference runtime
- **Explicit schema** — GGUF output captures all model attributes directly
## Development
```bash
make test # go test ./...
make lint # golangci-lint run
make format # gofmt + goimports
```
## License
Apache 2.0