{"id":29878288,"url":"https://github.com/zerfoo/zerfoo","last_synced_at":"2026-06-13T03:08:12.503Z","repository":{"id":306752507,"uuid":"1026519278","full_name":"zerfoo/zerfoo","owner":"zerfoo","description":"Pure Go machine learning framework. Train, run, and serve ML models with go build. Zero CGo.","archived":false,"fork":false,"pushed_at":"2026-04-11T08:12:20.000Z","size":46160,"stargazers_count":5,"open_issues_count":1,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2026-04-11T10:29:03.414Z","etag":null,"topics":["autodiff","deep-learning","distributed-training","float16","float8","fp16","fp8","go","golang","graph-ml","machine-learning","ml-framework","neural-network","onnx","transformer"],"latest_commit_sha":null,"homepage":"https://zerfoo.feza.ai","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zerfoo.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":"security/apikey.go","support":"support/api.go","governance":null,"roadmap":"docs/roadmap-progress-2026-03-17.md","authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-26T03:59:49.000Z","updated_at":"2026-04-11T08:12:07.000Z","dependencies_parsed_at":"2025-07-27T12:35:32.457Z","dependency_job_id":"97a08d91-7438-41b3-80f4-05343c5dfac7","html_url":"https://github.com/zerfoo/zerfoo","commit_stats":null,"previous_names":["zerfoo/zerfoo"],"tags_count":75,"template":false,"template_full_name":null,"purl":"pkg:github/zerfoo/zerfoo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub
/repositories/zerfoo%2Fzerfoo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zerfoo%2Fzerfoo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zerfoo%2Fzerfoo/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zerfoo%2Fzerfoo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zerfoo","download_url":"https://codeload.github.com/zerfoo/zerfoo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zerfoo%2Fzerfoo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31757482,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-13T13:27:56.013Z","status":"ssl_error","status_checked_at":"2026-04-13T13:21:23.512Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["autodiff","deep-learning","distributed-training","float16","float8","fp16","fp8","go","golang","graph-ml","machine-learning","ml-framework","neural-network","onnx","transformer"],"created_at":"2025-07-31T07:01:46.067Z","updated_at":"2026-06-13T03:08:12.497Z","avatar_url":"https://github.com/zerfoo.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# zerfoo\n\nPure Go ML framework -- inference, training, and serving. 
Embed any GGUF model in your Go application with `go build ./...`.\n\n[![CI](https://github.com/zerfoo/zerfoo/actions/workflows/ci.yml/badge.svg)](https://github.com/zerfoo/zerfoo/actions/workflows/ci.yml)\n[![Go 1.26+](https://img.shields.io/badge/Go-1.26+-00ADD8.svg)](https://go.dev/)\n[![Go Reference](https://pkg.go.dev/badge/github.com/zerfoo/zerfoo.svg)](https://pkg.go.dev/github.com/zerfoo/zerfoo)\n[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)\n\n**241 tok/s** on Gemma 3 1B Q4_K_M -- up to 28% faster than Ollama. Faster on all 4 benchmarked models. Zero CGo. 41 model architectures (25 families). **Run models larger than RAM** via memory-mapped I/O (229B MiniMax-M2 on 128 GB). EAGLE speculative decoding with built-in head training, QuaRot quantization, Q4_K fused GEMV (14x faster), Multi-LoRA serving, BitNet ternary inference. CUDA graph capture, Apple Metal kernels. Time-series training 4.6x faster with CUDA graphs. Tabular ML and time-series forecasting built in.\n\n### Benchmarks\n\nDecode throughput comparison against [Ollama](https://ollama.com/) on NVIDIA DGX Spark GB10 (Grace Blackwell, sm_121, 128 GB LPDDR5x).\n\n| Model | Size | Quant | Zerfoo (tok/s) | Ollama (tok/s) | Ratio |\n|-------|------|-------|----------------|----------------|-------|\n| Gemma 3 1B | 1B | Q4_K_M | **241** | 188 | **1.28x** |\n| DeepSeek R1 1.5B | 1.5B | Q4_K_M | **190** | 174 | **1.09x** |\n| Llama 3.2 3B | 3B | Q4_K_M | **95** | 93 | **1.02x** |\n| Mistral 7B | 7B | Q5_K_M | **46** | 45 | **1.02x** |\n\nFaster than Ollama on all models. 
Up to 28% faster on small models, 2% faster at 7B.\n\n\u003cdetails\u003e\n\u003csummary\u003eMethodology\u003c/summary\u003e\n\n- **Hardware**: NVIDIA DGX Spark GB10 (Grace Blackwell, sm_121, 128 GB LPDDR5x unified memory)\n- **Prompt**: \"Explain the theory of relativity in simple terms.\"\n- **Tokens**: 128 decode tokens per run\n- **Sampling**: greedy (temperature = 0)\n- **Runs**: 3-run median\n- **Date**: 2026-03-31\n- **Ollama version**: 0.17.7\n- **Zerfoo version**: v1.38.4+ (ztensor v1.1.2+, 7 GPU regression fixes)\n- **Notes**: Zerfoo uses CUDA graph capture (184/185 instructions, 99.5%) with flash attention decode. Fused kernels: softmax+V multiply, repeat-interleave for GQA, fused AddRMSNorm, fused SwiGLU, fused QKNormRoPE, merged QKV, merged gate+up. Auto-disable mmap on CUDA for ARM64 compatibility. Q4_K/Q5_K/Q6_K/Q5_0 weights re-quantized to Q4_0 for fast vectorized GEMV.\n\n\u003c/details\u003e\n\n### Memory-Mapped Model Loading\n\nZerfoo memory-maps GGUF files by default — no flags, no configuration. The entire file (or all shards of a split GGUF) is mmap'd via `syscall.Mmap`. Tensor data stays on disk and is paged into RAM on demand by the OS. Split GGUF files (multiple shards) are detected and mapped automatically from any shard path.\n\n**Results on DGX Spark (128 GB RAM, CPU-only):**\n\n| Model | Params | Quant | File Size | Shards | Load time | Generates tokens | Ollama |\n|-------|--------|-------|-----------|--------|-----------|-----------------|--------|\n| MiniMax-M2 | 229B (MoE) | Q4_K_M | 128.8 GB | 3 | **6.3s** | ✅ yes | ❌ fails to load |\n\n```go\n// 128.8 GB model across 3 shards on a 128 GB machine.\n// 809 tensors mapped. No heap allocation for weights.\nm, _ := zerfoo.Load(\"./MiniMax-M2-Q4_K_M-00001-of-00003.gguf\")\ndefer m.Close()\nresult, _ := m.Generate(ctx, \"The meaning of life is\")\n// → \"a priori is something\"\n```\n\nStartup maps all shards and parses tensor metadata — no weight data is read until inference. 
The OS pages 128.8 GB of Q4_K_M quantized weights from NVMe as each matrix multiply streams through its superblocks. Ollama returns a 500 error on the same model on the same hardware.\n\n## Advanced Inference Features\n\n### EAGLE Speculative Decoding\n\nSelf-speculative decoding using a lightweight prediction head — no draft model needed. Based on [EAGLE-3](https://arxiv.org/abs/2503.01840).\n\n```go\nm, _ := zerfoo.Load(\"google/gemma-3-1b\")\ndefer m.Close()\nresult, _ := m.Generate(ctx, \"Explain quantum computing.\",\n    zerfoo.WithEAGLE(\"eagle-head.gguf\"),\n)\n```\n\nTrain your own EAGLE head:\n\n```bash\nzerfoo eagle-train --model model.gguf --corpus data.txt --output eagle-head.gguf --epochs 5\n```\n\n### QuaRot Weight Fusion\n\nHadamard rotation fused into weights at load time for uniform 4-bit quantization. Based on [QuaRot](https://arxiv.org/abs/2404.00456).\n\n```bash\nzerfoo run --quarot model.gguf \"Hello world\"\n```\n\n### Quantized KV Cache\n\nReduce KV cache memory by 6-7x with Q4 or Q3 quantization:\n\n```go\nresult, _ := m.Generate(ctx, prompt,\n    zerfoo.WithKVDtype(\"q4\"),  // 7.5x memory reduction\n)\n```\n\n### Tiered KV Cache\n\nAutomatically spill KV cache across three storage tiers as sequences grow — no OOM, no manual tuning:\n\n- **Hot**: uncompressed tensors in GPU/CPU memory (recent tokens)\n- **Warm**: compressed in CPU memory via block quantization\n- **Cold**: serialized to disk as binary files (oldest tokens)\n\nLayers are promoted and demoted automatically based on access frequency. 
Async prefetch moves cold layers back to hot before the decoder needs them.\n\n```go\nresult, _ := m.Generate(ctx, prompt,\n    zerfoo.WithTieredKV(generate.TieredKVStoreConfig{\n        ChunkSize:        64,  // warm-tier compression chunk size\n        DemoteThreshold:  2,   // demote layers accessed \u003c 2 times\n        PromoteThreshold: 8,   // promote layers accessed \u003e 8 times\n        // ColdDir: \"/var/cache/kv\" // optional: persist cold tier across calls\n    }),\n)\n```\n\nEnable it on the Model API via `zerfoo.WithTieredKV` (wraps `generate.WithTieredKV`). Useful for long-context inference where the KV cache exceeds GPU memory.\n\n### TransMLA — MHA-to-MLA Conversion\n\nConvert any MHA/GQA model to Multi-Head Latent Attention via SVD decomposition. Reduces KV cache by 80%+. Based on [TransMLA](https://arxiv.org/abs/2502.07864).\n\n```bash\nzerfoo transmla --rank 512 --input model.gguf --output model-mla.gguf\n```\n\n### Multi-LoRA Serving\n\nServe multiple LoRA adapters from a single base model. Per-request adapter selection via the OpenAI-compatible API:\n\n```bash\ncurl http://localhost:8080/v1/chat/completions \\\n  -d '{\"model\": \"gemma3-1b:my-lora\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}]}'\n```\n\n### BitNet Ternary Inference\n\nNative support for ternary weight models ({-1, 0, 1}) where matrix multiplication becomes integer addition/subtraction. Based on [BitNet b1.58](https://arxiv.org/abs/2402.17764).\n\n### Native Sparse Attention (NSA)\n\nHardware-aligned three-path sparse attention: coarse compression, fine-grained selection, and sliding window. Fused CUDA kernel. Based on [NSA](https://arxiv.org/abs/2502.11089).\n\n### Hybrid CPU/GPU MoE\n\nPlace shared MoE experts on GPU, offload routed experts to CPU with SIMD kernels. Predictive prefetching achieves 98% hit rate. 
Based on [KTransformers](https://arxiv.org/abs/2501.14018).\n\n### Audio Transcription\n\nTranscribe WAV audio to text using Whisper or Voxtral speech-to-text models. Audio is chunked into 30-second segments, mel spectrograms are extracted, and each chunk is decoded in parallel:\n\n```go\nwavData, _ := os.ReadFile(\"speech.wav\")\n\nm, _ := zerfoo.Load(\"openai/whisper-large-v3\")\ndefer m.Close()\n\ntext, err := m.Transcribe(context.Background(), wavData)\nif err != nil {\n    log.Fatal(err)\n}\nfmt.Println(text)\n```\n\nSupports 16 kHz mono WAV input. Whisper uses 80 mel bins; Voxtral uses 128. For long audio, the per-chunk transcripts are concatenated into a single result.\n\n## Quick Start\n\n```go\nm, _ := zerfoo.Load(\"google/gemma-3-4b\")  // downloads from HuggingFace\ndefer m.Close()\nresponse, _ := m.Chat(\"Explain Go interfaces in one sentence.\")\nfmt.Println(response)\n```\n\n## Installation\n\n```bash\ngo get github.com/zerfoo/zerfoo\n```\n\n## HuggingFace Download\n\n`Load` accepts HuggingFace model IDs. Models are downloaded and cached automatically:\n\n```go\n// Download by repo ID (defaults to Q4_K_M quantization)\nm, err := zerfoo.Load(\"google/gemma-3-4b\")\n\n// Specify a quantization variant\nm, err = zerfoo.Load(\"google/gemma-3-4b/Q8_0\")\n\n// Or load a local GGUF file\nm, err = zerfoo.Load(\"./models/gemma-3-1b.gguf\")\n```\n\n## Streaming\n\nStream tokens as they are generated via a channel:\n\n```go\nm, _ := zerfoo.Load(\"google/gemma-3-4b\")\ndefer m.Close()\n\nch, err := m.ChatStream(context.Background(), \"Tell me a joke.\")\nif err != nil {\n    log.Fatal(err)\n}\nfor tok := range ch {\n    if !tok.Done {\n        fmt.Print(tok.Text)\n    }\n}\nfmt.Println()\n```\n\n## Embeddings\n\nExtract L2-normalized embeddings and compute similarity:\n\n```go\nm, _ := zerfoo.Load(\"google/gemma-3-4b\")\ndefer m.Close()\n\nembeddings, _ := m.Embed([]string{\n    \"Go is a statically typed language.\",\n    \"Rust has a borrow checker.\",\n})\nscore := 
embeddings[0].CosineSimilarity(embeddings[1])\nfmt.Printf(\"similarity: %.4f\\n\", score)\n```\n\n## Structured Output\n\nConstrain model output to valid JSON matching a schema:\n\n```go\nimport \"github.com/zerfoo/zerfoo/generate/grammar\"\n\nm, _ := zerfoo.Load(\"google/gemma-3-4b\")\ndefer m.Close()\n\nschema := grammar.JSONSchema{\n    Type: \"object\",\n    Properties: map[string]*grammar.JSONSchema{\n        \"name\": {Type: \"string\"},\n        \"age\":  {Type: \"number\"},\n    },\n    Required: []string{\"name\", \"age\"},\n}\n\nresult, _ := m.Generate(context.Background(),\n    \"Generate a person named Alice who is 30.\",\n    zerfoo.WithSchema(schema),\n)\nfmt.Println(result.Text) // {\"name\": \"Alice\", \"age\": 30}\n```\n\n## Tool Calling\n\nDetect tool/function calls in model output (OpenAI-compatible):\n\n```go\nimport \"github.com/zerfoo/zerfoo/serve\"\n\nm, _ := zerfoo.Load(\"google/gemma-3-4b\")\ndefer m.Close()\n\ntools := []serve.Tool{{\n    Type: \"function\",\n    Function: serve.ToolFunction{\n        Name:        \"get_weather\",\n        Description: \"Get the current weather for a city\",\n        Parameters:  json.RawMessage(`{\"type\":\"object\",\"properties\":{\"city\":{\"type\":\"string\"}},\"required\":[\"city\"]}`),\n    },\n}}\n\nresult, _ := m.Generate(context.Background(),\n    \"What is the weather in Paris?\",\n    zerfoo.WithTools(tools...),\n)\n\nfor _, tc := range result.ToolCalls {\n    fmt.Printf(\"call %s(%s)\\n\", tc.FunctionName, tc.Arguments)\n}\n```\n\n## Supported Models\n\n### LLM Inference (41 architectures, 25 model families)\n\n| Architecture | Format | Special Features |\n|-------------|--------|-----------------|\n| Gemma 3 | GGUF Q4_K | Production. 
CUDA graph capture, 241 tok/s |\n| Gemma 3n | GGUF | Mobile-optimized variant |\n| Llama 3 | GGUF | RoPE theta=500K |\n| Llama 4 | GGUF | Latest generation |\n| Mistral | GGUF | Sliding window attention, 44 tok/s (7B Q4_K_M) |\n| Mixtral | GGUF | Mixture of Experts |\n| Qwen 2 | GGUF | Attention bias, RoPE theta=1M |\n| Phi 3/4 | GGUF | Partial rotary factor, Q2_K/Q3_K support |\n| DeepSeek V3 | GGUF | MLA + MoE (batched) |\n| Command R | GGUF | Cohere architecture |\n| Falcon | GGUF | Multi-query attention |\n| RWKV | GGUF | Linear attention |\n| GPT-2 | GGUF | TinyStories, learned position embeddings |\n| Nemotron-H | GGUF | Hybrid Mamba-2 + Attention (NVIDIA) |\n| Nemotron-Cascade-2 | GGUF | Hybrid Mamba-2 + Attention + MoE (30B-A3B) |\n| MiniMax M2 | GGUF | Sigmoid MoE (256 experts), QK norm |\n| OLMo 2 | GGUF | AI2 open language model |\n| InternLM 2 | GGUF | Shanghai AI Lab |\n| EXAONE | GGUF | LG AI Research |\n| StarCoder 2 | GGUF | Code generation, sliding window |\n| DBRX | GGUF | Fine-grained MoE (16 experts, top-4) |\n| GLM-4 / ChatGLM | GGUF | Zhipu AI, dense + MoE variants |\n| Kimi K2 | GGUF | Linear attention MoE (Moonshot AI) |\n| LFM2 | GGUF | Liquid Foundation Model, hybrid MoE |\n| Mamba / Mamba 3 | GGUF | State space models (MIMO SSM) |\n| Jamba | GGUF | Hybrid Mamba-Transformer |\n| Whisper | GGUF | Audio transcription |\n| Voxtral | GGUF | Mistral speech-to-text (encoder-projector-decoder) |\n| LLaVA | GGUF | Vision-language |\n| Qwen-VL | GGUF | Vision-language |\n\nNew architectures are auto-detected from GGUF metadata.\n\n### Tabular ML\n\n| Architecture | Package | Use Case |\n|-------------|---------|----------|\n| MLP / Ensemble | `tabular` | Baseline tabular prediction |\n| FTTransformer | `tabular` | Attention-based tabular |\n| TabNet | `tabular` | Attentive feature selection |\n| SAINT | `tabular` | Self-attention + inter-sample |\n| TabResNet | `tabular` | Residual tabular networks |\n\n### Time-Series Forecasting\n\n| Architecture 
| Package | Use Case |\n|-------------|---------|----------|\n| TFT | `timeseries` | Temporal Fusion Transformer |\n| N-BEATS | `timeseries` | Basis expansion forecasting |\n| PatchTST | `timeseries` | Patch-based transformer |\n\n### IBM Granite Time Series\n\n| Architecture | Format | Use Case |\n|-------------|--------|----------|\n| Granite TTM | GGUF | Zero-shot/few-shot time series forecasting |\n| Granite FlowState | GGUF | Continuous forecasting, timescale-invariant |\n| Granite TSPulse | GGUF | Anomaly detection, classification, imputation |\n\nGranite Time Series models are converted from HuggingFace using `granite2gguf`\n(part of `zonnx`). Supported tasks: forecasting, anomaly detection,\nclassification, imputation, and embedding extraction.\n\n## Training\n\nTrain tabular and time-series models with built-in AdamW, learning rate schedulers, and early stopping:\n\n```go\nimport \"github.com/zerfoo/zerfoo/tabular\"\n\nmodel := tabular.NewEnsemble[float32](engine, tabular.EnsembleConfig{\n    InputDim:  10,\n    OutputDim: 1,\n    Models:    3,\n})\ntrainer := tabular.NewTrainer(model, engine, tabular.TrainerConfig{\n    LR:     0.001,\n    Epochs: 50,\n})\ntrainer.Fit(ctx, trainX, trainY)\npredictions, _ := model.Predict(ctx, testX)\n```\n\n### Fused SDPA graph node\n\n`layers/attention.FusedSDPA[T]` wraps the existing `ScaledDotProductAttention`\nas a `graph.Node[T]` so callers can compose fused scale + softmax + matmul\nattention inside autograd graphs without re-implementing the math:\n\n```go\nimport (\n    \"github.com/zerfoo/zerfoo/layers/attention\"\n)\n\n// Causal (decoder) SDPA, head_dim=64.\nsdpa := attention.NewFusedSDPA[float32](engine, 64)\n\n// Bidirectional (encoder) SDPA with explicit Q/KV head counts.\nenc := attention.NewFusedSDPA[float32](engine, 64,\n    attention.WithFusedSDPABidirectional[float32](),\n    attention.WithFusedSDPAHeadCounts[float32](8, 8),\n)\n\n// Forward accepts (Q, K, V) or (Q, K, V, mask); Backward returns 
gradients\n// for [Q, K, V] (and a nil slot for mask when one was supplied).\nout, err := sdpa.Forward(ctx, q, k, v)\n```\n\nThe node is numerically equivalent to the unfused\n`Q @ Kᵀ → scale → softmax → dropout → V` chain (fp32 ≤ 1e-5 fwd / 1e-5 bwd,\nfp64 ≤ 1e-12 fwd / 1e-10 bwd; see `layers/attention/fused_sdpa_node_test.go`).\n\n## CLI\n\n```bash\ngo install github.com/zerfoo/zerfoo/cmd/zerfoo@latest\n```\n\n| Command             | Description                                                             |\n|---------------------|-------------------------------------------------------------------------|\n| `predict`           | Perform model inference on data using configurable model and data providers |\n| `tokenize`          | Tokenize text using the Zerfoo tokenizer                                |\n| `worker`            | Start a distributed training worker                                     |\n| `pull`              | Download and cache a model                                              |\n| `list`              | List cached models                                                      |\n| `rm`                | Remove a cached model                                                   |\n| `run`               | Run interactive chat with a model                                       |\n| `serve`             | Start an OpenAI-compatible inference server                             |\n| `version`           | Print the Zerfoo version                                                |\n| `automl`            | Run automated hyperparameter optimization                               |\n| `train`             | Train a model locally or distributed across multiple GPUs               |\n| `guard`             | Evaluate content safety using Granite Guardian                          |\n| `sentiment`         | Run sentiment classification on text                                    |\n| `finetune-sentiment`| Fine-tune a sentiment classification model                             
 |\n| `transmla`          | Convert MHA GGUF weights to multi-head latent attention (MLA) via truncated SVD |\n| `eagle-train`       | Train an EAGLE speculative decoding head                                |\n| `transcribe`        | Transcribe audio to text using a speech-to-text model                   |\n| `transmla-validate` | Compare perplexity between original and TransMLA-converted models       |\n\n## Examples\n\nSee the [`examples/`](examples/) directory for runnable programs:\n\n- **[chat](examples/chat/)** -- interactive chatbot CLI\n- **[rag](examples/rag/)** -- retrieval-augmented generation with embeddings\n- **[json-output](examples/json-output/)** -- grammar-guided structured JSON output\n- **[embedding](examples/embedding/)** -- embed inference in an HTTP server\n- **[api-server](examples/api-server/)** -- standalone API server\n- **[inference](examples/inference/)** -- basic text generation\n- **[streaming](examples/streaming/)** -- token streaming\n- **[fine-tuning](examples/fine-tuning/)** -- LoRA fine-tuning\n- **[automl](examples/automl/)** -- automated model selection\n- **[timeseries](examples/timeseries/)** -- time-series forecasting\n- **[distributed-training](examples/distributed-training/)** -- multi-node training\n- **[agentic-tool-use](examples/agentic-tool-use/)** -- function calling agent\n- **[audio-transcription](examples/audio-transcription/)** -- Whisper transcription\n\n## Documentation\n\nFull documentation at **[zerfoo.feza.ai/docs/](https://zerfoo.feza.ai/docs/)**\n\n- **[Getting Started](https://zerfoo.feza.ai/docs/getting-started/installation/)** -- install, pull a model, run inference\n- **[Tutorials](https://zerfoo.feza.ai/docs/tutorials/)** -- step-by-step guides\n- **[API Reference](https://zerfoo.feza.ai/docs/api/)** -- generate, inference, serve APIs\n- **[Cookbooks](https://zerfoo.feza.ai/docs/cookbooks/)** -- 12 runnable code recipes\n- **[Architecture](https://zerfoo.feza.ai/docs/architecture/)** -- GPU setup, 
architecture overview\n- **[Benchmarks](https://zerfoo.feza.ai/docs/reference/benchmarks/)** -- throughput numbers\n- **[Blog](https://zerfoo.feza.ai/docs/blog/)** -- development updates and deep dives\n- **[CONTRIBUTING.md](CONTRIBUTING.md)** -- how to contribute\n\n## License\n\nApache 2.0\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzerfoo%2Fzerfoo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzerfoo%2Fzerfoo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzerfoo%2Fzerfoo/lists"}