{"id":49990585,"url":"https://github.com/juliusbrussee/cavegemma","last_synced_at":"2026-05-23T08:00:46.146Z","repository":{"id":358530108,"uuid":"1240725952","full_name":"JuliusBrussee/cavegemma","owner":"JuliusBrussee","description":"LoRA fine-tune Gemma 4 31B to speak caveman-mode natively. Style: github.com/JuliusBrussee/caveman","archived":false,"fork":false,"pushed_at":"2026-05-17T21:10:22.000Z","size":161,"stargazers_count":13,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-20T06:38:56.547Z","etag":null,"topics":["caveman","fine-tuning","gemma","llm","lora","qlora","style-transfer","unsloth"],"latest_commit_sha":null,"homepage":"https://huggingface.co/JBrussee/gemma-4-31B-caveman","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JuliusBrussee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-16T13:46:14.000Z","updated_at":"2026-05-19T17:30:31.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/JuliusBrussee/cavegemma","commit_stats":null,"previous_names":["juliusbrussee/finetune-caveman"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/JuliusBrussee/cavegemma","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliusBrussee%2Fcavegemma","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliusBrussee%2Fcavegemma/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliusBrussee%2Fcavegemma/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliusBrussee%2Fcavegemma/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JuliusBrussee","download_url":"https://codeload.github.com/JuliusBrussee/cavegemma/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JuliusBrussee%2Fcavegemma/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33290928,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T02:57:32.698Z","status":"ssl_error","status_checked_at":"2026-05-21T02:57:31.990Z","response_time":62,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["caveman","fine-tuning","gemma","llm","lora","qlora","style-transfer","unsloth"],"created_at":"2026-05-19T04:55:16.085Z","updated_at":"2026-05-22T07:01:24.344Z","avatar_url":"https://github.com/JuliusBrussee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://em-content.zobj.net/source/apple/391/rock_1faa8.png\" width=\"120\" /\u003e\n\u003c/p\u003e\n\n\u003ch1 align=\"center\"\u003ecavegemma\u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cstrong\u003ewhy use many token when few do trick — now baked in weights\u003c/strong\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://huggingface.co/JBrussee/gemma-4-31B-caveman\"\u003e\u003cimg src=\"https://img.shields.io/badge/🤗%20Model-Gemma%204%2031B%20Caveman-yellow\" alt=\"HF Model\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://huggingface.co/JBrussee/gemma-4-31B-caveman-lora\"\u003e\u003cimg src=\"https://img.shields.io/badge/🤗%20Adapter-LoRA%20534MB-blue\" alt=\"HF LoRA\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/JuliusBrussee/caveman\"\u003e\u003cimg src=\"https://img.shields.io/badge/style-caveman-orange\" alt=\"Style\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-green\" alt=\"License\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"#before--after\"\u003eBefore/After\u003c/a\u003e •\n  \u003ca href=\"#quick-start\"\u003eQuick start\u003c/a\u003e •\n  \u003ca href=\"#training-summary\"\u003eTraining\u003c/a\u003e •\n  \u003ca href=\"#eval-results\"\u003eEval\u003c/a\u003e •\n  \u003ca href=\"#reproduce\"\u003eReproduce\u003c/a\u003e\n\u003c/p\u003e\n\n---\n\n\u003e **why use many token when few do trick**\n\nFine-tune of **Gemma 4 31B** that speak [caveman](https://github.com/JuliusBrussee/caveman) natively — no skill file, no system prompt, no `/caveman` toggle. Drop articles. Drop filler. Drop pleasantries. Keep code byte-exact. Keep error strings exact. Brain big. Mouth small. Weights ship MIT-friendly under Gemma terms.\n\n## Before / After\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd width=\"50%\"\u003e\n\n### 🗣️ Base Gemma 4 31B\n\n\u003e \"The reason your React component re-renders every time the parent updates is because by default, React re-renders all children when a parent re-renders, regardless of whether their props actually changed. To prevent this, you can wrap the component in `React.memo`. However, keep in mind that `memo` does a shallow prop comparison...\"\n\n\u003c/td\u003e\n\u003ctd width=\"50%\"\u003e\n\n### 🪨 Caveman Gemma 4 31B\n\n\u003e \"Parent re-render → child re-render by default. Props 'change' each render if inline obj/array/fn → new ref → child sees new props. Fix: wrap child `React.memo(Child)`, stabilize props with `useMemo`/`useCallback`, lift state. Check: if memo'd but still re-renders, inline prop is culprit.\"\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n**why use many token when few do trick.** Same fix. Same brain. Less mouth.\n\n```\n┌─────────────────────────────────────┐\n│  COMPRESSION (eval)    ████████ 65% │\n│  CODE FENCE EXACT      ████████ 99% │\n│  SEMANTIC SIM          ████████ 94% │\n│  ARTICLE DENSITY       █░░░░░░░  1% │\n│  VIBES                 ████████ OOG │\n└─────────────────────────────────────┘\n```\n\n## Shipped weights\n\nTwo flavors. Pick by VRAM.\n\n| Repo | Format | Size | What it is |\n|---|---|---|---|\n| [`JBrussee/gemma-4-31B-caveman`](https://huggingface.co/JBrussee/gemma-4-31B-caveman) | bf16 merged | 62.5 GB | Full Gemma 4 31B, caveman baked in. Drop-in. |\n| [`JBrussee/gemma-4-31B-caveman-lora`](https://huggingface.co/JBrussee/gemma-4-31B-caveman-lora) | LoRA adapter | 534 MB | Stack on `google/gemma-4-31B-it`. Light download. |\n\n## Quick start\n\n### Merged model — no extra setup\n\n```python\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nimport torch\n\ntok = AutoTokenizer.from_pretrained(\"JBrussee/gemma-4-31B-caveman\")\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"JBrussee/gemma-4-31B-caveman\",\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n)\n\nmsgs = [{\"role\": \"user\", \"content\": \"Why does my React component re-render every time the parent updates?\"}]\ninputs = tok.apply_chat_template(msgs, return_tensors=\"pt\", add_generation_prompt=True).to(model.device)\nout = model.generate(inputs, max_new_tokens=300, do_sample=False)\nprint(tok.decode(out[0, inputs.shape[1]:], skip_special_tokens=True))\n```\n\n### LoRA adapter on base — lighter download\n\n```python\nfrom peft import PeftModel\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\nimport torch\n\nbase = AutoModelForCausalLM.from_pretrained(\n    \"google/gemma-4-31B-it\",\n    torch_dtype=torch.bfloat16,\n    device_map=\"auto\",\n)\ntok = AutoTokenizer.from_pretrained(\"google/gemma-4-31B-it\")\nmodel = PeftModel.from_pretrained(base, \"JBrussee/gemma-4-31B-caveman-lora\")\n```\n\nNo system prompt needed. Ask question. Model talk caveman.\n\n## What model do\n\nRewrite or answer technical question in caveman style. Source-of-truth ruleset = the [JuliusBrussee/caveman](https://github.com/JuliusBrussee/caveman) skill (MIT). Same rules. Now welded into weights.\n\nAuth bug example (verbose → caveman):\n\n\u003e **In:** \"Sure! I'd be happy to help. The issue you're experiencing is most likely caused by your authentication middleware not properly validating the token expiry. Let me take a look...\"\n\u003e\n\u003e **Out:** \"Bug in auth middleware. Token expiry check use `\u003c` not `\u003c=`. Fix:\"\n\n## Training summary\n\n| Field | Value |\n|---|---|\n| Base | `google/gemma-4-31B-it` |\n| Method | QLoRA NF4 + double-quant + bf16 compute |\n| LoRA | rank 16, α 32, dropout 0, targets all linear |\n| Dataset | 1750 train + 193 eval (debug · review · refactor · dialogue · qa) |\n| Schedule | 3 epochs, lr 2e-4 cosine, batch 2 × grad accum 8 (eff 16), `completion_only_loss=True` |\n| Hardware | RunPod RTX PRO 6000 Blackwell 96 GB, ~$1.89/hr |\n| Wall time | ~50 min (Unsloth + TRL 0.17) |\n| Final loss | train 0.024 · eval 0.72 · eval acc 81.5% |\n\nCost end-to-end: **~$4-5 pod time**. Less than lunch.\n\n## Eval results\n\n193-pair holdout, tagged by source category. `code_fence_match` = fraction of source code fences appearing byte-exact in target.\n\n| Category | n | compression | article density | code_fence | semantic_sim |\n|---|---:|---:|---:|---:|---:|\n| dialogue | 28 | 0.59 | 0.020 | 1.000 | 0.91 |\n| debug | 34 | 0.92 | 0.009 | 0.995 | 0.98 |\n| refactor | 27 | 0.92 | 0.005 | 0.963 | 0.98 |\n| qa | 104 | 0.65 | 0.007 | 1.000 | 0.92 |\n\n**Read the numbers:**\n- ✅ Code preservation excellent — 96-100% fence-exact\n- ✅ Article density crushed — 0.5-2% (English baseline ~8%)\n- ✅ Semantic preservation strong — 91-98%\n- ⚠️ Compression weaker than gold pairs — model lands 0.6-0.9, gold sits 0.3-0.5. Filter accepted ≤1.0× source; tighten to ≤0.7 next run, push harder.\n\n## Repo layout\n\n```\ncavegemma/\n├── data/\n│   ├── seeds/                 caveman repo snapshots (SKILL.md, eval prompts)\n│   ├── sources/               per-source HuggingFace loaders\n│   ├── build_corpus.py        orchestrator (6 sources → corpus_raw.jsonl)\n│   ├── synthesize.py          claude/codex CLI driver, two-step rewrite, resumable\n│   ├── filter.py              fence-integrity + dedup + compression band\n│   ├── split.py               90/10 split with seed-pair pinning\n│   └── prompts/, out/         (out gitignored)\n├── training/\n│   ├── train_unsloth.py       Unsloth + TRL SFT trainer, resume from checkpoint\n│   ├── runpod_bootstrap.sh    pip + auth bootstrap for fresh pod\n│   └── config.toml            single source of truth for hyperparams\n├── eval/\n│   ├── metrics.py             compression / article-drop / code-fence / semantic_sim\n│   ├── run_eval.py            score adapter on holdout + workflow prompts\n│   ├── judge.py               LLM-judge via claude CLI on 20 holdouts\n│   └── workflow_prompts.jsonl 10 hand-curated workflow eval prompts\n├── scripts/\n│   ├── infer.py               smoke-test against caveman eval prompts\n│   └── push_to_hub.py         publish adapter + model card\n└── artifacts/                 (gitignored)\n```\n\n## Reproduce\n\nEnd-to-end. ~6-8 hours wall, ~$4-5 pod.\n\n```bash\n# 1. Local setup\nuv sync\nuv run python data/extract_seeds.py            # 20 gold seed pairs\nuv run python data/build_corpus.py             # 3000 rows from 6 HF sources\n\n# 2. Synthesis (Claude Code or Codex CLI required)\nuv run python data/synthesize.py --backend claude --workers 3      # or\nuv run python data/synthesize.py --backend codex --workers 3\n\n# 3. Filter + split\nuv run python data/filter.py --in data/out/raw_pairs.jsonl --out data/out/clean_pairs.jsonl\nuv run python data/split.py --in data/out/clean_pairs.jsonl\n\n# 4. RunPod H100 / RTX PRO 6000 — rsync, ssh, bootstrap, train\nrsync -avz --exclude='.git' --exclude='.venv' -e \"ssh -p \u003cport\u003e -i ~/.ssh/id_ed25519\" ./ root@\u003cpod\u003e:/workspace/cavegemma/\nssh -i ~/.ssh/id_ed25519 -p \u003cport\u003e root@\u003cpod\u003e \"\n  export HF_TOKEN=...\n  export WANDB_API_KEY=...\n  cd /workspace/cavegemma\n  bash training/runpod_bootstrap.sh\n  python training/train_unsloth.py --config training/config.toml\n\"\n\n# 5. Eval + ship\npython eval/run_eval.py --adapter artifacts/adapter --eval data/out/eval.jsonl --workflow eval/workflow_prompts.jsonl --out artifacts/eval_predictions.jsonl\npython scripts/push_to_hub.py --adapter artifacts/adapter --repo \u003chf-user\u003e/gemma-4-31B-caveman-lora\n```\n\n## Datasets\n\nAll permissively licensed. 6 sources in, 1750 train + 193 eval out.\n\n| Source | License | Pulled | Used for |\n|---|---|---:|---|\n| [`OpenAssistant/oasst2`](https://huggingface.co/datasets/OpenAssistant/oasst2) | Apache 2.0 | 400 | Multi-turn dialogue |\n| [`princeton-nlp/SWE-bench_Verified`](https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified) | research-permissive | 400 | Debug-session narratives |\n| [`ronantakizawa/github-codereview`](https://huggingface.co/datasets/ronantakizawa/github-codereview) | permissive subset | 400 | Code review |\n| [`bigcode/commitpackft`](https://huggingface.co/datasets/bigcode/commitpackft) | MIT/Apache subset | 300 | Refactor walkthroughs |\n| [`theblackcat102/evol-codealpaca-v1`](https://huggingface.co/datasets/theblackcat102/evol-codealpaca-v1) | Apache 2.0 | 1200 | Short technical Q\u0026A |\n| [`HuggingFaceH4/ultrachat_200k`](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | MIT | 300 | Short Q\u0026A overflow |\n\nCaveman side synthesized via Claude Code (`claude -p`) and Codex CLI (`codex exec` with GPT-5.5), routed through the canonical [SKILL.md](https://github.com/JuliusBrussee/caveman/blob/main/skills/caveman/SKILL.md) ruleset. Two-step rewrite + fence-integrity filter.\n\n## Limitations\n\n- **Compression weaker than gold caveman.** Model averages 0.6-0.9 vs gold's 0.3-0.5. Training filter accepted ≤ 1.0× source length; tighten to ≤ 0.7 next run.\n- **Review category sparse.** Codex pairs often mutated diff fences, so filter dropped most. Only ~8 review pairs in eval — review behavior extrapolated from debug/refactor neighbors.\n- **Workflow eval gates partly info-only.** Open-ended prompts in `workflow_prompts.jsonl` have no reference; `code_fence_match` checks input fences in answer, `semantic_sim` compares answer to question. Treat as smoke signal, not scoreboard.\n- **Multimodal untouched.** Gemma 4 is vision + audio capable. Fine-tune was text-only on language head; vision/audio paths should still work but unverified.\n\n## Caveman Ecosystem\n\nThree rocks. One philosophy: **model do more with less**.\n\n| Repo | What |\n|---|---|\n| [**caveman**](https://github.com/JuliusBrussee/caveman) | Output compression skill — *why use many token when few do trick* |\n| [**cavemem**](https://github.com/JuliusBrussee/cavemem) | Cross-agent memory — *why agent forget when agent can remember* |\n| [**cavekit**](https://github.com/JuliusBrussee/cavekit) | Spec-driven build loop — *why agent guess when agent can know* |\n| **cavegemma** *(you here)* | Caveman baked into weights — *why prompt every session when weights remember* |\n\nSkill compresses any model at runtime. This repo welds the same style into Gemma 4 31B so caveman survives across hosts, agents, no-system-prompt setups. Cheap inference, no skill loader, same brain.\n\n## License\n\n- Code in this repo: **MIT**\n- Adapter and merged model inherit the **Gemma Prohibited Use Policy** (Apache 2.0 + Gemma terms). See [Google's Gemma terms](https://ai.google.dev/gemma/terms).\n- Style ruleset and seed pairs from [`JuliusBrussee/caveman`](https://github.com/JuliusBrussee/caveman): MIT.\n\n## Citing\n\n```\n@misc{brussee2026cavemanGemma,\n  author = {Julius Brussee},\n  title  = {Caveman-mode Gemma 4 31B},\n  year   = {2026},\n  url    = {https://huggingface.co/JBrussee/gemma-4-31B-caveman}\n}\n```\n\n## Star This Repo\n\nStar cost zero. Help small mouth find big audience. ⭐\n\n[![Star History Chart](https://api.star-history.com/svg?repos=JuliusBrussee/cavegemma\u0026type=Date)](https://star-history.com/#JuliusBrussee/cavegemma\u0026Date)\n\n## Also by Julius Brussee\n\n- **[caveman](https://github.com/JuliusBrussee/caveman)** — the original Claude Code skill this fine-tune is built on\n- **[Revu](https://github.com/JuliusBrussee/revu-swift)** — local-first macOS study app with FSRS spaced repetition. [revu.cards](https://revu.cards)\n\n## See also\n\n- Style source-of-truth: [JuliusBrussee/caveman](https://github.com/JuliusBrussee/caveman)\n- Agent / pitfall notes: [`AGENTS.md`](AGENTS.md)\n\n---\n\n\u003cp align=\"center\"\u003e\u003cem\u003ewhy use many token when few do trick\u003c/em\u003e 🪨\u003c/p\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliusbrussee%2Fcavegemma","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjuliusbrussee%2Fcavegemma","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjuliusbrussee%2Fcavegemma/lists"}