{"id":49918550,"url":"https://github.com/itayinbarr/little-coder","last_synced_at":"2026-05-16T18:11:31.713Z","repository":{"id":352816620,"uuid":"1207740274","full_name":"itayinbarr/little-coder","owner":"itayinbarr","description":"A coding agent optimized to smaller LLMs","archived":false,"fork":false,"pushed_at":"2026-05-13T09:44:21.000Z","size":975,"stargazers_count":1027,"open_issues_count":0,"forks_count":67,"subscribers_count":12,"default_branch":"main","last_synced_at":"2026-05-13T11:35:46.363Z","etag":null,"topics":["ai-coding-assistant","aider-polygot","benchmark","code-generation","coding-agent","coding-agents","local-llm","ollama","qwen","small-language-models","tool-use"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/itayinbarr.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-11T10:37:11.000Z","updated_at":"2026-05-13T11:22:54.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/itayinbarr/little-coder","commit_stats":null,"previous_names":["itayinbarr/little-coder"],"tags_count":40,"template":false,"template_full_name":null,"purl":"pkg:github/itayinbarr/little-coder","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itayinbarr%2Flittle-coder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itayinbarr%2Flittle-coder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itayinbarr%2Flittle-coder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itayinbarr%2Flittle-coder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/itayinbarr","download_url":"https://codeload.github.com/itayinbarr/little-coder/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/itayinbarr%2Flittle-coder/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33113562,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-16T04:41:52.686Z","status":"ssl_error","status_checked_at":"2026-05-16T04:41:52.009Z","response_time":115,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-coding-assistant","aider-polygot","benchmark","code-generation","coding-agent","coding-agents","local-llm","ollama","qwen","small-language-models","tool-use"],"created_at":"2026-05-16T18:11:27.504Z","updated_at":"2026-05-16T18:11:31.701Z","avatar_url":"https://github.com/itayinbarr.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# little-coder\n\n**A coding agent tuned for small local models, built on top of [pi](https://pi.dev).**\n\nThe research story behind all this — why scaffold–model fit matters, how a 9.7 B Qwen beat frontier entries on Aider Polyglot, and what the load-bearing mechanisms actually do — is written up on Substack: **[*Honey, I Shrunk the Coding Agent*](https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent)**. Start there if you want the \"why\"; stay here for the \"how\".\n\n## How it relates to pi\n\n[pi](https://pi.dev) is the minimal substrate — agent loop, multi-provider API, TUI, session tree, compaction, extension model. Four built-in tools (read / write / edit / bash) and a ~1000-token system prompt.\n\nlittle-coder is **pi + 20 extensions + 30 skill markdown files + a Python benchmark harness**. It doesn't fork pi or shadow its CLI — pi is a plain dependency in `package.json`, and everything little-coder-specific lives under `.pi/extensions/`, `skills/`, and `benchmarks/`. You can mix little-coder with pi packages from anyone else, add your own extensions, or disable ours per-project via `.pi/settings.json`.\n\nIf you've never used pi, it's useful to skim [pi.dev](https://pi.dev) first — the rest of this doc assumes pi's model of `--agent-import-path`, `--mode rpc`, and `.pi/extensions/` auto-discovery.\n\n## Install\n\nOne-line install (Node.js 20.6+ required):\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/itayinbarr/little-coder/main/install.sh | bash\n```\n\nOr with npm directly:\n\n```bash\nnpm install -g little-coder\n```\n\nOr with [bun](https://bun.sh):\n\n```bash\nbun add -g little-coder\n```\n\nThat's the whole install. No clone, no `npm install` in a workspace, no PATH fiddling. `little-coder` is now on your PATH and works from any directory.\n\n\u003e **Note for `bun add -g` users.** The launcher (`bin/little-coder.mjs`) is a Node.js script with `#!/usr/bin/env node` at the top, so Node ≥ 20.6 still has to be on your PATH for the binary to start — bun is fine for installing/updating the package, but the runtime is Node. If you want a fully node-less setup, replace the shebang in `$(bun pm bin -g)/little-coder` with `#!/usr/bin/env bun`.\n\n## Run\n\n```bash\ncd ~/your-project\nlittle-coder --model llamacpp/qwen3.6-35b-a3b\n```\n\nThis is the canonical setup little-coder is tuned for: a local llama.cpp server hosting Qwen3.6-35B-A3B. See **[Local model setup (optional)](#local-model-setup-optional)** below for how to serve it.\n\nCloud models work the same way:\n\n```bash\nlittle-coder --model anthropic/claude-haiku-4-5\nlittle-coder --model openai/gpt-4o-mini \"What does this codebase do?\"\nlittle-coder --model ollama/qwen3.5             # local Ollama\nlittle-coder --model lmstudio/local-model       # local LM Studio (whatever model you have loaded)\nlittle-coder --list-models                      # see everything pi knows about\n```\n\nThe agent uses the directory you launched it from as its working directory — `Read` / `Write` / `Edit` / `Bash` operate on your project, not on little-coder's install path.\n\nFor local providers (llama.cpp, Ollama, LM Studio) pi expects *some* value in the API-key env even though local servers ignore it:\n\n```bash\nexport LLAMACPP_API_KEY=noop\nexport OLLAMA_API_KEY=noop\nexport LMSTUDIO_API_KEY=noop\n```\n\n`LLAMACPP_BASE_URL`, `OLLAMA_BASE_URL`, and `LMSTUDIO_BASE_URL` override the defaults (`http://127.0.0.1:8888/v1`, `http://127.0.0.1:11434/v1`, `http://127.0.0.1:1234/v1`).\n\nFor cloud providers, set the standard env (`ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, etc.) and pi will discover it.\n\n## Local model setup (optional)\n\nSkip this section if you're using a cloud model.\n\n**Option A — llama.cpp** (fastest for local; supports Qwen3.6-35B-A3B MoE):\n\n```bash\n# One-time: build llama.cpp with CUDA (sm_XXX = your GPU arch; Blackwell = 120)\ngit clone https://github.com/ggml-org/llama.cpp \u0026\u0026 cd llama.cpp\ncmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=120 -DLLAMA_CURL=ON\ncmake --build build --config Release -j\n\n# Fetch the model GGUF and the matching vision projector.\n# The mmproj (~900 MB) is what lets the model see attached screenshots.\npip install -U \"huggingface_hub[cli]\"\nhf download unsloth/Qwen3.6-35B-A3B-GGUF Qwen3.6-35B-A3B-UD-Q4_K_M.gguf --local-dir ~/models\nhf download unsloth/Qwen3.6-35B-A3B-GGUF mmproj-F16.gguf            --local-dir ~/models\n\n# Serve it (MoE trick: experts in RAM, attention on GPU → 22 GB model on 8 GB VRAM)\nbuild/bin/llama-server -m ~/models/Qwen3.6-35B-A3B-UD-Q4_K_M.gguf \\\n   --mmproj ~/models/mmproj-F16.gguf \\\n   --host 127.0.0.1 --port 8888 --jinja \\\n   -c 16384 -ngl 99 --n-cpu-moe 999 --flash-attn on\n```\n\nIf you only need text and want to skip the projector download, drop the second `hf download` line and the `--mmproj` flag — little-coder still works text-only, but the TUI's image attachment will be rejected by the server with a 4xx.\n\n**Option B — Ollama** (simpler, but slower on MoE):\n\n```bash\ncurl -fsSL https://ollama.com/install.sh | sh\nollama pull qwen3.5        # 9.7B — the paper's model\n# or: ollama pull qwen3.6-35b-a3b\n```\n\n**Option C — LM Studio** (GUI; OpenAI-compatible server on port 1234):\n\n1. Install [LM Studio](https://lmstudio.ai/) and download a model (e.g. Qwen3.6 35B A3B GGUF).\n2. Open the **Developer** / **Local Server** tab, load the model, and click **Start Server** (default `http://127.0.0.1:1234`).\n3. Run little-coder:\n   ```bash\n   export LMSTUDIO_API_KEY=noop\n   little-coder --model lmstudio/local-model\n   ```\n   The shipped `lmstudio/local-model` id routes to whatever model LM Studio currently has loaded — no extra config needed for the single-model case. If you serve on a non-default port, set `LMSTUDIO_BASE_URL=http://127.0.0.1:\u003cport\u003e/v1`. To target a specific model when you have several loaded, add an entry to `~/.config/little-coder/models.json` (see **Configuring models** below).\n\n**Serving from another machine on your LAN.** Each provider's `*_BASE_URL` env var accepts any host, not just `127.0.0.1`, so you can run inference on a beefier box and connect from a laptop or another device on the same WiFi.\n\nOn the **server** (the box with the GPU):\n\n- *llama.cpp*: start `llama-server` with `--host 0.0.0.0` (or your specific LAN interface) instead of `127.0.0.1`. Everything else from Option A unchanged.\n- *LM Studio*: in the Server tab, enable **Serve on local network** so it binds `0.0.0.0:1234` instead of `127.0.0.1:1234`.\n- *Ollama*: `OLLAMA_HOST=0.0.0.0:11434 ollama serve` (or set `OLLAMA_HOST=0.0.0.0` in the user systemd unit).\n- If `ufw` / `firewalld` is active, allow your LAN subnet to the relevant port (e.g. `sudo ufw allow from 192.168.0.0/16 to any port 8888 proto tcp`).\n- Find the LAN IP with `hostname -I` (Linux) or `ipconfig getifaddr en0` (macOS).\n\nOn the **client** (the machine running little-coder):\n\n```bash\n# Pick the env vars matching whichever provider is running on the server\nexport LLAMACPP_API_KEY=noop\nexport LLAMACPP_BASE_URL=http://\u003cserver-lan-ip\u003e:8888/v1\n\n# Sanity check reachability before launching the agent\ncurl -s http://\u003cserver-lan-ip\u003e:8888/v1/models | head\n\nlittle-coder --model llamacpp/qwen3.6-35b-a3b\n```\n\nThe streaming chat-completions adapter works over a local network the same way it does over loopback — no client code change, no proxy needed. The per-model profile in `.pi/settings.json` (context/thinking-budget/temperature) still applies because it's keyed by `\u003cprovider\u003e/\u003cmodel-id\u003e`, which the client picks regardless of where the server lives.\n\nAll small-model-specific extensions auto-disable for large/cloud models so they don't interfere.\n\n---\n\n## Configuring models\n\nThe shipped model list lives in **`models.json`** at the package root. The `llama-cpp-provider` extension reads it at startup and registers each provider via pi's `registerProvider()`. Editing this file in your global install **does** take effect — but it's overwritten on `npm install -g little-coder@latest`, so for anything you want to keep, use a user override file instead.\n\nUser override resolution (first match wins):\n\n1. `$LITTLE_CODER_MODELS_FILE` — explicit path, useful for ad-hoc tests.\n2. `$XDG_CONFIG_HOME/little-coder/models.json`\n3. `~/.config/little-coder/models.json`\n\nMerge semantics: each top-level provider key in your override file **fully replaces** the same key in the shipped `models.json`. Providers only in your file are added; providers only in the shipped file are kept. (We don't deep-merge per-model fields — you redeclare the whole provider entry, which avoids \"your override silently inherited new fields from a future package release\" surprises.)\n\nExample — switch the llama.cpp port and bump `qwen3.6-35b-a3b` to a 150K context, leave ollama untouched:\n\n```json\n{\n  \"providers\": {\n    \"llamacpp\": {\n      \"api\": \"openai-completions\",\n      \"baseUrl\": \"http://127.0.0.1:1234/v1\",\n      \"apiKey\": \"LLAMACPP_API_KEY\",\n      \"models\": [\n        {\n          \"id\": \"qwen3.6-35b-a3b\",\n          \"name\": \"Qwen3.6-35B-A3B (local llama.cpp, 150K)\",\n          \"reasoning\": true,\n          \"input\": [\"text\"],\n          \"contextWindow\": 150000,\n          \"maxTokens\": 4096,\n          \"cost\": { \"input\": 0, \"output\": 0, \"cacheRead\": 0, \"cacheWrite\": 0 }\n        }\n      ]\n    }\n  }\n}\n```\n\nThen verify with `little-coder --list-models` — you should see your overridden entry.\n\n`LLAMACPP_BASE_URL`, `OLLAMA_BASE_URL`, and `LMSTUDIO_BASE_URL` env vars still beat both files for those three providers.\n\n`.pi/settings.json` is a separate concern: it controls per-model **profiles** (context_limit, thinking_budget, temperature, benchmark_overrides) referenced by the `\u003cprovider\u003e/\u003cid\u003e` key. Profiles don't register or describe models — they only tune how little-coder runs against models that are already registered.\n\n---\n\n## Permissions\n\nlittle-coder gates `Bash` tool calls against a built-in safe-prefix whitelist (`ls`, `cat`, `head`, `tail`, `git log/status/diff`, `find`, `grep`, `cp`, `mv`, `mkdir`, `touch`, etc.) before pi's own confirmation flow ever sees them. `rm` and `sudo` are intentionally not on the list — add them via `LITTLE_CODER_BASH_ALLOW` per deployment if you really need them.\n\nTwo env vars control the gate:\n\n| Env var | Values | Effect |\n|---|---|---|\n| `LITTLE_CODER_PERMISSION_MODE` | `auto` *(default)* / `accept-all` / `manual` | `auto`: block any bash command not on the whitelist. `accept-all`: skip the gate entirely, every bash call passes (the benchmark runner sets this). `manual`: same as `auto` but with a different rejection message. |\n| `LITTLE_CODER_BASH_ALLOW` | comma-separated prefixes | Extra allow-prefixes merged with the built-in list. **Trailing whitespace is meaningful**: `\"make \"` allows `make test` but not `makefoo`; `\"make\"` allows both. |\n\nExamples:\n\n```bash\n# Add 'make' (with word-boundary) and 'docker compose ps' on top of the defaults\nexport LITTLE_CODER_BASH_ALLOW=\"make ,docker compose ps\"\n\n# Skip the gate entirely (use this only inside controlled environments)\nexport LITTLE_CODER_PERMISSION_MODE=accept-all\n```\n\nWrite/Edit confirmations are pi's responsibility; little-coder doesn't intercept those.\n\n---\n\n## Paper / benchmark results\n\n| Release | Model | Benchmark | Result |\n|---|---|---|---|\n| [**v0.0.2**](https://github.com/itayinbarr/little-coder/releases/tag/v0.0.2) (commit `1d62bde`) — the paper | Qwen3.5-9B via Ollama | Aider Polyglot (225 exercises) | **45.56 %** mean of two runs; matched-model vanilla Aider baseline 19.11 %. Paper: [*Honey, I Shrunk the Coding Agent* on Substack](https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent). |\n| [**v0.0.5**](https://github.com/itayinbarr/little-coder/releases/tag/v0.0.5) — pre-pi Python | Qwen3.6-35B-A3B via llama.cpp | Aider Polyglot | **78.67 %**. [Full narrative](docs/benchmark-qwen3.6-35b-a3b.md). |\n| [**v0.1.4**](https://github.com/itayinbarr/little-coder/releases/tag/v0.1.4) — on pi | Qwen3.6-35B-A3B via llama.cpp | Terminal-Bench-Core v0.1.1 (80 tasks) | **40.0 %** in 6 h 50 min. [Write-up](docs/benchmark-terminal-bench-v0.1.1.md). |\n| [**v0.1.13**](https://github.com/itayinbarr/little-coder/releases/tag/v0.1.13) — on pi, TB 2.0 leaderboard | Qwen3.6-35B-A3B via llama.cpp | Terminal-Bench 2.0 (89 tasks × 5 trials = 445) | **24.6 % ± 3.2** — accepted to the [Terminal-Bench 2.0 leaderboard](https://www.tbench.ai/leaderboard/terminal-bench/2.0) (rank 120). |\n| [**v0.1.24**](https://github.com/itayinbarr/little-coder/releases/tag/v0.1.24) — on pi, TB 2.0 leaderboard, smaller model | Qwen3.5-9B (Q4_K_M) via llama.cpp (5.3 GB on GPU, 2× faster per-token than the 35B-A3B) | Terminal-Bench 2.0 (89 tasks × 5 trials = 445) | **9.2 % ± 2.4** — accepted to the [Terminal-Bench 2.0 leaderboard](https://www.tbench.ai/leaderboard/terminal-bench/2.0) (rank 142). |\n| [**v0.1.27**](https://github.com/itayinbarr/little-coder/releases/tag/v0.1.27) — on pi, GAIA validation | Qwen3.6-35B-A3B via llama.cpp | GAIA validation set (165 tasks) | **40.00 %** (66 / 165). L1 60.4 % / L2 37.2 % / L3 7.7 %. Test-split run pending. |\n\nAll runs used a consumer laptop: i9-14900HX, 32 GB RAM, **8 GB VRAM** on RTX 5070 Laptop (Blackwell). No cloud inference at any point.\n\n---\n\n## Roadmap\n\n**Phase 1 — wide benchmark baseline: complete.** The paper established that scaffold–model fit moves a 9.7 B model from 19 % to 45 % on Aider Polyglot, and the goal of Phase 1 was to find out how wide that impact radius is. We now have a four-benchmark baseline on a single laptop-class GPU:\n\n1. **Aider Polyglot** — 45.56 % (paper, Qwen3.5-9B) and 78.67 % (v0.0.5, Qwen3.6-35B-A3B).\n2. **Terminal-Bench-Core v0.1.1** — 40.0 % (v0.1.4).\n3. **Terminal-Bench 2.0** — accepted to the [official leaderboard](https://www.tbench.ai/leaderboard/terminal-bench/2.0): Qwen3.6-35B-A3B at **24.6 % ± 3.2** (rank 120) and Qwen3.5-9B at **9.2 % ± 2.4** (rank 142). The v0.1.24 prompt-repetition fix (re-add tool descriptions + concision guideline, validated by a 4 / 4 pilot on the previously-regressing `prove-plus-comm` task) was the prompt for both submissions.\n4. **GAIA** — validation set at v0.1.27: **40.00 %** (66 / 165) on Qwen3.6-35B-A3B. Per-level L1 60.4 % / L2 37.2 % / L3 7.7 %.\n\nThat spans short coding exercises (Polyglot), interactive shell-bound tasks (Terminal-Bench), and tool-using research (GAIA), all on the same scaffold. The data needed to choose what to fix next is now in hand.\n\n**Phase 2 — iterative improvement on real-world tasks: starting now.** The motivating question shifts from *how wide is the impact radius?* to *which scaffolding changes compound on long-horizon real work?* The signal we have already points at concrete things to try — thinking-budget / quality-monitor behavior on long-horizon tasks, deliberate.py-style parallel branches on failure, better shell-session recovery for interactive-process traps, evidence-handling on multi-document GAIA L3 tasks — but the priority order comes from real-world use, not from a benchmark suite. Expect smaller, more frequent releases driven by what little-coder actually struggles with on day-to-day coding work.\n\n**Future benchmarks (deferred).** New benchmarks like **ProgramBench**, SWE-bench Verified (multi-file real-world patches), and a GAIA test-split run come back into scope after Phase 2 has produced enough scaffolding signal to make a fresh measurement worth running. Re-benchmarking before the next round of changes lands would mostly re-measure the same baseline.\n\n---\n\n## Troubleshooting\n\n**`little-coder: command not found`** — npm's global bin directory isn't on your PATH. Run `npm config get prefix` to see where it installed; add `\u003cprefix\u003e/bin` to your PATH. Or reinstall with `sudo` if your prefix needs root.\n\n**`ECONNREFUSED 127.0.0.1:8888`** — llama.cpp isn't running. Start `llama-server` first, or switch `--model` to an Ollama/cloud ID.\n\n**LAN client times out (no `RST`, just hangs)** — the inference box's firewall is dropping the SYN. The usual cause is `ufw` with a default-deny policy that allow-lists only SSH / a few dev ports. From the server: `sudo ufw status verbose` to confirm; `sudo ufw allow from \u003cyour-lan-subnet\u003e/24 to any port 8888 proto tcp` to fix (scoped to the LAN so you're not exposing the box). Docker-published ports bypass `ufw` via `PREROUTING` NAT, which is why a Docker container can be reachable while a plain `llama-server` on the same host isn't.\n\n**Image attachment is accepted but the request returns 4xx** — your llama-server is running without a vision projector. Re-launch it with `--mmproj ~/models/mmproj-F16.gguf` (or another mmproj variant from the same GGUF repo). The `--list-models` `images` column reflects what the client *will attempt to send*, not what the server can answer; the projector is what gives the model eyes.\n\n**No API key env var warning** — pi expects *some* key even for local providers. Export `LLAMACPP_API_KEY=noop` (or `OLLAMA_API_KEY=noop`) before launching.\n\n**No pi \"Update Available\" banner** — that's intentional. little-coder defaults `PI_SKIP_VERSION_CHECK=1` so the bundled pi runtime doesn't nag about updating itself; little-coder pins pi to a known-good version per release. If you actually want the banner back, `export PI_SKIP_VERSION_CHECK=0` before launching.\n\n**Extension load failures on startup** — run `little-coder --list-models --verbose`; extension errors surface there. If the install looks corrupt: `npm uninstall -g little-coder \u0026\u0026 npm install -g little-coder`.\n\n**Node version too old** — little-coder needs Node ≥ 20.6.0. Check with `node --version`. Easiest fix: `nvm install 20 \u0026\u0026 nvm use 20`.\n\n---\n\n## Developing little-coder locally\n\nIf you want to hack on the extensions or skills:\n\n```bash\ngit clone https://github.com/itayinbarr/little-coder.git\ncd little-coder\nnpm install\nnpm link            # makes the local checkout available as `little-coder`\nlittle-coder --model llamacpp/qwen3.6-35b-a3b\n```\n\nTo unlink: `npm unlink -g little-coder`.\n\nThe benchmarks harness (`benchmarks/`) is dev-only and not shipped with the npm package. Run it from a clone with `python3 benchmarks/aider_polyglot.py …` etc.\n\n---\n\n## Architecture\n\n```\nlittle-coder/\n├── .pi/\n│   ├── settings.json               # per-model profiles + benchmark_overrides (terminal_bench, gaia)\n│   └── extensions/                 # 20 TypeScript extensions, auto-discovered by pi\n│       ├── llama-cpp-provider/     # data-driven provider registration from models.json — ships llamacpp, ollama, lmstudio (+ user override file)\n│       ├── write-guard/            # Write refuses on existing files — the whitepaper invariant\n│       ├── extra-tools/            # glob, webfetch, websearch (pi ships grep/find)\n│       ├── skill-inject/           # per-turn tool-skill selection (error \u003e recency \u003e intent)\n│       ├── knowledge-inject/       # algorithm cheat-sheet scoring (word=1.0, bigram=2.0, threshold=2.0)\n│       ├── output-parser/          # repair malformed ```tool, \u003ctool_call\u003e, bare JSON\n│       ├── quality-monitor/        # empty / hallucinated / loop detection + correction follow-up\n│       ├── thinking-budget/        # cap thinking tokens per turn, retry with thinking off\n│       ├── permission-gate/        # bash whitelist (ls, cat, git log/status/diff, etc.)\n│       ├── checkpoint/             # snapshot files before Write/Edit\n│       ├── tool-gating/            # enforces _allowed_tools at exec + schema levels\n│       ├── turn-cap/               # max_turns abort (Polyglot unbounded, TB 40, GAIA 30)\n│       ├── benchmark-profiles/     # reads settings.json → systemPromptOptions + sets temperature\n│       ├── shell-session/          # ShellSession[Cwd|Reset] — tmux-proxy + subprocess backends\n│       ├── browser/                # Playwright BrowserNavigate/Click/Type/Scroll/Extract/Back/History\n│       ├── evidence/               # EvidenceAdd/Get/List — per-session store, 1 KB snippet cap\n│       └── evidence-compact/       # preserves evidence across pi's auto-compaction\n├── skills/                         # 30 markdown files the extensions inject on demand\n│   ├── tools/*.md                  #   14 tool-usage cards\n│   ├── knowledge/*.md              #   13 algorithm cheat sheets\n│   └── protocols/*.md              #    3 research/cite/decomposition workflows\n├── benchmarks/\n│   ├── rpc_client.py               # PiRpc — spawns `pi --mode rpc`, demuxes events + UI requests\n│   ├── aider_polyglot.py           # Polyglot driver with per-language transforms\n│   ├── tb_adapter/                 # Terminal-Bench 1.0 BaseAgent (tmux-proxy)\n│   ├── harbor_adapter/             # Terminal-Bench 2.0 BaseAgent (async env.exec proxy)\n│   ├── tb_pilot.sh / harbor_pilot.sh\n│   ├── tb_status.sh / harbor_status.sh\n│   └── test_rpc_client.py\n├── AGENTS.md                       # project system prompt (pi discovers it automatically)\n├── models.json                     # canonical provider registration (loaded by llama-cpp-provider; user override at $XDG_CONFIG_HOME/little-coder/models.json)\n└── docs/\n    ├── benchmark-*.md              # per-benchmark narratives\n    └── architecture.md             # v0.0.5-era Python architecture (historical)\n```\n\n**Key invariant.** pi is a minimal base by design. Every little-coder mechanism ships as a pi extension that hooks pi's lifecycle events (`before_agent_start`, `context`, `before_provider_request`, `tool_call`, `tool_result`, `turn_end`, `session_compact`). Extensions are independent and can be enabled/disabled per deployment via `.pi/settings.json`. If you don't want one, delete its directory or disable it in settings; if you want to add another, drop it next to the existing ones.\n\n---\n\n## Reproducing the paper (v0.0.2)\n\n```bash\ngit clone https://github.com/itayinbarr/little-coder.git\ncd little-coder\ngit checkout v0.0.2\n# Follow that version's README for its Python setup (pip install -e .)\n```\n\nThe paper ran `ollama/qwen3.5` through the Python little-coder at commit **`1d62bde`** (tag [`v0.0.2`](https://github.com/itayinbarr/little-coder/releases/tag/v0.0.2)). The 45.56 % mean figure is the average of two full 225-exercise runs on that exact codebase. For the 78.67 % headline, check out tag [`v0.0.5`](https://github.com/itayinbarr/little-coder/releases/tag/v0.0.5) — both are pre-pi Python and follow the pre-pi setup.\n\n---\n\n## Citation\n\n```bibtex\n@misc{inbar2026littlecoder,\n  title        = {little-coder: A Coding Agent Optimized for Small Local Language Models},\n  subtitle     = {Architectural Adaptation Lets a 9.7B Model Outperform Frontier Models on Aider Polyglot},\n  author       = {Inbar, Itay},\n  year         = {2026},\n  month        = apr,\n  howpublished = {\\url{https://open.substack.com/pub/itayinbarr/p/honey-i-shrunk-the-coding-agent}},\n  note         = {White paper}\n}\n```\n\n---\n\n## Attribution\n\nlittle-coder v0.0.x was a derivative work of [CheetahClaws / ClawSpring](https://github.com/SafeRL-Lab/clawspring) by SafeRL-Lab, Apache 2.0. That upstream provided the Python agent substrate, tool system, multi-provider support, and REPL.\n\nlittle-coder v0.1.0+ replaces that substrate with **[pi](https://github.com/badlogic/pi-mono)** (`@mariozechner/pi-coding-agent`) by Mario Zechner — Apache 2.0 / MIT. pi provides the agent loop, provider abstraction, TUI, and extension model. little-coder rebuilds its small-model adaptations on top of pi as extensions.\n\nAll little-coder-specific mechanisms — Write-vs-Edit invariant, skill / knowledge injection, thinking-budget cap, output-parser, quality-monitor, per-model profiles, per-benchmark overrides, ShellSession / Browser / Evidence tool families, evidence-aware compaction — are preserved across versions.\n\n---\n\n## License\n\nApache 2.0 — see [LICENSE](LICENSE) for details. NOTICE tracks upstream attribution.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitayinbarr%2Flittle-coder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fitayinbarr%2Flittle-coder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fitayinbarr%2Flittle-coder/lists"}