{"id":51397085,"url":"https://github.com/solariun/easyai","last_synced_at":"2026-07-04T03:10:27.970Z","repository":{"id":353758087,"uuid":"1220782770","full_name":"solariun/easyai","owner":"solariun","description":"Easyai - run local models with your tools, easy tools def, buildin RAG, fs tools, web search and fetch, MCP server and ai client with local tools all made simple and easy","archived":false,"fork":false,"pushed_at":"2026-06-19T19:14:49.000Z","size":6302,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"develop","last_synced_at":"2026-06-19T19:16:51.437Z","etag":null,"topics":["ai","ai-webui","fs-tools","gguff","llama-cpp","local","rag","server","tools","web-fetch","web-search","webui"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/solariun.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY_AUDIT.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-25T10:23:54.000Z","updated_at":"2026-06-19T19:14:53.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/solariun/easyai","commit_stats":null,"previous_names":["solariun/easy","solariun/easyai"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/solariun/easyai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solariun%2Feasyai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solariun%2Feasyai/tags","rele
ases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solariun%2Feasyai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solariun%2Feasyai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/solariun","download_url":"https://codeload.github.com/solariun/easyai/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/solariun%2Feasyai/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35108417,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","ai-webui","fs-tools","gguff","llama-cpp","local","rag","server","tools","web-fetch","web-search","webui"],"created_at":"2026-07-04T03:10:27.081Z","updated_at":"2026-07-04T03:10:27.954Z","avatar_url":"https://github.com/solariun.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# easyai\n\n\u003e **A C++17 framework anyone can use to build AI agents that talk to\n\u003e their own services — no llama.cpp, JSON-Schema, or template-engine\n\u003e knowledge required.**\n\neasyai turns [llama.cpp](https://github.com/ggml-org/llama.cpp) into an\n*agent engine* you can drop into any program in a dozen lines.  
You give\nit C++ functions; it gives the model the ability to call them.  That's\nthe whole pitch.\n\nIt ships **one unified library** (`libeasyai`) you can\n`find_package(easyai)` and link against — **plus a complete set of\nready-to-run applications** built on it: a private, OpenAI-compatible\n**AI server** with a polished web dashboard, a full-screen agent\n**CLI/TUI**, an **MCP provider**, and a local **REPL** — all backed by a\n**batteries-included toolset** (web search \u0026 fetch, sandboxed files,\nshell, Python, memory/RAG, MCP). See [`LIB_GUIDE.md`](LIB_GUIDE.md) for\nthe OpenAI-Python-SDK-shaped `easyai::Session` quickstart and the tour of\nthe lib surface.\n\n| Library             | Purpose                                                                                                                                       |\n|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|\n| `libeasyai`         | Everything in one shared object — `easyai::Engine` (local llama.cpp), `easyai::Client` (OpenAI-protocol HTTP), `easyai::Session` (one-call agent), `easyai::Tool` + built-ins (datetime/web/fs/bash/python/memory/tool_lookup), external-tool loader, RAG store, MCP server/client, the `preamble` composer. Linked via `easyai` (alias `easyai::easyai`). Legacy aliases `easyai::engine` and `easyai::cli` still resolve to the same target. |\n\n| Binary               | What it gives you                                                                                                                                  |\n|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|\n| `easyai-local`       | Local-only REPL: loads a GGUF in-process via `easyai::Engine` (driven through `easyai::Session`). 
Drop-in `llama-cli` replacement — one-shot scripting (`-p`), tools, presets, optional `\u003cthink\u003e` strip, sandboxed `*_file` tools, opt-in `bash` tool. |\n| `easyai-cli`         | Agentic OpenAI-protocol client — no local model.  Full-screen chat **TUI** (opencode-style look \u0026 feel: markdown, live tool rows with diffs, `/`-command + `@`-file completion, themes — default for interactive terminals; `--plain` for the legacy line REPL), `--shell` (hybrid AI shell), or `-p` one-shot.  Full sampling control (`--temperature`, `--top-p`, `--top-k`, `--min-p`, `--repeat-penalty`, `--frequency-penalty`, `--presence-penalty`, `--seed`, `--max-tokens`, `--stop`), plan tool, server-management subcommands (`--list-models`, `--list-tools`, `--health`, `--props`, `--metrics`, `--set-preset`).  HTTPS via OpenSSL; `--insecure-tls` / `--ca-cert` for dev/internal CAs.  Full doc: [`easyai-cli.md`](easyai-cli.md). |\n| `easyai-server`      | Drop-in `llama-server` replacement: OpenAI-compat HTTP **with full SSE streaming**, embedded SvelteKit webui, Bearer auth, Prometheus `/metrics`, KV-cache controls, flash-attn, mlock.  Built-in **[MODELS dashboard](MODELS.md)** (`/models`) — native hardware-aware model recommendations, local-model parameters + in-process hot-swap, and a GGUF download manager, password-gated.  Speaks MCP, OpenAI, Ollama from one process.  Full doc: [`easyai-server.md`](easyai-server.md). |\n| `easyai-mcp-server`  | **Standalone Model Context Protocol provider — no model loaded.** Same tool catalogue as `easyai-server` (built-ins + knowledge tools + external-tools), exposed over `POST /mcp` with a configurable cpp-httplib worker pool (`--threads`) and an in-flight `tools/call` cap (`--max-concurrent-calls`) for thousands-of-clients deployments.  Full doc: [`easyai-mcp-server.md`](easyai-mcp-server.md). |\n| `easyai-library-demo`| Five-line `easyai::Session` template — pair with [`LIB_GUIDE.md`](LIB_GUIDE.md).  
The smallest \"build an agent, register a tool, chat\" program in the repo. |\n| `easyai-agent`       | A demo agent showing every built-in tool plus an inline custom tool.                                                                                |\n| `easyai-recipes`     | Tutorial agent paired with `manual.md` — implements `today_is` and `weather` (HTTP-calling) from scratch.                                          |\n| `easyai-chat`        | A bare-bones REPL with no tools — useful as a sanity check.                                                                                          |\n\n## ⭐ Applications \u0026 a batteries-included toolset\n\neasyai is more than a library — it's a **complete, self-hostable AI stack** with a\n**rich set of tools** the model can use out of the box.\n\n**The applications**\n\n- 🖥️ **easyai-server** — your own private, **OpenAI-compatible AI server** with a\n  polished chat web UI and the **[MODELS dashboard](MODELS.md)** (`/models`): browse \u0026\n  fit-score HuggingFace models against *your* hardware, read a model's GGUF\n  parameters, **hot-swap the running model in one click**, and download weights — all\n  password-gated. The dashboard keeps the **1000 most-recently-updated GGUF repos**\n  in a searchable catalog, cached on disk (`--data-dir`, default `/var/lib/easyai/data`)\n  and refreshed from HuggingFace on request once it is \u003e1h old. Full SSE streaming,\n  Prometheus `/metrics`, Bearer auth, KV-cache / flash-attn knobs. Speaks **MCP, OpenAI\n  and Ollama** from one process. 
A drop-in `llama-server`, supercharged.\n- 💬 **easyai-cli** — a gorgeous full-screen agent **TUI** (markdown, live tool rows\n  with diffs, `/`-commands, `@`-file completion, themes), a hybrid **AI shell**\n  (`--shell`), or one-shot `-p` scripting — against any OpenAI-protocol endpoint, with\n  full sampling control and server-management subcommands.\n- 🔌 **easyai-mcp-server** — expose the **entire toolset as an MCP provider** that any\n  agent (Claude Desktop, Cursor, …) can call, with a tunable worker pool for\n  thousands-of-clients deployments.\n- ⚡ **easyai-local** — a local GGUF **REPL** (`llama-cli`++) with tools, presets, and\n  sandboxing — no server required.\n\n**The toolset** — registered with a single flag, available to every app and the MCP\nserver:\n\n| Tool | What it does |\n|------|--------------|\n| 🌐 **web** | Live internet **search + fetch** (SearXNG / Google CSE / direct URL). |\n| 📁 **fs** | Sandboxed **read / write / list / grep** over a directory. |\n| 🐚 **bash** · 🧮 **evaluate** | Run shell commands, or **isolated stdlib-only Python** for compute. |\n| 🧠 **memory + RAG** | Persistent knowledge store with **automatic vocabulary injection** ([`RAG.md`](RAG.md)). |\n| 🔗 **MCP client** | Consume tools from **any remote MCP server** ([`MCP.md`](MCP.md)). |\n| 🛠️ **external tools** | Wire up **your own CLIs** from a JSON manifest — zero code ([`EXTERNAL_TOOLS.md`](EXTERNAL_TOOLS.md)). |\n| 🧩 **plan · datetime · tool_lookup · remote-model** | Planning, authoritative time, large-catalogue tool discovery, peer-model delegation. |\n\nFull catalogue and safety model: **[`AI_TOOLS.md`](AI_TOOLS.md)**.\n\n\u003e **Status** — used in production on a Linux Vulkan box (Radeon 680M)\n\u003e as a self-hosted ChatGPT-style assistant.  Apple Silicon (Metal),\n\u003e Linux/Windows Vulkan, NVIDIA CUDA, and AMD ROCm are all wired up out\n\u003e of the box.  
`scripts/install_easyai_server.sh` handles the whole\n\u003e Debian/Ubuntu deployment in one command (systemd-coredump,\n\u003e hardened unit, optional `--enable-verbose`, drop-in compat with\n\u003e `install_llama_server.sh`).\n\n---\n\n## What's new\n\nA running log of user-facing changes. Latest first — keep this list\ncurrent as features land so anyone returning to the repo (or\nlanding on it for the first time) sees what shipped recently.\n\n### 2026-05-27 — Unified library + `easyai::Session` (OpenAI-Python-SDK shape)\n\nSingle library now — `libeasyai` carries everything (Engine, Client,\nevery tool, the system-prompt composer, Session). The previous split\ninto `libeasyai` + `libeasyai-cli` is gone. Demos all link a single\ntarget (`easyai`). Legacy aliases (`easyai::engine`, `easyai::cli`)\nstill resolve to it so existing CMakeLists keep working.\n\nThe new `easyai::Session` (in `easyai/session.hpp`) is the\nrecommended entry point — five fluent lines for \"build an agent,\nregister a tool, chat\", mirroring the OpenAI Python SDK call site:\n\n```cpp\nauto session = easyai::Session::remote(\"http://localhost:8080\");\nsession.with_default_tools()\n       .system_append(\"Speak in plain English.\")\n       .on_token([](const std::string \u0026 p){ std::fputs(p.c_str(), stdout); });\nstd::string err;\nsession.init(err);\nsession.chat(\"hello\");\n```\n\nPair with `easyai::Tool::builder(...).system_addendum(\"...\")` to let a\ncustom tool ship its own system-prompt guardrails — Session\nauto-concatenates them. Full reference in\n[`LIB_GUIDE.md`](LIB_GUIDE.md); minimum demo in `services/library_demo.cpp`\n(binary `easyai-library-demo`). 
`services/local.cpp` has been\nmigrated to Session as a reference for in-process agents; `cli.cpp`\nand `server.cpp` follow in a later pass and continue to work via the\nexisting Engine/Client paths.\n\n### 2026-05-26 — `python3` model-facing rename to `evaluate` (back-compat alias)\n\nFinal fix to a stubborn failure mode: models with a strong \"Python\nwrites files / runs subprocesses / fetches URLs\" training prior were\nreaching for the `python3` tool to do exactly those system-side\nthings even when the system prompt, the tool description, and the\nsandbox PermissionError all said *don't*. The lighter fixes (Shape-C\nshort triggers, write/edit policy block, runtime sandbox enforcement)\ntook the failure rate down but didn't kill it.\n\n**The rename:** the **model-facing** tool name changed from\n`python3` → `evaluate`. The runtime is still Python 3 (operators\nstill see `--no-python`, `[SERVER] allow_python`, the Python sandbox\npreamble, etc.). The split is deliberate:\n\n| Surface | Name |\n|---|---|\n| Tool name in `\u003ctools\u003e` / `tools/list` / what the model dispatches | `evaluate` |\n| Tool short description | `\"Evaluate Python 3 code for compute / algorithm prototyping. FORBIDDEN: filesystem, subprocess, network, ctypes. Stdlib compute only.\"` |\n| Operator CLI flag | `--no-python` (unchanged) |\n| Operator INI key | `[SERVER] allow_python` (unchanged) |\n\n`canonical_tool_name(\"python3\")` returns `\"evaluate\"` so resumed chat\nsessions, external-tools manifest reservation lists, and any caller\nthat dispatches `python3` by name still work — the dispatcher routes\nthe legacy name to the new tool, no second schema shipped.\n\nReframing: model now sees an \"evaluate\" affordance with an explicit\n**FORBIDDEN: filesystem, subprocess, network, ctypes** list, not a\n\"python3\" affordance with a \"don't write files\" caveat. 
The first\nnon-generic word the model parses on the bullet line is \"evaluate\" —\nno \"python = open(f, 'w')\" training prior to override.\n\n### 2026-05-26 — Shape-C tools wire shape, `evaluate` read-only, `fs.ops` 50/20\n\nFive linked changes refactor how tools reach the model:\n\n* **Shape-C wire shape.** Per-turn `\u003ctools\u003e` blocks now ship\n  `name + short_description + schema` (~2 000 tokens saved per\n  session on a typical catalogue). Full multi-line manual stays\n  in libeasyai and is returned by `tool_lookup(name=\"\u003cx\u003e\")` on\n  demand. `Tool::short_description` + `wire_description()` are\n  new; tools without an explicit short trigger fall back to the\n  first 120 chars of `description`.\n\n* **`tool_lookup` gains a MANUAL view.** No-arg call returns the\n  INDEX (numbered `name: short trigger` list); `name=\"\u003csubstr\u003e\"`\n  returns the FULL description for every match. The model uses\n  the index to scan and drills in only when it needs the manual.\n\n* **`evaluate` (formerly `python3`) is now read-only on disk.**\n  `kPythonSandboxPreamble` rejects any write-mode `open()` (mode\n  `'w'/'a'/'x'/'+'` or `os.open` with\n  `O_WRONLY|O_RDWR|O_CREAT|O_TRUNC|O_APPEND`) regardless of path.\n  Read-only opens inside the sandbox still work. `PermissionError`\n  points the model at the filesystem write tool registered this\n  session. Defense-in-depth — adversarial bypasses (`ctypes`,\n  `subprocess`, `_io.FileIO`, closure-cell introspection) are\n  documented residuals.\n\n* **`fs(action=\"ops\")` batch caps raised to 50 ops / 20 files.**\n  One call can land up to 50 file operations across up to 20\n  distinct files. Same-path edits auto-reorder bottom-up so every\n  `start_line` refers to the file's ORIGINAL line numbers — no\n  manual offset math. 
Report header names the touched files;\n  successful `read` ops in a batch clip at 2 KiB; failed ops show\n  the full diagnostic so the model can self-correct without\n  re-running.\n\n* **`fs(action=\"ops\")` batch lives on the unified `fs` surface.**\n  Default `ToolMode` stays `Split` (one focused tool per action —\n  small models drive it more reliably). To pick up the batch, run\n  with `--tools-mode unified` (or `--tools-mode both`).\n\nPlus: MCP server adds an `initialize.instructions` field carrying\nthe closed-set rule\nplus the same write/edit policy; memory vocabulary block moved to\nthe preamble tail and cached by `(mtime, file count)` so prompt-\neval KV stays warm across memory writes.\n\nSecurity audit: see [SECURITY_AUDIT.md §23 (eighth pass)](SECURITY_AUDIT.md#23-eighth-pass--2026-05-26-shape-c-tools-refactor).\nOne MEDIUM finding (`tools_block` rendered untrusted fields verbatim\n— fixed by `sanitize_for_prompt`), two LOW residuals documented.\n\n### 2026-05-17 — MTP speculative decoding (`--spec-type draft-mtp`) + installer `--mtp`\n\nllama.cpp's Multi-Token Prediction merged upstream on 2026-05-16; we\nbumped our vendored llama.cpp checkout to `39cf5d619` (same-day HEAD,\nall 262 commits since the previous pin) and wired the MTP path\nthrough the three layers in one go.\n\n**Library API** ([include/easyai/engine.hpp](include/easyai/engine.hpp)):\n```cpp\nengine.spec_type(\"draft-mtp\")      // or: none (default), draft-simple,\n                                    //     draft-eagle3, ngram-simple,\n                                    //     ngram-map-k, ngram-map-k4v,\n                                    //     ngram-mod, ngram-cache\n       .spec_draft_n_max(6);        // max draft tokens per step\n```\nUnknown strings land in `Engine::last_error()` and leave speculation\noff (no silent default switch).\n\n**Server CLI**:\n```bash\neasyai-server -m /path/to/mtp-model.gguf \\\n  --spec-type draft-mtp --spec-draft-n-max 6\n```\nINI keys: 
`[ENGINE] spec_type` and `[ENGINE] spec_draft_n_max`.\n\n**Installer shortcut**:\n```bash\n./install_easyai_server.sh --mtp                # n_max=6 (default)\n./install_easyai_server.sh --mtp --mtp-n-max 8  # override\n```\nThe installer bakes the two flags into the systemd `ExecStart` so the\nservice inherits MTP without `systemctl edit`.\n\n**Caveat**: MTP needs a model TRAINED with MTP heads (DeepSeek V3,\nMimoVL, and similar). Plain models will refuse to load with\n`--spec-type draft-mtp`. The installer's `--mtp` flag is the operator\nsaying \"I know what I'm doing\"; there's no validation.\n\nClassic standalone-draft-model speculative decoding (the\n`--draft-model PATH` path) is not yet wired — only MTP, which doesn't\nneed a separate model file. The old installer compat lines for\n`--draft-model` / `--draft-max` / `--draft-min` still warn and skip.\n\n### 2026-05-16 — Memory vocabulary auto-injection + shared `easyai::preamble::build()`\n\nEvery binary that loads `--memory \u003cdir\u003e` now auto-injects a compact\nkeyword-vocabulary block into the system prompt so the model knows\nwhat it has tagged without having to call `keywords_knowledge`\nfirst. The block looks like:\n\n```\n# MEMORY VOCABULARY (the keywords your private memory currently\nhas tagged — the FIRST place to look for anything you might\nalready know)\n12 entries (most-common first; call search_knowledge(\nkeywords=[\"\u003cname\u003e\", ...]) to recall):\neasyai(8) claude(5) bitnet(3) build(3) iteration(2) …\n```\n\nSorted count desc / name asc, capped at top 40. Empty store →\nblock omitted, no wasted tokens.\n\n| Binary | When the vocab is computed |\n| --- | --- |\n| `easyai-server` | Every request (fresh disk scan, ~10-50ms — rounding error vs. inference). New saves visible on the next request. |\n| `easyai-local`  | Once at startup, appended to the system prompt. New saves visible after restart. |\n| `easyai-cli`    | Once when building the system prefix sent to the remote server. 
|\n\nThe AUTHORITATIVE preamble used to live as a `build_authoritative_\npreamble` inside `services/server.cpp` with parallel partial\ncopies in `local.cpp` and nothing in `cli.cpp`. That drift is gone:\nthe builder is now public in libeasyai —\n\n```cpp\n// include/easyai/preamble.hpp\nnamespace easyai::preamble {\n    struct Options {\n        bool        inject_datetime  = true;\n        std::string knowledge_cutoff = \"2024-10\";\n        std::string memory_root;        // empty → vocab block omitted\n    };\n    std::string build(const Options \u0026 opt);\n}\n```\n\n— and all three binaries call it. Change the renderer once, every\nbinary updates. Third-party hosts of libeasyai get the same\nbehaviour out of the box.\n\nSee `RAG.md` §5 \"Automatic vocabulary injection\" and `design.md`\n§5c for the full design.\n\n### 2026-05-15 — `split` is the new tools-mode default\n\nSame-day follow-up to the morning's `--tools-mode` landing: **`split`\nis now the out-of-the-box default**, not `unified`.\n\nReason: smaller / quantised tool-callers (Llama 3 8B, Qwen 2.5 7B,\nPhi-3.5, GPT-OSS-20B) dispatch much more reliably against flat\none-verb-per-tool schemas than against a `fs(action=\"...\")`\ndiscriminated-union dispatcher.  Large models handle either shape\nfine.  
The split surface costs ~15-20% extra system-prompt tokens for\na 30-50% reduction in retry / \"unknown action\" hops in practice —\nworth it for everyone, surprising for nobody.\n\n| Surface | Registered out of the box | Old behaviour | New default |\n| --- | --- | --- | --- |\n| Multi-action families | `fs`, `web` | 2 dispatchers + 7 knowledge tools | `read_file`, `write_file`, `append_file`, `edit_file`, `list_file`, `glob_file`, `grep_file`, `check_path_file`, `cwd_file`, `sandbox_path_file`, `search_web`, `fetch_web`, `knowledge_save`, `knowledge_append`, `search_knowledge`, `knowledge_load`, `knowledge_list`, `knowledge_delete`, `keywords_knowledge` — 19 focused tools |\n\n```bash\n# new default (no flag)\neasyai-cli --url http://ai.local:8080 --sandbox ~/proj\n\n# opt back in to the legacy dispatcher (3 tools instead of 19)\neasyai-cli --tools-mode unified --url ai.local:8080 --sandbox ~/proj\n\n# best of both worlds — costs more tokens, lets the model pick\neasyai-cli --tools-mode both --url ai.local:8080 --sandbox ~/proj\n```\n\nLibrary callers: `Toolbelt::tool_mode_` now defaults to\n`ToolMode::Split`; pass `ToolMode::Unified` explicitly if your prompt\nrelies on the legacy tool names.\n\nINI: `[cli] tools_mode = unified|split|both` (default `split`).\n\n### 2026-05-15 — `--tools-mode` lets small models work with one-verb-per-tool\n\n`fs` and `web` ship as **unified dispatchers** with an\n`action` parameter (e.g. `fs(action=\"read\", ...)`).  That shape keeps\nthe system prompt small and lets a large model batch many actions, but\n**smaller / quantised tool-callers** (Llama 3 8B, Qwen 2.5 7B, Phi-3.5,\nGPT-OSS-20B) gravitate toward one-purpose tools — `read_file`, `edit_file`,\netc. 
— because the verb IS the tool name and the parameter schema is\nflat.\n\nThree modes, selected by the new flag (defaults flipped to `split` in\nthe same-day follow-up entry above):\n\n```\neasyai-cli --tools-mode unified     # legacy: one dispatcher per family\neasyai-cli --tools-mode split       # one focused tool per action\neasyai-cli --tools-mode both        # register both surfaces side-by-side\n```\n\n| Mode | Tools registered (with `--sandbox` + `--memory`) |\n| --- | --- |\n| `unified` | `fs`, `web` — 2 dispatchers + 7 `knowledge_*` tools |\n| `split` (new default) | `read_file`, `write_file`, `append_file`, `edit_file`, `list_file`, `glob_file`, `grep_file`, `check_path_file`, `cwd_file`, `sandbox_path_file`, `search_web`, `fetch_web`, `knowledge_save`, `knowledge_append`, `search_knowledge`, `knowledge_load`, `knowledge_list`, `knowledge_delete`, `keywords_knowledge` — 19 focused tools |\n| `both` | unified + split, same handlers under both names |\n\nSame handlers under the hood — behaviour is identical to the unified\nsurface; only the registration shape changes.  Library API:\n\n```cpp\neasyai::cli::Toolbelt()\n    .sandbox(\"/srv/data\")\n    .tool_mode(easyai::cli::ToolMode::Split)   // or Both, or Unified\n    .apply(client);\n```\n\nINI: `[cli] tools_mode = unified|split|both`.\n\n### 2026-05-13 — `easyai-cli` session resume flips back to opt-in\n\nReverts the 2026-05-12 default flip: loading the existing\n`.easyai_session` is **opt-in again** via `--continue`.  Without the\nflag, any file in cwd is ignored and overwritten on the first turn\n— matching the behaviour shipped originally on 2026-05-12 morning\nbefore the auto-on flip.\n\nWhy: the auto-on default surprised operators who opened a project\ndirectory expecting a fresh agent and instead picked up history\nfrom a previous experiment.  
An explicit opt-in matches the rest\nof the cli's surface (nothing else implicitly carries state across\ninvocations) and removes the silent action-at-a-distance.\n\n| | Previous (2026-05-12 → 2026-05-13) | Now |\n| --- | --- | --- |\n| Resume on launch | default ON | opt-in via `--continue` |\n| Start fresh | opt-in via `--no-continue` | **default** |\n| `--compress` without `--continue` | no-op (warning) | no-op (warning) |\n\nSaving is unchanged: every turn (and every tool round-trip) still\nrewrites `.easyai_session` atomically.  `--no-continue` stays as the\nexplicit form of the default — useful for scripts overriding an\noperator's `[cli] auto_continue = on` INI line.\n\nDefault for `[cli] auto_continue` flips to `false`.  Operators who\nprefer the auto-on behaviour can opt in once via INI:\n\n```ini\n[cli]\nauto_continue = true\n```\n\nFull doc: [`easyai-cli.md`](easyai-cli.md) §10.\n\n### 2026-05-13 — Installer: cap `easyai-server` restart attempts at 2\n\nThe systemd unit now carries `StartLimitBurst=2` +\n`StartLimitIntervalSec=60` in `[Unit]`, so the service attempts to\nstart at most **twice** in any 60-second window before giving up and\nleaving the unit in the `failed` state.\n\nBefore, `Restart=on-failure` + `RestartSec=10` with no burst cap\nwould retry indefinitely — a missing model file, a bad CLI flag, or\na GPU that wasn't exposed to the container produced an infinite\nrestart loop that filled journald and never surfaced the real\nproblem.\n\nNow:\n\n| State | Behaviour |\n| --- | --- |\n| Initial start fails | Wait `RestartSec=10`, retry once |\n| Retry also fails | Unit enters `failed` state; no further attempts |\n| Long-running service fails after running \u003e 60 s | Burst counter has reset → still gets one retry (not penalised for late failures) |\n\nRecovery: `journalctl -u easyai-server` to inspect the two failed\nattempts, fix the root cause, then\n`sudo systemctl reset-failed easyai-server`\n+ `sudo systemctl start 
easyai-server`.\n\nExisting installs: re-run `install_easyai_server.sh --force` (or\n`--upgrade`) to refresh the unit file.  `Restart=on-failure` and\n`RestartSec=10` are unchanged.\n\n### 2026-05-13 — Installer: ship only `system.txt_template`; default install uses the binary's built-in prompt\n\n`scripts/install_easyai_server.sh` no longer drops an active\n`/etc/easyai/system.txt` on first install.  Out-of-the-box, only the\ntemplate `/etc/easyai/system.txt_template` ships (the canonical\n\"factory\" copy of the Deep persona, refreshed on every `--upgrade`),\nand `SERVER.system_file` is left commented out in `easyai.ini` — so\nthe server uses the binary's built-in prompt, which is **already\ngated on actually-registered tools**: it never advertises `fs` /\n`bash` if those are off in the INI.\n\nThe template file was also renamed `system.txt_modelo` →\n`system.txt_template` (English-only convention).\n\n| State | Before (≤ 2026-05-12) | Now (2026-05-13+) |\n| --- | --- | --- |\n| Template file at `/etc/easyai/` | `system.txt_modelo` (Portuguese) | `system.txt_template` |\n| Active `/etc/easyai/system.txt` on first install | dropped (Deep persona) | **NOT installed** |\n| `--force` rewrites `system.txt` | yes | no (file isn't there) |\n| `SERVER.system_file` in `easyai.ini` | commented out | commented out (unchanged) |\n| Out-of-the-box prompt | active `system.txt` (same Deep body) | binary's built-in, tool-gated |\n\nTo activate a custom persona — same one-liner as before:\n\n```bash\nsudo cp /etc/easyai/system.txt_template /etc/easyai/system.txt\nsudoedit /etc/easyai/system.txt              # tweak as needed\nsudoedit /etc/easyai/easyai.ini              # uncomment SERVER.system_file\nsudo systemctl restart easyai-server\n```\n\nExisting installs are unaffected: the installer still **preserves**\nany existing `/etc/easyai/system.txt` across `--upgrade` and `--force`\nruns (it just no longer creates one when it doesn't exist).\n\nFull doc: 
[`LINUX_SERVER.md`](LINUX_SERVER.md) §6\n(\"`/etc/easyai/system.txt` (operator-supplied) + `system.txt_template`\")\nand §12 (\"Upgrading\").\n\n### 2026-05-12 — Installer: `ttm.pages_limit` updated in place on re-run\n\n`scripts/install_easyai_server.sh` used to print\n`ttm.pages_limit already present; skipping` when `/etc/default/grub`\nalready had a `ttm.pages_limit=N` token — even if N differed from\nthe value the operator just passed via `--gtt`.  Result: re-running\nthe installer with a new GTT size was silently a no-op on the\nGRUB side, and the next reboot kept the stale page count.\n\nThe patch now compares the existing token's page count against the\ntarget, rewrites it in place when they differ (via `sed -i`), and\nruns `update-grub` so the change lands in `/boot/grub/grub.cfg`.\nThe reboot reminder also points at `/proc/cmdline` so operators\ncan verify the new value boots cleanly.\n\nNo flag change.  Operators who pass the same `--gtt` value on every\nrun see the same idempotent \"already present; skipping\" message.\n\n### 2026-05-12 — AI Box logo: softer two-layer aura\n\nTuned the aura halo on the AI Box mark so it reads as a quiet\nemission instead of a neon outline.  The earlier tuning was\ndescribed internally as \"loud\"; this pass cuts both stacked\nGaussian blurs to subtler values:\n\n| Layer | Before (07c2347) | Now (cc92d51) |\n| --- | --- | --- |\n| Outer halo `stdDeviation` | 14 | **10** |\n| Outer halo `flood-opacity` | 0.5 | **0.3** |\n| Inner halo `stdDeviation` | 4  | **3**  |\n| Inner halo `flood-opacity` | 1.0 | **0.6** |\n\nGradient, mark geometry, viewBox headroom and filter cyan flood\n(`#00bcd4`) all unchanged.  
Both `webui/AI-brain.svg` (the\ncanonical SVG source) and the inline `constexpr kBrandSvg` in\n[`services/server.cpp`](services/server.cpp) updated in lockstep,\nso the favicon route serves the same softened version every\nembedder sees.\n\n### 2026-05-12 — `easyai-cli` session: per-tool checkpoint survives force-exit\n\nThe previous save points covered every interruption mode **except\nforce-exit** — triple rapid Ctrl-C triggers the force-exit handler\n(`_exit(130)`), which bypasses `atexit` and the post-`chat()`\nsave in `run_one()`.  Operators reported that a long agentic turn\nthat got force-exited left no `.easyai_session` on disk.\n\nFix: layer an additional save into the `on_tool` callback so\n`.easyai_session` is rewritten **after every tool round-trip** in a\nturn, not just at the end of the turn.  Only the in-flight partial\nreply since the last completed tool is lost; everything earlier\n(file edits, bash output, plan steps, RAG queries) is on disk and\nre-loadable.\n\nWiring: `easyai::ui::Streaming::notify_tool(call, result)` is now a\npublic forwarder for the private on_tool UI handler, so external\nembedders can compose extra behaviour onto the `on_tool` slot\n(checkpoint to disk, telemetry, audit log) without losing the\nstreaming output (tool indicators, dim styling, plan rendering).\nThe cli's binary uses it as:\n\n```cpp\ncli.on_tool([\u0026](const ToolCall \u0026 c, const ToolResult \u0026 r) {\n    streaming.notify_tool(c, r);   // canonical UI\n    save_session(cli, \u0026err);       // disk checkpoint\n});\n```\n\nPattern is documented inline in\n[`include/easyai/ui.hpp`](include/easyai/ui.hpp) above the\n`notify_tool` declaration.  No flag / INI change.\n\n### 2026-05-12 — Session resume default-ON + every session knob now in `[cli]` INI\n\nIteration on yesterday's session-persistence feature: loading the\nexisting `.easyai_session` is now the **default** (you don't need\n`--continue` to pick up where you left off).  
The semantics flip:\n\n| | Previous (2026-05-12 morning) | Now |\n| --- | --- | --- |\n| Resume on launch | opt-in via `--continue` | **default ON** |\n| Start fresh | default | opt-in via `--no-continue` |\n| `--compress` without `--continue` | hard error | warning (no-op when combined with `--no-continue`) |\n\nThe cli also now exposes every session-related knob plus the raw-log\nknobs through `[cli]` in `/etc/easyai/easyai-cli.ini`:\n\n```ini\n[cli]\nauto_continue = true       # default; load .easyai_session if present\nauto_compress = false      # default; recap on every load when on\nlog_file      =            # default empty; path enables --log-file equivalent\nauto_log      = false      # default; when true, restores the library's legacy /tmp auto-log\nshow_bash     = true       # default; mirror bash subprocess output to the operator terminal\nshow_python   = true       # default; same for python3\n```\n\nCLI flag precedence is unchanged: explicit flag \u003e INI \u003e hardcoded\ndefault.  All `--continue` / `--no-continue` / `--compress` /\n`--log-file` flags continue to work and override the INI for that\ninvocation.\n\n`--continue` is kept as a no-op alias for backward compat (useful in\nscripts that want to force resume even when an operator's INI flipped\n`auto_continue` off).\n\nFull doc: [`easyai-cli.md`](easyai-cli.md) §10.\n\n### 2026-05-12 — easyai-cli session persistence + raw log default OFF\n\nEvery `easyai-cli` invocation now writes a `.easyai_session` file in\nthe current working directory after each chat turn (atomic tempfile\n+ rename, mode 0600).  
Three control points:\n\n| Surface | What it does |\n| --- | --- |\n| (no flag) | Start fresh, overwrite on first turn, save every turn |\n| `--continue` | Resume the `.easyai_session` in cwd; warn + start fresh if none |\n| `--continue --compress` | Resume + ask the model for one lossless recap; replace history with the recap before the first prompt |\n| `/compress` (REPL) | Same recap flow, fired mid-session |\n\nThe file is the raw OpenAI-shape message array (greppable, diffable,\nre-loadable).  Two new methods on the public `Client` API\n(`dump_history()` / `load_history()`) make the same persistence\navailable to library embedders.\n\n**Raw log default flipped to OFF.**  Prior versions created\n`/tmp/easyai-cli-remote-\u003cpid\u003e-\u003cepoch\u003e.log` whenever `--verbose` was\nset, AND the library opened a separate `/tmp/easyai-client-\u003cpid\u003e-\u003cepoch\u003e.log`\non every Client construction.  Both are now opt-in:\n\n* The binary's transaction log opens **only** when `--log-file PATH`\n  is given (mode 0600 at PATH).  `--verbose` is now stderr-only.\n* The library's auto-log is suppressed by setting\n  `EASYAI_NO_AUTO_LOG=1` in the cli binary's `main()` before the\n  Client is constructed.  Operator override\n  (`EASYAI_NO_AUTO_LOG=0` in the env) still wins.\n\nNet: a default invocation leaves nothing in `/tmp`.  See\n[`easyai-cli.md`](easyai-cli.md) §9 and §10 for full docs.\n\n### 2026-05-11 — fs(action=\"edit\") seam-line corruption fix (HIGH, post-publish correction)\n\nA user-reported bug: `fs(action=\"edit\")` was silently corrupting\nfiles when the model passed `content` without a trailing `\\n`.\nThe last byte of `content` got glued onto the first preserved line\nafter the edit range — turning `int b = 22;\\n    return a + b;`\ninto `int b = 22;    return a + b;`.  
When the deleted range\nhappened to contain the only `}` between two function bodies,\nthis silently swallowed the brace and the file failed to compile\nwith \"function definition is not allowed here\" + \"expected '}'\"\non the next build.\n\nRoot cause: the tool description said \"include a trailing `\\n`\nyourself\" but the model consistently forgot.  Fix:\n`make_fs_edit_handler` now auto-inserts a `\\n` separator on each\nside of `content` if and only if one is needed to keep the seam\nlines apart.  Both guards no-op when `content` is already\ncorrectly terminated (or empty for a pure delete), so the change\nis invisible to model calls that were already doing the right\nthing.\n\nTool description updated to drop the \"include trailing `\\n`\"\nadvice — line semantics are now preserved automatically.\n\nVerified against a 9-case smoke matrix (middle-replace with/without\ntrailing newline, multi-line content lacking newline, pure delete,\npure insert, append-at-EOF on files with and without trailing\nnewline, replace-last-line on a file without trailing newline,\nwhole-file replacement) — all nine pass.\n\nDocumented as §22.8 (post-publish correction) in\n[`SECURITY_AUDIT.md`](SECURITY_AUDIT.md); §22.4's \"no findings\"\nclaim for the fs.edit/append/ops batch surface has been amended\nwith a forward-pointer to §22.8.  No CLI / INI / library API\nchanges; rebuild to pick up the fix.\n\n### 2026-05-11 — Security audit 7th pass (1 HIGH, 1 MEDIUM, 1 LOW; no public-interface change)\n\nRe-applied the standing audit on the ~5,000 LoC added since the 6th\npass (2026-05-08). 
Three findings, all closed in this commit:\n\n* **HIGH — `run_capped_subprocess` banner sanitization.** The\n  `[bash] $ …` / `[python3] $ …` opening banner used to print the\n  model-supplied command/code through `fprintf` verbatim, so a\n  snippet that embedded an ANSI/OSC sequence could repaint the\n  operator's terminal (window title, screen wipe, OSC 52\n  clipboard write) one line before any child output arrived. The\n  live mirror channel was already hardened in §20.1; the banner\n  is now sanitized the same way (CR/LF/TAB pass; ESC rendered as\n  visible `^[` marker; other C0/DEL dropped). For `python3` the\n  banner now shows the *user's code* only — the 25-line sandbox\n  preamble was previously included, cluttering every transcript.\n* **MEDIUM — python3 sandbox preamble closure tightening.** The\n  preamble that wraps `open()` to pin disk access to the sandbox\n  used to leave `_e_open_orig`, `_e_chk`, and `_e_root` at module\n  scope, so user code could trivially call the raw `_e_open_orig`\n  by name and bypass the check — the comment claimed \"closure cell\"\n  protection that the implementation didn't actually provide.\n  Restructured into an `_e_make_wrappers` factory whose function-\n  local names become real lexical closure cells; the wrappers\n  still work, but the originals are no longer reachable from\n  module scope. (Adversarial bypass via `ctypes` / `subprocess` /\n  `_io.FileIO` is unchanged and still documented as out-of-scope.)\n* **LOW — installer INI-shape validation widened.** §20.4 / §21.4\n  already validated `--temperature`, `--top-p`, `--frequency-penalty`, `--ctx-size` etc.\n  via `require_numeric` to defeat heredoc injection. Today\n  extended the integer roster (`--service-port`, `--threads`,\n  `--threads-batch`, `--ngl`) and added a new `require_no_injection`\n  helper that rejects `\\n` / `\\r` / `=` / `[` / `]` in the\n  non-numeric knobs (`--service-host`, `--alias`, `--webui-title`,\n  `--cache-type-k`, `--cache-type-v`). 
Same operator-typo /\n  hostile-CI threat model as §20.4.\n\nFull narrative in [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §22.\nRebuild to pick up the fixes — no INI, CLI, or library API changes.\n\n### 2026-05-10 — CLI \"thinking\" label: static dark gray, no shimmer sweep\n\nThe CLI's prompt-eval indicator no longer animates. While the server\nis ingesting the prompt the spinner shows a steady `thinking[ N%]`\nin 256-colour grayscale 244 (mid-gray, RGB 128/128/128) — bright\nenough to read on a dark terminal, dim enough to clearly signal \"in\nprogress, not the model's output.\" Replaces the 10 Hz spotlight\nsweep that landed in `d7e7202`. Drops the dual-cadence heartbeat —\nthe heartbeat now runs at one cadence (250 ms) and skips its\nrepaint entirely while the thinking label is up; only\n`set_thinking_pct()` (driven by the server's `easyai.prompt_progress`\nSSE event) triggers a redraw when the % suffix changes.\n\n### 2026-05-09 — `python3` tool result rendered with the executed snippet\n\nThe tool result returned by `python3` now opens with a fenced\n```python ...``` block carrying the snippet that just ran, followed\nby a `[python3 executed]` notification line, then the exit code and\ncaptured output. 
Chat UIs that render markdown (the embedded webui,\ntypical clients) display the code with syntax highlighting, so an\noperator skimming the conversation transcript can see what executed\nwithout having to expand the raw tool-call JSON.\n\nThe model's `code` argument is what gets rendered — the\n`kPythonSandboxPreamble` (the disk-restriction monkey-patch) is\ndeliberately stripped from the displayed source so the transcript\nisn't cluttered with the same 25 lines on every call.\n\nResult shape:\n\n````\n```python\n\u003cthe snippet\u003e\n```\n[python3 executed]\nexit=0\n\u003ccaptured stdout+stderr\u003e\n````\n\nSpawn-side errors (pipe / fork failure — the interpreter never\nran) still surface unwrapped, so the error message stays the\nactual cause and isn't dressed up with a misleading \"executed\"\nnotice.\n\n### 2026-05-09 — METRICS line: always on, default every 5 minutes\n\nThe periodic METRICS log line in `easyai-server` is now emitted\n**unconditionally** — no longer gated on `--verbose`. Operators\nneed the CPU / mem / GPU / TCP-state / TIME_WAIT-pressure telemetry\nin journalctl whether or not they're chasing a debug session.\n\n* `metrics_interval` default raised from `1` second to `300`\n  seconds (5 minutes). 
Low-overhead enough to leave on permanently\n  in production; bump **down** (60, 30, 5) when actively\n  troubleshooting.\n* The systemd installer's `easyai.ini` template was bumped from\n  `metrics_interval = 60` to `metrics_interval = 300` to match.\n* `--verbose` no longer claims the METRICS line in its description\n  or banner — only the request-level `→` / `←` lines remain\n  verbose-only.\n\nExisting operators who pinned `[SERVER] metrics_interval` in their\nINI keep their value; only the unspecified default shifts.\n\n### 2026-05-09 — `python3` is default-on with a sandboxed disk surface\n\nPromoting `python3` from explicit-opt-in (--allow-python) to\nauto-on whenever the operator has signalled \"the model can touch\nfiles\" — same gate as `fs`: --sandbox set OR --allow-bash on. The\nembedded webui inherits this for free since the systemd unit ships\nwith --sandbox /var/lib/easyai/workspace.\n\n* **`--allow-python` removed; `--no-python` is the new opt-out.**\n  Mirrors `--no-web` / `--no-datetime`: the tool defaults on and\n  operators who don't want it pass the `--no-*` flag (or set\n  `[SERVER] allow_python = off` in the INI).\n* **Disk access auto-restricted to the sandbox root.** Every\n  snippet is auto-prefixed with a short Python preamble that\n  monkey-patches `builtins.open`, `io.open`, and `os.open` to\n  reject any path resolving outside the cwd Python was chdir'd\n  into. 
`open(\"/etc/passwd\")` raises `PermissionError`;\n  `pathlib.Path(\"/etc/hostname\").read_text()` raises through\n  `pathlib`'s internal `open()` call.\n* **Description rewritten to forbid disk use.** \"USE FOR: testing,\n  calculation, data processing, networking, information gathering.\n  NEVER USE FOR DISK — every disk operation has a fs(action=...)\n  equivalent.\" The preamble is defense-in-depth; the description\n  is the primary contract.\n* **Defense-in-depth, not a real sandbox.** The model can still\n  escape via `import ctypes; ctypes.CDLL(\"libc.so.6\").open(...)`,\n  `subprocess.run([\"cat\", \"/etc/passwd\"])`, or `os.system(...)` —\n  the protection is against accident, not adversarial intent. Same\n  threat model as `bash`: explicit operator opt-in, not a real\n  sandbox.\n\n### 2026-05-09 — `python3` tool: isolated Python 3 snippet runner\n\nA second shell-class executor alongside `bash`, gated by its own\n`--allow-python` flag (off by default — same threat model as bash).\nThe model gets one extra tool when enabled:\n\n* `python3(code, timeout_sec?)` — runs the snippet via\n  `python3 -I -S -E -c \u003ccode\u003e`. Isolated mode: no `PYTHON*` env vars,\n  no `site.py` / no .pth files / no site-packages, no cwd on\n  `sys.path`. 
The standard library is available; `import requests`\n  fails with `ModuleNotFoundError`, by design — predictable behaviour\n  regardless of host Python configuration.\n* Same hardening as `bash`: cwd pinned to `--sandbox`, fds 3+ closed\n  before exec, SIGTERM/SIGKILL deadline, 50 KB / 2000-line\n  stdout+stderr cap, and an operator-facing live mirror (default ON\n  when `--allow-python` is on; pass `--no-show-python` to opt out).\n* Internally, `bash` and `python3` now share one `run_capped_subprocess`\n  helper — the fork/fd-close/chdir/drain/wait machinery only lives in\n  one place.\n\nWhen to reach for `python3` vs `bash`: data manipulation (JSON, regex,\nDecimal math, statistics, date arithmetic) is one Python snippet; shell\npipelines / build runners / git / package managers stay in `bash`.\n\nThe `--allow-python` flag is wired through every binary (`easyai-cli`,\n`easyai-local`, `easyai-server`, `easyai-mcp-server`) plus the INI\n`[SERVER] allow_python` key. `EASYAI-*.tools` manifests cannot shadow\nthe new `python3` reserved name.\n\n### 2026-05-09 — One tool per concept: unified `web`, unified `fs`, RAG `--split-rag` removed\n\nA consolidation pass on the built-in tool surface. Three loose\ncollections (web, filesystem, rag) collapsed to one tool each, all\nshaped the same way — single `Tool` with an `action` parameter and a\nflat schema (every parameter optional except `action`). Pattern\nmirrors the rag dispatcher introduced 2026-05-04.\n\n* **`web` tool** — `web(action=\"search\"|\"fetch\")`. Replaces the\n  separate `search_web`, `fetch_web`, and `web_google` tools. 
Search\n  takes an `engine` parameter (`\"auto\"` default — cascades through\n  google → brave → ddg-lite → bing → ddg, returning the first that\n  succeeds; explicit picks: `\"google\"` opt-in via `--use-google` plus\n  the GOOGLE_API_KEY / GOOGLE_CSE_ID env vars, `\"brave\"` keyless HTML\n  scrape with the best understanding of niche named entities,\n  `\"ddg-lite\"` keyless no-JS DDG endpoint with a Netscape UA (page 1\n  only — bypasses the anti-bot wall the modern DDG endpoint applies),\n  `\"bing\"` keyless RSS feed, `\"ddg\"` keyless HTML scrape but\n  increasingly blocked from server IPs). Both actions take `page` for\n  pagination; `fetch` takes `start` + `limit` for byte-window control.\n* **`fs` tool** — `fs(action=\"read\"|\"write\"|\"list\"|\"glob\"|\"grep\"|\"check_path\"|\"cwd\"|\"sandbox\")`.\n  Replaces seven separate factories plus `get_current_dir` and\n  `get_sandbox_path`. `--allow-fs` now registers one tool, not seven.\n* **`--split-rag` removed.** The legacy seven `rag_*` tools and the\n  `--split-rag` flag are gone everywhere — CLI, INI, examples, all\n  four binaries. The single `rag(action=...)` dispatcher (default\n  since 2026-05-04) is the only RAG layout. On-disk format unchanged.\n* **Public-API breakage.** Anyone consuming `libeasyai` directly: the\n  individual `easyai::tools::search_web()` / `fetch_web()` /\n  `web_google()` / `fs_read_file()` / `fs_write_file()` / `fs_list_dir()`\n  / `glob_file()` / `grep_file()` / `check_path_file()` / `get_current_dir()`\n  / `get_sandbox_path()` factories are removed. Switch to\n  `easyai::tools::web(google_enabled)`,\n  `easyai::tools::fs(root)`, and\n  `easyai::tools::knowledge_split_tools(root)`.\n* **Why.** Three matching surfaces with the same shape make the\n  catalogue smaller (one entry per capability instead of nine), tool\n  prose can use one consolidated description style across all three,\n  and the model reasons about each capability as ONE thing with sub-\n  actions. 
The flat-schema-with-runtime-validation choice is the\n  same one the unified rag tool already validated against weak /\n  1-bit-quant tool callers.\n\n### 2026-05-08 — Server observability + connection-pool fix + prompt cleanup\n\nDriven by a real production failure: an agentic session hung mid-stream,\nthe cli retried six times, and we had no visibility into what the\nTCP stack was doing on the server. Fixes landed across the cli's\nHTTP transport, the server's verbose logging, the system prompts,\nand the build.\n\n* **Cli keep-alive bug fixed (the actual root cause).**\n  `stream_chat()` / `simple_get()` / `simple_post()` were each\n  constructing a fresh `httplib::Client` per call. The Client's\n  TCP socket dropped at function end, so `set_keep_alive(true)` had\n  nothing to keep alive — every agentic hop opened a new connection.\n  An N-tool-call session piled up N sockets in `TIME_WAIT`,\n  eventually exhausting the client's ephemeral port range or\n  per-process fd ceiling. **Hoisted a single persistent `httplib::Client`\n  onto the `Impl` struct; all three call sites now reuse it.** ONE\n  TCP connection per session instead of N. 
Cancellation and\n  server-restart paths are preserved (cpp-httplib reconnects\n  internally on dead-socket errors).\n* **Server: HTTP-level `→` / `←` log per request (verbose mode).**\n  `set_pre_routing_handler` + `set_logger` emit arrival and\n  completion lines with method/path/peer/body size, status,\n  duration, response bytes (or `streamed` for SSE), and running\n  totals (req / err / tools / in_flight / bytes_in / bytes_out).\n* **Server: periodic `METRICS` line with TCP state breakdown.**\n  Background ticker every `metrics_interval` seconds\n  (`--metrics-interval N` or `[SERVER] metrics_interval` to tune,\n  `0` disables — **default raised to 300 / always-on as of\n  2026-05-09**, see entry above) emits one\n  line with: CPU% + iowait%, load 1/5/15, process RSS + peak,\n  system memory total/used/%, AMD GTT used/total/% (Linux + AMD\n  only), in-flight requests, cumulative requests / errors / bytes,\n  fd usage vs RLIMIT_NOFILE, AND an explicit TCP state breakdown\n  (ESTABLISHED / TIME_WAIT / CLOSE_WAIT / FIN_WAIT / LISTEN)\n  parsed from `/proc/net/tcp{,6}` with\n  `TIME_WAIT N/M ephemeral ports (X.X% [elevated|HIGH|CRITICAL])`\n  so socket exhaustion shows up in `journalctl` long before\n  connections start failing. Linux-only for the deep metrics;\n  macOS prints `n/a` and the server runs fine — easyai-server's\n  deploy target is Linux.\n* **Tool dispatch timing in every visible log.** Engine wraps\n  `tool-\u003ehandler()` with `steady_clock` and writes `duration_ms`\n  into `ToolResult`. CLI shows `🔧 search_web (412ms)({\"query\":...})`\n  and the webui's reasoning panel shows the same. The\n  `easyai.tool_result` SSE event also gains a `duration_ms` field\n  so future external SSE consumers can render their own timing UI.\n* **`allow_fs = off` in the INI is now honoured.** The server read\n  the flag but never propagated it to the toolbelt — a non-empty\n  `[SERVER] sandbox` re-enabled `*_file` regardless. 
Default install\n  ships `allow_fs = off` + `sandbox = /var/lib/easyai/workspace`,\n  which hit exactly this. Now `allow_fs` and `allow_bash` are\n  honoured independently of `sandbox`. **Behaviour change:**\n  `--sandbox /foo` alone NO LONGER implies `--allow-fs`; pass\n  `--allow-fs` explicitly to register *_file.\n* **Built-in system prompt is tool-aware.** The hardcoded prompt\n  used to list `*_file` / `bash` / `plan` / host-metric tools by name\n  whether or not they were registered. Models hallucinated calls to\n  unregistered tools (especially `bash` after the `allow_fs` fix\n  above). The `Tool notes:` section is now built dynamically:\n  each bullet is gated on the same flag that controls registration,\n  and the entries for tools the server NEVER registers (`plan`,\n  host metrics) are removed entirely. Same fix in\n  easyai-local's built-in prompt.\n* **RAG tool descriptions spell out \"model-only store\".** Added a\n  `PRIVATE — MODEL-ONLY STORE` paragraph to `knowledge_save` /\n  `knowledge_append`, telling the model that the user has no UI /\n  command / API to read what's saved there. Forbids `\"check the\n  knowledge for the code\"` / `\"I saved it to memory\"` answers and\n  tells the model to `knowledge_load` and put the body inline when\n  the user asks for stored content.\n* **Stay-in-scope replaces \"PROTOTYPE FIRST\".** The old 1./2./3.\n  ritual (\"build → verify → ASK which next step\") was making the\n  agent stop after step 1 and ask, even when the user wanted the\n  simplest end-to-end thing. Collapsed to a single\n  `## Stay strictly in scope` paragraph that keeps the no-extras /\n  no-defensive-scaffolding / no-while-I'm-at-it-cleanups specifics\n  and drops the build-then-ask dance. 
Updated everywhere the\n  wording lived: server.cpp built-in prompt, local.cpp built-in\n  prompt, cli.cpp [guidance] block, installer's\n  `/etc/easyai/system.txt` template.\n* **Installer GTT default 28 → 29 GiB.** `gtt_gb=29` in\n  `scripts/install_easyai_server.sh`. Matches `ttm.pages_limit=7602176`.\n  Leaves headroom for a Q5_K_M / MXFP4_MOE 30B MoE plus a 32k KV\n  cache fully on the iGPU.\n* **Quick-start editor section added to `LINUX_SERVER.md`.** New\n  section 0 with copy-paste shell snippets for VSCode + Continue.dev,\n  OpenCode, and VSCode + Cline, all pointing at `http://ai.local:80/v1`.\n  Plus a quick-reference table for other OpenAI-compatible clients.\n* **No patches or derivatives of llama.cpp.** A short-lived\n  experiment subclassed `httplib::Server` to log per-TCP-connection\n  accept/close events — that needed widening the access on a\n  private virtual in the vendored cpp-httplib header. Backed out\n  entirely: no CMake patch script, no `#define private protected`\n  trick, no derivative copies. The HTTP `→`/`←` lines and the\n  periodic METRICS line (with system-wide TCP state breakdown\n  including TIME_WAIT pressure) cover the same diagnostic ground\n  using only public APIs and `/proc`.\n\n### 2026-05-08 — `tool_lookup` builtin + tool-discipline rule\n\nBuilds on the same-day \"Built-in system prompt is tool-aware\" work\nabove with a complementary affordance: the model gets a runtime\nintrospection tool so it can verify what's wired up before\ndispatching, and an authoritative discipline rule that points at\nthat tool. Driven by the same failure mode the prompt-cleanup\naddressed (`write` / `read` / `ls` etc. invented by the model);\nthis layer makes the closure explicit and gives the model a\nrecovery path when it's uncertain.\n\n* **New `tool_lookup` builtin.** Read-only introspection over the\n  agent's live tool registry. 
Call it with no args to get a numbered\n  catalogue of every registered tool (1..N), or pass\n  `name=\"\u003csubstring\u003e\"` to filter — case-insensitive, partial match.\n  Output is plain numbered text the model parses naturally; only\n  active tools are returned. Wired into every binary\n  (`easyai-cli`, `easyai-server`, `easyai-mcp-server`, `easyai-local`,\n  `easyai-agent`, `easyai-recipes`) and the `LocalBackend` library\n  wrapper. Always registered last so its snapshot covers every\n  other tool, including itself. Public C++ API:\n  `easyai::tools::tool_lookup(getter)` where `getter` is a callable\n  returning `std::vector\u003cstd::pair\u003cstd::string,std::string\u003e\u003e` of\n  (name, description) pairs.\n* **Authoritative `[tools]` / \"Tool discipline\" prompt block.**\n  Layered on top of the closed-set rule from the prompt-cleanup\n  commit: *\"This catalogue is the SINGLE SOURCE OF TRUTH; training\n  data is NOT; if a name isn't in this list IT DOES NOT EXIST;\n  call `tool_lookup` first when uncertain; do not retry an\n  unknown-tool call.\"* Common hallucinated names called out by\n  example: `write`, `read`, `ls`, `cat`, `curl`, `python`, `sed`,\n  `grep`, `find`, `mkdir`. Same wording in `easyai-cli` (the\n  `[tools]` block injected into the dynamic prefix), `easyai-server`\n  and `easyai-local` (the `## Tool discipline` section in their\n  `kBuiltinSystem` strings).\n\n### 2026-05-08 — Fifth-pass security hardening (no behaviour change)\n\nA fresh static review of the ~5,000 lines that landed in the last 30\ncommits. Two HIGH, three MEDIUM, two LOW findings — all closed in\nthis commit; every public interface (CLI flags, tool names, library\nheaders, INI keys) is unchanged.\n\n* **bash live-mirror is now control-byte stripped and byte-capped.**\n  When the model calls `bash`, the merged stdout/stderr was being\n  mirrored verbatim to the operator's terminal. 
A model could emit\n  `\\e]0;HACKED\\a` to retitle the operator's window or `\\e[2J` to wipe\n  the screen — neither showed up in the model-facing tool result.\n  Now: ESC is rendered as a visible `^[`, all other C0 controls are\n  dropped, and the mirror channel is capped at 128 KiB (model still\n  gets the full 32 KiB it always did). Set `[cli] show_bash = false`\n  or `--no-show-bash` to silence the mirror entirely.\n* **`plan` tool render strips control bytes from item text.** Same\n  hijack class, narrower budget — a `plan add` with embedded `\\e[…`\n  no longer reaches the operator's terminal raw.\n* **`get_array` parser now caps stringified-array recursion depth.**\n  Tool-args parsing tolerates `\"items\": \"[…]\"` (the array escaped\n  into a JSON string — small models double-escape sometimes). The\n  unwrap path was recursive without a depth cap; a hostile model\n  emitting deeply-nested escapes blew the stack. Capped at depth 4\n  (legitimate cases stay under depth 2).\n* **`get_sandbox_path` now uses `fs::weakly_canonical`.** Was using\n  `realpath()` with a \"fall back to the unresolved input\" branch\n  that could leak relative-path shape into the model on transient\n  errors. Cosmetic but correct; matches the canonicalisation the\n  sandbox containment check uses.\n* **`--mcp \u003curl\u003e` rejects non-`http(s)://` schemes up front.** The\n  libcurl protocol filter still blocks `file://`, `gopher://` etc.\n  at transport time, but the operator now gets a clear error\n  instead of a curl diagnostic, and embedders using\n  `easyai::mcp::fetch_remote_tools` get the same defence-in-depth.\n* **Installer validates numeric sampling/timeout flags.**\n  `--temperature`, `--top-p`, `--top-k`, `--min-p`,\n  `--repeat-penalty`, `--frequency-penalty`, `--max-tokens`, `--http-timeout`, `--ctx-size`\n  must match `^-?[0-9]+(\\.[0-9]+)?$` before they flow into the INI\n  via heredoc. 
Closes a defence-in-depth gap where a crafted value\n  containing `\\n` could inject extra INI keys.\n* **`/etc/easyai/easyai.ini.bak` (created by `--force`) gets\n  explicit `chmod 640` and `chown root:easyai`.** Previously\n  inherited whatever the live INI had; matches the new file's\n  posture so a token leak via a backup with looser perms is\n  impossible.\n\nFull write-up: [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §0 (operator\nTL;DR) and §20 (this pass's findings). Read §0 if you operate easyai\nin production — it's the 60-second summary of what easyai does and\ndoesn't protect for you.\n\n### 2026-05-05 — Tool surface + system prompt overhaul\n\nDriven by a production \"models drift, use bash for file work, ignore\ntools\" report. The fix landed across the tool descriptions, the\ndefault prompts, and the CLI flag wiring at once.\n\n* **`--sandbox` and `--allow-bash` now imply `*_file`.** The previous\n  matrix had operators passing `--allow-bash --sandbox DIR` and ending\n  up with bash but no file tools — so the model fell back to\n  `cat \u003e file` / `cat \u003c\u003cEOF` / `sed -i` for everything. Bash is\n  strictly more permissive than `*_file`, so requiring an extra flag\n  was inverted. Both flags now register the full file set (and the\n  new `get_sandbox_path` companion) at once. `--allow-fs` still works\n  for the no-sandbox / no-bash case; otherwise it's redundant.\n* **New `get_sandbox_path` tool.** Returns the absolute path of the\n  sandbox root, pinned at registration time — distinct from\n  `get_current_dir` which is the live process cwd and can drift.\n  Lets the model resolve where its work actually lands without a\n  wasted `pwd` tool hop.\n* **`bash` description rewritten.** Now leads with **PREFER fs tools**\n  and lists the exact bash anti-patterns (`cat \u003e file`, `cat \u003c\u003cEOF`,\n  `echo \u003e file`, `mkdir`, `sed -i`) with the dedicated tool that\n  replaces each. 
Reserves bash for shell features the dedicated\n  tools don't have — pipelines, `find | xargs`, build runners\n  (make / cmake / cargo / npm), git, package managers, sed/awk for\n  in-place edits.\n* **System prompts inject `[environment]` + `[guidance]`.** When\n  any create/mutate affordance is registered (*_file / bash / plan),\n  the cli prepends two short blocks to the user's `--system` content:\n  the absolute sandbox path (saves a \"where am I\" tool hop on turn 1)\n  and a stay-in-scope behavioral rule (build EXACTLY what the user\n  asked — no extras, no defensive scaffolding, no \"while I'm at it\"\n  cleanups). The same guidance lives in the server's Deep persona\n  and easyai-local's built-in prompt.\n* **Default sampling preset → `precise`** (was `balanced`).\n  Temp 0.2, top_p 0.92, top_k 50, min_p 0.03. Tuned for code,\n  math, and factual Q\u0026A — the dominant use case for a tool-calling\n  agent. Flipped across server, local, cli, webui, library\n  fallbacks, and the systemd installer's INI templates. README's\n  preset table now includes a Behaviour column and a \"Pick when…\"\n  column to make the choice explicit.\n* **`--show-system-prompt`** added to all four binaries\n  (`easyai-cli`, `easyai-server`, `easyai-local`, `easyai-chat`).\n  Resolves the system prompt the binary would actually use (built-in\n  default → `--system-file` → `--system`, plus the cli's injected\n  blocks), prints, exits. No model load, no port bind, no network.\n  Useful for confirming the persona before bouncing a service.\n* **Graceful `Ctrl-C` in `easyai-cli`.** In interactive mode (no\n  `--quiet`), the first `Ctrl-C` mid-turn prints\n  `\u003cexiting: waiting for the ai session to be finished. Ctrl-C\n  again to force.\u003e` and lets the in-flight chat finish naturally\n  (rc=0). Conversation isn't truncated mid-stream. Second `Ctrl-C`\n  is the hard-cancel escape hatch (rc=130). 
`--quiet` keeps the\n  existing immediate-cancel for batch scripts.\n* **Plan tool tolerance shims.** `args::get_array` now accepts a\n  stringified JSON array (`\"items\": \"[...]\"`) — small/quantised\n  models repeatedly emit this shape. The handler infers a missing\n  `action` from the items' fields plus current plan state, and\n  maps common synonyms (`create` → `add`, `remove` → `delete`,\n  etc.). `add` honours an optional per-item `status` so create +\n  mark \"working\" lands in one call. Errors include the correct\n  shape inline so the model can copy-fix.\n* **Plan re-renders coalesce.** A new `Plan::Batch` RAII guard\n  collapses N per-item `on_change` callbacks across one tool call\n  into a single fire — the UI's \"── plan ──\" block now prints once\n  per batch, not once per item.\n* **New doc: [`easyai-cli.md`](easyai-cli.md)** mirrors\n  `easyai-server.md`. 14 sections covering connection, modes, full\n  flag reference, tool registration, system prompt + injection,\n  sampling, reasoning streams, the raw transaction log, RAG,\n  external tools, management subcommands, worked examples,\n  cross-references.\n* **Tool authoring guide.** New `design.md §5 Writing tool\n  descriptions reliably` (architectural) and `manual.md §3.2.1`\n  (cookbook) document the rag-style multi-action pattern, the\n  per-`.param()` \"Used by add / update / …\" idiom, and the\n  lenient-handler tolerance shims. `AI_TOOLS.md` Chapter 9 has a\n  pointer.\n\n### 2026-05-04 — Single-tool RAG is now the default; concise system prompt\n\n* **Default RAG layout flipped: one `rag(action=...)` tool.** The\n  unified single-tool dispatcher used to be opt-in behind\n  `--experimental-rag`; it is now the default for every binary\n  (`easyai-server`, `easyai-cli`, `easyai-local`, `easyai-mcp-server`).\n  One catalog entry instead of seven keeps the model's tool list\n  short and saves a few hundred tokens per turn. 
On-disk format,\n  locking, and fix-memory rules are unchanged.\n* **`--split-rag` opts back into the legacy seven `rag_*` tools.**\n  Replaces `--experimental-rag`. Same semantics, opposite default.\n  Wired as a CLI flag on every binary AND as `[SERVER] split_rag`\n  in the INI overlay (`easyai.ini` / `easyai-mcp.ini`; per-model\n  overrides via `[MODEL_\u003cpattern\u003e]` sections). Useful for\n  weak / 1-bit-quant tool callers (Bonsai-class) that handle many\n  flat schemas more reliably than one discriminated schema.\n* **Default system prompts trimmed.** `easyai-server` and\n  `easyai-local` now ship a much shorter built-in prompt focused on\n  a tight **plan → act → iterate** loop with one small concrete\n  next step at a time, finishing as soon as the answer is useful so\n  the user has room to refine. Cuts about three quarters of the old\n  prompt's length while keeping the no-announce-without-call rule\n  and the search → fetch discipline.\n\n### 2026-05-02 (later) — RAG `knowledge_append` + user-focus prompts\n\n* **`knowledge_append` — new RAG tool.** Adds new content to the end\n  of an existing memory without losing the previous body. Read-modify-\n  write under one `unique_lock` on the store's `shared_mutex`, so\n  concurrent appenders queue cleanly (no lost appendix, no torn\n  merge for any reader); on disk the new content is separated from\n  the old by a Markdown horizontal rule (`---`) so the operator\n  reading the `.md` file sees exactly where each appendix starts.\n  Refuses on entries that don't exist (use `knowledge_save`), on\n  fixed memories (`fix-*`), and when the merged size would exceed\n  256 KiB. Optional `keywords[]` parameter merges into the existing\n  keyword list (deduped, capped at 8). Wired into every consumer\n  (server, MCP server, CLI, local backend). 
Full doc:\n  [`RAG.md`](RAG.md) §4.\n* **User-focus prompt update.** `knowledge_save` and\n  `knowledge_append` tool descriptions now explicitly tell the model\n  to prioritise notes about the user themselves — name, role,\n  hardware, projects, working style, corrections, likes, dislikes —\n  and to grow that memory across sessions with `knowledge_append`\n  instead of rewriting it with `knowledge_save`. The next\n  conversation (tomorrow, three months from now) starts with the\n  user already known, so they don't have to explain themselves\n  twice. The lib ships the canonical seven knowledge tools\n  (`knowledge_save`, `knowledge_append`, `search_knowledge`,\n  `knowledge_load`, `knowledge_list`, `knowledge_delete`,\n  `keywords_knowledge`); all CLI help text, help comments, and docs\n  updated to match.\n\n### 2026-05-02 — Fourth-pass security audit + readability batch\n\n* **`/tmp` log file hardened (security, MEDIUM).** The auto-generated\n  raw transaction log at `/tmp/easyai-\u003cpid\u003e-\u003cepoch\u003e.log` is now\n  created with `O_EXCL | O_NOFOLLOW | O_CLOEXEC` and mode `0600`. The\n  predictable path used to follow symlinks on `fopen(\"w\")`, so a\n  local attacker on a multi-tenant host could plant a symlink\n  pointing at any user-writable file (`~/.bashrc`, `~/.ssh/…`) and\n  have the next `easyai-*` process truncate-and-overwrite it.\n  Mode `0644` (process umask) also leaked prompts — which can\n  contain API keys or PII — to other accounts on the same box.\n  `O_EXCL` makes the create atomic-or-fail and `0600` keeps logs\n  private. Caller-supplied paths (`--log-file PATH`) keep `O_TRUNC`\n  for log rotation but still gain `O_NOFOLLOW + 0600`. 
Full\n  write-up in [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §19.\n* **Internal readability batch (no public API change).** Three\n  inline patterns were lifted into named helpers so the call sites\n  read top-to-bottom: `file_mtime_unix()` (replaces three copies of\n  the C++17 file_clock→system_clock idiom in `rag_tools.cpp`),\n  `glob_to_regex()` + `kGlobRegexMetachars` (lifts the wildcard\n  state machine out of `glob_file` in `builtin_tools.cpp`), and\n  `looks_like_announce_phrase()` (lifts the 30-line retry predicate\n  out of `Engine::chat_continue` in `engine.cpp`, where it was\n  used twice). All seven binaries build clean.\n\n### 2026-05-01 — MCP CLIENT, RAG memory framing, web_google, macOS installer fix\n\n* **`easyai-server` is now also an MCP client.** Pass `--mcp \u003curl\u003e`\n  (and `--mcp-token \u003ctoken\u003e` if needed) and at startup the server\n  connects to the upstream's `/mcp`, runs `tools/list`, and merges\n  the catalogue into its own. Each remote tool's handler proxies\n  `tools/call` over HTTP. Local tool names win on collision. The\n  implementation is `easyai::mcp::fetch_remote_tools()` in libeasyai\n  — public API, so anything built on the engine library can stack\n  remote MCP catalogues. See [`MCP.md`](MCP.md) §9.5.\n* **`--no-tools` renamed to `--no-local-tools` (server only).** Now\n  that the server can be both an MCP server AND an MCP client, the\n  flag's scope had to be unambiguous: it disables only the LOCAL\n  built-in toolbelt. RAG, external tools, and tools fetched via\n  `--mcp` are unaffected. INI key `load_tools` → `local_tools` to\n  match. The `easyai-local` and `easyai-mcp-server` binaries keep\n  their `--no-tools` spelling — they have no MCP client, so the\n  original name is still accurate.\n* **RAG reframed as memory + fixed memories.** Tool descriptions\n  rewritten in memory verbs (search / store / recall / update /\n  forget). 
New `fix=true` argument on `knowledge_save` mints an\n  immutable memory: keywords are auto-prefixed with `fix-`, and from\n  then on `knowledge_save` refuses to overwrite it and\n  `knowledge_delete` refuses to\n  remove it. Use this to seed system designs, hard rules, ground-\n  truth definitions the model must not rewrite. Search / load /\n  list output gain a human-readable `modified` date and a `[FIXED]`\n  / `fixed: yes/no` marker. See [`RAG.md`](RAG.md).\n* **Single-tool RAG dispatcher is the default.** One\n  `rag(action=...)` tool exposes save / append / search / load /\n  list / delete / keywords as sub-actions. Same store, same\n  handlers, same on-disk format. Saves a few hundred catalog tokens\n  per turn and keeps the model's tool list short. Pass `--split-rag`\n  (or `[SERVER] split_rag = on` in the INI) to opt back into the\n  legacy seven separate `rag_*` tools — useful for weak / 1-bit-\n  quant tool callers (Bonsai-class) that handle many flat schemas\n  more reliably than one discriminated schema.\n* **`web_google` builtin.** Google Custom Search JSON API. Gated by\n  `--use-google` (also `[SERVER] use_google`). Reads\n  `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` from env at call time so a\n  rotation doesn't drop the tool. Free tier is 100 queries/day.\n* **macOS installer fix: OpenSSL via brew.** Modern macOS no longer\n  ships usable libssl in `/usr/lib`, so `find_package(OpenSSL)`\n  half-detected and broke configure for both `easyai_cli` and the\n  vendored `cpp-httplib`. 
The installer + `build_macos.sh` now pass\n  `-DOPENSSL_ROOT_DIR=$(brew --prefix openssl@3)` and the cmake\n  guards `TARGET OpenSSL::SSL` so a half-detected OpenSSL degrades\n  to \"HTTPS not in this build\" instead of erroring out.\n\n### 2026-04-30 — `easyai-mcp-server` (standalone MCP provider)\n\n* **New binary `easyai-mcp-server`.** Same tool catalogue as\n  `easyai-server` (built-ins + RAG + operator-defined external-tools)\n  exposed over `POST /mcp` with **no GGUF model loaded** — designed\n  for high-concurrency multi-client deployments. Configurable\n  cpp-httplib worker pool (`--threads`, default 256) and a separate\n  in-flight `tools/call` cap (`--max-concurrent-calls`, default 256)\n  that returns 503 + `Retry-After` on saturation instead of unbounded\n  queueing. Full doc: [`easyai-mcp-server.md`](easyai-mcp-server.md).\n* **RAG concurrency upgrade.** `RagStore::mu` is now\n  `std::shared_mutex`; `search_knowledge` / `knowledge_load` /\n  `knowledge_list` / `keywords_knowledge` take `std::shared_lock` so\n  parallel readers don't serialise on the write path. Benefits every\n  consumer of libeasyai — `easyai-server`, `easyai-cli` with\n  `--RAG`, any third-party program calling\n  `knowledge_split_tools()`. Atomic-rename writes already\n  made on-disk reads tear-free; the lock relaxation is safe.\n* **Doc restructure.** `INI_KFlags.md` content has moved to the top\n  of the new [`easyai-server.md`](easyai-server.md) so the chat\n  server's INI / CLI / API / persona / hardening reference lives in\n  one file. `LINUX_SERVER.md` is unchanged — it remains the\n  systemd-installer-specific operator's guide.\n\n### 2026-04-30 — Tunable incomplete-retry budget + live retry visibility\n\n* **`--max-incomplete-retries N` (also `[ENGINE] max_incomplete_retries`).**\n  Default 10 — how many times the engine discards + nudges + retries\n  when the model finishes a turn announcing an action (\"Let me…\",\n  \"I'll…\") without actually emitting the tool_call. 
Bump to 15-20\n  for weak / 1-bit-quant models (Bonsai-8B-Q1_0 frequently needs\n  the extra budget); set to 0 to disable retries entirely.\n* **Retries now visible in the Thinking panel.** Engine fires a new\n  `on_incomplete_retry(attempt, max, reason)` callback per retry,\n  the server pipes it into the SSE `reasoning_content` channel, and\n  the webui renders `↻ Retry 3/10: model said: \"Let me search…\" (no\n  tool_call) — nudging.` while it happens. No more frozen UI for 10\n  silent retries followed by a blank bubble.\n* **Engine warnings always log** (regardless of `--verbose`):\n  cancellation, thought-only retry, reasoning→content fallback,\n  incomplete-retry, empty final content. `--verbose` is for raw\n  per-token / per-hop diagnostic noise; actionable warnings stay on\n  so operators see them in `journalctl` without flipping a flag.\n\n### 2026-04-30 — Bonsai 8B Q1_0 onboarding + security pass\n\n* **One-shot installers for macOS and Raspberry Pi 4/5.**\n  `scripts/install_easyai_macos.sh` builds with Metal/AMX, drops the\n  model, prints the run command. `scripts/install_easyai_pi.sh` does\n  the full Pi appliance: systemd unit, mDNS so the box answers as\n  **`pi-ai.local`** on your LAN, port 80 with\n  `CAP_NET_BIND_SERVICE`. Both clone the **PrismML fork** of\n  llama.cpp (the only one with the Q1_0 kernel — upstream loads the\n  GGUF then fails at decode).\n* **Security third-pass audit** — 3 HIGH and 7 MEDIUM findings fixed.\n  The INI overlay used to be silently ignored (every `[ENGINE]` /\n  `[SERVER]` key was a no-op); `--no-mcp-auth` was disconnected from\n  the gate; the sandbox could be escaped by a symlink planted via\n  `bash`. All closed. The `bash` tool now gets the same\n  fork-hardening as external tools — `PR_SET_PDEATHSIG`, fd\n  close-loop bounded against `RLIMIT_NOFILE = unlimited`, process-\n  group kill on timeout. 
Plus JSON-depth caps on every parser, a\n  bounded INI parser, mode 0600 on RAG entries, and a\n  body-size-bounded auth header. See [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §18.\n* **MCP server.** `easyai-server` is now a Model Context Protocol\n  provider on `POST /mcp` (protocol 2024-11-05). Claude Desktop,\n  Cursor, Continue list and dispatch every registered tool — your\n  built-ins, your RAG, your `--external-tools` manifests — over a\n  single endpoint. Bearer auth via `[MCP_USER]` in the INI; a\n  Python stdio bridge ships at `scripts/mcp-stdio-bridge.py` for\n  Claude Desktop. See [`MCP.md`](MCP.md).\n* **Single INI config — `/etc/easyai/easyai.ini`.** Every CLI flag\n  has an INI key (FlagDef table refactor); precedence is CLI \u003e INI\n  \u003e hardcoded default. Edit the file, `systemctl restart`, done.\n  Full reference in [`easyai-server.md`](easyai-server.md) §1.\n* **RAG: persistent memory.** Seven keyword-only knowledge tools\n  (`knowledge_save`, `knowledge_append`, `search_knowledge`,\n  `knowledge_load`, `knowledge_list`, `knowledge_delete`,\n  `keywords_knowledge`).\n  Multi-keyword search (first keyword required, rest rank by overlap)\n  + pagination. One Markdown file per entry — operator-readable,\n  hand-editable. See [`RAG.md`](RAG.md).\n\n### 2026-04-29 — External tools v2\n\n* **Operator-defined tool packs** via `EASYAI-\u003cname\u003e.tools` JSON\n  manifests dropped in `/etc/easyai/external-tools/`. Per-file\n  fault isolation, sanity warnings (shell-wrapper detection,\n  world-writable binaries, `LD_*` env passthrough), full\n  `fork`+`execve` hardening — never a shell. Give the model\n  focused powers without flipping `--allow-bash`. See\n  [`EXTERNAL_TOOLS.md`](EXTERNAL_TOOLS.md).\n* **`get_current_dir` builtin** — the model can ask where it is,\n  so relative paths in `bash` / `*_file` calls land where you expect.\n* **Cancel-on-disconnect on the server** — closing the browser\n  tab actually stops the decode loop. 
No more zombie generation\n  eating tokens after the user walked away.\n* **Tolerant tool output** — non-UTF-8 bytes in tool results no\n  longer abort the SSE stream; the bytes get a U+FFFD substitute\n  and the stream stays alive.\n\n---\n\n## All options at a glance\n\nEvery CLI flag, INI key, and library setter the project ships\ntoday, in tables. Skim once to learn the surface; come back when\nyou want to tune something specific. Deeper reference is linked\nper row.\n\nThis repo builds seven binaries. Two are production daemons\n(`easyai-server`, `easyai-mcp-server`), two are user CLIs\n(`easyai-cli`, `easyai-local`), three are example apps the lib\nships to demonstrate the API (`easyai-chat`, `easyai-agent`,\n`easyai-recipes`).\n\n### `easyai-server` — chat HTTP server (also speaks MCP)\n\nFull reference: [`easyai-server.md`](easyai-server.md).\nINI defaults under `/etc/easyai/easyai.ini` — every flag below\nhas a matching INI key (see [`easyai-server.md`](easyai-server.md) §1).\n\n| Flag | Default | What it does |\n|---|---|---|\n| `-m, --model PATH` | (required) | GGUF model file. |\n| `--config PATH` | `/etc/easyai/easyai.ini` | Central INI; CLI \u003e INI \u003e hardcoded. |\n| `--host ADDR` | `127.0.0.1` | Bind address (`0.0.0.0` = any iface). |\n| `--port N` | `8080` | TCP port. |\n| `--max-body N` | 8 MiB | Cap on request body. |\n| `-s, --system-file PATH` | — | Default system prompt, from file. |\n| `--system TEXT` | — | Default system prompt, inline. |\n| `--no-local-tools` | off | Don't expose the local built-in toolbelt. |\n| `--mcp URL` | — | Connect upstream MCP server as client; merge catalogue. |\n| `--mcp-token TOK` | — | Bearer for `--mcp`. |\n| `--no-mcp-auth` | off | Force `/mcp` open even with `[MCP_USER]` populated. |\n| `--http-retries N` | 5 | Extra attempts on transient HTTP failures (MCP client + web tools). 0 disables. Logged on stderr. 
|\n| `--http-timeout SECONDS` | 600 | Read/write timeout for the listen socket AND the MCP-client connection. Bumped from llama-server's 60 s default to accommodate long thinking turns. |\n| `--sandbox DIR` | server cwd | Root for `fs` / `bash` / `python3` / external `$SANDBOX`. |\n| `--allow-fs` | off | Register the unified `fs` tool (action=read / write / list / glob / grep / check_path / cwd / sandbox). |\n| `--allow-bash` | off | Register `bash` (NOT a hardened sandbox). |\n| `--no-python` | python3 on | Drop the `python3` tool. By default it's auto-registered alongside `fs` whenever `--sandbox` is set or `--allow-bash` is on. Stdlib-only interpreter; disk access auto-restricted to the sandbox root. |\n| `--use-google` | off | Enable engine=`\"google\"` inside the unified `web` tool (needs `GOOGLE_API_KEY` + `GOOGLE_CSE_ID`). |\n| `--external-tools DIR` | — | Load every `EASYAI-*.tools` manifest in `DIR`. |\n| `--memory DIR` | — | Enable persistent memory: registers seven keyword-only knowledge tools (`knowledge_save`, `knowledge_append`, `search_knowledge`, `knowledge_load`, `knowledge_list`, `knowledge_delete`, `keywords_knowledge`) — a passive RAG technique. `--RAG` is still accepted as a back-compat alias. |\n| `--preset NAME` | `precise` | Ambient sampling preset. See [Sampling presets](#sampling-presets) for what each implies. |\n| `--temperature F` | per preset | Override temperature (0.0–2.0). |\n| `--top-p F` | per preset | Nucleus sampling p. |\n| `--top-k N` | per preset | Top-k cutoff. |\n| `--min-p F` | per preset | Min-p threshold. |\n| `--repeat-penalty F` | 1.04 | Repetition penalty (multiplicative on recent logits) — anti-loop safety net for thinking models that lock into rephrasing their own intent. `--repeat-penalty 1.0` disables. |\n| `--frequency-penalty F` | 0.05 | Frequency penalty (additive, scales with count of each token already generated, OpenAI semantics, `[0.0, 2.0]`). 
Discourages verbatim repetition proportionally to how often a token has already appeared. |\n| `--presence-penalty F` | 0.1 | Presence penalty (additive, fixed cost per token-already-seen, OpenAI semantics, `[-2.0, 2.0]`). Discourages topic stickiness without penalising literal tool-name repetition; pairs well with `--repeat-penalty 1.0` on long agentic flows. See [`design.md` §4b](design.md#4b-sampling-and-the-penalty-stack). |\n| `--max-tokens N` | 12288 | Cap tokens per request. |\n| `--seed U32` | random | RNG seed (0 = random). |\n| `--max-incomplete-retries N` | 10 | Retry budget for \"announce-only\" turns; 0 disables. |\n| `-c, --ctx N` | 262144 (binary) / 1048576 (installer INI) | Context size. The systemd installer writes `[ENGINE] context = 1048576` paired with YaRN ×4 over a 128K base; per-model `[MODEL_*]` profiles override it. |\n| `--batch N` | = ctx | Logical batch size. |\n| `--ngl N` | 99 | GPU layers (0 = CPU only). |\n| `--split-mode, -sm MODE` | `none` | Multi-GPU split strategy: `none`, `layer`, `row`, `tensor`. |\n| `--rope-scaling MODE` | `yarn` | RoPE scaling method: `none`, `linear`, `yarn`. |\n| `--rope-scale F` | 2 | RoPE frequency scale factor. |\n| `--yarn-orig-ctx N` | 131072 | YaRN original context size for scaling. |\n| `-t, --threads N` | hw cores | CPU threads. |\n| `-ctk, --cache-type-k TYPE` | `f16` | K-cache dtype (`f32`,`f16`,`bf16`,`q8_0`,`q4_0`,`q4_1`,`q5_0`,`q5_1`,`iq4_nl`). |\n| `-ctv, --cache-type-v TYPE` | `f16` | V-cache dtype (same set). |\n| `-nkvo, --no-kv-offload` | off | Keep KV cache on CPU even with GPU layers. |\n| `--kv-unified` | off | Single unified KV buffer across sequences. |\n| `--override-kv K=T:V` | — | GGUF metadata override (`int`,`float`,`bool`,`str`); repeatable. |\n| `-a, --alias NAME` | `easyai` | Public model id reported by `/v1/models`. |\n| `--api-key KEY` | — | Require Bearer auth on every `/v1` route. |\n| `-fa, --flash-attn` | auto | Force flash attention on. 
|\n| `-tb, --threads-batch N` | = threads | Threads for prompt-eval batches. |\n| `-np, --parallel N` | 1 | Compat-only; warns when \u003e1. |\n| `--mlock` | off | mlock model weights into RAM. |\n| `--no-mmap` | off | Disable mmap (read GGUF into RAM). |\n| `--numa STRATEGY` | off | `distribute`,`isolate`,`numactl`,`mirror`. |\n| `--metrics` | off | Expose Prometheus `/metrics`. |\n| `--reasoning on\\|off` | on | Enable model thinking. |\n| `--no-think` | off | Strip `\u003cthink\u003e…\u003c/think\u003e` from replies. |\n| `--inject-datetime on\\|off` | on | Append authoritative date/time to system prompt. |\n| `--knowledge-cutoff YYYY-MM` | `2024-10` | Cutoff hint used by `--inject-datetime`. |\n| `-v, --verbose` | off | Engine logs raw model output + parser actions. |\n| `--webui MODE` | `modern` | `modern` (embedded SvelteKit) or `minimal` (inline). |\n| `--webui-title TEXT` | `Box EasyAI` | Browser tab + sidebar brand. |\n| `--webui-icon PATH` | — | Favicon (`.ico`,`.png`,`.svg`,`.gif`,`.jpg`,`.webp`). |\n| `--webui-placeholder S` | `Type a message…` | Input box placeholder. |\n\n### `easyai-mcp-server` — standalone MCP provider (no model)\n\nSame tool catalogue as `easyai-server` but no GGUF loaded —\ndesigned for high-concurrency multi-client deployments. Full\nreference: [`easyai-mcp-server.md`](easyai-mcp-server.md).\n\n| Flag | Default | What it does |\n|---|---|---|\n| `--config PATH` | `/etc/easyai/easyai-mcp.ini` | Central INI. |\n| `--host ADDR` | `127.0.0.1` | Bind address. |\n| `--port N` | `8089` | TCP port. |\n| `-n, --name ID` | `easyai-mcp` | Server identity on `/health` + MCP `initialize`. |\n| `--max-body N` | 1 MiB | Cap on request body. |\n| `-t, --threads N` | 256 | cpp-httplib worker pool. |\n| `--max-concurrent-calls N` | 256 | In-flight `tools/call` cap (503 on saturation). |\n| `--sandbox DIR` | cwd | Root for `*_file` / `bash` / `$SANDBOX`. |\n| `--allow-fs` | off | Register `*_file` tools. |\n| `--allow-bash` | off | Register `bash`. 
|\n| `--no-tools` | off | Skip the built-in toolbelt entirely. |\n| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |\n| `--memory DIR` | — | Enable the seven `knowledge_*` tools (alias `--RAG`). |\n| `--api-key TOK` | — | Bearer required for `/health`, `/metrics`, `/v1/tools`. |\n| `--no-mcp-auth` | off | Force `/mcp` open. |\n| `--metrics` | off | Enable Prometheus `/metrics`. |\n| `-v, --verbose` | off | Log every dispatch to stderr. |\n\n### `easyai-cli` — interactive remote CLI\n\nTalks to any OpenAI-compatible endpoint (our `easyai-server`,\nupstream `llama-server`, OpenAI itself, etc.). Interactive terminal\nruns open a full-screen chat **TUI** (opencode-style look \u0026 feel —\nmarkdown rendering, live per-tool rows with diff views, todo\nchecklists, `/`-command and `@`-file completion, `opencode` /\n`opencode-light` themes, `esc esc` interrupt); `--plain` (or\n`[cli] tui = off`) keeps the legacy line REPL, and every non-TTY /\none-shot / `--quiet` path falls back automatically.\n\n| Flag | Default | What it does |\n|---|---|---|\n| `--url URL` | `$EASYAI_URL` | OpenAI-compat endpoint. |\n| `--api-key KEY` | `$EASYAI_API_KEY` | Bearer auth. |\n| `--model NAME` | `$EASYAI_MODEL` | Request body `model` field. |\n| `--timeout SECONDS` | 86400 (24h) | Read+write timeout — sized for multi-hour agentic sessions. Only fires on TRUE silence (every SSE delta resets it). `EASYAI_TIMEOUT` env also accepted. |\n| `--http-retries N` | 5 | Extra attempts on transient HTTP failures (connect refused, read timeout, 5xx). 0 disables. Logged on stderr without `--verbose`. `EASYAI_HTTP_RETRIES` env also accepted. |\n| `--insecure-tls` | off | Skip peer cert check (DEV ONLY). |\n| `--ca-cert PATH` | system | Custom CA bundle (PEM). |\n| `--system TEXT` | — | Inline system prompt. |\n| `--system-file PATH` | — | System prompt from file. |\n| `--temperature F` | server | Sampling temperature. |\n| `--top-p F` | server | Nucleus top-p. 
|\n| `--top-k N` | server | Top-k cutoff. |\n| `--min-p F` | server | min-p (llama-server / easyai). |\n| `--repeat-penalty F` | 1.04 | Repetition penalty — anti-loop default; pass 1.0 to disable. |\n| `--frequency-penalty F` | server | Frequency penalty (OpenAI standard, `[0.0, 2.0]`). |\n| `--presence-penalty F` | server | Presence penalty (OpenAI standard, `[-2.0, 2.0]`). |\n| `--seed N` | random | Deterministic sampling seed. |\n| `--max-tokens N` | server | Cap reply length. |\n| `--stop SEQ` | — | Add a stop string (repeatable). |\n| `--extra-json '{…}'` | — | Free-form JSON merged into the request body. |\n| `--tools LIST` | datetime,plan,web,system_* | Comma list of locally-registered tools. |\n| `--sandbox DIR` | — | Enable the unified `fs` tool (action=read/write/list/glob/grep/check_path/cwd/sandbox) scoped to `DIR`. |\n| `--allow-bash` | off | Register `bash` (uses `--sandbox` as cwd, else current dir). |\n| `--no-python` | python3 on | Drop the auto-registered `python3` tool (default-on whenever `--sandbox` or `--allow-bash` is set). |\n| `--use-google` | off | Enable engine=`\"google\"` inside the unified `web` tool. |\n| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |\n| `--memory DIR` | — | Enable persistent memory (seven `knowledge_*` tools; alias `--RAG`). |\n| `--tools-mode MODE` | `split` | How `fs` / `web` are exposed. Default `split` (since 2026-05-15): one focused tool per action — `read_file`, `edit_file`, …, `search_web`, `fetch_web`. Knowledge tools are always split (seven separate tools). `unified` registers the legacy single dispatcher per `fs`/`web` family with `action=`. `both` registers both surfaces. INI: `[cli] tools_mode`. |\n| `--no-plan` | off | Don't auto-register the planning tool. |\n| `-p, --prompt TEXT` | (REPL) | One-shot prompt; without it you get a REPL. |\n| `--no-reasoning` | shown | Hide `delta.reasoning_content`. |\n| `--max-reasoning N` | 0 (off) | Abort SSE when accumulated reasoning \u003e N chars. 
|\n| `--no-retry-on-incomplete` | retry on | Disable auto-retry-with-nudge. |\n| `--verbose` | off | Log HTTP+SSE traffic to stderr (stderr only — no file). |\n| `-q, --quiet` | off | Disable spinner glyph + ctx-fill gauge. |\n| `--log-file PATH` | off | Opt in to a raw transaction log at PATH (mode 0600). Implies `--verbose`. No `/tmp` file is created by default. |\n| `--continue` | off | Load `.easyai_session` from cwd before the first prompt. Default OFF (since 2026-05-13): without this flag any existing session file is ignored and overwritten on the first turn. Session is always saved per turn regardless. INI: `[cli] auto_continue`. |\n| `--no-continue` | — | Explicit form of the default — ignore any existing `.easyai_session` and overwrite on the first turn. Useful to override `[cli] auto_continue = on` set in INI. |\n| `--compress` | off | Ask the model for a lossless recap, replace history with it, save. No-op without `--continue` (nothing in memory to recap). Also `/compress` mid-REPL. INI: `[cli] auto_compress`. |\n| `--list-tools` | — | Print local tools (no chat). |\n| `--list-remote-tools` | — | `GET /v1/tools` (no chat). |\n| `--list-models` | — | `GET /v1/models`. |\n| `--health` | — | `GET /health`. |\n| `--props` | — | `GET /props`. |\n| `--metrics` | — | `GET /metrics` (Prometheus text). |\n| `--set-preset NAME` | — | `POST /v1/preset {preset:NAME}`. |\n\n### `easyai-local` — local-engine REPL\n\nLoads a GGUF model in-process (no server). For remote endpoints\nuse `easyai-cli`.\n\n| Flag | Default | What it does |\n|---|---|---|\n| `-m, --model PATH` | (required) | GGUF file. |\n| `-p, --prompt TEXT` | (REPL) | One-shot: run prompt, print, exit. |\n| `-s, --system-file PATH` | — | System prompt from file. |\n| `--system TEXT` | — | Inline system prompt. |\n| `--preset NAME` | `precise` | Initial preset. See [Sampling presets](#sampling-presets). |\n| `--no-think` | off | Strip `\u003cthink\u003e…\u003c/think\u003e` from output. 
|\n| `-q, --quiet` | off | Disable spinner glyph + ctx-fill gauge. |\n| `--temperature F` | per preset | Override temperature. |\n| `--top-p F` | per preset | top-p. |\n| `--top-k N` | per preset | top-k. |\n| `--min-p F` | per preset | min-p. |\n| `--repeat-penalty F` | 1.04 | Repetition penalty — anti-loop default; pass 1.0 to disable. |\n| `--frequency-penalty F` | 0.05 | Frequency penalty (`[0.0, 2.0]`). |\n| `--presence-penalty F` | 0.1 | Presence penalty (`[-2.0, 2.0]`). |\n| `--max-tokens N` | 12288 | Cap tokens per turn. |\n| `--seed U32` | random | RNG seed. |\n| `-c, --ctx N` | 262144 | Context size. |\n| `--batch N` | = ctx | Logical batch size. |\n| `--ngl N` | 99 | GPU layers. |\n| `--split-mode, -sm MODE` | `none` | Multi-GPU split strategy: `none`, `layer`, `row`, `tensor`. |\n| `--rope-scaling MODE` | `yarn` | RoPE scaling method: `none`, `linear`, `yarn`. |\n| `--rope-scale F` | 2 | RoPE frequency scale factor. |\n| `--yarn-orig-ctx N` | 131072 | YaRN original context size for scaling. |\n| `-t, --threads N` | hw cores | CPU threads. |\n| `--no-tools` | off | Skip the built-in toolbelt. |\n| `--sandbox DIR` | — | Enable the unified `fs` tool scoped to `DIR`. |\n| `--allow-bash` | off | Register `bash`. |\n| `--no-python` | python3 on | Drop the auto-registered `python3` tool. |\n| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |\n| `--memory DIR` | — | Enable persistent memory (alias `--RAG`). |\n| `-ctk, --cache-type-k TYPE` | `f16` | K-cache dtype. |\n| `-ctv, --cache-type-v TYPE` | `f16` | V-cache dtype. |\n| `-nkvo, --no-kv-offload` | off | Keep KV cache on CPU. |\n| `--kv-unified` | off | Single unified KV buffer. |\n| `--override-kv K=T:V` | — | GGUF metadata override (repeatable). |\n\n### Example apps (lib API demos)\n\nThree small binaries under `services/` show the lib API in\ncontext. They take minimal flags — the real config happens in\nthe C++ source as fluent setter chains. 
Read these as the\ncanonical \"how do I use the lib?\" answer.\n\n| Binary | Min flags | Purpose |\n|---|---|---|\n| `easyai-chat` | `-m PATH` OR `--url BASE`, `[--system TEXT]` | One-shot chat over Engine OR Client (auto-picks). |\n| `easyai-agent` | `-m PATH`, `[-c CTX]`, `[-ngl N]` | Tiny agentic-loop demo with tool registration. |\n| `easyai-recipes` | `-m PATH` | Five recipes (chat, persona, REPL, tools, agent loop). |\n\n### Library API — `easyai::Agent`\n\nThe 30-second front door. Construct, optionally chain a few\nfluent setters, call `ask()`. Header:\n[`include/easyai/agent.hpp`](include/easyai/agent.hpp).\n\n| Method | Type | Default | What it does |\n|---|---|---|---|\n| `Agent(model_path)` | ctor | — | Local model. |\n| `Agent::remote(base_url, api_key=\"\")` | static | — | Remote endpoint. |\n| `.system(prompt)` | `string` | — | System prompt. |\n| `.sandbox(dir)` | `string` | — | Enable `*_file` scoped to `dir`. |\n| `.allow_bash(on=true)` | `bool` | off | Register `bash`. |\n| `.preset(name)` | `string` | `precise` | Sampling profile. |\n| `.remote_model(id)` | `string` | — | Remote model id (remote mode only). |\n| `.temperature(t) / .top_p(p) / .top_k(k) / .min_p(p)` | scalar | per preset | Sampling overrides. |\n| `.on_token(cb)` | `function` | — | Streaming-token callback. |\n| `.ask(text)` | call | — | One-shot turn; runs tool dispatch inline. |\n| `.reset()` | call | — | Wipe history. |\n| `.last_error()` | accessor | — | Diagnostic. |\n| `.backend()` | accessor | — | Escape hatch to the underlying `Backend \u0026`. |\n\n### Library API — `easyai::Engine` (local llama.cpp)\n\nFull local engine. Header:\n[`include/easyai/engine.hpp`](include/easyai/engine.hpp).\n\n| Method | Type | Default | What it does |\n|---|---|---|---|\n| `.model(gguf_path)` | `string` | — | GGUF file. |\n| `.context(n) / .batch(n)` | `int` | 262144 / = ctx | KV / logical batch size. |\n| `.gpu_layers(n)` | `int` | 99 | 99 = all layers offloaded, 0 = CPU only. 
|\n| `.threads(n) / .threads_batch(n)` | `int` | hw / = threads | CPU threads. |\n| `.seed(u32)` | `uint32_t` | random | RNG seed. |\n| `.system(prompt)` | `string` | — | System prompt. |\n| `.temperature(t) / .top_p(p) / .top_k(k) / .min_p(p)` | scalar | 0.2 / 0.92 / 50 / 0.03 | Sampling. |\n| `.repeat_penalty(r)` | `float` | 1.04 | Repetition penalty (multiplicative on recent logits) — anti-loop default. Set to 1.0 to disable. |\n| `.frequency_penalty(f)` | `float` | 0.05 | Frequency penalty (additive, scales with count, `[0.0, 2.0]`). |\n| `.presence_penalty(p)` | `float` | 0.1 | Presence penalty (additive, fixed cost per token-already-seen, OpenAI semantics, range `[-2.0, 2.0]`). Pairs well with `repeat_penalty=1.0` on long agentic flows. See [`design.md` §4b](design.md#4b-sampling-and-the-penalty-stack). |\n| `.max_tokens(n)` | `int` | 12288 | Per-turn cap. |\n| `.tool_choice_auto / .tool_choice_required / .tool_choice_none` | call | auto | Tool-choice mode. |\n| `.parallel_tool_calls(on)` | `bool` | off | Allow parallel tool calls. |\n| `.verbose(on)` | `bool` | off | Engine debug logs. |\n| `.max_tool_hops(n)` | `int` | 8 | Agentic-loop cap (bumped to 99999 with `bash`). |\n| `.retry_on_incomplete(on)` | `bool` | on | Auto-retry \"announce-only\" turns. |\n| `.max_incomplete_retries(n)` | `int` | 10 | Retry budget; 0 disables. |\n| `.stop_at_ctx_pct(pct)` | `int` | 100 | Hard ceiling on context fill; 0 disables. |\n| `.cache_type_k(name) / .cache_type_v(name)` | `string` | `f16` | KV-cache dtype. |\n| `.no_kv_offload(on) / .kv_unified(on)` | `bool` | off | KV placement / layout. |\n| `.add_kv_override(spec)` | `string` | — | GGUF metadata override (repeatable). |\n| `.flash_attn(on) / .use_mlock(on) / .use_mmap(on)` | `bool` | auto/off/on | Compute / memory. |\n| `.numa(strategy)` | `string` | off | `distribute` / `isolate` / `numactl` / `\"\"`. |\n| `.split_mode(mode)` | `string` | `none` | Multi-GPU split: `none`, `layer`, `row`, `tensor`. 
|\n| `.rope_scaling(mode)` | `string` | `yarn` | RoPE scaling: `none`, `linear`, `yarn`. |\n| `.rope_freq_scale(f)` | `float` | 2 | RoPE frequency scale factor. |\n| `.yarn_orig_ctx(n)` | `int` | 131072 | YaRN original context size. |\n| `.enable_thinking(on)` | `bool` | on | Chat-template thinking flag. |\n| `.add_tool(t) / .clear_tools()` | call | — | Tool registration. |\n| `.on_token(cb) / .on_tool(cb) / .on_hop_reset(cb) / .on_incomplete_retry(cb)` | callback | — | Streaming hooks. |\n| `.load() / .reset() / .clear_kv()` | call | — | Lifecycle. |\n| `.set_sampling(t,p,k,m)` | call | — | Re-sample mid-conversation. |\n| `.push_message(role, content, [tool_name, tool_call_id])` | call | — | Append history without generating. |\n| `.replace_history(messages)` | call | — | Full-fidelity history replay. |\n| `.chat(text) / .chat_continue() / .generate_one() / .generate()` | call | — | Inference primitives. |\n| `.request_cancel() / .clear_cancel() / .cancel_requested()` | call | — | Thread-safe cancel. |\n| `.last_error() / .last_was_ctx_full() / .turns() / .tools() / .backend_summary() / .n_ctx() / .model_path() / .perf_data() / .perf_reset()` | accessor | — | Introspection. |\n\n### Library API — `easyai::Client` (remote OpenAI-compat)\n\nRemote counterpart of `Engine`. Tools execute LOCALLY in the\nconsumer process. Header:\n[`include/easyai/client.hpp`](include/easyai/client.hpp).\n\n| Method | Type | Default | What it does |\n|---|---|---|---|\n| `.endpoint(url)` | `string` | — | `http(s)://host[:port]`. |\n| `.api_key(key)` | `string` | — | Bearer token. |\n| `.timeout_seconds(s)` | `int` | 86400 (24h) | Connect+read timeout — sized for multi-hour agentic sessions. |\n| `.http_retries(n)` | `int` | 5 | Extra attempts on transient HTTP failures (pre-stream only — never retries mid-stream). 0 disables. Each retry logs to stderr. |\n| `.verbose(v)` | `bool` | off | Log SSE lines to stderr. |\n| `.log_file(fp)` | `FILE*` | — | Tee every HTTP transaction. 
|\n| `.max_reasoning_chars(n)` | `int` | 0 (off) | Abort SSE when reasoning \u003e N chars. |\n| `.retry_on_incomplete(v)` | `bool` | on | Auto-retry \"announce-only\" turns. |\n| `.stop_at_ctx_pct(pct)` | `int` | 100 | Bail when server-reported `ctx_used/n_ctx` exceeds this percentage. |\n| `.max_tool_hops(n)` | `int` | 8 | Agentic-loop cap. |\n| `.tls_insecure(v) / .ca_cert_path(path)` | `bool` / `string` | off / system | HTTPS-only TLS knobs. |\n| `.model(id)` | `string` | — | Request body `model` field. |\n| `.system(prompt)` | `string` | — | System prompt(s). |\n| `.temperature(t) / .top_p(v) / .top_k(v) / .min_p(v)` | scalar | server | Sampling. |\n| `.repeat_penalty(v)` | `float` | 1.04 | Repetition penalty (anti-loop default); `1.0` disables. |\n| `.frequency_penalty(v) / .presence_penalty(v)` | `float` | server | OpenAI-shape penalties. |\n| `.seed(s)` | `long long` | -1 | -1 = randomise. |\n| `.max_tokens(n)` | `int` | server | Per-turn cap. |\n| `.stop(sequences)` | `vector\u003cstring\u003e` | — | Stop strings. |\n| `.extra_body_json(raw)` | `string` | — | Free-form JSON merged into request body. |\n| `.add_tool(t) / .clear_tools() / .tools()` | call | — | Tool registration. |\n| `.on_token(cb) / .on_reason(cb) / .on_tool(cb)` | callback | — | Streaming hooks. |\n| `.chat(text) / .chat_continue() / .clear_history()` | call | — | Inference + history. |\n| `.list_models / .list_remote_tools / .health / .metrics / .props / .set_preset` | call | — | Direct endpoint helpers. |\n| `.request_cancel() / .clear_cancel() / .cancel_requested()` | call | — | Thread-safe cancel. |\n| `.last_error() / .last_turn_was_incomplete() / .last_ctx_used() / .last_n_ctx() / .last_ctx_pct() / .last_was_ctx_full()` | accessor | — | Introspection. |\n\n### Library API — `easyai::cli::Toolbelt`\n\nCanonical agent toolset, fluently configured. Replaces the\n\"copy the same `if (sandbox.empty()) … else …` block five times\"\npattern. 
Header: [`include/easyai/cli.hpp`](include/easyai/cli.hpp).\n\n| Method | Default | What it does |\n|---|---|---|\n| `.sandbox(dir)` | `\"\"` | Root for the unified `fs` tool (empty = no fs tool). |\n| `.allow_fs(on)` | on | Register the unified `fs` tool (off in server unless `--allow-fs`). |\n| `.allow_bash(on)` | off | Register `bash` (also bumps `max_tool_hops` to 99999). |\n| `.with_plan(plan)` | — | Register the planning tool backed by a `Plan\u0026`. |\n| `.no_web(on)` | off | Drop the unified `web` tool. |\n| `.no_datetime(on)` | off | Drop `datetime`. |\n| `.use_google(on)` | off | Enable engine=`\"google\"` inside `web` (env vars required at apply-time). |\n| `.tools()` | — | Materialise `vector\u003cTool\u003e`. |\n| `.apply(engine) / .apply(client)` | — | Register on the consumer + bump hops if bash. |\n\n### Sampling — what each knob does\n\nAt every step the model emits a probability distribution over the whole\nvocabulary (~100k+ tokens). These knobs decide how a token is picked\nfrom it. They work in sequence: the *cutters* (`top_k`, `top_p`,\n`min_p`) narrow the candidate pool over the raw distribution, then\n`temperature` controls how randomly the final token is drawn from the\nsurvivors.\n\n* **`temperature`** — the focus-vs-risk dial; divides the logits before\n  softmax. `→ 0` is greedy (always the top token: deterministic, can\n  repeat). `0.2–0.5` keeps the model tight on format, syntax, and\n  facts. `1.0` is the model's unmodified distribution. `\u003e 1.0` flattens\n  the curve so unlikely tokens get a real chance — more varied and\n  creative, but more prone to error and incoherence. This is the main\n  *behaviour* dial.\n* **`top_k`** — a *fixed* cut of the tail: keep only the K\n  most-probable tokens, discard the rest. Non-adaptive — it always cuts\n  at K whether the model is certain or unsure. 
A cheap guardrail\n  against ever picking junk from the long tail.\n* **`top_p`** (nucleus) — an *adaptive* cut: keep the smallest set of\n  top tokens whose probabilities sum to P. Adapts to confidence — when\n  the model is sure (one token at 0.9) the nucleus is tiny; when it's\n  unsure (mass spread wide) the nucleus is large. Cuts the tail\n  proportionally.\n* **`min_p`** — also adaptive, but anchored to the *top* token instead\n  of cumulative mass: keep tokens with `prob ≥ min_p × prob_of_top`.\n  `min_p 0.1` keeps anything within 10× of the best; `min_p 0.5` keeps\n  only what's within 2× — aggressive, very focused output.\n\n**How they interact.** They stack. Tightening all of them at once (low\n`top_k` + low `top_p` + low `temperature`) is redundant — they do the\nsame job and you over-constrain into robotic output. Practical rule:\npick *one* adaptive cutter (`top_p ~0.9–0.95` **or** `min_p ~0.05–0.1`),\nleave `top_k` generous as a cheap backstop, and use `temperature` as\nthe real behaviour dial.\n\n**How to tune.**\n* *Code, agentic / tool-calling, structured output, factual Q\u0026A* — low\n  `temperature` (0.2–0.6) and a tight tail cut. High temperature on\n  code means syntax errors, hallucinated APIs, broken tool calls.\n* *Creative writing, brainstorming* — higher `temperature` (0.8–1.2),\n  looser cutters.\n* *Heavily quantised models* — be more conservative (lower\n  `temperature`, tighter cut). Quantisation already adds noise to the\n  logits; high temperature amplifies that noise into real errors.\n\nThe presets below are just curated combinations of these four knobs —\ne.g. `precise` (the project default) encodes `temp 0.2, top_p 0.92,\ntop_k 50, min_p 0.03`.\n\n### Sampling presets\n\nNamed profiles applied via `--preset NAME` (binaries) or\n`Engine::set_sampling()` / `easyai::find_preset()` (lib). 
Numbers are\nbaselines; `\u003cp","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolariun%2Feasyai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsolariun%2Feasyai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsolariun%2Feasyai/lists"}