https://github.com/solariun/easyai
Easyai - run local models with your tools: easy tool definitions, built-in RAG, fs tools, web search and fetch, an MCP server, and an AI client with local tools, all made simple and easy
ai ai-webui fs-tools gguff llama-cpp local rag server tools web-fetch web-search webui
- Host: GitHub
- URL: https://github.com/solariun/easyai
- Owner: solariun
- Created: 2026-04-25T10:23:54.000Z (2 months ago)
- Default Branch: develop
- Last Pushed: 2026-06-19T19:14:49.000Z (16 days ago)
- Last Synced: 2026-06-19T19:16:51.437Z (16 days ago)
- Topics: ai, ai-webui, fs-tools, gguff, llama-cpp, local, rag, server, tools, web-fetch, web-search, webui
- Language: C++
- Homepage:
- Size: 6.01 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Security: SECURITY_AUDIT.md
README
# easyai
> **A C++17 framework anyone can use to build AI agents that talk to
> their own services — no llama.cpp, JSON-Schema, or template-engine
> knowledge required.**
easyai turns [llama.cpp](https://github.com/ggml-org/llama.cpp) into an
*agent engine* you can drop into any program in a dozen lines. You give
it C++ functions; it gives the model the ability to call them. That's
the whole pitch.
It ships **one unified library** (`libeasyai`) you can
`find_package(easyai)` and link against — **plus a complete set of
ready-to-run applications** built on it: a private, OpenAI-compatible
**AI server** with a polished web dashboard, a full-screen agent
**CLI/TUI**, an **MCP provider**, and a local **REPL** — all backed by a
**batteries-included toolset** (web search & fetch, sandboxed files,
shell, Python, memory/RAG, MCP). See [`LIB_GUIDE.md`](LIB_GUIDE.md) for
the OpenAI-Python-SDK-shaped `easyai::Session` quickstart and the tour of
the lib surface.
| Library | Purpose |
|---------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| `libeasyai` | Everything in one shared object — `easyai::Engine` (local llama.cpp), `easyai::Client` (OpenAI-protocol HTTP), `easyai::Session` (one-call agent), `easyai::Tool` + built-ins (datetime/web/fs/bash/python/memory/tool_lookup), external-tool loader, RAG store, MCP server/client, the `preamble` composer. Linked via `easyai` (alias `easyai::easyai`). Legacy aliases `easyai::engine` and `easyai::cli` still resolve to the same target. |
| Binary | What it gives you |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|
| `easyai-local` | Local-only REPL: loads a GGUF in-process via `easyai::Engine` (driven through `easyai::Session`). Drop-in `llama-cli` replacement — one-shot scripting (`-p`), tools, presets, optional `` strip, sandboxed `*_file` tools, opt-in `bash` tool. |
| `easyai-cli` | Agentic OpenAI-protocol client — no local model. Full-screen chat **TUI** (opencode-style look & feel: markdown, live tool rows with diffs, `/`-command + `@`-file completion, themes — default for interactive terminals; `--plain` for the legacy line REPL), `--shell` (hybrid AI shell), or `-p` one-shot. Full sampling control (`--temperature`, `--top-p`, `--top-k`, `--min-p`, `--repeat-penalty`, `--frequency-penalty`, `--presence-penalty`, `--seed`, `--max-tokens`, `--stop`), plan tool, server-management subcommands (`--list-models`, `--list-tools`, `--health`, `--props`, `--metrics`, `--set-preset`). HTTPS via OpenSSL; `--insecure-tls` / `--ca-cert` for dev/internal CAs. Full doc: [`easyai-cli.md`](easyai-cli.md). |
| `easyai-server` | Drop-in `llama-server` replacement: OpenAI-compat HTTP **with full SSE streaming**, embedded SvelteKit webui, Bearer auth, Prometheus `/metrics`, KV-cache controls, flash-attn, mlock. Built-in **[MODELS dashboard](MODELS.md)** (`/models`) — native hardware-aware model recommendations, local-model parameters + in-process hot-swap, and a GGUF download manager, password-gated. Speaks MCP, OpenAI, Ollama from one process. Full doc: [`easyai-server.md`](easyai-server.md). |
| `easyai-mcp-server` | **Standalone Model Context Protocol provider — no model loaded.** Same tool catalogue as `easyai-server` (built-ins + knowledge tools + external-tools), exposed over `POST /mcp` with a configurable cpp-httplib worker pool (`--threads`) and an in-flight `tools/call` cap (`--max-concurrent-calls`) for thousands-of-clients deployments. Full doc: [`easyai-mcp-server.md`](easyai-mcp-server.md). |
| `easyai-library-demo`| Five-line `easyai::Session` template — pair with [`LIB_GUIDE.md`](LIB_GUIDE.md). The smallest "build an agent, register a tool, chat" program in the repo. |
| `easyai-agent` | A demo agent showing every built-in tool plus an inline custom tool. |
| `easyai-recipes` | Tutorial agent paired with `manual.md` — implements `today_is` and `weather` (HTTP-calling) from scratch. |
| `easyai-chat` | A bare-bones REPL with no tools — useful as a sanity check. |
## ⭐ Applications & a batteries-included toolset
easyai is more than a library — it's a **complete, self-hostable AI stack** with a
**rich set of tools** the model can use out of the box.
**The applications**
- 🖥️ **easyai-server** — your own private, **OpenAI-compatible AI server** with a
polished chat web UI and the **[MODELS dashboard](MODELS.md)** (`/models`): browse &
fit-score HuggingFace models against *your* hardware, read a model's GGUF
parameters, **hot-swap the running model in one click**, and download weights — all
password-gated. The dashboard keeps the **1000 most-recently-updated GGUF repos**
in a searchable catalog, cached on disk (`--data-dir`, default `/var/lib/easyai/data`)
and refreshed from HuggingFace on request once it is >1h old. Full SSE streaming,
Prometheus `/metrics`, Bearer auth, KV-cache / flash-attn knobs. Speaks **MCP, OpenAI
and Ollama** from one process. A drop-in `llama-server`, supercharged.
- 💬 **easyai-cli** — a gorgeous full-screen agent **TUI** (markdown, live tool rows
with diffs, `/`-commands, `@`-file completion, themes), a hybrid **AI shell**
(`--shell`), or one-shot `-p` scripting — against any OpenAI-protocol endpoint, with
full sampling control and server-management subcommands.
- 🔌 **easyai-mcp-server** — expose the **entire toolset as an MCP provider** that any
agent (Claude Desktop, Cursor, …) can call, with a tunable worker pool for
thousands-of-clients deployments.
- ⚡ **easyai-local** — a local GGUF **REPL** (`llama-cli`++) with tools, presets, and
sandboxing — no server required.
**The toolset** — registered with a single flag, available to every app and the MCP
server:
| Tool | What it does |
|------|--------------|
| 🌐 **web** | Live internet **search + fetch** (SearXNG / Google CSE / direct URL). |
| 📁 **fs** | Sandboxed **read / write / list / grep** over a directory. |
| 🐚 **bash** · 🧮 **evaluate** | Run shell commands, or **isolated stdlib-only Python** for compute. |
| 🧠 **memory + RAG** | Persistent knowledge store with **automatic vocabulary injection** ([`RAG.md`](RAG.md)). |
| 🔗 **MCP client** | Consume tools from **any remote MCP server** ([`MCP.md`](MCP.md)). |
| 🛠️ **external tools** | Wire up **your own CLIs** from a JSON manifest — zero code ([`EXTERNAL_TOOLS.md`](EXTERNAL_TOOLS.md)). |
| 🧩 **plan · datetime · tool_lookup · remote-model** | Planning, authoritative time, large-catalogue tool discovery, peer-model delegation. |
Full catalogue and safety model: **[`AI_TOOLS.md`](AI_TOOLS.md)**.
> **Status** — used in production on a Linux Vulkan box (Radeon 680M)
> as a self-hosted ChatGPT-style assistant. Apple Silicon (Metal),
> Linux/Windows Vulkan, NVIDIA CUDA, and AMD ROCm are all wired up out
> of the box. `scripts/install_easyai_server.sh` handles the whole
> Debian/Ubuntu deployment in one command (systemd-coredump,
> hardened unit, optional `--enable-verbose`, drop-in compat with
> `install_llama_server.sh`).
---
## What's new
A running log of user-facing changes. Latest first — keep this list
current as features land so anyone returning to the repo (or
landing on it for the first time) sees what shipped recently.
### 2026-05-27 — Unified library + `easyai::Session` (OpenAI-Python-SDK shape)
Single library now — `libeasyai` carries everything (Engine, Client,
every tool, the system-prompt composer, Session). The previous split
into `libeasyai` + `libeasyai-cli` is gone. Demos all link a single
target (`easyai`). Legacy aliases (`easyai::engine`, `easyai::cli`)
still resolve to it so existing CMakeLists keep working.
The new `easyai::Session` (in `easyai/session.hpp`) is the
recommended entry point — five fluent lines for "build an agent,
register a tool, chat", mirroring the OpenAI Python SDK call site:
```cpp
auto session = easyai::Session::remote("http://localhost:8080");
session.with_default_tools()
       .system_append("Speak in plain English.")
       .on_token([](const std::string & p){ std::fputs(p.c_str(), stdout); });
std::string err;
session.init(err);
session.chat("hello");
```
Pair with `easyai::Tool::builder(...).system_addendum("...")` to let a
custom tool ship its own system-prompt guardrails — Session
auto-concatenates them. Full reference in
[`LIB_GUIDE.md`](LIB_GUIDE.md); minimum demo in `services/library_demo.cpp`
(binary `easyai-library-demo`). `services/local.cpp` has been
migrated to Session as a reference for in-process agents; `cli.cpp`
and `server.cpp` follow in a later pass and continue to work via the
existing Engine/Client paths.
### 2026-05-26 — `python3` model-facing rename to `evaluate` (back-compat alias)
Final fix to a stubborn failure mode: models with a strong "Python
writes files / runs subprocesses / fetches URLs" training prior were
reaching for the `python3` tool to do exactly those system-side
things even when the system prompt, the tool description, and the
sandbox PermissionError all said *don't*. The lighter fixes (Shape-C
short triggers, write/edit policy block, runtime sandbox enforcement)
took the failure rate down but didn't kill it.
**The rename:** the **model-facing** tool name changed from
`python3` → `evaluate`. The runtime is still Python 3 (operators
still see `--no-python`, `[SERVER] allow_python`, the Python sandbox
preamble, etc.). The split is deliberate:
| Surface | Name |
|---|---|
| Tool name in `` / `tools/list` / what the model dispatches | `evaluate` |
| Tool short description | `"Evaluate Python 3 code for compute / algorithm prototyping. FORBIDDEN: filesystem, subprocess, network, ctypes. Stdlib compute only."` |
| Operator CLI flag | `--no-python` (unchanged) |
| Operator INI key | `[SERVER] allow_python` (unchanged) |
`canonical_tool_name("python3")` returns `"evaluate"` so resumed chat
sessions, external-tools manifest reservation lists, and any caller
that dispatches `python3` by name still work — the dispatcher routes
the legacy name to the new tool, no second schema shipped.
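The alias behaviour described above can be pictured as a small lookup that runs before dispatch. This is a minimal sketch, not the library's implementation: only the `python3` → `evaluate` pair comes from this changelog, while the function name `canonical_tool_name_sketch` and the table-based layout are assumptions made for illustration.

```cpp
#include <string>
#include <string_view>
#include <unordered_map>

// Hypothetical sketch: map a legacy tool name to its current model-facing
// name. Names without an alias are returned unchanged, so the dispatcher
// can call this on every incoming tool name.
inline std::string canonical_tool_name_sketch(std::string_view name) {
    static const std::unordered_map<std::string_view, std::string_view> aliases = {
        {"python3", "evaluate"},  // the only pair taken from the changelog
    };
    const auto it = aliases.find(name);
    return std::string(it != aliases.end() ? it->second : name);
}
```

Because the lookup happens on the name alone, only one schema needs to be registered; a resumed session that still says `python3` reaches the same handler as a fresh one that says `evaluate`.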
Reframing: model now sees an "evaluate" affordance with an explicit
**FORBIDDEN: filesystem, subprocess, network, ctypes** list, not a
"python3" affordance with a "don't write files" caveat. The first
non-generic word the model parses on the bullet line is "evaluate" —
no "python = open(f, 'w')" training prior to override.
### 2026-05-26 — Shape-C tools wire shape, `evaluate` read-only, `fs.ops` 50/20
Five linked changes refactor how tools reach the model:
* **Shape-C wire shape.** Per-turn `` blocks now ship
`name + short_description + schema` (~2 000 tokens saved per
session on a typical catalogue). Full multi-line manual stays
in libeasyai and is returned by `tool_lookup(name="")` on
demand. `Tool::short_description` + `wire_description()` are
new; tools without an explicit short trigger fall back to the
first 120 chars of `description`.
* **`tool_lookup` gains a MANUAL view.** No-arg call returns the
INDEX (numbered `name: short trigger` list); `name=""`
returns the FULL description for every match. The model uses
the index to scan and drills in only when it needs the manual.
* **`evaluate` (formerly `python3`) is now read-only on disk.**
`kPythonSandboxPreamble` rejects any write-mode `open()` (mode
`'w'/'a'/'x'/'+'` or `os.open` with
`O_WRONLY|O_RDWR|O_CREAT|O_TRUNC|O_APPEND`) regardless of path.
Read-only opens inside the sandbox still work. `PermissionError`
points the model at the filesystem write tool registered this
session. Defense-in-depth — adversarial bypasses (`ctypes`,
`subprocess`, `_io.FileIO`, closure-cell introspection) are
documented residuals.
* **`fs(action="ops")` batch caps raised to 50 ops / 20 files.**
One call can land up to 50 file operations across up to 20
distinct files. Same-path edits auto-reorder bottom-up so every
`start_line` refers to the file's ORIGINAL line numbers — no
manual offset math. Report header names the touched files;
successful `read` ops in a batch clip at 2 KiB; failed ops show
the full diagnostic so the model can self-correct without
re-running.
* **`fs(action="ops")` batch lives on the unified `fs` surface.**
Default `ToolMode` stays `Split` (one focused tool per action —
small models drive it more reliably). To pick up the batch, run
with `--tools-mode unified` (or `--tools-mode both`).
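The short-trigger fallback from the Shape-C bullet above can be sketched as follows. `ToolSketch` and `wire_description_sketch` are hypothetical stand-ins, not the real `easyai::Tool` or its `wire_description()`; the only behaviour taken from the changelog is "use the short description when present, otherwise the first 120 characters of the full description".

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a tool record; not the real easyai::Tool.
struct ToolSketch {
    std::string name;
    std::string description;        // full multi-line manual
    std::string short_description;  // optional one-line trigger
};

// Returns the text shipped with the per-turn tool listing: the short
// trigger when one is set, otherwise a 120-byte prefix of the manual.
// Note: substr() counts bytes, so a multi-byte UTF-8 character at the
// boundary could be split; a production version would need to handle that.
inline std::string wire_description_sketch(const ToolSketch & t) {
    constexpr std::size_t kFallbackChars = 120;
    if (!t.short_description.empty()) return t.short_description;
    return t.description.substr(0, kFallbackChars);
}
```

The full manual stays out of the per-turn payload either way, which is where the token saving comes from.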
Plus: MCP server adds an `initialize.instructions` field carrying
the closed-set rule plus the same write/edit policy; memory vocabulary
block moved to the preamble tail and cached by `(mtime, file count)` so
prompt-eval KV stays warm across memory writes.
Security audit: see [SECURITY_AUDIT.md §23 (eighth pass)](SECURITY_AUDIT.md#23-eighth-pass--2026-05-26-shape-c-tools-refactor).
One MEDIUM finding (`tools_block` rendered untrusted fields verbatim
— fixed by `sanitize_for_prompt`), two LOW residuals documented.
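The bottom-up reordering mentioned for `fs(action="ops")` is easier to see in code. The sketch below is an illustration under stated assumptions: `LineEdit` and `apply_edits_bottom_up` are hypothetical names, edits are assumed not to overlap, and invalid ranges are simply skipped. It is not the handler that ships in libeasyai.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical edit record: replace ORIGINAL lines [start_line, end_line]
// (1-based, inclusive) with `replacement`.
struct LineEdit {
    std::size_t start_line;
    std::size_t end_line;
    std::vector<std::string> replacement;
};

// Applies non-overlapping edits from the bottom of the file upward, so an
// edit never shifts the line numbers that a later-applied edit refers to.
inline std::vector<std::string> apply_edits_bottom_up(std::vector<std::string> lines,
                                                      std::vector<LineEdit> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const LineEdit & a, const LineEdit & b) { return a.start_line > b.start_line; });
    for (const LineEdit & e : edits) {
        // Skip ranges that do not fit the current file (sketch-level handling).
        if (e.start_line == 0 || e.start_line > e.end_line || e.end_line > lines.size()) continue;
        const auto first = lines.begin() + static_cast<std::ptrdiff_t>(e.start_line - 1);
        const auto last  = lines.begin() + static_cast<std::ptrdiff_t>(e.end_line);
        const auto pos   = lines.erase(first, last);
        lines.insert(pos, e.replacement.begin(), e.replacement.end());
    }
    return lines;
}
```

Applying the highest-numbered edit first means everything above it is still untouched when the next edit runs, which is why the caller can use original line numbers throughout.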
### 2026-05-17 — MTP speculative decoding (`--spec-type draft-mtp`) + installer `--mtp`
llama.cpp's Multi-Token Prediction merged upstream on 2026-05-16; we
bumped our vendored llama.cpp checkout to `39cf5d619` (same-day HEAD,
all 262 commits since the previous pin) and wired the MTP path
through the three layers in one go.
**Library API** ([include/easyai/engine.hpp](include/easyai/engine.hpp)):
```cpp
engine.spec_type("draft-mtp")   // or: none (default), draft-simple,
                                //     draft-eagle3, ngram-simple,
                                //     ngram-map-k, ngram-map-k4v,
                                //     ngram-mod, ngram-cache
      .spec_draft_n_max(6);     // max draft tokens per step
```
Unknown strings land in `Engine::last_error()` and leave speculation
off (no silent default switch).
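A minimal sketch of that "reject, record, stay off" behaviour is shown here. `SpecConfigSketch` is a hypothetical class written for this example, not `easyai::Engine`; the accepted names are the ones listed in the snippet above, and resetting to `none` on an unknown string is an assumption about what "leave speculation off" means.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>

// Hypothetical stand-in for the engine's speculative-decoding setting.
class SpecConfigSketch {
public:
    // Accepts only known names. Anything else records an error message and
    // turns speculation off instead of silently choosing another mode.
    SpecConfigSketch & spec_type(std::string_view name) {
        static constexpr std::array<std::string_view, 9> kKnown = {
            "none", "draft-simple", "draft-eagle3", "draft-mtp", "ngram-simple",
            "ngram-map-k", "ngram-map-k4v", "ngram-mod", "ngram-cache"};
        if (std::find(kKnown.begin(), kKnown.end(), name) != kKnown.end()) {
            type_ = std::string(name);
            last_error_.clear();
        } else {
            type_ = "none";
            last_error_ = "unknown spec type: " + std::string(name);
        }
        return *this;  // keeps the fluent call style
    }
    const std::string & type() const { return type_; }
    const std::string & last_error() const { return last_error_; }

private:
    std::string type_ = "none";
    std::string last_error_;
};
```

A caller that chains setters can check `last_error()` once after the chain rather than after every call.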
**Server CLI**:
```bash
easyai-server -m /path/to/mtp-model.gguf \
    --spec-type draft-mtp --spec-draft-n-max 6
```
INI keys: `[ENGINE] spec_type` and `[ENGINE] spec_draft_n_max`.
**Installer shortcut**:
```bash
./install_easyai_server.sh --mtp # n_max=6 (default)
./install_easyai_server.sh --mtp --mtp-n-max 8 # override
```
The installer bakes the two flags into the systemd `ExecStart` so the
service inherits MTP without `systemctl edit`.
**Caveat**: MTP needs a model TRAINED with MTP heads (DeepSeek V3,
MimoVL, and similar). Plain models will refuse to load with
`--spec-type draft-mtp`. The installer's `--mtp` flag is the operator
saying "I know what I'm doing"; there's no validation.
Classic standalone-draft-model speculative decoding (the
`--draft-model PATH` path) is not yet wired — only MTP, which doesn't
need a separate model file. The old installer compat lines for
`--draft-model` / `--draft-max` / `--draft-min` still warn and skip.
### 2026-05-16 — Memory vocabulary auto-injection + shared `easyai::preamble::build()`
Every binary that loads `--memory ` now auto-injects a compact
keyword-vocabulary block into the system prompt so the model knows
what it has tagged without having to call `keywords_knowledge`
first. The block looks like:
```
# MEMORY VOCABULARY (the keywords your private memory currently
has tagged — the FIRST place to look for anything you might
already know)
12 entries (most-common first; call search_knowledge(
keywords=["", ...]) to recall):
easyai(8) claude(5) bitnet(3) build(3) iteration(2) …
```
Sorted count desc / name asc, capped at top 40. Empty store →
block omitted, no wasted tokens.
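The ordering rule (count descending, then name ascending, top 40) can be sketched like this. The function name `render_vocab_sketch` and the `std::map` input are assumptions for illustration; the real renderer lives behind `easyai::preamble::build()` and may differ in detail.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Renders "name(count)" tokens sorted by count descending, then name
// ascending, keeping at most `cap` entries. An empty store yields an empty
// string, which the caller can treat as "omit the block".
inline std::string render_vocab_sketch(const std::map<std::string, int> & counts,
                                       std::size_t cap = 40) {
    std::vector<std::pair<std::string, int>> rows(counts.begin(), counts.end());
    std::sort(rows.begin(), rows.end(), [](const auto & a, const auto & b) {
        if (a.second != b.second) return a.second > b.second;  // count desc
        return a.first < b.first;                              // name asc
    });
    if (rows.size() > cap) rows.resize(cap);
    std::string out;
    for (const auto & [name, count] : rows) {
        if (!out.empty()) out += ' ';
        out += name + "(" + std::to_string(count) + ")";
    }
    return out;
}
```

The name tiebreak keeps the block byte-identical between requests when counts have not changed, which helps any prompt caching layered on top.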
| Binary | When the vocab is computed |
| --- | --- |
| `easyai-server` | Every request (fresh disk scan, ~10-50ms — rounding error vs. inference). New saves visible on the next request. |
| `easyai-local` | Once at startup, appended to the system prompt. New saves visible after restart. |
| `easyai-cli` | Once when building the system prefix sent to the remote server. |
The AUTHORITATIVE preamble used to live as a
`build_authoritative_preamble` inside `services/server.cpp` with parallel partial
copies in `local.cpp` and nothing in `cli.cpp`. That drift is gone:
the builder is now public in libeasyai —
```cpp
// include/easyai/preamble.hpp
namespace easyai::preamble {
    struct Options {
        bool        inject_datetime  = true;
        std::string knowledge_cutoff = "2024-10";
        std::string memory_root;     // empty → vocab block omitted
    };
    std::string build(const Options & opt);
}
```
— and all three binaries call it. Change the renderer once, every
binary updates. Third-party hosts of libeasyai get the same
behaviour out of the box.
See `RAG.md` §5 "Automatic vocabulary injection" and `design.md`
§5c for the full design.
### 2026-05-15 — `split` is the new tools-mode default
Same-day follow-up to the morning's `--tools-mode` landing: **`split`
is now the out-of-the-box default**, not `unified`.
Reason: smaller / quantised tool-callers (Llama 3 8B, Qwen 2.5 7B,
Phi-3.5, GPT-OSS-20B) dispatch much more reliably against flat
one-verb-per-tool schemas than against a `fs(action="...")`
discriminated-union dispatcher. Large models handle either shape
fine. The split surface costs ~15-20% extra system-prompt tokens for
a 30-50% reduction in retry / "unknown action" hops in practice —
worth it for everyone, surprising for nobody.
| Surface | Registered out of the box | Old behaviour | New default |
| --- | --- | --- | --- |
| Multi-action families | `fs`, `web` | 2 dispatchers + 7 knowledge tools | `read_file`, `write_file`, `append_file`, `edit_file`, `list_file`, `glob_file`, `grep_file`, `check_path_file`, `cwd_file`, `sandbox_path_file`, `search_web`, `fetch_web`, `knowledge_save`, `knowledge_append`, `search_knowledge`, `knowledge_load`, `knowledge_list`, `knowledge_delete`, `keywords_knowledge` — 19 focused tools |
```bash
# new default (no flag)
easyai-cli --url http://ai.local:8080 --sandbox ~/proj
# opt back in to the legacy dispatcher (3 tools instead of 19)
easyai-cli --tools-mode unified --url ai.local:8080 --sandbox ~/proj
# best of both worlds — costs more tokens, lets the model pick
easyai-cli --tools-mode both --url ai.local:8080 --sandbox ~/proj
```
Library callers: `Toolbelt::tool_mode_` now defaults to
`ToolMode::Split`; pass `ToolMode::Unified` explicitly if your prompt
relies on the legacy tool names.
INI: `[cli] tools_mode = unified|split|both` (default `split`).
### 2026-05-15 — `--tools-mode` lets small models work with one-verb-per-tool
`fs` and `web` ship as **unified dispatchers** with an
`action` parameter (e.g. `fs(action="read", ...)`). That shape keeps
the system prompt small and lets a large model batch many actions, but
**smaller / quantised tool-callers** (Llama 3 8B, Qwen 2.5 7B, Phi-3.5,
GPT-OSS-20B) gravitate toward one-purpose tools — `read_file`, `edit_file`,
etc. — because the verb IS the tool name and the parameter schema is
flat.
Three modes, selected by the new flag (defaults flipped to `split` in
the same-day follow-up entry above):
```
easyai-cli --tools-mode unified # legacy: one dispatcher per family
easyai-cli --tools-mode split # one focused tool per action
easyai-cli --tools-mode both # register both surfaces side-by-side
```
| Mode | Tools registered (with `--sandbox` + `--memory`) |
| --- | --- |
| `unified` | `fs`, `web` — 2 dispatchers + 7 `knowledge_*` tools |
| `split` (new default) | `read_file`, `write_file`, `append_file`, `edit_file`, `list_file`, `glob_file`, `grep_file`, `check_path_file`, `cwd_file`, `sandbox_path_file`, `search_web`, `fetch_web`, `knowledge_save`, `knowledge_append`, `search_knowledge`, `knowledge_load`, `knowledge_list`, `knowledge_delete`, `keywords_knowledge` — 19 focused tools |
| `both` | unified + split, same handlers under both names |
Same handlers under the hood — behaviour is identical to the unified
surface; only the registration shape changes. Library API:
```cpp
easyai::cli::Toolbelt()
    .sandbox("/srv/data")
    .tool_mode(easyai::cli::ToolMode::Split)   // or Both, or Unified
    .apply(client);
```
INI: `[cli] tools_mode = unified|split|both`.
### 2026-05-13 — `easyai-cli` session resume flips back to opt-in
Reverts the 2026-05-12 default flip: loading the existing
`.easyai_session` is **opt-in again** via `--continue`. Without the
flag, any file in cwd is ignored and overwritten on the first turn
— matching the behaviour shipped originally on 2026-05-12 morning
before the auto-on flip.
Why: the auto-on default surprised operators who opened a project
directory expecting a fresh agent and instead picked up history
from a previous experiment. An explicit opt-in matches the rest
of the cli's surface (nothing else implicitly carries state across
invocations) and removes the silent action-at-a-distance.
| | Previous (2026-05-12 → 2026-05-13) | Now |
| --- | --- | --- |
| Resume on launch | default ON | opt-in via `--continue` |
| Start fresh | opt-in via `--no-continue` | **default** |
| `--compress` without `--continue` | no-op (warning) | no-op (warning) |
Saving is unchanged: every turn (and every tool round-trip) still
rewrites `.easyai_session` atomically. `--no-continue` stays as the
explicit form of the default — useful for scripts overriding an
operator's `[cli] auto_continue = on` INI line.
Default for `[cli] auto_continue` flips to `false`. Operators who
prefer the auto-on behaviour can opt in once via INI:
```ini
[cli]
auto_continue = true
```
Full doc: [`easyai-cli.md`](easyai-cli.md) §10.
### 2026-05-13 — Installer: cap `easyai-server` restart attempts at 2
The systemd unit now carries `StartLimitBurst=2` +
`StartLimitIntervalSec=60` in `[Unit]`, so the service attempts to
start at most **twice** in any 60-second window before giving up and
leaving the unit in the `failed` state.
Before, `Restart=on-failure` + `RestartSec=10` with no burst cap
would retry indefinitely — a missing model file, a bad CLI flag, or
a GPU that wasn't exposed to the container produced an infinite
restart loop that filled journald and never surfaced the real
problem.
Now:
| State | Behaviour |
| --- | --- |
| Initial start fails | Wait `RestartSec=10`, retry once |
| Retry also fails | Unit enters `failed` state; no further attempts |
| Long-running service fails after running > 60 s | Burst counter has reset → still gets one retry (not penalised for late failures) |
Recovery: `journalctl -u easyai-server` to inspect the two failed
attempts, fix the root cause, then
`sudo systemctl reset-failed easyai-server`
+ `sudo systemctl start easyai-server`.
Existing installs: re-run `install_easyai_server.sh --force` (or
`--upgrade`) to refresh the unit file. `Restart=on-failure` and
`RestartSec=10` are unchanged.
### 2026-05-13 — Installer: ship only `system.txt_template`; default install uses the binary's built-in prompt
`scripts/install_easyai_server.sh` no longer drops an active
`/etc/easyai/system.txt` on first install. Out-of-the-box, only the
template `/etc/easyai/system.txt_template` ships (the canonical
"factory" copy of the Deep persona, refreshed on every `--upgrade`),
and `SERVER.system_file` is left commented out in `easyai.ini` — so
the server uses the binary's built-in prompt, which is **already
gated on actually-registered tools**: it never advertises `fs` /
`bash` if those are off in the INI.
The template file was also renamed `system.txt_modelo` →
`system.txt_template` (English-only convention).
| State | Before (≤ 2026-05-12) | Now (2026-05-13+) |
| --- | --- | --- |
| Template file at `/etc/easyai/` | `system.txt_modelo` (Portuguese) | `system.txt_template` |
| Active `/etc/easyai/system.txt` on first install | dropped (Deep persona) | **NOT installed** |
| `--force` rewrites `system.txt` | yes | no (file isn't there) |
| `SERVER.system_file` in `easyai.ini` | commented out | commented out (unchanged) |
| Out-of-the-box prompt | active `system.txt` (same Deep body) | binary's built-in, tool-gated |
To activate a custom persona — same one-liner as before:
```bash
sudo cp /etc/easyai/system.txt_template /etc/easyai/system.txt
sudoedit /etc/easyai/system.txt # tweak as needed
sudoedit /etc/easyai/easyai.ini # uncomment SERVER.system_file
sudo systemctl restart easyai-server
```
Existing installs are unaffected: the installer still **preserves**
any existing `/etc/easyai/system.txt` across `--upgrade` and `--force`
runs (it just no longer creates one when it doesn't exist).
Full doc: [`LINUX_SERVER.md`](LINUX_SERVER.md) §6
("`/etc/easyai/system.txt` (operator-supplied) + `system.txt_template`")
and §12 ("Upgrading").
### 2026-05-12 — Installer: `ttm.pages_limit` updated in place on re-run
`scripts/install_easyai_server.sh` used to print
`ttm.pages_limit already present; skipping` when `/etc/default/grub`
already had a `ttm.pages_limit=N` token — even if N differed from
the value the operator just passed via `--gtt`. Result: re-running
the installer with a new GTT size was silently a no-op on the
GRUB side, and the next reboot kept the stale page count.
The patch now compares the existing token's page count against the
target, rewrites it in place when they differ (via `sed -i`), and
runs `update-grub` so the change lands in `/boot/grub/grub.cfg`.
The reboot reminder also points at `/proc/cmdline` so operators
can verify the new value boots cleanly.
No flag change. Operators who pass the same `--gtt` value on every
run see the same idempotent "already present; skipping" message.
### 2026-05-12 — AI Box logo: softer two-layer aura
Tuned the aura halo on the AI Box mark so it reads as a quiet
emission instead of a neon outline. The earlier tuning was
described internally as "loud"; this pass cuts both stacked
Gaussian blurs to subtler values:
| Layer | Before (07c2347) | Now (cc92d51) |
| --- | --- | --- |
| Outer halo `stdDeviation` | 14 | **10** |
| Outer halo `flood-opacity` | 0.5 | **0.3** |
| Inner halo `stdDeviation` | 4 | **3** |
| Inner halo `flood-opacity` | 1.0 | **0.6** |
Gradient, mark geometry, viewBox headroom and filter cyan flood
(`#00bcd4`) all unchanged. Both `webui/AI-brain.svg` (the
canonical SVG source) and the inline `constexpr kBrandSvg` in
[`services/server.cpp`](services/server.cpp) updated in lockstep,
so the favicon route serves the same softened version every
embedder sees.
### 2026-05-12 — `easyai-cli` session: per-tool checkpoint survives force-exit
The previous save points covered every interruption mode **except
force-exit** — triple rapid Ctrl-C triggers the force-exit handler
(`_exit(130)`), which bypasses `atexit` and the post-`chat()`
save in `run_one()`. Operators reported that a long agentic turn
that got force-exited left no `.easyai_session` on disk.
Fix: layer an additional save into the `on_tool` callback so
`.easyai_session` is rewritten **after every tool round-trip** in a
turn, not just at the end of the turn. Only the in-flight partial
reply since the last completed tool is lost; everything earlier
(file edits, bash output, plan steps, RAG queries) is on disk and
re-loadable.
Wiring: `easyai::ui::Streaming::notify_tool(call, result)` is now a
public forwarder for the private on_tool UI handler, so external
embedders can compose extra behaviour onto the `on_tool` slot
(checkpoint to disk, telemetry, audit log) without losing the
streaming output (tool indicators, dim styling, plan rendering).
The cli's binary uses it as:
```cpp
cli.on_tool([&](const ToolCall & c, const ToolResult & r) {
    streaming.notify_tool(c, r);   // canonical UI
    save_session(cli, &err);       // disk checkpoint
});
```
Pattern is documented inline in
[`include/easyai/ui.hpp`](include/easyai/ui.hpp) above the
`notify_tool` declaration. No flag / INI change.
### 2026-05-12 — Session resume default-ON + every session knob now in `[cli]` INI
Iteration on yesterday's session-persistence feature: loading the
existing `.easyai_session` is now the **default** (you don't need
`--continue` to pick up where you left off). The semantics flip:
| | Previous (2026-05-12 morning) | Now |
| --- | --- | --- |
| Resume on launch | opt-in via `--continue` | **default ON** |
| Start fresh | default | opt-in via `--no-continue` |
| `--compress` without `--continue` | hard error | warning (no-op when combined with `--no-continue`) |
The cli also now exposes every session-related knob plus the raw-log
knobs through `[cli]` in `/etc/easyai/easyai-cli.ini`:
```ini
[cli]
auto_continue = true # default; load .easyai_session if present
auto_compress = false # default; recap on every load when on
log_file = # default empty; path enables --log-file equivalent
auto_log = false # default; when true, restores the library's legacy /tmp auto-log
show_bash = true # default; mirror bash subprocess output to the operator terminal
show_python = true # default; same for python3
```
CLI flag precedence is unchanged: explicit flag > INI > hardcoded
default. All `--continue` / `--no-continue` / `--compress` /
`--log-file` flags continue to work and override the INI for that
invocation.
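The precedence rule is small enough to express directly. This is an illustrative sketch only: `resolve_knob` is a hypothetical helper, and modelling "flag not given" and "INI key absent" as empty `std::optional` values is an assumption about how a caller might represent them.

```cpp
#include <optional>

// Resolves one boolean knob: an explicit CLI flag wins, then the INI value,
// then the hardcoded default. An empty optional means "not specified".
inline bool resolve_knob(std::optional<bool> cli_flag,
                         std::optional<bool> ini_value,
                         bool hardcoded_default) {
    if (cli_flag) return *cli_flag;
    if (ini_value) return *ini_value;
    return hardcoded_default;
}
```

For example, `--no-continue` on the command line maps to `cli_flag = false` and overrides `auto_continue = true` in the INI for that one invocation.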
`--continue` is kept as a no-op alias for backward compat (useful in
scripts that want to force resume even when an operator's INI flipped
`auto_continue` off).
Full doc: [`easyai-cli.md`](easyai-cli.md) §10.
### 2026-05-12 — easyai-cli session persistence + raw log default OFF
Every `easyai-cli` invocation now writes a `.easyai_session` file in
the current working directory after each chat turn (atomic tempfile
+ rename, mode 0600). Three control points:
| Surface | What it does |
| --- | --- |
| (no flag) | Start fresh, overwrite on first turn, save every turn |
| `--continue` | Resume the `.easyai_session` in cwd; warn + start fresh if none |
| `--continue --compress` | Resume + ask the model for one lossless recap; replace history with the recap before the first prompt |
| `/compress` (REPL) | Same recap flow, fired mid-session |
The file is the raw OpenAI-shape message array (greppable, diffable,
re-loadable). Two new methods on the public `Client` API
(`dump_history()` / `load_history()`) make the same persistence
available to library embedders.
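The "atomic tempfile + rename, mode 0600" save can be sketched with plain POSIX calls. This is not the code in `easyai-cli`: the function name `atomic_write_0600`, the temp-file naming scheme, and the error handling are all assumptions, and the sketch does not retry writes interrupted by signals.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Writes `data` to `path` so that readers see either the old file or the
// complete new one: write a 0600 temp file in the same directory, flush it
// to disk, then rename() it over the target. POSIX-only sketch.
inline bool atomic_write_0600(const std::string & path, const std::string & data) {
    const std::string tmp = path + ".tmp." + std::to_string(::getpid());
    const int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;

    std::size_t off = 0;
    while (off < data.size()) {
        const ssize_t n = ::write(fd, data.data() + off, data.size() - off);
        if (n < 0) { ::close(fd); ::unlink(tmp.c_str()); return false; }
        off += static_cast<std::size_t>(n);
    }

    // fchmod makes the mode exact even if the temp file already existed.
    const bool ok = ::fchmod(fd, 0600) == 0 && ::fsync(fd) == 0;
    if (::close(fd) != 0 || !ok) { ::unlink(tmp.c_str()); return false; }

    if (std::rename(tmp.c_str(), path.c_str()) != 0) {
        ::unlink(tmp.c_str());
        return false;
    }
    return true;
}
```

Keeping the temp file in the same directory as the target matters because `rename()` is only atomic within one filesystem.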
**Raw log default flipped to OFF.** Prior versions created
`/tmp/easyai-cli-remote--.log` whenever `--verbose` was
set, AND the library opened a separate `/tmp/easyai-client--.log`
on every Client construction. Both are now opt-in:
* The binary's transaction log opens **only** when `--log-file PATH`
is given (mode 0600 at PATH). `--verbose` is now stderr-only.
* The library's auto-log is suppressed by setting
`EASYAI_NO_AUTO_LOG=1` in the cli binary's `main()` before the
Client is constructed. Operator override
(`EASYAI_NO_AUTO_LOG=0` in the env) still wins.
Net: a default invocation leaves nothing in `/tmp`. See
[`easyai-cli.md`](easyai-cli.md) §9 and §10 for full docs.
### 2026-05-11 — fs(action="edit") seam-line corruption fix (HIGH, post-publish correction)
A user-reported bug: `fs(action="edit")` was silently corrupting
files when the model passed `content` without a trailing `\n`.
The last byte of `content` got glued onto the first preserved line
after the edit range — turning `int b = 22;\n return a + b;`
into `int b = 22; return a + b;`. When the deleted range
happened to contain the only `}` between two function bodies,
this silently swallowed the brace and the file failed to compile
with "function definition is not allowed here" + "expected '}'"
on the next build.
Root cause: the tool description said "include a trailing `\n`
yourself" but the model consistently forgot. Fix:
`make_fs_edit_handler` now auto-inserts a `\n` separator on each
side of `content` if and only if one is needed to keep the seam
lines apart. Both guards no-op when `content` is already
correctly terminated (or empty for a pure delete), so the change
is invisible to model calls that were already doing the right
thing.
Tool description updated to drop the "include trailing `\n`"
advice — line semantics are now preserved automatically.
Verified against a 9-case smoke matrix (middle-replace with/without
trailing newline, multi-line content lacking newline, pure delete,
pure insert, append-at-EOF on files with and without trailing
newline, replace-last-line on a file without trailing newline,
whole-file replacement) — all nine pass.
Documented as §22.8 (post-publish correction) in
[`SECURITY_AUDIT.md`](SECURITY_AUDIT.md); §22.4's "no findings"
claim for the fs.edit/append/ops batch surface has been amended
with a forward-pointer to §22.8. No CLI / INI / library API
changes; rebuild to pick up the fix.
### 2026-05-11 — Security audit 7th pass (1 HIGH, 1 MEDIUM, 1 LOW; no public-interface change)
Re-applied the standing audit on the ~5,000 LoC added since the 6th
pass (2026-05-08). Three findings, all closed in this commit:
* **HIGH — `run_capped_subprocess` banner sanitization.** The
`[bash] $ …` / `[python3] $ …` opening banner used to print the
model-supplied command/code through `fprintf` verbatim, so a
snippet that embedded an ANSI/OSC sequence could repaint the
operator's terminal (window title, screen wipe, OSC 52
clipboard write) one line before any child output arrived. The
live mirror channel was already hardened in §20.1; the banner
is now sanitized the same way (CR/LF/TAB pass; ESC rendered as
visible `^[` marker; other C0/DEL dropped). For `python3` the
banner now shows the *user's code* only — the 25-line sandbox
preamble was previously included, cluttering every transcript.
* **MEDIUM — python3 sandbox preamble closure tightening.** The
preamble that wraps `open()` to pin disk access to the sandbox
used to leave `_e_open_orig`, `_e_chk`, and `_e_root` at module
scope, so user code could trivially call the raw `_e_open_orig`
by name and bypass the check — the comment claimed "closure cell"
protection that the implementation didn't actually provide.
Restructured into an `_e_make_wrappers` factory whose function-
local names become real lexical closure cells; the wrappers
still work, but the originals are no longer reachable from
module scope. (Adversarial bypass via `ctypes` / `subprocess` /
`_io.FileIO` is unchanged and still documented as out-of-scope.)
* **LOW — installer INI-shape validation widened.** §20.4 / §21.4
already validated `--temperature`, `--top-p`, `--frequency-penalty`, `--ctx-size` etc.
via `require_numeric` to defeat heredoc injection. This pass
extended the integer roster (`--service-port`, `--threads`,
`--threads-batch`, `--ngl`) and added a new `require_no_injection`
helper that rejects `\n` / `\r` / `=` / `[` / `]` in the
non-numeric knobs (`--service-host`, `--alias`, `--webui-title`,
`--cache-type-k`, `--cache-type-v`). Same operator-typo /
hostile-CI threat model as §20.4.
Full narrative in [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §22.
Rebuild to pick up the fixes — no INI, CLI, or library API changes.
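
The banner sanitization rule from the HIGH finding (CR/LF/TAB pass, ESC shown as `^[`, other C0 bytes and DEL dropped) is simple enough to sketch. The function name `sanitize_for_terminal` is an assumption for illustration; the real helper inside `run_capped_subprocess` may differ.

```cpp
#include <string>

// Hypothetical sketch: neutralise terminal control sequences in
// model-supplied text before it is echoed to the operator's terminal.
std::string sanitize_for_terminal(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        if (c == '\n' || c == '\r' || c == '\t') {
            out += static_cast<char>(c);   // harmless whitespace passes
        } else if (c == 0x1B) {
            out += "^[";                   // ESC becomes a visible marker
        } else if (c < 0x20 || c == 0x7F) {
            continue;                      // other C0 controls and DEL dropped
        } else {
            out += static_cast<char>(c);   // printable ASCII and UTF-8 bytes
        }
    }
    return out;
}
```

Because ESC is rewritten rather than removed, an attempted OSC or CSI sequence stays visible in the transcript as plain text instead of being interpreted by the terminal.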
### 2026-05-10 — CLI "thinking" label: static dark gray, no shimmer sweep
The CLI's prompt-eval indicator no longer animates. While the server
is ingesting the prompt the spinner shows a steady `thinking[ N%]`
in 256-colour grayscale 244 (mid-gray, RGB 128/128/128) — bright
enough to read on a dark terminal, dim enough to clearly signal "in
progress, not the model's output." Replaces the 10 Hz spotlight
sweep that landed in `d7e7202`. Drops the dual-cadence heartbeat —
the heartbeat now runs at one cadence (250 ms) and skips its
repaint entirely while the thinking label is up; only
`set_thinking_pct()` (driven by the server's `easyai.prompt_progress`
SSE event) triggers a redraw when the % suffix changes.
### 2026-05-09 — `python3` tool result rendered with the executed snippet
The tool result returned by `python3` now opens with a fenced
```python ...``` block carrying the snippet that just ran, followed
by a `[python3 executed]` notification line, then the exit code and
captured output. Chat UIs that render markdown (the embedded webui,
typical clients) display the code with syntax highlighting, so an
operator skimming the conversation transcript can see what executed
without having to expand the raw tool-call JSON.
The model's `code` argument is what gets rendered — the
`kPythonSandboxPreamble` (the disk-restriction monkey-patch) is
deliberately stripped from the displayed source so the transcript
isn't cluttered with the same 25 lines on every call.
Result shape:
````
```python
…the model's code…
```
[python3 executed]
exit=0
…captured output…
````
Spawn-side errors (pipe / fork failure — the interpreter never
ran) still surface unwrapped, so the error message stays the
actual cause and isn't dressed up with a misleading "executed"
notice.
### 2026-05-09 — METRICS line: always on, default every 5 minutes
The periodic METRICS log line in `easyai-server` is now emitted
**unconditionally** — no longer gated on `--verbose`. Operators
need the CPU / mem / GPU / TCP-state / TIME_WAIT-pressure telemetry
in journalctl whether or not they're chasing a debug session.
* `metrics_interval` default raised from `1` second to `300`
seconds (5 minutes). Low-overhead enough to leave on permanently
in production; bump **down** (60, 30, 5) when actively
troubleshooting.
* The systemd installer's `easyai.ini` template was bumped from
`metrics_interval = 60` to `metrics_interval = 300` to match.
* `--verbose` no longer claims the METRICS line in its description
or banner — only the request-level `→` / `←` lines remain
verbose-only.
Existing operators who pinned `[SERVER] metrics_interval` in their
INI keep their value; only the unspecified default shifts.
### 2026-05-09 — `python3` is default-on with a sandboxed disk surface
`python3` is promoted from explicit opt-in (`--allow-python`) to
auto-on whenever the operator has signalled "the model can touch
files", using the same gate as `fs`: `--sandbox` set OR `--allow-bash` on. The
embedded webui inherits this for free since the systemd unit ships
with `--sandbox /var/lib/easyai/workspace`.
* **`--allow-python` removed; `--no-python` is the new opt-out.**
Mirrors `--no-web` / `--no-datetime`: the tool defaults on and
operators who don't want it pass the `--no-*` flag (or set
`[SERVER] allow_python = off` in the INI).
* **Disk access auto-restricted to the sandbox root.** Every
snippet is auto-prefixed with a short Python preamble that
monkey-patches `builtins.open`, `io.open`, and `os.open` to
reject any path resolving outside the cwd Python was chdir'd
into. `open("/etc/passwd")` raises `PermissionError`;
`pathlib.Path("/etc/hostname").read_text()` raises through
`pathlib`'s internal `open()` call.
* **Description rewritten to forbid disk use.** "USE FOR: testing,
calculation, data processing, networking, information gathering.
NEVER USE FOR DISK — every disk operation has a fs(action=...)
equivalent." The preamble is defense-in-depth; the description
is the primary contract.
* **Defense-in-depth, not a real sandbox.** The model can still
escape via `import ctypes; ctypes.CDLL("libc.so.6").open(...)`,
`subprocess.run(["cat", "/etc/passwd"])`, or `os.system(...)` —
the protection is against accident, not adversarial intent. Same
threat model as `bash`: explicit operator opt-in, not a real
sandbox.
### 2026-05-09 — `python3` tool: isolated Python 3 snippet runner
A second shell-class executor alongside `bash`, gated by its own
`--allow-python` flag (off by default — same threat model as bash).
The model gets one extra tool when enabled:
* `python3(code, timeout_sec?)` — runs the snippet via
`python3 -I -S -E -c <code>`. Isolated mode: no `PYTHON*` env vars,
no `site.py` / no .pth files / no site-packages, no cwd on
`sys.path`. The standard library is available; `import requests`
fails with `ModuleNotFoundError`, by design — predictable behaviour
regardless of host Python configuration.
* Same hardening as `bash`: cwd pinned to `--sandbox`, fds 3+ closed
before exec, SIGTERM/SIGKILL deadline, 50 KB / 2000-line
stdout+stderr cap, and an operator-facing live mirror (default ON
when `--allow-python` is on; pass `--no-show-python` to opt out).
* Internally, `bash` and `python3` now share one `run_capped_subprocess`
helper — the fork/fd-close/chdir/drain/wait machinery only lives in
one place.
When to reach for `python3` vs `bash`: data manipulation (JSON, regex,
Decimal math, statistics, date arithmetic) is one Python snippet; shell
pipelines / build runners / git / package managers stay in `bash`.
The `--allow-python` flag is wired through every binary (`easyai-cli`,
`easyai-local`, `easyai-server`, `easyai-mcp-server`) plus the INI
`[SERVER] allow_python` key. `EASYAI-*.tools` manifests cannot shadow
the new `python3` reserved name.
### 2026-05-09 — One tool per concept: unified `web`, unified `fs`, RAG `--split-rag` removed
A consolidation pass on the built-in tool surface. Three loose
collections (web, filesystem, rag) collapsed to one tool each, all
shaped the same way — single `Tool` with an `action` parameter and a
flat schema (every parameter optional except `action`). Pattern
mirrors the rag dispatcher introduced 2026-05-04.
* **`web` tool** — `web(action="search"|"fetch")`. Replaces the
separate `search_web`, `fetch_web`, and `web_google` tools. Search
takes an `engine` parameter (`"auto"` default — cascades through
google → brave → ddg-lite → bing → ddg, returning the first that
succeeds; explicit picks: `"google"` opt-in via `--use-google` plus
the GOOGLE_API_KEY / GOOGLE_CSE_ID env vars, `"brave"` keyless HTML
scrape with the best understanding of niche named entities,
`"ddg-lite"` keyless no-JS DDG endpoint with a Netscape UA (page 1
only — bypasses the anti-bot wall the modern DDG endpoint applies),
`"bing"` keyless RSS feed, `"ddg"` keyless HTML scrape but
increasingly blocked from server IPs). Both actions take `page` for
pagination; `fetch` takes `start` + `limit` for byte-window control.
* **`fs` tool** — `fs(action="read"|"write"|"list"|"glob"|"grep"|"check_path"|"cwd"|"sandbox")`.
Replaces seven separate factories plus `get_current_dir` and
`get_sandbox_path`. `--allow-fs` now registers one tool, not seven.
* **`--split-rag` removed.** The legacy seven `rag_*` tools and the
`--split-rag` flag are gone everywhere — CLI, INI, examples, all
four binaries. The single `rag(action=...)` dispatcher (default
since 2026-05-04) is the only RAG layout. On-disk format unchanged.
* **Public-API breakage.** Anyone consuming `libeasyai` directly: the
individual `easyai::tools::search_web()` / `fetch_web()` /
`web_google()` / `fs_read_file()` / `fs_write_file()` / `fs_list_dir()`
/ `glob_file()` / `grep_file()` / `check_path_file()` / `get_current_dir()`
/ `get_sandbox_path()` factories are removed. Switch to
`easyai::tools::web(google_enabled)`,
`easyai::tools::fs(root)`, and
`easyai::tools::knowledge_split_tools(root)`.
* **Why.** Three matching surfaces with the same shape make the
catalogue smaller (one entry per capability instead of nine), let
tool prose use one consolidated description style across all three,
and let the model reason about each capability as ONE thing with
sub-actions. The flat-schema-with-runtime-validation choice is the
same one the unified rag tool already validated against weak /
1-bit-quant tool callers.
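
The "flat schema with runtime validation" shape can be illustrated with a toy dispatcher. Everything here is a stand-in: `Args`, `web_tool`, and the `query` / `url` parameter names are assumptions for the sketch, and the real `easyai::tools::web` handler performs actual network work.

```cpp
#include <map>
#include <string>

// Hypothetical flat argument bag: every key is optional except "action".
using Args = std::map<std::string, std::string>;

// Toy stand-in for a unified tool handler. Which parameters are required
// is decided at runtime, per action, rather than in the JSON schema.
std::string web_tool(const Args& args) {
    auto get = [&](const char* key) -> const std::string* {
        auto it = args.find(key);
        return it == args.end() ? nullptr : &it->second;
    };
    const std::string* action = get("action");
    if (!action) return "error: 'action' is required (search|fetch)";
    if (*action == "search") {
        const std::string* query = get("query");
        if (!query || query->empty())
            return "error: action=search requires 'query'";
        const std::string* engine = get("engine");
        return "search[" + (engine ? *engine : std::string("auto")) + "]: " + *query;
    }
    if (*action == "fetch") {
        const std::string* url = get("url");
        if (!url || url->empty())
            return "error: action=fetch requires 'url'";
        return "fetch: " + *url;
    }
    return "error: unknown action '" + *action + "' (expected search|fetch)";
}
```

The error strings name the missing parameter and the valid actions, which is the part that helps a weak tool caller correct its next attempt.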
### 2026-05-08 — Server observability + connection-pool fix + prompt cleanup
Driven by a real production failure: an agentic session hung mid-stream,
the cli retried six times, and we had no visibility into what the
TCP stack was doing on the server. Fixes landed across the cli's
HTTP transport, the server's verbose logging, the system prompts,
and the build.
* **Cli keep-alive bug fixed (the actual root cause).**
`stream_chat()` / `simple_get()` / `simple_post()` were each
constructing a fresh `httplib::Client` per call. The Client's
TCP socket dropped at function end, so `set_keep_alive(true)` had
nothing to keep alive — every agentic hop opened a new connection.
An N-tool-call session piled up N sockets in `TIME_WAIT`,
eventually exhausting the client's ephemeral port range or
per-process fd ceiling. **Hoisted a single persistent `httplib::Client`
onto the `Impl` struct; all three call sites now reuse it.** ONE
TCP connection per session instead of N. Cancellation and
server-restart paths are preserved (cpp-httplib reconnects
internally on dead-socket errors).
* **Server: HTTP-level `→` / `←` log per request (verbose mode).**
`set_pre_routing_handler` + `set_logger` emit arrival and
completion lines with method/path/peer/body size, status,
duration, response bytes (or `streamed` for SSE), and running
totals (req / err / tools / in_flight / bytes_in / bytes_out).
* **Server: periodic `METRICS` line with TCP state breakdown.**
Background ticker every `metrics_interval` seconds
(`--metrics-interval N` or `[SERVER] metrics_interval` to tune,
`0` disables — **default raised to 300 / always-on as of
2026-05-09**, see entry above) emits one
line with: CPU% + iowait%, load 1/5/15, process RSS + peak,
system memory total/used/%, AMD GTT used/total/% (Linux + AMD
only), in-flight requests, cumulative requests / errors / bytes,
fd usage vs RLIMIT_NOFILE, AND an explicit TCP state breakdown
(ESTABLISHED / TIME_WAIT / CLOSE_WAIT / FIN_WAIT / LISTEN)
parsed from `/proc/net/tcp{,6}` with
`TIME_WAIT N/M ephemeral ports (X.X% [elevated|HIGH|CRITICAL])`
so socket exhaustion shows up in `journalctl` long before
connections start failing. Linux-only for the deep metrics;
macOS prints `n/a` and the server runs fine — easyai-server's
deploy target is Linux.
* **Tool dispatch timing in every visible log.** Engine wraps
`tool->handler()` with `steady_clock` and writes `duration_ms`
into `ToolResult`. CLI shows `🔧 search_web (412ms)({"query":...})`
and the webui's reasoning panel shows the same. The
`easyai.tool_result` SSE event also gains a `duration_ms` field
so future external SSE consumers can render their own timing UI.
* **`allow_fs = off` in the INI is now honoured.** The server read
the flag but never propagated it to the toolbelt — a non-empty
`[SERVER] sandbox` re-enabled `*_file` regardless. Default install
ships `allow_fs = off` + `sandbox = /var/lib/easyai/workspace`,
which hit exactly this. Now `allow_fs` and `allow_bash` are
honoured independently of `sandbox`. **Behaviour change:**
`--sandbox /foo` alone NO LONGER implies `--allow-fs`; pass
`--allow-fs` explicitly to register `*_file`.
* **Built-in system prompt is tool-aware.** The hardcoded prompt
used to list `*_file` / `bash` / `plan` / host-metric tools by name
whether or not they were registered. Models hallucinated calls to
unregistered tools (especially `bash` after the `allow_fs` fix
above). The `Tool notes:` section is now built dynamically:
each bullet is gated on the same flag that controls registration,
and the entries for tools the server NEVER registers (`plan`,
host metrics) are removed entirely. Same fix in
easyai-local's built-in prompt.
* **RAG tool descriptions spell out "model-only store".** Added a
`PRIVATE — MODEL-ONLY STORE` paragraph to `knowledge_save` /
`knowledge_append`, telling the model that the user has no UI /
command / API to read what's saved there. Forbids `"check the
knowledge for the code"` / `"I saved it to memory"` answers and
tells the model to `knowledge_load` and put the body inline when
the user asks for stored content.
* **Stay-in-scope replaces "PROTOTYPE FIRST".** The old 1./2./3.
ritual ("build → verify → ASK which next step") was making the
agent stop after step 1 and ask, even when the user wanted the
simplest end-to-end thing. Collapsed to a single
`## Stay strictly in scope` paragraph that keeps the no-extras /
no-defensive-scaffolding / no-while-I'm-at-it-cleanups specifics
and drops the build-then-ask dance. Updated everywhere the
wording lived: server.cpp built-in prompt, local.cpp built-in
prompt, cli.cpp [guidance] block, installer's
`/etc/easyai/system.txt` template.
* **Installer GTT default 28 → 29 GiB.** `gtt_gb=29` in
`scripts/install_easyai_server.sh`. Matches `ttm.pages_limit=7602176`.
Leaves headroom for a Q5_K_M / MXFP4_MOE 30B MoE plus a 32k KV
cache fully on the iGPU.
* **Quick-start editor section added to `LINUX_SERVER.md`.** New
section 0 with copy-paste shell snippets for VSCode + Continue.dev,
OpenCode, and VSCode + Cline, all pointing at `http://ai.local:80/v1`.
Plus a quick-reference table for other OpenAI-compatible clients.
* **No patches or derivatives of llama.cpp.** A short-lived
experiment subclassed `httplib::Server` to log per-TCP-connection
accept/close events — that needed widening the access on a
private virtual in the vendored cpp-httplib header. Backed out
entirely: no CMake patch script, no `#define private protected`
trick, no derivative copies. The HTTP `→`/`←` lines and the
periodic METRICS line (with system-wide TCP state breakdown
including TIME_WAIT pressure) cover the same diagnostic ground
using only public APIs and `/proc`.
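
As an illustration of the TCP state breakdown in the METRICS line, the sketch below counts connection states from text in the `/proc/net/tcp` format, where the fourth whitespace-separated column is the state as a hex code. The function name `count_tcp_states` and the bucket labels are assumptions; the server's real parser is not shown in this changelog.

```cpp
#include <exception>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical sketch: tally TCP states from /proc/net/tcp-style input.
// Kernel state codes: 01 ESTABLISHED, 04/05 FIN_WAIT1/2, 06 TIME_WAIT,
// 08 CLOSE_WAIT, 0A LISTEN.
std::map<std::string, int> count_tcp_states(std::istream& in) {
    std::map<std::string, int> counts;
    std::string line;
    std::getline(in, line);  // skip the column-header line
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string slot, local, remote, state;
        if (!(fields >> slot >> local >> remote >> state)) continue;
        int code = 0;
        try {
            code = std::stoi(state, nullptr, 16);
        } catch (const std::exception&) {
            continue;  // malformed row: ignore it
        }
        switch (code) {
            case 0x01: ++counts["ESTABLISHED"]; break;
            case 0x04:
            case 0x05: ++counts["FIN_WAIT"]; break;
            case 0x06: ++counts["TIME_WAIT"]; break;
            case 0x08: ++counts["CLOSE_WAIT"]; break;
            case 0x0A: ++counts["LISTEN"]; break;
            default:   ++counts["OTHER"]; break;
        }
    }
    return counts;
}
```

On Linux the function could be fed `std::ifstream("/proc/net/tcp")` and again with `/proc/net/tcp6`, summing the two maps; dividing the `TIME_WAIT` count by the size of the ephemeral port range gives a pressure percentage like the one described above.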
### 2026-05-08 — `tool_lookup` builtin + tool-discipline rule
Builds on the same-day "Built-in system prompt is tool-aware" work
above with a complementary affordance: the model gets a runtime
introspection tool so it can verify what's wired up before
dispatching, and an authoritative discipline rule that points at
that tool. Driven by the same failure mode the prompt-cleanup
addressed (`write` / `read` / `ls` etc. invented by the model);
this layer makes the closure explicit and gives the model a
recovery path when it's uncertain.
* **New `tool_lookup` builtin.** Read-only introspection over the
agent's live tool registry. Call it with no args to get a numbered
catalogue of every registered tool (1..N), or pass
`name=""` to filter — case-insensitive, partial match.
Output is plain numbered text the model parses naturally; only
active tools are returned. Wired into every binary
(`easyai-cli`, `easyai-server`, `easyai-mcp-server`, `easyai-local`,
`easyai-agent`, `easyai-recipes`) and the `LocalBackend` library
wrapper. Always registered last so its snapshot covers every
other tool, including itself. Public C++ API:
`easyai::tools::tool_lookup(getter)` where `getter` is a callable
returning `std::vector<std::pair<std::string, std::string>>` of
(name, description) pairs.
* **Authoritative `[tools]` / "Tool discipline" prompt block.**
Layered on top of the closed-set rule from the prompt-cleanup
commit: *"This catalogue is the SINGLE SOURCE OF TRUTH; training
data is NOT; if a name isn't in this list IT DOES NOT EXIST;
call `tool_lookup` first when uncertain; do not retry an
unknown-tool call."* Common hallucinated names called out by
example: `write`, `read`, `ls`, `cat`, `curl`, `python`, `sed`,
`grep`, `find`, `mkdir`. Same wording in `easyai-cli` (the
`[tools]` block injected into the dynamic prefix), `easyai-server`
and `easyai-local` (the `## Tool discipline` section in their
`kBuiltinSystem` strings).
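
A rough sketch of how a lookup over such a getter could behave is shown below. It assumes the getter returns (name, description) string pairs; `render_tool_lookup` and the exact output format are invented for the example and are not the library's `tool_lookup` implementation.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using ToolList = std::vector<std::pair<std::string, std::string>>;

static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Hypothetical sketch: numbered catalogue of registered tools, optionally
// filtered by a case-insensitive partial match on the tool name.
std::string render_tool_lookup(const std::function<ToolList()>& getter,
                               const std::string& name_filter = "") {
    const std::string needle = to_lower(name_filter);
    std::string out;
    int n = 0;
    // The getter is called at lookup time, so the snapshot reflects
    // whatever is registered right now.
    for (const auto& tool : getter()) {
        if (!needle.empty() &&
            to_lower(tool.first).find(needle) == std::string::npos) {
            continue;
        }
        out += std::to_string(++n) + ". " + tool.first + ": " + tool.second + "\n";
    }
    if (n == 0) out = "no registered tool matches '" + name_filter + "'\n";
    return out;
}
```

Taking a callable instead of a fixed list is what lets a tool registered last still report every other tool, including itself.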
### 2026-05-08 — Fifth-pass security hardening (no behaviour change)
A fresh static review of the ~5,000 lines that landed in the last 30
commits. Two HIGH, three MEDIUM, two LOW findings — all closed in
this commit; every public interface (CLI flags, tool names, library
headers, INI keys) is unchanged.
* **bash live-mirror is now control-byte stripped and byte-capped.**
When the model calls `bash`, the merged stdout/stderr was being
mirrored verbatim to the operator's terminal. A model could emit
`\e]0;HACKED\a` to retitle the operator's window or `\e[2J` to wipe
the screen — neither showed up in the model-facing tool result.
Now: ESC is rendered as a visible `^[`, all other C0 controls are
dropped, and the mirror channel is capped at 128 KiB (model still
gets the full 32 KiB it always did). Set `[cli] show_bash = false`
or `--no-show-bash` to silence the mirror entirely.
* **`plan` tool render strips control bytes from item text.** Same
hijack class, narrower budget — a `plan add` with embedded `\e[…`
no longer reaches the operator's terminal raw.
* **`get_array` parser now caps stringified-array recursion depth.**
Tool-args parsing tolerates `"items": "[…]"` (the array escaped
into a JSON string — small models double-escape sometimes). The
unwrap path was recursive without a depth cap; a hostile model
emitting deeply-nested escapes blew the stack. Capped at depth 4
(legitimate cases stay under depth 2).
* **`get_sandbox_path` now uses `fs::weakly_canonical`.** Was using
`realpath()` with a "fall back to the unresolved input" branch
that could leak relative-path shape into the model on transient
errors. Cosmetic but correct; matches the canonicalisation the
sandbox containment check uses.
* **`--mcp ` rejects non-`http(s)://` schemes up front.** The
libcurl protocol filter still blocks `file://`, `gopher://` etc.
at transport time, but the operator now gets a clear error
instead of a curl diagnostic, and embedders using
`easyai::mcp::fetch_remote_tools` get the same defence-in-depth.
* **Installer validates numeric sampling/timeout flags.**
`--temperature`, `--top-p`, `--top-k`, `--min-p`,
`--repeat-penalty`, `--frequency-penalty`, `--max-tokens`, `--http-timeout`, `--ctx-size`
must match `^-?[0-9]+(\.[0-9]+)?$` before they flow into the INI
via heredoc. Closes a defence-in-depth gap where a crafted value
containing `\n` could inject extra INI keys.
* **`/etc/easyai/easyai.ini.bak` (created by `--force`) gets
explicit `chmod 640` and `chown root:easyai`.** Previously
inherited whatever the live INI had; matches the new file's
posture so a token leak via a backup with looser perms is
impossible.
Full write-up: [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §0 (operator
TL;DR) and §20 (this pass's findings). Read §0 if you operate easyai
in production — it's the 60-second summary of what easyai does and
doesn't protect for you.
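
The installer performs its numeric check in shell; the same rule can be expressed in C++ for embedders who want to validate values before writing them into an INI file. This is a sketch under that assumption, using the pattern quoted above; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical sketch: accept only plain integers or decimals, so a value
// containing a newline (or anything else) cannot smuggle extra INI keys.
bool is_plain_number(const std::string& value) {
    static const std::regex kNumeric(R"re(^-?[0-9]+(\.[0-9]+)?$)re");
    // regex_match requires the whole string to match the pattern.
    return std::regex_match(value, kNumeric);
}

// Hypothetical sketch of the up-front scheme check applied to an MCP URL.
bool has_http_scheme(const std::string& url) {
    return url.rfind("http://", 0) == 0 || url.rfind("https://", 0) == 0;
}
```

A value such as `0.7` followed by a newline and `allow_bash = on` fails the first check because the match must cover the entire string, and `file:///etc/passwd` fails the second before any transport code runs.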
### 2026-05-05 — Tool surface + system prompt overhaul
Driven by a production "models drift, use bash for file work, ignore
tools" report. The fix landed across the tool descriptions, the
default prompts, and the CLI flag wiring at once.
* **`--sandbox` and `--allow-bash` now imply `*_file`.** The previous
matrix had operators passing `--allow-bash --sandbox DIR` and ending
up with bash but no file tools, so the model fell back to shell
idioms. The tool descriptions now pair each such idiom
(`cat > file`, `cat < file`, `mkdir`, `sed -i`) with the dedicated tool that
replaces it, and reserve bash for shell features the dedicated
tools don't have: pipelines, `find | xargs`, build runners
(make / cmake / cargo / npm), git, package managers, sed/awk for
in-place edits.
* **System prompts inject `[environment]` + `[guidance]`.** When
any create/mutate affordance is registered (`*_file` / `bash` / `plan`),
the cli prepends two short blocks to the user's `--system` content:
the absolute sandbox path (saves a "where am I" tool hop on turn 1)
and a stay-in-scope behavioral rule (build EXACTLY what the user
asked — no extras, no defensive scaffolding, no "while I'm at it"
cleanups). The same guidance lives in the server's Deep persona
and easyai-local's built-in prompt.
* **Default sampling preset → `precise`** (was `balanced`).
Temp 0.2, top_p 0.92, top_k 50, min_p 0.03. Tuned for code,
math, and factual Q&A — the dominant use case for a tool-calling
agent. Flipped across server, local, cli, webui, library
fallbacks, and the systemd installer's INI templates. README's
preset table now includes a Behaviour column and a "Pick when…"
column to make the choice explicit.
* **`--show-system-prompt`** added to all four binaries
(`easyai-cli`, `easyai-server`, `easyai-local`, `easyai-chat`).
Resolves the system prompt the binary would actually use (built-in
default → `--system-file` → `--system`, plus the cli's injected
blocks), prints, exits. No model load, no port bind, no network.
Useful for confirming the persona before bouncing a service.
* **Graceful `Ctrl-C` in `easyai-cli`.** In interactive mode (no
`--quiet`), the first `Ctrl-C` mid-turn prints
`` and lets the in-flight chat finish naturally
(rc=0). Conversation isn't truncated mid-stream. Second `Ctrl-C`
is the hard-cancel escape hatch (rc=130). `--quiet` keeps the
existing immediate-cancel for batch scripts.
* **Plan tool tolerance shims.** `args::get_array` now accepts a
stringified JSON array (`"items": "[...]"`) — small/quantised
models repeatedly emit this shape. The handler infers a missing
`action` from the items' fields plus current plan state, and
maps common synonyms (`create` → `add`, `remove` → `delete`,
etc.). `add` honours an optional per-item `status` so create +
mark "working" lands in one call. Errors include the correct
shape inline so the model can copy-fix.
* **Plan re-renders coalesce.** A new `Plan::Batch` RAII guard
collapses N per-item `on_change` callbacks across one tool call
into a single fire — the UI's "── plan ──" block now prints once
per batch, not once per item.
* **New doc: [`easyai-cli.md`](easyai-cli.md)** mirrors
`easyai-server.md`. 14 sections covering connection, modes, full
flag reference, tool registration, system prompt + injection,
sampling, reasoning streams, the raw transaction log, RAG,
external tools, management subcommands, worked examples,
cross-references.
* **Tool authoring guide.** New `design.md §5 Writing tool
descriptions reliably` (architectural) and `manual.md §3.2.1`
(cookbook) document the rag-style multi-action pattern, the
per-`.param()` "Used by add / update / …" idiom, and the
lenient-handler tolerance shims. `AI_TOOLS.md` Chapter 9 has a
pointer.
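
The coalescing behaviour of the `Plan::Batch` guard can be sketched with a small RAII class. The `Plan` below is a toy with only an `add` method and a no-argument callback; the real class has more state and a different interface, so treat every member name here as an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy Plan: fires on_change after every mutation unless a Batch is open.
class Plan {
public:
    explicit Plan(std::function<void()> on_change)
        : on_change_(std::move(on_change)) {}

    // RAII guard: while at least one Batch is alive, change notifications
    // are recorded instead of fired; the outermost Batch fires once on exit.
    class Batch {
    public:
        explicit Batch(Plan& plan) : plan_(plan) { ++plan_.batch_depth_; }
        ~Batch() {
            if (--plan_.batch_depth_ == 0 && plan_.dirty_) {
                plan_.dirty_ = false;
                if (plan_.on_change_) plan_.on_change_();
            }
        }
        Batch(const Batch&) = delete;
        Batch& operator=(const Batch&) = delete;

    private:
        Plan& plan_;
    };

    void add(std::string item) {
        items_.push_back(std::move(item));
        notify();
    }
    std::size_t size() const { return items_.size(); }

private:
    void notify() {
        if (batch_depth_ > 0) {  // inside a batch: remember, don't fire
            dirty_ = true;
            return;
        }
        if (on_change_) on_change_();
    }

    std::function<void()> on_change_;
    std::vector<std::string> items_;
    int batch_depth_ = 0;
    bool dirty_ = false;
};
```

A tool handler would open one `Plan::Batch` at the top of the call, so three `add` operations inside it produce a single re-render when the guard goes out of scope.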
### 2026-05-04 — Single-tool RAG is now the default; concise system prompt
* **Default RAG layout flipped: one `rag(action=...)` tool.** The
unified single-tool dispatcher used to be opt-in behind
`--experimental-rag`; it is now the default for every binary
(`easyai-server`, `easyai-cli`, `easyai-local`, `easyai-mcp-server`).
One catalog entry instead of seven keeps the model's tool list
short and saves a few hundred tokens per turn. On-disk format,
locking, and fix-memory rules are unchanged.
* **`--split-rag` opts back into the legacy seven `rag_*` tools.**
Replaces `--experimental-rag`. Same semantics, opposite default.
Wired as a CLI flag on every binary AND as `[SERVER] split_rag`
in the INI overlay (`easyai.ini` / `easyai-mcp.ini`; per-model
overrides via `[MODEL_]` sections). Useful for
weak / 1-bit-quant tool callers (Bonsai-class) that handle many
flat schemas more reliably than one discriminated schema.
* **Default system prompts trimmed.** `easyai-server` and
`easyai-local` now ship a much shorter built-in prompt focused on
a tight **plan → act → iterate** loop with one small concrete
next step at a time, finishing as soon as the answer is useful so
the user has room to refine. Cuts about three quarters of the old
prompt's length while keeping the no-announce-without-call rule
and the search → fetch discipline.
### 2026-05-02 (later) — RAG `knowledge_append` + user-focus prompts
* **`knowledge_append` — new RAG tool.** Adds new content to the end
of an existing memory without losing the previous body. Read-modify-
write under one `unique_lock` on the store's `shared_mutex`, so
concurrent appenders queue cleanly (no lost appendix, no torn
merge for any reader); on disk the new content is separated from
the old by a Markdown horizontal rule (`---`) so the operator
reading the `.md` file sees exactly where each appendix starts.
Refuses on entries that don't exist (use `knowledge_save`), on
fixed memories (`fix-*`), and when the merged size would exceed
256 KiB. Optional `keywords[]` parameter merges into the existing
keyword list (deduped, capped at 8). Wired into every consumer
(server, MCP server, CLI, local backend). Full doc:
[`RAG.md`](RAG.md) §4.
* **User-focus prompt update.** `knowledge_save` and
`knowledge_append` tool descriptions now explicitly tell the model
to prioritise notes about the user themselves — name, role,
hardware, projects, working style, corrections, likes, dislikes —
and to grow that memory across sessions with `knowledge_append`
instead of rewriting it with `knowledge_save`. The next
conversation (tomorrow, three months from now) starts with the
user already known, so they don't have to explain themselves
twice. The lib ships the canonical seven knowledge tools
(`knowledge_save`, `knowledge_append`, `search_knowledge`,
`knowledge_load`, `knowledge_list`, `knowledge_delete`,
`keywords_knowledge`); all CLI help text, help comments, and docs
updated to match.
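
The append semantics described above (one exclusive lock for the whole read-modify-write, a `---` separator, and three refusal cases) can be sketched with an in-memory store. `MemoryStore`, `Entry`, and the exact separator spacing are assumptions for the example; the real store persists `.md` files and tracks keywords as well.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <utility>

struct Entry {
    std::string body;
    bool fixed = false;
};

// Hypothetical in-memory stand-in for the RAG store.
class MemoryStore {
public:
    static constexpr std::size_t kMaxBytes = 256 * 1024;

    void save(const std::string& title, const std::string& body, bool fixed = false) {
        std::unique_lock<std::shared_mutex> lock(mu_);
        entries_[title] = Entry{body, fixed};
    }

    // Returns an empty string on success, otherwise the refusal reason.
    // The whole read-modify-write runs under one exclusive lock, so two
    // concurrent appenders cannot lose each other's text.
    std::string append(const std::string& title, const std::string& extra) {
        std::unique_lock<std::shared_mutex> lock(mu_);
        auto it = entries_.find(title);
        if (it == entries_.end()) return "no such entry; save it first";
        if (it->second.fixed) return "entry is fixed and cannot be changed";
        std::string merged = it->second.body + "\n\n---\n\n" + extra;
        if (merged.size() > kMaxBytes) return "merged entry would exceed 256 KiB";
        it->second.body = std::move(merged);
        return "";
    }

    // Readers take a shared lock and can run in parallel with each other.
    std::optional<std::string> load(const std::string& title) const {
        std::shared_lock<std::shared_mutex> lock(mu_);
        auto it = entries_.find(title);
        if (it == entries_.end()) return std::nullopt;
        return it->second.body;
    }

private:
    mutable std::shared_mutex mu_;
    std::map<std::string, Entry> entries_;
};
```

Checking the size limit before assigning means a refused append leaves the stored body untouched.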
### 2026-05-02 — Fourth-pass security audit + readability batch
* **`/tmp` log file hardened (security, MEDIUM).** The auto-generated
raw transaction log at `/tmp/easyai--.log` is now
created with `O_EXCL | O_NOFOLLOW | O_CLOEXEC` and mode `0600`. The
predictable path used to follow symlinks on `fopen("w")`, so a
local attacker on a multi-tenant host could plant a symlink
pointing at any user-writable file (`~/.bashrc`, `~/.ssh/…`) and
have the next `easyai-*` process truncate-and-overwrite it.
Mode `0644` (process umask) also leaked prompts — which can
contain API keys or PII — to other accounts on the same box.
`O_EXCL` makes the create atomic-or-fail and `0600` keeps logs
private. Caller-supplied paths (`--log-file PATH`) keep `O_TRUNC`
for log rotation but still gain `O_NOFOLLOW + 0600`. Full
write-up in [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §19.
* **Internal readability batch (no public API change).** Three
inline patterns were lifted into named helpers so the call sites
read top-to-bottom: `file_mtime_unix()` (replaces three copies of
the C++17 file_clock→system_clock idiom in `rag_tools.cpp`),
`glob_to_regex()` + `kGlobRegexMetachars` (lifts the wildcard
state machine out of `glob_file` in `builtin_tools.cpp`), and
`looks_like_announce_phrase()` (lifts the 30-line retry predicate
out of `Engine::chat_continue` in `engine.cpp`, where it was
used twice). All seven binaries build clean.
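
For readers unfamiliar with the glob-to-regex idea behind `glob_to_regex()`, here is a deliberately simplified sketch. It handles only `*` and `?`, lets `*` cross `/`, and uses its own names (`simple_glob_to_regex`, `kRegexMetachars`); the helper in `builtin_tools.cpp` is described as a state machine and very likely covers more cases.

```cpp
#include <regex>
#include <string>

// Characters that must be escaped to be matched literally by an
// ECMAScript-flavoured std::regex.
static const std::string kRegexMetachars = R"re(\^$.|+()[]{})re";

// Hypothetical, simplified translation: '*' matches any run of characters,
// '?' matches exactly one, everything else is matched literally.
std::string simple_glob_to_regex(const std::string& glob) {
    std::string re;
    for (char c : glob) {
        if (c == '*') {
            re += ".*";
        } else if (c == '?') {
            re += '.';
        } else {
            if (kRegexMetachars.find(c) != std::string::npos) re += '\\';
            re += c;
        }
    }
    return re;
}

// Convenience wrapper: does `name` match `glob` in full?
bool glob_matches(const std::string& glob, const std::string& name) {
    return std::regex_match(name, std::regex(simple_glob_to_regex(glob)));
}
```

Escaping the metacharacters first is the important step: without it a pattern like `*.cpp` would let the dot match any character.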
### 2026-05-01 — MCP CLIENT, RAG memory framing, web_google, macOS installer fix
* **`easyai-server` is now also an MCP client.** Pass `--mcp `
(and `--mcp-token ` if needed) and at startup the server
connects to the upstream's `/mcp`, runs `tools/list`, and merges
the catalogue into its own. Each remote tool's handler proxies
`tools/call` over HTTP. Local tool names win on collision. The
implementation is `easyai::mcp::fetch_remote_tools()` in libeasyai
— public API, so anything built on the engine library can stack
remote MCP catalogues. See [`MCP.md`](MCP.md) §9.5.
* **`--no-tools` renamed to `--no-local-tools` (server only).** Now
that the server can be both an MCP server AND an MCP client, the
flag's scope had to be unambiguous: it disables only the LOCAL
built-in toolbelt. RAG, external tools, and tools fetched via
`--mcp` are unaffected. INI key `load_tools` → `local_tools` to
match. The `easyai-local` and `easyai-mcp-server` binaries keep
their `--no-tools` spelling — they have no MCP client, so the
original name is still accurate.
* **RAG reframed as memory + fixed memories.** Tool descriptions
rewritten in memory verbs (search / store / recall / update /
forget). New `fix=true` argument on `knowledge_save` mints an
immutable memory: keywords are auto-prefixed with `fix-`, and from
then on `knowledge_save` refuses to overwrite it and
`knowledge_delete` refuses to
remove it. Use this to seed system designs, hard rules, ground-
truth definitions the model must not rewrite. Search / load /
list output gain a human-readable `modified` date and a `[FIXED]`
/ `fixed: yes/no` marker. See [`RAG.md`](RAG.md).
* **Single-tool RAG dispatcher is the default.** One
`rag(action=...)` tool exposes save / append / search / load /
list / delete / keywords as sub-actions. Same store, same
handlers, same on-disk format. Saves a few hundred catalog tokens
per turn and keeps the model's tool list short. Pass `--split-rag`
(or `[SERVER] split_rag = on` in the INI) to opt back into the
legacy seven separate `rag_*` tools, which is useful for weak /
1-bit-quant tool callers (Bonsai-class) that handle many flat schemas
more reliably than one discriminated schema.
* **`web_google` builtin.** Google Custom Search JSON API. Gated by
`--use-google` (also `[SERVER] use_google`). Reads
`GOOGLE_API_KEY` and `GOOGLE_CSE_ID` from env at call time so a
rotation doesn't drop the tool. Free tier is 100 queries/day.
* **macOS installer fix: OpenSSL via brew.** Modern macOS no longer
ships usable libssl in `/usr/lib`, so `find_package(OpenSSL)`
half-detected and broke configure for both `easyai_cli` and the
vendored `cpp-httplib`. The installer + `build_macos.sh` now pass
`-DOPENSSL_ROOT_DIR=$(brew --prefix openssl@3)` and the cmake
guards `TARGET OpenSSL::SSL` so a half-detected OpenSSL degrades
to "HTTPS not in this build" instead of erroring out.
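
The collision rule from the MCP client entry above ("local tool names win") can be pictured as a plain map merge. This is an illustrative sketch only: `ToolEntry` and `merge_catalogues` are hypothetical names, not the libeasyai API (the real entry point named above is `easyai::mcp::fetch_remote_tools()`), and the real merge may differ in detail.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a registered tool; not the libeasyai type.
struct ToolEntry {
    std::string name;
    std::string origin;  // "local" or "remote"
};

// Merge a remote catalogue into the local one. On a name collision the
// local entry is kept and the remote one is dropped.
std::vector<ToolEntry> merge_catalogues(const std::vector<ToolEntry>& local,
                                        const std::vector<ToolEntry>& remote) {
    std::map<std::string, ToolEntry> by_name;
    for (const auto& t : local) by_name.emplace(t.name, t);
    // emplace() never overwrites an existing key, so local entries survive.
    for (const auto& t : remote) by_name.emplace(t.name, t);

    std::vector<ToolEntry> merged;
    merged.reserve(by_name.size());
    for (const auto& kv : by_name) merged.push_back(kv.second);
    return merged;
}
```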
### 2026-04-30 — `easyai-mcp-server` (standalone MCP provider)
* **New binary `easyai-mcp-server`.** Same tool catalogue as
`easyai-server` (built-ins + RAG + operator-defined external-tools)
exposed over `POST /mcp` with **no GGUF model loaded** — designed
for high-concurrency multi-client deployments. Configurable
cpp-httplib worker pool (`--threads`, default 256) and a separate
in-flight `tools/call` cap (`--max-concurrent-calls`, default 256)
that returns 503 + `Retry-After` on saturation instead of unbounded
queueing. Full doc: [`easyai-mcp-server.md`](easyai-mcp-server.md).
* **RAG concurrency upgrade.** `RagStore::mu` is now
`std::shared_mutex`; `search_knowledge` / `knowledge_load` /
`knowledge_list` / `keywords_knowledge` take `std::shared_lock` so
parallel readers don't serialise on the write path. Benefits every
consumer of libeasyai — `easyai-server`, `easyai-cli` with
`--RAG`, any third-party program calling
`knowledge_split_tools()`. Atomic-rename writes already
made on-disk reads tear-free; the lock relaxation is safe.
* **Doc restructure.** `INI_KFlags.md` content has moved to the top
of the new [`easyai-server.md`](easyai-server.md) so the chat
server's INI / CLI / API / persona / hardening reference lives in
one file. `LINUX_SERVER.md` is unchanged — it remains the
systemd-installer-specific operator's guide.
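
The reader/writer split described in the RAG concurrency entry above follows the standard `std::shared_mutex` pattern. A minimal sketch, assuming a much simpler in-memory store than the real one: `MiniStore` and its methods are hypothetical and only show which lock type goes where.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical miniature of a keyword -> text store; not the real RagStore.
class MiniStore {
public:
    // Writers take an exclusive lock.
    void save(const std::string& key, const std::string& text) {
        std::unique_lock<std::shared_mutex> lock(mu_);
        entries_[key] = text;
    }

    // Readers take a shared lock, so many loads can run in parallel.
    std::optional<std::string> load(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(mu_);
        auto it = entries_.find(key);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

private:
    mutable std::shared_mutex mu_;
    std::map<std::string, std::string> entries_;
};
```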
### 2026-04-30 — Tunable incomplete-retry budget + live retry visibility
* **`--max-incomplete-retries N` (also `[ENGINE] max_incomplete_retries`).**
Default 10 — how many times the engine discards + nudges + retries
when the model finishes a turn announcing an action ("Let me…",
"I'll…") without actually emitting the tool_call. Bump to 15-20
for weak / 1-bit-quant models (Bonsai-8B-Q1_0 frequently needs
the extra budget); set to 0 to disable retries entirely.
* **Retries now visible in the Thinking panel.** Engine fires a new
`on_incomplete_retry(attempt, max, reason)` callback per retry,
the server pipes it into the SSE `reasoning_content` channel, and
the webui renders `↻ Retry 3/10: model said: "Let me search…" (no
tool_call) — nudging.` while it happens. No more frozen UI for 10
silent retries followed by a blank bubble.
* **Engine warnings always log** (regardless of `--verbose`):
cancellation, thought-only retry, reasoning→content fallback,
incomplete-retry, empty final content. `--verbose` is for raw
per-token / per-hop diagnostic noise; actionable warnings stay on
so operators see them in `journalctl` without flipping a flag.
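
The retry budget and the per-retry callback described above can be sketched as a small loop. Everything here is an assumption made for illustration: `Turn`, `announces_action`, and `run_with_retry_budget` are hypothetical names, the announce check is far cruder than the engine's own predicate, and the nudge message the engine injects before retrying is left out.

```cpp
#include <functional>
#include <string>

// Result of one model turn in this sketch.
struct Turn {
    std::string text;
    bool has_tool_call = false;
};

// Crude stand-in for the engine's announce-phrase check.
bool announces_action(const std::string& text) {
    return text.rfind("Let me", 0) == 0 || text.rfind("I'll", 0) == 0;
}

// Run `generate` until the turn is complete or the budget is spent.
// `on_retry(attempt, max, reason)` mirrors the callback shape named above.
Turn run_with_retry_budget(
    const std::function<Turn()>& generate, int max_retries,
    const std::function<void(int, int, const std::string&)>& on_retry) {
    Turn turn = generate();
    for (int attempt = 1; attempt <= max_retries; ++attempt) {
        if (turn.has_tool_call || !announces_action(turn.text)) break;
        on_retry(attempt, max_retries, turn.text);  // surface it to the UI
        turn = generate();                          // discard and retry
    }
    return turn;
}
```

With `max_retries` set to 0 the loop body never runs, which matches the "0 disables retries" behaviour described above.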
### 2026-04-30 — Bonsai 8B Q1_0 onboarding + security pass
* **One-shot installers for macOS and Raspberry Pi 4/5.**
`scripts/install_easyai_macos.sh` builds with Metal/AMX, drops the
model, prints the run command. `scripts/install_easyai_pi.sh` does
the full Pi appliance: systemd unit, mDNS so the box answers as
**`pi-ai.local`** on your LAN, port 80 with
`CAP_NET_BIND_SERVICE`. Both clone the **PrismML fork** of
llama.cpp (the only one with the Q1_0 kernel — upstream loads the
GGUF then fails at decode).
* **Security third-pass audit** — 3 HIGH and 7 MEDIUM findings fixed.
The INI overlay used to be silently ignored (every `[ENGINE]` /
`[SERVER]` key was a no-op); `--no-mcp-auth` was disconnected from
the gate; the sandbox could be escaped by a symlink planted via
`bash`. All closed. The `bash` tool now gets the same
fork-hardening as external tools — `PR_SET_PDEATHSIG`, fd
close-loop bounded against `RLIMIT_NOFILE = unlimited`, and
process-group kill on timeout. Plus JSON-depth caps on every parser, a
bounded INI parser, mode 0600 on RAG entries, and a
body-size-bounded auth header. See [`SECURITY_AUDIT.md`](SECURITY_AUDIT.md) §18.
* **MCP server.** `easyai-server` is now a Model Context Protocol
provider on `POST /mcp` (protocol 2024-11-05). Claude Desktop,
Cursor, Continue list and dispatch every registered tool — your
built-ins, your RAG, your `--external-tools` manifests — over a
single endpoint. Bearer auth via `[MCP_USER]` in the INI; a
Python stdio bridge ships at `scripts/mcp-stdio-bridge.py` for
Claude Desktop. See [`MCP.md`](MCP.md).
* **Single INI config — `/etc/easyai/easyai.ini`.** Every CLI flag
has an INI key (FlagDef table refactor); precedence is
CLI > INI > hardcoded default. Edit the file, `systemctl restart`, done.
Full reference in [`easyai-server.md`](easyai-server.md) §1.
* **RAG: persistent memory.** Seven keyword-only knowledge tools
(`knowledge_save`, `knowledge_append`, `search_knowledge`,
`knowledge_load`, `knowledge_list`, `knowledge_delete`,
`keywords_knowledge`).
Multi-keyword search (first keyword required, rest rank by overlap)
plus pagination. One Markdown file per entry: operator-readable,
hand-editable. See [`RAG.md`](RAG.md).
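
The search rule in the RAG entry above (first keyword required, the rest rank by overlap) can be sketched as a filter followed by a stable sort. This is a minimal illustration under assumed types: `Entry` and `search_entries` are hypothetical, pagination is omitted, and the real matching rules are documented in `RAG.md`.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory view of one stored entry.
struct Entry {
    std::string title;
    std::set<std::string> keywords;
};

// Return titles of entries that carry query[0], ordered by how many of the
// remaining query keywords they also carry (most overlap first).
std::vector<std::string> search_entries(const std::vector<Entry>& entries,
                                        const std::vector<std::string>& query) {
    if (query.empty()) return {};
    std::vector<std::pair<int, std::string>> hits;
    for (const auto& e : entries) {
        if (e.keywords.count(query[0]) == 0) continue;  // first keyword is mandatory
        int overlap = 0;
        for (std::size_t i = 1; i < query.size(); ++i)
            overlap += static_cast<int>(e.keywords.count(query[i]));
        hits.emplace_back(overlap, e.title);
    }
    // stable_sort keeps the original order among entries with equal overlap.
    std::stable_sort(hits.begin(), hits.end(),
                     [](const std::pair<int, std::string>& a,
                        const std::pair<int, std::string>& b) {
                         return a.first > b.first;
                     });
    std::vector<std::string> titles;
    for (const auto& h : hits) titles.push_back(h.second);
    return titles;
}
```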
### 2026-04-29 — External tools v2
* **Operator-defined tool packs** via `EASYAI-*.tools` JSON
manifests dropped in `/etc/easyai/external-tools/`. Per-file
fault isolation, sanity warnings (shell-wrapper detection,
world-writable binaries, `LD_*` env passthrough), full
`fork`+`execve` hardening — never a shell. Give the model
focused powers without flipping `--allow-bash`. See
[`EXTERNAL_TOOLS.md`](EXTERNAL_TOOLS.md).
* **`get_current_dir` builtin** — the model can ask where it is,
so relative paths in `bash` / `*_file` calls land where you expect.
* **Cancel-on-disconnect on the server** — closing the browser
tab actually stops the decode loop. No more zombie generation
eating tokens after the user walked away.
* **Tolerant tool output** — non-UTF-8 bytes in tool results no
longer abort the SSE stream; the bytes get a U+FFFD substitute
and the stream stays alive.
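
The "tolerant tool output" behaviour above amounts to a UTF-8 validation pass that substitutes U+FFFD for anything malformed. A minimal sketch, assuming one replacement character per offending byte (the server may group bytes differently); `valid_utf8_len` and `sanitize_utf8` are hypothetical names, not functions from the codebase.

```cpp
#include <cstddef>
#include <string>

// Length of a well-formed UTF-8 sequence starting at s[i], or 0 if the bytes
// there are not valid UTF-8 (bad lead byte, truncated sequence, overlong
// encoding, surrogate, or code point above U+10FFFF).
std::size_t valid_utf8_len(const std::string& s, std::size_t i) {
    auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
    auto cont = [&](std::size_t k) {
        return k < s.size() && (byte(k) & 0xC0) == 0x80;
    };
    const unsigned char b0 = byte(i);
    if (b0 < 0x80) return 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) return cont(i + 1) ? 2 : 0;
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (!cont(i + 1) || !cont(i + 2)) return 0;
        const unsigned char b1 = byte(i + 1);
        if (b0 == 0xE0 && b1 < 0xA0) return 0;  // overlong
        if (b0 == 0xED && b1 > 0x9F) return 0;  // UTF-16 surrogate
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (!cont(i + 1) || !cont(i + 2) || !cont(i + 3)) return 0;
        const unsigned char b1 = byte(i + 1);
        if (b0 == 0xF0 && b1 < 0x90) return 0;  // overlong
        if (b0 == 0xF4 && b1 > 0x8F) return 0;  // above U+10FFFF
        return 4;
    }
    return 0;  // stray continuation byte, or 0xC0 / 0xC1 / 0xF5..0xFF
}

// Copy `in`, replacing every invalid byte with U+FFFD.
std::string sanitize_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        const std::size_t n = valid_utf8_len(in, i);
        if (n == 0) {
            out += "\xEF\xBF\xBD";  // U+FFFD REPLACEMENT CHARACTER
            ++i;
        } else {
            out.append(in, i, n);
            i += n;
        }
    }
    return out;
}
```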
---
## All options at a glance
Every CLI flag, INI key, and library setter the project ships
today, in tables. Skim once to learn the surface; come back when
you want to tune something specific. Deeper reference is linked
per row.
This repo builds seven binaries. Two are production daemons
(`easyai-server`, `easyai-mcp-server`), two are user CLIs
(`easyai-cli`, `easyai-local`), three are example apps the lib
ships to demonstrate the API (`easyai-chat`, `easyai-agent`,
`easyai-recipes`).
### `easyai-server` — chat HTTP server (also speaks MCP)
Full reference: [`easyai-server.md`](easyai-server.md).
INI defaults under `/etc/easyai/easyai.ini` — every flag below
has a matching INI key (see [`easyai-server.md`](easyai-server.md) §1).
| Flag | Default | What it does |
|---|---|---|
| `-m, --model PATH` | (required) | GGUF model file. |
| `--config PATH` | `/etc/easyai/easyai.ini` | Central INI; CLI > INI > hardcoded. |
| `--host ADDR` | `127.0.0.1` | Bind address (`0.0.0.0` = any iface). |
| `--port N` | `8080` | TCP port. |
| `--max-body N` | 8 MiB | Cap on request body. |
| `-s, --system-file PATH` | — | Default system prompt, from file. |
| `--system TEXT` | — | Default system prompt, inline. |
| `--no-local-tools` | off | Don't expose the local built-in toolbelt. |
| `--mcp URL` | — | Connect upstream MCP server as client; merge catalogue. |
| `--mcp-token TOK` | — | Bearer for `--mcp`. |
| `--no-mcp-auth` | off | Force `/mcp` open even with `[MCP_USER]` populated. |
| `--http-retries N` | 5 | Extra attempts on transient HTTP failures (MCP client + web tools). 0 disables. Logged on stderr. |
| `--http-timeout SECONDS` | 600 | Read/write timeout for the listen socket AND the MCP-client connection. Bumped from llama-server's 60 s default to accommodate long thinking turns. |
| `--sandbox DIR` | server cwd | Root for `fs` / `bash` / `python3` / external `$SANDBOX`. |
| `--allow-fs` | off | Register the unified `fs` tool (action=read / write / list / glob / grep / check_path / cwd / sandbox). |
| `--allow-bash` | off | Register `bash` (NOT a hardened sandbox). |
| `--no-python` | python3 on | Drop the `python3` tool. By default it's auto-registered alongside `fs` whenever `--sandbox` is set or `--allow-bash` is on. Stdlib-only interpreter; disk access auto-restricted to the sandbox root. |
| `--use-google` | off | Enable engine=`"google"` inside the unified `web` tool (needs `GOOGLE_API_KEY` + `GOOGLE_CSE_ID`). |
| `--external-tools DIR` | — | Load every `EASYAI-*.tools` manifest in `DIR`. |
| `--memory DIR` | — | Enable persistent memory: registers seven keyword-only knowledge tools (`knowledge_save`, `knowledge_append`, `search_knowledge`, `knowledge_load`, `knowledge_list`, `knowledge_delete`, `keywords_knowledge`) — a passive RAG technique. `--RAG` is still accepted as a back-compat alias. |
| `--preset NAME` | `precise` | Ambient sampling preset. See [Sampling presets](#sampling-presets) for what each implies. |
| `--temperature F` | per preset | Override temperature (0.0–2.0). |
| `--top-p F` | per preset | Nucleus sampling p. |
| `--top-k N` | per preset | Top-k cutoff. |
| `--min-p F` | per preset | Min-p threshold. |
| `--repeat-penalty F` | 1.04 | Repetition penalty (multiplicative on recent logits) — anti-loop safety net for thinking models that lock into rephrasing their own intent. `--repeat-penalty 1.0` disables. |
| `--frequency-penalty F` | 0.05 | Frequency penalty (additive, scales with count of each token already generated, OpenAI semantics, `[0.0, 2.0]`). Discourages verbatim repetition proportionally to how often a token has already appeared. |
| `--presence-penalty F` | 0.1 | Presence penalty (additive, fixed cost per token-already-seen, OpenAI semantics, `[-2.0, 2.0]`). Discourages topic stickiness without penalising literal tool-name repetition; pairs well with `--repeat-penalty 1.0` on long agentic flows. See [`design.md` §4b](design.md#4b-sampling-and-the-penalty-stack). |
| `--max-tokens N` | 12288 | Cap tokens per request. |
| `--seed U32` | random | RNG seed (0 = random). |
| `--max-incomplete-retries N` | 10 | Retry budget for "announce-only" turns; 0 disables. |
| `-c, --ctx N` | 262144 (binary) / 1048576 (installer INI) | Context size. The systemd installer writes `[ENGINE] context = 1048576` paired with YaRN ×4 over a 128K base; per-model `[MODEL_*]` profiles override it. |
| `--batch N` | = ctx | Logical batch size. |
| `--ngl N` | 99 | GPU layers (0 = CPU only). |
| `--split-mode, -sm MODE` | `none` | Multi-GPU split strategy: `none`, `layer`, `row`, `tensor`. |
| `--rope-scaling MODE` | `yarn` | RoPE scaling method: `none`, `linear`, `yarn`. |
| `--rope-scale F` | 2 | RoPE frequency scale factor. |
| `--yarn-orig-ctx N` | 131072 | YaRN original context size for scaling. |
| `-t, --threads N` | hw cores | CPU threads. |
| `-ctk, --cache-type-k TYPE` | `f16` | K-cache dtype (`f32`,`f16`,`bf16`,`q8_0`,`q4_0`,`q4_1`,`q5_0`,`q5_1`,`iq4_nl`). |
| `-ctv, --cache-type-v TYPE` | `f16` | V-cache dtype (same set). |
| `-nkvo, --no-kv-offload` | off | Keep KV cache on CPU even with GPU layers. |
| `--kv-unified` | off | Single unified KV buffer across sequences. |
| `--override-kv K=T:V` | — | GGUF metadata override (`int`,`float`,`bool`,`str`); repeatable. |
| `-a, --alias NAME` | `easyai` | Public model id reported by `/v1/models`. |
| `--api-key KEY` | — | Require Bearer auth on every `/v1` route. |
| `-fa, --flash-attn` | auto | Force flash attention on. |
| `-tb, --threads-batch N` | = threads | Threads for prompt-eval batches. |
| `-np, --parallel N` | 1 | Compat-only; warns when >1. |
| `--mlock` | off | mlock model weights into RAM. |
| `--no-mmap` | off | Disable mmap (read GGUF into RAM). |
| `--numa STRATEGY` | off | `distribute`,`isolate`,`numactl`,`mirror`. |
| `--metrics` | off | Expose Prometheus `/metrics`. |
| `--reasoning on\|off` | on | Enable model thinking. |
| `--no-think` | off | Strip `<think>…</think>` from replies. |
| `--inject-datetime on\|off` | on | Append authoritative date/time to system prompt. |
| `--knowledge-cutoff YYYY-MM` | `2024-10` | Cutoff hint used by `--inject-datetime`. |
| `-v, --verbose` | off | Engine logs raw model output + parser actions. |
| `--webui MODE` | `modern` | `modern` (embedded SvelteKit) or `minimal` (inline). |
| `--webui-title TEXT` | `Box EasyAI` | Browser tab + sidebar brand. |
| `--webui-icon PATH` | — | Favicon (`.ico`,`.png`,`.svg`,`.gif`,`.jpg`,`.webp`). |
| `--webui-placeholder S` | `Type a message…` | Input box placeholder. |
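
The three penalty flags in the table above combine one multiplicative and two additive adjustments. The sketch below assumes the stack follows the llama.cpp penalty convention (divide positive logits by the repeat penalty, multiply non-positive ones, then subtract count-scaled frequency and flat presence terms); `apply_penalties` is a hypothetical helper, and the window size and exact ordering used by easyai are described in `design.md` §4b, not here.

```cpp
#include <unordered_map>
#include <vector>

// Apply the three penalties to raw logits, given how often each token id
// has appeared in the recent window.
void apply_penalties(std::vector<float>& logits,
                     const std::unordered_map<int, int>& counts,
                     float repeat_penalty, float frequency_penalty,
                     float presence_penalty) {
    for (const auto& kv : counts) {
        const int token = kv.first;
        const int count = kv.second;
        if (count <= 0 || token < 0 ||
            token >= static_cast<int>(logits.size())) {
            continue;
        }
        float& logit = logits[token];
        // Multiplicative: push the logit toward "less likely" either way.
        logit = (logit > 0.0f) ? logit / repeat_penalty : logit * repeat_penalty;
        // Additive: grows with the count, plus a flat cost for having appeared.
        logit -= static_cast<float>(count) * frequency_penalty + presence_penalty;
    }
}
```

With `repeat_penalty = 1.0` and both additive penalties at 0 the logits pass through unchanged, which is why `--repeat-penalty 1.0` is described as disabling that term.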
### `easyai-mcp-server` — standalone MCP provider (no model)
Same tool catalogue as `easyai-server` but no GGUF loaded —
designed for high-concurrency multi-client deployments. Full
reference: [`easyai-mcp-server.md`](easyai-mcp-server.md).
| Flag | Default | What it does |
|---|---|---|
| `--config PATH` | `/etc/easyai/easyai-mcp.ini` | Central INI. |
| `--host ADDR` | `127.0.0.1` | Bind address. |
| `--port N` | `8089` | TCP port. |
| `-n, --name ID` | `easyai-mcp` | Server identity on `/health` + MCP `initialize`. |
| `--max-body N` | 1 MiB | Cap on request body. |
| `-t, --threads N` | 256 | cpp-httplib worker pool. |
| `--max-concurrent-calls N` | 256 | In-flight `tools/call` cap (503 on saturation). |
| `--sandbox DIR` | cwd | Root for `*_file` / `bash` / `$SANDBOX`. |
| `--allow-fs` | off | Register `*_file` tools. |
| `--allow-bash` | off | Register `bash`. |
| `--no-tools` | off | Skip the built-in toolbelt entirely. |
| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |
| `--memory DIR` | — | Enable the seven `knowledge_*` tools (alias `--RAG`). |
| `--api-key TOK` | — | Bearer required for `/health`, `/metrics`, `/v1/tools`. |
| `--no-mcp-auth` | off | Force `/mcp` open. |
| `--metrics` | off | Enable Prometheus `/metrics`. |
| `-v, --verbose` | off | Log every dispatch to stderr. |
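
The `--max-concurrent-calls` cap in the table above (503 on saturation instead of queueing) can be pictured as an atomic admission counter. A minimal sketch under assumptions: `CallGate` and `admit` are hypothetical names, and the real server also sets `Retry-After` on the 503, which is not modelled here.

```cpp
#include <atomic>

// Counts in-flight calls and refuses new ones past a fixed cap.
class CallGate {
public:
    explicit CallGate(int max_in_flight) : max_(max_in_flight) {}

    // Try to take a slot; false means the caller should answer 503.
    bool try_enter() {
        int cur = in_flight_.load();
        while (cur < max_) {
            // On failure compare_exchange_weak reloads `cur`, so the cap is
            // re-checked before the next attempt.
            if (in_flight_.compare_exchange_weak(cur, cur + 1)) return true;
        }
        return false;
    }

    // Release a slot once the call has finished.
    void leave() { in_flight_.fetch_sub(1); }

private:
    const int max_;
    std::atomic<int> in_flight_{0};
};

// HTTP status for one incoming tools/call in this sketch.
int admit(CallGate& gate) { return gate.try_enter() ? 200 : 503; }
```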
### `easyai-cli` — interactive remote CLI
Talks to any OpenAI-compatible endpoint (our `easyai-server`,
upstream `llama-server`, OpenAI itself, etc.). Interactive terminal
runs open a full-screen chat **TUI** (opencode-style look & feel —
markdown rendering, live per-tool rows with diff views, todo
checklists, `/`-command and `@`-file completion, `opencode` /
`opencode-light` themes, `esc esc` interrupt); `--plain` (or
`[cli] tui = off`) keeps the legacy line REPL, and every non-TTY /
one-shot / `--quiet` path falls back automatically.
| Flag | Default | What it does |
|---|---|---|
| `--url URL` | `$EASYAI_URL` | OpenAI-compat endpoint. |
| `--api-key KEY` | `$EASYAI_API_KEY` | Bearer auth. |
| `--model NAME` | `$EASYAI_MODEL` | Request body `model` field. |
| `--timeout SECONDS` | 86400 (24h) | Read+write timeout — sized for multi-hour agentic sessions. Only fires on TRUE silence (every SSE delta resets it). `EASYAI_TIMEOUT` env also accepted. |
| `--http-retries N` | 5 | Extra attempts on transient HTTP failures (connect refused, read timeout, 5xx). 0 disables. Logged on stderr without `--verbose`. `EASYAI_HTTP_RETRIES` env also accepted. |
| `--insecure-tls` | off | Skip peer cert check (DEV ONLY). |
| `--ca-cert PATH` | system | Custom CA bundle (PEM). |
| `--system TEXT` | — | Inline system prompt. |
| `--system-file PATH` | — | System prompt from file. |
| `--temperature F` | server | Sampling temperature. |
| `--top-p F` | server | Nucleus top-p. |
| `--top-k N` | server | Top-k cutoff. |
| `--min-p F` | server | min-p (llama-server / easyai). |
| `--repeat-penalty F` | 1.04 | Repetition penalty — anti-loop default; pass 1.0 to disable. |
| `--frequency-penalty F` | server | Frequency penalty (OpenAI standard, `[0.0, 2.0]`). |
| `--presence-penalty F` | server | Presence penalty (OpenAI standard, `[-2.0, 2.0]`). |
| `--seed N` | random | Deterministic sampling seed. |
| `--max-tokens N` | server | Cap reply length. |
| `--stop SEQ` | — | Add a stop string (repeatable). |
| `--extra-json '{…}'` | — | Free-form JSON merged into the request body. |
| `--tools LIST` | datetime,plan,web,system_* | Comma list of locally-registered tools. |
| `--sandbox DIR` | — | Enable the unified `fs` tool (action=read/write/list/glob/grep/check_path/cwd/sandbox) scoped to `DIR`. |
| `--allow-bash` | off | Register `bash` (uses `--sandbox` as cwd, else current dir). |
| `--no-python` | python3 on | Drop the auto-registered `python3` tool (default-on whenever `--sandbox` or `--allow-bash` is set). |
| `--use-google` | off | Enable engine=`"google"` inside the unified `web` tool. |
| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |
| `--memory DIR` | — | Enable persistent memory (seven `knowledge_*` tools; alias `--RAG`). |
| `--tools-mode MODE` | `split` | How `fs` / `web` are exposed. Default `split` (since 2026-05-15): one focused tool per action — `read_file`, `edit_file`, …, `search_web`, `fetch_web`. Knowledge tools are always split (seven separate tools). `unified` registers the legacy single dispatcher per `fs`/`web` family with `action=`. `both` registers both surfaces. INI: `[cli] tools_mode`. |
| `--no-plan` | off | Don't auto-register the planning tool. |
| `-p, --prompt TEXT` | (REPL) | One-shot prompt; without it you get a REPL. |
| `--no-reasoning` | shown | Hide `delta.reasoning_content`. |
| `--max-reasoning N` | 0 (off) | Abort SSE when accumulated reasoning > N chars. |
| `--no-retry-on-incomplete` | retry on | Disable auto-retry-with-nudge. |
| `--verbose` | off | Log HTTP+SSE traffic to stderr (stderr only — no file). |
| `-q, --quiet` | off | Disable spinner glyph + ctx-fill gauge. |
| `--log-file PATH` | off | Opt in to a raw transaction log at PATH (mode 0600). Implies `--verbose`. No `/tmp` file is created by default. |
| `--continue` | off | Load `.easyai_session` from cwd before the first prompt. Default OFF (since 2026-05-13): without this flag any existing session file is ignored and overwritten on the first turn. Session is always saved per turn regardless. INI: `[cli] auto_continue`. |
| `--no-continue` | — | Explicit form of the default — ignore any existing `.easyai_session` and overwrite on the first turn. Useful to override `[cli] auto_continue = on` set in INI. |
| `--compress` | off | Ask the model for a lossless recap, replace history with it, save. No-op without `--continue` (nothing in memory to recap). Also `/compress` mid-REPL. INI: `[cli] auto_compress`. |
| `--list-tools` | — | Print local tools (no chat). |
| `--list-remote-tools` | — | `GET /v1/tools` (no chat). |
| `--list-models` | — | `GET /v1/models`. |
| `--health` | — | `GET /health`. |
| `--props` | — | `GET /props`. |
| `--metrics` | — | `GET /metrics` (Prometheus text). |
| `--set-preset NAME` | — | `POST /v1/preset {preset:NAME}`. |
### `easyai-local` — local-engine REPL
Loads a GGUF model in-process (no server). For remote endpoints
use `easyai-cli`.
| Flag | Default | What it does |
|---|---|---|
| `-m, --model PATH` | (required) | GGUF file. |
| `-p, --prompt TEXT` | (REPL) | One-shot: run prompt, print, exit. |
| `-s, --system-file PATH` | — | System prompt from file. |
| `--system TEXT` | — | Inline system prompt. |
| `--preset NAME` | `precise` | Initial preset. See [Sampling presets](#sampling-presets). |
| `--no-think` | off | Strip `<think>…</think>` from output. |
| `-q, --quiet` | off | Disable spinner glyph + ctx-fill gauge. |
| `--temperature F` | per preset | Override temperature. |
| `--top-p F` | per preset | top-p. |
| `--top-k N` | per preset | top-k. |
| `--min-p F` | per preset | min-p. |
| `--repeat-penalty F` | 1.04 | Repetition penalty — anti-loop default; pass 1.0 to disable. |
| `--frequency-penalty F` | 0.05 | Frequency penalty (`[0.0, 2.0]`). |
| `--presence-penalty F` | 0.1 | Presence penalty (`[-2.0, 2.0]`). |
| `--max-tokens N` | 12288 | Cap tokens per turn. |
| `--seed U32` | random | RNG seed. |
| `-c, --ctx N` | 262144 | Context size. |
| `--batch N` | = ctx | Logical batch size. |
| `--ngl N` | 99 | GPU layers. |
| `--split-mode, -sm MODE` | `none` | Multi-GPU split strategy: `none`, `layer`, `row`, `tensor`. |
| `--rope-scaling MODE` | `yarn` | RoPE scaling method: `none`, `linear`, `yarn`. |
| `--rope-scale F` | 2 | RoPE frequency scale factor. |
| `--yarn-orig-ctx N` | 131072 | YaRN original context size for scaling. |
| `-t, --threads N` | hw cores | CPU threads. |
| `--no-tools` | off | Skip the built-in toolbelt. |
| `--sandbox DIR` | — | Enable the unified `fs` tool scoped to `DIR`. |
| `--allow-bash` | off | Register `bash`. |
| `--no-python` | python3 on | Drop the auto-registered `python3` tool. |
| `--external-tools DIR` | — | Load `EASYAI-*.tools` manifests. |
| `--memory DIR` | — | Enable persistent memory (alias `--RAG`). |
| `-ctk, --cache-type-k TYPE` | `f16` | K-cache dtype. |
| `-ctv, --cache-type-v TYPE` | `f16` | V-cache dtype. |
| `-nkvo, --no-kv-offload` | off | Keep KV cache on CPU. |
| `--kv-unified` | off | Single unified KV buffer. |
| `--override-kv K=T:V` | — | GGUF metadata override (repeatable). |
### Example apps (lib API demos)
Three small binaries under `services/` show the lib API in
context. They take minimal flags — the real config happens in
the C++ source as fluent setter chains. Read these as the
canonical "how do I use the lib?" answer.
| Binary | Min flags | Purpose |
|---|---|---|
| `easyai-chat` | `-m PATH` OR `--url BASE`, `[--system TEXT]` | One-shot chat over Engine OR Client (auto-picks). |
| `easyai-agent` | `-m PATH`, `[-c CTX]`, `[-ngl N]` | Tiny agentic-loop demo with tool registration. |
| `easyai-recipes` | `-m PATH` | Five recipes (chat, persona, REPL, tools, agent loop). |
### Library API — `easyai::Agent`
The 30-second front door. Construct, optionally chain a few
fluent setters, call `ask()`. Header:
[`include/easyai/agent.hpp`](include/easyai/agent.hpp).
| Method | Type | Default | What it does |
|---|---|---|---|
| `Agent(model_path)` | ctor | — | Local model. |
| `Agent::remote(base_url, api_key="")` | static | — | Remote endpoint. |
| `.system(prompt)` | `string` | — | System prompt. |
| `.sandbox(dir)` | `string` | — | Enable `*_file` scoped to `dir`. |
| `.allow_bash(on=true)` | `bool` | off | Register `bash`. |
| `.preset(name)` | `string` | `precise` | Sampling profile. |
| `.remote_model(id)` | `string` | — | Remote model id (remote mode only). |
| `.temperature(t) / .top_p(p) / .top_k(k) / .min_p(p)` | scalar | per preset | Sampling overrides. |
| `.on_token(cb)` | `function` | — | Streaming-token callback. |
| `.ask(text)` | call | — | One-shot turn; runs tool dispatch inline. |
| `.reset()` | call | — | Wipe history. |
| `.last_error()` | accessor | — | Diagnostic. |
| `.backend()` | accessor | — | Escape hatch to the underlying `Backend &`. |
### Library API — `easyai::Engine` (local llama.cpp)
Full local engine. Header:
[`include/easyai/engine.hpp`](include/easyai/engine.hpp).
| Method | Type | Default | What it does |
|---|---|---|---|
| `.model(gguf_path)` | `string` | — | GGUF file. |
| `.context(n) / .batch(n)` | `int` | 262144 / = ctx | KV / logical batch size. |
| `.gpu_layers(n)` | `int` | 99 | 99 = all layers offloaded, 0 = CPU only. |
| `.threads(n) / .threads_batch(n)` | `int` | hw / = threads | CPU threads. |
| `.seed(u32)` | `uint32_t` | random | RNG seed. |
| `.system(prompt)` | `string` | — | System prompt. |
| `.temperature(t) / .top_p(p) / .top_k(k) / .min_p(p)` | scalar | 0.2 / 0.92 / 50 / 0.03 | Sampling. |
| `.repeat_penalty(r)` | `float` | 1.04 | Repetition penalty (multiplicative on recent logits) — anti-loop default. Set to 1.0 to disable. |
| `.frequency_penalty(f)` | `float` | 0.05 | Frequency penalty (additive, scales with count, `[0.0, 2.0]`). |
| `.presence_penalty(p)` | `float` | 0.1 | Presence penalty (additive, fixed cost per token-already-seen, OpenAI semantics, range `[-2.0, 2.0]`). Pairs well with `repeat_penalty=1.0` on long agentic flows. See [`design.md` §4b](design.md#4b-sampling-and-the-penalty-stack). |
| `.max_tokens(n)` | `int` | 12288 | Per-turn cap. |
| `.tool_choice_auto / .tool_choice_required / .tool_choice_none` | call | auto | Tool-choice mode. |
| `.parallel_tool_calls(on)` | `bool` | off | Allow parallel tool calls. |
| `.verbose(on)` | `bool` | off | Engine debug logs. |
| `.max_tool_hops(n)` | `int` | 8 | Agentic-loop cap (bumped to 99999 with `bash`). |
| `.retry_on_incomplete(on)` | `bool` | on | Auto-retry "announce-only" turns. |
| `.max_incomplete_retries(n)` | `int` | 10 | Retry budget; 0 disables. |
| `.stop_at_ctx_pct(pct)` | `int` | 100 | Hard ceiling on context fill; 0 disables. |
| `.cache_type_k(name) / .cache_type_v(name)` | `string` | `f16` | KV-cache dtype. |
| `.no_kv_offload(on) / .kv_unified(on)` | `bool` | off | KV placement / layout. |
| `.add_kv_override(spec)` | `string` | — | GGUF metadata override (repeatable). |
| `.flash_attn(on) / .use_mlock(on) / .use_mmap(on)` | `bool` | auto/off/on | Compute / memory. |
| `.numa(strategy)` | `string` | off | `distribute` / `isolate` / `numactl` / `""`. |
| `.split_mode(mode)` | `string` | `none` | Multi-GPU split: `none`, `layer`, `row`, `tensor`. |
| `.rope_scaling(mode)` | `string` | `yarn` | RoPE scaling: `none`, `linear`, `yarn`. |
| `.rope_freq_scale(f)` | `float` | 2 | RoPE frequency scale factor. |
| `.yarn_orig_ctx(n)` | `int` | 131072 | YaRN original context size. |
| `.enable_thinking(on)` | `bool` | on | Chat-template thinking flag. |
| `.add_tool(t) / .clear_tools()` | call | — | Tool registration. |
| `.on_token(cb) / .on_tool(cb) / .on_hop_reset(cb) / .on_incomplete_retry(cb)` | callback | — | Streaming hooks. |
| `.load() / .reset() / .clear_kv()` | call | — | Lifecycle. |
| `.set_sampling(t,p,k,m)` | call | — | Re-sample mid-conversation. |
| `.push_message(role, content, [tool_name, tool_call_id])` | call | — | Append history without generating. |
| `.replace_history(messages)` | call | — | Full-fidelity history replay. |
| `.chat(text) / .chat_continue() / .generate_one() / .generate()` | call | — | Inference primitives. |
| `.request_cancel() / .clear_cancel() / .cancel_requested()` | call | — | Thread-safe cancel. |
| `.last_error() / .last_was_ctx_full() / .turns() / .tools() / .backend_summary() / .n_ctx() / .model_path() / .perf_data() / .perf_reset()` | accessor | — | Introspection. |
### Library API — `easyai::Client` (remote OpenAI-compat)
Remote counterpart of `Engine`. Tools execute LOCALLY in the
consumer process. Header:
[`include/easyai/client.hpp`](include/easyai/client.hpp).
| Method | Type | Default | What it does |
|---|---|---|---|
| `.endpoint(url)` | `string` | — | `http(s)://host[:port]`. |
| `.api_key(key)` | `string` | — | Bearer token. |
| `.timeout_seconds(s)` | `int` | 86400 (24h) | Connect+read timeout — sized for multi-hour agentic sessions. |
| `.http_retries(n)` | `int` | 5 | Extra attempts on transient HTTP failures (pre-stream only — never retries mid-stream). 0 disables. Each retry logs to stderr. |
| `.verbose(v)` | `bool` | off | Log SSE lines to stderr. |
| `.log_file(fp)` | `FILE*` | — | Tee every HTTP transaction. |
| `.max_reasoning_chars(n)` | `int` | 0 (off) | Abort SSE when reasoning > N chars. |
| `.retry_on_incomplete(v)` | `bool` | on | Auto-retry "announce-only" turns. |
| `.stop_at_ctx_pct(pct)` | `int` | 100 | Bail when server-reported `ctx_used/n_ctx` exceeds. |
| `.max_tool_hops(n)` | `int` | 8 | Agentic-loop cap. |
| `.tls_insecure(v) / .ca_cert_path(path)` | `bool` / `string` | off / system | HTTPS-only TLS knobs. |
| `.model(id)` | `string` | — | Request body `model` field. |
| `.system(prompt)` | `string` | — | System prompt(s). |
| `.temperature(t) / .top_p(v) / .top_k(v) / .min_p(v)` | scalar | server | Sampling. |
| `.repeat_penalty(v)` | float | 1.04 | Repetition penalty — anti-loop default; `1.0` disables. |
| `.frequency_penalty(v) / .presence_penalty(v)` | float | server | OpenAI-shape penalties. |
| `.seed(s)` | `long long` | -1 | -1 = randomise. |
| `.max_tokens(n)` | `int` | server | Cap. |
| `.stop(sequences)` | `vector` | — | Stop strings. |
| `.extra_body_json(raw)` | `string` | — | Free-form JSON merged into request body. |
| `.add_tool(t) / .clear_tools() / .tools()` | call | — | Tool registration. |
| `.on_token(cb) / .on_reason(cb) / .on_tool(cb)` | callback | — | Streaming hooks. |
| `.chat(text) / .chat_continue() / .clear_history()` | call | — | Inference + history. |
| `.list_models / .list_remote_tools / .health / .metrics / .props / .set_preset` | call | — | Direct endpoint helpers. |
| `.request_cancel() / .clear_cancel() / .cancel_requested()` | call | — | Thread-safe cancel. |
| `.last_error() / .last_turn_was_incomplete() / .last_ctx_used() / .last_n_ctx() / .last_ctx_pct() / .last_was_ctx_full()` | accessor | — | Introspection. |
### Library API — `easyai::cli::Toolbelt`
Canonical agent toolset, fluently configured. Replaces the
"copy the same `if (sandbox.empty()) … else …` block five times"
pattern. Header: [`include/easyai/cli.hpp`](include/easyai/cli.hpp).
| Method | Default | What it does |
|---|---|---|
| `.sandbox(dir)` | `""` | Root for the unified `fs` tool (empty = no fs tool). |
| `.allow_fs(on)` | on | Register the unified `fs` tool (off in server unless `--allow-fs`). |
| `.allow_bash(on)` | off | Register `bash` (also bumps `max_tool_hops` to 99999). |
| `.with_plan(plan)` | — | Register the planning tool backed by a `Plan&`. |
| `.no_web(on)` | off | Drop the unified `web` tool. |
| `.no_datetime(on)` | off | Drop `datetime`. |
| `.use_google(on)` | off | Enable engine=`"google"` inside `web` (env vars required at apply-time). |
| `.tools()` | — | Materialise `vector`. |
| `.apply(engine) / .apply(client)` | — | Register on the consumer + bump hops if bash. |
### Sampling — what each knob does
At every step the model emits a probability distribution over the whole
vocabulary (~100k+ tokens). These knobs decide how a token is picked
from it. They work in sequence: the *cutters* (`top_k`, `top_p`,
`min_p`) narrow the candidate pool over the raw distribution, then
`temperature` controls how randomly the final token is drawn from the
survivors.
* **`temperature`** — the focus-vs-risk dial; divides the logits before
softmax. `→ 0` is greedy (always the top token: deterministic, can
repeat). `0.2–0.5` keeps the model tight on format, syntax, and
facts. `1.0` is the model's unmodified distribution. `> 1.0` flattens
the curve so unlikely tokens get a real chance — more varied and
creative, but more prone to error and incoherence. This is the main
*behaviour* dial.
* **`top_k`** — a *fixed* cut of the tail: keep only the K
most-probable tokens, discard the rest. Non-adaptive — it always cuts
at K whether the model is certain or unsure. A cheap guardrail
against ever picking junk from the long tail.
* **`top_p`** (nucleus) — an *adaptive* cut: keep the smallest set of
top tokens whose probabilities sum to P. Adapts to confidence — when
the model is sure (one token at 0.9) the nucleus is tiny; when it's
unsure (mass spread wide) the nucleus is large. Cuts the tail
proportionally.
* **`min_p`** — also adaptive, but anchored to the *top* token instead
of cumulative mass: keep tokens with `prob ≥ min_p × prob_of_top`.
`min_p 0.1` keeps anything within 10× of the best; `min_p 0.5` keeps
only what's within 2× — aggressive, very focused output.
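
The four knobs above can be written down as small functions over a probability list. This is a minimal sketch, not the sampler easyai or llama.cpp actually runs: the function names are hypothetical, the cutters assume probabilities already sorted in descending order, and each returns how many leading candidates survive.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Softmax of logits / temperature. Assumes non-empty logits, temperature > 0.
std::vector<float> softmax_with_temperature(const std::vector<float>& logits,
                                            float temperature) {
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (float& p : probs) p /= sum;
    return probs;
}

// Fixed cut: keep at most k candidates.
std::size_t keep_top_k(const std::vector<float>& probs, std::size_t k) {
    return std::min(k, probs.size());
}

// Adaptive cut: smallest prefix whose cumulative mass reaches p.
std::size_t keep_top_p(const std::vector<float>& probs, float p) {
    float cumulative = 0.0f;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumulative += probs[i];
        if (cumulative >= p) return i + 1;
    }
    return probs.size();
}

// Adaptive cut anchored to the top token: keep prob >= min_p * prob_of_top.
std::size_t keep_min_p(const std::vector<float>& probs, float min_p) {
    if (probs.empty()) return 0;
    const float cutoff = min_p * probs.front();
    std::size_t keep = 0;
    while (keep < probs.size() && probs[keep] >= cutoff) ++keep;
    return keep;
}
```

For the distribution `{0.5, 0.25, 0.125, 0.0625, 0.0625}`, `keep_top_p(probs, 0.75f)` and `keep_min_p(probs, 0.5f)` both keep the first two candidates, while `keep_min_p(probs, 0.1f)` keeps all five.
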
**How they interact.** They stack. Tightening all of them at once (low
`top_k` + low `top_p` + low `temperature`) is redundant — they do the
same job and you over-constrain into robotic output. Practical rule:
pick *one* adaptive cutter (`top_p ~0.9–0.95` **or** `min_p ~0.05–0.1`),
leave `top_k` generous as a cheap backstop, and use `temperature` as
the real behaviour dial.
**How to tune.**
* *Code, agentic / tool-calling, structured output, factual Q&A* — low
`temperature` (0.2–0.6) and a tight tail cut. High temperature on
code means syntax errors, hallucinated APIs, broken tool calls.
* *Creative writing, brainstorming* — higher `temperature` (0.8–1.2),
looser cutters.
* *Heavily quantised models* — be more conservative (lower
`temperature`, tighter cut). Quantisation already adds noise to the
logits; high temperature amplifies that noise into real errors.
The presets below are just curated combinations of these four knobs —
e.g. `precise` (the project default) encodes `temp 0.2, top_p 0.92,
top_k 50, min_p 0.03`.
### Sampling presets
Named profiles applied via `--preset NAME` (binaries) or
`Engine::set_sampling()` / `easyai::find_preset()` (lib). Numbers are
baselines; `