https://github.com/arthurdev44/numen
Terminal AI coding agent in native Rust — Claude Code quality, open to all frontier models (Anthropic, OpenAI, Gemini…). Headless event-only core shared with Paneflow, Ratatui frontend.
https://github.com/arthurdev44/numen
agentic ai-agent claude-code cli coding-agent llm ratatui rust tui
Last synced: 2 days ago
JSON representation
Terminal AI coding agent in native Rust — Claude Code quality, open to all frontier models (Anthropic, OpenAI, Gemini…). Headless event-only core shared with Paneflow, Ratatui frontend.
- Host: GitHub
- URL: https://github.com/arthurdev44/numen
- Owner: ArthurDEV44
- License: gpl-3.0
- Created: 2026-06-15T13:02:55.000Z (13 days ago)
- Default Branch: main
- Last Pushed: 2026-06-16T22:15:08.000Z (12 days ago)
- Last Synced: 2026-06-17T00:12:15.038Z (12 days ago)
- Topics: agentic, ai-agent, claude-code, cli, coding-agent, llm, ratatui, rust, tui
- Language: Rust
- Size: 394 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Roadmap: docs/ROADMAP.md
Awesome Lists containing this project
README
# Numen
**A native, model-agnostic AI coding agent that lives in your terminal.** `numen` opens straight in your shell, streams a model, runs real tools (read, grep, edit, bash), and loops until the work is done, all from a single Rust binary with no Node runtime underneath.
Numen is built around a **headless core** (`agent-core`) that emits only structured events, never ANSI. The terminal UI is just one client. That same core is meant to be embedded in-process by [Paneflow](https://paneflow.dev) (Zed's GPUI) for GPU-accelerated diffs and plan trees later, without forking any agent logic. The provider layer is multi-provider by design (a clean `Provider` trait + an Anthropic-shaped canonical format), so new model backends drop in as isolated adapters.
> *Numen* (Latin): the animating will, the guiding presence. It keeps the soul of *daimon*, a guide-spirit for your codebase.
Status ·
Quickstart ·
Features ·
How it works ·
Usage ·
Roadmap ·
FAQ
The welcome screen: a native terminal agent on your ChatGPT subscription (gpt-5.5 via codex), monochrome, no window.
```console
$ numen
numen · gpt-5.5 · ~/dev/myproject
› refactor the auth module to drop unwrap() in prod paths
⠋ read crates/auth/src/token.rs
⠙ edit crates/auth/src/token.rs (3 hunks)
✓ bash cargo clippy --no-deps · 0 warnings
Replaced 4 unwrap() with ? / ok_or(...). Diff above.
# headless, scriptable, no TUI
$ numen -p "summarize the changes in the last commit"
```
## Status
**Early, single-provider, Linux-first. No packaged releases yet, you build from source.** This is an honest snapshot, not a roadmap fantasy:
- **It runs today.** The agentic loop, the tool suite, the sandbox, JSONL sessions with resume, MCP, and the monochrome TUI all work.
- **One model channel ships so far: your ChatGPT subscription.** GPT / Codex models served through the Codex backend (Responses API, SSE, stateless). The provider layer is written multi-provider from day one, but this is the only adapter wired up. Anthropic, OpenAI BYOK, Gemini, and others are architecture-ready, not yet built. See [Roadmap](#roadmap).
- **Linux is the supported platform.** The filesystem sandbox uses Landlock (Linux kernel) and credentials use the Secret Service keyring. macOS Seatbelt and broader support come later; off-Linux, FS confinement degrades explicitly.
- **The subscription auth is unofficial and revocable.** It reuses the open-source Codex CLI OAuth client, which is ToS-grey and could be cut off at any time (it happened to Anthropic Pro/Max for third-party tools in 2026). Treat it as a convenience, not a foundation. See [Authentication](#authentication).
## Quickstart
You need a Rust toolchain (1.95+) and a Linux desktop session with a Secret Service keyring (GNOME Keyring or KWallet) for credential storage.
```bash
# 1. Build from source
git clone https://github.com/ArthurDEV44/numen.git
cd numen
cargo build --release # produces target/release/numen
# 2. Authenticate with your ChatGPT subscription (OAuth, opens a browser)
cargo run -p agent-auth --example login
# 3. Run it in any project directory
cd ~/dev/myproject
/path/to/numen/target/release/numen
```
Drop `target/release/numen` on your `PATH` (or `cargo install --path crates/agent-cli`) to call `numen` from anywhere.
## Where it fits
Numen overlaps with every terminal coding agent, but the design center is specific: **a native, model-agnostic agent whose core is built to be embedded in a GPU terminal workspace.**
| Tool | Strength | Numen's angle |
|---|---|---|
| Claude Code | Polished Anthropic-native agent, large ecosystem | Model-agnostic core by design; one native Rust binary, no Node runtime |
| Codex CLI | OpenAI-native, strong sandboxing | Reuses the ChatGPT-subscription channel, but ships a headless core meant to embed in Paneflow |
| aider / opencode | Mature multi-model OSS agents | Rust-native, kernel-level FS sandbox (Landlock), shared core with a GPU terminal workspace |
| Paneflow | Runs CLI agents in parallel GPU panes | Numen is the agent; Paneflow is the surface. The plan is to embed `agent-core` in-process, no IPC |
The honest caveat: most rows describe a *direction*. Today Numen ships one provider and one frontend. The bet is in the architecture (headless core + provider trait), not in a checklist.
## Features
Shipped today:
- **Agentic loop**: stream → tool → loop, driven by a compiler-checked transition state machine. The model calls tools, sees results, and continues until it ends the turn.
- **Built-in tool suite**: `read`, `glob`, `grep`, `write`, `edit`, `bash`. Concurrency-safe reads run in parallel; mutations run serially.
- **Fail-closed permissions + taint**: every tool is dangerous until proven otherwise. Tool output is untrusted by default (OWASP LLM01); a destructive or network action in a turn that just read untrusted content is forced to ask, regardless of mode.
- **Execution sandbox**: Landlock confines all writes to the workspace (the agent *and* its `bash` subprocesses); a fail-closed local proxy gates outbound network from tool subprocesses via `--allow `.
- **Persistent sessions**: one append-only JSONL file per conversation under `.numen/sessions/`, with `/resume` to reopen a past session and a workspace-wide prompt history.
- **`/goal` completion loop**: set a session objective and Numen re-runs the agentic loop until it emits a done marker (capped at 25 iterations), persisted in a `.numen/goal` sidecar so it survives restarts.
- **MCP (stdio)**: connect to Model Context Protocol servers from `.mcp.json` or your existing `~/.claude.json`, managed from the `/mcp` submenu.
- **Markdown rendering**: assistant replies are rendered to styled spans (CommonMark + GFM).
- **Monochrome Ratatui UI**: clean, modern, Rauch/Vercel-flavored, with a braille Dyson-sphere logo and a welcome screen, not a double-bordered retro TUI.
- **Headless mode**: `numen -p "..."` runs without the TUI, so the core is scriptable and testable without a terminal or a live API.
- **ChatGPT subscription auth**: OAuth PKCE, refresh-token rotation, credentials in the OS keyring (never in plaintext).
Architecture-ready, not yet built:
- Additional provider adapters (Anthropic, OpenAI BYOK, Gemini) behind the same `Provider` trait
- Paneflow in-process embedding with GPU diffs, plan trees, and hunk review
- Vector memory (`sqlite-vec`), sub-agents, prompt-caching strategy, VCR provider tests
## How it works
Numen is a Cargo workspace. The crates are named `agent-*` internally; the published binary and command are `numen` (see [`docs/DECISIONS.md`](docs/DECISIONS.md), ADR-8).
```
┌──────────────────────────────┐
│ agent-core │
│ loop + state machine + │
│ canonical types (headless) │
│ emits: Stream │
└──────────────┬───────────────┘
│ structured events (never ANSI)
┌──────────────────────┼──────────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌──────────────────┐
│ agent-tui │ │ mode -p │ │ Paneflow (GPUI) │
│ Ratatui client │ │ headless print │ │ embed in-process │
│ (terminal) │ │ (text / JSON) │ │ GPU render (next)│
└─────────────────┘ └─────────────────┘ └──────────────────┘
```
The founding invariant: **`agent-core` depends on neither the TUI nor the provider layer** (only on `agent-tokenizer`, which is also headless). I/O is injected through traits, so the loop is testable without network, terminal, or a real model. The provider layer normalizes heterogeneous wire formats into one Anthropic-shaped canonical format, with divergences localized per adapter.
| Crate | Role |
|---|---|
| `agent-core` | Agent loop, transition state machine, canonical message/transcript types (headless) |
| `agent-provider` | `Provider` trait + adapters (reqwest + SSE), canonical `StreamEvent`, error taxonomy |
| `agent-tools` | Tool registry, fail-closed trait, concurrent/serial dispatch, permissions, taint |
| `agent-mcp` | `rmcp`-based MCP client (stdio), config loading, server registry |
| `agent-tui` | Ratatui + crossterm frontend, decoupled from the core via channels |
| `agent-session` | Append-only JSONL persistence, resume, compaction boundaries |
| `agent-sandbox` | Landlock FS confinement + local network allow-list proxy |
| `agent-auth` | Credential storage (keyring), OAuth PKCE, token refresh |
| `agent-tokenizer` | Local token counting (fallback when a provider omits stream usage) |
| `agent-cli` | The `numen` binary, the only crate that wires everything together |
Full detail in [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).
## Authentication
Numen authenticates with your **ChatGPT subscription** (Plus / Pro), not a metered API key. The flow:
```bash
cargo run -p agent-auth --example login
```
This runs OAuth PKCE against `auth.openai.com`, then talks to the ChatGPT backend's Responses API (`chatgpt.com/backend-api/codex/responses`) in stateless SSE mode, so the full transcript is sent each turn and compaction / resume / replay stay intact. Tokens are stored in the OS keyring and refreshed automatically.
**Read this before you rely on it.** The login reuses the OAuth client of the open-source Codex CLI, which effectively impersonates Codex. That is ToS-grey and **revocable**: OpenAI could disable this client at any time, exactly as Anthropic blocked third-party tools from using Pro/Max subscriptions in 2026. Numen treats the subscription as a disposable convenience layer, not a foundation. The day it breaks, adding a BYOK adapter (Chat Completions, Anthropic, ...) is an isolated module, not a rewrite. The model-agnostic architecture is the insurance policy. See [`docs/DECISIONS.md`](docs/DECISIONS.md) (ADR-10, ADR-11) and [`docs/openai-subscription-auth.md`](docs/openai-subscription-auth.md).
## Build from source
### Rust toolchain
Numen uses Rust **1.95+** (edition 2024). Install via [rustup](https://rustup.rs/):
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
```
### Linux system dependencies
Credential storage uses the Secret Service API, so you need a keyring daemon (GNOME Keyring or KWallet) and its dev headers. TLS uses rustls, so OpenSSL is not required.
**Fedora:**
```bash
sudo dnf install gcc pkgconf-pkg-config libsecret-devel
```
**Debian / Ubuntu:**
```bash
sudo apt install build-essential pkg-config libsecret-1-dev
```
### Build and test
```bash
cargo build --release # binary at target/release/numen
cargo test --workspace # headless core is fully testable without a live API
```
The filesystem sandbox needs a Linux kernel with Landlock (5.13+, ABI improvements through 6.x). Without it, Numen warns and runs unconfined rather than failing closed.
## Usage
```bash
numen # interactive TUI in the current directory
numen -p "" # headless one-shot, prints the answer and exits
numen "" # bare prompt is treated as -p
```
### Flags
| Flag | Effect |
|------|--------|
| `--model ` | Pick the model (default `gpt-5.5`). Also switchable in-session via `/models`. |
| `--allow ` | Add a host to the network allow-list for tool subprocesses (repeatable). |
| `--yes`, `-y` | Auto-approve tool actions (headless / trusted automation). |
| `--no-sandbox` | Disable Landlock FS confinement (writes are no longer confined to the workspace). |
### Slash commands (interactive)
| Command | Action |
|---------|--------|
| `/help` | List available commands |
| `/models` | Switch the active model for this session |
| `/goal` | Set an objective and work until it is reached |
| `/skills` | Insert a skill from `~/.agents/skills` into the message |
| `/providers` | Inspect / configure the auth provider |
| `/mcp` | Manage MCP server connections |
| `/resume` | Reopen a past conversation |
| `/new`, `/clear` | Start fresh (clear the context) |
| `/quit` | Exit |
### Models
The Codex backend accepts a whitelist of versioned slugs it rotates. The generic `gpt-5` slug is rejected; use a versioned one:
`gpt-5.5` (default) · `gpt-5.4` · `gpt-5.4-mini` · `gpt-5.3-codex-spark`
## Files and configuration
Numen reads and writes a few well-known paths:
| Path | Purpose |
|------|---------|
| `/.numen/sessions/*.jsonl` | One append-only transcript per conversation; backs `/resume` |
| `/.numen/goal` | Persistent `/goal` objective (survives restarts) |
| `/.mcp.json` | Workspace MCP servers (highest priority) |
| `~/.claude.json` | Reused `mcpServers` from Claude Code, merged underneath |
| `~/.agents/skills//` | Skills surfaced by `/skills` (one directory per skill) |
These paths outside the workspace (skills, MCP config, keyring) are read **before** the Landlock sandbox is applied, since they would be unreachable afterward.
## Sandbox and security
- **Filesystem**: Landlock confines every write to the workspace, kernel-level, inherited by `bash` subprocesses (the FS sandbox is applied on the main thread before the Tokio runtime is built, so workers inherit it).
- **Network**: a local CONNECT proxy with a fail-closed allow-list. Tool subprocesses get `HTTP(S)_PROXY` pointing at it (`--allow` opens specific hosts). The agent's own provider calls go direct.
- **Prompt injection (OWASP LLM01)**: tool output is tainted as untrusted and the taint propagates; a destructive or network action following recent taint is forced to ask.
- **Fail-closed tools**: a tool that declares nothing is treated as non-concurrent, mutating, and untrusted.
- **Credentials**: OAuth tokens live in the OS keyring, never in plaintext on disk.
## Roadmap
The MVP target was deliberately narrow: **make Numen excellent with one model channel (the ChatGPT subscription) and dogfood it daily**, rather than ship six empty provider columns. The multi-provider architecture is the invariant; adapters land incrementally.
- **Now (shipped)**: agentic loop, tool suite + sandbox, sessions + resume, `/goal`, MCP stdio, monochrome TUI, ChatGPT subscription provider.
- **Next**: more provider adapters behind the existing `Provider` trait, MCP tools wired into the model loop, Paneflow in-process embedding (GPU diffs, plan trees, hunk review).
- **Later**: vector memory (`sqlite-vec`), sub-agents, macOS Seatbelt hardening, VCR provider tests in CI.
Phases and the de-risking spikes live in [`docs/ROADMAP.md`](docs/ROADMAP.md).
## Documentation
| Document | Contents |
|---|---|
| [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) | Crate workspace, agent loop, tool pipeline, compaction, sessions, sandbox |
| [`docs/PROVIDERS.md`](docs/PROVIDERS.md) | Provider trait, canonical format, per-provider divergences, retry and caching |
| [`docs/DECISIONS.md`](docs/DECISIONS.md) | Architecture decision records (language, frontend, naming, providers, risks) |
| [`docs/ROADMAP.md`](docs/ROADMAP.md) | Phases, de-risking spikes, sequencing |
| [`docs/openai-subscription-auth.md`](docs/openai-subscription-auth.md) | ChatGPT subscription auth mechanics |
## FAQ
**Why Rust and not TypeScript like Claude Code?**
The whole point is sharing a core with Paneflow, which is GPUI (Rust). A Rust core can be embedded in-process with shared types and no IPC. A TS core hits an FFI wall. The accepted cost is slower solo dev velocity; the mitigation is a tight scope and decoupled, testable crates.
**Why does it only work with a ChatGPT subscription today?**
Sequencing, not abandonment. The author orchestrates agents all day and wanted *his* model working perfectly first, then to add other providers over time. The `Provider` trait keeps the door open; the canonical format and retry/auth layers already exist.
**Is this allowed? Will my account get banned?**
The subscription login reuses the open-source Codex CLI's OAuth client, which is unofficial and ToS-grey. It works for personal use and is revocable. If that risk is unacceptable to you, wait for the BYOK adapters.
**Can I use my Anthropic Max subscription?**
No. Anthropic blocks third-party tools from authenticating with Pro/Max subscriptions. That is precisely why Numen is model-agnostic by design and does not depend on any single subscription channel.
**Is this a Claude Code wrapper?**
No. It is an independent Rust implementation inspired by Claude Code's internal architecture (headless loop, transcript-before-response, compaction cascade, fail-closed tools), not a shell over it.
**Is it ready for daily use?**
It is dogfooded daily by its author on Linux, but it is early: one provider, no releases, Linux-first, and a revocable auth channel. Build from source and expect rough edges.
**Why GPL-3.0?**
Numen is free and open source by design, and copyleft keeps it that way: improvements to the agent stay in the commons, and the shared core cannot be forked into a closed product.
**How does it relate to Paneflow?**
Paneflow runs CLI agents in parallel GPU panes; Numen is one such agent. The deeper plan is for Paneflow to embed `agent-core` in-process and render its events natively (GPU diffs, plan trees). That embedding is future work; the decoupling that makes it possible exists today.
## License
[GPL-3.0-or-later](LICENSE). Numen is free and open source by design, and copyleft keeps it that way: improvements stay in the commons.