{"id":48872843,"url":"https://github.com/valraw-all/ntk","last_synced_at":"2026-04-15T23:00:47.584Z","repository":{"id":350714768,"uuid":"1207941518","full_name":"VALRAW-ALL/ntk","owner":"VALRAW-ALL","description":"Neural Token Killer — semantic compression proxy for Claude Code (Rust)","archived":false,"fork":false,"pushed_at":"2026-04-11T17:40:01.000Z","size":222,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-11T19:16:03.440Z","etag":null,"topics":["claude","claude-code","cli","compression","developer-tools","llm","neural-token-killer","ntk","ollama","phi3","rust","tokenizer"],"latest_commit_sha":null,"homepage":"https://ntk.valraw.com","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/VALRAW-ALL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":"NOTICE","maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-11T15:59:02.000Z","updated_at":"2026-04-11T17:40:09.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/VALRAW-ALL/ntk","commit_stats":null,"previous_names":["valraw-all/ntk"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/VALRAW-ALL/ntk","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VALRAW-ALL%2Fntk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VALRAW-ALL%2Fntk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VALRAW-ALL%2Fntk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VALRAW-ALL%2Fntk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/VALRAW-ALL","download_url":"https://codeload.github.com/VALRAW-ALL/ntk/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/VALRAW-ALL%2Fntk/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31863499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-15T15:24:51.572Z","status":"ssl_error","status_checked_at":"2026-04-15T15:24:39.138Z","response_time":63,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude","claude-code","cli","compression","developer-tools","llm","neural-token-killer","ntk","ollama","phi3","rust","tokenizer"],"created_at":"2026-04-15T23:00:21.026Z","updated_at":"2026-04-15T23:00:47.571Z","avatar_url":"https://github.com/VALRAW-ALL.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NTK - Neural Token Killer\n\n\u003e **v0.2.28** — Semantic compression proxy for Claude Code. Reduces tool output token count by 60–90% before it reaches the model context - without losing the information that matters.\n\n---\n\n## Table of Contents\n\n- [What it does](#what-it-does)\n- [How it works](#how-it-works)\n- [Requirements](#requirements)\n- [Installation](#installation)\n- [Usage](#usage)\n- [Configuration](#configuration)\n- [GPU Acceleration](#gpu-acceleration)\n- [RTK + NTK Coexistence](#rtk--ntk-coexistence)\n- [Development](#development)\n- [Privacy Policy](#privacy-policy)\n- [License](#license)\n- [Third-Party Licenses](#third-party-licenses)\n\n---\n\n## What it does\n\nEvery time Claude Code runs a Bash command, the output is fed back into the model context. Long outputs from `cargo test`, `tsc`, Docker logs, or `git diff` can consume hundreds or thousands of tokens - slowing down responses and eating through context budgets.\n\nNTK intercepts those outputs via the `PostToolUse` hook, compresses them semantically, and returns a compact version that preserves all errors, warnings, and actionable information. Claude sees less noise, responds faster, and stays focused longer.\n\n**Typical savings:**\n\n| Output type | Example | Savings |\n|---|---|---|\n| `cargo test` (many passing) | 47 ok + 1 failure | ~85% |\n| `tsc` errors | 16 errors in 7 files | ~5% (already dense) |\n| Docker logs | Repeated warnings | ~70% |\n| Generic command output | Mixed | ~60% |\n\n---\n\n## How it works\n\nNTK runs as a local daemon (`127.0.0.1:8765`) and processes output through three layers:\n\n```\nBash tool output\n  → PostToolUse hook (ntk-hook.sh / ntk-hook.ps1)\n  → HTTP POST /compress  (:8765)\n    → Layer 1: Fast Filter       (\u003c1ms)   - ANSI removal, line deduplication, blank-line collapse\n    → Layer 2: Tokenizer-Aware   (\u003c5ms)   - BPE path shortening, prefix consolidation (cl100k_base)\n    → Layer 3: Local Inference   (opt.)   - Ollama/Phi-3 Mini; only activates above token threshold\n  → Compressed output → Claude Code context\n```\n\n**Layer 3 activates only when** token count after L1+L2 exceeds `inference_threshold_tokens` (default: 300). Small outputs like `git status` pass through at sub-millisecond latency.\n\nIf the daemon is unreachable, the hook falls back gracefully to the original output - NTK never blocks a command.\n\n---\n\n## Requirements\n\n| Requirement | Version |\n|---|---|\n| Rust | 1.75+ (2021 edition) |\n| Cargo | bundled with Rust |\n| OS | Windows 10+, macOS 12+, Linux (glibc 2.31+) |\n\n**Optional (for Layer 3 inference):**\n\n| Requirement | Notes |\n|---|---|\n| [Ollama](https://ollama.com) | Recommended. Manages model download and GPU offloading. |\n| NVIDIA GPU (CUDA) | RTX series recommended; tested on RTX 3060+. Detected via `nvidia-smi`. |\n| AMD GPU | Detected via `rocm-smi`, Windows driver registry, or Linux sysfs — covers Polaris/Vega/RDNA even without ROCm. Inference uses `llama-server` with Vulkan. |\n| Apple Silicon (Metal) | M1 and later |\n\n---\n\n## Installation\n\n### Option 1 - From source (recommended while in pre-release)\n\n```bash\n# Clone and build\ngit clone https://github.com/VALRAW-ALL/ntk\ncd ntk\ncargo build --release\n\n# Install binary to PATH\ncargo install --path .\n\n# Register the PostToolUse hook in Claude Code\nntk init -g\n\n# Configure the Layer 3 backend (separate step — see below)\nntk model setup\n```\n\n### Option 2 - Shell installer (Unix)\n\n```bash\ncurl -fsSL https://ntk.valraw.com/install.sh | bash\n```\n\nThe installer enumerates every discrete GPU on the machine (NVIDIA and\nAMD, any number and any vendor mix) and asks which release variant to\ndownload: **NVIDIA** (`-gpu` CUDA build), **AMD** (`-cpu` build + guidance\nto set up a Vulkan llama-server), or **CPU only** (`-cpu` build). When\npiped non-interactively, the choice is made automatically from detection\nand can be overridden with `NTK_INSTALL_PLATFORM=nvidia|amd|cpu`.\n\n### Option 3 - PowerShell installer (Windows)\n\n```powershell\nirm https://ntk.valraw.com/install.ps1 | iex\n```\n\nSame logic as the Unix installer. Override with\n`$env:NTK_INSTALL_PLATFORM = 'nvidia' | 'amd' | 'cpu'` for unattended runs.\n\n### What `ntk init -g` does\n\n1. Copies the hook script to `~/.ntk/bin/` (`ntk-hook.sh` on Unix, `ntk-hook.ps1` on Windows)\n2. Patches `~/.claude/settings.json` to register the `PostToolUse` hook (idempotent - safe to run multiple times)\n3. Creates `~/.ntk/config.json` with sensible defaults\n\nThat's it. `ntk init` configures NTK itself — nothing more. Model backend\nchoice, Ollama / llama-server installation, and GPU selection all live\nunder `ntk model setup` (see [Model management](#model-management-layer-3))\nand can be re-run at any time.\n\n```json\n// ~/.claude/settings.json  (added by ntk init -g)\n{\n  \"hooks\": {\n    \"PostToolUse\": [{\n      \"matcher\": \"Bash\",\n      \"hooks\": [{ \"type\": \"command\", \"command\": \"~/.ntk/bin/ntk-hook.sh\" }]\n    }]\n  }\n}\n```\n\n**For OpenCode instead of Claude Code:**\n\n```bash\nntk init -g --opencode\n```\n\n**Verify the installation:**\n\n```bash\nntk init --show\n```\n\n**Remove the hook:**\n\n```bash\nntk init --uninstall\n```\n\n---\n\n## Usage\n\n### Daemon lifecycle\n\n```bash\nntk start           # Start daemon on 127.0.0.1:8765  (opens live TUI dashboard)\n                    # If daemon is already running: attaches to the live TUI dashboard\nntk start --gpu     # Start with GPU inference enabled\nntk stop            # Stop the daemon\nntk status          # Show daemon status, loaded model, GPU backend\nntk dashboard       # Combined status + session gain + ASCII bar chart (plain text, non-interactive)\n```\n\n**Live dashboard** - `ntk start` opens a full-screen TUI that updates every 500 ms. If the daemon is already running in the background, `ntk start` detects it and **attaches** to the live TUI without restarting the daemon. Press **Ctrl+C** to exit the TUI - the daemon keeps running:\n\n```\n┌─────────────────────────────────────────────────────────────┐\n│ ██╗  ██╗████████╗██╗  ██╗                                   │\n│ ████╗ ██║╚══██╔══╝██║ ██╔╝   Neural Token Killer            │\n│ ██╔██╗██║   ██║   █████╔╝    v0.2  •  127.0.0.1:8765       │\n│ ██║╚████║   ██║   ██╔═██╗    Uptime: 3m 12s                 │\n│ ██║ ╚███║   ██║   ██║  ██╗   Backend: candle [GPU] phi3:mini│\n│ ╚═╝  ╚══╝   ╚═╝   ╚═╝  ╚═╝                                 │\n├─────────────────── SESSION METRICS ─────────────────────────┤\n│  Compressions: 47     Tokens In: 84,291  →  Out: 12,048     │\n│  Saved: 72,243 tokens  •  Avg ratio: 85%                    │\n│                                                             │\n│  L1  ████████████████░░░░  38 runs                          │\n│  L2  ██████░░░░░░░░░░░░░░   7 runs                          │\n│  L3  ██░░░░░░░░░░░░░░░░░░   2 runs                          │\n├─────────────────── RECENT COMMANDS ─────────────────────────┤\n│  10:14:22  cargo test              1,842  →  312    L2  83% saved │\n│  10:14:08  git diff HEAD~1           940  →  188    L2  80% saved │\n│  10:13:51  docker logs api         3,200  →  412    L2  87% saved │\n└─────────────────────────────────────────────────────────────┘\n```\n\nPress **Ctrl+C** in the **attached** TUI to exit the dashboard without stopping the daemon. Press **Ctrl+C** when you started the daemon (first `ntk start`) to stop it gracefully. When stdout is not a TTY (piped or CI), `ntk start` falls back to a single status line.\n\n**Static dashboard** - `ntk dashboard` prints a combined snapshot to stdout and exits immediately (no event loop, always safe to use in scripts or CI):\n\n```\n● NTK daemon  running  127.0.0.1:8765  up 3m 22s  candle [GPU] phi3:mini q5_k_m\n  14382 tokens saved across 47 compressions (78% avg ratio)\n\n┌─ NTK · Token Savings ──────────────────────────────────────────────────────┐\n│                                                                              │\n│  cargo     ████████████████████████████████████████  41823 tok  58%         │\n│  git       ████████████████████                      21204 tok  29%         │\n│  docker    ████████                                   9101 tok  13%         │\n│                                                                              │\n│  47 compressions · 72128 tokens saved · 78% avg                             │\n└──────────────────────────────────────────────────────────────────────────────┘\n```\n\n### Metrics and history\n\n```bash\nntk gain            # Token savings summary (RTK-compatible format)\nntk metrics         # Per-command savings table (requires daemon running)\nntk graph           # ASCII bar chart of savings over time\nntk history         # Last 50 compressed commands with token counts\nntk discover        # Scan latest Claude transcript for missed compression opportunities\n```\n\n**Example `ntk gain` output:**\n\n```\nNTK: 14382 tokens saved across 47 compressions (78% avg)\n```\n\n**Example `ntk history` output:**\n\n```\nCOMMAND                 TYPE      BEFORE     AFTER  RATIO  LAYER  TIME\n--------------------------------------------------------------------------------\ncargo test              test        1842       312   83%   L2     2026-04-11 10:00\ngit diff HEAD~1         diff         940       188   80%   L2     2026-04-11 10:01\ndocker logs api         log         3200       412   87%   L2     2026-04-11 10:02\n```\n\n### Testing compression\n\n```bash\n# Test compression on any file without the daemon\nntk test-compress tests/fixtures/cargo_test_output.txt\n```\n\nOutput:\n\n```\nFile:             tests/fixtures/cargo_test_output.txt\nOriginal tokens:  512\nL1 lines removed: 46\nAfter L2 tokens:  76\nCompression:      85.2%\n\n--- Compressed output ---\n...\n```\n\n### Terminal output\n\nAll NTK commands emit colored, animated output when connected to a TTY. Colors are disabled automatically when:\n\n- stdout is redirected to a file or pipe\n- The `NO_COLOR` environment variable is set (respects the [no-color.org](https://no-color.org) convention)\n\nCommands that perform inference show a real-time progress animation:\n\n```\n⠹ Running inference …           12.3s  [4821 chars]\n```\n\n`ntk model bench` shows per-payload progress with elapsed time updating every 250ms while inference runs, followed by a colored results table where compression ratio and latency are color-coded (green → yellow → red by severity).\n\n### Model management (Layer 3)\n\n```bash\n# Interactive backend + hardware setup wizard.\n# Run this after `ntk init` (or anytime you want to change backend / GPU).\n# Detects Ollama, every NVIDIA / AMD GPU on the system, and installs\n# Ollama on demand when you pick that backend.\nntk model setup\n\n# Download the default model via Ollama\nntk model pull\n\n# Download a specific quantization\nntk model pull --quant q4_k_m   # faster, less RAM\nntk model pull --quant q6_k     # better quality, more RAM\n\n# Test inference latency and output quality\nntk model test\n\n# Test with verbose debug output:\n#   hardware config, thread counts, mlock status, system prompt preview,\n#   timing breakdown, and performance analysis with CPU-tier-aware targets\n#   (mobile/low-power ≥5 tok/s, desktop ≥10, high-end ≥15, GPU ≥40)\nntk model test --debug\n\n# Benchmark CPU vs GPU\nntk model bench\n\n# List available models in the configured backend\nntk model list\n```\n\n### Layer testing and benchmarks\n\n```bash\n# Run correctness tests on all compression layers (no daemon required)\nntk test\n\n# Include Layer 3 inference in the test run\nntk test --l3\n\n# Benchmark all compression layers (default: 5 runs per payload)\nntk bench\n\n# More runs for stable measurements\nntk bench --runs 20\n\n# Include Layer 3 in benchmark\nntk bench --l3\n```\n\n### Configuration\n\n```bash\n# Show active config\nntk config\n\n# Show config from a specific file\nntk config --file /path/to/.ntk.json\n```\n\n---\n\n## Configuration\n\nNTK merges configuration from two sources, in order:\n\n1. `~/.ntk/config.json` - global defaults\n2. `.ntk.json` in the current project directory - per-project overrides\n\n**Full reference (`~/.ntk/config.json`):**\n\n```json\n{\n  \"daemon\": {\n    \"port\": 8765,\n    \"host\": \"127.0.0.1\",\n    \"auto_start\": true,\n    \"log_level\": \"warn\"\n  },\n  \"compression\": {\n    \"enabled\": true,\n    \"layer1_enabled\": true,\n    \"layer2_enabled\": true,\n    \"layer3_enabled\": true,\n    \"inference_threshold_tokens\": 300,\n    \"context_aware\": true,\n    \"max_output_tokens\": 500,\n    \"preserve_first_stacktrace\": true,\n    \"preserve_error_counts\": true\n  },\n  \"model\": {\n    \"provider\": \"ollama\",\n    \"model_name\": \"phi3:mini\",\n    \"quantization\": \"q5_k_m\",\n    \"ollama_url\": \"http://localhost:11434\",\n    \"timeout_ms\": 300000,\n    \"fallback_to_layer1_on_timeout\": true,\n    \"gpu_layers\": -1,\n    \"gpu_auto_detect\": true,\n    \"gpu_vendor\": null,\n    \"cuda_device\": 0\n  },\n  \"metrics\": {\n    \"enabled\": true,\n    \"storage_path\": \"~/.ntk/metrics.db\",\n    \"history_days\": 30\n  },\n  \"exclusions\": {\n    \"commands\": [\"cat\", \"echo\", \"printf\"],\n    \"max_input_chars\": 500000\n  },\n  \"telemetry\": {\n    \"enabled\": true\n  }\n}\n```\n\n**Key settings:**\n\n| Setting | Default | Description |\n|---|---|---|\n| `compression.inference_threshold_tokens` | `300` | Layer 3 only activates above this token count |\n| `compression.context_aware` | `true` | Layer 4 — when the hook forwards `transcript_path`, NTK extracts the user's most recent request and prepends it to the L3 prompt so the summary focuses on relevant info. Disable for pre-v0.2.27 behaviour. |\n| `model.timeout_ms` | `300000` (5 min) | Upper bound on a single `/compress` call. L3 inference on CPU can take 60-180 s on large inputs. The daemon falls back to L1+L2 after this window. Lower to 60 000 for GPU setups. |\n| `model.fallback_to_layer1_on_timeout` | `true` | Use L1+L2 output if Ollama is slow or unavailable |\n| `model.gpu_layers` | `-1` | `-1` = all layers on GPU; `0` = CPU only |\n| `model.gpu_vendor` | `null` | `\"nvidia\"` / `\"amd\"` / `\"apple\"` — the card the user picked in `ntk model setup`. `null` = auto-detect. Runtime honours this verbatim instead of silently preferring NVIDIA on multi-vendor systems. |\n| `model.cuda_device` | `0` | Zero-based device index **within** the chosen vendor (e.g. the first AMD card is `0` in the AMD namespace, independent of how many NVIDIAs are present). |\n| `exclusions.commands` | `[\"cat\",\"echo\"]` | Commands whose output is never compressed |\n| `exclusions.max_input_chars` | `500000` | Hard limit on input size before processing |\n\n**Per-project override example (`.ntk.json` in project root):**\n\n```json\n{\n  \"compression\": {\n    \"inference_threshold_tokens\": 100\n  },\n  \"exclusions\": {\n    \"commands\": [\"make\", \"just\"]\n  }\n}\n```\n\n---\n\n## GPU Acceleration\n\nNTK enumerates every discrete GPU on the host — multiple cards, multiple\nvendors, mixed setups are all supported — and picks the best CPU fallback\nwhen no GPU is available.\n\n**NVIDIA detection** — `nvidia-smi` (every CUDA device, with accurate VRAM).\n\n**AMD detection** — tries, in order:\n\n1. `rocm-smi` (ROCm-supported cards only)\n2. **Windows**: the display-class driver registry (`VEN_1002`), reading\n   `HardwareInformation.qwMemorySize` for accurate 64-bit VRAM. This is\n   what lets Polaris / Vega cards (e.g. **RX 570 / 580 / Vega 56**) show\n   up on Windows even though they are not supported by ROCm.\n3. **Linux**: `/sys/class/drm/card*/device/vendor == 0x1002`, with VRAM\n   from `mem_info_vram_total` and the product name resolved via\n   `lspci -nn -d 1002:\u003cdevice\u003e`.\n\n**Apple Silicon** — Metal is enabled at compile time on\n`aarch64-apple-darwin`.\n\n**CPU fallbacks** — Intel AMX → AVX-512 → AVX2 → scalar.\n\n### Multi-GPU selection\n\n`ntk model setup` lists every detected GPU as its own numbered option\n(plus a CPU-only option). When more than one GPU is present, the user\npicks explicitly — the chosen **vendor** is saved to\n`config.model.gpu_vendor` and the per-vendor **device index** to\n`config.model.cuda_device`.\n\n```\n  GPU / Compute Selection\n  ────────────────────────────────────\n  Detected: 2 discrete GPUs\n\n  [1]  CPU  AVX2                      ✓ always available\n  [2]  NVIDIA GeForce RTX 3060        ✓ 12288 MB VRAM\n  [3]  AMD Radeon RX 580 2048SP       ✓ 8192 MB VRAM\n\n  Choose [1-3] or Enter for [2]:\n```\n\n**No hidden vendor preference.** On a machine with both an NVIDIA and an\nAMD card, picking AMD in the wizard actually routes inference to the AMD\ncard. The daemon passes `HIP_VISIBLE_DEVICES` / `ROCR_VISIBLE_DEVICES` /\n`GGML_VK_VISIBLE_DEVICES` to the llama-server subprocess when AMD is\nselected, and `CUDA_VISIBLE_DEVICES` when NVIDIA is selected. If the\nconfigured vendor is unavailable at runtime (GPU removed / driver\nfailure), NTK falls back to CPU and warns — it never silently switches to\na different vendor.\n\n\u003e **About the `(device 0)` label.** Each vendor numbers its own devices\n\u003e starting at 0, independently. So `NVIDIA GT 730 (device 0)` and\n\u003e `AMD RX 580 (device 0)` are different hardware — the disambiguation\n\u003e comes from `gpu_vendor`, not from the numeric index.\n\n`ntk status` reports the **configured** backend (respecting `gpu_vendor`),\nnot the \"best detected\" one.\n\n**Performance expectations - Phi-3 Mini 3.8B Q5_K_M (Layer 3 latency p95):**\n\n| Backend | Hardware example | p50 | p95 |\n|---|---|---|---|\n| CUDA | RTX 3060 | ~50ms | ~80ms |\n| CUDA | RTX 5060 Ti | ~30ms | ~50ms |\n| AMD ROCm | RX 6800 XT | ~80ms | ~130ms |\n| Metal | M2 MacBook Pro | ~80ms | ~150ms |\n| Intel AMX | Xeon 4th Gen | ~150ms | ~250ms |\n| AVX2 CPU | i7-12700 | ~300ms | ~500ms |\n| AVX2 CPU | i5-8250U | ~600ms | ~900ms |\n\nLayer 3 is skipped entirely for outputs below the threshold (default 300 tokens), so most small commands like `git add` or `ls` add zero latency.\n\n**Build options:**\n\n```bash\n# Default build — Candle CPU + Ollama / llama-server external.\n# Works on any machine, including AMD GPUs (inference routes through\n# llama-server built with Vulkan, configured via `ntk model setup`).\ncargo build --release\n\n# CUDA (NVIDIA) — enables in-process GPU offloading via Candle\ncargo build --release --features cuda\n\n# Metal (Apple Silicon)\ncargo build --release --features metal\n```\n\n**Or let the wrapper pick the right flag automatically:**\n\n```bash\n# Linux / macOS\n./scripts/build.sh\n\n# Windows (PowerShell)\n.\\scripts\\build.ps1\n```\n\nThe wrapper detects the host GPU + toolchain and adds the correct feature\n(or none), so `./scripts/build.sh` does the right thing on an NVIDIA\nworkstation, an M-series Mac, an AMD box, and a bare CPU server alike.\n\n### Release binary variants\n\nThe `release.yml` workflow publishes one binary per platform × scenario,\nwith a `-cpu` or `-gpu` suffix. All 8 artifacts are built and released\non every version bump:\n\n| Artifact | Contents | CI runner |\n|---|---|---|\n| `ntk-linux-x86_64-cpu`        | CPU-only, Candle disabled. | ubuntu-latest |\n| `ntk-linux-x86_64-gpu`        | Candle + CUDA (sm_80+). Requires NVIDIA driver ≥ 520 at runtime. | nvidia/cuda:12.5.1-devel-ubuntu22.04 container |\n| `ntk-linux-aarch64-cpu`       | CPU-only. | ubuntu-latest + taiki-e cross-toolchain |\n| `ntk-darwin-x86_64-cpu`       | CPU-only (Intel Macs). | macos-latest |\n| `ntk-darwin-aarch64-cpu`      | CPU-only (Apple Silicon). | macos-latest |\n| `ntk-darwin-aarch64-gpu`      | Candle + Metal (Apple Silicon). | macos-latest |\n| `ntk-windows-x86_64-cpu.exe`  | CPU-only. | windows-latest |\n| `ntk-windows-x86_64-gpu.exe`  | Candle + CUDA (sm_80+). Requires NVIDIA driver ≥ 520 at runtime. | windows-latest + Jimver CUDA 12.5 |\n\nThe shell / PowerShell installers pick the right artifact automatically\nbased on the user's platform choice (NVIDIA / AMD / CPU). There is no\ndedicated AMD `-gpu` binary because Candle has no AMD backend — AMD users\nget the `-cpu` binary and point NTK at an external `llama-server`\ncompiled with Vulkan (step-by-step in the installer's post-install hint\nand in the [AMD GPUs](#amd-gpus) section below).\n\n\u003e **Compute capability:** the `-gpu` binaries target `sm_80` (Ampere and\n\u003e newer: RTX 30xx, RTX 40xx, A100, H100). They run on any NVIDIA GPU with\n\u003e compute capability ≥ 8.0. For older GPUs (Pascal sm_60, Turing sm_75,\n\u003e etc.) build from source with `CUDA_COMPUTE_CAP=\u003ccap\u003e cargo build\n\u003e --release --features cuda`.\n\n### Prerequisites for GPU features\n\nCargo **does not** install GPU SDKs for you — feature flags only toggle\nwhich bindings get compiled, and the SDK has to be present at build time.\n\n| Feature flag | Required on the build machine |\n|---|---|\n| *(none, default)* | Just Rust stable. Nothing GPU-specific. |\n| `cuda` | **CUDA Toolkit 12.x** with `nvcc` on `PATH` **and** the following libs: `cudart`, `cublas`, `cublas_dev`, `curand`, `curand_dev`, `nvrtc`, `nvrtc_dev`. Install: `winget install Nvidia.CUDA` (Windows) or the NVIDIA network installer for your distro (Linux). |\n| `metal` | **macOS on Apple Silicon (aarch64)**. Metal ships with Xcode Command Line Tools — `xcode-select --install`. Intel Macs may compile but are not supported at runtime. |\n\n**CUDA build troubleshooting:**\n\n| Error | Fix |\n|---|---|\n| `Failed to execute nvcc` | Install CUDA Toolkit, reopen shell |\n| `Cannot find compiler 'cl.exe'` (Windows) | Open **Developer Command Prompt** or activate MSVC env first |\n| `LNK1181: cannot open input file 'nvrtc.lib'` (Windows) | Re-install CUDA with `nvrtc` + `nvrtc_dev` components |\n| `Cannot open input file 'libcuda.so'` (Linux headless) | `export LIBRARY_PATH=/usr/local/cuda/lib64/stubs RUSTFLAGS=\"-L /usr/local/cuda/lib64/stubs\"` |\n| `nvidia-smi` fails at build time | `export CUDA_COMPUTE_CAP=80` (or your GPU's sm number) |\n\n### AMD GPUs\n\nThere is no `--features amd` / `--features rocm` / `--features vulkan` —\nCandle has no AMD backend in the currently pinned version. For AMD GPU\nacceleration on NTK:\n\n1. Build NTK with the default flags (`cargo build --release`).\n2. `ntk model setup` → choose **llama.cpp** backend. NTK auto-downloads\n   the latest **Vulkan build** of `llama-server` — it works on Polaris\n   (RX 580), Vega, RDNA, and RDNA2+ without ROCm or any SDK. Runtime\n   automatically scopes `HIP_VISIBLE_DEVICES` / `GGML_VK_VISIBLE_DEVICES`\n   to your selected card.\n3. Inference runs on the AMD GPU through `llama-server`; the NTK\n   daemon talks to it over HTTP at `localhost:8766`.\n\n**If the installed `llama-server` is CPU-only** (e.g. downloaded\nmanually from an AVX2 release), `ntk model setup` detects the missing\nGPU DLLs and hides the NVIDIA / AMD GPU options in the wizard — only\nCPU is offered. Replace the binary with a Vulkan build and re-run the\nwizard to enable GPU selection.\n\nThe `ntk start --gpu` and `ntk model setup` commands detect AMD cards\n(Polaris / Vega / RDNA) via the Windows driver registry or Linux sysfs,\nso your GPU shows up in the selection list even without ROCm.\n\n### Ollama backend vs llama.cpp backend\n\n| Feature | Ollama | llama.cpp |\n|---|---|---|\n| NVIDIA CUDA ≥ 5.0 (Maxwell+) | ✅ | ✅ CUDA build |\n| Apple Silicon | ✅ Metal | ✅ Metal build |\n| AMD RDNA2+ on Linux | ✅ via ROCm | ✅ Vulkan / HIP |\n| **AMD Polaris (RX 580, RX 5xx)** | ❌ ROCm dropped support | ✅ **Vulkan** |\n| **NVIDIA Kepler (GT 7xx)** | ❌ compute \u003c 5.0 | ✅ Vulkan |\n| Model management | `ollama pull/list/rm` | Manual GGUF download |\n| Setup complexity | 1 command | auto-download via wizard |\n| L3 latency (CPU) | ~150-300 ms overhead | ~50-100 ms (socket local) |\n\n**TL;DR:** for NVIDIA Turing+ / Apple Silicon → Ollama is simpler. For\nolder NVIDIA Kepler or any AMD Polaris → llama.cpp + Vulkan is the\nonly path to GPU acceleration.\n\n---\n\n## RTK + NTK Coexistence\n\nNTK is designed to work alongside [RTK (Rust Token Killer)](https://github.com/VALRAW-ALL/rtk):\n\n- **RTK** runs first, inside the shell command via `rtk \u003ccmd\u003e`. It applies rule-based filtering (regex) synchronously.\n- **NTK** runs after, via the `PostToolUse` hook. It applies semantic compression on RTK's already-filtered output.\n\nThis double-pass often yields better results than either tool alone:\n\n```\nRaw output: 1842 tokens\nAfter RTK:   420 tokens   (rule-based: removed ANSI, grouped repeats)\nAfter NTK:   132 tokens   (semantic: summarized remaining noise)\nCombined:    ~93% savings\n```\n\nNTK's Layer 1 detects RTK-pre-filtered output (shorter input, no ANSI codes, already contains `[×N]` groupings) and skips redundant processing. Layer 3's threshold often won't trigger on already-filtered output, keeping latency near zero.\n\n```bash\n# Both tools active simultaneously - this is the recommended setup\nrtk cargo test\n# RTK filters in the shell → NTK further compresses via hook\n```\n\n---\n\n## Development\n\n### Build\n\n```bash\ncargo build           # debug\ncargo build --release # release\ncargo check           # fast compile check\n```\n\n### Test\n\n```bash\n# All tests\ncargo test\n\n# Individual test suites\ncargo test --test layer1_tests\ncargo test --test layer2_tests\ncargo test --test compression_pipeline_tests\ncargo test --test snapshot_tests\ncargo test --test quality_regression_tests\n\n# Property-based tests (slow - runs ~256 cases per property)\ncargo test --test compression_invariants\n\n# Reduce proptest cases for a faster run\nPROPTEST_CASES=32 cargo test --test compression_invariants\n```\n\n### Review snapshot changes\n\nAfter modifying the compression logic, snapshots will fail if the output changes. Review and approve the diffs:\n\n```bash\ncargo test --test snapshot_tests   # shows diffs for changed snapshots\ncargo insta review                 # interactively approve or reject each change\n```\n\nTo force-update all snapshots (e.g. after an intentional algorithm improvement):\n\n```bash\nINSTA_UPDATE=always cargo test --test snapshot_tests\n```\n\n### Benchmarks\n\n```bash\ncargo bench                      # run all benchmarks, generate HTML report\ncargo bench layer1_1kb           # single benchmark\n\n# Open the HTML report\nopen target/criterion/report/index.html   # macOS\nxdg-open target/criterion/report/index.html  # Linux\n```\n\n**Current baseline (debug build, i7-12700):**\n\n| Benchmark | Measured |\n|---|---|\n| `layer1_1kb` | ~19 µs |\n| `layer1_100kb` | \u003c 2 ms |\n| `layer2_tokenizer` (1kb) | \u003c 5 ms |\n| Full pipeline L1+L2 (1kb) | \u003c 10 ms |\n\n### Token-savings benchmark (microbench + macrobench)\n\nThe `bench/` directory contains a full test harness for measuring how\nmany tokens NTK actually saves. See `docs/testing-plan.md` (English)\nor `docs/plano-de-testes.md` (PT-BR) for the full planning doc.\n\n**Quick start:**\n\n```powershell\n# 1. Generate the 8 deterministic fixtures (one-off)\npwsh bench/generate_fixtures.ps1\n\n# 2. Start the daemon with compression logging ON\n$env:NTK_LOG_COMPRESSIONS = \"1\"\nntk start\n\n# 3. Replay every fixture against /compress and write microbench.csv\npwsh bench/replay.ps1\n\n# 4. Generate the markdown report (optionally with A/B transcripts)\npwsh bench/report.ps1\n#    Or with before/after Claude Code session transcripts:\npwsh bench/parse_transcript.ps1 `\n  -Transcript ~/.claude/projects/\u003cproj\u003e/\u003csession-A\u003e.jsonl\npwsh bench/parse_transcript.ps1 `\n  -Transcript ~/.claude/projects/\u003cproj\u003e/\u003csession-B\u003e.jsonl\npwsh bench/report.ps1 `\n  -A ~/.claude/projects/\u003cproj\u003e/\u003csession-A\u003e.csv `\n  -B ~/.claude/projects/\u003cproj\u003e/\u003csession-B\u003e.csv\n```\n\nUnix/macOS users can substitute `pwsh bench/run_all.ps1` with\n`bash bench/run_all.sh` — the orchestrator plus `replay.sh` are\nportable; `parse_transcript.ps1` and `report.ps1` still require\nPowerShell (available on Unix via `pwsh`).\n\n**Outputs:**\n\n- `bench/microbench.csv` — one row per fixture with per-layer token\n  counts, latency and compression ratio.\n- `~/.ntk/logs/YYYY-MM-DD/*.json` — when `NTK_LOG_COMPRESSIONS=1` is\n  set, every compression writes a JSON file with the raw input, each\n  layer's intermediate output, and the final output. Useful for\n  auditing what NTK sent to Claude.\n- `bench/report.md` — rendered markdown with per-fixture table, A/B\n  session delta, and estimated USD cost (Sonnet 4.6 rates editable\n  via script flags).\n\n**Baseline prompt for macrobench:** `bench/prompts/baseline.md`. It\nruns 7 deterministic Bash commands in the NTK repo plus one summary\nturn — paste verbatim into Claude Code for the A (hook off) and\nB (hook on) runs. The PowerShell orchestrator `bench/ab_session.ps1`\nautomates the variant management (install / uninstall hook, wait for\neach session, copy transcripts, generate report).\n\n**Multi-language coverage** (12 fixtures). Measured ratios with L3\nskipped (CPU timeout), so the numbers below come purely from L1+L2\ndeterministic compression:\n\n| Category | Fixture | Ratio |\n|---|---|---:|\n| repetitive logs | `docker_logs_repetitive` | **92%** |\n| Node trace | `node_express_trace` | **83%** |\n| cargo test | `cargo_test_failures` | **68%** |\n| Python trace | `python_django_trace` | **62%** |\n| Java trace | `stack_trace_java` | **60%** |\n| Go trace | `go_panic_trace` | **56%** |\n| PHP trace | `php_symfony_trace` | 33% |\n| unstructured log | `generic_long_log` | 14% |\n| TS errors | `tsc_errors_node_modules` | 10% |\n| git diff | `git_diff_large` | 9% |\n\nRun `bench/run_all.ps1` (or `.sh`) to reproduce. See `docs/testing-plan.md`\nfor the methodology.\n\n### Layer 4 — Context Injection\n\nWhen the hook forwards the Claude Code `transcript_path` (v0.2.27+),\nthe daemon reads the most recent user message and prepends it to the\nL3 prompt so the summary focuses on information relevant to the user's\nactual goal. Four prompt formats are supported:\n\n| Format | Shape |\n|---|---|\n| `Prefix` (default) | `CONTEXT: The user's most recent request was: \"...\"\\n\\n\u003coutput\u003e` |\n| `XmlWrap` | `\u003cuser_intent\u003e...\u003c/user_intent\u003e\\n\\n\u003coutput\u003e` |\n| `Goal` | `User goal: ... — extract only info that advances this goal.\\n\\n\u003coutput\u003e` |\n| `Json` | `{\"user_intent\": \"...\"}\\n\\n\u003coutput\u003e` |\n\nOverride at runtime for experiments:\n```bash\nNTK_L4_FORMAT=xml ntk start\n```\nA/B among formats:\n```powershell\npwsh bench/prompt_formats.ps1\n```\nDisable entirely by setting `compression.context_aware = false` in\n`~/.ntk/config.json`.\n\n### Linting and security gate\n\n```bash\n# Clippy with security lints (required to pass before committing)\ncargo clippy -- \\\n  -W clippy::unwrap_used \\\n  -W clippy::expect_used \\\n  -W clippy::panic \\\n  -W clippy::arithmetic_side_effects \\\n  -D warnings\n\n# Dependency vulnerability audit\ncargo audit\n\n# Format check\ncargo fmt --check\n```\n\n### Project structure\n\n```\nsrc/\n  main.rs                  - CLI (clap) + daemon entry point\n  server.rs                - HTTP routes: /compress, /metrics, /records, /health, /state\n  config.rs                - Config deserialization + merge + validation\n  detector.rs              - Output type detection (test/build/log/diff/generic)\n  metrics.rs               - In-memory store + SQLite persistence (sqlx)\n  gpu.rs                   - GPU backend detection hierarchy\n  installer.rs             - ntk init: idempotent hook + config install\n  telemetry.rs             - Anonymous daily telemetry (opt-out)\n  compressor/\n    layer1_filter.rs       - ANSI strip, dedup, blank-line collapse\n    layer2_tokenizer.rs    - tiktoken-rs BPE, path shortening\n    layer3_backend.rs      - BackendKind abstraction (Ollama / Candle / LlamaCpp)\n    layer3_inference.rs    - Ollama HTTP client + fallback\n    layer3_candle.rs       - In-process inference via HuggingFace Candle (CUDA/Metal/CPU)\n    layer3_llamacpp.rs     - llama.cpp server client with auto-start\n  output/\n    terminal.rs            - ANSI colors, TTY detection, Spinner + BenchSpinner\n    table.rs               - Metrics tables for stdout\n    graph.rs               - ASCII bar charts + sparklines (stdout, non-interactive)\n    dashboard.rs           - ratatui TUI: live + attach-mode dashboard (polls /state endpoint)\n\nscripts/\n  ntk-hook.sh              - PostToolUse hook (Unix/macOS)\n  ntk-hook.ps1             - PostToolUse hook (Windows PowerShell)\n  install.sh               - One-line installer (Unix)\n  install.ps1              - One-line installer (Windows)\n\ntests/\n  unit/                    - Layer 1, Layer 2, detector unit tests\n  integration/             - Pipeline, endpoint, CLI, Ollama mock, quality, snapshot tests\n  proptest/                - Compression invariants (proptest)\n  benchmarks/              - criterion.rs benchmarks\n  fixtures/                - Real captured outputs (cargo, tsc, vitest, docker, next.js)\n```\n\n---\n\n## Privacy Policy\n\nNTK collects **anonymous, aggregated** usage metrics. No code, file contents, command arguments, paths, or personally identifiable information is ever collected.\n\n### What is collected (once per day, opt-in by default)\n\n| Field | Description |\n|---|---|\n| `device_hash` | SHA-256(random_salt + machine_id) - not reversible to any personal identifier |\n| `ntk_version` | Installed NTK version |\n| `os` | Operating system name (`linux`, `macos`, `windows`) |\n| `arch` | CPU architecture (`x86_64`, `aarch64`) |\n| `compressions_24h` | Number of compressions in the last 24 hours |\n| `top_commands` | Most-used command **names only** (e.g. `[\"cargo\", \"git\"]`) - no arguments, no paths |\n| `avg_savings_pct` | Average token savings percentage |\n| `layer_pct` | Layer distribution: how often L1, L2, L3 produced the final output |\n| `gpu_backend` | Backend used (e.g. `cuda`, `cpu`) |\n\n### What is NOT collected\n\n- Source code or file contents\n- Command arguments or flags (e.g. `cargo test --test foo` → only `cargo` is stored)\n- File paths, directory names, or project names\n- Environment variables or secrets\n- IP addresses or network information (telemetry endpoint receives only the JSON payload)\n- Any information from the compressed or uncompressed tool outputs\n\n### How the device hash works\n\nA random UUID (salt) is generated once and stored locally in `~/.ntk/.telemetry_salt` with mode `600` (readable only by the file owner on Unix). The salt is combined with a non-personal machine identifier and hashed with SHA-256. The salt is **never sent** - only the hash is. The hash cannot be reversed to identify the machine or the user.\n\n### Opt-out\n\nTelemetry can be disabled in two ways:\n\n```bash\n# Environment variable - add to ~/.bashrc or ~/.zshrc for permanent opt-out\nexport NTK_TELEMETRY_DISABLED=1\n```\n\n```json\n// ~/.ntk/config.json\n{\n  \"telemetry\": { \"enabled\": false }\n}\n```\n\nWhen telemetry is disabled, **no network requests are made** and no payload is constructed.\n\n---\n\n## License\n\nCopyright 2026 Alessandro Mota\n\nLicensed under the **Apache License, Version 2.0** (the \"License\"). You may not use this software except in compliance with the License.\n\nYou may obtain a copy of the License at:\n\n```\nhttp://www.apache.org/licenses/LICENSE-2.0\n```\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.\n\nA copy of the full license text is available in the [`LICENSE`](LICENSE) file.\n\n---\n\n## Third-Party Licenses\n\nNTK depends on the following open-source libraries. All are compatible with the Apache-2.0 license.\n\n### Runtime dependencies\n\n| Crate | Version | License | Purpose |\n|---|---|---|---|\n| [axum](https://github.com/tokio-rs/axum) | 0.7 | MIT | HTTP daemon framework |\n| [tokio](https://github.com/tokio-rs/tokio) | 1 | MIT | Async runtime |\n| [serde](https://github.com/serde-rs/serde) | 1 | MIT / Apache-2.0 | Serialization |\n| [serde_json](https://github.com/serde-rs/json) | 1 | MIT / Apache-2.0 | JSON handling |\n| [anyhow](https://github.com/dtolnay/anyhow) | 1 | MIT / Apache-2.0 | Error handling |\n| [thiserror](https://github.com/dtolnay/thiserror) | 1 | MIT / Apache-2.0 | Error types |\n| [tiktoken-rs](https://github.com/zurawiki/tiktoken-rs) | 0.5 | MIT | BPE tokenizer (cl100k_base) |\n| [strip-ansi-escapes](https://github.com/luser/strip-ansi-escapes) | 0.2 | Apache-2.0 | ANSI code removal |\n| [sqlx](https://github.com/launchbadge/sqlx) | 0.7 | MIT / Apache-2.0 | Async SQLite persistence |\n| [libsqlite3-sys](https://github.com/rusqlite/rusqlite) | 0.27 | MIT | SQLite bundled build |\n| [dirs](https://github.com/dirs-dev/dirs-rs) | 5 | MIT / Apache-2.0 | Platform-specific paths |\n| [reqwest](https://github.com/seanmonstar/reqwest) | 0.11 | MIT / Apache-2.0 | HTTP client (Ollama, telemetry) |\n| [clap](https://github.com/clap-rs/clap) | 4 | MIT / Apache-2.0 | CLI argument parsing |\n| [tracing](https://github.com/tokio-rs/tracing) | 0.1 | MIT | Structured logging |\n| [tracing-subscriber](https://github.com/tokio-rs/tracing) | 0.3 | MIT | Log output formatting |\n| [ratatui](https://github.com/ratatui-org/ratatui) | 0.28 | MIT | ASCII charts (stdout only) |\n| [sha2](https://github.com/RustCrypto/hashes) | 0.10 | MIT / Apache-2.0 | SHA-256 for telemetry hash |\n| [uuid](https://github.com/uuid-rs/uuid) | 1 | MIT / Apache-2.0 | Random salt generation |\n| [url](https://github.com/servo/rust-url) | 2 | MIT / Apache-2.0 | URL validation (Ollama config) |\n| [chrono](https://github.com/chronotope/chrono) | 0.4 | MIT / Apache-2.0 | Timestamps in metrics |\n| [nix](https://github.com/nix-rust/nix) *(Unix)* | 0.27 | MIT | SIGTERM for `ntk stop` |\n| [windows-sys](https://github.com/microsoft/windows-rs) *(Windows)* | 0.52 | MIT / Apache-2.0 | TerminateProcess for `ntk stop` |\n\n### Development / test dependencies\n\n| Crate | Version | License | Purpose |\n|---|---|---|---|\n| [tempfile](https://github.com/Stebalien/tempfile) | 3 | MIT / Apache-2.0 | Temporary files in tests |\n| [wiremock](https://github.com/LukeMathWalker/wiremock-rs) | 0.6 | MIT | Mock Ollama HTTP server |\n| [axum-test](https://github.com/JosephLenton/axum-test) | 14 | MIT | Integration test HTTP server |\n| [proptest](https://github.com/proptest-rs/proptest) | 1 | MIT / Apache-2.0 | Property-based tests |\n| [insta](https://github.com/mitsuhiko/insta) | 1 | Apache-2.0 | Snapshot testing |\n| [assert_cmd](https://github.com/assert-rs/assert_cmd) | 2 | MIT / Apache-2.0 | CLI binary tests |\n| [criterion](https://github.com/bheisler/criterion.rs) | 0.5 | MIT / Apache-2.0 | Statistical benchmarks |\n| [tokio-test](https://github.com/tokio-rs/tokio) | 0.4 | MIT | Async test utilities |\n| [predicates](https://github.com/assert-rs/predicates-rs) | 3 | MIT / Apache-2.0 | Test assertions |\n\nTo audit the full dependency tree and their licenses, run:\n\n```bash\ncargo install cargo-license\ncargo license\n```\n\nTo check for known vulnerabilities in any dependency:\n\n```bash\ncargo install cargo-audit\ncargo audit\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvalraw-all%2Fntk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fvalraw-all%2Fntk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fvalraw-all%2Fntk/lists"}