{"id":49531248,"url":"https://github.com/jaytoone/ctx","last_synced_at":"2026-05-08T05:01:24.093Z","repository":{"id":346533506,"uuid":"1190393421","full_name":"jaytoone/CTX","owner":"jaytoone","description":"Trigger-Driven Dynamic Context Loading for Code-Aware LLM Agents","archived":false,"fork":false,"pushed_at":"2026-04-27T06:30:18.000Z","size":2319,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-27T06:33:11.219Z","etag":null,"topics":["bm25","claude-code","claude-code-plugin","context-retrieval","hooks","llm-tools","memory"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jaytoone.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-24T08:39:21.000Z","updated_at":"2026-04-27T06:30:11.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/jaytoone/CTX","commit_stats":null,"previous_names":["jaytoone/ctx"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/jaytoone/CTX","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaytoone%2FCTX","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaytoone%2FCTX/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaytoone%2FCTX/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaytoone%2FCTX/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jaytoone","download_url":"https://codeload.github.com/jaytoone/CTX/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jaytoone%2FCTX/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32527139,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-02T01:12:54.858Z","status":"online","status_checked_at":"2026-05-02T02:00:05.923Z","response_time":132,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bm25","claude-code","claude-code-plugin","context-retrieval","hooks","llm-tools","memory"],"created_at":"2026-05-02T08:03:07.767Z","updated_at":"2026-05-02T08:03:10.921Z","avatar_url":"https://github.com/jaytoone.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CTX: Trigger-Driven Dynamic Context Loading for Code-Aware LLM Agents\n\n[![PyPI version](https://img.shields.io/pypi/v/ctx-retriever)](https://pypi.org/project/ctx-retriever/)\n[![PyPI downloads](https://img.shields.io/pypi/dm/ctx-retriever)](https://pypi.org/project/ctx-retriever/)\n[![Python](https://img.shields.io/badge/python-3.9%2B-blue)](https://pypi.org/project/ctx-retriever/)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n[![HuggingFace Demo](https://img.shields.io/badge/HuggingFace-Demo-orange)](https://huggingface.co/spaces/jaytoone/ctx-demo)\n[![Publish to PyPI](https://github.com/jaytoone/CTX/actions/workflows/publish.yml/badge.svg)](https://github.com/jaytoone/CTX/actions/workflows/publish.yml)\n\nCTX classifies developer queries into four trigger types and routes each to a specialized retrieval pipeline. For dependency-sensitive queries, CTX traverses the codebase import graph to resolve transitive relationships that keyword and embedding methods miss. It achieves **1.9x higher Token-Efficiency Score** than BM25 while using only **5.2% of tokens**, and **outperforms BM25 on held-out external codebases** (Flask, FastAPI, Requests — mean R@5 +0.163).\n\n\u003e **Key insight**: code import graphs encode structural dependency information that text-based RAG cannot capture. CTX achieves Recall@5 = 1.0 on implicit dependency queries vs 0.4 for BM25.\n\n## Install\n\n```bash\npip install ctx-retriever\n```\n\nOr from source:\n\n```bash\ngit clone https://github.com/jaytoone/CTX\ncd CTX\npip install -e .\n```\n\n## Quick Start\n\n```python\nfrom ctx_retriever.retrieval.adaptive_trigger import AdaptiveTriggerRetriever\n\n# Point at any codebase directory\nretriever = AdaptiveTriggerRetriever(\"/path/to/your/project\")\n\n# Retrieve relevant files for any natural-language query\nresult = retriever.retrieve(\n    query_id=\"my_query\",\n    query_text=\"how does authentication work?\",\n    k=5\n)\n\nfor filepath in result.retrieved_files:\n    print(filepath, result.scores[filepath])\n```\n\n## Claude Code Hook (Recommended)\n\nCTX runs as a set of Claude Code hooks that inject relevant past decisions, docs, and code into every prompt. Install is one command:\n\n```bash\npip install ctx-retriever\nctx-install                     # register CTX hooks in ~/.claude/settings.json\n```\n\n**That's it.** Restart Claude Code and hooks fire on every prompt.\n\n### What ctx-install does (atomic, backup-first)\n\n1. Verifies the 4 CTX hook files exist at `~/.claude/hooks/` (chat-memory, bm25-memory, memory-keyword-trigger, g2-fallback)\n2. Reads `~/.claude/settings.json`, takes a timestamped backup (`settings.json.bak.\u003cTS\u003e`)\n3. Merges the CTX hook registrations into the existing `hooks` dict **without overwriting your other hooks** (dedupes by command string — safe to re-run)\n4. Atomically writes the new settings.json (temp-file-then-rename — never leaves partial state on disk)\n5. Smoke-tests by firing `bm25-memory.py` once with a dummy prompt and confirming `last-injection.json` gets written\n\n### Other subcommands\n\n```bash\nctx-install --dry-run           # show what would change, touch nothing\nctx-install status              # verify hook file presence + settings.json registration + last fire\nctx-install --uninstall         # remove CTX hook registrations (hook files left in place)\n```\n\n### Manual install (legacy — only needed if `ctx-install` fails)\n\n```bash\n# 1. Copy hook files to ~/.claude/hooks/\n# 2. Register each in ~/.claude/settings.json under the appropriate event key\n```\n\nExample settings block (what ctx-install writes for you):\n\n```json\n{\n  \"hooks\": {\n    \"UserPromptSubmit\": [\n      { \"hooks\": [{ \"type\": \"command\", \"command\": \"python3 $HOME/.claude/hooks/chat-memory.py\" }] },\n      { \"hooks\": [{ \"type\": \"command\", \"command\": \"python3 $HOME/.claude/hooks/bm25-memory.py --rich\" }] },\n      { \"hooks\": [{ \"type\": \"command\", \"command\": \"python3 $HOME/.claude/hooks/memory-keyword-trigger.py\" }] }\n    ],\n    \"PostToolUse\": [\n      { \"matcher\": \"Grep\",\n        \"hooks\": [{ \"type\": \"command\", \"command\": \"python3 $HOME/.claude/hooks/g2-fallback.py\" }] }\n    ]\n  }\n}\n```\n\n**What you get in each prompt:**\n```\n[CTX] Trigger: EXPLICIT_SYMBOL | Query: AuthService | Confidence: 0.70 | Intent: judge from prompt\nCode files (3/847 total):\n• src/auth/service.py [score=1.000]\n• src/auth/middleware.py [score=0.823]\n• tests/test_auth.py [score=0.741]\n(Use the prompt intent to decide how to treat this context.)\n```\n\n## Validate on your own transcripts\n\nBefore installing, you can measure what CTX *would* give you on your own Claude Code transcripts — no install, no signup, no upload:\n\n```bash\npython3 benchmarks/ctx_validate.py --days 7\n```\n\nstdlib-only; reads `~/.claude/projects/*/\u003csession\u003e.jsonl` locally and emits a Wilson-95-CI markdown report:\n\n```\n- Text match rate:   26.9% [23.2%, 31.1%] ±4.0pp  (n=201)\n- Tool-use match:    11.1% [8.6%, 14.2%]  ±2.8pp\n- Union (either):    32.8% [28.7%, 37.1%] ±4.2pp\nPer response-type:\n  prose:       51.2% ±10.3pp  (n=86)\n  tool_heavy:  26.2% ±8.2pp   (n=107)\n  mixed:       25.0% ±26.0pp  (n=8)\n```\n\n**What this measures** — distinctive terms from each user prompt, substring-matched against the assistant's response text AND tool_use parameters (file_path/command/pattern). On turns where CTX's hooks would surface related context, this rate approximates the *ceiling* of plausible utility. It is NOT a direct CTX measurement — install CTX and compare against live `utility_measured` telemetry for the actual delta. Use it to decide \"is this signal worth pursuing?\" before committing to install.\n\nLive dashboard (after install):\n\n![CTX Telemetry Dashboard](iter5-full.png)\n\nThe dashboard visualizes utility in four stacked views — pooled rate with 95% CI, per-block breakdown (g1/g2_docs/g2_prefetch), by response type (prose/mixed/tool_heavy), and by item age (0-7d / 7-30d / 30d+). The knowledge graph below it lights up decisions in coral when Claude actually used them in the last 7 days; dead-weight decisions (no recent references) appear muted — pruning candidates.\n\n## Hook Performance\n\nCTX adds no LLM calls — latency is purely algorithmic (BM25 + BFS indexing):\n\n| Project | Language | Files | Hook Latency |\n|---------|----------|-------|-------------|\n| Small project | Python | ~88 | ~40ms |\n| Medium project | Python | ~215 | ~165ms |\n| Large project | TypeScript | ~651 | ~270ms |\n| Very large | any | \u003e2000 | skipped (auto-excluded) |\n\nThe hook is skipped for prompts \u003c15 chars, slash commands, `[noctx]` tags, and codebases with \u003c3 files.\n\n**Control tags** you can add to any prompt:\n\n| Tag | Effect |\n|-----|--------|\n| `[noctx]` | Disable CTX for this prompt |\n| `[fix]` | Fix/Replace mode — adds anti-anchoring reminder so Claude doesn't copy the existing (potentially wrong) implementation |\n\n`[fix]` is also auto-triggered when the prompt starts with `fix:`, `bug:`, `refactor:`, or `replace:`.\n\n## Trigger Types\n\n| Trigger | When Used | Mechanism |\n|---------|-----------|-----------|\n| `EXPLICIT_SYMBOL` | Query names a class/function | Symbol index lookup |\n| `SEMANTIC_CONCEPT` | Query describes a concept | BM25 keyword scoring |\n| `IMPLICIT_CONTEXT` | Dependency queries (\"what uses X\") | BFS import graph traversal |\n| `TEMPORAL_HISTORY` | Recent changes / history | Session file tracker |\n\n## Results\n\n### Synthetic Benchmark (50 files, 166 queries)\n\n| Strategy | Recall@5 | Token Usage | TES |\n|----------|----------|-------------|-----|\n| Full Context | 0.075 | 100.0% | 0.019 |\n| BM25 | 0.982 | 18.7% | 0.410 |\n| Dense TF-IDF | 0.973 | 21.0% | 0.406 |\n| GraphRAG-lite | 0.523 | 24.0% | 0.218 |\n| LlamaIndex | 0.972 | 20.1% | 0.405 |\n| Chroma Dense | 0.829 | 19.3% | 0.346 |\n| Hybrid Dense+CTX | 0.725 | 23.6% | 0.303 |\n| **CTX (Ours)** | **0.874** | **5.2%** | **0.776** |\n\n**TES** = Recall@5 / ln(1 + files_loaded). Higher = better token efficiency.\n\n### External Codebase Benchmark (Flask, FastAPI, Requests)\n\nCTX outperforms BM25 on all three held-out external codebases in code-to-code structural retrieval:\n\n| Codebase | Files | CTX R@5 | BM25 R@5 | Δ |\n|----------|-------|---------|----------|---|\n| Flask | 79 | **0.545** | 0.347 | **+0.198** |\n| FastAPI | 928 | **0.328** | 0.174 | **+0.154** |\n| Requests | 35 | **0.626** | 0.489 | **+0.137** |\n| **Mean** | — | **0.500** | 0.337 | **+0.163** |\n\n*Bootstrap 95% CI: external mean [0.441, 0.550]*\n\n### COIR External Benchmark (CodeSearchNet Python)\n\n| Strategy | Recall@1 | Recall@5 | MRR |\n|----------|----------|----------|-----|\n| Dense Embedding (MiniLM) | 0.960 | 1.000 | 0.978 |\n| Hybrid Dense+CTX | 0.930 | 0.950 | 0.940 |\n| BM25 | 0.920 | 0.980 | 0.946 |\n| CTX Adaptive Trigger | 0.720 | 0.740 | 0.728 |\n\n### Downstream LLM Evaluation\n\nCTX context injected into developer prompts improves LLM task quality across two models:\n\n| Scenario | WITH CTX | WITHOUT CTX | Δ |\n|----------|----------|-------------|---|\n| G1 (session memory recall) | 1.000 | 0.110 | **+0.890** |\n| G2 (CTX-specific knowledge) | 0.688 | 0.000 | **+0.688** |\n\nG1: CTX persistent memory enables perfect cross-session recall (vs 11% without). G2: CTX context eliminates hallucination on CTX-specific API queries.\n\n### Key Findings\n\n- CTX achieves **1.9x higher TES** than BM25 with only 5.2% token usage\n- CTX achieves **perfect Recall@5 (1.0)** on IMPLICIT_CONTEXT dependency queries\n- CTX **outperforms BM25 on all 3 external codebases** in code-to-code retrieval (mean +0.163 R@5)\n- CTX context improves downstream LLM task quality: **G1 +0.890**, **G2 +0.688**\n- Trigger classifier achieves **100% accuracy** (all 4 types F1=1.00) on synthetic benchmark\n- CTX Adaptive Trigger achieves **R@5=0.740 on COIR** (improved from 0.380 via BM25 hybrid + CamelCase fix)\n- Hybrid Dense+CTX achieves R@5=0.950 on COIR — best of both worlds\n- No single strategy dominates all dimensions — workload determines optimal choice\n\n## When to Use CTX\n\n**CTX excels when:**\n- You need dependency-aware retrieval: `IMPLICIT_CONTEXT` queries (e.g., \"what uses AuthService?\") achieve perfect Recall@5 (1.0) via BFS import graph traversal\n- Working with a **known codebase** with established symbol/import structure — code-to-code retrieval outperforms BM25 on real projects (Flask: +0.198, FastAPI: +0.154, Requests: +0.137)\n- Token budget is critical — CTX uses only **5.2% of tokens** vs 18.7% for BM25 (TES: 1.9x higher)\n- Queries name **explicit symbols** (class names, function names) — EXPLICIT_SYMBOL trigger routes directly to symbol index\n\n**CTX is not designed for:**\n- **Text-to-code semantic search** (COIR-style): finding code from natural-language descriptions. CTX R@5=0.740 vs BM25=0.980 on CodeSearchNet Python — still a gap; for best results use Dense Embedding or Hybrid Dense+CTX instead\n- **Large unseen codebases** (\u003e500 files, no prior indexing): heuristic symbol extraction degrades at scale; consider AST-based indexers\n- **Natural-language concept queries** without code keywords: SEMANTIC_CONCEPT trigger falls back to BM25, losing CTX's structural advantage\n\n## Running Experiments\n\n```bash\n# Synthetic benchmark\npython run_experiment.py --dataset-size small --strategy all\n\n# Real codebase\npython run_experiment.py --dataset-source real --project-path /path/to/project --strategy all\n\n# COIR external benchmark\npython run_coir_eval.py --n-queries 100\n\n# Ablation study\npython run_experiment.py --dataset-size small --mode ablation\n```\n\nResults are written to `benchmarks/results/`.\n\n## Project Structure\n\n```\nCTX/\n  src/\n    retrieval/            # Retrieval strategies (8 total)\n      adaptive_trigger.py # CTX core: trigger-driven retrieval\n      hybrid_dense_ctx.py # Hybrid: dense seed + graph expansion\n      bm25_retriever.py   # BM25 sparse retrieval\n      dense_retriever.py  # TF-IDF dense retrieval\n      chroma_retriever.py # ChromaDB + sentence-transformers\n      graph_rag.py        # GraphRAG-lite baseline\n      llamaindex_retriever.py # LlamaIndex AST-aware chunking\n      full_context.py     # Full context baseline\n    trigger/              # Trigger classifier (4 types)\n    evaluator/            # Benchmark runner, metrics, COIR\n    data/                 # Dataset generation, real codebase loader\n  hooks/\n    ctx_real_loader.py    # Claude Code UserPromptSubmit hook\n    ctx_session_tracker.py # PostToolUse session tracker\n  benchmarks/\n    results/              # Experiment results and reports\n  docs/\n    claude_code_integration.md  # Claude Code setup guide\n    paper/                # Paper draft (markdown + LaTeX)\n```\n\n## Telemetry (opt-in, local-only)\n\nCTX can log retrieval quality metrics locally to help you understand how well the context injection is working.\n\n**Opt in:**\n```bash\nexport CTX_TELEMETRY=1          # enable for this shell\n# or: touch ~/.claude/ctx-telemetry.enabled   # persist across shells\n```\n\n**View your data:**\n```bash\nctx-telemetry                   # summary + flywheel health verdict (causal r, upgrade hint)\nctx-telemetry last              # last 10 session turns\nctx-telemetry calibrate         # citation bias + causal r-analysis (v1.5)\nctx-telemetry tune              # compute auto-tune params → ctx-auto-tune.json\nctx-telemetry cluster [-p DIR]  # detect tech stack → project_type_hint in ctx-auto-tune.json\nctx-telemetry consent           # Stage 2 upload consent status\nctx-telemetry upload            # Stage 2 dry-run preview\nctx-telemetry clear             # delete all local telemetry logs\n```\n\nSample `ctx-telemetry` output:\n```\nCTX Retrieval Telemetry — 42 session-turn records (schema v1.6)\n...\nFlywheel health [n=42]: causal-r=+0.35 | upgrade=✓ HYBRID | kw=43%\n```\n\n**Auto-tune (flywheel):** After `ctx-telemetry tune` runs with ≥15 records, CTX automatically adjusts retrieval parameters based on your usage patterns (e.g., top_k reduction for query types with lower citation rates). The active tuning state is shown in CTX's context header: `\u003e **CTX auto-tune** [n=42, hybrid✓]`.\n\nWith ≥10 v1.5 records, `tune` also computes a causal signal: Pearson r between BM25 top retrieval score and citation rate. High r (\u003e0.30) means quality-driven citations — HYBRID upgrade is worthwhile. Low r (\u003c0.10) suggests position bias may be dominant — validate before upgrading. This is stored as `hybrid_upgrade_hint` in `ctx-auto-tune.json`.\n\n**Project cluster detection (Stage 3 prerequisite):** `ctx-telemetry cluster` scans your project's source files, matches term frequencies against tech-stack signature profiles (python_ml, python_backend, nextjs_react, rust_systems, go_backend), and writes `project_type_hint` to `ctx-auto-tune.json`. This is a local-first proxy for the Stage 3 `project_type_id` cluster — enabling cold-start pre-warming without requiring cross-user data. Example output:\n```\npython_ml            ██████████████████████████████    80.0%  (18 keywords matched)\npython_backend       ███████                           19.0%  (13 keywords matched)\nProject type: python_ml  (confidence: HIGH)\n```\n\n### What is collected (schema v1.6)\n\nAll data stays on your machine at `~/.claude/ctx-retrieval-events.jsonl`. Nothing is uploaded.\n\n| Field | Type | Description |\n|-------|------|-------------|\n| `user_id` | string(16) | SHA256(machine-id + install-month)[:16] — anonymous, changes on reinstall |\n| `session_id_hash` | string(16) | SHA256(session_id)[:16] — non-reversible |\n| `ts_unix_hour` | int | Unix timestamp truncated to hour |\n| `hook_source` | enum | G1 / G2_DOCS / G2_CODE / CM |\n| `query_type` | enum | KEYWORD / SEMANTIC / TEMPORAL |\n| `retrieval_method` | enum | HYBRID / BM25 / UNKNOWN |\n| `candidates_returned` | int | Number of candidates before ranking |\n| `total_injected` | int | Items injected into context |\n| `total_cited` | int | Items referenced by the AI response |\n| `utility_rate` | float | cited / injected — retrieval precision proxy |\n| `session_turn_index` | int | Turn index within the current session |\n| `vec_daemon_up` | bool | Whether semantic layer was active |\n| `bge_daemon_up` | bool | Whether cross-encoder reranker was active |\n| `duration_ms` | int | Per-block retrieval latency |\n| `top_score_bm25` | float\\|null | Max BM25 score — causal calibration signal (v1.5) |\n| `top_score_dense` | float\\|null | Max cosine similarity score (v1.5) |\n\n### What is NOT collected\n\n- ❌ No query text, response text, or code content\n- ❌ No file names, commit messages, or project paths\n- ❌ No email, device name, or personally identifiable information\n- ❌ No network requests — Stage 1 is local-only\n\n### Privacy design\n\n- `user_id` = SHA256(machine-id + month-boundary) — not linkable to email or name; changes on reinstall\n- Timestamps truncated to **hour** (not minute)\n- All content stripped — only counts, rates, method names, and latency\n- Follows [Sourcegraph's numeric-only telemetry](https://sourcegraph.com/docs/admin/telemetry) pattern\n\n**Stage 2 (not yet implemented):** opt-in upload of k-anonymized `session_aggregate` rows via `ctx-telemetry consent`. Rows with fewer than 5 users per (date × project_type) window are suppressed before any upload.\n\n## Paper\n\n- Paper draft: [`docs/paper/CTX_paper_draft.md`](docs/paper/CTX_paper_draft.md)\n- arXiv: TBD\n- EMNLP 2026 submission: TBD\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaytoone%2Fctx","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjaytoone%2Fctx","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjaytoone%2Fctx/lists"}