https://github.com/jonathanlight/context_os
context_os
- Host: GitHub
- URL: https://github.com/jonathanlight/context_os
- Owner: Jonathanlight
- License: mit
- Created: 2026-05-27T07:08:49.000Z (29 days ago)
- Default Branch: develop
- Last Pushed: 2026-05-27T18:01:58.000Z (29 days ago)
- Last Synced: 2026-05-27T18:06:08.018Z (29 days ago)
- Language: Python
- Size: 58.6 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE

# ContextOS
**The complete toolchain for LLM context engineering — lint, compile, audit, evaluate, auto-fix, and ship.**
**English** · [Français](README.fr.md)
[CI](https://github.com/Jonathanlight/context_os/actions/workflows/ci.yml) · [Docs](https://jonathanlight.github.io/context_os/) · [PyPI](https://pypi.org/project/context-os-ctx/) · [Python](https://www.python.org/downloads/) · [License](LICENSE)
---
## The problem
A developer working seriously with LLMs in 2026 maintains **5–10 context files** scattered across incompatible formats:
```
your-repo/
├── CLAUDE.md ← Claude Code
├── AGENTS.md ← Codex / Aider
├── .cursorrules ← Cursor
├── .cursor/rules/*.mdc ← Cursor (new format)
├── .clinerules ← Cline
├── .windsurfrules ← Windsurf
├── .github/copilot-instructions.md ← GitHub Copilot
├── skills//SKILL.md ← Anthropic Skills
└── rag.ctx ← RAG corpus config
```
They **drift**. They **contradict**. Nobody catches the rule that says `Always use type hints` in `CLAUDE.md` while `.cursorrules` says `Never use type hints in benchmarks`. The model sees both; one wins; the choice is invisible to the author.
## The solution
ContextOS treats LLM context as **source code**: parsed into a typed AST, validated against 27 lint rules, kept in sync across targets, evaluated against real LLMs, and auto-fixed where the fix is unambiguous.
```
┌─────────────────┐
│ project.ctx │ ← write once
└────────┬────────┘
│
┌────────▼─────────┐
│ ContextOS │ ← parse · lint · evaluate · fix
└────────┬─────────┘
│
┌────────┼────────┬────────┬────────┬─────────┐
▼ ▼ ▼ ▼ ▼ ▼
CLAUDE.md AGENTS.md cursor copilot windsurf SKILL.md
rag.manifest.json
```
## What's in the box
| | Capability | One-liner |
| --- | --- | --- |
| 📝 | **`ctx compile`** | One `.ctx` source → eight target formats |
| 🔎 | **`ctx lint`** | 27 rules across A / C / F / K / P / R / S / X / XA categories |
| 🌳 | **`ctx audit`** | Walk a repo, lint every file, surface cross-artifact collisions |
| 📊 | **`ctx stats`** | Aggregate corpus-wide diagnostics + top rule codes |
| 🔀 | **`ctx diff`** | Semantic diff between two context versions |
| 🧪 | **`ctx eval`** | Functional evaluation against Anthropic Skills + RAG retrieval |
| 📉 | **`ctx eval-diff`** | Compare runs, gate CI on regressions |
| 🛠 | **`ctx fix`** | Auto-apply structured fixes (X003, F001, X001, S005) |
| 🧠 | **`ctx lsp`** | LSP server for VSCode / Neovim / Helix / Sublime |
| 📺 | **`--html`** | Self-contained HTML reports for audit + eval |
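
To make the `ctx stats` row concrete, here is a minimal sketch of a "top rule codes" aggregation. The `Diagnostic` record and the `top_rule_codes` function are hypothetical names chosen for illustration; they are not ContextOS's data model or API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    """Hypothetical diagnostic record (not the ContextOS data model)."""
    code: str      # rule code, e.g. "A001"
    path: str      # file the diagnostic was reported on
    severity: str  # e.g. "warning"


def top_rule_codes(diagnostics: list[Diagnostic], n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent rule codes with their counts."""
    counts = Counter(d.code for d in diagnostics)
    # Sort by count (descending), then by code, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


sample = [
    Diagnostic("A001", "CLAUDE.md", "warning"),
    Diagnostic("K002", "CLAUDE.md", "warning"),
    Diagnostic("A001", "AGENTS.md", "warning"),
    Diagnostic("F002", "CLAUDE.md", "warning"),
]
print(top_rule_codes(sample, n=2))  # [('A001', 2), ('F002', 1)]
```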
## Install
> **PyPI publication is pending** — the `context-os-ctx` distribution
> on PyPI is not (yet) the package from this repository. Until the
> first release is published, install from the git source. See
> [`docs/release.md`](docs/release.md) for the publishing setup.
### From the git source (recommended today)
```bash
# Core CLI
pipx install git+https://github.com/Jonathanlight/context_os.git
# With editor (LSP) support
pipx install 'git+https://github.com/Jonathanlight/context_os.git#egg=context-os-ctx[lsp]'
# With evaluation (Anthropic + OpenAI + numpy)
pipx install 'git+https://github.com/Jonathanlight/context_os.git#egg=context-os-ctx[eval]'
```
Pin to a specific release by appending `@vX.Y.Z`:
```bash
pipx install git+https://github.com/Jonathanlight/context_os.git@v4.0.0
```
### Once PyPI publishing is configured
```bash
pipx install context-os-ctx
pipx install 'context-os-ctx[lsp]'
pipx install 'context-os-ctx[eval]'
```
### Verify
```bash
ctx --version
# contextos 4.0.0
```
## Five-minute tour
### 1 — Write your project context once
```toml
# project.ctx
project = "MyApp"
artifacts = ["context"]
[stack]
required = ["python>=3.12", "fastapi"]
forbidden = ["django"]
[[rules]]
id = "TDD-001"
title = "Write a failing test before any production code change"
severity = "must"
rationale = "Catches regressions before they reach PR review."
[[rules]]
id = "SEC-042"
title = "Sanitize every input crossing a trust boundary"
severity = "must"
rationale = "Prevents path-traversal and injection attacks."
example_good = "bleach.clean(user_html)"
```
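
For intuition about what "parsed into a typed AST" can mean for the file above, here is a rough sketch that loads the same content into typed objects. It uses standard-library dataclasses rather than the Pydantic v2 models the project mentions, and the class and function names (`Rule`, `Document`, `build_document`) are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Rule:
    id: str
    title: str
    severity: str
    rationale: Optional[str] = None
    example_good: Optional[str] = None


@dataclass(frozen=True)
class Document:
    project: str
    artifacts: tuple[str, ...]
    required: tuple[str, ...]
    forbidden: tuple[str, ...]
    rules: tuple[Rule, ...]


def build_document(raw: dict[str, Any]) -> Document:
    """Turn a parsed-TOML mapping into typed objects, rejecting duplicate rule ids."""
    rules = tuple(Rule(**entry) for entry in raw.get("rules", []))
    ids = [rule.id for rule in rules]
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate rule ids: {duplicates}")
    stack = raw.get("stack", {})
    return Document(
        project=raw["project"],
        artifacts=tuple(raw.get("artifacts", [])),
        required=tuple(stack.get("required", [])),
        forbidden=tuple(stack.get("forbidden", [])),
        rules=rules,
    )


# Same content as the project.ctx above, in the shape a TOML parser returns.
raw = {
    "project": "MyApp",
    "artifacts": ["context"],
    "stack": {"required": ["python>=3.12", "fastapi"], "forbidden": ["django"]},
    "rules": [
        {"id": "TDD-001",
         "title": "Write a failing test before any production code change",
         "severity": "must",
         "rationale": "Catches regressions before they reach PR review."},
        {"id": "SEC-042",
         "title": "Sanitize every input crossing a trust boundary",
         "severity": "must",
         "rationale": "Prevents path-traversal and injection attacks.",
         "example_good": "bleach.clean(user_html)"},
    ],
}
doc = build_document(raw)
print(doc.project, [rule.id for rule in doc.rules])  # MyApp ['TDD-001', 'SEC-042']
```

On Python 3.11+, `tomllib.load` produces a mapping of this shape from TOML text, so the dict literal could be swapped for a real file read.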
### 2 — Compile to every target
```bash
ctx compile project.ctx --target claude_code --output-dir . # → ./CLAUDE.md
ctx compile project.ctx --target codex --output-dir . # → ./AGENTS.md
ctx compile project.ctx --target cursor --output-dir . # → ./.cursor/rules/agent.mdc
ctx compile project.ctx --target copilot --output-dir . # → ./.github/copilot-instructions.md
```
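
If you would rather drive these four invocations from a script, the following sketch builds the argument lists without executing anything. It uses only the targets, flags, and output paths shown above; the helper name `compile_commands` is made up for this example.

```python
import shlex

# Targets and output paths exactly as listed in the commands above.
TARGET_OUTPUTS = {
    "claude_code": "CLAUDE.md",
    "codex": "AGENTS.md",
    "cursor": ".cursor/rules/agent.mdc",
    "copilot": ".github/copilot-instructions.md",
}


def compile_commands(source: str, output_dir: str = ".") -> list[list[str]]:
    """Build one `ctx compile` argv per target, without running anything."""
    return [
        ["ctx", "compile", source, "--target", target, "--output-dir", output_dir]
        for target in TARGET_OUTPUTS
    ]


for argv in compile_commands("project.ctx"):
    print(shlex.join(argv))
```

Once `ctx` is installed, each list can be passed to `subprocess.run(argv, check=True)`.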
### 3 — Lint existing files
```text
$ ctx lint CLAUDE.md --target claude_code
warning[A001]: vague directive: 'Be concise'
--> CLAUDE.md:42:1
= help: rephrase with a measurable criterion (e.g. 'public functions
<= 40 lines' instead of 'be concise')
= doc: https://contextos.dev/rules/A001
warning[K002]: must-severity rule 'TDD-001' has no rationale
--> CLAUDE.md:14:1
= help: add `rationale = "..."` explaining why this rule is mandatory
```
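
The text format above is regular enough to post-process. Below is a sketch of a parser for it, written against the sample output only; the regular expressions and the `Diagnostic` record are assumptions, not part of ContextOS. For anything long-lived, the `--json` flag is probably the sturdier route.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns inferred from the sample output above (assumption, not a spec).
HEADER = re.compile(r"^(?P<severity>\w+)\[(?P<code>[A-Z]+\d+)\]: (?P<message>.*)$")
LOCATION = re.compile(r"^\s*--> (?P<path>.+):(?P<line>\d+):(?P<col>\d+)$")


@dataclass
class Diagnostic:
    severity: str
    code: str
    message: str
    path: Optional[str] = None
    line: Optional[int] = None
    col: Optional[int] = None


def parse_diagnostics(text: str) -> list[Diagnostic]:
    """Collect diagnostics, attaching each `-->` location to the preceding header."""
    found: list[Diagnostic] = []
    for raw_line in text.splitlines():
        header = HEADER.match(raw_line)
        if header:
            found.append(Diagnostic(**header.groupdict()))
            continue
        location = LOCATION.match(raw_line)
        if location and found and found[-1].path is None:
            found[-1].path = location["path"]
            found[-1].line = int(location["line"])
            found[-1].col = int(location["col"])
    return found


SAMPLE = """\
warning[A001]: vague directive: 'Be concise'
  --> CLAUDE.md:42:1
  = help: rephrase with a measurable criterion
warning[K002]: must-severity rule 'TDD-001' has no rationale
  --> CLAUDE.md:14:1
"""

for diag in parse_diagnostics(SAMPLE):
    print(diag.code, diag.path, diag.line)
```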
### 4 — Audit a whole repo
```text
$ ctx audit .
--- ./CLAUDE.md
warning[F002]: rule title is 145 characters long (limit: 120)
--> ./CLAUDE.md:12:1
--- ./AGENTS.md
no diagnostics
--- Cross-artifact
warning[XA001]: rule id 'TDD-001' collides across 2 files with different
content: ['./AGENTS.md', './CLAUDE.md']
summary: 2 diagnostic(s)
```
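
The idea behind a cross-artifact check such as `XA001` can be sketched in a few lines: group rules by id across files and flag any id whose content differs. This is an illustration of the concept, not ContextOS's implementation; the function name and the input shape are assumptions.

```python
from collections import defaultdict


def find_id_collisions(files: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each rule id defined with differing content to the files defining it.

    `files` maps a file path to {rule_id: rule_text}.
    """
    seen: dict[str, dict[str, str]] = defaultdict(dict)  # rule_id -> {path: text}
    for path, rules in files.items():
        for rule_id, text in rules.items():
            seen[rule_id][path] = text.strip()
    return {
        rule_id: sorted(by_path)
        for rule_id, by_path in seen.items()
        if len(by_path) > 1 and len(set(by_path.values())) > 1
    }


repo = {
    "./CLAUDE.md": {
        "TDD-001": "Write a failing test before any production code change",
        "SEC-042": "Sanitize every input crossing a trust boundary",
    },
    "./AGENTS.md": {
        "TDD-001": "Write tests when time allows",  # hypothetical drifted copy
        "SEC-042": "Sanitize every input crossing a trust boundary",
    },
}
print(find_id_collisions(repo))  # {'TDD-001': ['./AGENTS.md', './CLAUDE.md']}
```

`SEC-042` is not reported because both files carry identical text; only ids with diverging content count as collisions here.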
### 5 — Evaluate your skills functionally
```bash
ctx eval skills.eval.toml --skills-dir skills/ --json --output current.json
ctx eval-diff baseline.json current.json
# exit 1 if a previously-passing case now fails
```
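
The regression gate can be pictured as a set comparison over per-case results. The sketch below assumes each run reduces to a `{case_name: passed}` mapping, which is a simplification and not the actual `ctx eval --json` schema; the case names are invented. Treating a case that disappears from the current run as a regression is also a choice made for this example.

```python
def find_regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return cases that passed in `baseline` but fail, or are missing, in `current`."""
    return sorted(
        name
        for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


def exit_code(baseline: dict[str, bool], current: dict[str, bool]) -> int:
    """1 if any previously passing case now fails, else 0."""
    return 1 if find_regressions(baseline, current) else 0


baseline = {"routes-to-pdf-skill": True, "ignores-small-talk": False, "finds-policy-doc": True}
current = {"routes-to-pdf-skill": False, "ignores-small-talk": True, "finds-policy-doc": True}

print(find_regressions(baseline, current))  # ['routes-to-pdf-skill']
print(exit_code(baseline, current))         # 1
```

A case that was already failing in the baseline never trips the gate, which keeps known failures from blocking unrelated changes.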
### 6 — Auto-fix what we can
```bash
ctx fix CLAUDE.md # dry-run: prints a unified diff
ctx fix CLAUDE.md --apply # writes the fix back
```
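
The dry-run/apply split shown above is a common pattern: compute the fixed text, show a unified diff, and write only on request. Here is a minimal sketch using the standard library's `difflib`. The trailing-whitespace fix is a stand-in chosen for simplicity and is not one of ContextOS's rule-specific fixes; `fix_file` is a hypothetical name.

```python
import difflib
import tempfile
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Stand-in fix (hypothetical): drop trailing whitespace on every line."""
    return "".join(line.rstrip() + "\n" for line in text.splitlines())


def fix_file(path: Path, apply: bool = False) -> str:
    """Return a unified diff of the proposed fix; write it back only if apply=True."""
    before = path.read_text(encoding="utf-8")
    after = strip_trailing_whitespace(before)
    diff = "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path.name}",
            tofile=f"b/{path.name}",
        )
    )
    if apply and diff:
        path.write_text(after, encoding="utf-8")
    return diff


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "CLAUDE.md"
    target.write_text("# Rules  \nUse type hints\n", encoding="utf-8")
    print(fix_file(target))        # dry run: prints the diff, file unchanged
    fix_file(target, apply=True)   # writes the fixed text back
    print(repr(target.read_text(encoding="utf-8")))
```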
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│ ContextOS v4.0 │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ AST (Pydantic v2, mypy --strict) │ │
│ │ Agent · Skill · RAG · Eval │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ ┌────────────┬─────────────┬────────────┬────────────────┐ │
│ │ Parsers │ Analyzers │ Emitters │ Eval runners │ │
│ │ .ctx │ 27 rules │ 8 targets │ Anthropic │ │
│ │ CLAUDE.md │ A/C/F/K/P/ │ Markdown │ OpenAI │ │
│ │ SKILL.md │ R/S/X/XA │ JSON │ Mock │ │
│ └────────────┴─────────────┴────────────┴────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────┐ │
│ │ Surfaces │ │
│ │ CLI · Python library · LSP server · VSCode extension │ │
│ │ · GitHub Action · HTML reports │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
- Three artifact families: **agent context files** · **Anthropic Skills** · **RAG corpora**.
- Three operating modes: **structural lint** · **functional eval** · **auto-fix**.
- Five consumption surfaces: **CLI** · **Python library** · **LSP** · **VSCode extension** · **GitHub Action**.
## Supported targets
| Target | Parse | Emit | Filename |
|-------------------|:-----:|:----:|---------------------------------------|
| `claude_code` | ✅ | ✅ | `CLAUDE.md` |
| `codex` | ✅ | ✅ | `AGENTS.md` |
| `cursor` | ⏳ | ✅ | `.cursor/rules/agent.mdc` |
| `copilot` | ⏳ | ✅ | `.github/copilot-instructions.md` |
| `cline` | ⏳ | ✅ | `.clinerules` |
| `windsurf` | ⏳ | ✅ | `.windsurfrules` |
| `anthropic_skill` | ✅ | ✅ | `SKILL.md` |
| `rag_manifest` | N/A | ✅ | `rag.manifest.json` |
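
As a rough illustration of how the filenames in this table could be used to locate context files in a repository, consider the sketch below. The `discover` function is hypothetical, and searching every subdirectory for `SKILL.md` is an assumption; the table itself only gives the filename.

```python
import tempfile
from pathlib import Path

# Fixed filenames taken from the table above.
TARGET_FILES = {
    "claude_code": "CLAUDE.md",
    "codex": "AGENTS.md",
    "cursor": ".cursor/rules/agent.mdc",
    "copilot": ".github/copilot-instructions.md",
    "cline": ".clinerules",
    "windsurf": ".windsurfrules",
    "rag_manifest": "rag.manifest.json",
}


def discover(root: Path) -> dict[str, list[Path]]:
    """Map each target to the matching files found under `root`."""
    found: dict[str, list[Path]] = {}
    for target, relative in TARGET_FILES.items():
        candidate = root / relative
        if candidate.is_file():
            found[target] = [candidate]
    # Assumption: a SKILL.md may sit in any subdirectory.
    skills = sorted(root.rglob("SKILL.md"))
    if skills:
        found["anthropic_skill"] = skills
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "CLAUDE.md").write_text("# Rules\n", encoding="utf-8")
    (root / "skills" / "pdf").mkdir(parents=True)  # "pdf" is a made-up skill name
    (root / "skills" / "pdf" / "SKILL.md").write_text("# pdf\n", encoding="utf-8")
    for target, paths in discover(root).items():
        print(target, [p.relative_to(root).as_posix() for p in paths])
```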
## CLI reference
The **Input** column tells you what the command expects on the command line:
- _none_ — no positional argument, the command is self-contained
- _file_ — a path to an existing file (e.g. `CLAUDE.md`, `project.ctx`)
- _dir_ — a directory path (typically a repo root)
- _two files_ — two paths (diff and eval-diff)
| Command | Input | Output | Purpose |
|-----------------|------------------|---------------------------------------|----------------------------------------------------------------------|
| `ctx create` | _none_ | new `.ctx` | Scaffold a starter `.ctx` from `--lang python,fastapi,react,...` |
| `ctx init` | dir (default `.`)| new `/.ctx` | Walk repo recursively, detect stack from manifests, write a `.ctx` |
| `ctx eval-init` | _none_ | new `.eval.toml` | Scaffold a minimal `.eval.toml` (sample suite to feed `ctx eval`) |
| `ctx parse` | file | JSON or TOML on stdout | `.ctx` / Markdown / SKILL.md → AST |
| `ctx compile` | file (`.ctx`) | target file on stdout or via `-o` | `.ctx` → CLAUDE.md / AGENTS.md / cursor / copilot / cline / windsurf |
| `ctx lint` | file | diagnostics on stdout | Run the 27 analyzers on a single file |
| `ctx diff` | two files | unified diff on stdout | Semantic AST-level diff of two Documents |
| `ctx audit` | dir | report on stdout / HTML via `-o` | Walk a repo, lint everything, run cross-artifact rules |
| `ctx stats` | dir | aggregate JSON / text | Aggregate corpus-wide statistics from an audit |
| `ctx fix` | file | unified diff or modified file | Auto-apply structured fixes; `--dry-run` default, `--apply` to write |
| `ctx eval` | file (`.eval.toml`)| pass/fail report | Run a `.eval.toml` suite against a real (or mock) provider |
| `ctx eval-diff` | two files | regression report | Compare two `ctx eval --json` outputs; exit 1 on regression |
| `ctx lsp` | _none_ | LSP over stdio | Language server (requires `[lsp]` extras) |
| `ctx upgrade` | _none_ | upgrades the install | Check PyPI and `pip install --upgrade context-os-ctx` |
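
To show what "semantic" means in the `ctx diff` row, here is a small sketch that compares two rule sets by rule id instead of by line, so reordering alone produces no difference. It is a conceptual illustration under the assumption that each version reduces to a `{rule_id: fields}` mapping; it is not the ContextOS diff algorithm, and `PERF-007` is an invented rule.

```python
def semantic_diff(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Compare two {rule_id: fields} mappings by id rather than by line."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


old = {
    "TDD-001": {"title": "Write a failing test before any production code change",
                "severity": "must"},
    "SEC-042": {"title": "Sanitize every input crossing a trust boundary",
                "severity": "must"},
}
new = {
    "PERF-007": {"title": "Avoid N+1 queries", "severity": "must"},  # hypothetical rule
    "TDD-001": {"title": "Write a failing test first", "severity": "must"},
}

print(semantic_diff(old, new))
# {'added': ['PERF-007'], 'removed': ['SEC-042'], 'changed': ['TDD-001']}
```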
### Worked examples
```bash
# === Starting from scratch ==================================================
# 1. Build a starter .ctx for a brand new project
ctx create church-manager --lang php,symfony,doctrine --domain "parish management"
# → ./church-manager.ctx
# 2. Mix several stacks; aliases like Next.js / c# / spring boot are accepted
ctx create acme --lang python,fastapi,react,tailwind --domain fintech
# → ./acme.ctx
# 3. Discover the registry (90+ slugs across 4 waves)
ctx create --list-languages
# === Starting from an existing repo =========================================
# 4. Auto-detect the stack of the current directory (recursive by default)
ctx init # → ./.ctx
ctx init . --project demo # → ./demo.ctx
ctx init . --dry-run # print the .ctx, write nothing
ctx init . --no-recursive # only inspect the root manifest
ctx init . --depth 2 # cap recursion to 2 levels
# === Compiling a .ctx into agent files ======================================
# 5. .ctx → target file (one of 8 supported targets)
ctx compile project.ctx --target claude_code --output-dir . # → ./CLAUDE.md
ctx compile project.ctx --target codex --output-dir . # → ./AGENTS.md
ctx compile project.ctx --target cursor --output-dir . # → ./.cursor/rules/agent.mdc
ctx compile project.ctx --target copilot --output-dir . # → ./.github/copilot-instructions.md
# === Linting an existing agent file =========================================
# 6. Lint a CLAUDE.md / AGENTS.md / SKILL.md (target is required for .md)
ctx lint CLAUDE.md --target claude_code
ctx lint SKILL.md --target anthropic_skill
# 7. Walk the whole repo and lint every agent file at once
ctx audit .
ctx audit . --html --output audit.html # self-contained HTML report
# === Evaluating skills / RAG ================================================
# 8. ctx eval needs a hand-written .eval.toml. Scaffold one first:
ctx eval-init skills # → ./skills.eval.toml (Skill target)
ctx eval-init policy --target rag # → ./policy.eval.toml (RAG target)
# 9. Smoke-test with --dry-run (no API key, no spend)
ctx eval skills.eval.toml --dry-run
ctx eval policy.eval.toml --dry-run --rag-chunks chunks.json
# 10. Real run (requires ANTHROPIC_API_KEY or OPENAI_API_KEY)
ctx eval skills.eval.toml --skills-dir ./skills/
# === Maintenance ============================================================
# 11. Upgrade the CLI itself
ctx upgrade --check # report only
ctx upgrade # pip install --upgrade context-os-ctx
```
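
The stack detection that `ctx init` performs can be pictured as a depth-limited walk looking for well-known manifest files. The sketch below is an approximation under stated assumptions: the manifest-to-stack mapping is invented for the example, and counting depth 0 as "root only" is a guess at how `--depth` and `--no-recursive` behave rather than documented semantics.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical manifest -> stack mapping; the real registry is much larger.
MANIFESTS = {
    "pyproject.toml": "python",
    "package.json": "node",
    "composer.json": "php",
    "Cargo.toml": "rust",
    "go.mod": "go",
}


def detect_stack(root: Path, max_depth: Optional[int] = None) -> list[str]:
    """Return stacks whose manifests appear within `max_depth` levels of `root`.

    Depth 0 means the root directory only; None means unlimited.
    """
    stacks: set[str] = set()
    for current, dirs, files in os.walk(root):
        depth = len(Path(current).relative_to(root).parts)
        if max_depth is not None and depth >= max_depth:
            dirs[:] = []  # do not descend below this directory
        stacks.update(MANIFESTS[name] for name in files if name in MANIFESTS)
    return sorted(stacks)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyproject.toml").write_text("", encoding="utf-8")
    (root / "web" / "a" / "b").mkdir(parents=True)
    (root / "web" / "package.json").write_text("{}", encoding="utf-8")
    (root / "web" / "a" / "b" / "Cargo.toml").write_text("", encoding="utf-8")
    print(detect_stack(root))               # ['node', 'python', 'rust']
    print(detect_stack(root, max_depth=0))  # ['python']
    print(detect_stack(root, max_depth=2))  # ['node', 'python']
```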
- **Universal flags:** every command has `--json` for machine-readable output.
- **HTML reports:** `ctx audit --html` and `ctx eval --html` emit self-contained HTML pages.
- **Exit codes:** 0 on success, 1 only when an error-severity diagnostic fires (or a structured failure occurs).
- **Per-command help:** every command prints concrete examples under `--help` (e.g. `ctx init --help`, `ctx eval --help`).
## Project status
| Phase | What | Status |
|-------|---------------------------------------------------------------|------------|
| 1 | Parser + AST + Claude emitter | ✅ shipped |
| 2 | 15 lint rules (A / C / F / K / P / X / XA) | ✅ shipped |
| 3 | 5 more emitters + diff + audit | ✅ shipped |
| 4 | Corpus stats + docs site + **v1.0** launch | ✅ shipped |
| 5 | Anthropic Skills (`SKILL.md`) + 6 skill rules | ✅ shipped |
| 6 | RAG corpora + 6 RAG rules + **v2.0** launch | ✅ shipped |
| 7A | LSP server + VSCode extension + GitHub Action + **v2.1** | ✅ shipped |
| 7B | Live evaluation (Skills + RAG) + **v3.0** launch | ✅ shipped |
| 8     | PyPI/Marketplace release setup (PyPI publication still pending) + HTML reports + `ctx fix` + **v4.0** launch | ✅ shipped |
| 9+ | Multi-provider Skills, embedding helpers, PDF in RAG, … | ⏳ planned |
## Editor integration
ContextOS speaks LSP and ships a VSCode extension wrapping it.
```bash
pipx install 'git+https://github.com/Jonathanlight/context_os.git#egg=context-os-ctx[lsp]'
```
- **VSCode** — extension at [`extensions/vscode/`](extensions/vscode).
- **Neovim / Helix / Sublime** — three-line LSP client configs in [`docs/editor.md`](docs/editor.md).
- **GitHub CI** — composite Action at [`actions/lint/`](actions/lint):
  ```yaml
  - uses: Jonathanlight/context_os/actions/lint@v4.0.0
  ```
  Posts a sticky audit report as a PR comment.
The 27 lint rules, completion, hover, and quick-fix code actions surface identically in every editor; each one is just an alternate window onto the same Python core.
## Live evaluation
Move from validating **structure** to validating **behavior**: does the skill actually fire on the right prompts? Does RAG retrieval actually find the expected sources?
```bash
pipx install 'git+https://github.com/Jonathanlight/context_os.git#egg=context-os-ctx[eval]'
ctx eval skills.eval.toml --dry-run # mock smoke, no API calls
ctx eval skills.eval.toml --skills-dir skills/ # real Anthropic run
ctx eval-diff baseline.json current.json # exit 1 on regression
```
- **Skills routing** via the Anthropic Messages API tool-use feature (`ANTHROPIC_API_KEY`).
- **RAG retrieval** via in-process cosine similarity over pre-indexed embeddings (`OPENAI_API_KEY` for query embeddings; bring your own for other vendors via the Python API).
- **Mock providers** drive every test in CI without spending tokens.
See [`docs/eval.md`](docs/eval.md) for the full workflow.
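
For readers unfamiliar with the retrieval step, the sketch below shows cosine-similarity ranking over a tiny pre-indexed set of embeddings and checks whether an expected source is among the top hits. It is written in pure Python for portability (the project lists numpy for its eval runtime); the vectors, chunk ids, and function names are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to `query`, best first."""
    scored = [(chunk_id, cosine(query, vector)) for chunk_id, vector in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
index = {
    "refund-policy#1": [0.9, 0.1, 0.0],
    "shipping#3": [0.1, 0.9, 0.0],
    "privacy#2": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

hits = top_k(query, index)
print([chunk_id for chunk_id, _ in hits])  # ['refund-policy#1', 'shipping#3']

# An eval case passes when the expected source is among the retrieved chunks.
expected = "refund-policy#1"
print(expected in {chunk_id for chunk_id, _ in hits})  # True
```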
## Visualization & auto-fix
```bash
ctx audit . --html --output audit.html
ctx eval skills.eval.toml --dry-run --html --output eval.html
ctx fix CLAUDE.md # dry-run unified diff
ctx fix CLAUDE.md --apply # write fixes
```
See [`docs/dashboard.md`](docs/dashboard.md).
## Docs
- 📚 [Documentation site](https://jonathanlight.github.io/context_os/)
- 🛠 [Getting started](docs/getting-started.md)
- 🧠 [Editor integration](docs/editor.md)
- 🧪 [Live evaluation](docs/eval.md)
- 📺 [Visualization & auto-fix](docs/dashboard.md)
- 📦 [Releasing](docs/release.md)
- 📋 [Rules catalog](docs/rules/index.md)
- 🏛 [Specs](docs/specs/) (Vision · Spec · Architecture · Roadmap)
## Stack
- **Python 3.12+**, `mypy --strict` clean on every file.
- [Pydantic v2](https://docs.pydantic.dev) — strict-typed AST.
- [Typer](https://typer.tiangolo.com) — the `ctx` CLI.
- [mistletoe](https://github.com/miyuchina/mistletoe) — Markdown parser.
- [tomlkit](https://github.com/python-poetry/tomlkit) — TOML preserving comments + order.
- [ruamel.yaml](https://yaml.readthedocs.io) — YAML preserving comments + order (SKILL.md frontmatter).
- [pygls](https://github.com/openlawlibrary/pygls) (optional) — LSP server.
- [anthropic](https://docs.anthropic.com), [openai](https://platform.openai.com), [numpy](https://numpy.org) (optional) — eval runtime.
- pytest + [hypothesis](https://hypothesis.readthedocs.io) — **1129 tests**, 94 % coverage, 1000-example property tests on the round-trip.
- [ruff](https://docs.astral.sh/ruff/) — lint + format.
## Install for development
```bash
git clone git@github.com:Jonathanlight/context_os.git
cd context_os
pip install -e ".[dev]"
pre-commit install
./scripts/check.sh # ruff + mypy --strict + pytest
```
For the docs site:
```bash
pip install -e ".[docs]"
mkdocs serve # http://127.0.0.1:8000/
```
## Contributing
Pull requests welcome. Read [`tasks/CONTRIBUTING.md`](tasks/CONTRIBUTING.md) first — ContextOS enforces strict PR discipline (one subject per PR, ≤ 400 lines diff excluding tests, conventional commits prefixed by phase).
## License
MIT — see [LICENSE](LICENSE).
## Author
[Jonathan KABLAN](https://github.com/Jonathanlight) — Senior Full Stack Developer.