https://github.com/sayeem3051/python-context-engineer
Build perfect LLM context from your Python codebase — automatically
- Host: GitHub
- URL: https://github.com/sayeem3051/python-context-engineer
- Owner: Sayeem3051
- License: mit
- Created: 2026-04-04T12:24:40.000Z (22 days ago)
- Default Branch: main
- Last Pushed: 2026-04-08T16:51:53.000Z (18 days ago)
- Last Synced: 2026-04-08T20:43:15.428Z (18 days ago)
- Topics: artificial-intelligence, claude, codebase, context-engineering, developer-tools, gpt, llm, machine-learning, openai, prompt-engineering, python, token-optimization
- Language: Python
- Homepage: https://pypi.org/project/ctxeng/
- Size: 462 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# ctxeng — Python Context Engineering Library
Stop copy-pasting files into ChatGPT.
Build the perfect LLM context from your codebase, automatically.
---
**Context engineering** is the new prompt engineering.
The quality of your LLM's output depends almost entirely on *what you put in the context window* — not how you phrase the question.
`ctxeng` solves this automatically:
- **Scans your codebase** and scores every file for relevance to your query
- **Ranks by signal** — keyword overlap, AST symbols, git recency, import graph
- **Fits the budget** — smart truncation keeps the best parts within any model's token limit
- **Ships ready to paste** — XML, Markdown, or plain text output that works with Claude, GPT-4o, Gemini, and every other model
One small dependency ([pathspec](https://pypi.org/project/pathspec/)) powers `.ctxengignore` (gitignore-style patterns). Works with any LLM.
---
## Installation
```bash
# Core install (includes .ctxengignore support)
pip install ctxeng
# pathspec is included automatically
```
For accurate token counting (strongly recommended):
```bash
pip install "ctxeng[tiktoken]"
```
For file watching (used by `ctxeng watch`):
```bash
pip install "ctxeng[watch]"
```
For semantic similarity scoring (optional local embeddings):
```bash
pip install "ctxeng[semantic]"
```
For one-line LLM calls:
```bash
pip install "ctxeng[anthropic]" # Claude
pip install "ctxeng[openai]" # GPT-4o
pip install "ctxeng[all]" # everything
```
---
## Quickstart
### Python API
```python
from ctxeng import ContextEngine
engine = ContextEngine(root=".", model="claude-sonnet-4")
ctx = engine.build("Fix the authentication bug in the login flow")
print(ctx.summary())
# Context summary (12,340 tokens / 197,440 budget):
# Included : 8 files
# Skipped : 23 files (over budget)
# Est. cost: ~$0.037 (claude-sonnet-4)
# [████████ ] 0.84 src/auth/login.py
# [███████ ] 0.71 src/auth/middleware.py
# [█████ ] 0.53 src/models/user.py
# [████ ] 0.41 tests/test_auth.py
# ...
# Paste directly into your LLM
print(ctx.to_string())
```
### Fluent Builder API
```python
from ctxeng import ContextBuilder
ctx = (
    ContextBuilder(root=".")
    .for_model("gpt-4o")
    .only("**/*.py")
    .exclude("tests/**", "migrations/**")
    .from_git_diff()  # only changed files
    .with_system("You are a senior Python engineer. Be concise.")
    .build("Refactor the payment module to use async/await")
)
print(ctx.to_string("markdown"))
```
### One-line LLM call
```python
from ctxeng import ContextEngine
from ctxeng.integrations import ask_claude
engine = ContextEngine(".", model="claude-sonnet-4")
ctx = engine.build("Why is the test_login test failing?")
response = ask_claude(ctx)
print(response)
```
### CLI
```bash
# Build context for a query and print to stdout
ctxeng build "Fix the auth bug"
# Focused on git-changed files only
ctxeng build "Review my changes" --git-diff
# Target a specific model with markdown output
ctxeng build "Refactor this" --model gpt-4o --fmt markdown
# Save to file
ctxeng build "Explain the payment flow" --output context.md
# Project stats
ctxeng info
```
### Watch mode
Automatically rebuild context when files change (requires `watchdog`):
```bash
pip install "ctxeng[watch]"
ctxeng watch "Fix the auth bug" --output context.md
```
Example output:
```text
[14:32:01] File changed: src/auth/login.py
[14:32:01] Rebuilding context...
[14:32:01] Done. 8 files, 12,340 tokens, ~$0.037
[14:32:01] Written to: context.md
```
### `.ctxengignore`
Add a **`.ctxengignore`** file at your project root to exclude paths from filesystem discovery (same syntax as **`.gitignore`**). It is applied automatically when you run `ctxeng build`, `ctxeng info`, or `ContextEngine` / `ContextBuilder` without explicit `--files` / `include_files`.
Example `.ctxengignore`:
```gitignore
# Dependencies
node_modules/
venv/
.venv/
# Build artifacts
dist/
build/
*.egg-info/
# Migrations
migrations/**
**/migrations/**
# Lock files
*.lock
poetry.lock
package-lock.json
```
Supported patterns include `*`, `?`, `**`, directory slashes, and negation with `!` (full gitwildmatch semantics via pathspec). If `.ctxengignore` is missing, nothing is excluded beyond ctxeng’s built-in skips.
```python
from pathlib import Path
from ctxeng import parse_ctxengignore
patterns = parse_ctxengignore(Path("."))
# → list of pattern strings, or [] if no file
```
### Import graph (Python)
After files are scored, **ctxeng** parses static `import` / `from … import` statements in each discovered `.py` file, resolves **relative imports** from the file’s location, and can **pull in imported modules** from the same discovery set before the token budget is applied.
- **Default:** one hop (`import_graph_depth=1`), relevance for added files = parent score × **0.7**
- **Edges only** to files already in the current discovery set (filesystem / git / explicit list)
- Stdlib and third-party imports are ignored (no file under your root → no edge)
```python
from ctxeng import ContextEngine, ContextBuilder
# Engine: on by default; adjust depth or turn off
engine = ContextEngine(
    root=".",
    use_import_graph=True,
    import_graph_depth=2,
)

ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .use_import_graph(depth=2)  # follow two hops of local imports
    # .no_import_graph()        # disable expansion
    .build("Fix the checkout bug in orders")
)
```
CLI (import expansion is **on** by default):
```bash
ctxeng build "Refactor auth" --no-import-graph
ctxeng build "Refactor auth" --import-graph-depth 2
```
Lower-level API:
```python
from pathlib import Path
from ctxeng import build_import_graph, expand_with_imports
from ctxeng.models import ContextFile
paths = [Path("src/app.py"), Path("src/lib.py")]
graph = build_import_graph(Path("."), paths)
# graph[path] → list of imported paths (within `paths`)
expanded = expand_with_imports(
    [ContextFile(path=paths[0], content="...", relevance_score=0.9, language="python")],
    graph,
    Path("."),
    max_depth=1,
    score_decay=0.7,
)
```
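The decay rule described above (an added file gets its parent's score times 0.7 per hop, only along edges inside the discovery set) can be sketched as a small breadth-first walk. The function and names below are illustrative, not ctxeng's internals:

```python
from collections import deque

def decayed_scores(graph: dict[str, list[str]], seeds: dict[str, float],
                   max_depth: int = 1, decay: float = 0.7) -> dict[str, float]:
    """Walk local-import edges from already-scored files; each hop
    multiplies the parent's relevance by `decay`, keeping the best
    score when a file is reachable from several parents."""
    scores = dict(seeds)
    frontier = deque((f, s, 0) for f, s in seeds.items())
    while frontier:
        node, score, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for child in graph.get(node, []):
            candidate = score * decay
            if candidate > scores.get(child, 0.0):
                scores[child] = candidate
                frontier.append((child, candidate, depth + 1))
    return scores

graph = {"app.py": ["lib.py"], "lib.py": ["util.py"]}
scores = decayed_scores(graph, {"app.py": 0.9}, max_depth=2)
# app.py stays 0.9; lib.py ≈ 0.63; util.py ≈ 0.441
```

With `max_depth=1` (the default), `util.py` would not be pulled in at all, which matches the one-hop default above.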
### Cost estimates
`ContextEngine` fills `ctx.cost_estimate` with a **rough USD** figure for **input tokens** only, using built-in per-1K rates for common models (see `ctxeng.costs.COST_PER_1K_INPUT_TOKENS`). Unknown model names yield `None`. Rates are indicative—verify with your provider before budgeting.
`Context.summary()` includes a line when a cost is known:
```text
Context summary (12,340 tokens / 197,440 budget):
Included : 8 files
Skipped : 23 files (over budget)
Est. cost: ~$0.037 (claude-sonnet-4)
```
```python
from ctxeng import estimate_cost, ContextEngine
engine = ContextEngine(root=".", model="gpt-4o")
ctx = engine.build("Explain this module")
print(ctx.cost_estimate) # float | None
print(ctx.summary()) # includes Est. cost when known
```
CLI: the cost line is **on** by default; use `--no-show-cost` to omit it from the stderr summary.
---
## How It Works
### Scoring signals
Each file gets a relevance score from 0 → 1, combining:
| Signal | What it measures |
|--------|-----------------|
| **Keyword overlap** | How many query terms appear in the file content |
| **AST symbols** | Class/function/import names that match the query (Python) |
| **Path relevance** | Filename and directory names matching query tokens |
| **Git recency** | Files touched in recent commits score higher |
| **Import expansion** | After scoring, locally imported Python modules can be added with a decayed score |
| **Semantic similarity** | Optional embedding similarity between query and file content |
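As a rough illustration of how two of these signals might blend into a single 0 → 1 score, here is a toy version. The weights and tokenization are invented for the sketch and are not ctxeng's actual formula:

```python
import re

def relevance(query: str, path: str, content: str,
              w_kw: float = 0.6, w_path: float = 0.4) -> float:
    """Toy blend: fraction of query terms found in the file body,
    plus the fraction found in the path, weighted and capped at 1."""
    terms = set(re.findall(r"[a-z]+", query.lower()))
    if not terms:
        return 0.0
    body_words = set(re.findall(r"[a-z]+", content.lower()))
    path_words = set(re.findall(r"[a-z]+", path.lower()))
    kw = len(terms & body_words) / len(terms)
    path_hit = len(terms & path_words) / len(terms)
    return min(1.0, w_kw * kw + w_path * path_hit)

score = relevance("fix auth login bug", "src/auth/login.py",
                  "def login(user):\n    # fix: auth bug in token refresh\n")
print(round(score, 2))  # 0.8
```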
### Token budget optimization
Files are ranked by score and filled greedily into the token budget. Files that don't fit are **smart-truncated** (head + tail, never middle) rather than dropped entirely — the top of a file has imports and class defs; the tail has recent changes. Both are high-signal.
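A minimal sketch of the head-plus-tail idea described above. The split ratio and the roughly-4-characters-per-token estimate are assumptions made for this sketch, not ctxeng's actual truncation logic:

```python
def smart_truncate(text: str, max_tokens: int, head_frac: float = 0.6,
                   chars_per_token: int = 4) -> str:
    """Keep the head (imports, class defs) and tail (recent edits),
    drop the middle when the file exceeds its token budget."""
    budget_chars = max_tokens * chars_per_token
    if len(text) <= budget_chars:
        return text
    head = int(budget_chars * head_frac)
    tail = budget_chars - head
    return text[:head] + "\n# ... truncated ...\n" + text[-tail:]

src = "import os\n" + "x = 1\n" * 500 + "def new_feature(): ...\n"
out = smart_truncate(src, max_tokens=50)  # keeps both ends, drops the middle
```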
---
## Examples
### Debug a failing test
```python
from ctxeng import ContextBuilder
from ctxeng.integrations import ask_claude
ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .include_files("tests/test_payment.py", "src/payment/service.py")
    .with_system("You are a Python debugging expert.")
    .build("test_charge_user is failing with a KeyError on 'amount'")
)
response = ask_claude(ctx)
```
### Code review on a PR
```python
# Only include what changed in this branch vs main
ctx = (
    ContextBuilder(".")
    .for_model("gpt-4o")
    .from_git_diff(base="main")
    .with_system("Do a thorough code review. Flag security issues first.")
    .build("Review this pull request")
)
```
### Explain an unfamiliar codebase
```python
from ctxeng import ContextEngine
engine = ContextEngine(
    root="/path/to/project",
    model="gemini-1.5-pro",  # 1M token window → include everything
)
ctx = engine.build("Give me a high-level architecture overview")
print(ctx.to_string())
```
### Targeted refactor
```python
ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .only("src/database/**/*.py")
    .exclude("**/*_test.py")
    .build("Convert all raw SQL queries to use SQLAlchemy ORM")
)
```
---
## API Reference
### `ContextEngine`
```python
ContextEngine(
    root=".",                 # Project root
    model="claude-sonnet-4",  # Sets token budget automatically
    budget=None,              # Or explicit TokenBudget(total=50_000)
    max_file_size_kb=500,     # Skip files larger than this
    include_patterns=None,    # ["**/*.py"] — only these files
    exclude_patterns=None,    # ["tests/**"] — skip these
    use_git=True,             # Use git recency signal
    use_import_graph=True,    # Add local Python imports of scored files
    import_graph_depth=1,     # Hops along the import graph
)
```
```python
engine.build(
    query="",          # What you want the LLM to do
    files=None,        # Explicit list of paths (skips auto-discovery)
    git_diff=False,    # Only changed files
    git_base="HEAD",   # Diff base ref
    system_prompt="",  # System prompt (counts against budget)
    fmt="xml",         # "xml" | "markdown" | "plain"
)
# → Context
```
### `ContextBuilder` (fluent API)
```python
(
    ContextBuilder(root=".")
    .for_model("gpt-4o")
    .with_budget(total=50_000, reserved_output=4096)
    .only("**/*.py", "**/*.yaml")
    .exclude("tests/**", "migrations/**")
    .include_files("src/specific.py")
    .from_git_diff(base="main")
    .with_system("You are an expert Python engineer.")
    .max_file_size(200)  # KB
    .no_git()
    .use_import_graph(depth=2)  # optional; omit for default depth 1
    .build("query")
)
# → Context
```
### `Context`
```python
ctx.to_string(fmt="xml") # → str ready to paste into an LLM
ctx.summary(show_cost=True) # → summary; optional show_cost=False hides Est. cost
ctx.cost_estimate # → float | None (rough input USD for known models)
ctx.files # → list[ContextFile], sorted by relevance
ctx.skipped_files # → files that didn't fit the budget
ctx.total_tokens # → estimated token usage
ctx.budget.available # → remaining token budget
```
### `TokenBudget`
```python
TokenBudget.for_model("claude-sonnet-4") # auto-detect limit
TokenBudget(total=50_000, reserved_output=2048, reserved_system=512)
```
Supported models (auto-detected): `claude-opus-4`, `claude-sonnet-4`, `claude-haiku-4`, `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`, `gemini-1.5-pro`, `gemini-1.5-flash`, `llama-3`.
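Reading the `ctx.budget.available` attribute above, the implied arithmetic is presumably the total minus the reservations. The dataclass below is a guess at those semantics for illustration, not ctxeng's source:

```python
from dataclasses import dataclass

@dataclass
class BudgetSketch:
    """Assumed semantics: tokens available for files = total minus
    the output and system-prompt reservations."""
    total: int
    reserved_output: int = 0
    reserved_system: int = 0

    @property
    def available(self) -> int:
        return self.total - self.reserved_output - self.reserved_system

print(BudgetSketch(total=50_000, reserved_output=2048, reserved_system=512).available)
# 47440
```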
---
## CLI Reference
```
ctxeng [--root PATH] [options]
Commands:
  build    Build context for a query
  info     Show project info and file stats
  watch    Rebuild context automatically when files change
build options:
--model, -m Target model (default: claude-sonnet-4)
--fmt, -f Output format: xml | markdown | plain (default: xml)
--output, -o Write to file instead of stdout
--only Glob patterns to include
--exclude Glob patterns to exclude
--files Explicit file list
--git-diff Only include git-changed files
--git-base Git base ref (default: HEAD)
--system System prompt text
--budget Override total token budget
--no-git Disable git recency scoring
--max-size Max file size in KB (default: 500)
--import-graph / --no-import-graph
Expand with local Python import graph (default: on)
--import-graph-depth N
Import hops when import graph is on (default: 1)
--show-cost / --no-show-cost
Include estimated input cost in stderr summary (default: on)
--semantic Enable semantic similarity scoring (requires sentence-transformers)
--semantic-model Semantic model name (default: all-MiniLM-L6-v2)
watch options:
--interval S Polling interval in seconds (default: 1.0)
--semantic Enable semantic similarity scoring (requires sentence-transformers)
--semantic-model Semantic model name (default: all-MiniLM-L6-v2)
```
---
## Supported Models
| Model | Context window | Auto-detected |
|-------|---------------|---------------|
| claude-opus-4, claude-sonnet-4, claude-haiku-4 | 200K | ✓ |
| gpt-4o, gpt-4-turbo | 128K | ✓ |
| gpt-4 | 8K | ✓ |
| gpt-3.5-turbo | 16K | ✓ |
| gemini-1.5-pro, gemini-1.5-flash | 1M | ✓ |
| llama-3 | 32K | ✓ |
| any other | 32K (safe default) | — |
---
## Why not just paste files manually?
You could. But you'll hit these problems immediately:
- **Token limit errors** — too many files, context overflows
- **Irrelevant noise** — wrong files dilute signal, hurt output quality
- **Stale context** — you forget to update when code changes
- **Manual effort** — figuring out which files matter takes time
`ctxeng` solves all four. The right files, in the right order, trimmed to fit, every time.
---
## Roadmap
- [x] Semantic similarity scoring ✅
- [x] `ctxeng watch` — auto-rebuild on file changes ✅
- [x] VSCode extension ✅
- [x] Streaming context into LLM APIs ✅
- [x] Cost estimates (input-token USD hint in summary) ✅
- [x] Import graph analysis (local Python static imports) ✅
- [x] `.ctxengignore` file support ✅
---
## Contributing
PRs welcome! See [CONTRIBUTING.md](CONTRIBUTING.md).
```bash
git clone https://github.com/sayeem3051/python-context-engineer
cd python-context-engineer
pip install -e ".[dev]"
pytest
```
---
## License
MIT. Use freely, modify as needed, contribute back if you can.
---
If ctxeng saved you time, please ⭐ the repo — it helps others find it.