https://github.com/sayeem3051/python-context-engineer
Build perfect LLM context from your Python codebase — automatically
- Host: GitHub
- URL: https://github.com/sayeem3051/python-context-engineer
- Owner: Sayeem3051
- License: mit
- Created: 2026-04-04T12:24:40.000Z (22 days ago)
- Default Branch: main
- Last Pushed: 2026-04-08T16:51:53.000Z (18 days ago)
- Last Synced: 2026-04-08T20:43:15.428Z (18 days ago)
- Topics: artificial-intelligence, claude, codebase, context-engineering, developer-tools, gpt, llm, machine-learning, openai, prompt-engineering, python, token-optimization
- Language: Python
- Homepage: https://pypi.org/project/ctxeng/
- Size: 462 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# ctxeng — Python Context Engineering Library
Stop copy-pasting files into ChatGPT.
Build the perfect LLM context from your codebase, automatically.
---
**Context engineering** is the new prompt engineering.
The quality of your LLM's output depends almost entirely on *what you put in the context window* — not how you phrase the question.
`ctxeng` solves this automatically:
- **Scans your codebase** and scores every file for relevance to your query
- **Ranks by signal** — keyword overlap, AST symbols, git recency, import graph
- **Fits the budget** — smart truncation keeps the best parts within any model's token limit
- **Ships ready to paste** — XML, Markdown, or plain text output that works with Claude, GPT-4o, Gemini, and every other model
One small dependency ([pathspec](https://pypi.org/project/pathspec/)) powers `.ctxengignore` (gitignore-style patterns). Works with any LLM.
---
## Installation
```bash
# Core install (includes .ctxengignore support)
pip install ctxeng
# pathspec is included automatically
```
For accurate token counting (strongly recommended):
```bash
pip install "ctxeng[tiktoken]"
```
For file watching (used by `ctxeng watch`):
```bash
pip install "ctxeng[watch]"
```
For semantic similarity scoring (optional local embeddings):
```bash
pip install "ctxeng[semantic]"
```
For one-line LLM calls:
```bash
pip install "ctxeng[anthropic]" # Claude
pip install "ctxeng[openai]" # GPT-4o
pip install "ctxeng[all]" # everything
```
---
## Quickstart
### Python API
```python
from ctxeng import ContextEngine
engine = ContextEngine(root=".", model="claude-sonnet-4")
ctx = engine.build("Fix the authentication bug in the login flow")
print(ctx.summary())
# Context summary (12,340 tokens / 197,440 budget):
# Included : 8 files
# Skipped : 23 files (over budget)
# Est. cost: ~$0.037 (claude-sonnet-4)
# [████████ ] 0.84 src/auth/login.py
# [███████ ] 0.71 src/auth/middleware.py
# [█████ ] 0.53 src/models/user.py
# [████ ] 0.41 tests/test_auth.py
# ...
# Paste directly into your LLM
print(ctx.to_string())
```
### Fluent Builder API
```python
from ctxeng import ContextBuilder
ctx = (
    ContextBuilder(root=".")
    .for_model("gpt-4o")
    .only("**/*.py")
    .exclude("tests/**", "migrations/**")
    .from_git_diff()  # only changed files
    .with_system("You are a senior Python engineer. Be concise.")
    .build("Refactor the payment module to use async/await")
)
print(ctx.to_string("markdown"))
```
### One-line LLM call
```python
from ctxeng import ContextEngine
from ctxeng.integrations import ask_claude
engine = ContextEngine(".", model="claude-sonnet-4")
ctx = engine.build("Why is the test_login test failing?")
response = ask_claude(ctx)
print(response)
```
### CLI
```bash
# Build context for a query and print to stdout
ctxeng build "Fix the auth bug"
# Focused on git-changed files only
ctxeng build "Review my changes" --git-diff
# Target a specific model with markdown output
ctxeng build "Refactor this" --model gpt-4o --fmt markdown
# Save to file
ctxeng build "Explain the payment flow" --output context.md
# Project stats
ctxeng info
```
### Watch mode
Automatically rebuild context when files change (requires `watchdog`):
```bash
pip install "ctxeng[watch]"
ctxeng watch "Fix the auth bug" --output context.md
```
Example output:
```text
[14:32:01] File changed: src/auth/login.py
[14:32:01] Rebuilding context...
[14:32:01] Done. 8 files, 12,340 tokens, ~$0.037
[14:32:01] Written to: context.md
```
### `.ctxengignore`
Add a **`.ctxengignore`** file at your project root to exclude paths from filesystem discovery (same syntax as **`.gitignore`**). It is applied automatically when you run `ctxeng build`, `ctxeng info`, or `ContextEngine` / `ContextBuilder` without explicit `--files` / `include_files`.
Example `.ctxengignore`:
```gitignore
# Dependencies
node_modules/
venv/
.venv/
# Build artifacts
dist/
build/
*.egg-info/
# Migrations
migrations/**
**/migrations/**
# Lock files
*.lock
poetry.lock
package-lock.json
```
Supported patterns include `*`, `?`, `**`, directory slashes, and negation with `!` (full gitwildmatch semantics via pathspec). If `.ctxengignore` is missing, nothing is excluded beyond ctxeng’s built-in skips.
```python
from pathlib import Path
from ctxeng import parse_ctxengignore
patterns = parse_ctxengignore(Path("."))
# → list of pattern strings, or [] if no file
```
### Import graph (Python)
After files are scored, **ctxeng** parses static `import` / `from … import` statements in each discovered `.py` file, resolves **relative imports** from the file’s location, and can **pull in imported modules** from the same discovery set before the token budget is applied.
- **Default:** one hop (`import_graph_depth=1`), relevance for added files = parent score × **0.7**
- **Edges only** to files already in the current discovery set (filesystem / git / explicit list)
- Stdlib and third-party imports are ignored (no file under your root → no edge)
```python
from ctxeng import ContextEngine, ContextBuilder
# Engine: on by default; adjust depth or turn off
engine = ContextEngine(
    root=".",
    use_import_graph=True,
    import_graph_depth=2,
)

ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .use_import_graph(depth=2)  # follow two hops of local imports
    # .no_import_graph()        # disable expansion
    .build("Fix the checkout bug in orders")
)
```
CLI (import expansion is **on** by default):
```bash
ctxeng build "Refactor auth" --no-import-graph
ctxeng build "Refactor auth" --import-graph-depth 2
```
Lower-level API:
```python
from pathlib import Path
from ctxeng import build_import_graph, expand_with_imports
from ctxeng.models import ContextFile
paths = [Path("src/app.py"), Path("src/lib.py")]
graph = build_import_graph(Path("."), paths)
# graph[path] → list of imported paths (within `paths`)
expanded = expand_with_imports(
    [ContextFile(path=paths[0], content="...", relevance_score=0.9, language="python")],
    graph,
    Path("."),
    max_depth=1,
    score_decay=0.7,
)
```
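The decay rule described above (an added file gets its parent's score times 0.7 per hop, only along edges inside the discovery set) can be sketched as a small breadth-first walk. The function and names below are illustrative, not ctxeng's internals:

```python
from collections import deque

def decayed_scores(graph: dict[str, list[str]], seeds: dict[str, float],
                   max_depth: int = 1, decay: float = 0.7) -> dict[str, float]:
    """Walk local-import edges from already-scored files; each hop
    multiplies the parent's relevance by `decay`, keeping the best
    score when a file is reachable from several parents."""
    scores = dict(seeds)
    frontier = deque((f, s, 0) for f, s in seeds.items())
    while frontier:
        node, score, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for child in graph.get(node, []):
            candidate = score * decay
            if candidate > scores.get(child, 0.0):
                scores[child] = candidate
                frontier.append((child, candidate, depth + 1))
    return scores

graph = {"app.py": ["lib.py"], "lib.py": ["util.py"]}
scores = decayed_scores(graph, {"app.py": 0.9}, max_depth=2)
# app.py stays 0.9; lib.py ≈ 0.63; util.py ≈ 0.441
```

With `max_depth=1` (the default), `util.py` would not be pulled in at all, which matches the one-hop default above.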
### Cost estimates
`ContextEngine` fills `ctx.cost_estimate` with a **rough USD** figure for **input tokens** only, using built-in per-1K rates for common models (see `ctxeng.costs.COST_PER_1K_INPUT_TOKENS`). Unknown model names yield `None`. Rates are indicative—verify with your provider before budgeting.
`Context.summary()` includes a line when a cost is known:
```text
Context summary (12,340 tokens / 197,440 budget):
Included : 8 files
Skipped : 23 files (over budget)
Est. cost: ~$0.037 (claude-sonnet-4)
```
```python
from ctxeng import estimate_cost, ContextEngine
engine = ContextEngine(root=".", model="gpt-4o")
ctx = engine.build("Explain this module")
print(ctx.cost_estimate) # float | None
print(ctx.summary()) # includes Est. cost when known
```
CLI: the cost line is **on** by default; use `--no-show-cost` to omit it from the stderr summary.
---
## How It Works
### Scoring signals
Each file gets a relevance score from 0 → 1, combining:
| Signal | What it measures |
|--------|-----------------|
| **Keyword overlap** | How many query terms appear in the file content |
| **AST symbols** | Class/function/import names that match the query (Python) |
| **Path relevance** | Filename and directory names matching query tokens |
| **Git recency** | Files touched in recent commits score higher |
| **Import expansion** | After scoring, locally imported Python modules can be added with a decayed score |
| **Semantic similarity** | Optional embedding similarity between query and file content |
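As a rough illustration of how two of these signals might blend into a single 0 → 1 score, here is a toy version. The weights and tokenization are invented for the sketch and are not ctxeng's actual formula:

```python
import re

def relevance(query: str, path: str, content: str,
              w_kw: float = 0.6, w_path: float = 0.4) -> float:
    """Toy blend: fraction of query terms found in the file body,
    plus the fraction found in the path, weighted and capped at 1."""
    terms = set(re.findall(r"[a-z]+", query.lower()))
    if not terms:
        return 0.0
    body_words = set(re.findall(r"[a-z]+", content.lower()))
    path_words = set(re.findall(r"[a-z]+", path.lower()))
    kw = len(terms & body_words) / len(terms)
    path_hit = len(terms & path_words) / len(terms)
    return min(1.0, w_kw * kw + w_path * path_hit)

score = relevance("fix auth login bug", "src/auth/login.py",
                  "def login(user):\n    # fix: auth bug in token refresh\n")
print(round(score, 2))  # 0.8
```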
### Token budget optimization
Files are ranked by score and filled greedily into the token budget. Files that don't fit are **smart-truncated** (head + tail, never middle) rather than dropped entirely — the top of a file has imports and class defs; the tail has recent changes. Both are high-signal.
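A minimal sketch of the head-plus-tail idea described above. The split ratio and the roughly-4-characters-per-token estimate are assumptions made for this sketch, not ctxeng's actual truncation logic:

```python
def smart_truncate(text: str, max_tokens: int, head_frac: float = 0.6,
                   chars_per_token: int = 4) -> str:
    """Keep the head (imports, class defs) and tail (recent edits),
    drop the middle when the file exceeds its token budget."""
    budget_chars = max_tokens * chars_per_token
    if len(text) <= budget_chars:
        return text
    head = int(budget_chars * head_frac)
    tail = budget_chars - head
    return text[:head] + "\n# ... truncated ...\n" + text[-tail:]

src = "import os\n" + "x = 1\n" * 500 + "def new_feature(): ...\n"
out = smart_truncate(src, max_tokens=50)  # keeps both ends, drops the middle
```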
---
## Examples
### Debug a failing test
```python
from ctxeng import ContextBuilder
from ctxeng.integrations import ask_claude
ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .include_files("tests/test_payment.py", "src/payment/service.py")
    .with_system("You are a Python debugging expert.")
    .build("test_charge_user is failing with a KeyError on 'amount'")
)
response = ask_claude(ctx)
```
### Code review on a PR
```python
# Only include what changed in this branch vs main
ctx = (
    ContextBuilder(".")
    .for_model("gpt-4o")
    .from_git_diff(base="main")
    .with_system("Do a thorough code review. Flag security issues first.")
    .build("Review this pull request")
)
```
### Explain an unfamiliar codebase
```python
from ctxeng import ContextEngine
engine = ContextEngine(
    root="/path/to/project",
    model="gemini-1.5-pro",  # 1M token window → include everything
)
ctx = engine.build("Give me a high-level architecture overview")
print(ctx.to_string())
```
### Targeted refactor
```python
ctx = (
    ContextBuilder(".")
    .for_model("claude-sonnet-4")
    .only("src/database/**/*.py")
    .exclude("**/*_test.py")
    .build("Convert all raw SQL queries to use SQLAlchemy ORM")
)
```
---
## API Reference
### `ContextEngine`
```python
ContextEngine(
    root=".",                 # Project root
    model="claude-sonnet-4",  # Sets token budget automatically
    budget=None,              # Or explicit TokenBudget(total=50_000)
    max_file_size_kb=500,     # Skip files larger than this
    include_patterns=None,    # ["**/*.py"] — only these files
    exclude_patterns=None,    # ["tests/**"] — skip these
    use_git=True,             # Use git recency signal
    use_import_graph=True,    # Add local Python imports of scored files
    import_graph_depth=1,     # Hops along the import graph
)
```
```python
engine.build(
    query="",          # What you want the LLM to do
    files=None,        # Explicit list of paths (skips auto-discovery)
    git_diff=False,    # Only changed files
    git_base="HEAD",   # Diff base ref
    system_prompt="",  # System prompt (counts against budget)
    fmt="xml",         # "xml" | "markdown" | "plain"
)
# → Context
```
### `ContextBuilder` (fluent API)
```python
(
    ContextBuilder(root=".")
    .for_model("gpt-4o")
    .with_budget(total=50_000, reserved_output=4096)
    .only("**/*.py", "**/*.yaml")
    .exclude("tests/**", "migrations/**")
    .include_files("src/specific.py")
    .from_git_diff(base="main")
    .with_system("You are an expert Python engineer.")
    .max_file_size(200)  # KB
    .no_git()
    .use_import_graph(depth=2)  # optional; omit for default depth 1
    .build("query")
)
# → Context
```
### `Context`
```python
ctx.to_string(fmt="xml") # → str ready to paste into an LLM
ctx.summary(show_cost=True) # → summary; optional show_cost=False hides Est. cost
ctx.cost_estimate # → float | None (rough input USD for known models)
ctx.files # → list[ContextFile], sorted by relevance
ctx.skipped_files # → files that didn't fit the budget
ctx.total_tokens # → estimated token usage
ctx.budget.available # → remaining token budget
```
### `TokenBudget`
```python
TokenBudget.for_model("claude-sonnet-4") # auto-detect limit
TokenBudget(total=50_000, reserved_output=2048, reserved_system=512)
```
Supported models (auto-detected): `claude-opus-4`, `claude-sonnet-4`, `claude-haiku-4`, `gpt-4o`, `gpt-4-turbo`, `gpt-4`, `gpt-3.5-turbo`, `gemini-1.5-pro`, `gemini-1.5-flash`, `llama-3`.
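Reading the `ctx.budget.available` attribute above, the implied arithmetic is presumably the total minus the reservations. The dataclass below is a guess at those semantics for illustration, not ctxeng's source:

```python
from dataclasses import dataclass

@dataclass
class BudgetSketch:
    """Assumed semantics: tokens available for files = total minus
    the output and system-prompt reservations."""
    total: int
    reserved_output: int = 0
    reserved_system: int = 0

    @property
    def available(self) -> int:
        return self.total - self.reserved_output - self.reserved_system

print(BudgetSketch(total=50_000, reserved_output=2048, reserved_system=512).available)
# 47440
```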
---
## CLI Reference
```
ctxeng [--root PATH] [options]
Commands:
  build    Build context for a query
  info     Show project info and file stats
  watch    Rebuild context automatically when files change
build options:
--model, -m Target model (default: claude-sonnet-4)
--fmt, -f Output format: xml | markdown | plain (default: xml)
--output, -o Write to file instead of stdout
--only Glob patterns to include
--exclude Glob patterns to exclude
--files Explicit file list
--git-diff Only include git-changed files
--git-base Git base ref (default: HEAD)
--system System prompt text
--budget Override total token budget
--no-git Disable git recency scoring
--max-size Max file size in KB (default: 500)
--import-graph / --no-import-graph
Expand with local Python import graph (default: on)
--import-graph-depth N
Import hops when import graph is on (default: 1)
--show-cost / --no-show-cost
Include estimated input cost in stderr summary (default: on)
--semantic Enable semantic similarity scoring (requires sentence-transformers)
--semantic-model Semantic model name (default: all-MiniLM-L6-v2)
watch options:
--interval S Polling interval in seconds (default: 1.0)
--semantic Enable semantic similarity scoring (requires sentence-transformers)
--semantic-model Semantic model name (default: all-MiniLM-L6-v2)
```
---
## Supported Models
| Model | Context window | Auto-detected |
|-------|---------------|---------------|
| claude-opus-4, claude-sonnet-4, claude-haiku-4 | 200K | ✓ |
| gpt-4o, gpt-4-turbo | 128K | ✓ |
| gpt-4 | 8K | ✓ |
| gpt-3.5-turbo | 16K | ✓ |
| gemini-1.5-pro, gemini-1.5-flash | 1M | ✓ |
| llama-3 | 32K | ✓ |
| any other | 32K (safe default) | — |
---
## Why not just paste files manually?
You could. But you'll hit these problems immediately:
- **Token limit errors** — too many files, context overflows
- **Irrelevant noise** — wrong files dilute signal, hurt output quality
- **Stale context** — you forget to update when code changes
- **Manual effort** — figuring out which files matter takes time
`ctxeng` solves all four. The right files, in the right order, trimmed to fit, every time.
---
## Roadmap
- [x] Semantic similarity scoring ✅
- [x] `ctxeng watch` — auto-rebuild on file changes ✅
- [x] VSCode extension ✅
- [x] Streaming context into LLM APIs ✅
- [x] Cost estimates (input-token USD hint in summary) ✅
- [x] Import graph analysis (local Python static imports) ✅
- [x] `.ctxengignore` file support ✅
---
## Contributing
PRs welcome! See [CONTRIBUTING.md](CONTRIBUTING.md).
```bash
git clone https://github.com/sayeem3051/python-context-engineer
cd python-context-engineer
pip install -e ".[dev]"
pytest
```
---
## License
MIT. Use freely, modify as needed, contribute back if you can.
---
If ctxeng saved you time, please ⭐ the repo — it helps others find it.