https://github.com/vstorm-co/summarization-pydantic-ai
Context Management processor for Pydantic AI agents, providing LLM-powered summarization or zero-cost sliding window trimming to handle infinite/long-running conversations without context overflow. Supports flexible triggers, safe cutoffs, and custom prompts for efficient AI apps.
- Host: GitHub
- URL: https://github.com/vstorm-co/summarization-pydantic-ai
- Owner: vstorm-co
- License: MIT
- Created: 2026-01-20T21:21:46.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-01-22T21:36:37.000Z (about 2 months ago)
- Last Synced: 2026-01-23T16:20:17.478Z (about 2 months ago)
- Topics: ai-agents, anthropic, chatgpt, claude, context-engineering, deepagents, gemini, llm, pydantic-ai, python
- Language: Python
- Homepage: https://vstorm-co.github.io/summarization-pydantic-ai/
- Size: 938 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# Context Management for Pydantic AI

**Automatic Conversation Summarization and History Management**

- **Intelligent Summarization** — LLM-powered context compression
- **Sliding Window** — zero-cost message trimming
- **Context Manager** — real-time token tracking + tool truncation
- **Safe Cutoff** — preserves tool call pairs
---
**Context Management for Pydantic AI** helps your [Pydantic AI](https://ai.pydantic.dev/) agents handle long conversations without exceeding model context limits. Choose between intelligent LLM summarization or fast sliding window trimming.
> **Full framework?** Check out [Pydantic Deep Agents](https://github.com/vstorm-co/pydantic-deepagents) — complete agent framework with planning, filesystem, subagents, and skills.
## Use Cases
| What You Want to Build | How This Library Helps |
|------------------------|------------------------|
| **Long-Running Agent** | Automatically compress history when context fills up |
| **Customer Support Bot** | Preserve key details while discarding routine exchanges |
| **Code Assistant** | Keep recent code context, summarize older discussions |
| **High-Throughput App** | Zero-cost sliding window for maximum speed |
| **Cost-Sensitive App** | Choose between quality (summarization) or free (sliding window) |
## Installation
```bash
pip install summarization-pydantic-ai
```
Or with uv:
```bash
uv add summarization-pydantic-ai
```
For accurate token counting:
```bash
pip install summarization-pydantic-ai[tiktoken]
```
For real-time token tracking and tool output truncation:
```bash
pip install summarization-pydantic-ai[hybrid]
```
## Quick Start
```python
from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor
processor = create_summarization_processor(
trigger=("tokens", 100000),
keep=("messages", 20),
)
agent = Agent(
"openai:gpt-4o",
history_processors=[processor],
)
result = await agent.run("Hello!")  # inside an async function; agent.run_sync(...) works outside one
```
**That's it.** Your agent now:
- Monitors conversation size on every turn
- Summarizes older messages when limits are reached
- Preserves tool call/response pairs (never breaks them)
- Keeps recent context intact
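The "never breaks tool call/response pairs" guarantee can be illustrated with a small sketch. The message shapes and helper below are hypothetical, not the library's internals: if a naive cutoff would keep a tool response whose matching call was discarded, the cutoff is moved back so the pair stays together.

```python
# Hypothetical message records; the real library works on
# pydantic_ai message objects, not plain dicts.
messages = [
    {"kind": "user"},         # 0
    {"kind": "tool-call"},    # 1
    {"kind": "tool-return"},  # 2
    {"kind": "assistant"},    # 3
]

def safe_cutoff(messages, cutoff):
    """Move the cutoff earlier if it would orphan a tool-return."""
    while cutoff > 0 and messages[cutoff]["kind"] == "tool-return":
        cutoff -= 1  # back up so the matching tool-call is kept too
    return cutoff

# Naively keeping the last two messages would start at index 2,
# orphaning the tool-return; the safe cutoff backs up to index 1.
print(safe_cutoff(messages, 2))  # -> 1
```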
## Available Processors
| Processor | LLM Cost | Latency | Context Preservation |
|-----------|----------|---------|---------------------|
| `SummarizationProcessor` | High | High | Intelligent summary |
| `SlidingWindowProcessor` | Zero | ~0ms | Discards old messages |
| `ContextManagerMiddleware` | Per compression | Low tracking / High compression | Intelligent summary |
### Intelligent Summarization
Uses an LLM to create summaries of older messages:
```python
from pydantic_ai_summarization import create_summarization_processor
processor = create_summarization_processor(
trigger=("tokens", 100000), # When to summarize
keep=("messages", 20), # What to keep
)
```
### Zero-Cost Sliding Window
Simply discards old messages — no LLM calls:
```python
from pydantic_ai_summarization import create_sliding_window_processor
processor = create_sliding_window_processor(
trigger=("messages", 100), # When to trim
keep=("messages", 50), # What to keep
)
```
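Conceptually, the sliding-window strategy is just a list slice once the trigger fires. A minimal sketch of that idea (an illustrative function, not the library's implementation):

```python
def sliding_window(messages, trigger_count, keep_count):
    """Drop oldest messages once the count exceeds the trigger."""
    if len(messages) <= trigger_count:
        return messages  # under the threshold: no trimming
    return messages[-keep_count:]  # keep only the most recent N

history = list(range(120))  # stand-in for 120 messages
trimmed = sliding_window(history, trigger_count=100, keep_count=50)
print(len(trimmed))  # -> 50; trimmed[0] is message 70
```

No LLM is involved, which is why this strategy has zero cost and near-zero latency.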
### Real-Time Context Manager
Dual-protocol middleware combining token tracking, auto-compression, and tool output truncation:
```python
from pydantic_ai import Agent
from pydantic_ai_summarization import create_context_manager_middleware
middleware = create_context_manager_middleware(
max_tokens=200_000,
compress_threshold=0.9,
on_usage_update=lambda pct, cur, mx: print(f"{pct:.0%} used ({cur:,}/{mx:,})"),
)
agent = Agent(
"openai:gpt-4o",
history_processors=[middleware],
)
```
Requires `pip install summarization-pydantic-ai[hybrid]`
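The `on_usage_update` callback receives the used fraction, current token count, and maximum. Written as a plain function, the format string from the lambda above behaves like this:

```python
def on_usage_update(pct, cur, mx):
    # Same format string as the lambda in the example above
    return f"{pct:.0%} used ({cur:,}/{mx:,})"

print(on_usage_update(0.9, 180_000, 200_000))  # -> "90% used (180,000/200,000)"
```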
## Trigger Types
| Type | Example | Description |
|------|---------|-------------|
| `messages` | `("messages", 50)` | Trigger when message count exceeds threshold |
| `tokens` | `("tokens", 100000)` | Trigger when token count exceeds threshold |
| `fraction` | `("fraction", 0.8)` | Trigger at percentage of max_input_tokens |
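The three trigger types can be read as one predicate over current usage. A hedged sketch of how such a spec might be evaluated (the helper and its signature are illustrative, not the library's code):

```python
def should_trigger(spec, *, num_messages, num_tokens, max_input_tokens=None):
    """Evaluate a (type, threshold) trigger spec against current usage."""
    kind, threshold = spec
    if kind == "messages":
        return num_messages > threshold
    if kind == "tokens":
        return num_tokens > threshold
    if kind == "fraction":
        # threshold is a fraction of the model's context window
        return num_tokens > threshold * max_input_tokens
    raise ValueError(f"unknown trigger type: {kind}")

print(should_trigger(("messages", 50), num_messages=51, num_tokens=0))  # -> True
print(should_trigger(("fraction", 0.8), num_messages=0,
                     num_tokens=110_000, max_input_tokens=128_000))     # -> True
```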
## Keep Types
| Type | Example | Description |
|------|---------|-------------|
| `messages` | `("messages", 20)` | Keep last N messages |
| `tokens` | `("tokens", 10000)` | Keep last N tokens worth |
| `fraction` | `("fraction", 0.2)` | Keep last N% of context |
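A keep spec answers "how much recent history to retain." A sketch of resolving one against a message list (illustrative helper with hypothetical names, not the library's code): `messages` keeps a fixed tail, while `tokens` walks newest-to-oldest until a token budget is spent, and `fraction` converts to a token budget first.

```python
def resolve_keep(spec, messages, count_tokens, max_input_tokens=None):
    """Return the tail of `messages` matching a (type, amount) keep spec.

    `count_tokens` is a caller-supplied per-message token counter.
    """
    kind, amount = spec
    if kind == "messages":
        return messages[-amount:]
    if kind == "fraction":
        amount = amount * max_input_tokens  # convert to a token budget
        kind = "tokens"
    if kind == "tokens":
        kept, budget = [], amount
        for msg in reversed(messages):  # walk newest -> oldest
            cost = count_tokens(msg)
            if cost > budget:
                break
            kept.append(msg)
            budget -= cost
        return list(reversed(kept))
    raise ValueError(f"unknown keep type: {kind}")

msgs = ["a" * 40, "b" * 40, "c" * 40]  # ~10 tokens each at 4 chars/token
tail = resolve_keep(("tokens", 20), msgs, lambda m: len(m) // 4)
print(len(tail))  # -> 2 (the two newest messages fit the budget)
```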
## Advanced Configuration
### Multiple Triggers
```python
from pydantic_ai_summarization import SummarizationProcessor
processor = SummarizationProcessor(
model="openai:gpt-4o",
trigger=[
("messages", 50), # OR 50+ messages
("tokens", 100000), # OR 100k+ tokens
],
keep=("messages", 10),
)
```
### Fraction-Based
```python
processor = SummarizationProcessor(
model="openai:gpt-4o",
trigger=("fraction", 0.8), # 80% of context window
keep=("fraction", 0.2), # Keep last 20%
    max_input_tokens=128000,      # GPT-4o's context window
)
```
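With `max_input_tokens=128000`, those fractions work out to concrete token thresholds:

```python
max_input_tokens = 128_000
trigger_at = 0.8 * max_input_tokens   # summarize once history passes ~102,400 tokens
keep_budget = 0.2 * max_input_tokens  # retain roughly 25,600 tokens of recent context
print(trigger_at, keep_budget)
```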
### Custom Token Counter
```python
def my_token_counter(messages):
return sum(len(str(msg)) for msg in messages) // 4
processor = create_summarization_processor(
token_counter=my_token_counter,
)
```
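The counter above uses a common rough rule for English text: about four characters per token. Exercising it on a sample message list:

```python
def my_token_counter(messages):
    # Same chars-per-token heuristic as in the snippet above
    return sum(len(str(msg)) for msg in messages) // 4

sample = ["hello world", "x" * 100]
print(my_token_counter(sample))  # -> (11 + 100) // 4 = 27
```

For exact counts on OpenAI models, install the `tiktoken` extra instead.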
### Custom Model (e.g., Azure OpenAI)
```python
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai_summarization import create_summarization_processor
azure_model = OpenAIModel(
"gpt-4o",
provider=OpenAIProvider(
base_url="https://my-resource.openai.azure.com/openai/deployments/gpt-4o",
api_key="your-azure-api-key",
),
)
processor = create_summarization_processor(
model=azure_model,
trigger=("tokens", 100000),
keep=("messages", 20),
)
```
### Custom Summary Prompt
```python
processor = create_summarization_processor(
summary_prompt="""
Extract key information from this conversation.
Focus on: decisions made, code written, pending tasks.
Conversation:
{messages}
""",
)
```
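Assuming the `{messages}` placeholder is filled with the rendered conversation (an assumption about the template contract, inferred from the snippet above), standard `str.format` substitution would look like:

```python
summary_prompt = (
    "Extract key information from this conversation.\n"
    "Focus on: decisions made, code written, pending tasks.\n"
    "Conversation:\n{messages}"
)
# Hypothetical rendering step: fill the placeholder with conversation text
rendered = summary_prompt.format(messages="user: hi\nassistant: hello")
print(rendered.endswith("assistant: hello"))  # -> True
```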
## Why Choose This Library?
| Feature | Description |
|---------|-------------|
| **Two Strategies** | Intelligent summarization or fast sliding window |
| **Flexible Triggers** | Message count, token count, or fraction-based |
| **Safe Cutoff** | Never breaks tool call/response pairs |
| **Custom Counters** | Bring your own token counting logic |
| **Custom Prompts** | Control how summaries are generated |
| **Token Tracking** | Real-time usage monitoring with callbacks |
| **Tool Truncation** | Automatic truncation of large tool outputs |
| **Custom Models** | Use any pydantic-ai Model (Azure, custom providers) |
| **Lightweight** | Only requires pydantic-ai-slim (no extra model SDKs) |
## Related Projects
| Package | Description |
|---------|-------------|
| [Pydantic Deep Agents](https://github.com/vstorm-co/pydantic-deepagents) | Full agent framework (uses this library) |
| [pydantic-ai-backend](https://github.com/vstorm-co/pydantic-ai-backend) | File storage and Docker sandbox |
| [pydantic-ai-todo](https://github.com/vstorm-co/pydantic-ai-todo) | Task planning toolset |
| [subagents-pydantic-ai](https://github.com/vstorm-co/subagents-pydantic-ai) | Multi-agent orchestration |
| [pydantic-ai](https://github.com/pydantic/pydantic-ai) | The foundation — agent framework by Pydantic |
## Contributing
```bash
git clone https://github.com/vstorm-co/summarization-pydantic-ai.git
cd summarization-pydantic-ai
make install
make test # 100% coverage required
```
## License
MIT — see [LICENSE](LICENSE)
Built with ❤️ by vstorm-co