https://github.com/vstorm-co/summarization-pydantic-ai

Context Management processor for Pydantic AI agents, providing LLM-powered summarization or zero-cost sliding window trimming to handle infinite/long-running conversations without context overflow. Supports flexible triggers, safe cutoffs, and custom prompts for efficient AI apps.
Host: GitHub
URL: https://github.com/vstorm-co/summarization-pydantic-ai
Owner: vstorm-co
License: mit
Created: 2026-01-20T21:21:46.000Z (about 2 months ago)
Default Branch: main
Last Pushed: 2026-01-22T21:36:37.000Z (about 2 months ago)
Last Synced: 2026-01-23T16:20:17.478Z (about 2 months ago)
Topics: ai-agents, anthropic, chatgpt, claude, context-engineering, deepagents, gemini, llm, pydantic-ai, python
Language: Python
Homepage: https://vstorm-co.github.io/summarization-pydantic-ai/
Size: 938 KB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# Context Management for Pydantic AI

Automatic Conversation Summarization and History Management

- Intelligent Summarization: LLM-powered context compression
- Sliding Window: zero-cost message trimming
- Context Manager: real-time token tracking + tool truncation
- Safe Cutoff: preserves tool call pairs

---

**Context Management for Pydantic AI** helps your [Pydantic AI](https://ai.pydantic.dev/) agents handle long conversations without exceeding model context limits. Choose between intelligent LLM summarization and fast sliding window trimming.

> **Full framework?** Check out [Pydantic Deep Agents](https://github.com/vstorm-co/pydantic-deepagents) — complete agent framework with planning, filesystem, subagents, and skills.

## Use Cases

| What You Want to Build | How This Library Helps |
|------------------------|------------------------|
| **Long-Running Agent** | Automatically compress history when context fills up |
| **Customer Support Bot** | Preserve key details while discarding routine exchanges |
| **Code Assistant** | Keep recent code context, summarize older discussions |
| **High-Throughput App** | Zero-cost sliding window for maximum speed |
| **Cost-Sensitive App** | Choose between quality (summarization) and zero LLM cost (sliding window) |

## Installation

```bash
pip install summarization-pydantic-ai
```

Or with uv:

```bash
uv add summarization-pydantic-ai
```

For accurate token counting:

```bash
# Quote the extra so shells such as zsh do not expand the brackets
pip install "summarization-pydantic-ai[tiktoken]"
```

For real-time token tracking and tool output truncation:

```bash
pip install "summarization-pydantic-ai[hybrid]"
```

## Quick Start

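```python
import asyncio

from pydantic_ai import Agent
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # Summarize once history exceeds 100k tokens
    keep=("messages", 20),       # Keep the 20 most recent messages as-is
)

agent = Agent(
    "openai:gpt-4o",
    history_processors=[processor],
)


async def main() -> None:
    # `await` must run inside a coroutine, so wrap the call in main()
    result = await agent.run("Hello!")
    print(result.output)


asyncio.run(main())
```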

**That's it.** Your agent now:

- Monitors conversation size on every turn

- Summarizes older messages when limits are reached

- Preserves tool call/response pairs (never breaks them)

- Keeps recent context intact
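
The "never breaks tool call/response pairs" guarantee can be pictured with a small, library-independent sketch. The `Msg` dataclass and `safe_cutoff` helper are hypothetical names used only for illustration and are not the library's internals; the idea is simply to move a proposed cutoff earlier until the kept tail no longer begins with a tool result whose call would be discarded.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    """Hypothetical stand-in for a chat message (not a pydantic-ai type)."""
    role: str  # "user", "assistant", "tool_call", or "tool_result"
    text: str


def safe_cutoff(messages: list[Msg], keep_last: int) -> int:
    """Return an index so that messages[index:] holds at least `keep_last`
    messages and never starts with an orphaned tool result."""
    cutoff = max(len(messages) - keep_last, 0)
    # Step backwards while the kept tail would begin with a tool result.
    while 0 < cutoff < len(messages) and messages[cutoff].role == "tool_result":
        cutoff -= 1
    return cutoff


history = [
    Msg("user", "What's the weather in Paris?"),
    Msg("tool_call", "get_weather(city='Paris')"),
    Msg("tool_result", "18C, cloudy"),
    Msg("assistant", "It is 18C and cloudy in Paris."),
]

# Keeping only the last 2 messages would start at the tool result (index 2),
# so the cutoff moves back to index 1 and the call stays with its result.
idx = safe_cutoff(history, keep_last=2)
kept = history[idx:]
print(idx, [m.role for m in kept])
```

In this sketch the kept tail grows by one message rather than splitting the pair, which mirrors the trade-off described above: slightly more context is retained in exchange for a valid message sequence.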

## Available Processors

| Processor | LLM Cost | Latency | Context Preservation |
|-----------|----------|---------|---------------------|
| `SummarizationProcessor` | High | High | Intelligent summary |
| `SlidingWindowProcessor` | Zero | ~0ms | Discards old messages |
| `ContextManagerMiddleware` | Per compression | Low tracking / High compression | Intelligent summary |

### Intelligent Summarization

Uses an LLM to create summaries of older messages:

```python
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    trigger=("tokens", 100000),  # When to summarize
    keep=("messages", 20),       # What to keep
)
```

### Zero-Cost Sliding Window

Simply discards old messages — no LLM calls:

```python
from pydantic_ai_summarization import create_sliding_window_processor

processor = create_sliding_window_processor(
    trigger=("messages", 100),  # When to trim
    keep=("messages", 50),      # What to keep
)
```
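
As a rough mental model, the trimming step behaves like the list operation below. This is a minimal sketch on plain strings, assuming message-count values for both `trigger` and `keep`; `sliding_window` is a hypothetical helper, not the library's function, and it omits the safe-cutoff adjustment for tool call pairs.

```python
def sliding_window(messages: list[str], trigger: int, keep: int) -> list[str]:
    """Drop the oldest messages once the count exceeds `trigger`,
    keeping only the most recent `keep` messages."""
    if len(messages) <= trigger:
        return messages  # below the threshold: nothing to trim
    return messages[-keep:] if keep > 0 else []


history = [f"message {i}" for i in range(120)]
trimmed = sliding_window(history, trigger=100, keep=50)
print(len(trimmed), trimmed[0])  # 50 message 70
```

Because `keep` is smaller than `trigger`, the history is not trimmed again until another 50 messages accumulate, which avoids trimming on every turn.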

### Real-Time Context Manager

Dual-protocol middleware combining token tracking, auto-compression, and tool output truncation:

```python
from pydantic_ai import Agent
from pydantic_ai_summarization import create_context_manager_middleware

middleware = create_context_manager_middleware(
    max_tokens=200_000,
    compress_threshold=0.9,
    on_usage_update=lambda pct, cur, mx: print(f"{pct:.0%} used ({cur:,}/{mx:,})"),
)

agent = Agent(
    "openai:gpt-4o",
    history_processors=[middleware],
)
```

Requires the `hybrid` extra: `pip install "summarization-pydantic-ai[hybrid]"`
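
The two ideas behind this middleware, a usage fraction compared against `compress_threshold` and truncation of oversized tool outputs, can be sketched without the library. The helper names, the `>=` comparison, and the 2000-character default below are assumptions made for illustration, not the library's actual behavior.

```python
def usage_report(current_tokens: int, max_tokens: int, compress_threshold: float):
    """Return (fraction_used, should_compress) for the current context size."""
    pct = current_tokens / max_tokens
    # Assumption: compression starts once usage reaches the threshold.
    return pct, pct >= compress_threshold


def truncate_tool_output(text: str, max_chars: int = 2000) -> str:
    """Cut an oversized tool output and note how much was dropped."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    return text[:max_chars] + f"\n[... truncated {dropped} characters]"


pct, compress = usage_report(190_000, 200_000, compress_threshold=0.9)
print(f"{pct:.0%} used, compress={compress}")  # 95% used, compress=True

short = truncate_tool_output("x" * 5000, max_chars=100)
print(short.splitlines()[-1])  # [... truncated 4900 characters]
```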

## Trigger Types

| Type | Example | Description |
|------|---------|-------------|
| `messages` | `("messages", 50)` | Trigger when message count exceeds threshold |
| `tokens` | `("tokens", 100000)` | Trigger when token count exceeds threshold |
| `fraction` | `("fraction", 0.8)` | Trigger at a fraction of `max_input_tokens` |

## Keep Types

| Type | Example | Description |
|------|---------|-------------|
| `messages` | `("messages", 20)` | Keep last N messages |
| `tokens` | `("tokens", 10000)` | Keep last N tokens worth |
| `fraction` | `("fraction", 0.2)` | Keep last N% of context |
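
To make the three keep types concrete, the sketch below resolves a keep tuple into the index of the first message to retain. Everything here is hypothetical: `keep_start_index` and `count_tokens` are illustrative names, the 4-characters-per-token heuristic is an assumption, and `fraction` is interpreted as a share of `max_input_tokens`. The library's own resolution logic may differ.

```python
def count_tokens(message: str) -> int:
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return max(len(message) // 4, 1)


def keep_start_index(messages, keep, max_input_tokens=None):
    """Return the index of the first message to keep."""
    kind, value = keep
    if kind == "messages":
        return max(len(messages) - int(value), 0)
    if kind == "fraction":
        if max_input_tokens is None:
            raise ValueError("fraction keep needs max_input_tokens")
        budget = value * max_input_tokens
    elif kind == "tokens":
        budget = value
    else:
        raise ValueError(f"unknown keep type: {kind}")
    # Walk backwards from the newest message until the budget is used up.
    used = 0
    start = len(messages)
    for i in range(len(messages) - 1, -1, -1):
        cost = count_tokens(messages[i])
        if used + cost > budget:
            break
        used += cost
        start = i
    return start


messages = ["a" * 40] * 10  # ten messages of about 10 tokens each
print(keep_start_index(messages, ("messages", 3)))         # 7
print(keep_start_index(messages, ("tokens", 25)))          # 8
print(keep_start_index(messages, ("fraction", 0.5), 100))  # 5
```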

## Advanced Configuration

### Multiple Triggers

```python
from pydantic_ai_summarization import SummarizationProcessor

processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=[
        ("messages", 50),    # OR 50+ messages
        ("tokens", 100000),  # OR 100k+ tokens
    ],
    keep=("messages", 10),
)
```
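
The comments above indicate that listed triggers are combined with OR. A minimal, library-independent sketch of that evaluation follows; `should_trigger` is a hypothetical helper, and the strict `>` comparison is an assumption based on the "exceeds threshold" wording in the Trigger Types table.

```python
from typing import Optional

Trigger = tuple[str, float]


def should_trigger(
    triggers: list[Trigger],
    message_count: int,
    token_count: int,
    max_input_tokens: Optional[int] = None,
) -> bool:
    """Return True if ANY trigger condition is met (OR semantics)."""
    for kind, value in triggers:
        if kind == "messages" and message_count > value:
            return True
        if kind == "tokens" and token_count > value:
            return True
        if kind == "fraction":
            if max_input_tokens is None:
                raise ValueError("fraction triggers need max_input_tokens")
            if token_count > value * max_input_tokens:
                return True
    return False


triggers = [("messages", 50), ("tokens", 100000)]
print(should_trigger(triggers, message_count=12, token_count=120_000))  # True
print(should_trigger(triggers, message_count=12, token_count=8_000))    # False
```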

### Fraction-Based

```python
from pydantic_ai_summarization import SummarizationProcessor

processor = SummarizationProcessor(
    model="openai:gpt-4o",
    trigger=("fraction", 0.8),  # 80% of context window
    keep=("fraction", 0.2),     # Keep last 20%
    max_input_tokens=128000,    # GPT-4o's context window
)
```
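
For a sense of scale, the fractions above translate into the following token counts, assuming both are taken relative to `max_input_tokens` (the keep side is an assumption; the Keep Types table only says "of context").

```python
max_input_tokens = 128_000
trigger_fraction = 0.8
keep_fraction = 0.2

# Token count at which summarization would start
trigger_at = round(trigger_fraction * max_input_tokens)
# Token budget for the recent messages that are kept as-is
keep_budget = round(keep_fraction * max_input_tokens)

print(trigger_at, keep_budget)  # 102400 25600
```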

### Custom Token Counter

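```python
from pydantic_ai_summarization import create_summarization_processor


def my_token_counter(messages):
    # Rough heuristic: about 4 characters per token
    return sum(len(str(msg)) for msg in messages) // 4


processor = create_summarization_processor(
    token_counter=my_token_counter,
)
```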
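
It can help to check what such a heuristic returns before relying on it for triggers. The sketch below runs the same counter on plain strings and on a hypothetical `FakeMessage` dataclass (not a pydantic-ai type); because `str()` on a structured object includes field names and punctuation, the estimate for objects comes out higher than for the raw text alone.

```python
from dataclasses import dataclass


def my_token_counter(messages):
    # Same heuristic as above: about 4 characters per token
    return sum(len(str(msg)) for msg in messages) // 4


@dataclass
class FakeMessage:
    """Hypothetical message object, used only to show the str() overhead."""
    role: str
    content: str


texts = ["Hello, how are you?", "I'm fine, thanks for asking!"]
objects = [FakeMessage("user", texts[0]), FakeMessage("assistant", texts[1])]

text_estimate = my_token_counter(texts)      # 47 characters // 4
object_estimate = my_token_counter(objects)  # includes repr overhead
print(text_estimate)  # 11
print(object_estimate > text_estimate)  # True
```

If the overcount matters for your thresholds, counting only the text content of each message is one option to consider.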

### Custom Model (e.g., Azure OpenAI)

```python
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider
from pydantic_ai_summarization import create_summarization_processor

azure_model = OpenAIModel(
    "gpt-4o",
    provider=OpenAIProvider(
        base_url="https://my-resource.openai.azure.com/openai/deployments/gpt-4o",
        api_key="your-azure-api-key",
    ),
)

processor = create_summarization_processor(
    model=azure_model,
    trigger=("tokens", 100000),
    keep=("messages", 20),
)
```

### Custom Summary Prompt

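```python
from pydantic_ai_summarization import create_summarization_processor

processor = create_summarization_processor(
    # {messages} marks where the conversation text is inserted
    summary_prompt="""
    Extract key information from this conversation.
    Focus on: decisions made, code written, pending tasks.

    Conversation:
    {messages}
    """,
)
```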
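
To preview what the summarizing model would receive, the template can be rendered by hand. This sketch assumes the `{messages}` placeholder is filled in the style of Python's `str.format`, which the README does not state; the `role: text` rendering of the history is also an illustrative choice.

```python
summary_prompt = """
Extract key information from this conversation.
Focus on: decisions made, code written, pending tasks.

Conversation:
{messages}
"""

# Hypothetical conversation, rendered one message per line
history = [
    ("user", "Please add retry logic to the HTTP client."),
    ("assistant", "Added a retry decorator with exponential backoff."),
]
rendered_history = "\n".join(f"{role}: {text}" for role, text in history)

prompt = summary_prompt.format(messages=rendered_history)
print(prompt)
```

If the substitution does work this way, any literal braces in a custom prompt (for example a JSON snippet) would need to be doubled as `{{` and `}}`.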

## Why Choose This Library?

| Feature | Description |
|---------|-------------|
| **Two Strategies** | Intelligent summarization or fast sliding window |
| **Flexible Triggers** | Message count, token count, or fraction-based |
| **Safe Cutoff** | Never breaks tool call/response pairs |
| **Custom Counters** | Bring your own token counting logic |
| **Custom Prompts** | Control how summaries are generated |
| **Token Tracking** | Real-time usage monitoring with callbacks |
| **Tool Truncation** | Automatic truncation of large tool outputs |
| **Custom Models** | Use any pydantic-ai Model (Azure, custom providers) |
| **Lightweight** | Only requires pydantic-ai-slim (no extra model SDKs) |

## Related Projects

| Package | Description |
|---------|-------------|
| [Pydantic Deep Agents](https://github.com/vstorm-co/pydantic-deepagents) | Full agent framework (uses this library) |
| [pydantic-ai-backend](https://github.com/vstorm-co/pydantic-ai-backend) | File storage and Docker sandbox |
| [pydantic-ai-todo](https://github.com/vstorm-co/pydantic-ai-todo) | Task planning toolset |
| [subagents-pydantic-ai](https://github.com/vstorm-co/subagents-pydantic-ai) | Multi-agent orchestration |
| [pydantic-ai](https://github.com/pydantic/pydantic-ai) | The foundation: agent framework by Pydantic |

## Contributing

```bash
git clone https://github.com/vstorm-co/summarization-pydantic-ai.git
cd summarization-pydantic-ai
make install
make test  # 100% coverage required
```

## License

MIT — see [LICENSE](LICENSE)



Built with ❤️ by vstorm-co
