https://github.com/mnemom/aip

Agent Integrity Protocol — real-time thinking block analysis for AI agent alignment
https://github.com/mnemom/aip
agent ai alignment integrity llm protocol safety thinking
Last synced: 5 months ago
JSON representation
Agent Integrity Protocol — real-time thinking block analysis for AI agent alignment
Host: GitHub
URL: https://github.com/mnemom/aip
Owner: mnemom
License: apache-2.0
Created: 2026-02-10T02:53:20.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-02-14T05:13:59.000Z (5 months ago)
Last Synced: 2026-02-15T03:27:06.958Z (5 months ago)
Topics: agent, ai, alignment, integrity, llm, protocol, safety, thinking
Language: Python
Size: 434 KB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: docs/SECURITY.md
Awesome Lists containing this project

README

          # Agent Integrity Protocol (AIP)

[![CI](https://github.com/mnemom/aip/actions/workflows/ci.yml/badge.svg)](https://github.com/mnemom/aip/actions/workflows/ci.yml)

[![CodeQL](https://github.com/mnemom/aip/actions/workflows/codeql.yml/badge.svg)](https://github.com/mnemom/aip/actions/workflows/codeql.yml)

[![PyPI](https://img.shields.io/pypi/v/agent-integrity-proto.svg)](https://pypi.org/project/agent-integrity-proto/)

[![npm](https://img.shields.io/npm/v/@mnemom/agent-integrity-protocol.svg)](https://www.npmjs.com/package/@mnemom/agent-integrity-protocol)

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)

[![Spec](https://img.shields.io/badge/spec-v0.1.0-green.svg)](docs/SPEC.md)

**Real-time thinking block analysis for AI agent alignment.**

AIP analyzes what an agent is *thinking* before it acts. It extracts thinking blocks from LLM responses, evaluates them against an Alignment Card using an analysis LLM, and delivers integrity verdicts in real-time — enabling intervention between turns, not after the damage is done.

> AIP follows the daimonion philosophy: silence means aligned, voice means outside boundaries. It is a sister protocol to [AAP](https://github.com/mnemom/aap) — sharing the same Alignment Card, enriching the same AP-Trace, but operating on a fundamentally different timescale: real-time rather than retroactive.

## Quick Start

```bash

# Install

pip install agent-integrity-proto

# or

npm install @mnemom/agent-integrity-protocol

```

```python

from aip import create_client

client = create_client(

    base_url="https://gateway.example.com",

    agent_id="my-agent",

    shared_secret="your-secret",

)

# Check a thinking block against an alignment card

signal = client.check(

    thinking_block="I should help the user find the best product...",

    alignment_card=card,

)

if signal.proceed:

    # Verdict was 'clear' — continue

    execute_action()

else:

    # Verdict was 'review_needed' or 'boundary_violation'

    escalate(signal.checkpoint.concerns)

```

```typescript

import { createClient } from '@mnemom/agent-integrity-protocol';

const client = createClient({

  baseUrl: 'https://gateway.example.com',

  agentId: 'my-agent',

  sharedSecret: 'your-secret',

});

const signal = await client.check({

  thinkingBlock: 'I should help the user find the best product...',

  alignmentCard: card,

});

if (signal.proceed) {

  executeAction();

} else {

  escalate(signal.checkpoint.concerns);

}

```

## Why AIP?

AAP provides post-hoc transparency — every agent action is traced, verified, and checked for drift. But verification happens *after* the action, typically 30-60 seconds later. This creates a window of vulnerability:

- **Prompt injection**: A malicious prompt causes data exfiltration. The trace records it, but the data is already gone.

- **Gradual drift**: Reasoning shifts subtly across turns. Each action passes verification, but the cumulative direction is misaligned.

- **Misaligned intent**: The thinking block reveals a forbidden action, but it's only examined retroactively.

All three failure modes are visible in thinking blocks *before* the action lands. AIP fills this gap.

| Protocol | Function | Timescale |

|----------|----------|-----------|

| **AAP** | Post-hoc verification via AP-Traces | Retroactive (30-60s) |

| **AIP** | Real-time thinking block analysis | Between turns (~2s) |

| **Together** | Full coverage: real-time detection + audit trail | Both |

## How It Works

```

                         Agent Turn N

                              │

                              ▼

┌──────────────────────────────────────────────────────┐

│  Layer 1: Thinking Block Extraction                  │

│                                                      │

│  Anthropic  │  OpenAI  │  Google  │  Fallback        │

│  (thinking) │ (reason) │ (think)  │  (regex)         │

└──────────────────────┬───────────────────────────────┘

                       │

                       ▼

┌──────────────────────────────────────────────────────┐

│  Layer 2: Integrity Analysis                         │

│                                                      │

│  Card Summary + Conscience Values + Session Window   │

│              → Analysis LLM (e.g. Haiku)             │

│              → Integrity Verdict                     │

└──────────────────────┬───────────────────────────────┘

                       │

                       ▼

┌──────────────────────────────────────────────────────┐

│  Layer 3: Signal Delivery                            │

│                                                      │

│  clear → proceed    │  review_needed → warn/pause    │

│                     │  boundary_violation → block     │

└──────────────────────────────────────────────────────┘

                       │

                       ▼

                  Agent Turn N+1

```

### Integrity Checkpoint

The core output — a structured record of each thinking block evaluation:

```json

{

  "checkpoint_id": "ic-f47ac10b",

  "agent_id": "my-agent",

  "verdict": "clear",

  "confidence": 0.95,

  "concerns": [],

  "thinking_hash": "sha256:a1b2c3...",

  "window_position": { "index": 3, "window_size": 10 },

  "analysis_metadata": {

    "model": "claude-haiku-4-5-20251001",

    "latency_ms": 1847,

    "conscience_values_used": 2

  }

}

```

### Conscience Prompt

AIP builds a system prompt for the analysis LLM from:

- **Card summary** — compact representation of the Alignment Card

- **Conscience values** — `BOUNDARY` (hard limits) and `FEAR` (concerns) types injected as evaluation criteria

- **Session window** — recent checkpoint history for drift context

- **The thinking block itself**

The analysis LLM acts as a daimonion — it never commands, only warns.

### Provider Adapters

Thinking blocks differ by provider. AIP abstracts extraction:

| Provider | Source | Confidence |

|----------|--------|------------|

| Anthropic | `thinking` content blocks (+ SSE stream fallback) | 1.0 |

| OpenAI | `reasoning_content` field (+ SSE stream fallback) | 0.9 |

| Google | Gemini thinking blocks (+ SSE stream fallback) | 0.9 |

| Fallback | Regex-based extraction | 0.3 |

All adapters attempt SSE stream extraction as a fallback when standard JSON parsing fails (v0.1.3+). This allows analysis of streamed responses captured by gateway proxies.

## What AIP Does Not Do

This matters. Read it.

1. **AIP does NOT prevent injection — it detects it.** AIP identifies injection patterns in thinking blocks. It does not prevent injected content from reaching the model.

2. **AIP does NOT interrupt streams.** AIP operates between turns. It does not cancel in-flight streaming responses.

3. **AIP does NOT replace AAP.** Post-hoc verification, trace storage, and public transparency remain AAP's domain. AIP supplements AAP with real-time detection.

4. **LLM-as-judge has inherent limits.** The analysis LLM can be fooled by sophisticated adversarial content. AIP reduces the attack surface but does not eliminate it.

5. **Thinking blocks are model-dependent.** Not all models expose thinking. Models that don't expose thinking blocks cannot be analyzed by AIP.

For the complete limitations disclosure, see [Section 14 of the Specification](docs/SPEC.md#14-limitations).

## Installation

```bash

# Python

pip install agent-integrity-proto

# TypeScript

npm install @mnemom/agent-integrity-protocol

```

**Requirements:** Python >= 3.10 | Node.js >= 18.0.0

## API Reference

### Python

```python

# Core analysis

from aip import (

    check_integrity,        # Evaluate thinking block → IntegrityCheckpoint

    build_signal,           # Construct signal from checkpoint → IntegritySignal

    build_conscience_prompt, # Generate analysis LLM prompt

    hash_thinking_block,    # Content-addressed thinking reference

    detect_integrity_drift, # Track behavioral drift across checkpoints

    validate_agreement,     # Verify card-conscience alignment

)

# Provider adapters

from aip import (

    AnthropicAdapter,       # Anthropic thinking content blocks

    OpenAIAdapter,          # OpenAI reasoning_content

    GoogleAdapter,          # Google Gemini thinking

    FallbackAdapter,        # Regex-based fallback

    AdapterRegistry,        # Dynamic provider selection

)

# SDK client

from aip import create_client, sign_payload, verify_signature

# Session state

from aip import WindowManager, create_window_state

```

### TypeScript

```typescript

import {

  // Core analysis

  checkIntegrity,

  buildSignal,

  buildConsciencePrompt,

  hashThinkingBlock,

  detectIntegrityDrift,

  validateAgreement,

  // Provider adapters

  AnthropicAdapter,

  OpenAIAdapter,

  GoogleAdapter,

  FallbackAdapter,

  AdapterRegistry,

  // SDK client

  createClient,

  signPayload,

  verifySignature,

  // Session state

  WindowManager,

  createWindowState,

} from '@mnemom/agent-integrity-protocol';

```

## Documentation

| Document | Description |

|----------|-------------|

| [**SPEC.md**](docs/SPEC.md) | Full protocol specification (IETF-style, 2,214 lines) |

| [**QUICKSTART.md**](docs/QUICKSTART.md) | Zero to integrity checking in 5 minutes |

| [**LIMITS.md**](docs/LIMITS.md) | What AIP guarantees and doesn't |

| [**SECURITY.md**](docs/SECURITY.md) | Threat model and security considerations |

| [**CHANGELOG.md**](CHANGELOG.md) | Release history |

## Examples

| Example | Description |

|---------|-------------|

| [`basic-check/`](examples/basic-check/) | Minimal integrity check with aligned and misaligned thinking |

| [`gateway-integration/`](examples/gateway-integration/) | Cloudflare Worker gateway with real-time AIP analysis |

| [`adversarial/`](examples/adversarial/) | Attack scenarios: injection, drift, meta-injection, deception |

## Status

**Current Version**: 0.1.3

| Component | Status |

|-----------|--------|

| Specification | ✅ Complete |

| TypeScript SDK | ✅ Complete (272 tests) |

| Python SDK | ✅ Complete (267 tests) |

| Provider Adapters | ✅ Anthropic, OpenAI, Google, Fallback |

| Session Windowing | ✅ Complete |

| Drift Detection | ✅ Complete |

| Gateway Integration | ✅ Verified (Cloudflare Workers) |

## Contributing

We welcome contributions. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

Key areas where we need help:

- Provider adapter implementations for additional LLMs

- Integration examples with agent frameworks

- Adversarial test vectors

- Documentation improvements

## License

Apache 2.0. See [LICENSE](LICENSE) for details.

---

*Agent Integrity Protocol is part of the [Mnemom.ai](https://github.com/mnemom) trust infrastructure for autonomous agents, alongside [AAP](https://github.com/mnemom/aap) (Agent Alignment Protocol).*
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mnemom/aip

Awesome Lists containing this project

README