{"id":44460905,"url":"https://github.com/mnemom/aip","last_synced_at":"2026-02-21T08:06:57.687Z","repository":{"id":337950132,"uuid":"1154111231","full_name":"mnemom/aip","owner":"mnemom","description":"Agent Integrity Protocol — real-time thinking block analysis for AI agent alignment","archived":false,"fork":false,"pushed_at":"2026-02-14T05:13:59.000Z","size":444,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-15T03:27:06.958Z","etag":null,"topics":["agent","ai","alignment","integrity","llm","protocol","safety","thinking"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mnemom.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"docs/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-10T02:53:20.000Z","updated_at":"2026-02-14T05:13:54.000Z","dependencies_parsed_at":"2026-02-14T21:03:34.301Z","dependency_job_id":null,"html_url":"https://github.com/mnemom/aip","commit_stats":null,"previous_names":["mnemom/aip"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/mnemom/aip","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mnemom%2Faip","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mnemom%2Faip/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mnemom%2Faip/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mnemom%2Faip/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mnemom","download_url":"https://codeload.github.com/mnemom/aip/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mnemom%2Faip/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29490360,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-15T19:29:10.908Z","status":"ssl_error","status_checked_at":"2026-02-15T19:29:10.419Z","response_time":118,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","ai","alignment","integrity","llm","protocol","safety","thinking"],"created_at":"2026-02-12T19:00:21.535Z","updated_at":"2026-02-15T22:01:03.501Z","avatar_url":"https://github.com/mnemom.png","language":"Python","readme":"# Agent Integrity Protocol (AIP)\n\n[![CI](https://github.com/mnemom/aip/actions/workflows/ci.yml/badge.svg)](https://github.com/mnemom/aip/actions/workflows/ci.yml)\n[![CodeQL](https://github.com/mnemom/aip/actions/workflows/codeql.yml/badge.svg)](https://github.com/mnemom/aip/actions/workflows/codeql.yml)\n[![PyPI](https://img.shields.io/pypi/v/agent-integrity-proto.svg)](https://pypi.org/project/agent-integrity-proto/)\n[![npm](https://img.shields.io/npm/v/@mnemom/agent-integrity-protocol.svg)](https://www.npmjs.com/package/@mnemom/agent-integrity-protocol)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)\n[![Spec](https://img.shields.io/badge/spec-v0.1.0-green.svg)](docs/SPEC.md)\n\n**Real-time thinking block analysis for AI agent alignment.**\n\nAIP analyzes what an agent is *thinking* before it acts. It extracts thinking blocks from LLM responses, evaluates them against an Alignment Card using an analysis LLM, and delivers integrity verdicts in real-time — enabling intervention between turns, not after the damage is done.\n\n\u003e AIP follows the daimonion philosophy: silence means aligned, voice means outside boundaries. It is a sister protocol to [AAP](https://github.com/mnemom/aap) — sharing the same Alignment Card, enriching the same AP-Trace, but operating on a fundamentally different timescale: real-time rather than retroactive.\n\n## Quick Start\n\n```bash\n# Install\npip install agent-integrity-proto\n# or\nnpm install @mnemom/agent-integrity-protocol\n```\n\n```python\nfrom aip import create_client\n\nclient = create_client(\n    base_url=\"https://gateway.example.com\",\n    agent_id=\"my-agent\",\n    shared_secret=\"your-secret\",\n)\n\n# Check a thinking block against an alignment card\nsignal = client.check(\n    thinking_block=\"I should help the user find the best product...\",\n    alignment_card=card,\n)\n\nif signal.proceed:\n    # Verdict was 'clear' — continue\n    execute_action()\nelse:\n    # Verdict was 'review_needed' or 'boundary_violation'\n    escalate(signal.checkpoint.concerns)\n```\n\n```typescript\nimport { createClient } from '@mnemom/agent-integrity-protocol';\n\nconst client = createClient({\n  baseUrl: 'https://gateway.example.com',\n  agentId: 'my-agent',\n  sharedSecret: 'your-secret',\n});\n\nconst signal = await client.check({\n  thinkingBlock: 'I should help the user find the best product...',\n  alignmentCard: card,\n});\n\nif (signal.proceed) {\n  executeAction();\n} else {\n  escalate(signal.checkpoint.concerns);\n}\n```\n\n## Why AIP?\n\nAAP provides post-hoc transparency — every agent action is traced, verified, and checked for drift. But verification happens *after* the action, typically 30-60 seconds later. This creates a window of vulnerability:\n\n- **Prompt injection**: A malicious prompt causes data exfiltration. The trace records it, but the data is already gone.\n- **Gradual drift**: Reasoning shifts subtly across turns. Each action passes verification, but the cumulative direction is misaligned.\n- **Misaligned intent**: The thinking block reveals a forbidden action, but it's only examined retroactively.\n\nAll three failure modes are visible in thinking blocks *before* the action lands. AIP fills this gap.\n\n| Protocol | Function | Timescale |\n|----------|----------|-----------|\n| **AAP** | Post-hoc verification via AP-Traces | Retroactive (30-60s) |\n| **AIP** | Real-time thinking block analysis | Between turns (~2s) |\n| **Together** | Full coverage: real-time detection + audit trail | Both |\n\n## How It Works\n\n```\n                         Agent Turn N\n                              │\n                              ▼\n┌──────────────────────────────────────────────────────┐\n│  Layer 1: Thinking Block Extraction                  │\n│                                                      │\n│  Anthropic  │  OpenAI  │  Google  │  Fallback        │\n│  (thinking) │ (reason) │ (think)  │  (regex)         │\n└──────────────────────┬───────────────────────────────┘\n                       │\n                       ▼\n┌──────────────────────────────────────────────────────┐\n│  Layer 2: Integrity Analysis                         │\n│                                                      │\n│  Card Summary + Conscience Values + Session Window   │\n│              → Analysis LLM (e.g. Haiku)             │\n│              → Integrity Verdict                     │\n└──────────────────────┬───────────────────────────────┘\n                       │\n                       ▼\n┌──────────────────────────────────────────────────────┐\n│  Layer 3: Signal Delivery                            │\n│                                                      │\n│  clear → proceed    │  review_needed → warn/pause    │\n│                     │  boundary_violation → block     │\n└──────────────────────────────────────────────────────┘\n                       │\n                       ▼\n                  Agent Turn N+1\n```\n\n### Integrity Checkpoint\n\nThe core output — a structured record of each thinking block evaluation:\n\n```json\n{\n  \"checkpoint_id\": \"ic-f47ac10b\",\n  \"agent_id\": \"my-agent\",\n  \"verdict\": \"clear\",\n  \"confidence\": 0.95,\n  \"concerns\": [],\n  \"thinking_hash\": \"sha256:a1b2c3...\",\n  \"window_position\": { \"index\": 3, \"window_size\": 10 },\n  \"analysis_metadata\": {\n    \"model\": \"claude-haiku-4-5-20251001\",\n    \"latency_ms\": 1847,\n    \"conscience_values_used\": 2\n  }\n}\n```\n\n### Conscience Prompt\n\nAIP builds a system prompt for the analysis LLM from:\n\n- **Card summary** — compact representation of the Alignment Card\n- **Conscience values** — `BOUNDARY` (hard limits) and `FEAR` (concerns) types injected as evaluation criteria\n- **Session window** — recent checkpoint history for drift context\n- **The thinking block itself**\n\nThe analysis LLM acts as a daimonion — it never commands, only warns.\n\n### Provider Adapters\n\nThinking blocks differ by provider. AIP abstracts extraction:\n\n| Provider | Source | Confidence |\n|----------|--------|------------|\n| Anthropic | `thinking` content blocks (+ SSE stream fallback) | 1.0 |\n| OpenAI | `reasoning_content` field (+ SSE stream fallback) | 0.9 |\n| Google | Gemini thinking blocks (+ SSE stream fallback) | 0.9 |\n| Fallback | Regex-based extraction | 0.3 |\n\nAll adapters attempt SSE stream extraction as a fallback when standard JSON parsing fails (v0.1.3+). This allows analysis of streamed responses captured by gateway proxies.\n\n## What AIP Does Not Do\n\nThis matters. Read it.\n\n1. **AIP does NOT prevent injection — it detects it.** AIP identifies injection patterns in thinking blocks. It does not prevent injected content from reaching the model.\n\n2. **AIP does NOT interrupt streams.** AIP operates between turns. It does not cancel in-flight streaming responses.\n\n3. **AIP does NOT replace AAP.** Post-hoc verification, trace storage, and public transparency remain AAP's domain. AIP supplements AAP with real-time detection.\n\n4. **LLM-as-judge has inherent limits.** The analysis LLM can be fooled by sophisticated adversarial content. AIP reduces the attack surface but does not eliminate it.\n\n5. **Thinking blocks are model-dependent.** Not all models expose thinking. Models that don't expose thinking blocks cannot be analyzed by AIP.\n\nFor the complete limitations disclosure, see [Section 14 of the Specification](docs/SPEC.md#14-limitations).\n\n## Installation\n\n```bash\n# Python\npip install agent-integrity-proto\n\n# TypeScript\nnpm install @mnemom/agent-integrity-protocol\n```\n\n**Requirements:** Python \u003e= 3.10 | Node.js \u003e= 18.0.0\n\n## API Reference\n\n### Python\n\n```python\n# Core analysis\nfrom aip import (\n    check_integrity,        # Evaluate thinking block → IntegrityCheckpoint\n    build_signal,           # Construct signal from checkpoint → IntegritySignal\n    build_conscience_prompt, # Generate analysis LLM prompt\n    hash_thinking_block,    # Content-addressed thinking reference\n    detect_integrity_drift, # Track behavioral drift across checkpoints\n    validate_agreement,     # Verify card-conscience alignment\n)\n\n# Provider adapters\nfrom aip import (\n    AnthropicAdapter,       # Anthropic thinking content blocks\n    OpenAIAdapter,          # OpenAI reasoning_content\n    GoogleAdapter,          # Google Gemini thinking\n    FallbackAdapter,        # Regex-based fallback\n    AdapterRegistry,        # Dynamic provider selection\n)\n\n# SDK client\nfrom aip import create_client, sign_payload, verify_signature\n\n# Session state\nfrom aip import WindowManager, create_window_state\n```\n\n### TypeScript\n\n```typescript\nimport {\n  // Core analysis\n  checkIntegrity,\n  buildSignal,\n  buildConsciencePrompt,\n  hashThinkingBlock,\n  detectIntegrityDrift,\n  validateAgreement,\n\n  // Provider adapters\n  AnthropicAdapter,\n  OpenAIAdapter,\n  GoogleAdapter,\n  FallbackAdapter,\n  AdapterRegistry,\n\n  // SDK client\n  createClient,\n  signPayload,\n  verifySignature,\n\n  // Session state\n  WindowManager,\n  createWindowState,\n} from '@mnemom/agent-integrity-protocol';\n```\n\n## Documentation\n\n| Document | Description |\n|----------|-------------|\n| [**SPEC.md**](docs/SPEC.md) | Full protocol specification (IETF-style, 2,214 lines) |\n| [**QUICKSTART.md**](docs/QUICKSTART.md) | Zero to integrity checking in 5 minutes |\n| [**LIMITS.md**](docs/LIMITS.md) | What AIP guarantees and doesn't |\n| [**SECURITY.md**](docs/SECURITY.md) | Threat model and security considerations |\n| [**CHANGELOG.md**](CHANGELOG.md) | Release history |\n\n## Examples\n\n| Example | Description |\n|---------|-------------|\n| [`basic-check/`](examples/basic-check/) | Minimal integrity check with aligned and misaligned thinking |\n| [`gateway-integration/`](examples/gateway-integration/) | Cloudflare Worker gateway with real-time AIP analysis |\n| [`adversarial/`](examples/adversarial/) | Attack scenarios: injection, drift, meta-injection, deception |\n\n## Status\n\n**Current Version**: 0.1.3\n\n| Component | Status |\n|-----------|--------|\n| Specification | ✅ Complete |\n| TypeScript SDK | ✅ Complete (272 tests) |\n| Python SDK | ✅ Complete (267 tests) |\n| Provider Adapters | ✅ Anthropic, OpenAI, Google, Fallback |\n| Session Windowing | ✅ Complete |\n| Drift Detection | ✅ Complete |\n| Gateway Integration | ✅ Verified (Cloudflare Workers) |\n\n## Contributing\n\nWe welcome contributions. See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\nKey areas where we need help:\n\n- Provider adapter implementations for additional LLMs\n- Integration examples with agent frameworks\n- Adversarial test vectors\n- Documentation improvements\n\n## License\n\nApache 2.0. See [LICENSE](LICENSE) for details.\n\n---\n\n*Agent Integrity Protocol is part of the [Mnemom.ai](https://github.com/mnemom) trust infrastructure for autonomous agents, alongside [AAP](https://github.com/mnemom/aap) (Agent Alignment Protocol).*\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmnemom%2Faip","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmnemom%2Faip","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmnemom%2Faip/lists"}