https://github.com/mbhatt1/maif

Cryptographically-secure, auditable file format for AI agent memory with provenance tracking
Host: GitHub
URL: https://github.com/mbhatt1/maif
Owner: mbhatt1
License: mit
Created: 2025-06-08T23:04:23.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2026-01-22T22:43:38.000Z (5 months ago)
Last Synced: 2026-01-22T23:24:25.958Z (5 months ago)
Topics: ai, ai-agent-tools, ai-agents-framework, cryptography
Language: Python
Homepage: https://vineethsai.github.io/maif/
Size: 133 MB
Stars: 6
Watchers: 0
Forks: 1
Open Issues: 18
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md

MAIF

Multimodal Artifact File Format for Trustworthy AI Agents

Cryptographically-secure, auditable file format for AI agent memory with provenance tracking



---

## Overview

MAIF is a file format and SDK designed for AI agents that need **trustworthy memory**. Every piece of data is cryptographically linked, creating tamper-evident audit trails that prove exactly what happened, when, and by which agent.

**Key Capabilities:**

- **Cryptographic Provenance** - Hash-chained blocks for tamper-evident audit trails

- **Multi-Agent Coordination** - Shared artifacts with agent-specific logging

- **Multimodal Storage** - Text, embeddings, images, video, knowledge graphs

- **Privacy-by-Design** - Encryption, anonymization, access control built-in

- **High Performance** - Memory-mapped I/O, streaming, semantic compression

## Use Cases

- **Multi-Agent Systems** - Shared memory with full provenance (see LangGraph example)

- **RAG Pipelines** - Document storage with embeddings, search, and citation tracking

- **Compliance & Audit** - Immutable audit trails for regulated industries

- **Research** - Reproducible experiments with complete data lineage

- **Enterprise AI** - Secure, auditable AI workflows with access control

## Framework Integrations

MAIF provides drop-in integrations for popular AI agent frameworks:

| Framework | Status | Description |
|-----------|--------|-------------|
| LangGraph | Available | State checkpointer with provenance |
| CrewAI | Available | Crew/Agent callbacks, Memory |
| LangChain | Available | Callbacks, VectorStore, Memory |
| AWS Strands | Available | Agent callbacks |

```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "maif[integrations]"
```

### LangGraph

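```python
from langgraph.graph import StateGraph
from maif.integrations.langgraph import MAIFCheckpointer

# `graph` is a StateGraph you have already built (nodes and edges added);
# `state` and `config` are your initial input and run configuration.
checkpointer = MAIFCheckpointer("state.maif")
app = graph.compile(checkpointer=checkpointer)
result = app.invoke(state, config)

# Seal the artifact once the run is complete
checkpointer.finalize()
```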

### CrewAI

```python
from crewai import Crew
from maif.integrations.crewai import MAIFCrewCallback

callback = MAIFCrewCallback("crew.maif")

crew = Crew(
    agents=[...],
    tasks=[...],
    task_callback=callback.on_task_complete,
    step_callback=callback.on_step,
)

result = crew.kickoff()
callback.finalize()
```

See the [integrations documentation](docs/guide/integrations/) for full details.

---

## Quick Start

**Prerequisites:** Python 3.9+

### Installation

```bash
# Clone the repository
git clone https://github.com/vineethsai/maif.git
cd maif

# Install MAIF
pip install -e .

# With ML features (embeddings, semantic search)
pip install -e ".[ml]"
```

### Your First MAIF Artifact

```python
from maif import MAIFEncoder, MAIFDecoder, verify_maif

# Create an agent memory artifact (Ed25519 signed automatically)
encoder = MAIFEncoder("agent_memory.maif", agent_id="my-agent")

# Add content with automatic provenance tracking
encoder.add_text_block("User asked about weather in NYC", metadata={"type": "query"})
encoder.add_text_block("Temperature is 72°F, sunny", metadata={"type": "response"})

# Finalize (signs and seals the file)
encoder.finalize()

# Later: Load and verify integrity
decoder = MAIFDecoder("agent_memory.maif")
decoder.load()
is_valid, errors = decoder.verify_integrity()
print(f"Valid: {is_valid}, Blocks: {len(decoder.blocks)}")
if not is_valid:
    print(f"Integrity errors: {errors}")

# Read content
for i, block in enumerate(decoder.blocks):
    text = decoder.get_text_content(i)
    print(f"Block {i}: {text}")
```

**Secure MAIF Format:**

- **Self-contained** - No separate manifest files, everything in one `.maif` file

- **Ed25519 signatures** - Fast, compact 64-byte signatures on every block

- **Immutable blocks** - Each block is signed immediately on write

- **Tamper detection** - Cryptographic verification catches any modification

- **Embedded provenance** - Full audit trail built into the file

---

## Featured Example: Multi-Agent RAG System

An example multi-agent RAG system that combines **LangGraph orchestration** with **MAIF provenance tracking**. It is intended for demonstration purposes.

```bash
cd examples/langgraph

# Configure API key
echo "GEMINI_API_KEY=your_key" > .env

# Install dependencies
pip install -r requirements_enhanced.txt

# Create knowledge base with embeddings
python3 create_kb_enhanced.py

# Run the interactive demo
python3 demo_enhanced.py
```

**What's Included:**

- 5 specialized agents (Retriever, Synthesizer, Fact-Checker, Citation, Web Search)

- ChromaDB vector store with semantic search

- Gemini API integration for LLM reasoning

- Complete audit trail of every agent action

- Multi-turn conversation support

See [`examples/langgraph/README.md`](examples/langgraph/README.md) for full documentation.

---

## NEW: Enterprise AI Governance Demo

Interactive demonstration of MAIF's enterprise-grade governance features:

```bash
cd examples/integrations/langgraph_governance_demo
python main.py
```

**Features demonstrated:**

- Cryptographic provenance (Ed25519 signatures, hash chains)

- Tamper detection and data integrity verification

- Role-based access control with audit logging

- Multi-agent coordination with clear handoffs

- Compliance report generation (Markdown, JSON, CSV)

See [`examples/integrations/langgraph_governance_demo/README.md`](examples/integrations/langgraph_governance_demo/README.md) for details.

---

## Features

### Cryptographic Provenance

Every block is cryptographically signed and linked - any tampering is detectable.

```python
from maif import MAIFEncoder, MAIFDecoder

# Each block is signed with Ed25519 on creation
encoder = MAIFEncoder("memory.maif", agent_id="agent-1")
encoder.add_text_block("First message")   # Signed immediately
encoder.add_text_block("Second message")  # Linked to previous via hash
encoder.add_text_block("Third message")   # Chain continues
encoder.finalize()

# Verify the entire chain + all signatures
decoder = MAIFDecoder("memory.maif")
decoder.load()
is_valid, errors = decoder.verify_integrity()

# Check provenance chain
for entry in decoder.provenance:
    print(f"{entry.action} by {entry.agent_id} at {entry.timestamp}")
```
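To make the hash-chaining idea concrete, here is a minimal standard-library sketch of how linking each block to its predecessor makes edits detectable. It is an illustration only, not MAIF's on-disk format or its verification code: the block layout, the `GENESIS` placeholder, and the helper names `block_hash`, `build_chain`, and `verify_chain` are all hypothetical, and signatures are left out entirely.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical placeholder "previous hash" for the first block


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash a block's payload together with the previous block's hash."""
    record = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Return a list of blocks, each linked to its predecessor by hash."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain):
    """Return the indices of blocks whose link or stored hash does not check out."""
    bad, prev = [], GENESIS
    for i, block in enumerate(chain):
        recomputed = block_hash(block["prev"], block["payload"])
        if block["prev"] != prev or recomputed != block["hash"]:
            bad.append(i)
        prev = block["hash"]
    return bad


chain = build_chain(["First message", "Second message", "Third message"])
print(verify_chain(chain))              # empty list: nothing flagged

chain[1]["payload"] = "Edited message"  # simulate tampering with block 1
print(verify_chain(chain))              # block 1 is flagged
```

In this sketch, an attacker who also recomputes the hash of the edited block only moves the failure to the next block, whose stored `prev` no longer matches. A plain hash chain does not stop someone from rewriting every later block, which is where per-block signatures would come in.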

### Privacy & Security

Built-in encryption, anonymization, and access control.

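```python
from maif import PrivacyLevel, EncryptionMode
from maif_api import MAIF

# NOTE: the import location of AccessRule and Permission is an assumption;
# adjust it to wherever your MAIF version exports them.
from maif import AccessRule, Permission

maif = MAIF("my-agent")

# Add encrypted content
maif.add_text(
    "Sensitive data",
    encrypt=True,
    anonymize=True,  # Auto-redact PII
    privacy_level=PrivacyLevel.CONFIDENTIAL,
)

# Access control
maif.add_access_rule(AccessRule(
    role="analyst",
    permissions=[Permission.READ],
    resources=["reports"],
))
```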
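For readers unfamiliar with role-based rules like the one above, the following standard-library sketch shows one way such a rule could be evaluated: deny by default, and allow only when some rule grants the requested permission on the requested resource. The `Perm`, `Rule`, and `is_allowed` names are hypothetical and are not MAIF's `Permission` or `AccessRule` classes; how MAIF actually enforces rules is not described in this README.

```python
from dataclasses import dataclass
from enum import Enum


class Perm(Enum):
    """Hypothetical permission set for this sketch."""
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class Rule:
    """Grants a role a set of permissions on a set of named resources."""
    role: str
    permissions: frozenset
    resources: frozenset


def is_allowed(rules, role, permission, resource):
    """Deny by default; allow only if some rule grants the permission."""
    return any(
        rule.role == role
        and permission in rule.permissions
        and resource in rule.resources
        for rule in rules
    )


rules = [Rule("analyst", frozenset({Perm.READ}), frozenset({"reports"}))]
print(is_allowed(rules, "analyst", Perm.READ, "reports"))   # True
print(is_allowed(rules, "analyst", Perm.WRITE, "reports"))  # False
print(is_allowed(rules, "intern", Perm.READ, "reports"))    # False
```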

### Multimodal Support

Store and search across text, images, video, embeddings, and knowledge graphs.

```python
from maif_api import MAIF

maif = MAIF("my-agent")

# Text with metadata
maif.add_text("Analysis results", metadata={"title": "Report", "language": "en"})

# Images with feature extraction
maif.add_image("chart.png", metadata={"title": "Sales Chart"})

# Semantic embeddings (pre-computed or from TF-IDF)
maif.add_embeddings([[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]])

# Multimodal content - combines text, images, and embeddings
maif.add_multimodal({
    "text": "Product description",
    "image_path": "product.jpg",
    "embeddings": [[0.1, 0.2, ...]],
    "metadata": {"category": "electronics"}
})

maif.save("output.maif")
```

## What's Working

The following features are fully tested and working:

- **Ed25519 cryptographic signatures** - Fast, compact 64-byte signatures ✓

- **Multiple compression formats** - ZLIB, BROTLI, GZIP, and other standard formats ✓

- **Framework integrations** - LangGraph, CrewAI, LangChain, AWS Strands ✓

- **Provenance tracking** - Hash-chained blocks with tamper detection ✓

- **TF-IDF embeddings** - Lightweight semantic search with sklearn ✓
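As a rough picture of what TF-IDF based search involves, here is a minimal standard-library sketch. MAIF's own implementation is sklearn-based according to the list above; this sketch does not use sklearn and is not MAIF's code. The helper names (`tokenize`, `build_index`, `vectorize`, `search`) are hypothetical, and tokenization is deliberately naive (lowercase, split on whitespace).

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def vectorize(tokens, idf):
    """L2-normalised TF-IDF vector as a sparse dict; unknown terms are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def build_index(docs):
    """Return (idf, vectors): IDF per term and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return idf, [vectorize(tokens, idf) for tokens in tokenized]


def search(query, idf, vectors, top_k=3):
    """Brute-force cosine similarity: one sparse dot product per stored vector."""
    q = vectorize(tokenize(query), idf)
    scores = [
        (sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
        for i, vec in enumerate(vectors)
    ]
    return sorted(scores, reverse=True)[:top_k]


docs = [
    "weather in NYC is sunny",
    "stock prices fell today",
    "sunny weather expected tomorrow",
]
idf, vectors = build_index(docs)
for score, doc_id in search("sunny weather", idf, vectors, top_k=2):
    print(f"{score:.3f}  {docs[doc_id]}")
```

Because every query is compared against every stored vector, the cost of this approach grows linearly with the number of documents.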

## What's In Progress / Research Phase

The following are research implementations with known limitations:

### Hierarchical Semantic Compression (HSC)

- **Status**: Research implementation

- **Current performance**: ~1.5x compression ratio on embeddings

- **What works**: DBSCAN clustering, vector quantization, Huffman coding

- **Limitations**: Not achieving claimed 2.5-4x ratio, not production-ready

- **Roadmap**: Plan to implement proper Product Quantization in v2.2

### Adaptive Cross-Modal Attention (ACAM)

```python
from maif.semantic import AdaptiveCrossModalAttention
import numpy as np

# ⚠ RESEARCH IMPLEMENTATION - Use with caution
acam = AdaptiveCrossModalAttention(embedding_dim=384, num_heads=8)

# Train on multimodal data (optional but recommended)
training_data = [
    {
        "text": np.random.randn(384),
        "image": np.random.randn(384),
        "audio": np.random.randn(384),
    },
    # ... more samples
]
stats = acam.fit(training_data, epochs=10)

# Use for attention computation
embeddings = {
    "text": np.random.randn(384),
    "image": np.random.randn(384),
}
weights = acam.compute_attention_weights(embeddings)

# Get fused representation
attended = acam.get_attended_representation(embeddings, weights, "text")

# Save/load trained weights
acam.save_weights("acam_weights.pkl")
acam.load_weights("acam_weights.pkl")
```

- **Status**: Research implementation

- **Current capability**: Computes cross-modal attention weights

- **Known limitations**: Training uses simple gradient descent, not optimized
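To give a sense of what "cross-modal attention weights" can mean, here is a minimal sketch of plain scaled dot-product attention across modalities, written with the standard library only. It is not the ACAM implementation: there are no learned projections, no multiple heads, and no training, and the function names `softmax`, `attention_weights`, and `attended` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(embeddings, query_modality):
    """Weight each modality by its scaled dot product with the query modality."""
    q = embeddings[query_modality]
    scale = math.sqrt(len(q))
    names = list(embeddings)
    scores = [sum(a * b for a, b in zip(q, embeddings[n])) / scale for n in names]
    return dict(zip(names, softmax(scores)))


def attended(embeddings, weights):
    """Fuse modalities as the attention-weighted sum of their embeddings."""
    dim = len(next(iter(embeddings.values())))
    return [sum(weights[n] * embeddings[n][i] for n in embeddings) for i in range(dim)]


embeddings = {"text": [1.0, 0.0], "image": [0.0, 1.0]}
weights = attention_weights(embeddings, "text")
print(weights)                         # "text" gets the larger weight
print(attended(embeddings, weights))   # blend of the two vectors
```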

### Cryptographic Semantic Binding (CSB)

- **Status**: Research implementation

- **Current capability**: SHA-256 based commitment schemes

- **Note**: Infrastructure in place, not validated for production use
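A hash-based commitment of the kind mentioned above can be sketched in a few lines. The example below commits to an embedding together with its source text using SHA-256 and a random nonce, so that a later change to either one is detectable. This is a generic illustration under stated assumptions, not MAIF's CSB scheme: the byte encoding, the nonce length, and the `encode`, `commit`, and `verify` helpers are all hypothetical.

```python
import hashlib
import hmac
import secrets
import struct


def encode(embedding, text):
    """Serialise an embedding and its source text into unambiguous bytes."""
    vec = struct.pack(f"<{len(embedding)}d", *embedding)
    body = text.encode("utf-8")
    # Length prefixes keep the vector/text boundary unambiguous
    return struct.pack("<I", len(vec)) + vec + struct.pack("<I", len(body)) + body


def commit(embedding, text):
    """Return (commitment, nonce). Store the commitment; keep the nonce to open it."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + encode(embedding, text)).hexdigest()
    return digest, nonce


def verify(commitment, nonce, embedding, text):
    """Recompute the commitment and compare in constant time."""
    expected = hashlib.sha256(nonce + encode(embedding, text)).hexdigest()
    return hmac.compare_digest(commitment, expected)


commitment, nonce = commit([0.1, 0.2, 0.3], "Analysis results")
print(verify(commitment, nonce, [0.1, 0.2, 0.3], "Analysis results"))  # True
print(verify(commitment, nonce, [0.1, 0.2, 0.4], "Analysis results"))  # False
```

Verification here requires the exact same floating-point values that were committed. An embedding that has been re-computed or quantized will not match, which is one reason a scheme like this needs careful validation before production use.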

### Neural Embeddings

- **Status**: ❌ Not implemented

- **Current**: TF-IDF only (sklearn-based)

- **Planned**: Optional sentence-transformers integration in future versions

- **Note**: Infrastructure exists but neural models not functional

---

## Performance

| Metric | Performance | Notes |
|--------|-------------|-------|
| Semantic Search | ~30ms for 1K vectors | TF-IDF based, tested at 1K, scales linearly |
| Standard Compression (ZLIB) | 2-3× typical | Proven, well-tested |
| Hierarchical Semantic Compression (HSC) | ~1.5× average | Research implementation, not production-ready |
| Integrity Verification | ~0.1ms per file | Ed25519 signature verification |
| Tamper Detection | 100% detection in <0.1ms | Hash-chain verification |
| Signature Overhead | 64 bytes per block | Ed25519 signatures |

**Note:** HSC claims of "2.5-4x compression up to 10x maximum" were not verified and are not guaranteed. Current implementation achieves ~1.5x on embeddings.
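If you want to check compression figures like these on your own data, the ratio is simply the original size divided by the compressed size. The sketch below measures it with Python's built-in `zlib` on two synthetic inputs. It is not MAIF's benchmark suite; the `compression_ratio` helper and the sample data are hypothetical, and the numbers you get will depend heavily on your content.

```python
import random
import struct
import zlib


def compression_ratio(raw: bytes, level: int = 6) -> float:
    """Original size divided by compressed size; above 1.0 means smaller output."""
    return len(raw) / len(zlib.compress(raw, level))


# Highly repetitive text compresses very well
text = ("User asked about weather in NYC. " * 200).encode("utf-8")

# Synthetic float32 "embeddings" drawn from a seeded Gaussian
rng = random.Random(0)
floats = struct.pack("<1000f", *(rng.gauss(0.0, 1.0) for _ in range(1000)))

print(f"repetitive text:    {compression_ratio(text):.1f}x")
print(f"float32 embeddings: {compression_ratio(floats):.2f}x")
```

General-purpose compressors tend to do poorly on raw floating-point vectors because their low-order bits look like noise, which is the usual motivation for quantization-based schemes on embeddings.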

---

## Project Structure

```
maif/
├── maif/                  # Core library
│   ├── core.py           # MAIFEncoder, MAIFDecoder
│   ├── security.py       # Signing, verification
│   ├── privacy.py        # Encryption, anonymization
│   ├── integrations/     # Framework integrations (LangGraph, etc.)
│   └── semantic*.py      # Embeddings, compression
├── maif_api.py           # High-level API
├── examples/
│   ├── langgraph/        # Multi-agent RAG system
│   ├── integrations/     # Framework integration demos
│   ├── basic/            # Getting started
│   ├── security/         # Privacy & encryption
│   └── advanced/         # Agent framework, lifecycle
├── tests/                # 450+ tests
├── docs/                 # VitePress documentation
└── benchmarks/           # Performance tests
```

---

## Documentation

| Resource | Description |
|----------|-------------|
| [Getting Started](https://vineethsai.github.io/maif/guide/getting-started) | Quick start guide |
| [Framework Integrations](https://vineethsai.github.io/maif/guide/integrations/) | LangGraph, LangChain, CrewAI |
| [DeepWiki](https://deepwiki.com/vineethsai/maif) | Auto-generated API docs and code exploration |
| [Examples](examples/) | Working code examples |

---

## Examples

### Basic Usage

```bash
python examples/basic/simple_api_demo.py
python examples/basic/basic_usage.py
```

### Privacy & Security

```bash
python examples/security/privacy_demo.py
python examples/security/classified_api_simple_demo.py
```

### Advanced Features

```bash
python examples/advanced/maif_agent_demo.py           # Agent framework
python examples/advanced/lifecycle_management_demo.py # Lifecycle management
python examples/advanced/video_demo.py                # Video processing
```

---

## Contributing

We welcome contributions! Please ensure:

1. All tests pass (`pytest tests/`)

2. Code follows PEP 8 style

3. New features include tests and documentation

4. Security-sensitive changes include impact analysis

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

---

## References

- [FIPS 140-2 Standards](https://csrc.nist.gov/publications/detail/fips/140/2/final) - Cryptographic module requirements

- [NIST 800-53](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final) - Security and privacy controls

- [ISO BMFF](https://www.iso.org/standard/68960.html) - Binary format inspiration

---

## License

MIT License - See [LICENSE](LICENSE) for details.

---

## Community & Support

- **[GitHub Discussions](https://github.com/vineethsai/maif/discussions)** - Ask questions, share ideas

- **[Issue Tracker](https://github.com/vineethsai/maif/issues)** - Report bugs or request features  

- **[Documentation](https://vineethsai.github.io/maif/)** - Complete guides and API reference

- **[Security](SECURITY.md)** - Report security vulnerabilities

- **[Changelog](CHANGELOG.md)** - See what's new

- **[Specification](SPECIFICATION.md)** - MAIF file format specification

---



  Build trustworthy AI agents with cryptographic provenance