https://github.com/mbhatt1/maif

Cryptographically-secure, auditable file format for AI agent memory with provenance tracking

ai ai-agent-tools ai-agents-framework cryptography


MAIF


Multimodal Artifact File Format for Trustworthy AI Agents




---

## Overview

MAIF is a file format and SDK designed for AI agents that need **trustworthy memory**. Every piece of data is cryptographically linked, creating tamper-evident audit trails that prove exactly what happened, when, and by which agent.

**Key Capabilities:**

- **Cryptographic Provenance** - Hash-chained blocks for tamper-evident audit trails
- **Multi-Agent Coordination** - Shared artifacts with agent-specific logging
- **Multimodal Storage** - Text, embeddings, images, video, knowledge graphs
- **Privacy-by-Design** - Encryption, anonymization, access control built-in
- **High Performance** - Memory-mapped I/O, streaming, semantic compression
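
The hash-chaining idea behind the provenance capability can be sketched with the standard library alone. This is a minimal illustration, not MAIF's on-disk format: the block layout, the `GENESIS` placeholder, and the helper names `block_hash`, `build_chain`, and `verify_chain` are all assumptions made for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical "previous hash" used for the first block


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash a block's payload together with the previous block's hash."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Return a list of blocks, each linked to its predecessor by hash."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain) -> bool:
    """Recompute every hash; an edited payload or a broken link fails."""
    prev = GENESIS
    for block in chain:
        expected = block_hash(prev, block["payload"])
        if block["prev"] != prev or block["hash"] != expected:
            return False
        prev = block["hash"]
    return True


chain = build_chain(["query: weather in NYC", "response: 72°F, sunny"])
print(verify_chain(chain))  # True

chain[0]["payload"] = "query: weather in LA"  # simulate tampering
print(verify_chain(chain))  # False
```

Because each hash covers the previous one, changing any earlier block invalidates every block after it, which is what makes the trail tamper-evident.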

## Use Cases

- **Multi-Agent Systems** - Shared memory with full provenance (see LangGraph example)
- **RAG Pipelines** - Document storage with embeddings, search, and citation tracking
- **Compliance & Audit** - Immutable audit trails for regulated industries
- **Research** - Reproducible experiments with complete data lineage
- **Enterprise AI** - Secure, auditable AI workflows with access control

## Framework Integrations

MAIF provides drop-in integrations for popular AI agent frameworks:

| Framework | Status | Description |
|-----------|--------|-------------|
| LangGraph | Available | State checkpointer with provenance |
| CrewAI | Available | Crew/Agent callbacks, Memory |
| LangChain | Available | Callbacks, VectorStore, Memory |
| AWS Strands | Available | Agent callbacks |

```bash
pip install "maif[integrations]"
```

### LangGraph

```python
from langgraph.graph import StateGraph
from maif.integrations.langgraph import MAIFCheckpointer

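# `graph` is a StateGraph you have already built; `state` and `config`
# are your application's input state and run configuration.
checkpointer = MAIFCheckpointer("state.maif")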
app = graph.compile(checkpointer=checkpointer)
result = app.invoke(state, config)
checkpointer.finalize()
```

### CrewAI

```python
from crewai import Crew
from maif.integrations.crewai import MAIFCrewCallback

callback = MAIFCrewCallback("crew.maif")
crew = Crew(
    agents=[...],
    tasks=[...],
    task_callback=callback.on_task_complete,
    step_callback=callback.on_step,
)
result = crew.kickoff()
callback.finalize()
```

See the [integrations documentation](docs/guide/integrations/) for full details.

---

## Quick Start

**Prerequisites:** Python 3.9+

### Installation

```bash
# Clone the repository
git clone https://github.com/vineethsai/maif.git
cd maif

# Install MAIF
pip install -e .

# With ML features (embeddings, semantic search)
pip install -e ".[ml]"
```

### Your First MAIF Artifact

```python
from maif import MAIFEncoder, MAIFDecoder, verify_maif

# Create an agent memory artifact (Ed25519 signed automatically)
encoder = MAIFEncoder("agent_memory.maif", agent_id="my-agent")

# Add content with automatic provenance tracking
encoder.add_text_block("User asked about weather in NYC", metadata={"type": "query"})
encoder.add_text_block("Temperature is 72°F, sunny", metadata={"type": "response"})

# Finalize (signs and seals the file)
encoder.finalize()

# Later: Load and verify integrity
decoder = MAIFDecoder("agent_memory.maif")
decoder.load()

is_valid, errors = decoder.verify_integrity()
print(f"Valid: {is_valid}, Blocks: {len(decoder.blocks)}")

# Read content
for i, block in enumerate(decoder.blocks):
    text = decoder.get_text_content(i)
    print(f"Block {i}: {text}")
```

**Secure MAIF Format:**
- **Self-contained** - No separate manifest files, everything in one `.maif` file
- **Ed25519 signatures** - Fast, compact 64-byte signatures on every block
- **Immutable blocks** - Each block is signed immediately on write
- **Tamper detection** - Cryptographic verification catches any modification
- **Embedded provenance** - Full audit trail built into the file

---

## Featured Example: Multi-Agent RAG System

A demonstration multi-agent RAG system that combines **LangGraph orchestration** with **MAIF provenance tracking**.

```bash
cd examples/langgraph

# Configure API key
echo "GEMINI_API_KEY=your_key" > .env

# Install dependencies
pip install -r requirements_enhanced.txt

# Create knowledge base with embeddings
python3 create_kb_enhanced.py

# Run the interactive demo
python3 demo_enhanced.py
```

**What's Included:**
- 5 specialized agents (Retriever, Synthesizer, Fact-Checker, Citation, Web Search)
- ChromaDB vector store with semantic search
- Gemini API integration for LLM reasoning
- Complete audit trail of every agent action
- Multi-turn conversation support

See [`examples/langgraph/README.md`](examples/langgraph/README.md) for full documentation.

---

## NEW: Enterprise AI Governance Demo

Interactive demonstration of MAIF's enterprise-grade governance features:

```bash
cd examples/integrations/langgraph_governance_demo
python main.py
```

**Features demonstrated:**
- Cryptographic provenance (Ed25519 signatures, hash chains)
- Tamper detection and data integrity verification
- Role-based access control with audit logging
- Multi-agent coordination with clear handoffs
- Compliance report generation (Markdown, JSON, CSV)

See [`examples/integrations/langgraph_governance_demo/README.md`](examples/integrations/langgraph_governance_demo/README.md) for details.

---

## Features

### Cryptographic Provenance

Every block is cryptographically signed and linked - any tampering is detectable.

```python
from maif import MAIFEncoder, MAIFDecoder

# Each block is signed with Ed25519 on creation
encoder = MAIFEncoder("memory.maif", agent_id="agent-1")
encoder.add_text_block("First message") # Signed immediately
encoder.add_text_block("Second message") # Linked to previous via hash
encoder.add_text_block("Third message") # Chain continues
encoder.finalize()

# Verify the entire chain + all signatures
decoder = MAIFDecoder("memory.maif")
decoder.load()
is_valid, errors = decoder.verify_integrity()

# Check provenance chain
for entry in decoder.provenance:
    print(f"{entry.action} by {entry.agent_id} at {entry.timestamp}")
```

### Privacy & Security

Built-in encryption, anonymization, and access control.

```python
from maif import PrivacyLevel, EncryptionMode

# Assumes `maif` is an existing MAIF instance (e.g. `MAIF("my-agent")` from
# `maif_api`) and that `AccessRule` and `Permission` are already imported.

# Add encrypted content
maif.add_text(
    "Sensitive data",
    encrypt=True,
    anonymize=True,  # Auto-redact PII
    privacy_level=PrivacyLevel.CONFIDENTIAL,
)

# Access control
maif.add_access_rule(AccessRule(
    role="analyst",
    permissions=[Permission.READ],
    resources=["reports"],
))
```
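
To give a rough sense of what the `anonymize=True` redaction step might do, here is a standalone sketch based on regular expressions. It is not MAIF's anonymizer: the patterns, the `[EMAIL]`/`[PHONE]` placeholders, and the `redact_pii` helper are assumptions for illustration, and real PII detection needs much broader coverage.

```python
import re

# Illustrative patterns only; these are assumptions, not MAIF's rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_pii(text: str) -> str:
    """Replace every match of each pattern with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact_pii("Contact jane@example.com or 555-123-4567."))
# Contact [EMAIL] or [PHONE].
```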

### Multimodal Support

Store and search across text, images, video, embeddings, and knowledge graphs.

```python
from maif_api import MAIF

maif = MAIF("my-agent")

# Text with metadata
maif.add_text("Analysis results", metadata={"title": "Report", "language": "en"})

# Images with feature extraction
maif.add_image("chart.png", metadata={"title": "Sales Chart"})

# Semantic embeddings (pre-computed or from TF-IDF)
maif.add_embeddings([[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]])

# Multimodal content - combines text, images, and embeddings
maif.add_multimodal({
    "text": "Product description",
    "image_path": "product.jpg",
    "embeddings": [[0.1, 0.2, ...]],
    "metadata": {"category": "electronics"}
})

maif.save("output.maif")
```

## What's Working

The following features are fully tested and working:

- **Ed25519 cryptographic signatures** - Fast, compact 64-byte signatures ✓
- **Multiple compression formats** - ZLIB, BROTLI, GZIP, and other standard formats ✓
- **Framework integrations** - LangGraph, CrewAI, LangChain, AWS Strands ✓
- **Provenance tracking** - Hash-chained blocks with tamper detection ✓
- **TF-IDF embeddings** - Lightweight semantic search with sklearn ✓
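
For readers unfamiliar with TF-IDF search, the sketch below shows the general technique using only the standard library. MAIF's own implementation is sklearn-based, so this is a simplified stand-in rather than its code; the helper names (`tokenize`, `vectorize`, `build_index`, `search`) and the whitespace tokenizer are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def vectorize(tokens, idf):
    """Build an L2-normalized TF-IDF vector as a sparse dict."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def build_index(docs):
    """Compute smoothed IDF weights and one vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [vectorize(tokens, idf) for tokens in tokenized]


def search(query, docs, idf, vectors, top_k=1):
    """Rank documents by cosine similarity to the query."""
    q = vectorize(tokenize(query), idf)
    scores = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(docs[i], scores[i]) for i in ranked[:top_k]]


docs = [
    "User asked about weather in NYC",
    "Temperature is 72°F, sunny",
    "Agent stored a sales chart",
]
idf, vectors = build_index(docs)
best_doc, score = search("weather in NYC", docs, idf, vectors)[0]
print(best_doc)  # User asked about weather in NYC
```

Since the vectors are normalized, the dot product equals cosine similarity. Each query is compared against every stored vector, so the cost grows linearly with the number of documents.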

## What's In Progress / Research Phase

The following are research implementations with known limitations:

### Hierarchical Semantic Compression (HSC)
- **Status**: Research implementation
- **Current performance**: ~1.5x compression ratio on embeddings
- **What works**: DBSCAN clustering, vector quantization, Huffman coding
- **Limitations**: Not achieving claimed 2.5-4x ratio, not production-ready
- **Roadmap**: Plan to implement proper Product Quantization in v2.2

### Adaptive Cross-Modal Attention (ACAM)
```python
from maif.semantic import AdaptiveCrossModalAttention
import numpy as np

# ⚠ RESEARCH IMPLEMENTATION - Use with caution
acam = AdaptiveCrossModalAttention(embedding_dim=384, num_heads=8)

# Train on multimodal data (optional but recommended)
training_data = [
    {
        "text": np.random.randn(384),
        "image": np.random.randn(384),
        "audio": np.random.randn(384),
    },
    # ... more samples
]
stats = acam.fit(training_data, epochs=10)

# Use for attention computation
embeddings = {
    "text": np.random.randn(384),
    "image": np.random.randn(384),
}
weights = acam.compute_attention_weights(embeddings)

# Get fused representation
attended = acam.get_attended_representation(embeddings, weights, "text")

# Save/load trained weights
acam.save_weights("acam_weights.pkl")
acam.load_weights("acam_weights.pkl")
```
- **Status**: Research implementation
- **Current capability**: Computes cross-modal attention weights
- **Known limitations**: Training uses simple gradient descent, not optimized
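
As background on what "cross-modal attention weights" means, the following sketch applies plain scaled dot-product attention across modalities. It is not the ACAM implementation: there are no learned projections or multiple heads, and the `attention_weights` and `fuse` helpers and the tiny 4-dimensional vectors are assumptions chosen to keep the example readable.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(embeddings, query_modality):
    """Softmax over scaled dot products between the query and each modality."""
    query = embeddings[query_modality]
    scale = math.sqrt(len(query))
    scores = {name: dot(query, vec) / scale for name, vec in embeddings.items()}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - max_score) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def fuse(embeddings, weights):
    """Weighted sum of all modality vectors."""
    dim = len(next(iter(embeddings.values())))
    return [
        sum(weights[name] * embeddings[name][i] for name in embeddings)
        for i in range(dim)
    ]


embeddings = {
    "text": [1.0, 0.0, 0.0, 0.0],
    "image": [0.8, 0.6, 0.0, 0.0],
    "audio": [0.0, 0.0, 1.0, 0.0],
}
weights = attention_weights(embeddings, "text")
fused = fuse(embeddings, weights)
print(weights)
print(fused)
```

Modalities whose embeddings align more closely with the query modality receive larger weights, so here the image vector contributes more to the fused representation than the audio vector.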

### Cryptographic Semantic Binding (CSB)
- **Status**: Research implementation
- **Current capability**: SHA-256 based commitment schemes
- **Note**: Infrastructure in place, not validated for production use
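
A SHA-256 commitment that binds an embedding to its source text can be sketched as follows. This shows the general commit-and-verify pattern under stated assumptions, not the CSB code itself: the serialization format, the 32-byte nonce, and the `serialize`, `commit`, and `verify` names are hypothetical.

```python
import hashlib
import hmac
import secrets
import struct


def serialize(embedding, source_text):
    """Pack the embedding as big-endian doubles, then length-prefix the text."""
    vec = struct.pack(f">I{len(embedding)}d", len(embedding), *embedding)
    text = source_text.encode("utf-8")
    return vec + struct.pack(">I", len(text)) + text


def commit(embedding, source_text):
    """Return (commitment, nonce); the random nonce hides the committed data."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + serialize(embedding, source_text)).hexdigest()
    return digest, nonce


def verify(commitment, nonce, embedding, source_text):
    """Recompute the digest and compare in constant time."""
    expected = hashlib.sha256(nonce + serialize(embedding, source_text)).hexdigest()
    return hmac.compare_digest(commitment, expected)


commitment, nonce = commit([0.1, 0.2, 0.3], "Analysis results")
print(verify(commitment, nonce, [0.1, 0.2, 0.3], "Analysis results"))  # True
print(verify(commitment, nonce, [0.1, 0.2, 0.4], "Analysis results"))  # False
```

The length prefixes keep the embedding and text fields from being confused with each other, so a change to either one alters the digest.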

### Neural Embeddings
- **Status**: ❌ Not implemented
- **Current**: TF-IDF only (sklearn-based)
- **Planned**: Optional sentence-transformers integration in future versions
- **Note**: Infrastructure exists but neural models not functional

---

## Performance

| Metric | Performance | Notes |
|--------|-------------|-------|
| Semantic Search | ~30ms for 1K vectors | TF-IDF based, tested at 1K, scales linearly |
| Standard Compression (ZLIB) | 2-3× typical | Proven, well-tested |
| Hierarchical Semantic Compression (HSC) | ~1.5× average | Research implementation, not production-ready |
| Integrity Verification | ~0.1ms per file | Ed25519 signature verification |
| Tamper Detection | 100% detection in <0.1ms | Hash-chain verification |
| Signature Overhead | 64 bytes per block | Ed25519 signatures |

**Note:** HSC claims of "2.5-4x compression up to 10x maximum" were not verified and are not guaranteed. Current implementation achieves ~1.5x on embeddings.
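
Two of the figures above are easy to check on your own data. The sketch below measures a zlib compression ratio and computes how much space 64-byte signatures add for a given set of block sizes. The helper names and the sample inputs are assumptions, and the repetitive sample text compresses far better than typical content would.

```python
import zlib

SIGNATURE_BYTES = 64  # Ed25519 signature size, as listed in the table


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data, level))


def signature_overhead(block_sizes) -> float:
    """Fraction of total stored bytes taken up by per-block signatures."""
    payload = sum(block_sizes)
    signatures = SIGNATURE_BYTES * len(block_sizes)
    return signatures / (payload + signatures)


sample = ("User asked about weather in NYC. Temperature is 72°F, sunny. " * 50).encode("utf-8")
print(f"zlib ratio on sample text: {compression_ratio(sample):.1f}x")

# 100 blocks of 1 KiB each: 6400 signature bytes out of 108800 total
print(f"signature overhead: {signature_overhead([1024] * 100):.1%}")  # 5.9%
```

The overhead shrinks as blocks grow, so many very small blocks pay proportionally more for their signatures than a few large ones.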

---

## Project Structure

```
maif/
├── maif/                  # Core library
│   ├── core.py            # MAIFEncoder, MAIFDecoder
│   ├── security.py        # Signing, verification
│   ├── privacy.py         # Encryption, anonymization
│   ├── integrations/      # Framework integrations (LangGraph, etc.)
│   └── semantic*.py       # Embeddings, compression
├── maif_api.py            # High-level API
├── examples/
│   ├── langgraph/         # Multi-agent RAG system
│   ├── integrations/      # Framework integration demos
│   ├── basic/             # Getting started
│   ├── security/          # Privacy & encryption
│   └── advanced/          # Agent framework, lifecycle
├── tests/                 # 450+ tests
├── docs/                  # VitePress documentation
└── benchmarks/            # Performance tests
```

---

## Documentation

| Resource | Description |
|----------|-------------|
| [Getting Started](https://vineethsai.github.io/maif/guide/getting-started) | Quick start guide |
| [Framework Integrations](https://vineethsai.github.io/maif/guide/integrations/) | LangGraph, LangChain, CrewAI |
| [DeepWiki](https://deepwiki.com/vineethsai/maif) | Auto-generated API docs and code exploration |
| [Examples](examples/) | Working code examples |

---

## Examples

### Basic Usage
```bash
python examples/basic/simple_api_demo.py
python examples/basic/basic_usage.py
```

### Privacy & Security
```bash
python examples/security/privacy_demo.py
python examples/security/classified_api_simple_demo.py
```

### Advanced Features
```bash
python examples/advanced/maif_agent_demo.py # Agent framework
python examples/advanced/lifecycle_management_demo.py # Lifecycle management
python examples/advanced/video_demo.py # Video processing
```

---

## Contributing

We welcome contributions! Please ensure:

1. All tests pass (`pytest tests/`)
2. Code follows PEP 8 style
3. New features include tests and documentation
4. Security-sensitive changes include impact analysis

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

---

## References

- [FIPS 140-2 Standards](https://csrc.nist.gov/publications/detail/fips/140/2/final) - Cryptographic module requirements
- [NIST 800-53](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final) - Security and privacy controls
- [ISO BMFF](https://www.iso.org/standard/68960.html) - Binary format inspiration

---

## License

MIT License - See [LICENSE](LICENSE) for details.

---

## Community & Support

- **[GitHub Discussions](https://github.com/vineethsai/maif/discussions)** - Ask questions, share ideas
- **[Issue Tracker](https://github.com/vineethsai/maif/issues)** - Report bugs or request features
- **[Documentation](https://vineethsai.github.io/maif/)** - Complete guides and API reference
- **[Security](SECURITY.md)** - Report security vulnerabilities
- **[Changelog](CHANGELOG.md)** - See what's new
- **[Specification](SPECIFICATION.md)** - MAIF file format specification

---


Build trustworthy AI agents with cryptographic provenance