https://github.com/mbhatt1/maif
Cryptographically-secure, auditable file format for AI agent memory with provenance tracking
ai ai-agent-tools ai-agents-framework cryptography
- Host: GitHub
- URL: https://github.com/mbhatt1/maif
- Owner: mbhatt1
- License: mit
- Created: 2025-06-08T23:04:23.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-01-22T22:43:38.000Z (5 months ago)
- Last Synced: 2026-01-22T23:24:25.958Z (5 months ago)
- Topics: ai, ai-agent-tools, ai-agents-framework, cryptography
- Language: Python
- Homepage: https://vineethsai.github.io/maif/
- Size: 133 MB
- Stars: 6
- Watchers: 0
- Forks: 1
- Open Issues: 18
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
MAIF
Multimodal Artifact File Format for Trustworthy AI Agents
Cryptographically-secure, auditable file format for AI agent memory with provenance tracking
---
## Overview
MAIF is a file format and SDK designed for AI agents that need **trustworthy memory**. Every piece of data is cryptographically linked, creating tamper-evident audit trails that prove exactly what happened, when, and by which agent.
**Key Capabilities:**
- **Cryptographic Provenance** - Hash-chained blocks for tamper-evident audit trails
- **Multi-Agent Coordination** - Shared artifacts with agent-specific logging
- **Multimodal Storage** - Text, embeddings, images, video, knowledge graphs
- **Privacy-by-Design** - Encryption, anonymization, access control built-in
- **High Performance** - Memory-mapped I/O, streaming, semantic compression
## Use Cases
- **Multi-Agent Systems** - Shared memory with full provenance (see LangGraph example)
- **RAG Pipelines** - Document storage with embeddings, search, and citation tracking
- **Compliance & Audit** - Immutable audit trails for regulated industries
- **Research** - Reproducible experiments with complete data lineage
- **Enterprise AI** - Secure, auditable AI workflows with access control
## Framework Integrations
MAIF provides drop-in integrations for popular AI agent frameworks:
| Framework | Status | Description |
|-----------|--------|-------------|
| LangGraph | Available | State checkpointer with provenance |
| CrewAI | Available | Crew/Agent callbacks, Memory |
| LangChain | Available | Callbacks, VectorStore, Memory |
| AWS Strands | Available | Agent callbacks |
```bash
# Quote the extras so shells such as zsh do not expand the brackets
pip install "maif[integrations]"
```
### LangGraph
```python
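from langgraph.graph import StateGraph
from maif.integrations.langgraph import MAIFCheckpointer

# `graph` is a StateGraph you have already built; `state` is its initial input
checkpointer = MAIFCheckpointer("state.maif")
app = graph.compile(checkpointer=checkpointer)
# LangGraph checkpointers key saved state by thread_id ("session-1" is a placeholder)
config = {"configurable": {"thread_id": "session-1"}}
result = app.invoke(state, config)
# Seal the artifact once the run is complete
checkpointer.finalize()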
```
### CrewAI
```python
from crewai import Crew
from maif.integrations.crewai import MAIFCrewCallback
callback = MAIFCrewCallback("crew.maif")
crew = Crew(
    agents=[...],
    tasks=[...],
    task_callback=callback.on_task_complete,
    step_callback=callback.on_step,
)
result = crew.kickoff()
callback.finalize()
```
See the [integrations documentation](docs/guide/integrations/) for full details.
---
## Quick Start
**Prerequisites:** Python 3.9+
### Installation
```bash
# Clone the repository
git clone https://github.com/vineethsai/maif.git
cd maif
# Install MAIF
pip install -e .
# With ML features (embeddings, semantic search)
pip install -e ".[ml]"
```
### Your First MAIF Artifact
```python
from maif import MAIFEncoder, MAIFDecoder, verify_maif
# Create an agent memory artifact (Ed25519 signed automatically)
encoder = MAIFEncoder("agent_memory.maif", agent_id="my-agent")
# Add content with automatic provenance tracking
encoder.add_text_block("User asked about weather in NYC", metadata={"type": "query"})
encoder.add_text_block("Temperature is 72°F, sunny", metadata={"type": "response"})
# Finalize (signs and seals the file)
encoder.finalize()
# Later: Load and verify integrity
decoder = MAIFDecoder("agent_memory.maif")
decoder.load()
is_valid, errors = decoder.verify_integrity()
print(f"Valid: {is_valid}, Blocks: {len(decoder.blocks)}")
# Read content
for i, block in enumerate(decoder.blocks):
    text = decoder.get_text_content(i)
    print(f"Block {i}: {text}")
```
**Secure MAIF Format:**
- **Self-contained** - No separate manifest files, everything in one `.maif` file
- **Ed25519 signatures** - Fast, compact 64-byte signatures on every block
- **Immutable blocks** - Each block is signed immediately on write
- **Tamper detection** - Cryptographic verification catches any modification
- **Embedded provenance** - Full audit trail built into the file
---
## Featured Example: Multi-Agent RAG System
A demonstration multi-agent RAG system that combines **LangGraph orchestration** with **MAIF provenance tracking**.
```bash
cd examples/langgraph
# Configure API key
echo "GEMINI_API_KEY=your_key" > .env
# Install dependencies
pip install -r requirements_enhanced.txt
# Create knowledge base with embeddings
python3 create_kb_enhanced.py
# Run the interactive demo
python3 demo_enhanced.py
```
**What's Included:**
- 5 specialized agents (Retriever, Synthesizer, Fact-Checker, Citation, Web Search)
- ChromaDB vector store with semantic search
- Gemini API integration for LLM reasoning
- Complete audit trail of every agent action
- Multi-turn conversation support
See [`examples/langgraph/README.md`](examples/langgraph/README.md) for full documentation.
---
## NEW: Enterprise AI Governance Demo
Interactive demonstration of MAIF's enterprise-grade governance features:
```bash
cd examples/integrations/langgraph_governance_demo
python main.py
```
**Features demonstrated:**
- Cryptographic provenance (Ed25519 signatures, hash chains)
- Tamper detection and data integrity verification
- Role-based access control with audit logging
- Multi-agent coordination with clear handoffs
- Compliance report generation (Markdown, JSON, CSV)
See [`examples/integrations/langgraph_governance_demo/README.md`](examples/integrations/langgraph_governance_demo/README.md) for details.
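
To make "role-based access control with audit logging" concrete, here is a minimal standalone sketch of the idea. It does not use MAIF's own classes: `Rule`, `AccessController`, and the string permissions are hypothetical names chosen for illustration, and the demo's real implementation may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a role may use these permissions on these resources."""
    role: str
    permissions: frozenset
    resources: frozenset


@dataclass
class AccessController:
    rules: list
    audit_log: list = field(default_factory=list)

    def check(self, role, permission, resource):
        """Return True if any rule grants the request; log every decision."""
        allowed = any(
            r.role == role
            and permission in r.permissions
            and resource in r.resources
            for r in self.rules
        )
        # Denied requests are logged too, so the trail shows attempts as well as grants.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "permission": permission,
            "resource": resource,
            "allowed": allowed,
        })
        return allowed


controller = AccessController(
    rules=[Rule("analyst", frozenset({"read"}), frozenset({"reports"}))]
)
can_read = controller.check("analyst", "read", "reports")
can_write = controller.check("analyst", "write", "reports")
print(can_read, can_write, len(controller.audit_log))
```

Logging inside `check` rather than at the call site means a caller cannot make an access decision without leaving a record.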
---
## Features
### Cryptographic Provenance
Every block is cryptographically signed and linked - any tampering is detectable.
```python
from maif import MAIFEncoder, MAIFDecoder
# Each block is signed with Ed25519 on creation
encoder = MAIFEncoder("memory.maif", agent_id="agent-1")
encoder.add_text_block("First message") # Signed immediately
encoder.add_text_block("Second message") # Linked to previous via hash
encoder.add_text_block("Third message") # Chain continues
encoder.finalize()
# Verify the entire chain + all signatures
decoder = MAIFDecoder("memory.maif")
decoder.load()
is_valid, errors = decoder.verify_integrity()
# Check provenance chain
for entry in decoder.provenance:
    print(f"{entry.action} by {entry.agent_id} at {entry.timestamp}")
```
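
The hash-chaining idea behind this can be sketched with the standard library alone. The block layout below (a JSON object holding `prev` and `payload`) and the helper names are assumptions made for illustration; they are not MAIF's on-disk format or API, and the sketch omits the Ed25519 signatures.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(prev_hash, payload):
    """Hash a block's payload together with the hash of the previous block."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def build_chain(payloads):
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain):
    """Return (True, None) if intact, else (False, index of first bad block)."""
    prev = GENESIS
    for i, block in enumerate(chain):
        expected = block_hash(prev, block["payload"])
        if block["prev"] != prev or block["hash"] != expected:
            return False, i
        prev = block["hash"]
    return True, None


chain = build_chain(["First message", "Second message", "Third message"])
ok_before = verify_chain(chain)

chain[1]["payload"] = "Tampered"  # modify a block without updating its hash
ok_after = verify_chain(chain)
print(ok_before, ok_after)
```

A bare hash chain only detects edits made by someone who does not recompute the later hashes. Signing each block, as MAIF does with Ed25519, is what stops an attacker from rewriting the chain from the tampered block onward.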
### Privacy & Security
Built-in encryption, anonymization, and access control.
```python
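from maif import PrivacyLevel, EncryptionMode
# `maif` is a MAIF instance, e.g. `maif = MAIF("my-agent")` from maif_api.
# AccessRule and Permission must also be imported; their import path is not shown here.
# Add encrypted content
maif.add_text(
    "Sensitive data",
    encrypt=True,
    anonymize=True,  # Auto-redact PII
    privacy_level=PrivacyLevel.CONFIDENTIAL
)
# Access control
maif.add_access_rule(AccessRule(
    role="analyst",
    permissions=[Permission.READ],
    resources=["reports"]
))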
```
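
As a rough illustration of what PII auto-redaction involves, the sketch below replaces email addresses and US-style phone numbers with placeholders using regular expressions. The patterns and the `redact` helper are assumptions for illustration only; MAIF's `anonymize=True` may detect different entities or use a different method.

```python
import re

# Deliberately simple patterns; real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text):
    """Replace matched emails and phone numbers with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


sample = "Contact jane.doe@example.com or 555-123-4567."
redacted = redact(sample)
print(redacted)  # Contact [EMAIL] or [PHONE].
```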
### Multimodal Support
Store and search across text, images, video, embeddings, and knowledge graphs.
```python
from maif_api import MAIF
maif = MAIF("my-agent")
# Text with metadata
maif.add_text("Analysis results", metadata={"title": "Report", "language": "en"})
# Images with feature extraction
maif.add_image("chart.png", metadata={"title": "Sales Chart"})
# Semantic embeddings (pre-computed or from TF-IDF)
maif.add_embeddings([[0.1, 0.2, 0.3], [0.2, 0.3, 0.4]])
# Multimodal content - combines text, images, and embeddings
maif.add_multimodal({
    "text": "Product description",
    "image_path": "product.jpg",
    "embeddings": [[0.1, 0.2, ...]],
    "metadata": {"category": "electronics"}
})
maif.save("output.maif")
```
## What's Working
The following features are fully tested and working:
- **Ed25519 cryptographic signatures** - Fast, compact 64-byte signatures ✓
- **Multiple compression formats** - ZLIB, BROTLI, GZIP, and other standard formats ✓
- **Framework integrations** - LangGraph, CrewAI, LangChain, AWS Strands ✓
- **Provenance tracking** - Hash-chained blocks with tamper detection ✓
- **TF-IDF embeddings** - Lightweight semantic search with sklearn ✓
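
For readers unfamiliar with TF-IDF search, the following is a small pure-Python sketch of the technique: documents and the query become term-weight vectors, and results are ranked by cosine similarity. It is a stand-in written for illustration, not MAIF's sklearn-based implementation, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF: rare terms get higher weight, and no weight is ever zero.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, docs):
    """Rank documents by cosine similarity to the query, best first."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(tokenize(query))
    q_vec = {t: c * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, reverse=True)


docs = [
    "weather in new york is sunny",
    "stock prices fell today",
    "rain expected in london",
]
ranked = search("sunny weather", docs)
print(ranked[0])  # best match is document 0
```

Because TF-IDF matches on shared terms, a query with no vocabulary overlap scores zero against every document. That is the main gap neural embeddings would close.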
## What's In Progress / Research Phase
The following are research implementations with known limitations:
### Hierarchical Semantic Compression (HSC)
- **Status**: Research implementation
- **Current performance**: ~1.5x compression ratio on embeddings
- **What works**: DBSCAN clustering, vector quantization, Huffman coding
- **Limitations**: Not achieving claimed 2.5-4x ratio, not production-ready
- **Roadmap**: Plan to implement proper Product Quantization in v2.2
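
Of the stages listed above, Huffman coding is the easiest to show in isolation. The sketch below computes Huffman code lengths for a stream of quantized codes and compares the total to a fixed-width encoding. It is a generic textbook construction with hypothetical names, not MAIF's HSC code, and the input stream is made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for an optimal prefix code."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tiebreak, {symbol: depth so far})
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Hypothetical stream of cluster/codebook indices with a skewed distribution
codes = [0, 0, 0, 0, 1, 1, 2]
lengths = huffman_code_lengths(codes)
huffman_bits = sum(lengths[c] for c in codes)
fixed_bits = len(codes) * 2  # 2 bits per symbol covers 3 distinct codes
print(lengths, huffman_bits, fixed_bits)
```

The gain depends entirely on how skewed the code distribution is. With near-uniform codes, Huffman coding saves almost nothing, which is one reason entropy coding alone does not guarantee high ratios.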
### Adaptive Cross-Modal Attention (ACAM)
```python
from maif.semantic import AdaptiveCrossModalAttention
import numpy as np
# ⚠ RESEARCH IMPLEMENTATION - Use with caution
acam = AdaptiveCrossModalAttention(embedding_dim=384, num_heads=8)
# Train on multimodal data (optional but recommended)
training_data = [
    {
        "text": np.random.randn(384),
        "image": np.random.randn(384),
        "audio": np.random.randn(384),
    },
    # ... more samples
]
stats = acam.fit(training_data, epochs=10)
# Use for attention computation
embeddings = {
    "text": np.random.randn(384),
    "image": np.random.randn(384),
}
weights = acam.compute_attention_weights(embeddings)
# Get fused representation
attended = acam.get_attended_representation(embeddings, weights, "text")
# Save/load trained weights
acam.save_weights("acam_weights.pkl")
acam.load_weights("acam_weights.pkl")
```
- **Status**: Research implementation
- **Current capability**: Computes cross-modal attention weights
- **Known limitations**: Training uses simple gradient descent, not optimized
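
For intuition about what "cross-modal attention weights" means, here is a minimal sketch of plain scaled dot-product attention across modality embeddings, with no learned parameters. It is not the ACAM implementation; the function names and the tiny 2-dimensional vectors are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(embeddings, query_modality):
    """Weight each modality by its scaled similarity to the query modality."""
    query = embeddings[query_modality]
    scale = math.sqrt(len(query))
    names = list(embeddings)
    scores = [dot(query, embeddings[n]) / scale for n in names]
    return dict(zip(names, softmax(scores)))


def attended_representation(embeddings, weights):
    """Fuse modalities as the attention-weighted sum of their embeddings."""
    dim = len(next(iter(embeddings.values())))
    return [
        sum(weights[n] * embeddings[n][i] for n in embeddings)
        for i in range(dim)
    ]


embeddings = {
    "text": [1.0, 0.0],
    "image": [1.0, 0.0],   # aligned with the text embedding
    "audio": [0.0, 1.0],   # orthogonal to the text embedding
}
weights = attention_weights(embeddings, "text")
fused = attended_representation(embeddings, weights)
print(weights, fused)
```

Modalities whose embeddings align with the query receive more weight, so the fused vector leans toward content that agrees with the query modality.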
### Cryptographic Semantic Binding (CSB)
- **Status**: Research implementation
- **Current capability**: SHA-256 based commitment schemes
- **Note**: Infrastructure in place, not validated for production use
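
A SHA-256 commitment that binds an embedding to its source data can be sketched as follows. The byte layout (nonce, length prefix, packed floats, source bytes) and the `commit` / `verify` names are assumptions for illustration; the README does not describe CSB's actual encoding.

```python
import hashlib
import hmac
import secrets
import struct


def commit(embedding, source, nonce=None):
    """Return (digest, nonce) binding an embedding to its source bytes."""
    nonce = nonce if nonce is not None else secrets.token_bytes(16)
    # The length prefix keeps the embedding/source boundary unambiguous.
    packed = struct.pack("<I", len(embedding)) + struct.pack(
        f"<{len(embedding)}d", *embedding
    )
    digest = hashlib.sha256(nonce + packed + source).hexdigest()
    return digest, nonce


def verify(digest, nonce, embedding, source):
    """Recompute the commitment and compare in constant time."""
    expected, _ = commit(embedding, source, nonce)
    return hmac.compare_digest(expected, digest)


embedding = [0.1, 0.2, 0.3]
source = b"User asked about weather in NYC"
digest, nonce = commit(embedding, source)

valid = verify(digest, nonce, embedding, source)
swapped = verify(digest, nonce, [0.1, 0.2, 0.4], source)
print(valid, swapped)
```

The random nonce prevents someone who sees only the digest from confirming a guess about the embedding or the source.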
### Neural Embeddings
- **Status**: ❌ Not implemented
- **Current**: TF-IDF only (sklearn-based)
- **Planned**: Optional sentence-transformers integration in future versions
- **Note**: Infrastructure exists but neural models not functional
---
## Performance
| Metric | Performance | Notes |
|--------|-------------|-------|
| Semantic Search | ~30ms for 1K vectors | TF-IDF based, tested at 1K, scales linearly |
| Standard Compression (ZLIB) | 2-3× typical | Proven, well-tested |
| Hierarchical Semantic Compression (HSC) | ~1.5× average | Research implementation, not production-ready |
| Integrity Verification | ~0.1ms per file | Ed25519 signature verification |
| Tamper Detection | 100% detection in <0.1ms | Hash-chain verification |
| Signature Overhead | 64 bytes per block | Ed25519 signatures |
**Note:** HSC claims of "2.5-4x compression up to 10x maximum" were not verified and are not guaranteed. Current implementation achieves ~1.5x on embeddings.
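
Compression ratios like those in the table depend heavily on the input, so it is worth measuring on your own data. The sketch below shows one way to do that with the standard `zlib` module; the helper name and sample text are illustrative and are not part of MAIF's benchmarks.

```python
import zlib


def compression_ratio(data, level=6):
    """Original size divided by compressed size (higher is better)."""
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


# Highly repetitive sample text; it compresses far better than typical data.
text = (
    "User asked about weather in NYC. Temperature is 72F, sunny. " * 200
).encode("utf-8")

ratio = compression_ratio(text)
roundtrip_ok = zlib.decompress(zlib.compress(text)) == text
print(f"ratio: {ratio:.1f}x, lossless: {roundtrip_ok}")
```

Run the same function over representative agent logs or embedding bytes to see where your workload falls relative to the 2-3× figure.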
---
## Project Structure
```
maif/
├── maif/ # Core library
│ ├── core.py # MAIFEncoder, MAIFDecoder
│ ├── security.py # Signing, verification
│ ├── privacy.py # Encryption, anonymization
│ ├── integrations/ # Framework integrations (LangGraph, etc.)
│ └── semantic*.py # Embeddings, compression
├── maif_api.py # High-level API
├── examples/
│ ├── langgraph/ # Multi-agent RAG system
│ ├── integrations/ # Framework integration demos
│ ├── basic/ # Getting started
│ ├── security/ # Privacy & encryption
│ └── advanced/ # Agent framework, lifecycle
├── tests/ # 450+ tests
├── docs/ # VitePress documentation
└── benchmarks/ # Performance tests
```
---
## Documentation
| Resource | Description |
|----------|-------------|
| [Getting Started](https://vineethsai.github.io/maif/guide/getting-started) | Quick start guide |
| [Framework Integrations](https://vineethsai.github.io/maif/guide/integrations/) | LangGraph, LangChain, CrewAI |
| [DeepWiki](https://deepwiki.com/vineethsai/maif) | Auto-generated API docs and code exploration |
| [Examples](examples/) | Working code examples |
---
## Examples
### Basic Usage
```bash
python examples/basic/simple_api_demo.py
python examples/basic/basic_usage.py
```
### Privacy & Security
```bash
python examples/security/privacy_demo.py
python examples/security/classified_api_simple_demo.py
```
### Advanced Features
```bash
python examples/advanced/maif_agent_demo.py # Agent framework
python examples/advanced/lifecycle_management_demo.py # Lifecycle management
python examples/advanced/video_demo.py # Video processing
```
---
## Contributing
We welcome contributions! Please ensure:
1. All tests pass (`pytest tests/`)
2. Code follows PEP 8 style
3. New features include tests and documentation
4. Security-sensitive changes include impact analysis
See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
---
## References
- [FIPS 140-2 Standards](https://csrc.nist.gov/publications/detail/fips/140/2/final) - Cryptographic module requirements
- [NIST 800-53](https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final) - Security and privacy controls
- [ISO BMFF](https://www.iso.org/standard/68960.html) - Binary format inspiration
---
## License
MIT License - See [LICENSE](LICENSE) for details.
---
## Community & Support
- **[GitHub Discussions](https://github.com/vineethsai/maif/discussions)** - Ask questions, share ideas
- **[Issue Tracker](https://github.com/vineethsai/maif/issues)** - Report bugs or request features
- **[Documentation](https://vineethsai.github.io/maif/)** - Complete guides and API reference
- **[Security](SECURITY.md)** - Report security vulnerabilities
- **[Changelog](CHANGELOG.md)** - See what's new
- **[Specification](SPECIFICATION.md)** - MAIF file format specification
---
Build trustworthy AI agents with cryptographic provenance