https://github.com/agentralabs/agentic-vision

Persistent visual memory for AI agents — capture screenshots, embed with CLIP ViT-B/32, compare, recall. MCP server + Rust core library.
https://github.com/agentralabs/agentic-vision
agentic ai-agents claude clip computer-vision cursor embeddings image-similarity mcp model-context-protocol onnx rust screenshots vision visual-memory
Last synced: 4 months ago
JSON representation
Persistent visual memory for AI agents — capture screenshots, embed with CLIP ViT-B/32, compare, recall. MCP server + Rust core library.
Host: GitHub
URL: https://github.com/agentralabs/agentic-vision
Owner: agentralabs
License: other
Created: 2026-02-18T21:59:16.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-02-23T03:38:32.000Z (4 months ago)
Last Synced: 2026-02-23T04:38:58.434Z (4 months ago)
Topics: agentic, ai-agents, claude, clip, computer-vision, cursor, embeddings, image-similarity, mcp, model-context-protocol, onnx, rust, screenshots, vision, visual-memory
Language: Rust
Homepage: https://crates.io/crates/agentic-vision
Size: 2.13 MB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 12
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md
Awesome Lists containing this project

README

          


  





  

  

  

  





  Quickstart · Why · Benchmarks · How It Works · Install · Full Install Guide · Paper



---

## AI agents can't see across sessions.

Your agent takes a screenshot, analyzes it, and forgets. Next session — blank slate. It can't compare what a page looks like now versus yesterday. It can't recall what the error dialog said three conversations ago. It can't search its own visual history.

Text-based memory exists. Visual memory doesn't — until now.

**AgenticVision** gives AI agents persistent visual memory. Capture images, embed them with CLIP ViT-B/32, store them in a compact binary format, and query them by similarity, time, or description. Every capture is a first-class MCP resource that any LLM can access.

```bash

cargo install agentic-vision-mcp

```

One binary. 11 MCP tools. Persistent `.avis` files. Works with Claude Desktop, VS Code, Cursor, Windsurf, and any MCP-compatible client.



  



---



## Benchmarks

Rust core. CLIP ViT-B/32 via ONNX Runtime. Binary `.avis` format. Real numbers from `cargo test --release`:

| Operation | Time | Notes |

|:---|---:|:---|

| Image capture (file → embed → store) | **47 ms** | CLIP ViT-B/32, 512-dim |

| Similarity search (top-5) | **1-2 ms** | Brute-force cosine, f64 precision |

| Visual diff (pixel-level) | **<1 ms** | 8×8 grid region detection |

| MCP tool round-trip | **7.2 ms** | Including process startup (~6.1 ms) |

| Storage per capture | **~4.26 KB** | Embedding + JPEG thumbnail |

| Capacity per GB | **~250K** | Observations |

> All benchmarks on Apple M4, macOS 26.2, Rust 1.90.0 `--release`. ONNX Runtime for CLIP inference. Fallback mode available when ONNX model is not present.



  



---



## Why AgenticVision

**Agents need visual continuity.** A debugging agent should remember what the UI looked like before and after a code change. A monitoring agent should detect visual regressions. A research agent should build a visual knowledge base over time.

**Capture once, query forever.** Every image is embedded into a 512-dimensional CLIP vector and stored with its JPEG thumbnail, timestamp, and description. Query by cosine similarity, time range, or text search — in milliseconds.

**Binary format, not a database.** The `.avis` file is a single portable binary — 64-byte header, JSON payload, JPEG thumbnails. Copy it, share it, back it up. No server, no database, no dependencies.

**Works with every MCP client.** AgenticVision-MCP exposes 11 tools, 6 resources, and 4 prompts via the Model Context Protocol. Any LLM that speaks MCP gains visual memory automatically.

**Links to AgenticMemory.** The `vision_link` tool connects visual captures to [AgenticMemory](https://github.com/agentralabs/agentic-memory) cognitive graph nodes — bridging what an agent *sees* with what it *knows*.

---



## How It Works



  



1. **Capture** — `vision_capture` accepts images from files, base64, screenshots, or the system clipboard. Each image is resized, embedded via CLIP ViT-B/32 into a 512-dimensional vector, compressed to JPEG thumbnail, and stored in the `.avis` binary file. Screenshots support optional region capture; clipboard reads the current image from the OS clipboard.

2. **Query** — `vision_query` retrieves captures by time range, description, recency, and quality constraints (`min_quality`, `sort_by`). Results include capture metadata, quality scores, thumbnails, and similarity scores.

3. **Compare** — `vision_compare` places two captures side-by-side for LLM analysis. `vision_diff` performs pixel-level differencing with 8×8 grid region detection to identify exactly what changed.

4. **Link** — `vision_link` connects captures to AgenticMemory nodes, bridging visual observations with the agent's cognitive graph. An agent can recall "what did the UI look like when I made that decision?"

**The `.avis` binary format** uses a 64-byte fixed header (magic `0x41564953`, version, counts, timestamps) followed by a JSON payload containing captures with embedded JPEG thumbnails and 512-dim float vectors. Single-file, portable, no external dependencies.

MCP surface area




**11 Tools:**

| Tool | Description |

|:---|:---|

| `vision_capture` | Capture and embed an image (file, base64, screenshot, clipboard), with metadata redaction and quality scoring |

| `vision_compare` | Side-by-side comparison of two captures |

| `vision_query` | Query captures by time, description, recency |

| `vision_ocr` | Extract text from a captured image |

| `vision_similar` | Find visually similar captures (cosine similarity) |

| `vision_track` | Track visual changes to a target over time |

| `vision_diff` | Pixel-level diff between two captures |

| `vision_health` | Quality + staleness + memory-link coverage summary |

| `vision_link` | Link a capture to an AgenticMemory node |

| `session_start` | Begin a named observation session |

| `session_end` | End the current session |

**6 Resources:**

| URI | Description |

|:---|:---|

| `avis://capture/{id}` | Single capture with metadata and thumbnail |

| `avis://session/{id}` | All captures in a session |

| `avis://timeline/{start}/{end}` | Captures within a time range |

| `avis://similar/{id}` | Visually similar captures |

| `avis://stats` | Storage statistics and counts |

| `avis://recent` | Most recent captures |

**4 Prompts:**

| Prompt | Description |

|:---|:---|

| `observe` | Guided visual observation workflow |

| `compare` | Structured comparison between captures |

| `track` | Change tracking over time |

| `describe` | Detailed image description |

---



## Install

**One-liner** (desktop profile, backwards-compatible):

```bash

curl -fsSL https://agentralabs.tech/install/vision | bash

```

**Environment profiles** (one command per environment):

```bash

# Desktop MCP clients (auto-merge Claude Desktop + Claude Code when detected)

curl -fsSL https://agentralabs.tech/install/vision/desktop | bash

# Terminal-only (no desktop config writes)

curl -fsSL https://agentralabs.tech/install/vision/terminal | bash

# Remote/server hosts (no desktop config writes)

curl -fsSL https://agentralabs.tech/install/vision/server | bash

```

| Channel | Command | Result |

|:---|:---|:---|

| GitHub installer (official) | `curl -fsSL https://agentralabs.tech/install/vision \| bash` | Installs release binaries when available, otherwise source fallback; merges MCP config |

| GitHub installer (desktop profile) | `curl -fsSL https://agentralabs.tech/install/vision/desktop \| bash` | Explicit desktop profile behavior |

| GitHub installer (terminal profile) | `curl -fsSL https://agentralabs.tech/install/vision/terminal \| bash` | Installs binaries only; no desktop config writes |

| GitHub installer (server profile) | `curl -fsSL https://agentralabs.tech/install/vision/server \| bash` | Installs binaries only; server-safe behavior |

| crates.io + Cargo deps (official) | `cargo install agentic-vision-mcp` + `cargo add agentic-vision` | Installs MCP server binary and adds the core library crate to your project |

### Server auth and artifact sync

For cloud/server runtime:

```bash

export AGENTIC_TOKEN="$(openssl rand -hex 32)"

```

All MCP clients must send `Authorization: Bearer `.

If `.avis/.amem/.acb` files are on another machine, sync them to the server first.



  



**MCP Server** (for Claude Desktop, VS Code, Cursor, Windsurf):

```bash

cargo install agentic-vision-mcp

```

**Core library** (for Rust projects):

```bash

cargo add agentic-vision

```

**Configure Claude Desktop** (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json

{

  "mcpServers": {

    "vision": {

      "command": "agentic-vision-mcp",

      "args": ["--vision", "~/.vision.avis", "serve"]

    }

  }

}

```

> See [INSTALL.md](INSTALL.md) for full installation guide, VS Code / Cursor configuration, build from source, and troubleshooting.

> **Do not use `/tmp` for vision files** — macOS and Linux clear this directory periodically. Use `~/.vision.avis` for persistent storage.

## Deployment Model

- **Standalone by default:** AgenticVision is independently installable and operable. Integration with AgenticMemory or AgenticCodebase is optional, never required.

- **Autonomic operations by default:** daemon/runtime maintenance uses safe profile-based defaults with cache hygiene, migration safeguards, and health-ledger snapshots.

| Area | Default behavior | Controls |

|:---|:---|:---|

| Autonomic profile | Conservative local-first posture | `CORTEX_AUTONOMIC_PROFILE=desktop|cloud|aggressive` |

| Cache + registry maintenance | Periodic expiry cleanup and registry GC | `CORTEX_MAINTENANCE_TICK_SECS`, `CORTEX_REGISTRY_GC_EVERY_TICKS`, `CORTEX_REGISTRY_GC_KEEP_DELTAS` |

| Storage migration | Policy-gated with checkpointed auto-safe path | `CORTEX_STORAGE_MIGRATION_POLICY=auto-safe|strict|off` |

| Storage budget policy | 20-year projection + capture rollup under pressure | `CORTEX_STORAGE_BUDGET_MODE=auto-rollup|warn|off`, `CORTEX_STORAGE_BUDGET_BYTES`, `CORTEX_STORAGE_BUDGET_HORIZON_YEARS`, `CORTEX_STORAGE_BUDGET_TARGET_FRACTION` |

| Maintenance throttling | SLA-aware under sustained cache pressure | `CORTEX_SLA_MAX_CACHE_ENTRIES_BEFORE_GC_THROTTLE` |

| Health ledger | Periodic operational snapshots (default: `~/.agentra/health-ledger`) | `CORTEX_HEALTH_LEDGER_DIR`, `AGENTRA_HEALTH_LEDGER_DIR`, `CORTEX_HEALTH_LEDGER_EMIT_SECS` |

---



## Quickstart

### MCP (Claude Desktop, VS Code, Cursor)

After configuring the MCP server (see [Install](#install)), ask your agent:

> "Take a screenshot and remember it."

The LLM calls `vision_capture` automatically. Then later:

> "What did the screen look like earlier?"

The LLM calls `vision_query` to retrieve and display past captures.

### Rust API

```rust

use agentic_vision::{VisionStore, CaptureSource};

let mut store = VisionStore::open("observations.avis")?;

// Capture from file

let id = store.capture(

    CaptureSource::File("screenshot.png"),

    "Homepage after deploy"

)?;

// Find similar

let matches = store.similar(id, 5)?;

for m in matches {

    println!("  {} (similarity: {:.3})", m.description, m.score);

}

```

---

## Validation

| Suite | Tests | Notes |

|:---|---:|:---|

| Rust core (`agentic-vision`) | **38** | Unit + integration (includes screenshot/clipboard) |

| Python SDK tests | **47** | Edge cases, format validation |

| MCP integration suite | **3** | Python → Rust stdio transport |

| Multi-agent suite | **3** | Shared file, vision-memory linking, rapid handoff |

| **Total** | **91** | All passing |

**Two research papers:**

- [Paper I: Cortex — Web Cartography (10 pages, 8 figures, 13 tables)](publication/paper-i-cortex/cortex-paper.pdf)

- [Paper II: AgenticVision-MCP — Persistent Visual Memory via MCP (8 pages, 4 figures, 7 tables)](publication/paper-ii-agentic-vision-mcp/agentic-vision-mcp-paper.pdf)

---

## Repository Structure

This is a Cargo workspace monorepo containing the core library and MCP server.

```

agentic-vision/

├── Cargo.toml                    # Workspace root

├── crates/

│   ├── agentic-vision/           # Core library (crates.io: agentic-vision v0.1.0)

│   └── agentic-vision-mcp/       # MCP server (crates.io: agentic-vision-mcp v0.1.0)

├── tests/                        # Integration tests (Python → Rust, multi-agent)

├── models/                       # ONNX model directory (CLIP ViT-B/32)

├── publication/                  # Research papers (I, II)

├── assets/                       # SVG diagrams and visuals

└── docs/                         # Guides and reference

```

### Running Tests

```bash

# All workspace tests (unit + integration)

cargo test --workspace

# Core library only

cargo test -p agentic-vision

# MCP server only

cargo test -p agentic-vision-mcp

# Python integration tests

python tests/integration/test_mcp_clients.py

python tests/integration/test_multi_agent.py

```

### MCP Server Quick Start

```bash

cargo install agentic-vision-mcp

```

Configure Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):

```json

{

  "mcpServers": {

    "vision": {

      "command": "agentic-vision-mcp",

      "args": ["--vision", "~/.vision.avis", "serve"]

    }

  }

}

```

`agentic-vision-mcp` supports both line-delimited JSON-RPC and `Content-Length` framed MCP stdio messages.

---

## Roadmap: v0.2.0 — Remote Server Support

The next release is planned to add HTTP/SSE transport for remote deployments. Track progress in [#2](https://github.com/agentralabs/agentic-vision/issues/2).

| Feature | Status |

|:---|:---|

| `--token` bearer auth | Planned |

| `--multi-tenant` per-user vision files | Planned |

| `/health` endpoint | Planned |

| `--tls-cert` / `--tls-key` native HTTPS | Planned |

| OCR with Tesseract (`--features ocr`) | Planned |

| Clipboard TIFF fix | Planned |

| `delete` / `export` / `compact` CLI commands | Planned |

| Docker image + compose | Planned |

| Remote deployment docs | Planned |

Planned CLI shape (not available in current release):

```text

agentic-vision-mcp serve-http --port 8081 --token ""

agentic-vision-mcp serve-http --multi-tenant --data-dir /data/users --port 8081 --token ""

```

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). The fastest ways to help:

1. **Try it** and [file issues](https://github.com/agentralabs/agentic-vision/issues)

2. **Add an MCP tool** — extend the visual memory surface

3. **Write an example** — show a real use case

4. **Improve docs** — every clarification helps someone

---



  _{Built by Agentra Labs}
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/agentralabs/agentic-vision

Awesome Lists containing this project

README