https://github.com/hkuds/catchme

"CatchMe: Make Your AI Agents Truly Personal"

ai-agent clawdbot-plugin llm recall-ai retrieval-systems screen-recorder




Capture Your Entire Digital Footprint: Lightweight & Vectorless & Powerful.



Just do your thing. CatchMe captures everything else and stores it locally to protect your privacy and security.


CatchMe Terminal Demo

**🦞 Makes Your Agents Truly Personal**. CatchMe ships as an agent-compatible skill for CLI agents (OpenClaw, NanoBot, Claude, Cursor, etc.). You run CatchMe independently, and your agents query its memories through CLI commands only.

## 🎯 Enrich Your Personal Digital Context



### 💻 Personal Coding Assistant
*"What was I coding in Claude Code today?"*

- Code session replay
- Recall your edited files
- Trace what you typed

### 🔍 Personal Deep Research
*"What was I reading about AI yesterday?"*

- Web pages and PDFs viewed
- Search queries typed
- Reading activity tracked

### 📁 Personal Files Manager
*"Which files did I change today?"*

- File changes tracked
- Docs accessed
- Edits reviewed

### 🧩 Digital Life Overview
*"How did I spend my afternoon?"*

- App usage tracked
- Workflows replayed
- Activities recalled


## ✨ Key Features

### 📹 Always-On Event Capture
- **Event-Driven Recording**: No timers or delays. Mouse actions are captured instantly and annotated with a crosshair.
- **Comprehensive Context**: Five recorders capture window, keyboard, clipboard, notification, and file activity around each mouse action.

### 🌲 Intelligent Memory Hierarchy
- **Auto-Organization**: Raw event streams are organized into five tiers: Day → Session → App → Location → Action.
- **Smart Summaries**: An LLM summarizes every level, turning raw logs into a searchable knowledge tree.

### 🔍 Tree-Based Retrieval
- **No Vector Complexity**: No embeddings or vector databases. The system navigates your memory by reasoning over the tree.
- **Top-Down Search**: LLM reads summaries, selects relevant branches, and drills down to evidence.

### 🤖 Zero-Config Agent Integration
- **One-File Setup**: Drop a single skill file into any AI agent for instant integration.
- **Immediate Access**: Agents query your screen history through CLI commands, with no extra configuration required.

### 🪶 Ultralight & Privacy-First
- **Minimal Footprint**: ~0.2GB runtime RAM with efficient SQLite + FTS5 storage.
- **Local & Offline**: All data stays on your machine with full offline mode via Ollama/vLLM/LM Studio.
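For a feel of what SQLite's FTS5 extension provides, here is a minimal keyword-lookup sketch using only Python's standard `sqlite3` module. The `events` table, its columns, and the `search` helper are hypothetical. CatchMe's real schema isn't documented here. The sketch also assumes your SQLite build includes FTS5, as most do.

```python
import sqlite3

# Hypothetical schema for illustration; CatchMe's real tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE events USING fts5(ts UNINDEXED, app, body)")
conn.executemany(
    "INSERT INTO events (ts, app, body) VALUES (?, ?, ?)",
    [
        ("09:12", "Chrome", "meeting notes for the Q1 planning sync"),
        ("10:40", "VS Code", "edited retrieve.py tree traversal"),
        ("14:05", "Notion", "draft meeting notes and action items"),
    ],
)

def search(query: str) -> list[tuple[str, str]]:
    """Full-text search. FTS5 treats space-separated words as AND, and
    bm25() scores matches (lower means more relevant), so best hits come first."""
    cur = conn.execute(
        "SELECT ts, app FROM events WHERE events MATCH ? ORDER BY bm25(events)",
        (query,),
    )
    return cur.fetchall()

print(search("meeting notes"))
```

The index lives inside the same SQLite file as the data, so keyword lookups need no separate service, which fits the small-footprint goal.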

### 🖥️ Rich Web Interface
- **Visual Exploration**: Interactive timelines, memory tree navigation, and real-time system monitoring.
- **Natural Conversation**: Chat with your complete digital footprint using natural language.


CatchMe Web Dashboard

## 💡 CatchMe Architecture

CatchMe transforms raw digital activity into structured, searchable memory through three concurrent stages:

### 🔄 Record → Organize → Reason: Turn digital chaos into queryable memory

**Capture**. Six background recorders silently track your activity: window focus, keystrokes, mouse movement, screenshots, clipboard, and notifications.

**Index**. Raw events are automatically organized into a Hierarchical Activity Tree: Day → Session → App → Location → Action. Each node receives an LLM-generated summary, enabling fast, meaningful recall without vector embeddings.
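The following minimal Python sketch groups raw events into the Day → Session → App → Location → Action nesting to make those levels concrete. It is not CatchMe's indexer. The `Event` fields, the `build_tree` name, and the 30-minute `SESSION_GAP` used to split sessions are hypothetical. The sketch also omits the LLM summary that CatchMe attaches to each node.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Event:
    ts: datetime
    app: str
    location: str  # e.g. a file path, URL, or window title
    action: str    # e.g. "edit", "click", "scroll"

# Hypothetical idle threshold that starts a new session (not a documented CatchMe setting).
SESSION_GAP = timedelta(minutes=30)

def build_tree(events):
    """Group events into Day -> Session -> App -> Location -> [actions]."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: defaultdict(list))))
    prev = None
    session = 0
    for e in sorted(events, key=lambda ev: ev.ts):
        if prev is None or e.ts.date() != prev.date():
            session = 0  # first session of a new day
        elif e.ts - prev > SESSION_GAP:
            session += 1  # long idle gap: start a new session
        tree[e.ts.date().isoformat()][session][e.app][e.location].append(e.action)
        prev = e.ts
    return tree

# Illustrative sample events.
events = [
    Event(datetime(2025, 1, 6, 9, 0), "VS Code", "retrieve.py", "edit"),
    Event(datetime(2025, 1, 6, 9, 5), "VS Code", "retrieve.py", "save"),
    Event(datetime(2025, 1, 6, 9, 10), "Chrome", "docs.python.org", "scroll"),
    Event(datetime(2025, 1, 6, 13, 0), "Chrome", "github.com", "click"),
    Event(datetime(2025, 1, 7, 9, 0), "Slack", "#general", "type"),
]
tree = build_tree(events)
```

Here the pause between 09:10 and 13:00 splits the first day into two sessions. The second day starts over at session 0.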

**Retrieve**. When you ask a question, the LLM traverses your memory tree top-down, selects relevant nodes, inspects raw data such as screenshots or keystrokes, and then synthesizes a precise answer.


CatchMe Pipeline: Capturing → Indexing → Retrieving

### 🌲 Hierarchical Activity Tree
The Activity Tree is CatchMe's memory core. It provides structured, multi-level views of your digital life. Browse high-level summaries or dive into granular details.


Hierarchical Activity Tree Structure

### 🔍 Intelligent Tree Retrieval
CatchMe skips traditional vector search. Instead, the LLM navigates your Activity Tree directly, enabling complex cross-day reasoning and precise evidence gathering from your raw activity history.


Tree-based Retrieval Process
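The following Python sketch shows the top-down traversal. A simple word-overlap score stands in for the LLM's relevance judgment; this deliberate simplification is not how CatchMe selects nodes. `Node`, `select`, and `tree_retrieve` are hypothetical names. The `max_select_nodes` and `max_iterations` arguments borrow their names and defaults from the `retrieve` section of the configuration table.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    summary: str
    children: list["Node"] = field(default_factory=list)
    evidence: str = ""  # raw data attached to leaves (e.g. typed text)

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select(question: str, nodes: list[Node], k: int) -> list[Node]:
    """Stand-in for the LLM selection step: keep up to k nodes whose
    summaries share at least one word with the question, best first."""
    q = words(question)
    scored = [(len(q & words(n.summary)), n) for n in nodes]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [n for _, n in scored[:k]]

def tree_retrieve(question: str, root: Node, max_select_nodes: int = 7,
                  max_iterations: int = 15) -> list[str]:
    """Descend one level per iteration, collecting evidence from selected leaves."""
    frontier, evidence = [root], []
    for _ in range(max_iterations):
        if not frontier:
            break
        chosen = [c for node in frontier for c in select(question, node.children, max_select_nodes)]
        evidence.extend(n.evidence for n in chosen if not n.children)
        frontier = [n for n in chosen if n.children]  # only branches go deeper
    return evidence

# Illustrative two-branch tree.
root = Node("all activity", [
    Node("monday coding session in vs code", [
        Node("vs code editing retrieve.py", evidence="typed: def retrieve(question)"),
    ]),
    Node("monday reading papers in chrome", [
        Node("chrome reading a paper on tree search", evidence="viewed: tree search paper"),
    ]),
])
```

Each step reads only the summaries of the current nodes' children. Irrelevant branches (here, the reading session when the question is about coding) are pruned before their raw data is ever opened.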

**📖 Learn More**: Detailed design insights and technical deep-dive available in our [blog](https://hkuds.github.io/CatchMe/).

## 🧠 LLM Configuration

### **❗️ Data Privacy Notice**
• **100% Local Storage**: All raw data (screenshots, keystrokes, activity trees) stays in ~/data/ and never leaves your machine.

• **Offline-First Options**: Local LLMs (Ollama, vLLM, LM Studio) enable fully offline operation without any cloud dependency.

• **⚠️ Cloud Provider Caution**: If you configure a cloud provider, your activity data is sent to its API for summarizing your daily activities. **Untrusted endpoints may expose private data**, so review your provider's data policies carefully.

### **📋 Requirements**
• **Multimodal support**: Your model should be able to handle text + images.

• **Context window**: Make sure your model's context window exceeds the `max_tokens` limits in `config.json`.

• **Cost control**: To cap costs, set `llm.max_calls`, or increase `filter.mouse_cluster_gap` to reduce how often summaries are generated.
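The following minimal sketch shows why a larger `filter.mouse_cluster_gap` can mean fewer summaries. It makes two assumptions about CatchMe's internals: an event is merged into the current cluster when it follows the previous event by at most the gap, and each resulting cluster is summarized once. `cluster_by_gap` is a hypothetical name.

```python
def cluster_by_gap(timestamps: list[float], gap: float) -> list[list[float]]:
    """Group event times (seconds): an event joins the current cluster if it
    follows the previous event by at most `gap` seconds, else starts a new one."""
    clusters: list[list[float]] = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters

# Hypothetical click times (seconds since recording started).
clicks = [0.0, 1.0, 2.5, 8.0, 9.0, 20.0, 24.0, 26.0]
for gap in (3.0, 6.0):
    # Under the one-summary-per-cluster assumption, fewer clusters means fewer LLM calls.
    print(f"gap={gap}s -> {len(cluster_by_gap(clicks, gap))} clusters")
```

With the default gap of 3.0 s, the eight clicks form four clusters. Raising the gap to 6.0 s merges them into two.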

CatchMe requires an LLM for background summarization and intelligent retrieval. Run **catchme init** (see Get Started) for **guided setup**, or follow the **manual configuration** steps below.

For cloud API services:

```json
{
  "llm": {
    "provider": "openrouter",
    "api_key": "sk-or-...",
    "api_url": null,
    "model": "google/gemini-3-flash-preview"
  }
}
```

For local/offline operation:

```json
{
  "llm": {
    "provider": "ollama",
    "api_key": null,
    "api_url": null,
    "model": "gemma3:4b"
  }
}
```
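Both examples set `"api_url": null`, relying on the per-provider default: the configuration table notes that `api_url` is auto-set per provider if omitted. The sketch below shows that lookup, with a few URLs copied from the Supported LLM Providers table. `DEFAULT_API_URLS` and `resolve_endpoint` are hypothetical names, not CatchMe's code.

```python
import json

# A few defaults copied from the Supported LLM Providers table (not the full list).
DEFAULT_API_URLS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "openai": "https://api.openai.com/v1",
    "dashscope": "https://dashscope.aliyuncs.com/compatible-mode/v1",
    "ollama": "http://localhost:11434/v1",
    "vllm": "http://localhost:8000/v1",
    "lmstudio": "http://localhost:1234/v1",
}

def resolve_endpoint(llm: dict) -> str:
    """An explicit api_url wins; otherwise fall back to the provider's default."""
    if llm.get("api_url"):
        return llm["api_url"]
    provider = llm.get("provider")
    if provider not in DEFAULT_API_URLS:
        raise ValueError(f"no default URL for provider {provider!r}; set api_url explicitly")
    return DEFAULT_API_URLS[provider]

local = json.loads('{"llm": {"provider": "ollama", "api_key": null, "api_url": null, "model": "gemma3:4b"}}')
print(resolve_endpoint(local["llm"]))  # http://localhost:11434/v1
```

Because an explicit `api_url` always takes precedence, any OpenAI-compatible server can be used by setting that field, whatever the provider name.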

Supported LLM Providers

| Provider | Config name | Default API URL | Get Key |
| ------------------------- | ------------------------ | ------------------------------------------------------- | -------------------------------------------------------------------- |
| **OpenRouter** (gateway) | `openrouter` | `https://openrouter.ai/api/v1` | [openrouter.ai/keys](https://openrouter.ai/keys) |
| **AiHubMix** (gateway) | `aihubmix` | `https://aihubmix.com/v1` | [aihubmix.com](https://aihubmix.com) |
| **SiliconFlow** (gateway) | `siliconflow` | `https://api.siliconflow.cn/v1` | [cloud.siliconflow.cn](https://cloud.siliconflow.cn) |
| **OpenAI** | `openai` | `https://api.openai.com/v1` | [platform.openai.com](https://platform.openai.com/api-keys) |
| **Anthropic** | `anthropic` | `https://api.anthropic.com/v1` | [console.anthropic.com](https://console.anthropic.com) |
| **DeepSeek** | `deepseek` | `https://api.deepseek.com/v1` | [platform.deepseek.com](https://platform.deepseek.com/api_keys) |
| **Gemini** | `gemini` | `https://generativelanguage.googleapis.com/v1beta` | [aistudio.google.com](https://aistudio.google.com/apikey) |
| **Groq** | `groq` | `https://api.groq.com/openai/v1` | [console.groq.com](https://console.groq.com/keys) |
| **Mistral** | `mistral` | `https://api.mistral.ai/v1` | [console.mistral.ai](https://console.mistral.ai) |
| **Moonshot / Kimi** | `moonshot` | `https://api.moonshot.ai/v1` | [platform.moonshot.cn](https://platform.moonshot.cn) |
| **MiniMax** | `minimax` | `https://api.minimax.io/v1` | [platform.minimaxi.com](https://platform.minimaxi.com) |
| **Zhipu AI (GLM)** | `zhipu` | `https://open.bigmodel.cn/api/paas/v4` | [open.bigmodel.cn](https://open.bigmodel.cn) |
| **DashScope (Qwen)** | `dashscope` | `https://dashscope.aliyuncs.com/compatible-mode/v1` | [dashscope.console.aliyun.com](https://dashscope.console.aliyun.com) |
| **VolcEngine** | `volcengine` | `https://ark.cn-beijing.volces.com/api/v3` | [console.volcengine.com](https://console.volcengine.com) |
| **VolcEngine Coding** | `volcengine_coding_plan` | `https://ark.cn-beijing.volces.com/api/coding/v3` | [console.volcengine.com](https://console.volcengine.com) |
| **BytePlus** | `byteplus` | `https://ark.ap-southeast.bytepluses.com/api/v3` | [console.byteplus.com](https://console.byteplus.com) |
| **BytePlus Coding** | `byteplus_coding_plan` | `https://ark.ap-southeast.bytepluses.com/api/coding/v3` | [console.byteplus.com](https://console.byteplus.com) |
| **Ollama** (local) | `ollama` | `http://localhost:11434/v1` | — |
| **vLLM** (local) | `vllm` | `http://localhost:8000/v1` | — |
| **LM Studio** (local) | `lmstudio` | `http://localhost:1234/v1` | — |

> Any OpenAI-compatible endpoint works — just set `api_url` and `api_key` directly.

All Configuration Parameters

| Section | Parameter | Default | Description |
| ------------- | -------------------------- | ----------- | --------------------------------------------------- |
| **web** | `host` | `127.0.0.1` | Dashboard bind address |
| | `port` | `8765` | Dashboard port |
| **llm** | `provider` | — | LLM provider name (see table above) |
| | `api_key` | — | API key for the provider |
| | `api_url` | *(auto)* | Custom endpoint; auto-set per provider if omitted |
| | `model` | — | Model name (provider-specific) |
| | `max_calls` | `0` | Max LLM calls per cycle (`0` = unlimited; set to limit costs) |
| | `max_images_per_cluster` | `5` | Max screenshots sent per event cluster |
| **filter** | `window_min_dwell` | `3.0` | Min window dwell time (sec) before recording |
| | `keyboard_cluster_gap` | `3.0` | Keyboard event clustering gap (sec) |
| | `mouse_cluster_gap` | `3.0` | Time gap (sec) to merge mouse events; **larger values reduce LLM summaries** |
| **summarize** | `language` | `en` | Summary output language (`en`, `zh`, etc.) |
| | `max_tokens_l0`–`l3` | `1200` | Max tokens per tree level (L0=Action … L3=Session) |
| | `temperature` | `0.4` | LLM temperature for summarization |
| | `max_workers` | `2` | Concurrent summarization workers |
| | `debounce_sec` | `3.0` | Debounce before triggering summary |
| | `save_interval_sec` | `5.0` | Tree auto-save interval |
| **retrieve** | `max_prompt_chars` | `42000` | Max chars in retrieval prompt |
| | `max_iterations` | `15` | Max tree traversal iterations |
| | `max_file_chars` | `8000` | Max chars from extracted files |
| | `max_select_nodes` | `7` | Max nodes selected per iteration |
| | `max_tokens_step` | `4096` | Max tokens per retrieval step |
| | `max_tokens_answer` | `8192` | Max tokens for final answer |
| | `temperature_select` | `0.3` | Temperature for node selection |
| | `temperature_answer` | `0.5` | Temperature for answer generation |
| | `temperature_time_resolve` | `0.1` | Temperature for time resolution |
| | `max_tokens_time_resolve` | `1000` | Max tokens for time resolution |
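The following minimal sketch shows how a short `config.json` could be layered over these defaults. It assumes that nested sections merge key by key. That is an assumption: the actual loading logic isn't described here. Only a subset of the defaults is copied. `DEFAULTS`, `deep_merge`, and the example `max_calls` value of 200 are hypothetical.

```python
import copy

# A subset of the documented defaults (values copied from the table above).
DEFAULTS = {
    "web": {"host": "127.0.0.1", "port": 8765},
    "llm": {"max_calls": 0, "max_images_per_cluster": 5},
    "filter": {"window_min_dwell": 3.0, "keyboard_cluster_gap": 3.0, "mouse_cluster_gap": 3.0},
    "summarize": {"language": "en", "temperature": 0.4, "max_workers": 2},
    "retrieve": {"max_iterations": 15, "max_select_nodes": 7},
}

def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict: `base` with `override` applied, merging nested sections key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

# Hypothetical user config: local model, a cap on LLM calls, and a wider mouse gap.
user_config = {
    "llm": {"provider": "ollama", "model": "gemma3:4b", "max_calls": 200},
    "filter": {"mouse_cluster_gap": 10.0},
}
config = deep_merge(DEFAULTS, user_config)
```

Keys the user leaves out, such as `keyboard_cluster_gap` or the dashboard `port`, keep their documented defaults.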

## 🚀 Get Started

### 📦 Install

```bash
git clone https://github.com/HKUDS/catchme.git && cd catchme

conda create -n catchme python=3.11 -y && conda activate catchme

pip install -e .
```

> **macOS** — grant *Accessibility*, *Input Monitoring*, *Screen Recording* in System Settings → Privacy & Security
> **Windows** — run as Administrator for global input monitoring

### ⚡ Init

```bash
catchme init # interactive setup: provider, API key, llm model
```

### 🔥 Run

```bash
catchme awake # start recording
catchme web # visualize and chat

# or through cli
catchme ask -- "What am I doing today?"
```

Full CLI Reference

| Command | Description |
| --------------------------- | ------------------------------------------------------ |
| `catchme awake` | Start the recording daemon |
| `catchme web [-p PORT]` | Launch web dashboard (default `http://127.0.0.1:8765`) |
| `catchme ask -- "question"` | Query your activity in natural language |
| `catchme cost` | Show LLM token usage (last 10 min / today / all time) |
| `catchme disk` | Show storage breakdown & event count |
| `catchme ram` | Show memory usage of running processes |
| `catchme init` | Interactive setup: LLM provider, API key & model |

## 🦞 CatchMe Makes Your Agents Truly Personal
CatchMe ships as an agent-compatible skill for CLI agents (OpenClaw, NanoBot, Claude, Cursor, etc.).

**🪶 Option A: Light Skill** (you run CatchMe independently; your agent queries memories through CLI commands only):

```bash
# 1. Start CatchMe yourself
catchme awake

# 2. Give the light skill to your agent (create the skill folder if it doesn't exist)
mkdir -p ~/.cursor/skills/catchme
cp CATCHME-light.md ~/.cursor/skills/catchme/SKILL.md
```

**Option B: Full Skill** (the agent manages the full CatchMe lifecycle autonomously):

```bash
mkdir -p ~/.cursor/skills/catchme
cp CATCHME-full.md ~/.cursor/skills/catchme/SKILL.md
```
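Agents or scripts that call CatchMe from Python rather than a shell can use a thin wrapper around the documented `catchme ask` command. This is a sketch. `ask_command` and `ask` are hypothetical helpers, and the 120-second timeout is an arbitrary choice. The wrapper also assumes `catchme` is on `PATH` and that activity has already been recorded with `catchme awake`.

```python
import shutil
import subprocess

def ask_command(question: str) -> list[str]:
    # Mirrors the documented form `catchme ask -- "question"`; `--` conventionally
    # ends option parsing so a question starting with "-" is not read as a flag.
    return ["catchme", "ask", "--", question]

def ask(question: str, timeout: float = 120.0) -> str:
    """Run `catchme ask` and return its standard output.

    Assumes the `catchme` CLI is on PATH; the timeout is an arbitrary choice.
    """
    if shutil.which("catchme") is None:
        raise RuntimeError("catchme CLI not found on PATH")
    result = subprocess.run(
        ask_command(question),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.strip()
```

Usage: `ask("What am I doing today?")`. Passing an argument list instead of a shell string avoids quoting problems when the question contains quotes or special characters.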

### 🔧 Integrate into your current workflow

```python
from catchme import CatchMe
from catchme.pipelines.retrieve import retrieve

# 1. One-line search: fast keyword lookup over all recorded activity
with CatchMe() as mem:
    for e in mem.search("meeting notes"):
        print(e.timestamp, e.data)

# 2. LLM-powered retrieval: natural-language Q&A over your screen history
for step in retrieve("What was I working on this morning?"):
    if step["type"] == "answer":
        print(step["content"])
```

## 📊 Cost & Efficiency

*Benchmarked with **2 hours of intensive, continuous computer use** on MacBook Air M4.*

| Metric | Value |
| ---------------------------------------------- | ------------------------------------------------------------------------------- |
| **Runtime RAM** | ~0.2 GB |
| **Disk Usage** | ~200 MB |
| **Token Throughput** | ~6M input tokens, ~0.7M output tokens |
| **LLM cost** (`qwen-3.5-plus`) | ~$0.42 via [Aliyun DashScope](https://home.console.aliyun.com/home/dashboard/) |
| **LLM cost** (`gemini-3-flash-preview`) | ~$5.00 via [OpenRouter](https://openrouter.ai/models) |
| **Full Retrieval Speed** (depends on question) | 5-20 s per query using `gemini-3-flash-preview` |
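To turn the benchmark into per-hour figures, divide by the 2-hour session length. The projection below is a plain linear extrapolation. That is an assumption: it treats every hour as being as intensive as the benchmark session. The helper names are hypothetical.

```python
# Figures from the benchmark table above (one 2-hour session).
SESSION_HOURS = 2
INPUT_TOKENS, OUTPUT_TOKENS = 6_000_000, 700_000
SESSION_COST_USD = {"qwen-3.5-plus": 0.42, "gemini-3-flash-preview": 5.00}

def per_hour(value: float) -> float:
    return value / SESSION_HOURS

def projected_cost(model: str, hours: float) -> float:
    """Linear extrapolation from the benchmark; real usage varies."""
    return round(per_hour(SESSION_COST_USD[model]) * hours, 2)

print(f"input tokens/hour: {per_hour(INPUT_TOKENS):,.0f}")    # input tokens/hour: 3,000,000
print(f"output tokens/hour: {per_hour(OUTPUT_TOKENS):,.0f}")  # output tokens/hour: 350,000
for model in SESSION_COST_USD:
    print(model, projected_cost(model, 8))  # cost of an 8-hour day at benchmark intensity
```

At benchmark intensity, an 8-hour day comes to about $1.68 with `qwen-3.5-plus` and about $20.00 with `gemini-3-flash-preview`.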

## 🚀 Roadmap
CatchMe evolves with community input. Upcoming features include:

**Multi-Device Recording**. Capture and unify GUI activities across all your machines via LAN synchronization.

**Dynamic Clustering**. Adaptive clustering algorithms that better reflect your actual work patterns and flows, reducing unnecessary costs.

**Enhanced Data Utilization**. Unlock deeper insights from screenshots and metadata beyond current processing pipelines.

> 🌟 **Star this repo** to follow our future updates — your interest keeps us motivated!

We welcome contributions of any kind, whether it's a comment, a bug report, a feature idea, or a pull request. See [CONTRIBUTING.md](CONTRIBUTING.md) to get started.

## 🤝 Community

### Acknowledgments

CatchMe is inspired by these excellent open-source projects:

| Project | Inspiration |
| --------------------------------------------------------------- | ----------------------------------------------------- |
| [ActivityWatch](https://github.com/ActivityWatch/activitywatch) | Pioneering open-source activity tracking |
| [Screenpipe](https://github.com/mediar-ai/screenpipe) | Screen recording infrastructure for AI agents |
| [Windrecorder](https://github.com/Antonoko/Windrecorder) | Personal screen recording & search on Windows |
| [OpenRecall](https://github.com/openrecall/openrecall) | Open-source alternative to Windows Recall |
| [Selfspy](https://github.com/selfspy/selfspy) | Classic daemon-style activity logging |
| [PageIndex](https://github.com/HKUDS/PageIndex) | Tree-structured document retrieval without embeddings |
| [MineContext](https://github.com/volcengine/MineContext) | Proactive context-aware AI partner & screen capture |

### 🏛️ Ecosystem

CatchMe is part of the **[HKUDS](https://github.com/HKUDS)** agent ecosystem — building the infrastructure layer for personal AI agents:



- **NanoBot**: Ultra-Lightweight Personal AI Assistant
- **CLI-Anything**: Making All Software Agent-Native
- **ClawWork**: AI Assistant → AI Coworker Evolution
- **ClawTeam**: Agent Swarm Intelligence for Full Team Automation




Thanks for visiting ✨ CatchMe


