{"id":45871100,"url":"https://github.com/win4r/memory-lancedb-pro","last_synced_at":"2026-03-02T13:00:58.762Z","repository":{"id":340480273,"uuid":"1165507389","full_name":"win4r/memory-lancedb-pro","owner":"win4r","description":"Enhanced LanceDB memory plugin for OpenClaw — Hybrid Retrieval (Vector + BM25), Cross-Encoder Rerank, Multi-Scope Isolation, Management CLI","archived":false,"fork":false,"pushed_at":"2026-02-28T14:22:48.000Z","size":292,"stargazers_count":667,"open_issues_count":2,"forks_count":175,"subscribers_count":6,"default_branch":"main","last_synced_at":"2026-02-28T14:27:10.969Z","etag":null,"topics":["lancedb","memory","openclaw","openclaw-agent","openclaw-plugin","rag"],"latest_commit_sha":null,"homepage":"https://youtu.be/MtukF1C8epQ","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/win4r.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-24T08:38:15.000Z","updated_at":"2026-02-28T14:24:23.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/win4r/memory-lancedb-pro","commit_stats":null,"previous_names":["win4r/memory-lancedb-pro"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/win4r/memory-lancedb-pro","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/win4r%2Fmemory-lancedb-pro","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/win4r%2Fmemory-lancedb-pro/tags","rele
ases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/win4r%2Fmemory-lancedb-pro/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/win4r%2Fmemory-lancedb-pro/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/win4r","download_url":"https://codeload.github.com/win4r/memory-lancedb-pro/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/win4r%2Fmemory-lancedb-pro/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29969243,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-01T11:43:06.159Z","status":"ssl_error","status_checked_at":"2026-03-01T11:43:03.887Z","response_time":124,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lancedb","memory","openclaw","openclaw-agent","openclaw-plugin","rag"],"created_at":"2026-02-27T10:01:24.684Z","updated_at":"2026-03-01T12:00:43.406Z","avatar_url":"https://github.com/win4r.png","language":"TypeScript","readme":"\u003cdiv align=\"center\"\u003e\n\n# 🧠 memory-lancedb-pro · OpenClaw Plugin\n\n**Enhanced Long-Term Memory Plugin for [OpenClaw](https://github.com/openclaw/openclaw)**\n\nHybrid Retrieval (Vector + BM25) · Cross-Encoder Rerank · Multi-Scope Isolation · Management CLI\n\n[![OpenClaw 
Plugin](https://img.shields.io/badge/OpenClaw-Plugin-blue)](https://github.com/openclaw/openclaw)\n[![LanceDB](https://img.shields.io/badge/LanceDB-Vectorstore-orange)](https://lancedb.com)\n[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)\n\n**English** | [简体中文](README_CN.md)\n\n\u003c/div\u003e\n\n---\n\n## 📺 Video Tutorial\n\n\u003e **Watch the full walkthrough — covers installation, configuration, and how hybrid retrieval works under the hood.**\n\n[![YouTube Video](https://img.shields.io/badge/YouTube-Watch%20Now-red?style=for-the-badge\u0026logo=youtube)](https://youtu.be/MtukF1C8epQ)\n🔗 **https://youtu.be/MtukF1C8epQ**\n\n[![Bilibili Video](https://img.shields.io/badge/Bilibili-立即观看-00A1D6?style=for-the-badge\u0026logo=bilibili\u0026logoColor=white)](https://www.bilibili.com/video/BV1zUf2BGEgn/)\n🔗 **https://www.bilibili.com/video/BV1zUf2BGEgn/**\n\n---\n\n## Why This Plugin?\n\nThe built-in `memory-lancedb` plugin in OpenClaw provides basic vector search. **memory-lancedb-pro** takes it much further:\n\n| Feature | Built-in `memory-lancedb` | **memory-lancedb-pro** |\n|---------|--------------------------|----------------------|\n| Vector search | ✅ | ✅ |\n| BM25 full-text search | ❌ | ✅ |\n| Hybrid fusion (Vector + BM25) | ❌ | ✅ |\n| Cross-encoder rerank (Jina / custom endpoint) | ❌ | ✅ |\n| Recency boost | ❌ | ✅ |\n| Time decay | ❌ | ✅ |\n| Length normalization | ❌ | ✅ |\n| MMR diversity | ❌ | ✅ |\n| Multi-scope isolation | ❌ | ✅ |\n| Noise filtering | ❌ | ✅ |\n| Adaptive retrieval | ❌ | ✅ |\n| Management CLI | ❌ | ✅ |\n| Session memory | ❌ | ✅ |\n| Task-aware embeddings | ❌ | ✅ |\n| Any OpenAI-compatible embedding | Limited | ✅ (OpenAI, Gemini, Jina, Ollama, etc.) 
|\n\n---\n\n## Architecture\n\n```\n┌─────────────────────────────────────────────────────────┐\n│                   index.ts (Entry Point)                │\n│  Plugin Registration · Config Parsing · Lifecycle Hooks │\n└────────┬──────────┬──────────┬──────────┬───────────────┘\n         │          │          │          │\n    ┌────▼───┐ ┌────▼───┐ ┌───▼────┐ ┌──▼──────────┐\n    │ store  │ │embedder│ │retriever│ │   scopes    │\n    │ .ts    │ │ .ts    │ │ .ts    │ │    .ts      │\n    └────────┘ └────────┘ └────────┘ └─────────────┘\n         │                     │\n    ┌────▼───┐           ┌─────▼──────────┐\n    │migrate │           │noise-filter.ts │\n    │ .ts    │           │adaptive-       │\n    └────────┘           │retrieval.ts    │\n                         └────────────────┘\n    ┌─────────────┐   ┌──────────┐\n    │  tools.ts   │   │  cli.ts  │\n    │ (Agent API) │   │ (CLI)    │\n    └─────────────┘   └──────────┘\n```\n\n### File Reference\n\n| File | Purpose |\n|------|---------|\n| `index.ts` | Plugin entry point. Registers with OpenClaw Plugin API, parses config, mounts `before_agent_start` (auto-recall), `agent_end` (auto-capture), and `command:new` (session memory) hooks |\n| `openclaw.plugin.json` | Plugin metadata + full JSON Schema config declaration (with `uiHints`) |\n| `package.json` | NPM package info. Depends on `@lancedb/lancedb`, `openai`, `@sinclair/typebox` |\n| `cli.ts` | CLI commands: `memory list/search/stats/delete/delete-bulk/export/import/reembed/migrate` |\n| `src/store.ts` | LanceDB storage layer. Table creation / FTS indexing / Vector search / BM25 search / CRUD / bulk delete / stats |\n| `src/embedder.ts` | Embedding abstraction. Compatible with any OpenAI-API provider (OpenAI, Gemini, Jina, Ollama, etc.). Supports task-aware embedding (`taskQuery`/`taskPassage`) |\n| `src/retriever.ts` | Hybrid retrieval engine. 
Vector + BM25 → RRF fusion → Jina Cross-Encoder Rerank → Recency Boost → Importance Weight → Length Norm → Time Decay → Hard Min Score → Noise Filter → MMR Diversity |\n| `src/scopes.ts` | Multi-scope access control. Supports `global`, `agent:\u003cid\u003e`, `custom:\u003cname\u003e`, `project:\u003cid\u003e`, `user:\u003cid\u003e` |\n| `src/tools.ts` | Agent tool definitions: `memory_recall`, `memory_store`, `memory_forget` (core) + `memory_stats`, `memory_list` (management) |\n| `src/noise-filter.ts` | Noise filter. Filters out agent refusals, meta-questions, greetings, and low-quality content |\n| `src/adaptive-retrieval.ts` | Adaptive retrieval. Determines whether a query needs memory retrieval (skips greetings, slash commands, simple confirmations, emoji) |\n| `src/migrate.ts` | Migration tool. Migrates data from the built-in `memory-lancedb` plugin to Pro |\n\n---\n\n## Core Features\n\n### 1. Hybrid Retrieval\n\n```\nQuery → embedQuery() ─┐\n                       ├─→ RRF Fusion → Rerank → Recency Boost → Importance Weight → Filter\nQuery → BM25 FTS ─────┘\n```\n\n- **Vector Search**: Semantic similarity via LanceDB ANN (cosine distance)\n- **BM25 Full-Text Search**: Exact keyword matching via LanceDB FTS index\n- **Fusion Strategy**: Vector score as base, BM25 hits get a 15% boost (tuned beyond traditional RRF)\n- **Configurable Weights**: `vectorWeight`, `bm25Weight`, `minScore`\n\n### 2. Cross-Encoder Reranking\n\n- **Reranker API**: Jina, SiliconFlow, Pinecone, or any compatible endpoint (5s timeout protection)\n- **Hybrid Scoring**: 60% cross-encoder score + 40% original fused score\n- **Graceful Degradation**: Falls back to cosine similarity reranking on API failure\n\n### 3. 
Multi-Stage Scoring Pipeline\n\n| Stage | Formula | Effect |\n|-------|---------|--------|\n| **Recency Boost** | `exp(-ageDays / halfLife) * weight` | Newer memories score higher (default: 14-day half-life, 0.10 weight) |\n| **Importance Weight** | `score *= (0.7 + 0.3 * importance)` | importance=1.0 → ×1.0, importance=0.5 → ×0.85 |\n| **Length Normalization** | `score *= 1 / (1 + 0.5 * log2(len/anchor))` | Prevents long entries from dominating (anchor: 500 chars) |\n| **Time Decay** | `score *= 0.5 + 0.5 * exp(-ageDays / halfLife)` | Old entries gradually lose weight, floor at 0.5× (60-day half-life) |\n| **Hard Min Score** | Discard if `score \u003c threshold` | Removes irrelevant results (default: 0.35) |\n| **MMR Diversity** | Cosine similarity \u003e 0.85 → demoted | Prevents near-duplicate results |\n\n### 4. Multi-Scope Isolation\n\n- **Built-in Scopes**: `global`, `agent:\u003cid\u003e`, `custom:\u003cname\u003e`, `project:\u003cid\u003e`, `user:\u003cid\u003e`\n- **Agent-Level Access Control**: Configure per-agent scope access via `scopes.agentAccess`\n- **Default Behavior**: Each agent accesses `global` + its own `agent:\u003cid\u003e` scope\n\n### 5. Adaptive Retrieval\n\n- Skips queries that don't need memory (greetings, slash commands, simple confirmations, emoji)\n- Forces retrieval for memory-related keywords (\"remember\", \"previously\", \"last time\", etc.)\n- CJK-aware thresholds (Chinese: 6 chars vs English: 15 chars)\n\n### 6. Noise Filtering\n\nFilters out low-quality content at both auto-capture and tool-store stages:\n- Agent refusal responses (\"I don't have any information\")\n- Meta-questions (\"do you remember\")\n- Greetings (\"hi\", \"hello\", \"HEARTBEAT\")\n\n### 7. Session Memory\n\n- Triggered on `/new` command — saves previous session summary to LanceDB\n- Disabled by default (OpenClaw already has native `.jsonl` session persistence)\n- Configurable message count (default: 15)\n\n### 8. 
Auto-Capture \u0026 Auto-Recall\n\n- **Auto-Capture** (`agent_end` hook): Extracts preference/fact/decision/entity from conversations, deduplicates, stores up to 3 per turn\n- **Auto-Recall** (`before_agent_start` hook): Injects `\u003crelevant-memories\u003e` context (up to 3 entries)\n\n---\n\n## Installation\n\n### AI-safe install notes (anti-hallucination)\n\nIf you are following this README using an AI assistant, **do not assume defaults**. Always run these commands first and use the real output:\n\n```bash\nopenclaw config get agents.defaults.workspace\nopenclaw config get plugins.load.paths\nopenclaw config get plugins.slots.memory\nopenclaw config get plugins.entries.memory-lancedb-pro\n```\n\nRecommendations:\n- Prefer **absolute paths** in `plugins.load.paths` unless you have confirmed the active workspace.\n- If you use `${JINA_API_KEY}` (or any `${...}` variable) in config, ensure the **Gateway service process** has that environment variable (system services often do **not** inherit your interactive shell env).\n- After changing plugin config, run `openclaw gateway restart`.\n\n### What is the “OpenClaw workspace”?\n\nIn OpenClaw, the **agent workspace** is the agent’s working directory (default: `~/.openclaw/workspace`).\nAccording to the docs, the workspace is the **default cwd**, and **relative paths are resolved against the workspace** (unless you use an absolute path).\n\n\u003e Note: OpenClaw configuration typically lives under `~/.openclaw/openclaw.json` (separate from the workspace).\n\n**Common mistake:** cloning the plugin somewhere else, while keeping a **relative path** like `plugins.load.paths: [\"plugins/memory-lancedb-pro\"]`. 
Relative paths can be resolved against different working directories depending on how the Gateway is started.\n\nTo avoid ambiguity, use an **absolute path** (Option B) or clone into `\u003cworkspace\u003e/plugins/` (Option A) and keep your config consistent.\n\n### Option A (recommended): clone into `plugins/` under your workspace\n\n```bash\n# 1) Go to your OpenClaw workspace (default: ~/.openclaw/workspace)\n#    (You can override it via agents.defaults.workspace.)\ncd /path/to/your/openclaw/workspace\n\n# 2) Clone the plugin into workspace/plugins/\ngit clone https://github.com/win4r/memory-lancedb-pro.git plugins/memory-lancedb-pro\n\n# 3) Install dependencies\ncd plugins/memory-lancedb-pro\nnpm install\n```\n\nThen reference it with a relative path in your OpenClaw config:\n\n```json\n{\n  \"plugins\": {\n    \"load\": {\n      \"paths\": [\"plugins/memory-lancedb-pro\"]\n    },\n    \"entries\": {\n      \"memory-lancedb-pro\": {\n        \"enabled\": true,\n        \"config\": {\n          \"embedding\": {\n            \"apiKey\": \"${JINA_API_KEY}\",\n            \"model\": \"jina-embeddings-v5-text-small\",\n            \"baseURL\": \"https://api.jina.ai/v1\",\n            \"dimensions\": 1024,\n            \"taskQuery\": \"retrieval.query\",\n            \"taskPassage\": \"retrieval.passage\",\n            \"normalized\": true\n          }\n        }\n      }\n    },\n    \"slots\": {\n      \"memory\": \"memory-lancedb-pro\"\n    }\n  }\n}\n```\n\n### Option B: clone anywhere, but use an absolute path\n\n```json\n{\n  \"plugins\": {\n    \"load\": {\n      \"paths\": [\"/absolute/path/to/memory-lancedb-pro\"]\n    }\n  }\n}\n```\n\n### Restart\n\n```bash\nopenclaw gateway restart\n```\n\n\u003e **Note:** If you previously used the built-in `memory-lancedb`, disable it when enabling this plugin. 
Only one memory plugin can be active at a time.\n\n### Verify installation (recommended)\n\n1) Confirm the plugin is discoverable/loaded:\n\n```bash\nopenclaw plugins list\nopenclaw plugins info memory-lancedb-pro\n```\n\n2) If anything looks wrong, run the built-in diagnostics:\n\n```bash\nopenclaw plugins doctor\n```\n\n3) Confirm the memory slot points to this plugin:\n\n```bash\n# Look for: plugins.slots.memory = \"memory-lancedb-pro\"\nopenclaw config get plugins.slots.memory\n```\n\n---\n\n## Configuration\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eFull Configuration Example (click to expand)\u003c/strong\u003e\u003c/summary\u003e\n\n```json\n{\n  \"embedding\": {\n    \"apiKey\": \"${JINA_API_KEY}\",\n    \"model\": \"jina-embeddings-v5-text-small\",\n    \"baseURL\": \"https://api.jina.ai/v1\",\n    \"dimensions\": 1024,\n    \"taskQuery\": \"retrieval.query\",\n    \"taskPassage\": \"retrieval.passage\",\n    \"normalized\": true\n  },\n  \"dbPath\": \"~/.openclaw/memory/lancedb-pro\",\n  \"autoCapture\": true,\n  \"autoRecall\": true,\n  \"retrieval\": {\n    \"mode\": \"hybrid\",\n    \"vectorWeight\": 0.7,\n    \"bm25Weight\": 0.3,\n    \"minScore\": 0.3,\n    \"rerank\": \"cross-encoder\",\n    \"rerankApiKey\": \"jina_xxx\",\n    \"rerankModel\": \"jina-reranker-v2-base-multilingual\",\n    \"rerankEndpoint\": \"https://api.jina.ai/v1/rerank\",\n    \"rerankProvider\": \"jina\",\n    \"candidatePoolSize\": 20,\n    \"recencyHalfLifeDays\": 14,\n    \"recencyWeight\": 0.1,\n    \"filterNoise\": true,\n    \"lengthNormAnchor\": 500,\n    \"hardMinScore\": 0.35,\n    \"timeDecayHalfLifeDays\": 60\n  },\n  \"enableManagementTools\": false,\n  \"scopes\": {\n    \"default\": \"global\",\n    \"definitions\": {\n      \"global\": { \"description\": \"Shared knowledge\" },\n      \"agent:discord-bot\": { \"description\": \"Discord bot private\" }\n    },\n    \"agentAccess\": {\n      \"discord-bot\": [\"global\", \"agent:discord-bot\"]\n   
 }\n  },\n  \"sessionMemory\": {\n    \"enabled\": false,\n    \"messageCount\": 15\n  }\n}\n```\n\n\u003c/details\u003e\n\n### Embedding Providers\n\nThis plugin works with **any OpenAI-compatible embedding API**:\n\n| Provider | Model | Base URL | Dimensions |\n|----------|-------|----------|------------|\n| **Jina** (recommended) | `jina-embeddings-v5-text-small` | `https://api.jina.ai/v1` | 1024 |\n| **OpenAI** | `text-embedding-3-small` | `https://api.openai.com/v1` | 1536 |\n| **Google Gemini** | `gemini-embedding-001` | `https://generativelanguage.googleapis.com/v1beta/openai/` | 3072 |\n| **Ollama** (local) | `nomic-embed-text` | `http://localhost:11434/v1` | _provider-specific_ (set `embedding.dimensions` to match your Ollama model output) |\n\n### Rerank Providers\n\nCross-encoder reranking supports multiple providers via `rerankProvider`:\n\n| Provider | `rerankProvider` | Endpoint | Example Model |\n|----------|-----------------|----------|---------------|\n| **Jina** (default) | `jina` | `https://api.jina.ai/v1/rerank` | `jina-reranker-v2-base-multilingual` |\n| **SiliconFlow** (free tier available) | `siliconflow` | `https://api.siliconflow.com/v1/rerank` | `BAAI/bge-reranker-v2-m3`, `Qwen/Qwen3-Reranker-8B` |\n| **Pinecone** | `pinecone` | `https://api.pinecone.io/rerank` | `bge-reranker-v2-m3` |\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003eSiliconFlow Example\u003c/strong\u003e\u003c/summary\u003e\n\n```json\n{\n  \"retrieval\": {\n    \"rerank\": \"cross-encoder\",\n    \"rerankProvider\": \"siliconflow\",\n    \"rerankEndpoint\": \"https://api.siliconflow.com/v1/rerank\",\n    \"rerankApiKey\": \"sk-xxx\",\n    \"rerankModel\": \"BAAI/bge-reranker-v2-m3\"\n  }\n}\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003ePinecone Example\u003c/strong\u003e\u003c/summary\u003e\n\n```json\n{\n  \"retrieval\": {\n    \"rerank\": \"cross-encoder\",\n    \"rerankProvider\": \"pinecone\",\n    
\"rerankEndpoint\": \"https://api.pinecone.io/rerank\",\n    \"rerankApiKey\": \"pcsk_xxx\",\n    \"rerankModel\": \"bge-reranker-v2-m3\"\n  }\n}\n```\n\n\u003c/details\u003e\n\n---\n\n## Optional: JSONL Session Distillation (Auto-memories from chat logs)\n\nOpenClaw already persists **full session transcripts** as JSONL files:\n\n- `~/.openclaw/agents/\u003cagentId\u003e/sessions/*.jsonl`\n\nThis plugin focuses on **high-quality long-term memory**. If you dump raw transcripts into LanceDB, retrieval quality quickly degrades.\n\nInstead, you can run an **hourly distiller** that:\n\n1) Incrementally reads only the **newly appended tail** of each session JSONL (byte-offset cursor)\n2) Filters noise (tool output, injected `\u003crelevant-memories\u003e`, logs, boilerplate)\n3) Uses a dedicated agent to **distill** reusable lessons / rules / preferences into short atomic memories\n4) Stores them via `memory_store` into the right **scope** (`global` or `agent:\u003cagentId\u003e`)\n\n### What you get\n\n- ✅ Fully automatic (cron)\n- ✅ Multi-agent support (main + bots)\n- ✅ No re-reading: cursor ensures the next run only processes new lines\n- ✅ Memory hygiene: quality gate + dedupe + per-run caps\n\n### Script\n\nThis repo includes the extractor script:\n\n- `scripts/jsonl_distill.py`\n\nIt produces a small **batch JSON** file under:\n\n- `~/.openclaw/state/jsonl-distill/batches/`\n\nand keeps a cursor here:\n\n- `~/.openclaw/state/jsonl-distill/cursor.json`\n\nThe script is **safe**: it never modifies session logs.\n\nBy default it skips historical reset snapshots (`*.reset.*`) and excludes the distiller agent itself (`memory-distiller`) to prevent self-ingestion loops.\n\n### Recommended setup (dedicated distiller agent)\n\n#### 1) Create a dedicated agent\n\n```bash\nopenclaw agents add memory-distiller \\\n  --non-interactive \\\n  --workspace ~/.openclaw/workspace-memory-distiller \\\n  --model openai-codex/gpt-5.2\n```\n\n#### 2) Initialize cursor (Mode A: start 
from now)\n\nThis marks all existing JSONL files as \"already read\" by setting offsets to EOF.\n\n```bash\n# Set PLUGIN_DIR to where this plugin is installed.\n# - If you cloned into your OpenClaw workspace (recommended):\n#   PLUGIN_DIR=\"$HOME/.openclaw/workspace/plugins/memory-lancedb-pro\"\n# - Otherwise, check: `openclaw plugins info memory-lancedb-pro` and locate the directory.\nPLUGIN_DIR=\"/path/to/memory-lancedb-pro\"\n\npython3 \"$PLUGIN_DIR/scripts/jsonl_distill.py\" init\n```\n\n#### 3) Create an hourly cron job (Asia/Shanghai)\n\nTip: start the message with `run ...` so `memory-lancedb-pro`'s adaptive retrieval will skip auto-recall injection (saves tokens).\n\n```bash\n# IMPORTANT: replace \u003cPLUGIN_DIR\u003e in the template below with your actual plugin path.\nMSG=$(cat \u003c\u003c'EOF'\nrun jsonl memory distill\n\nGoal: distill NEW chat content from OpenClaw session JSONL files into high-quality LanceDB memories using memory_store.\n\nHard rules:\n- Incremental only: call the extractor script; do NOT scan full history.\n- Store only reusable memories; skip routine chatter.\n- English memory text + final line: Keywords (zh): ...\n- \u003c 500 chars, atomic.\n- \u003c= 3 memories per agent per run; \u003c= 3 global per run.\n- Scope: global for broadly reusable; otherwise agent:\u003cagentId\u003e.\n\nWorkflow:\n1) exec: python3 \u003cPLUGIN_DIR\u003e/scripts/jsonl_distill.py run\n2) If noop: stop.\n3) Read batchFile (created/pending)\n4) memory_store(...) 
for selected memories\n5) exec: python3 \u003cPLUGIN_DIR\u003e/scripts/jsonl_distill.py commit --batch-file \u003cbatchFile\u003e\nEOF\n)\n\nopenclaw cron add \\\n  --agent memory-distiller \\\n  --name \"jsonl-memory-distill (hourly)\" \\\n  --cron \"0 * * * *\" \\\n  --tz \"Asia/Shanghai\" \\\n  --session isolated \\\n  --wake now \\\n  --timeout-seconds 420 \\\n  --stagger 5m \\\n  --no-deliver \\\n  --message \"$MSG\"\n```\n\n#### 4) Debug run\n\n```bash\nopenclaw cron run \u003cjobId\u003e --expect-final --timeout 180000\nopenclaw cron runs --id \u003cjobId\u003e --limit 5\n```\n\n### Scope strategy (recommended)\n\nWhen distilling **all agents**, always set `scope` explicitly when calling `memory_store`:\n\n- Broadly reusable → `scope=global`\n- Agent-specific → `scope=agent:\u003cagentId\u003e`\n\nThis prevents cross-bot memory pollution.\n\n### Rollback\n\n- Disable/remove cron job: `openclaw cron disable \u003cjobId\u003e` / `openclaw cron rm \u003cjobId\u003e`\n- Delete agent: `openclaw agents delete memory-distiller`\n- Remove cursor state: `rm -rf ~/.openclaw/state/jsonl-distill/`\n\n---\n\n## CLI Commands\n\n```bash\n# List memories\nopenclaw memory-pro list [--scope global] [--category fact] [--limit 20] [--json]\n\n# Search memories\nopenclaw memory-pro search \"query\" [--scope global] [--limit 10] [--json]\n\n# View statistics\nopenclaw memory-pro stats [--scope global] [--json]\n\n# Delete a memory by ID (supports 8+ char prefix)\nopenclaw memory-pro delete \u003cid\u003e\n\n# Bulk delete with filters\nopenclaw memory-pro delete-bulk --scope global [--before 2025-01-01] [--dry-run]\n\n# Export / Import\nopenclaw memory-pro export [--scope global] [--output memories.json]\nopenclaw memory-pro import memories.json [--scope global] [--dry-run]\n\n# Re-embed all entries with a new model\nopenclaw memory-pro reembed --source-db /path/to/old-db [--batch-size 32] [--skip-existing]\n\n# Migrate from built-in memory-lancedb\nopenclaw memory-pro migrate 
check [--source /path]\nopenclaw memory-pro migrate run [--source /path] [--dry-run] [--skip-existing]\nopenclaw memory-pro migrate verify [--source /path]\n```\n\n---\n\n## Custom Commands (e.g. `/lesson`)\n\nThis plugin provides the core memory tools (`memory_store`, `memory_recall`, `memory_forget`, `memory_update`). You can define custom slash commands in your Agent's system prompt to create convenient shortcuts.\n\n### Example: `/lesson` command\n\nAdd this to your `CLAUDE.md`, `AGENTS.md`, or system prompt:\n\n```markdown\n## /lesson command\nWhen the user sends `/lesson \u003ccontent\u003e`:\n1. Use memory_store to save as category=fact (the raw knowledge)\n2. Use memory_store to save as category=decision (actionable takeaway)\n3. Confirm what was saved\n```\n\n### Example: `/remember` command\n\n```markdown\n## /remember command\nWhen the user sends `/remember \u003ccontent\u003e`:\n1. Use memory_store to save with appropriate category and importance\n2. Confirm with the stored memory ID\n```\n\n### Built-in Tools Reference\n\n| Tool | Description |\n|------|-------------|\n| `memory_store` | Store a memory (supports category, importance, scope) |\n| `memory_recall` | Search memories (hybrid vector + BM25 retrieval) |\n| `memory_forget` | Delete a memory by ID or search query |\n| `memory_update` | Update an existing memory in-place |\n\n\u003e **Note**: These tools are registered automatically when the plugin loads. 
Custom commands like `/lesson` are not built into the plugin — they are defined at the Agent/system-prompt level and simply call these tools.\n\n---\n\n## Database Schema\n\nLanceDB table `memories`:\n\n| Field | Type | Description |\n|-------|------|-------------|\n| `id` | string (UUID) | Primary key |\n| `text` | string | Memory text (FTS indexed) |\n| `vector` | float[] | Embedding vector |\n| `category` | string | `preference` / `fact` / `decision` / `entity` / `other` |\n| `scope` | string | Scope identifier (e.g., `global`, `agent:main`) |\n| `importance` | float | Importance score 0–1 |\n| `timestamp` | int64 | Creation timestamp (ms) |\n| `metadata` | string (JSON) | Extended metadata |\n\n---\n\n## Dependencies\n\n| Package | Purpose |\n|---------|---------|\n| `@lancedb/lancedb` ≥0.26.2 | Vector database (ANN + FTS) |\n| `openai` ≥6.21.0 | OpenAI-compatible Embedding API client |\n| `@sinclair/typebox` 0.34.48 | JSON Schema type definitions (tool parameters) |\n\n---\n\n## License\n\nMIT\n","funding_links":["https://ko-fi.com/aila"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwin4r%2Fmemory-lancedb-pro","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwin4r%2Fmemory-lancedb-pro","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwin4r%2Fmemory-lancedb-pro/lists"}