https://github.com/cachly-dev/cachly-openclaw
Official Cachly adapter for OpenClaw – persistent sessions, semantic LLM cache, memory storage
- Host: GitHub
- URL: https://github.com/cachly-dev/cachly-openclaw
- Owner: cachly-dev
- Created: 2026-04-17T01:37:29.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2026-06-06T21:58:44.000Z (9 days ago)
- Last Synced: 2026-06-06T23:19:35.242Z (9 days ago)
- Language: TypeScript
- Homepage: https://cachly.dev/docs/openclaw
- Size: 47.9 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
README
# @cachly-dev/openclaw
> **You paid $0.08 for that answer. The next 1,000 identical asks: $0.00.**
> Semantic LLM cache + persistent sessions + AI memory. 3 lines. No embeddings required.
---
## Before / After
```typescript
// ❌ Before: Every user message calls OpenAI. Every time. No exceptions.
const reply = await openai.chat.completions.create({ model: 'gpt-4o', messages })

// ✅ After: Same questions = zero API calls = zero cost
const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })
const cachedReply = await cache.getOrSet(userMessage, () =>
  openai.chat.completions.create({ model: 'gpt-4o', messages })
)
```
"How do I reset my password?" → "How can I reset my pw?" → **cache hit**. $0.00.
---
## Setup — 60 seconds
```bash
npm install @cachly-dev/openclaw
```
```bash
# Get a free Redis instance at cachly.dev (no credit card):
CACHLY_URL=redis://:password@your-instance.cachly.dev:6379
```
---
## ⚡ Semantic LLM Cache — 3 lines
Every time a user asks the same question in different words, you pay OpenAI again. This stops that.
```typescript
import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

// Wrap any LLM call; that's it
const answer = await cache.getOrSet(
  userPrompt,
  () => openai.chat.completions.create({ model: 'gpt-4o', messages: [...] })
)
```
**No `embedFn` and no `vectorUrl`:** exact-match caching plus **local BM25+ fuzzy search** work immediately (20–50% savings). No embedding API calls are made: "how do I reset password?" matches "password reset help" entirely in-process.
**No `embedFn`, with `vectorUrl`:** BM25 plus a hosted pgvector index, for higher hit rates across large caches.
**Add semantic matching** for 60–90% savings (about 10 more lines):
```typescript
const cache = createSemanticLLMCache({
  url: process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL, // from cachly.dev dashboard
  embedFn: (text) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: text })
      .then(r => r.data[0].embedding),
  threshold: 0.92, // cosine similarity (default)
  ttl: 3600,       // seconds
})
```
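The `threshold` above is a cosine-similarity cutoff: a cached prompt counts as a match only when its embedding points in nearly the same direction as the new prompt's embedding. As a rough illustration of what that number measures, here is a minimal sketch. `cosineSimilarity` and `isSemanticHit` are hypothetical helper names, not exports of `@cachly-dev/openclaw`, and the library's actual matching logic may differ.

```typescript
// Hypothetical helpers for illustration only; not exported by @cachly-dev/openclaw.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // zero vector: treat as no similarity
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// A cached entry is a hit when similarity reaches the threshold (0.92 in the config above).
function isSemanticHit(query: number[], cached: number[], threshold = 0.92): boolean {
  return cosineSimilarity(query, cached) >= threshold
}
```

Raising the threshold reduces false hits at the cost of more cache misses; lowering it does the opposite.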
"How do I reset my password?" → "How can I reset my pw?" → **same cache hit**. 💰
---
## 📊 What you save
| Questions/day | Cache hit rate | Monthly saving (GPT-4o) |
|---------------|---------------|------------------------|
| 100 | 40% exact | ~$8 |
| 100 | 70% semantic | ~$22 |
| 1 000 | 70% semantic | ~$220 |
| 10 000 | 70% semantic | ~$2 200 |
After 10 cache hits, the console logs:
```
🎯 cachly: 12,340 tokens saved this session (10 hits)
Full stats → cachly.dev/dashboard
```
(Cost breakdown available in the dashboard.)
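To estimate savings for your own traffic, the table above can be approximated as questions per day × days per month × hit rate × cost per avoided call. The sketch below is a hypothetical helper, not part of the package, and `costPerCallUsd` is an assumption you supply from your own billing data; the table does not state the per-call cost it uses.

```typescript
// Hypothetical helper; costPerCallUsd is an assumption you supply, not a Cachly figure.
function estimateMonthlySavingUsd(
  questionsPerDay: number,
  hitRate: number,        // fraction between 0 and 1
  costPerCallUsd: number, // average cost of one uncached LLM call
  daysPerMonth = 30,
): number {
  if (hitRate < 0 || hitRate > 1) throw new RangeError('hitRate must be between 0 and 1')
  return questionsPerDay * daysPerMonth * hitRate * costPerCallUsd
}

// 1,000 questions/day at a 70% hit rate and an assumed $0.01 per call
const saving = estimateMonthlySavingUsd(1000, 0.7, 0.01)
console.log(saving.toFixed(2)) // "210.00"
```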
---
## 🗄️ Session Store
Persist conversation history in Redis — no cold starts, no lost context:
```typescript
import { createCachlySessionStore } from '@cachly-dev/openclaw'

const sessions = createCachlySessionStore({
  url: process.env.CACHLY_URL!,
  ttl: 604800, // 7 days
})

// Assumption: get() may return null/undefined for a user with no stored session yet
const history = (await sessions.get(userId)) ?? []
await sessions.set(userId, [...history, { role: 'user', content: message }])
```
Works with any LLM framework — OpenAI, LangChain, Vercel AI SDK, etc.
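Stored histories grow with every turn, so one option is to cap them before writing back to the session store. The sketch below is a plain helper with hypothetical names (`ChatMessage`, `appendMessage`); it is not part of `@cachly-dev/openclaw`, and the limit of 50 messages is an arbitrary assumption.

```typescript
// Hypothetical type and helper for illustration; not part of @cachly-dev/openclaw.
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

function appendMessage(
  history: ChatMessage[] | null | undefined,
  message: ChatMessage,
  maxMessages = 50,
): ChatMessage[] {
  const next = [...(history ?? []), message]
  // Keep only the most recent maxMessages entries
  return next.length > maxMessages ? next.slice(next.length - maxMessages) : next
}
```

The result can be passed straight to `sessions.set(userId, ...)` in place of the inline spread shown earlier.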
---
## 🧠 Memory Adapter
Long-term semantic memory — store facts, recall by meaning:
```typescript
import { createCachlyMemoryAdapter } from '@cachly-dev/openclaw'

const memory = createCachlyMemoryAdapter({
  url: process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL!,
  embedFn: myEmbedFn, // your embedding function, e.g. the OpenAI embedFn from the semantic cache example
  ttl: 7776000,       // 90 days
})

await memory.store({ id: 'pref-1', text: 'User prefers TypeScript over Python' })
const results = await memory.search('programming language preference', { topK: 5 })
// → [{ text: 'User prefers TypeScript over Python', score: 0.97 }]
```
---
## 🔍 Brain Search (BM25+)
Full-text search over cached data — no embeddings needed:
```typescript
import { brainSearch } from '@cachly-dev/openclaw'
const results = await brainSearch(process.env.CACHLY_VECTOR_URL!, 'deploy authentication error')
// → [{ key: 'lesson:fix:auth', score: 4.2, preview: '...' }]
```
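For intuition about why lexical search works without embeddings, here is a textbook BM25+ ranking over an in-memory list of strings. This is a sketch of the general technique with common default parameters (`k1 = 1.2`, `b = 0.75`, `delta = 1`); it is not Cachly's implementation, and `bm25PlusRank` is a hypothetical name.

```typescript
// Textbook BM25+ over an in-memory corpus; illustration only, not Cachly's implementation.
const tokenize = (text: string): string[] =>
  text.toLowerCase().match(/[a-z0-9]+/g) ?? []

function bm25PlusRank(
  docs: string[],
  query: string,
  k1 = 1.2,
  b = 0.75,
  delta = 1,
): { index: number; score: number }[] {
  const tokenized = docs.map(tokenize)
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / Math.max(docs.length, 1)

  // Document frequency: number of documents containing each term
  const df = new Map<string, number>()
  for (const doc of tokenized) {
    Array.from(new Set(doc)).forEach(term => df.set(term, (df.get(term) ?? 0) + 1))
  }

  const queryTerms = Array.from(new Set(tokenize(query)))
  return tokenized
    .map((doc, index) => {
      let score = 0
      for (const term of queryTerms) {
        const tf = doc.filter(t => t === term).length
        if (tf === 0) continue // only query terms present in the document contribute
        const idf = Math.log((docs.length + 1) / (df.get(term) ?? 1))
        const lengthNorm = 1 - b + b * (doc.length / avgLen)
        score += idf * ((tf * (k1 + 1)) / (tf + k1 * lengthNorm) + delta)
      }
      return { index, score }
    })
    .sort((x, y) => y.score - x.score) // best match first
}

const ranked = bm25PlusRank(
  ['password reset help', 'billing invoice download', 'deploy authentication error'],
  'how do I reset password?',
)
console.log(ranked[0].index) // 0: the password-reset entry shares two terms with the query
```

Only shared terms matter here, which is why "how do I reset password?" can match "password reset help" despite the different word order.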
---
## 🧩 Standalone — works with any LLM stack
No OpenClaw needed. Drop into LangChain, Vercel AI SDK, plain `fetch`, or any custom pipeline:
### LangChain
```typescript
import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { ChatOpenAI } from '@langchain/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })
// Set the model explicitly so it matches the `model` label stored with the cached answer
const llm = new ChatOpenAI({ model: 'gpt-4o' })

async function cachedInvoke(prompt: string) {
  return cache.getOrSet(
    prompt,
    () => llm.invoke(prompt).then(r => ({ content: r.content as string, model: 'gpt-4o' }))
  )
}
```
### Vercel AI SDK
```typescript
import { createSemanticLLMCache } from '@cachly-dev/openclaw'
import { generateText } from 'ai'
import { openai } from '@ai-sdk/openai'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

export async function POST(req: Request) {
  const { prompt } = await req.json()
  const result = await cache.getOrSet(
    prompt,
    () => generateText({ model: openai('gpt-4o'), prompt }).then(r => ({ content: r.text, model: 'gpt-4o' }))
  )
  return Response.json(result)
}
```
### Plain fetch / any provider
```typescript
import { createSemanticLLMCache } from '@cachly-dev/openclaw'

const cache = createSemanticLLMCache({ url: process.env.CACHLY_URL! })

const answer = await cache.getOrSet(
  userMessage,
  async () => {
    const res = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': process.env.ANTHROPIC_KEY!,
        'anthropic-version': '2023-06-01', // required by the Anthropic Messages API
        'content-type': 'application/json',
      },
      body: JSON.stringify({ model: 'claude-opus-4-5', max_tokens: 1024, messages: [{ role: 'user', content: userMessage }] }),
    })
    // Fail fast rather than returning an error body as if it were an answer
    if (!res.ok) throw new Error(`Anthropic API error: ${res.status}`)
    const json = await res.json()
    return { content: json.content[0].text, model: 'claude-opus-4-5', inputTokens: json.usage.input_tokens, outputTokens: json.usage.output_tokens }
  }
)
```
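All three integrations rely on the same cache-aside contract: `getOrSet` returns a stored answer when it has one and otherwise runs the producer function once and stores the result. The sketch below shows that contract with an in-memory `Map` and exact matching on a normalized prompt. `TinyExactCache` is a hypothetical class for illustration; the real library stores entries in Redis and adds fuzzy and semantic matching, which this sketch omits.

```typescript
// Minimal in-memory cache-aside sketch; illustration only, not the library's implementation.
class TinyExactCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>()
  private ttlMs: number

  constructor(ttlSeconds = 3600) {
    this.ttlMs = ttlSeconds * 1000
  }

  // Normalize so trivial whitespace and case differences map to the same key
  private keyFor(prompt: string): string {
    return prompt.trim().toLowerCase().replace(/\s+/g, ' ')
  }

  async getOrSet(prompt: string, producer: () => Promise<T>): Promise<T> {
    const key = this.keyFor(prompt)
    const cached = this.entries.get(key)
    if (cached && cached.expiresAt > Date.now()) return cached.value // hit: producer is not called

    const value = await producer() // miss: call the LLM once
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs })
    return value
  }
}
```

Because the producer is just a function returning a promise, the same wrapper works with any provider or SDK.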
---
## 🦾 OpenClaw adapter (bonus)
If you use [OpenClaw](https://openclaw.dev) (22-channel AI assistant), one function wires everything:
```typescript
import { createCachlyOpenClawConfig } from '@cachly-dev/openclaw'
import OpenAI from 'openai'

const openai = new OpenAI()

const cachlyConfig = await createCachlyOpenClawConfig({
  url: process.env.CACHLY_URL!,
  vectorUrl: process.env.CACHLY_VECTOR_URL,
  embedFn: (t) =>
    openai.embeddings.create({ model: 'text-embedding-3-small', input: t })
      .then(r => r.data[0].embedding),
})

const app = new OpenClawApp({
  ...cachlyConfig,
  // ... rest of your config
})
```
This gives you: semantic cache + persistent sessions + Redis memory — all across WhatsApp, Telegram, Slack, Discord at once.
---
## 👥 Team Brain
One shared cachly instance → every team member gets smarter from each other's work:
```typescript
// Alice fixes a deploy issue:
await brain.learnFromAttempts({ topic: 'deploy:k8s', outcome: 'success', whatWorked: '...' })
// Bob starts a session the next day:
await brain.sessionStart()
// → "💡 alice solved deploy:k8s 1d ago: ..."
```
Team plans from €99/mo (10 seats) at [cachly.dev/teams](https://cachly.dev/teams).
---
## 🚀 Upgrade path
| Level | What you get | Setup |
|-------|-------------|-------|
| **Free — Exact + BM25** | 20–50% reduction, in-process BM25+ fuzzy, zero config | `CACHLY_URL` only |
| **Free — Semantic cache** | 60–90% cost reduction | + `embedFn` |
| **Speed tier** | Hosted pgvector, higher hit rates at scale | Speed plan at cachly.dev |
| **Team Brain** | Shared knowledge, team lessons, analytics | cachly.dev/teams |
---
## 🤖 Use with Python AI Agents
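OpenClaw ships a TypeScript SDK, but Python AI agents can call Cachly's REST API directly for persistent memory and semantic caching. In these examples `CACHLY_URL` is the REST base URL (`https://api.cachly.dev`), not the `redis://` connection string used by the TypeScript SDK. Here are patterns for the most popular frameworks.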
### LangChain — Persistent Agent Memory
```python
import os

import requests
from langchain.schema import BaseMemory

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}


class CachlyBrainMemory(BaseMemory):
    """LangChain memory backed by Cachly Brain; survives restarts."""

    @property
    def memory_variables(self) -> list[str]:
        return ["brain_context"]

    def load_memory_variables(self, inputs: dict) -> dict:
        # Recall the lessons most relevant to the current user input
        query = inputs.get("input", "")
        r = requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
            headers=HEADERS,
            json={"query": query, "topK": 3},
            timeout=10,
        )
        r.raise_for_status()
        lessons = r.json().get("results", [])
        context = "\n".join(f"- {lesson['whatWorked']}" for lesson in lessons if lesson.get("whatWorked"))
        return {"brain_context": context or "No relevant memory found."}

    def save_context(self, inputs: dict, outputs: dict) -> None:
        # Learn from what the agent discovered
        output = outputs.get("output", "")
        if output:
            requests.post(
                f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
                headers=HEADERS,
                json={"topic": "agent:langchain", "outcome": "success", "whatWorked": output[:500]},
                timeout=10,
            )

    def clear(self) -> None:
        pass  # Brain is persistent; clearing is not supported by design


# Usage:
from langchain.agents import initialize_agent, AgentType
from langchain.chat_models import ChatOpenAI

memory = CachlyBrainMemory()
agent = initialize_agent(
    tools=[...],
    llm=ChatOpenAI(model="gpt-4o"),
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    system_message="You have access to a persistent Brain. brain_context = {brain_context}",
)
```
### CrewAI — Shared Team Brain Tool
```python
import os

import requests
from crewai_tools import BaseTool
from pydantic import BaseModel, Field

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}"}


class RecallInput(BaseModel):
    query: str = Field(description="What to search for in the Brain")


class CachlyRecallTool(BaseTool):
    name: str = "cachly_brain_recall"
    description: str = "Search persistent memory for lessons, solutions, and context from past work"
    args_schema: type[BaseModel] = RecallInput

    def _run(self, query: str) -> str:
        r = requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
            headers={**HEADERS, "Content-Type": "application/json"},
            json={"query": query, "topK": 5},
            timeout=10,
        )
        results = r.json().get("results", [])
        if not results:
            return "No relevant memory found."
        # Use .get() so a lesson without a field does not raise KeyError
        return "\n".join(f"[{l.get('topic', '')}] {l.get('whatWorked', '')}" for l in results)


class LearnInput(BaseModel):
    topic: str = Field(description="Category slug like 'deploy:api' or 'fix:auth'")
    what_worked: str = Field(description="What solution worked")
    outcome: str = Field(default="success", description="success | failure | partial")


class CachlyLearnTool(BaseTool):
    name: str = "cachly_brain_learn"
    description: str = "Store a lesson in persistent memory so future agents can benefit from it"
    args_schema: type[BaseModel] = LearnInput

    def _run(self, topic: str, what_worked: str, outcome: str = "success") -> str:
        requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
            headers={**HEADERS, "Content-Type": "application/json"},
            json={"topic": topic, "outcome": outcome, "whatWorked": what_worked},
            timeout=10,
        )
        return f"✅ Stored lesson: {topic}"


# Usage with CrewAI:
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",
    goal="Research topics and store findings for the team",
    tools=[CachlyRecallTool(), CachlyLearnTool()],
    backstory="You have a persistent memory that survives across sessions.",
)
```
### AutoGen / Microsoft AutoGen
```python
import os

import requests
from autogen import AssistantAgent, UserProxyAgent

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}


def recall_brain(query: str) -> str:
    """Search Cachly Brain for relevant memory."""
    r = requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
        headers=HEADERS, json={"query": query, "topK": 5}, timeout=10,
    )
    results = r.json().get("results", [])
    return "\n".join(f"• [{l['topic']}] {l['whatWorked']}" for l in results) or "No memory found."


def store_lesson(topic: str, what_worked: str, outcome: str = "success") -> str:
    """Store a lesson in Cachly Brain."""
    requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
        headers=HEADERS, json={"topic": topic, "outcome": outcome, "whatWorked": what_worked}, timeout=10,
    )
    return f"Stored: {topic}"


assistant = AssistantAgent(
    name="CachlyAssistant",
    system_message="""You are a helpful AI with persistent memory via Cachly Brain.
ALWAYS start by calling recall_brain() with the user's query.
ALWAYS end by calling store_lesson() with what you discovered.""",
    llm_config={
        "functions": [
            {"name": "recall_brain", "description": "Search persistent memory", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"]}},
            {"name": "store_lesson", "description": "Store a lesson", "parameters": {"type": "object", "properties": {"topic": {"type": "string"}, "what_worked": {"type": "string"}, "outcome": {"type": "string"}}, "required": ["topic", "what_worked"]}},
        ],
    },
)

# Assumption (AutoGen 0.2-style API): the proxy agent executes the functions the assistant requests
user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    function_map={"recall_brain": recall_brain, "store_lesson": store_lesson},
)
```
### LlamaIndex — QueryEngine with Cachly Memory
```python
import os

import requests
from llama_index.core.memory import BaseMemory
from llama_index.core.schema import TextNode

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}


class CachlyMemory(BaseMemory):
    """LlamaIndex memory backed by Cachly Brain."""

    def get(self, input: str, **kwargs) -> list[TextNode]:
        r = requests.post(
            f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/smart-recall",
            headers=HEADERS, json={"query": input, "topK": 5}, timeout=10,
        )
        return [
            TextNode(text=f"[{l['topic']}] {l.get('whatWorked', '')}")
            for l in r.json().get("results", [])
        ]

    def put(self, messages) -> None:
        for msg in messages:
            if hasattr(msg, "content") and msg.content:
                requests.post(
                    f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/brain/learn",
                    headers=HEADERS,
                    json={"topic": "agent:llamaindex", "outcome": "success", "whatWorked": str(msg.content)[:500]},
                    timeout=10,
                )

    def reset(self) -> None:
        pass  # Persistent by design


# Usage (`index` is assumed to be a LlamaIndex index you have already built):
from llama_index.core.chat_engine import CondensePlusContextChatEngine

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=CachlyMemory(),
    verbose=True,
)
response = chat_engine.chat("How did we fix the last deployment issue?")
```
### Semantic Cache for LLM API Calls (Python)
Skip expensive LLM calls for semantically similar prompts — no embeddings needed on your side:
```python
import hashlib
import os

import requests

CACHLY_URL = os.environ["CACHLY_URL"]
CACHLY_JWT = os.environ["CACHLY_JWT"]
INSTANCE_ID = os.environ["CACHLY_BRAIN_INSTANCE_ID"]
HEADERS = {"Authorization": f"Bearer {CACHLY_JWT}", "Content-Type": "application/json"}


def cached_llm_call(prompt: str, llm_fn, namespace: str = "cachly:sem:qa") -> str:
    """Call LLM with semantic caching via Cachly."""
    # 1. Check semantic cache
    r = requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/cache/semantic-search",
        headers=HEADERS,
        json={"query": prompt, "namespace": namespace, "threshold": 0.85},
        timeout=10,
    )
    hit = r.json().get("hit")
    if hit:
        return hit["value"]  # Cache hit: no LLM call needed 🎉

    # 2. Cache miss: call LLM
    response = llm_fn(prompt)

    # 3. Store in semantic cache
    key = hashlib.sha256(prompt.encode()).hexdigest()[:16]
    requests.post(
        f"{CACHLY_URL}/api/v1/instances/{INSTANCE_ID}/cache/semantic",
        headers=HEADERS,
        json={"key": key, "value": response, "namespace": namespace, "prompt": prompt},
        timeout=10,
    )
    return response


# Usage with any LLM:
import openai

client = openai.OpenAI()


def call_gpt(prompt: str) -> str:
    return client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    ).choices[0].message.content


answer = cached_llm_call("What is the capital of France?", call_gpt)
```
### Environment Setup for Python Agents
```bash
pip install requests python-dotenv
# .env
CACHLY_URL=https://api.cachly.dev
CACHLY_JWT=cky_live_... # from cachly.dev → Dashboard → API Keys
CACHLY_BRAIN_INSTANCE_ID=... # from cachly.dev → Dashboard → Brain
```
Get your free instance at **[cachly.dev/setup-ai](https://cachly.dev/setup-ai)** — no credit card required.
---
## Links
- 📖 [cachly.dev docs](https://cachly.dev/docs)
- 🧠 [AI Memory / MCP Server](https://cachly.dev/docs/ai-memory)
- 📦 [`@cachly-dev/mcp-server`](https://www.npmjs.com/package/@cachly-dev/mcp-server) — give your AI editor persistent memory (51 MCP tools)
- 🤖 [OpenClaw](https://openclaw.dev)
- 📦 [npm](https://www.npmjs.com/package/@cachly-dev/openclaw)
- 🐛 [Issues](https://github.com/cachly-dev/cachly/issues)
---
Apache-2.0 © [cachly.dev](https://cachly.dev)