https://github.com/jagjeevanak/mem-oracle
A locally-running documentation oracle that indexes web docs and injects relevant snippets into Claude Code context.
- Host: GitHub
- URL: https://github.com/jagjeevanak/mem-oracle
- Owner: JagjeevanAK
- License: mit
- Created: 2026-01-20T08:59:59.000Z (14 days ago)
- Default Branch: main
- Last Pushed: 2026-01-24T07:26:58.000Z (10 days ago)
- Last Synced: 2026-01-24T09:23:40.044Z (10 days ago)
- Topics: ai, claude-code, coding-agent, cursor, opencode, plugin
- Language: TypeScript
- Homepage: https://mem-oracle.vercel.app
- Size: 559 KB
- Stars: 2
- Watchers: 0
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# mem-oracle
A locally-running documentation oracle that indexes web docs and injects relevant snippets into Claude Code context.
## Features
- **Seed-first indexing**: Index the seed page immediately, then continue background crawling
- **Local storage**: SQLite metadata + disk-based vector store (no external dependencies)
- **Pluggable embeddings**: Local TF-IDF fallback, or use OpenAI/Voyage/Cohere APIs
- **Claude Code plugin**: Hook scripts that auto-inject relevant docs into prompts
- **Optional MCP server**: Explicit tool calls for search/index operations
## Quick Start
### Claude Code Plugin (Recommended)
```bash
# In Claude Code terminal:
/plugin add jagjeevanak/mem-oracle
```
That's it! The plugin will:
- Auto-install dependencies
- Auto-start the worker service in the background
- Auto-inject relevant documentation into your prompts
### Manual Installation
```bash
# Clone and install
git clone https://github.com/jagjeevanak/mem-oracle.git
cd mem-oracle
bun install
# Start the worker service
bun run worker
# In another terminal, index some docs
bun run src/index.ts index https://nextjs.org/docs/getting-started
# Search indexed docs
bun run src/index.ts search "how to use server components"
```
## Usage
### CLI Commands
```bash
# Start the worker HTTP service (default: http://127.0.0.1:7432)
bun run src/index.ts worker
# Start the MCP server (stdio)
bun run src/index.ts mcp
# Index a documentation URL
bun run src/index.ts index <url>
# Search indexed documentation
bun run src/index.ts search <query>
# Show indexing status
bun run src/index.ts status
```
### Worker API
The worker service exposes these HTTP endpoints:
```
POST /index - Index a documentation site
POST /retrieve - Search for relevant snippets
GET /status - Get indexing status
DELETE /docset/:id - Delete a docset
GET /health - Health check
```
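According to the plugin section, the plugin checks whether the worker service is running when a session starts. One way a hook could do that is to probe `GET /health` with a short timeout. The sketch below is a minimal version under stated assumptions: `isWorkerUp` and `HealthFetch` are hypothetical names, any 2xx response is treated as healthy, and the fetch implementation is injectable so the probe can be exercised without a running worker.

```typescript
// Hypothetical liveness probe for the worker's GET /health endpoint.
// Assumptions (not from the project): any 2xx response means "up", and a
// network error or a timeout means "down".
type HealthFetch = (
  url: string,
  init: { method: string; signal: AbortSignal },
) => Promise<{ ok: boolean }>;

async function isWorkerUp(
  baseUrl: string = "http://127.0.0.1:7432",
  timeoutMs: number = 1000,
  fetchImpl: HealthFetch = (url, init) => fetch(url, init),
): Promise<boolean> {
  // Abort the request if the worker does not answer within timeoutMs.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`${baseUrl}/health`, { method: "GET", signal: controller.signal });
    return res.ok;
  } catch {
    // Connection refused, DNS failure, or aborted by the timeout.
    return false;
  } finally {
    clearTimeout(timer);
  }
}
```

A session-start hook could, for example, run `bun run worker` only when `await isWorkerUp()` resolves to `false`.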
#### Index Request
```json
{
  "baseUrl": "https://nextjs.org",
  "seedSlug": "/docs/getting-started",
  "name": "Next.js Docs",
  "waitForSeed": true
}
```
#### Retrieve Request
```json
{
  "query": "how to use server components",
  "topK": 5
}
```
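For callers outside the plugin, the two request bodies above map onto a small typed client. This is a hedged sketch, not a published SDK: `MemOracleClient`, `JsonFetch`, and the error handling are hypothetical, the base URL assumes the default `http://127.0.0.1:7432`, and response bodies are typed as `unknown` because their exact shape is only hinted at by the sequence diagrams in the Architecture section.

```typescript
// Hypothetical client for the worker's POST /index and POST /retrieve endpoints.
// Assumptions: the worker listens on the default http://127.0.0.1:7432, request
// bodies follow the JSON examples above, and responses are JSON.
interface IndexRequest {
  baseUrl: string;
  seedSlug: string;
  name?: string;
  waitForSeed?: boolean;
}

interface RetrieveRequest {
  query: string;
  topK?: number;
}

type JsonFetch = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

class MemOracleClient {
  private readonly baseUrl: string;
  private readonly fetchImpl: JsonFetch;

  constructor(baseUrl = "http://127.0.0.1:7432", fetchImpl: JsonFetch = (url, init) => fetch(url, init)) {
    this.baseUrl = baseUrl;
    this.fetchImpl = fetchImpl;
  }

  // POST a JSON body and fail loudly on non-2xx responses.
  private async post(path: string, body: unknown): Promise<unknown> {
    const res = await this.fetchImpl(`${this.baseUrl}${path}`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`POST ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  index(req: IndexRequest): Promise<unknown> {
    return this.post("/index", req);
  }

  retrieve(req: RetrieveRequest): Promise<unknown> {
    return this.post("/retrieve", req);
  }
}
```

With the defaults, `new MemOracleClient().retrieve({ query: "how to use server components", topK: 5 })` would send the retrieve body shown above to the local worker.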
### MCP Tools
When running as an MCP server, these tools are available:
- `search_docs` - Search indexed documentation
- `get_snippets` - Get specific documentation chunks
- `index_docs` - Index a documentation website
- `index_status` - Get indexing status
## Configuration
Configuration is stored in `~/.mem-oracle/config.json`:
```json
{
"dataDir": "~/.mem-oracle",
"embedding": {
"provider": "local",
"model": "all-MiniLM-L6-v2",
"batchSize": 32
},
"vectorStore": {
"provider": "local"
},
"worker": {
"port": 7432,
"host": "127.0.0.1"
},
"crawler": {
"concurrency": 3,
"requestDelay": 500,
"timeout": 30000,
"maxPages": 1000
}
}
```
### Using API Embeddings
To use OpenAI embeddings:
```json
{
  "embedding": {
    "provider": "openai",
    "model": "text-embedding-3-small",
    "apiKey": "sk-..."
  }
}
```
Or Voyage AI:
```json
{
  "embedding": {
    "provider": "voyage",
    "model": "voyage-2",
    "apiKey": "..."
  }
}
```
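Both provider snippets set only the `embedding` key, which suggests a partial file is combined with the defaults from the full configuration example. The sketch below shows one way a loader might do that, assuming a per-section shallow merge, `~` expansion against a supplied home directory, and a check that non-local providers carry an `apiKey`. `MemOracleConfig`, `resolveConfig`, and `expandHome` are hypothetical names, not the project's API.

```typescript
// Hypothetical loader logic for ~/.mem-oracle/config.json. Field names and
// defaults copy the Configuration example; the per-section merge, "~"
// expansion, and apiKey check are assumptions.
interface MemOracleConfig {
  dataDir: string;
  embedding: { provider: string; model: string; batchSize?: number; apiKey?: string };
  vectorStore: { provider: string };
  worker: { port: number; host: string };
  crawler: { concurrency: number; requestDelay: number; timeout: number; maxPages: number };
}

// A user file may set any subset of keys inside each section.
type UserConfig = { [K in keyof MemOracleConfig]?: Partial<MemOracleConfig[K]> };

const DEFAULT_CONFIG: MemOracleConfig = {
  dataDir: "~/.mem-oracle",
  embedding: { provider: "local", model: "all-MiniLM-L6-v2", batchSize: 32 },
  vectorStore: { provider: "local" },
  worker: { port: 7432, host: "127.0.0.1" },
  crawler: { concurrency: 3, requestDelay: 500, timeout: 30000, maxPages: 1000 },
};

function expandHome(path: string, home: string): string {
  return path === "~" || path.startsWith("~/") ? home + path.slice(1) : path;
}

function resolveConfig(user: UserConfig, home: string): MemOracleConfig {
  // Later spreads win, so user keys override defaults section by section.
  const config: MemOracleConfig = {
    dataDir: expandHome(user.dataDir ?? DEFAULT_CONFIG.dataDir, home),
    embedding: { ...DEFAULT_CONFIG.embedding, ...user.embedding },
    vectorStore: { ...DEFAULT_CONFIG.vectorStore, ...user.vectorStore },
    worker: { ...DEFAULT_CONFIG.worker, ...user.worker },
    crawler: { ...DEFAULT_CONFIG.crawler, ...user.crawler },
  };
  // API-backed providers cannot work without a key.
  if (config.embedding.provider !== "local" && !config.embedding.apiKey) {
    throw new Error(`embedding.provider "${config.embedding.provider}" needs embedding.apiKey`);
  }
  return config;
}
```

With this merge, switching `provider` without also setting `model` would silently keep the local default model, which is one reason to set both, as the snippets above do.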
## Claude Code Integration
### Install as Plugin
```bash
# In Claude Code terminal:
/plugin add jagjeevanak/mem-oracle
/plugin install mem-oracle
```
Then restart Claude Code. The plugin will automatically:
- Check if the worker service is running on session start
- Retrieve relevant docs when you submit prompts
- Auto-index documentation URLs detected in your prompts
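Auto-indexing needs some way to spot documentation URLs in a prompt and turn them into `/index` request bodies. A minimal sketch, assuming every http(s) URL is a candidate, trailing sentence punctuation is not part of the URL, `baseUrl` is the URL's origin, and `seedSlug` is its path. `detectDocUrls`, `parseUrl`, and `DetectedDocset` are hypothetical; the project's hooks may filter URLs differently.

```typescript
// Hypothetical prompt scanner for the auto-index behavior described above.
interface DetectedDocset {
  baseUrl: string;
  seedSlug: string;
}

// Stop at whitespace, quotes, angle brackets, backticks and closing brackets.
const URL_PATTERN = /https?:\/\/[^\s<>"'`)\]]+/g;

function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch {
    return null; // not a parseable URL after all
  }
}

function detectDocUrls(prompt: string): DetectedDocset[] {
  const seen = new Set<string>();
  const found: DetectedDocset[] = [];
  for (const match of prompt.match(URL_PATTERN) ?? []) {
    const url = parseUrl(match.replace(/[.,;:!?]+$/, "")); // drop trailing punctuation
    if (url === null) continue;
    const key = url.origin + url.pathname;
    if (seen.has(key)) continue; // index each page at most once per prompt
    seen.add(key);
    found.push({ baseUrl: url.origin, seedSlug: url.pathname });
  }
  return found;
}
```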
### Manual Setup
1. Start the worker service:
   ```bash
   bun run worker
   ```
2. The plugin hooks in `.claude-plugin/hooks/` handle lifecycle events such as session start and prompt submission.
### As MCP Server
Add to your Claude Code MCP configuration:
```json
{
  "mcpServers": {
    "mem-oracle": {
      "command": "bun",
      "args": ["run", "/path/to/mem-oracle/src/index.ts", "mcp"]
    }
  }
}
```
## Architecture
### System Overview
```mermaid
flowchart TB
subgraph Client["Client Layer"]
CC[Claude Code]
CLI[CLI]
end
subgraph Integration["Integration Layer"]
PH[Plugin Hooks]
MCP[MCP Server]
end
subgraph Service["Service Layer"]
WS[Worker Service<br/>:7432]
OR[Orchestrator]
end
subgraph Processing["Processing Pipeline"]
FE[Fetcher]
EX[Extractor]
CH[Chunker]
CR[Crawler]
end
subgraph Embedding["Embedding Layer"]
direction LR
LE[Local TF-IDF]
OE[OpenAI]
VE[Voyage]
CE[Cohere]
end
subgraph Storage["Storage Layer"]
SQL[(SQLite<br/>Metadata)]
VS[(Vector Store)]
CA[(Content Cache)]
end
CC --> PH
CC -.-> MCP
CLI --> WS
PH --> WS
MCP --> OR
WS --> OR
OR --> FE
OR --> EX
OR --> CH
OR --> CR
FE --> CA
CR --> FE
EX --> CH
CH --> LE & OE & VE & CE
LE & OE & VE & CE --> VS
OR --> SQL
OR --> VS
```
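The embedding layer includes a local TF-IDF provider for use without an API key. Its exact tokenization and weighting are not documented here, so the sketch below uses textbook TF-IDF under explicit assumptions: lowercase alphanumeric tokens, raw term counts, smoothed IDF `ln((1 + N) / (1 + df)) + 1`, and L2-normalized vectors. `TfIdfEmbedder`, `tokenize`, and `cosineUnit` are hypothetical names.

```typescript
// Minimal TF-IDF embedder sketch (not the project's implementation).
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

class TfIdfEmbedder {
  private vocab = new Map<string, number>(); // term -> vector index
  private idf: number[] = [];

  // Learn the vocabulary and smoothed IDF weights from the indexed chunks.
  fit(corpus: string[]): void {
    const df = new Map<string, number>();
    for (const doc of corpus) {
      for (const term of Array.from(new Set(tokenize(doc)))) {
        df.set(term, (df.get(term) ?? 0) + 1);
      }
    }
    this.vocab = new Map();
    this.idf = [];
    df.forEach((count, term) => {
      this.vocab.set(term, this.idf.length);
      this.idf.push(Math.log((1 + corpus.length) / (1 + count)) + 1);
    });
  }

  // Map text to a unit-length vector; terms unseen during fit are ignored.
  embed(text: string): number[] {
    const vec = new Array<number>(this.idf.length).fill(0);
    for (const term of tokenize(text)) {
      const i = this.vocab.get(term);
      if (i !== undefined) vec[i] += this.idf[i]; // term count x idf
    }
    const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
    return norm === 0 ? vec : vec.map((v) => v / norm);
  }
}

// Both inputs are unit length (or all zeros), so the dot product is the cosine.
function cosineUnit(a: number[], b: number[]): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}
```

Because the vocabulary is learned from the indexed chunks, a local provider built this way would need to persist it alongside the vectors so that queries are embedded into the same space.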
### Indexing Flow
```mermaid
sequenceDiagram
participant U as User/Claude
participant W as Worker
participant O as Orchestrator
participant F as Fetcher
participant E as Extractor
participant C as Chunker
participant EM as Embedder
participant DB as SQLite
participant VS as VectorStore
U->>W: POST /index {baseUrl, seedSlug}
W->>O: indexDocset(input)
O->>DB: createDocset()
O->>DB: createPage(seedUrl)
Note over O,VS: Seed Page Indexing (Synchronous)
O->>F: fetch(seedUrl)
F-->>O: HTML/MD content
O->>E: extract(content)
E-->>O: {title, text, links}
O->>C: chunk(extractedContent)
C-->>O: chunks[]
O->>EM: embed(chunks)
EM-->>O: vectors[]
O->>VS: upsert(vectors)
O->>DB: updatePage(indexed)
O-->>W: docset
W-->>U: {docsetId, status}
Note over O,VS: Background Crawling (Async)
loop For each discovered link
O->>DB: getNextPendingPage()
O->>F: fetch(pageUrl)
O->>E: extract()
O->>C: chunk()
O->>EM: embed()
O->>VS: upsert()
O->>DB: updatePage(indexed)
end
```
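The point of the seed-first flow is that the caller gets a response once the seed page is stored, while the rest of the site is crawled afterwards. The in-memory sketch below illustrates that ordering; it is not the project's orchestrator. `SeedFirstIndexer`, `chunkText`, and `resolveLink` are hypothetical, fetching and extraction are collapsed into one injected function, chunks are fixed 500-character windows, only same-origin links are queued, and pages are crawled one at a time even though the real crawler has a `concurrency` setting.

```typescript
// Hypothetical in-memory sketch of seed-first indexing (not the project's code).
interface ExtractedPage { title: string; text: string; links: string[] }
type FetchAndExtract = (url: string) => Promise<ExtractedPage>;
type EmbedBatch = (chunks: string[]) => Promise<number[][]>;
type PageStatus = "pending" | "indexed" | "failed";
interface StoredChunk { url: string; text: string; vector: number[] }

function chunkText(text: string, size = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

// Resolve a link against its page and drop the #fragment; null if unparseable.
function resolveLink(link: string, pageUrl: string): URL | null {
  try {
    const url = new URL(link, pageUrl);
    url.hash = "";
    return url;
  } catch {
    return null;
  }
}

class SeedFirstIndexer {
  readonly pages = new Map<string, PageStatus>(); // stands in for the SQLite page table
  readonly chunks: StoredChunk[] = []; // stands in for the vector store
  private readonly fetchPage: FetchAndExtract;
  private readonly embed: EmbedBatch;
  private readonly maxPages: number;

  constructor(fetchPage: FetchAndExtract, embed: EmbedBatch, maxPages = 1000) {
    this.fetchPage = fetchPage;
    this.embed = embed;
    this.maxPages = maxPages;
  }

  // Index the seed before returning, then hand back the still-running crawl.
  async indexDocset(baseUrl: string, seedSlug: string): Promise<{ seedStatus: PageStatus; crawl: Promise<void> }> {
    const origin = new URL(baseUrl).origin;
    const seedUrl = new URL(seedSlug, baseUrl).href;
    this.pages.set(seedUrl, "pending");
    await this.indexPage(seedUrl, origin);
    const seedStatus = this.pages.get(seedUrl) ?? "failed";
    return { seedStatus, crawl: this.crawlPending(origin) };
  }

  // Background loop: keep taking the next pending page until none are left.
  private async crawlPending(origin: string): Promise<void> {
    for (let next = this.nextPending(); next !== undefined; next = this.nextPending()) {
      await this.indexPage(next, origin);
    }
  }

  private nextPending(): string | undefined {
    for (const [url, status] of Array.from(this.pages)) if (status === "pending") return url;
    return undefined;
  }

  // fetch -> chunk -> embed -> store, then queue newly discovered links.
  private async indexPage(url: string, origin: string): Promise<void> {
    try {
      const page = await this.fetchPage(url);
      const texts = chunkText(page.text);
      const vectors = await this.embed(texts);
      texts.forEach((text, i) => this.chunks.push({ url, text, vector: vectors[i] }));
      this.pages.set(url, "indexed");
      for (const link of page.links) {
        const abs = resolveLink(link, url);
        if (abs && abs.origin === origin && !this.pages.has(abs.href) && this.pages.size < this.maxPages) {
          this.pages.set(abs.href, "pending");
        }
      }
    } catch {
      this.pages.set(url, "failed");
    }
  }
}
```

Here `maxPages` mirrors the crawler setting of the same name; errors mark a page `failed` instead of stopping the crawl.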
### Retrieval Flow
```mermaid
sequenceDiagram
participant U as User/Claude
participant W as Worker
participant O as Orchestrator
participant EM as Embedder
participant VS as VectorStore
participant DB as SQLite
U->>W: POST /retrieve {query}
W->>O: search(query)
O->>EM: embedSingle(query)
EM-->>O: queryVector
O->>DB: listDocsets()
DB-->>O: docsets[]
loop For each docset
O->>VS: search(namespace, queryVector, topK)
VS-->>O: results[]
end
O->>O: sort & merge results
O-->>W: SearchResult[]
W-->>U: {results, query}
```
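The retrieval loop searches each docset's namespace separately and then merges. Taking the top `topK` per namespace before merging loses nothing, because any chunk in the global top `topK` must also be in its own namespace's top `topK`. The sketch assumes brute-force cosine similarity over in-memory vectors of equal dimension; `searchAll`, `searchNamespace`, `cosine`, and `VectorEntry` are hypothetical names.

```typescript
// Hypothetical sketch of per-namespace search followed by sort & merge.
interface VectorEntry { id: string; vector: number[]; text: string }
interface SearchResult { docsetId: string; id: string; text: string; score: number }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Score every vector in one namespace and keep the best topK.
function searchNamespace(entries: VectorEntry[], query: number[], topK: number, docsetId: string): SearchResult[] {
  return entries
    .map((e) => ({ docsetId, id: e.id, text: e.text, score: cosine(query, e.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Search every docset, then merge and cut the combined list to topK.
function searchAll(store: Map<string, VectorEntry[]>, query: number[], topK = 5): SearchResult[] {
  const merged: SearchResult[] = [];
  store.forEach((entries, docsetId) => merged.push(...searchNamespace(entries, query, topK, docsetId)));
  return merged.sort((x, y) => y.score - x.score).slice(0, topK);
}
```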
### Data Flow
```mermaid
flowchart LR
subgraph Input
URL[Doc URL]
end
subgraph Fetch
HTTP[HTTP Request]
CACHE[Cache Check]
end
subgraph Extract
HTML[HTML Parser]
MD[MD Parser]
READ[Readability]
end
subgraph Process
CHUNK[Chunker]
EMBED[Embedder]
end
subgraph Store
META[Metadata<br/>SQLite]
VEC[Vectors<br/>JSON]
CONT[Content<br/>Cache]
end
URL --> CACHE
CACHE -->|miss| HTTP
CACHE -->|hit| Extract
HTTP --> CONT
HTTP --> Extract
HTML --> READ
MD --> Extract
READ --> CHUNK
CHUNK --> EMBED
EMBED --> VEC
CHUNK --> META
```
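The cache check at the start of the data flow lets a re-index skip HTTP for content that was already fetched. A minimal sketch, assuming an in-memory `Map` that stands in for the project's content cache (whose storage format is not described here), keys that ignore the URL `#fragment`, no expiry, and a parser choice based on content type or a `.md`/`.mdx` extension. `CachedFetcher`, `cacheKey`, and `pickParser` are hypothetical.

```typescript
// Hypothetical sketch of the "Cache Check" step and parser routing.
interface FetchedPage { body: string; contentType: string }
type PageFetcher = (url: string) => Promise<FetchedPage>;

function cacheKey(url: string): string {
  const u = new URL(url);
  u.hash = ""; // fragments never change the fetched document
  return u.href;
}

class CachedFetcher {
  hits = 0;
  misses = 0;
  private readonly cache = new Map<string, FetchedPage>();
  private readonly fetcher: PageFetcher;

  constructor(fetcher: PageFetcher) {
    this.fetcher = fetcher;
  }

  async get(url: string): Promise<FetchedPage> {
    const key = cacheKey(url);
    const cached = this.cache.get(key);
    if (cached) {
      this.hits++; // hit: skip HTTP and go straight to extraction
      return cached;
    }
    this.misses++; // miss: fetch, then store the raw content
    const page = await this.fetcher(key);
    this.cache.set(key, page);
    return page;
  }
}

// Route content to the Markdown parser or to HTML parsing + Readability.
function pickParser(url: string, contentType: string): "markdown" | "html" {
  const path = new URL(url).pathname.toLowerCase();
  if (contentType.toLowerCase().includes("markdown") || path.endsWith(".md") || path.endsWith(".mdx")) {
    return "markdown";
  }
  return "html";
}
```

Two concurrent `get` calls for the same uncached URL would both miss in this sketch; caching the in-flight promise instead of the result is a common way to deduplicate them.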
## Development
```bash
# Run with hot reload
bun run dev
# Type check
bun run typecheck
# Run tests
bun test
```
## License
MIT