https://github.com/jagjeevanak/mem-oracle

A locally-running documentation oracle that indexes web docs and injects relevant snippets into Claude Code context.

Host: GitHub
URL: https://github.com/jagjeevanak/mem-oracle
Owner: JagjeevanAK
License: mit
Created: 2026-01-20T08:59:59.000Z (14 days ago)
Default Branch: main
Last Pushed: 2026-01-24T07:26:58.000Z (10 days ago)
Last Synced: 2026-01-24T09:23:40.044Z (10 days ago)
Topics: ai, claude-code, coding-agent, cursor, opencode, plugin
Language: TypeScript
Homepage: https://mem-oracle.vercel.app
Size: 559 KB
Stars: 2
Watchers: 0
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

# mem-oracle

A locally-running documentation oracle that indexes web docs and injects relevant snippets into Claude Code context.

## Features

- **Seed-first indexing**: Index the seed page immediately, then continue background crawling

- **Local storage**: SQLite metadata + disk-based vector store (no external dependencies)

- **Pluggable embeddings**: Local TF-IDF fallback, or use OpenAI/Voyage/Cohere APIs

- **Claude Code plugin**: Hook scripts that auto-inject relevant docs into prompts

- **Optional MCP server**: Explicit tool calls for search/index operations

## Quick Start

### Claude Code Plugin (Recommended)

```bash

# In Claude Code terminal:

/plugin add jagjeevanak/mem-oracle

```

That's it! The plugin will:

- Auto-install dependencies

- Auto-start the worker service in the background

- Auto-inject relevant documentation into your prompts

### Manual Installation

```bash

# Clone and install

git clone https://github.com/jagjeevanak/mem-oracle.git

cd mem-oracle

bun install

# Start the worker service

bun run worker

# In another terminal, index some docs

bun run src/index.ts index https://nextjs.org/docs/getting-started

# Search indexed docs

bun run src/index.ts search "how to use server components"

```

## Usage

### CLI Commands

```bash

# Start the worker HTTP service (default: http://127.0.0.1:7432)

bun run src/index.ts worker

# Start the MCP server (stdio)

bun run src/index.ts mcp

# Index a documentation URL

bun run src/index.ts index <url>

# Search indexed documentation

bun run src/index.ts search <query>

# Show indexing status

bun run src/index.ts status

```

### Worker API

The worker service exposes these HTTP endpoints:

```

POST /index     - Index a documentation site

POST /retrieve  - Search for relevant snippets

GET  /status    - Get indexing status

DELETE /docset/:id - Delete a docset

GET  /health    - Health check

```

#### Index Request

```json

{

  "baseUrl": "https://nextjs.org",

  "seedSlug": "/docs/getting-started",

  "name": "Next.js Docs",

  "waitForSeed": true

}

```
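
The CLI takes a full URL, while the request body splits it into `baseUrl` and `seedSlug`. The sketch below shows one plausible way to derive the body from a URL and post it to the worker. The `IndexRequest` type and the `buildIndexRequest` and `requestIndex` helpers are hypothetical names used for illustration and are not part of mem-oracle. The response is left untyped because its exact shape is not documented here.

```typescript
// Hypothetical request type mirroring the JSON body shown above.
interface IndexRequest {
  baseUrl: string;
  seedSlug: string;
  name?: string;
  waitForSeed?: boolean;
}

// Split a full documentation URL into its origin and seed path.
// Assumption: baseUrl is the URL origin and seedSlug is the path.
function buildIndexRequest(docUrl: string, name?: string): IndexRequest {
  const url = new URL(docUrl);
  const request: IndexRequest = {
    baseUrl: url.origin,
    seedSlug: url.pathname,
    waitForSeed: true,
  };
  if (name !== undefined) request.name = name;
  return request;
}

// POST the request to the worker; 127.0.0.1:7432 is the documented default.
async function requestIndex(
  request: IndexRequest,
  workerUrl = "http://127.0.0.1:7432",
): Promise<unknown> {
  const response = await fetch(`${workerUrl}/index`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`Index request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With the worker running, `await requestIndex(buildIndexRequest("https://nextjs.org/docs/getting-started", "Next.js Docs"))` would send the same body as the example above.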

#### Retrieve Request

```json

{

  "query": "how to use server components",

  "topK": 5

}

```
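
A minimal sketch of calling `/retrieve` from TypeScript, assuming the default worker address. The type and function names are hypothetical. The response keys follow the `{results, query}` shape shown in the retrieval flow diagram, and the fields of each result are not documented here, so they stay `unknown`. The fetch function is injectable so the call can be exercised without a running worker.

```typescript
// Hypothetical types: the request mirrors the JSON above.
interface RetrieveRequest {
  query: string;
  topK?: number;
}

// Assumed from the retrieval flow diagram; result fields are undocumented.
interface RetrieveResponse {
  results: unknown[];
  query: string;
}

// Minimal fetch-like signature so a stub can stand in for the network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function retrieveSnippets(
  request: RetrieveRequest,
  fetchImpl: FetchLike = fetch,
  workerUrl = "http://127.0.0.1:7432",
): Promise<RetrieveResponse> {
  const response = await fetchImpl(`${workerUrl}/retrieve`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!response.ok) {
    throw new Error(`Retrieve request failed with HTTP ${response.status}`);
  }
  // The cast is an assumption; validate the payload before relying on it.
  return (await response.json()) as RetrieveResponse;
}
```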

### MCP Tools

When running as an MCP server, these tools are available:

- `search_docs` - Search indexed documentation

- `get_snippets` - Get specific documentation chunks

- `index_docs` - Index a documentation website

- `index_status` - Get indexing status

## Configuration

Configuration is stored in `~/.mem-oracle/config.json`:

```json

{

  "dataDir": "~/.mem-oracle",

  "embedding": {

    "provider": "local",

    "model": "all-MiniLM-L6-v2",

    "batchSize": 32

  },

  "vectorStore": {

    "provider": "local"

  },

  "worker": {

    "port": 7432,

    "host": "127.0.0.1"

  },

  "crawler": {

    "concurrency": 3,

    "requestDelay": 500,

    "timeout": 30000,

    "maxPages": 1000

  }

}

```
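
The embedding examples in this README show only the `embedding` section, which suggests that a partial file is merged over defaults. The sketch below illustrates that idea under two assumptions: that the values in the sample above are the defaults, and that merging happens one section at a time. `OracleConfig`, `ConfigOverrides`, `DEFAULT_CONFIG`, and `mergeConfig` are hypothetical names, not mem-oracle's own.

```typescript
// Hypothetical types mirroring the sample config.json above.
interface OracleConfig {
  dataDir: string;
  embedding: { provider: string; model: string; batchSize: number; apiKey?: string };
  vectorStore: { provider: string };
  worker: { port: number; host: string };
  crawler: { concurrency: number; requestDelay: number; timeout: number; maxPages: number };
}

// Every section, and every key inside a section, may be omitted.
type ConfigOverrides = {
  dataDir?: string;
  embedding?: Partial<OracleConfig["embedding"]>;
  vectorStore?: Partial<OracleConfig["vectorStore"]>;
  worker?: Partial<OracleConfig["worker"]>;
  crawler?: Partial<OracleConfig["crawler"]>;
};

// Assumption: the sample values above are the defaults.
const DEFAULT_CONFIG: OracleConfig = {
  dataDir: "~/.mem-oracle", // "~" is not expanded in this sketch
  embedding: { provider: "local", model: "all-MiniLM-L6-v2", batchSize: 32 },
  vectorStore: { provider: "local" },
  worker: { port: 7432, host: "127.0.0.1" },
  crawler: { concurrency: 3, requestDelay: 500, timeout: 30000, maxPages: 1000 },
};

// Merge each section separately so a partial section keeps its other defaults.
function mergeConfig(overrides: ConfigOverrides): OracleConfig {
  return {
    dataDir: overrides.dataDir ?? DEFAULT_CONFIG.dataDir,
    embedding: { ...DEFAULT_CONFIG.embedding, ...overrides.embedding },
    vectorStore: { ...DEFAULT_CONFIG.vectorStore, ...overrides.vectorStore },
    worker: { ...DEFAULT_CONFIG.worker, ...overrides.worker },
    crawler: { ...DEFAULT_CONFIG.crawler, ...overrides.crawler },
  };
}
```

Under these assumptions, `mergeConfig({ worker: { port: 8000 } })` changes only the port and leaves the host and every other section at its default.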

### Using API Embeddings

To use OpenAI embeddings:

```json

{

  "embedding": {

    "provider": "openai",

    "model": "text-embedding-3-small",

    "apiKey": "sk-..."

  }

}

```

Or Voyage AI:

```json

{

  "embedding": {

    "provider": "voyage",

    "model": "voyage-2",

    "apiKey": "..."

  }

}

```

## Claude Code Integration

### Install as Plugin

```bash

# In Claude Code terminal

> /plugin add jagjeevanak/mem-oracle

> /plugin install mem-oracle

```

Then restart Claude Code. The plugin will automatically:

- Check if the worker service is running on session start

- Retrieve relevant docs when you submit prompts

- Auto-index documentation URLs detected in your prompts

### Manual Setup

1. Start the worker service:

```bash

bun run worker

```

2. The plugin hooks in `.claude-plugin/hooks/` handle lifecycle events

### As MCP Server

Add to your Claude Code MCP configuration:

```json

{

  "mcpServers": {

    "mem-oracle": {

      "command": "bun",

      "args": ["run", "/path/to/mem-oracle/src/index.ts", "mcp"]

    }

  }

}

```

## Architecture

### System Overview

```mermaid

flowchart TB

    subgraph Client["Client Layer"]

        CC[Claude Code]

        CLI[CLI]

    end

    subgraph Integration["Integration Layer"]

        PH[Plugin Hooks]

        MCP[MCP Server]

    end

    subgraph Service["Service Layer"]

        WS["Worker Service<br/>:7432"]

        OR[Orchestrator]

    end

    subgraph Processing["Processing Pipeline"]

        FE[Fetcher]

        EX[Extractor]

        CH[Chunker]

        CR[Crawler]

    end

    subgraph Embedding["Embedding Layer"]

        direction LR

        LE[Local TF-IDF]

        OE[OpenAI]

        VE[Voyage]

        CE[Cohere]

    end

    subgraph Storage["Storage Layer"]

        SQL[("SQLite<br/>Metadata")]

        VS[(Vector Store)]

        CA[(Content Cache)]

    end

    CC --> PH

    CC -.-> MCP

    CLI --> WS

    PH --> WS

    MCP --> OR

    WS --> OR

    OR --> FE

    OR --> EX

    OR --> CH

    OR --> CR

    FE --> CA

    CR --> FE

    EX --> CH

    CH --> LE & OE & VE & CE

    LE & OE & VE & CE --> VS

    OR --> SQL

    OR --> VS

```

### Indexing Flow

```mermaid

sequenceDiagram

    participant U as User/Claude

    participant W as Worker

    participant O as Orchestrator

    participant F as Fetcher

    participant E as Extractor

    participant C as Chunker

    participant EM as Embedder

    participant DB as SQLite

    participant VS as VectorStore

    U->>W: POST /index {baseUrl, seedSlug}

    W->>O: indexDocset(input)

    O->>DB: createDocset()

    O->>DB: createPage(seedUrl)

    

    Note over O,VS: Seed Page Indexing (Synchronous)

    O->>F: fetch(seedUrl)

    F-->>O: HTML/MD content

    O->>E: extract(content)

    E-->>O: {title, text, links}

    O->>C: chunk(extractedContent)

    C-->>O: chunks[]

    O->>EM: embed(chunks)

    EM-->>O: vectors[]

    O->>VS: upsert(vectors)

    O->>DB: updatePage(indexed)

    O-->>W: docset

    W-->>U: {docsetId, status}

    Note over O,VS: Background Crawling (Async)

    loop For each discovered link

        O->>DB: getNextPendingPage()

        O->>F: fetch(pageUrl)

        O->>E: extract()

        O->>C: chunk()

        O->>EM: embed()

        O->>VS: upsert()

        O->>DB: updatePage(indexed)

    end

```
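
The seed-first behavior in the diagram can be sketched as follows: the seed page is indexed before the call returns, and discovered links are drained from a pending queue afterwards. This is an illustrative sketch, not mem-oracle's implementation. `CrawlQueue`, `indexDocset`, and the `indexPage` callback (standing in for fetch, extract, chunk, embed, and upsert) are hypothetical, an in-memory map replaces the SQLite page table, and the `concurrency` and `requestDelay` settings are ignored for brevity.

```typescript
type PageStatus = "pending" | "indexed" | "failed";

// In-memory stand-in for the page table kept in SQLite.
class CrawlQueue {
  private readonly pages = new Map<string, PageStatus>();
  private readonly origin: string;
  private readonly maxPages: number;

  constructor(baseUrl: string, maxPages: number) {
    this.origin = new URL(baseUrl).origin;
    this.maxPages = maxPages;
  }

  // Queue a link unless it is off-site, already known, or over the page cap.
  enqueue(link: string, from?: string): boolean {
    const url = new URL(link, from ?? this.origin);
    url.hash = ""; // fragments point at the same page
    if (url.origin !== this.origin) return false;
    if (this.pages.has(url.href)) return false;
    if (this.pages.size >= this.maxPages) return false;
    this.pages.set(url.href, "pending");
    return true;
  }

  // Counterpart of getNextPendingPage() in the diagram.
  nextPending(): string | undefined {
    for (const [url, status] of this.pages) {
      if (status === "pending") return url;
    }
    return undefined;
  }

  mark(url: string, status: PageStatus): void {
    this.pages.set(url, status);
  }
}

// indexPage processes one page and returns the links it discovered.
async function indexDocset(
  baseUrl: string,
  seedSlug: string,
  indexPage: (url: string) => Promise<string[]>,
  maxPages = 1000,
): Promise<{ seedUrl: string; background: Promise<void> }> {
  const queue = new CrawlQueue(baseUrl, maxPages);
  const seedUrl = new URL(seedSlug, baseUrl).href;
  queue.enqueue(seedUrl);

  const processPage = async (url: string): Promise<void> => {
    const links = await indexPage(url);
    queue.mark(url, "indexed");
    for (const link of links) queue.enqueue(link, url);
  };

  // The seed is indexed before returning, so it is searchable immediately.
  await processPage(seedUrl);

  // The rest is crawled without blocking the caller; failures are recorded
  // and skipped rather than aborting the crawl.
  const background = (async () => {
    for (let next = queue.nextPending(); next !== undefined; next = queue.nextPending()) {
      try {
        await processPage(next);
      } catch {
        queue.mark(next, "failed");
      }
    }
  })();

  return { seedUrl, background };
}
```

A caller can use the result as soon as `indexDocset` resolves and await `background` only when it needs the full crawl to finish.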

### Retrieval Flow

```mermaid

sequenceDiagram

    participant U as User/Claude

    participant W as Worker

    participant O as Orchestrator

    participant EM as Embedder

    participant VS as VectorStore

    participant DB as SQLite

    U->>W: POST /retrieve {query}

    W->>O: search(query)

    O->>EM: embedSingle(query)

    EM-->>O: queryVector

    

    O->>DB: listDocsets()

    DB-->>O: docsets[]

    

    loop For each docset

        O->>VS: search(namespace, queryVector, topK)

        VS-->>O: results[]

    end

    

    O->>O: sort & merge results

    O-->>W: SearchResult[]

    W-->>U: {results, query}

```
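
The per-docset search and the "sort & merge" step can be sketched as below. Cosine similarity is an assumption here, since this README does not state which metric the vector store uses. `StoredVector`, `SearchHit`, and the function names are hypothetical, and a `Map` of arrays stands in for the on-disk vector store.

```typescript
interface StoredVector {
  id: string;
  vector: number[];
}

interface SearchHit {
  docsetId: string;
  id: string;
  score: number;
}

// Cosine similarity; a shorter vector is treated as zero-padded.
function cosineSimilarity(a: number[], b: number[]): number {
  const length = Math.max(a.length, b.length);
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Score every vector in one docset namespace and keep the best topK.
function searchNamespace(
  docsetId: string,
  vectors: StoredVector[],
  queryVector: number[],
  topK: number,
): SearchHit[] {
  return vectors
    .map((v) => ({ docsetId, id: v.id, score: cosineSimilarity(v.vector, queryVector) }))
    .sort((left, right) => right.score - left.score)
    .slice(0, topK);
}

// Search each docset, then merge the partial results into one ranked list.
function searchAllDocsets(
  store: Map<string, StoredVector[]>,
  queryVector: number[],
  topK: number,
): SearchHit[] {
  const merged: SearchHit[] = [];
  for (const [docsetId, vectors] of store) {
    merged.push(...searchNamespace(docsetId, vectors, queryVector, topK));
  }
  return merged.sort((left, right) => right.score - left.score).slice(0, topK);
}
```

Taking `topK` from each docset before merging is enough to guarantee that the global top results are present in the merged list.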

### Data Flow

```mermaid

flowchart LR

    subgraph Input

        URL[Doc URL]

    end

    subgraph Fetch

        HTTP[HTTP Request]

        CACHE[Cache Check]

    end

    subgraph Extract

        HTML[HTML Parser]

        MD[MD Parser]

        READ[Readability]

    end

    subgraph Process

        CHUNK[Chunker]

        EMBED[Embedder]

    end

    subgraph Store

        META["Metadata<br/>SQLite"]

        VEC["Vectors<br/>JSON"]

        CONT["Content<br/>Cache"]

    end

    URL --> CACHE

    CACHE -->|miss| HTTP

    CACHE -->|hit| Extract

    HTTP --> CONT

    HTTP --> Extract

    HTML --> READ

    MD --> Extract

    READ --> CHUNK

    CHUNK --> EMBED

    EMBED --> VEC

    CHUNK --> META

```

## Development

```bash

# Run with hot reload

bun run dev

# Type check

bun run typecheck

# Run tests

bun test

```

## License

MIT
