https://github.com/fstamatelopoulos/cerefox

Personal knowledge base with hybrid search and read/write access for AI agents
ai-knowledge-base hybrid-search mcp pgvector python rag semantic-search supabase
Last synced: about 2 months ago
Host: GitHub
URL: https://github.com/fstamatelopoulos/cerefox
Owner: fstamatelopoulos
License: apache-2.0
Created: 2026-03-07T20:16:22.000Z (4 months ago)
Default Branch: main
Last Pushed: 2026-05-27T18:16:58.000Z (about 2 months ago)
Last Synced: 2026-05-27T18:24:21.475Z (about 2 months ago)
Topics: ai-knowledge-base, hybrid-search, mcp, pgvector, python, rag, semantic-search, supabase
Language: Python
Homepage:
Size: 3.15 MB
Stars: 9
Watchers: 1
Forks: 5
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: SECURITY.md

# Cerefox

**User-owned shared memory for AI agents.** A persistent, curated knowledge layer that multiple AI tools can read and write, backed by Postgres + pgvector.

[![Apache 2.0 License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](LICENSE)

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://python.org)

[![Node 20+](https://img.shields.io/badge/node-20+-green.svg)](https://nodejs.org)

---

## What is Cerefox?

Cerefox is a **user-owned knowledge memory layer**: a persistent, curated knowledge base that sits between you and the AI tools you use.

The primary use case is **shared memory across AI agents**: knowledge written by one tool (Claude, ChatGPT, Cursor, or a custom agent) becomes immediately available to all others. This prevents context fragmentation, so the same information doesn't have to be re-explained in every session.

Cerefox is **asynchronous shared memory, not a message bus**. It solves the persistent context problem: knowledge written in one context is findable in any other. A user curates project documents and an AI agent discovers them through search without being told they exist. An agent writes a decision during a coding session and a different agent, on a different machine, running a different model, finds it days later. A user switches from one AI tool to another and the accumulated knowledge carries over without manual transfer. The boundaries that Cerefox dissolves are between agents, between sessions, between human and machine, and across time.

> For the full project vision, principles, and roadmap direction, see [`docs/research/vision.md`](docs/research/vision.md).

- **Agent-first, not human-first**: AI agents are first-class citizens on both sides: they read *and* write; humans curate and validate

- **Own your data**: everything lives in a Postgres database you control (Supabase free tier or self-hosted)

- **Cross-agent coordination**: agents on separate machines and runtimes coordinate through persistent shared context (see `docs/guides/agent-coordination.md`)

- **Not a note-taking app**: Cerefox is knowledge *infrastructure*, not a replacement for Obsidian, Notion, or Bear; those tools handle authoring, Cerefox handles indexing and agent access

- **Hybrid search**: full-text + semantic search finds relevant knowledge even with fuzzy or conceptual queries

- **Any agent, anywhere**: remote MCP via Supabase Edge Functions; ChatGPT via Custom GPT + GPT Actions

- **Keep it cheap**: Supabase free tier + low-cost cloud embeddings; see `docs/guides/operational-cost.md`

---

## Features

| Feature | Details |
|---------|---------|
| **Hybrid search** | Combines full-text (BM25) + semantic (vector) search with a configurable alpha weight |
| **Metadata-filtered search** | JSONB containment filter (`@>`) on document metadata; server-side, GIN-indexed; composable with project filter and all search modes; available across all access paths (MCP, CLI, web UI, GPT Actions) |
| **Metadata search** | Standalone metadata-only search (no text query needed); find documents by key-value criteria, project, and date range; optional content inclusion with byte budget; dedicated MCP tool, CLI command, and web UI page |
| **Project discovery** | `cerefox_list_projects` MCP tool for agents to discover available projects; all search results include human-readable `project_names` alongside UUIDs |
| **Heading-aware chunking** | Greedy section accumulation: H1/H2/H3 sections accumulate until `MAX_CHUNK_CHARS`; heading breadcrumb preserved per chunk |
| **Cloud embeddings** | OpenAI `text-embedding-3-small` (768-dim) via API, or swap to Fireworks AI |
| **Remote MCP endpoint** | `cerefox-mcp` Supabase Edge Function (MCP Streamable HTTP); connect Claude Desktop, Claude Code, or Cursor with just a URL and anon key; no Python install needed |
| **Local MCP server** | `cerefox mcp` stdio server: local alternative with zero Edge Function usage, lower latency, and offline support; requires Python + uv + local clone |
| **Web UI** | React + TypeScript SPA (Mantine UI) at `/app/`; FastAPI JSON API backend; Markdown viewer, search with 4 modes, document editing, project management |
| **Multi-format ingest** | `.md`, `.txt`, `.pdf` (pypdf), `.docx` (python-docx) |
| **Batch ingest** | `cerefox ingest-dir` recurses directories |
| **Deduplication** | SHA-256 content hash; re-ingesting the same file is a no-op |
| **Backup and restore** | JSON snapshots, optional git commit |
| **Small-to-big retrieval** | `cerefox_context_expand` RPC returns chunk neighbours for richer context |
| **Audit log** | Immutable, append-only log of all write operations (create, update, delete, status change). Author attribution with `author_type` ('user' or 'agent'). Browsable via web UI, queryable via MCP tool and Edge Function |
| **Review status** | Schema-level `review_status` on documents (`approved` / `pending_review`). Auto-transitions based on `author_type`. Filterable on search |
| **Version governance** | Version archival (protect specific versions from cleanup), configurable retention (`CEREFOX_VERSION_CLEANUP_ENABLED`), version diff viewer |
| **Usage tracking** | Opt-in logging of all operations (reads and writes) across all access paths. Tracks operation type, access path (remote-mcp, local-mcp, edge-function, webapp, cli), requestor identity, query text, and result count. Controlled via `cerefox config-set usage_tracking_enabled true/false` (no redeploy needed) |
| **Analytics dashboard** | `/app/analytics`: 7 interactive charts (calls per day, access path breakdown, top documents, top readers, operations donut, reader word cloud, and reader-to-document access pattern visualization (HEB)). Date range + project + path filters. CSV export. |
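
The "configurable alpha weight" in the hybrid search row can be pictured as a linear blend of two per-document scores. The sketch below is a generic illustration, not Cerefox's actual SQL: the names `normalize` and `blend_scores`, the min-max normalisation, and the convention that `alpha` weights the semantic side are all assumptions made for the example.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores into [0, 1]; a constant set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def blend_scores(
    fts: dict[str, float],
    semantic: dict[str, float],
    alpha: float = 0.7,  # hypothetical default: 1.0 = semantic only, 0.0 = full-text only
) -> list[tuple[str, float]]:
    """Rank documents by alpha * semantic + (1 - alpha) * full-text."""
    fts_n, sem_n = normalize(fts), normalize(semantic)
    combined = {
        doc_id: alpha * sem_n.get(doc_id, 0.0) + (1 - alpha) * fts_n.get(doc_id, 0.0)
        for doc_id in set(fts_n) | set(sem_n)
    }
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

A document found by only one of the two searches still gets a score, which is why a conceptual query with no keyword overlap can surface in the blended ranking.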

---

## Project status

Cerefox is a single-maintainer open-source project, currently at **v0.5.0** and in its **"Polish & Distribution" arc**: the work that takes it from "runnable from a git clone" to "installable like any other modern CLI". Highlights of what has already shipped (full history in [`CHANGELOG.md`](CHANGELOG.md)):

- A complete Cerefox feature surface: hybrid search, metadata-filtered search, small-to-big retrieval, implicit versioning with a per-document audit log, soft-delete with a trash bin, multi-project membership.

- Three integration paths for AI agents: local stdio MCP, remote MCP via Supabase Edge Functions, and a Custom GPT via GPT Actions, plus a CLI fallback for local coding agents.

- A React + Mantine web UI at `/app/` with full read/write coverage of the knowledge base.

**Where the project is headed** is captured in [`docs/specs/polish-and-distribution-design.md`](docs/specs/polish-and-distribution-design.md) (also tracked iteration-by-iteration in [`docs/plan.md`](docs/plan.md)):

| Release | Theme | Ships |
|---|---|---|
| v0.2.0 | Foundations + first TS artifact | `VERSION` source-of-truth · OSS hygiene files · SemVer + script-language policies · `scripts/cut_release.ts` (first TS script outside Edge Functions and frontend) |
| v0.3.0 | "Install anywhere" | `~/.cerefox/` user-state root · `cerefox docs` CLI + `/app/help` web UI · schema-version-mismatch banner · first two Python scripts ported to TS (`sync_docs.ts`, `db_status.ts`) · `_shared/` TS module seeded |
| v0.4.x | TS MCP server | Local `cerefox mcp` becomes a TypeScript Bun/Node process, published as [`@cerefox/memory`](https://www.npmjs.com/package/@cerefox/memory) on npm · 10th MCP tool `cerefox_get_help` · `_shared/mcp-tools/` shared by remote EF + local server · OIDC trusted publishing |
| **v0.5.0** (current) | TS CLI | `cerefox` binary added to `@cerefox/memory` (same package, growing surface); callable from any directory, no Python install needed · 6 new lifecycle commands (`init`, `doctor`, `status`, `configure-agent`, `self-update`, `sync-self-docs`) · automatic self-doc ingest (Layer 2 of MCP discoverability) · tab completion for bash/zsh/fish · documented exit codes · Python CLI deprecated (functional through v0.7) |
| v0.6.0 – v0.7.0 | TS web server + ingestion pipeline | FastAPI → Hono · Python ingestion → TS · all inside `@cerefox/memory` (single npm package, growing surface) |
| v0.8.0 – v0.9.0 | Python retirement | Deprecation banners → removal |
| **v1.0.0** | Stability commitment | Strict SemVer becomes binding; long-lived API contract |

Until v1.0.0 the SemVer policy in [`CONTRIBUTING.md`](CONTRIBUTING.md) is aspirational: breaking changes can land in minor versions when there's a good reason. After v1.0.0 it's binding. **The npm install path is now open** as of v0.4.0: end users can run `npx -y --package=@cerefox/memory cerefox mcp` for the local MCP server (no Python required). The Python CLI + web UI + ingestion pipeline still need a clone + `uv` install; that changes through v0.5–v0.7 as the remaining components migrate. (v0.4–v0.5.0 also shipped a dedicated `cerefox-mcp` bin; dropped in v0.5.1 as redundant with `cerefox mcp`.)

---

## Getting Started

> **Full walkthrough**: `docs/guides/quickstart.md` (zero to first ingested document and connected agent in 15 minutes).
>
> **Upgrading from v0.4.x?** See [`docs/guides/migration-v0.5.md`](docs/guides/migration-v0.5.md): your existing MCP configs keep working, and the new `cerefox` CLI is opt-in.

### Quickstart (npm path — recommended as of v0.5.0)

```bash
# One-line install (detects Bun or installs it, falls back to npm):
curl -fsSL https://github.com/fstamatelopoulos/cerefox/releases/latest/download/install.sh | sh

# Or direct:
npm install -g @cerefox/memory      # Node ≥ 20
# bun install -g @cerefox/memory    # alternative, faster

cerefox init                         # interactive 5-step setup
cerefox doctor                       # verify the install
cerefox configure-agent --tool claude-code   # wire up Claude Code
```

That's the path for end users who don't need to hack on Cerefox itself. For the schema deploy + web UI + ingestion pipeline, see the "Building from source" section below (Python is still the path for those in v0.5).

### Prerequisites for the npm install path

| Tool | Why | Install |
|---|---|---|
| **Node 20+** or **Bun 1.0+** | Runtime for the `cerefox` bin (includes `cerefox mcp` subcommand for MCP clients) | [nodejs.org](https://nodejs.org/) · [bun.sh](https://bun.sh) |
| A Supabase account | Database + pgvector + Edge Functions (free tier is enough) | [supabase.com](https://supabase.com/) |
| An embedding API key | OpenAI `text-embedding-3-small` (default) or Fireworks AI | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |

### Building from source / Contributors

For the schema deploy, the web UI, the ingestion pipeline, or contributing to Cerefox itself:

| Tool | Why | Install |
|---|---|---|
| **Python 3.11+** with [`uv`](https://docs.astral.sh/uv/) | Schema deploy (`scripts/db_deploy.py`), web server (until v0.6), ingestion pipeline (until v0.7) | `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| **Node 20+** with `npm` | One-time `npm install && npm run build` to produce the React SPA bundle the web UI serves | [nodejs.org](https://nodejs.org/) or `nvm install 20` |
| **[Bun](https://bun.sh) 1.x** | TypeScript scripts (`scripts/*.ts`), `_shared/` tests, `@cerefox/memory` build | `curl -fsSL https://bun.sh/install \| bash` |
| A Supabase account + embedding API key | Same as above | (same links) |

Full contributor setup in [CONTRIBUTING.md](CONTRIBUTING.md).

### 1. Clone and install

```bash
git clone https://github.com/fstamatelopoulos/cerefox.git
cd cerefox
uv sync
```

### 2. Set up Supabase (free)

1. Sign up at [supabase.com](https://supabase.com) — a GitHub login works fine.

2. Create a new project. Give it a name (e.g. `cerefox`) and set a database password (store it somewhere safe — you'll need it once).

3. On the project creation screen leave the defaults:

   - **Enable Data API** ✅ — required (the Python client uses this)

   - **Enable automatic RLS** — leave unchecked (single-user app, not needed)

### 3. Configure `.env`

```bash

cp .env.example .env

```

Open `.env` and fill in these values:

| Variable | Where to find it |
|---|---|
| `CEREFOX_SUPABASE_URL` | Supabase → Project Settings → API → Project URL |
| `CEREFOX_SUPABASE_KEY` | Supabase → Project Settings → API Keys → **Secret key** (`sb_secret_…`). Legacy `service_role` JWT also works. |
| `CEREFOX_DATABASE_URL` | Supabase → Project Settings → Database → **Connection pooling → Session Pooler** (port `5432`). See notes below. |
| `OPENAI_API_KEY` | [platform.openai.com/api-keys](https://platform.openai.com/api-keys) |
| `CEREFOX_SUPABASE_ANON_KEY` (only for Edge Functions / MCP / GPT Actions) | Supabase → Project Settings → API Keys → **Legacy → anon** (JWT, `eyJ…`). The new `sb_publishable_…` does **not** work for Edge Function Bearer auth. See [`docs/guides/setup-supabase.md` → Supabase API keys (2026)](docs/guides/setup-supabase.md#supabase-api-keys-2026). |

**`CEREFOX_DATABASE_URL` notes:**

- Use the **Session Pooler** (port `5432`), not the Transaction Pooler (`6543`, no DDL) or the Direct Connection (IPv6-only on free tier).

- The Session Pooler may not be a first-class option in the new "Connect" dialog; either find it under **Connection pooling**, or take the Transaction Pooler URI and change `:6543` → `:5432`.

- The username must include your project ref: `postgres.your-project-ref` — not just `postgres`. Without the suffix Supabase returns "Tenant or user not found".

- Append `?sslmode=require` to enforce TLS.

- Full reference: [`docs/guides/setup-supabase.md` → Connection pooling (2026)](docs/guides/setup-supabase.md#connection-pooling-2026).
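
The notes above amount to three checks on the connection string, which can be sketched as a small pre-flight validator. This is an illustrative helper, not part of Cerefox: the function name `check_database_url` and the host `pooler.example.com` used in the usage line are placeholders.

```python
from urllib.parse import parse_qs, urlsplit


def check_database_url(url: str) -> list[str]:
    """Return a list of problems; an empty list means the URL matches the notes."""
    parts = urlsplit(url)
    problems = []
    # Session Pooler listens on 5432; 6543 is the Transaction Pooler (no DDL).
    if parts.port != 5432:
        problems.append("port should be 5432 (Session Pooler)")
    # The pooler needs the project ref as a suffix on the username.
    user = parts.username or ""
    if not user.startswith("postgres.") or user == "postgres.":
        problems.append("username should be postgres.<project-ref>")
    # TLS is enforced through the sslmode query parameter.
    if parse_qs(parts.query).get("sslmode") != ["require"]:
        problems.append("append ?sslmode=require")
    return problems
```

For example, `check_database_url("postgresql://postgres:pw@pooler.example.com:6543/postgres")` reports all three problems, while a URL with `postgres.your-project-ref`, port `5432`, and `?sslmode=require` returns an empty list.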

### 4. Deploy the schema

```bash

uv run python scripts/db_deploy.py

```

### 5. Deploy the Edge Functions

Edge Functions handle server-side embedding so AI agents never need a local model. Requires the [Supabase CLI](https://supabase.com/docs/guides/cli).

```bash
npx supabase functions deploy cerefox-search
npx supabase functions deploy cerefox-ingest
npx supabase functions deploy cerefox-mcp
```

Set your OpenAI key as a Supabase secret (used by the functions at runtime):

```bash

npx supabase secrets set OPENAI_API_KEY=sk-...your-key...

```

### 6. Build the web UI

The web UI is a React + Vite SPA. Build it once now (and re-run after any frontend change):

```bash
cd frontend
npm install
npm run build
cd ..
```

This produces `frontend/dist/`, which `uv run cerefox web` serves at `/app/`. Skipping this step is the most common "the web UI returns 404" cause.

### 7. Ingest a document and open the web UI

```bash
uv run cerefox ingest my-notes.md --title "My notes"
uv run cerefox web                # → http://localhost:8000
```

**Optional**: ingest the Cerefox docs themselves so AI agents can look up project details:

```bash
# Create a "cerefox" project first, then sync README + all docs/ into it.
uv run cerefox create-project cerefox
uv run python scripts/sync_docs.py
```

Re-run `sync_docs.py` any time after updating documentation to keep the knowledge base current.

**Try with sample data**: the `test-data/` directory contains six diverse markdown documents you can ingest to experiment with search before adding your own content:

```bash

uv run cerefox ingest-dir test-data/ --recursive

```
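
Running an ingest command twice on the same files is safe because of the SHA-256 deduplication listed under Features. The sketch below shows the idea with an in-memory store; the class `TinyStore` and its `ingest` method are hypothetical stand-ins, and the real check presumably happens against the `content_hash` column rather than a Python dict.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hex SHA-256 digest of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class TinyStore:
    """In-memory stand-in for the documents table, keyed by content hash."""

    def __init__(self) -> None:
        self.by_hash: dict[str, str] = {}

    def ingest(self, title: str, text: str) -> bool:
        """Store the document; return False when identical content already exists."""
        digest = content_hash(text)
        if digest in self.by_hash:
            return False  # same bytes seen before: nothing to do
        self.by_hash[digest] = title
        return True
```

Because the hash covers the content rather than the filename, renaming a file does not defeat the check, while changing a single character produces a new digest and a fresh ingest.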

---

## Architecture

```
cerefox_documents     cerefox_chunks
─────────────────     ───────────────────────────────
id, title, source     id, document_id, chunk_index
content_hash          heading_path, heading_level
project_id            content, char_count
metadata (JSONB)      embedding_primary (VECTOR 768)
chunk_count           fts (TSVECTOR, generated)
```
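
The `heading_path` and `chunk_index` columns come from the heading-aware chunking described under Features. The following is a minimal sketch of greedy section accumulation under stated assumptions: the `MAX_CHUNK_CHARS` value, the ` > ` breadcrumb separator, and the function names are illustrative and not taken from the Cerefox source.

```python
import re

MAX_CHUNK_CHARS = 1000  # illustrative value; the project's real default is not stated here
HEADING = re.compile(r"^(#{1,3})\s+(.+)$")  # H1, H2, H3 only


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown into (breadcrumb, text) pairs, one per H1/H2/H3 section."""
    sections, trail, crumb, buf = [], [], "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            if "".join(buf).strip():
                sections.append((crumb, "\n".join(buf)))
            level = len(match.group(1))
            # Keep ancestors above this level, then append the new heading.
            trail = trail[: level - 1] + [match.group(2).strip()]
            crumb, buf = " > ".join(trail), [line]
        else:
            buf.append(line)
    if "".join(buf).strip():
        sections.append((crumb, "\n".join(buf)))
    return sections


def chunk_markdown(markdown: str, max_chars: int = MAX_CHUNK_CHARS) -> list[dict]:
    """Greedily merge consecutive sections until the next one would overflow."""
    chunks, crumb, text = [], "", ""
    for section_crumb, section_text in split_sections(markdown):
        if text and len(text) + 1 + len(section_text) > max_chars:
            chunks.append({"chunk_index": len(chunks), "heading_path": crumb, "content": text})
            text = ""
        if text:
            text += "\n" + section_text
        else:
            # A chunk carries the breadcrumb of its first section.
            crumb, text = section_crumb, section_text
    if text:
        chunks.append({"chunk_index": len(chunks), "heading_path": crumb, "content": text})
    return chunks
```

In this sketch a single section longer than `max_chars` becomes one oversized chunk; a production chunker would need a further split for that case.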

Search RPCs (MCP tools): `cerefox_hybrid_search`, `cerefox_fts_search`, `cerefox_semantic_search`, `cerefox_search_docs`, `cerefox_reconstruct_doc`, `cerefox_context_expand`, `cerefox_save_note`
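
Small-to-big retrieval, as offered by `cerefox_context_expand`, means matching on a small chunk and then returning its neighbours for context. The sketch below illustrates the idea on an in-memory list; the Python function and its `window` parameter are hypothetical, since the real RPC's signature is not shown here.

```python
def context_expand(chunks: list[dict], hit_index: int, window: int = 1) -> list[dict]:
    """Return the matched chunk plus up to `window` neighbours on each side."""
    ordered = sorted(chunks, key=lambda c: c["chunk_index"])
    positions = {c["chunk_index"]: pos for pos, c in enumerate(ordered)}
    pos = positions[hit_index]
    start = max(0, pos - window)  # clamp at the start of the document
    return ordered[start : pos + window + 1]  # slicing clamps at the end


# Usage: a hit on chunk 2 of a five-chunk document returns chunks 1, 2 and 3.
doc_chunks = [{"chunk_index": i, "content": f"chunk {i}"} for i in range(5)]
expanded = context_expand(doc_chunks, hit_index=2)
```

Searching over small chunks keeps matches precise, while the expansion step restores enough surrounding text for an agent to interpret the hit.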

---

## Connecting AI agents

**Option 1 — Remote MCP (recommended)** — just a URL, a legacy anon JWT (Supabase → Project Settings → API Keys → **Legacy → anon**, not the new `sb_publishable_…` key — see [setup-supabase.md](docs/guides/setup-supabase.md#supabase-api-keys-2026)), and `npx`:

The `cerefox-mcp` Supabase Edge Function speaks MCP Streamable HTTP. No Python, no local repo clone; it works from any machine with Node.js installed.

```bash
# Claude Code (native HTTP transport)
claude mcp add --transport http cerefox \
  https://<your-project-ref>.supabase.co/functions/v1/cerefox-mcp \
  --header "Authorization: Bearer <your-anon-key>"
```

For Claude Desktop, use [`supergateway`](https://www.npmjs.com/package/supergateway) as a stdio-to-HTTP bridge in `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "cerefox": {
      "command": "npx",
      "args": [
        "-y", "supergateway",
        "--streamableHttp", "https://<your-project-ref>.supabase.co/functions/v1/cerefox-mcp",
        "--header", "Authorization: Bearer <your-anon-key>"
      ]
    }
  }
}
```

For Cursor, use `url` + `headers.Authorization` in `mcp.json`.
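
As a rough illustration of the `url` + `headers.Authorization` shape, the helper below prints a candidate `mcp.json`. It assumes Cursor uses the same `mcpServers` wrapper as the Claude Desktop example and that the Edge Function URL is built from the Supabase project ref; the function name `cursor_mcp_config` is hypothetical, so check `docs/guides/connect-agents.md` for the authoritative layout.

```python
import json


def cursor_mcp_config(project_ref: str, anon_key: str) -> str:
    """Render an mcp.json body pointing Cursor at the remote cerefox-mcp endpoint."""
    config = {
        "mcpServers": {  # assumed wrapper key, mirroring the Claude Desktop config
            "cerefox": {
                "url": f"https://{project_ref}.supabase.co/functions/v1/cerefox-mcp",
                "headers": {"Authorization": f"Bearer {anon_key}"},
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder values; substitute your own project ref and legacy anon JWT.
    print(cursor_mcp_config("your-project-ref", "your-anon-key"))
```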

**Option 2 — ChatGPT (web + desktop)** via Custom GPT + GPT Actions (requires ChatGPT Plus):

Create a Custom GPT and add an Action pointing at the Supabase Edge Functions: no local install, no MCP config, works from both ChatGPT web and desktop. Uses the Supabase anon key as Bearer auth.

**Option 3 — Local stdio MCP (legacy fallback)** — requires Python + uv + local repo clone:

```json
{
  "mcpServers": {
    "cerefox": {
      "command": "uv",
      "args": ["--directory", "/path/to/cerefox", "run", "cerefox", "mcp"]
    }
  }
}
```

**Option 4 — Shell CLI for local coding agents** — no MCP setup at all:

Modern local coding agents (Claude Code, OpenAI Codex CLI, opencode, OpenClaw, Hermes, …) have a Bash tool. If you've already got Cerefox checked out and your `.env` configured for the CLI, you can simply point the agent at the repo path in its system prompt / project memory, and tell it to read `AGENT_GUIDE.md`. The agent reads and writes Cerefox by running `uv run cerefox …`. No `.mcp.json`, no `claude mcp add`, no Claude Desktop edit. Useful when you want one Cerefox checkout to serve any number of local agents in the same project with zero per-agent configuration.

Full setup for all options: `docs/guides/connect-agents.md`

---

## Documentation

| Guide | Description |
|-------|-------------|
| `docs/guides/quickstart.md` | Zero to first document in 15 minutes |
| `docs/guides/setup-supabase.md` | Supabase project setup |
| `docs/guides/configuration.md` | All configuration options |
| `docs/guides/connect-agents.md` | MCP agent integration |
| `docs/guides/cli.md` | Complete CLI reference (all `cerefox` subcommands) |
| `docs/guides/agent-coordination.md` | Multi-agent coordination patterns and best practices |
| `docs/guides/response-limits.md` | Response size limits: per-path behaviour and tuning |
| `docs/guides/access-paths.md` | All access layers, credentials, and integration paths |
| `docs/guides/setup-local.md` | Local Docker setup |
| `docs/guides/ops-scripts.md` | Backup, restore, migrate, sync docs |
| `docs/guides/setup-cloud-run.md` | Google Cloud Run deployment |
| `docs/guides/operational-cost.md` | Cost breakdown for all deployment options |
| `docs/guides/upgrading.md` | Standard upgrade checklist, version-specific notes |
| `CONTRIBUTING.md` | How to contribute to Cerefox |

---

## License

Apache 2.0 — see LICENSE.