{"id":45498302,"url":"https://github.com/nicholasglazer/gnosis-mcp","last_synced_at":"2026-04-10T14:01:58.693Z","repository":{"id":338554583,"uuid":"1158255214","full_name":"nicholasglazer/gnosis-mcp","owner":"nicholasglazer","description":"Serve your PostgreSQL docs to AI agents over MCP. 7 tools, 3 resources, 2 deps.","archived":false,"fork":false,"pushed_at":"2026-02-15T06:12:46.000Z","size":74,"stargazers_count":0,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-15T11:46:04.674Z","etag":null,"topics":["ai","asyncpg","developer-tools","documentation","knowledge-base","llm","mcp","mcp-server","model-context-protocol","pgvector","postgresql","python","rag","search","self-hosted","vector-search"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/gnosis-mcp/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nicholasglazer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"nicholasglazer"}},"created_at":"2026-02-15T03:33:57.000Z","updated_at":"2026-02-15T06:12:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/nicholasglazer/gnosis-mcp","commit_stats":null,"previous_names":["nicholasglazer/gnosis-mcp"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/nicholasglazer/gnosis-mcp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories
/nicholasglazer%2Fgnosis-mcp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholasglazer%2Fgnosis-mcp/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholasglazer%2Fgnosis-mcp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholasglazer%2Fgnosis-mcp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nicholasglazer","download_url":"https://codeload.github.com/nicholasglazer/gnosis-mcp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nicholasglazer%2Fgnosis-mcp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29721043,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-22T15:10:41.462Z","status":"ssl_error","status_checked_at":"2026-02-22T15:10:04.636Z","response_time":110,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","asyncpg","developer-tools","documentation","knowledge-base","llm","mcp","mcp-server","model-context-protocol","pgvector","postgresql","python","rag","search","self-hosted","vector-search"],"created_at":"2026-02-22T18:00:39.060Z","updated_at":"2026-04-10T14:01:58.673Z","avatar_url":"https://github.com/nicholasglazer.png","language":"Python","funding_links":["https://github.com/sponsors/nicholasglazer"],"categories":[],"sub_categories":[],"readme":"\u003c!-- mcp-name: io.github.nicholasglazer/gnosis --\u003e\n\u003cdiv align=\"center\"\u003e\n\n\u003ch1\u003eGnosis MCP\u003c/h1\u003e\n\n\u003cp\u003e\u003cstrong\u003eTurn your docs into a searchable knowledge base for AI agents.\u003cbr\u003epip install, ingest, serve.\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp\u003e\n  \u003ca href=\"https://pypi.org/project/gnosis-mcp/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/v/gnosis-mcp?color=blue\" alt=\"PyPI\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/gnosis-mcp/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/dm/gnosis-mcp?color=green\" alt=\"Downloads\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://pypi.org/project/gnosis-mcp/\"\u003e\u003cimg src=\"https://img.shields.io/pypi/pyversions/gnosis-mcp\" alt=\"Python\"\u003e\u003c/a\u003e\n  \u003ca href=\"LICENSE\"\u003e\u003cimg src=\"https://img.shields.io/badge/license-MIT-green\" alt=\"MIT License\"\u003e\u003c/a\u003e\n  \u003ca href=\"https://github.com/nicholasglazer/gnosis-mcp/actions\"\u003e\u003cimg 
src=\"https://github.com/nicholasglazer/gnosis-mcp/actions/workflows/publish.yml/badge.svg\" alt=\"CI\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\n  \u003ca href=\"#quick-start\"\u003eQuick Start\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#git-history\"\u003eGit History\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#web-crawl\"\u003eWeb Crawl\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#backends\"\u003eBackends\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#editor-integrations\"\u003eEditors\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#tools--resources\"\u003eTools\u003c/a\u003e \u0026middot;\n  \u003ca href=\"#embeddings\"\u003eEmbeddings\u003c/a\u003e \u0026middot;\n  \u003ca href=\"llms-full.txt\"\u003eFull Reference\u003c/a\u003e\n\u003c/p\u003e\n\n\u003ca href=\"#quick-start\"\u003e\u003cimg src=\"https://raw.githubusercontent.com/nicholasglazer/gnosis-mcp/main/demo-hero.gif\" alt=\"Gnosis MCP — ingest docs, search, view stats, serve\" width=\"700\"\u003e\u003c/a\u003e\n\u003cbr\u003e\n\u003csub\u003eIngest docs \u0026rarr; Search with highlights \u0026rarr; Stats overview \u0026rarr; Serve to AI agents\u003c/sub\u003e\n\n\u003c/div\u003e\n\n---\n\n### Without a docs server\n\n- LLMs hallucinate API signatures that don't exist\n- Entire files dumped into context — 3,000 to 8,000+ tokens each\n- Architecture decisions buried across dozens of files\n\n### With Gnosis MCP\n\n- `search_docs` returns ranked, highlighted excerpts (~600 tokens)\n- Real answers grounded in your actual documentation\n- Works across hundreds of docs instantly\n\n---\n\n## How gnosis-mcp compares\n\n| Feature | gnosis-mcp | Context7 | Grounded Docs | mcp-local-rag |\n|---------|:---------:|:-------:|:------------:|:------------:|\n| **Your own docs** | Yes | No (public libs only) | Yes | Yes |\n| **Zero config** (pip + 2 commands) | Yes | Yes | Yes | Yes |\n| **Local embeddings** (no API key) | ONNX | No | Requires provider | Yes |\n| **Hybrid search** (keyword + 
semantic) | FTS5/tsvector + vector | No | Optional | Yes |\n| **PostgreSQL backend** | pgvector + HNSW | No | No | No |\n| **Web crawling** | Built-in | No | Yes | No |\n| **Git history indexing** | Yes | No | No | No |\n| **File watching** (auto re-ingest) | Yes | No | No | No |\n| **REST API** | Yes | No | No | No |\n| **Write tools** (upsert/delete) | Yes | No | No | No |\n| **Link graph** (get_related) | Yes | No | No | No |\n| **Smart chunking** (heading-aware) | Yes | N/A | Yes | Yes |\n| **Content hashing** (skip unchanged) | Yes | N/A | No | No |\n| **llms.txt** | Yes | No | No | No |\n| **Test count** | 599+ | Unknown | Unknown | Unknown |\n| **Dependencies** | 2 (mcp + aiosqlite) | npm ecosystem | npm ecosystem | npm ecosystem |\n\n**TL;DR**: Context7 indexes *public library docs*. gnosis-mcp indexes *your own private docs*. They're complementary — use both.\n\n---\n\n## Features\n\n- **Zero config** — SQLite by default, `pip install` and go\n- **Hybrid search** — keyword (BM25) + semantic (local ONNX embeddings, no API key)\n- **Git history** — ingest commit messages as searchable context (`ingest-git`)\n- **Web crawl** — ingest documentation from any website via sitemap or link crawl\n- **Multi-format** — `.md` `.txt` `.ipynb` `.toml` `.csv` `.json` + optional `.rst` `.pdf`\n- **Auto-linking** — `relates_to` frontmatter creates a navigable document graph\n- **Watch mode** — auto-re-ingest on file changes\n- **PostgreSQL ready** — pgvector + tsvector when you need scale\n\n## Quick Start\n\n```bash\npip install gnosis-mcp\ngnosis-mcp ingest ./docs/       # loads docs into SQLite (auto-created)\ngnosis-mcp serve                # starts MCP server\n```\n\nThat's it. 
Your AI agent can now search your docs.\n\n**Want semantic search?** Add local embeddings — no API key needed:\n\n```bash\npip install gnosis-mcp[embeddings]\ngnosis-mcp ingest ./docs/ --embed   # ingest + embed in one step\ngnosis-mcp serve                    # hybrid search auto-activated\n```\n\nTest it before connecting to an editor:\n\n```bash\ngnosis-mcp search \"getting started\"           # keyword search\ngnosis-mcp search \"how does auth work\" --embed # hybrid semantic+keyword\ngnosis-mcp stats                               # see what was indexed\n```\n\n\u003cdetails\u003e\n\u003csummary\u003eTry without installing (uvx)\u003c/summary\u003e\n\n```bash\nuvx gnosis-mcp ingest ./docs/\nuvx gnosis-mcp serve\n```\n\n\u003c/details\u003e\n\n## Web Crawl\n\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"https://raw.githubusercontent.com/nicholasglazer/gnosis-mcp/main/demo-crawl.gif\" alt=\"Gnosis MCP — crawl docs with dry-run, fetch, search, SSRF protection\" width=\"700\"\u003e\n\u003cbr\u003e\n\u003csub\u003eDry-run discovery \u0026rarr; Crawl \u0026amp; ingest \u0026rarr; Search crawled docs \u0026rarr; SSRF protection\u003c/sub\u003e\n\u003c/div\u003e\n\n\u003cbr\u003e\n\nIngest docs from any website — no local files needed:\n\n```bash\npip install gnosis-mcp[web]\n\n# Crawl via sitemap (best for large doc sites)\ngnosis-mcp crawl https://docs.stripe.com/ --sitemap\n\n# Depth-limited link crawl with URL filter\ngnosis-mcp crawl https://fastapi.tiangolo.com/ --depth 2 --include \"/tutorial/*\"\n\n# Preview what would be crawled\ngnosis-mcp crawl https://docs.python.org/ --dry-run\n\n# Force re-crawl + embed for semantic search\ngnosis-mcp crawl https://docs.sveltekit.dev/ --sitemap --force --embed\n```\n\nRespects `robots.txt`, caches with ETag/Last-Modified for incremental re-crawl, and rate-limits requests (5 concurrent, 0.2s delay). 
Crawled pages use the URL as the document path and hostname as the category — searchable like any other doc.\n\n## Git History\n\nTurn commit messages into searchable context — your agent learns *why* things were built, not just *what* exists:\n\n```bash\ngnosis-mcp ingest-git .                                  # current repo, all files\ngnosis-mcp ingest-git /path/to/repo --since 6m           # last 6 months only\ngnosis-mcp ingest-git . --include \"src/*\" --max-commits 5 # filtered + limited\ngnosis-mcp ingest-git . --dry-run                         # preview without ingesting\ngnosis-mcp ingest-git . --embed                           # embed for semantic search\n```\n\nEach file's commit history becomes a searchable markdown document stored as `git-history/\u003cfile-path\u003e`. The agent finds it via `search_docs` like any other doc — no new tools needed. Incremental re-ingest skips files with unchanged history.\n\n## Editor Integrations\n\nAdd the server config to your editor — your AI agent gets `search_docs`, `get_doc`, and `get_related` tools automatically:\n\n```json\n{\n  \"mcpServers\": {\n    \"docs\": {\n      \"command\": \"gnosis-mcp\",\n      \"args\": [\"serve\"]\n    }\n  }\n}\n```\n\n| Editor | Config file |\n|--------|------------|\n| **Claude Code** | `.claude/mcp.json` (or [install as plugin](#claude-code-plugin)) |\n| **Cursor** | `.cursor/mcp.json` |\n| **Windsurf** | `~/.codeium/windsurf/mcp_config.json` |\n| **JetBrains** | Settings \u003e Tools \u003e AI Assistant \u003e MCP Servers |\n| **Cline** | Cline MCP settings panel |\n\n\u003cdetails\u003e\n\u003csummary\u003eVS Code (GitHub Copilot) — slightly different key\u003c/summary\u003e\n\nAdd to `.vscode/mcp.json` (note: `\"servers\"` not `\"mcpServers\"`):\n\n```json\n{\n  \"servers\": {\n    \"docs\": {\n      \"command\": \"gnosis-mcp\",\n      \"args\": [\"serve\"]\n    }\n  }\n}\n```\n\nAlso discoverable via the VS Code MCP gallery — search `@mcp gnosis` in the Extensions 
view.\n\n\u003c/details\u003e\n\n## Transport: Stdio vs HTTP\n\nGnosis supports two MCP transports. Which one you pick changes how sessions connect:\n\n| | Stdio (default) | Streamable HTTP |\n|---|---|---|\n| **Start** | `gnosis-mcp serve` | `gnosis-mcp serve --transport streamable-http` |\n| **Connection** | One parent process owns stdin/stdout | Any number of clients connect via HTTP |\n| **Sharing** | 1:1 — each editor/session spawns its own server | N:1 — one server, many sessions |\n| **State** | DB, file watcher, embeddings per-process | Shared across all clients |\n| **Best for** | Single editor, quick start | Multiple terminals, CI/CD, remote access |\n\n**Why this matters:** Gnosis maintains persistent state — a SQLite/PostgreSQL database, an embedding cache, and (with `--watch`) a file system watcher. With stdio, each editor session spawns a separate server process with its own state. With HTTP, you start the server once and every session shares the same database and watcher.\n\nFor AI coding tools that open multiple sessions (e.g., Claude Code with agent teams, or parallel terminal tabs), HTTP avoids duplicate processes and keeps all sessions reading from the same index:\n\n```json\n{\n  \"mcpServers\": {\n    \"docs\": {\n      \"type\": \"url\",\n      \"url\": \"http://127.0.0.1:8000/mcp\"\n    }\n  }\n}\n```\n\nStart the server separately (or via systemd/Docker):\n\n```bash\ngnosis-mcp serve --transport streamable-http --host 0.0.0.0 --port 8000\n```\n\nStdio MCP servers like `@modelcontextprotocol/server-postgres` are stateless proxies — they forward a SQL query and return results, so per-session spawning is fine. 
Gnosis is stateful, which is why HTTP transport is the better choice for multi-session setups.\n\n## REST API\n\n\u003e v0.10.0+ — Enable native HTTP endpoints alongside MCP on the same port.\n\n```bash\ngnosis-mcp serve --transport streamable-http --rest\n```\n\nWeb apps can now query your docs over plain HTTP — no MCP protocol required.\n\n| Endpoint | Description |\n|----------|-------------|\n| `GET /health` | Server status, version, doc count |\n| `GET /api/search?q=\u0026limit=\u0026category=` | Search docs (auto-embeds with local provider) |\n| `GET /api/docs/{path}` | Get document by file path |\n| `GET /api/docs/{path}/related` | Get related documents |\n| `GET /api/categories` | List categories with counts |\n| `GET /api/context?topic=\u0026limit=\u0026category=` | Usage-weighted context summary |\n| `GET /api/graph/stats?category=` | Knowledge graph topology |\n\n**Environment variables:**\n\n| Variable | Description |\n|----------|-------------|\n| `GNOSIS_MCP_REST=true` | Enable REST API (same as `--rest`) |\n| `GNOSIS_MCP_CORS_ORIGINS` | CORS allowed origins: `*` or comma-separated list |\n| `GNOSIS_MCP_API_KEY` | Optional Bearer token auth |\n\n**Examples:**\n\n```bash\n# Health check\ncurl http://127.0.0.1:8000/health\n\n# Search\ncurl \"http://127.0.0.1:8000/api/search?q=authentication\u0026limit=5\"\n\n# With API key\ncurl -H \"Authorization: Bearer sk-secret\" \"http://127.0.0.1:8000/api/search?q=setup\"\n```\n\n## Backends\n\n| | SQLite (default) | SQLite + embeddings | PostgreSQL |\n|---|---|---|---|\n| **Install** | `pip install gnosis-mcp` | `pip install gnosis-mcp[embeddings]` | `pip install gnosis-mcp[postgres]` |\n| **Config** | Nothing | Nothing | Set `GNOSIS_MCP_DATABASE_URL` |\n| **Search** | FTS5 keyword (BM25) | Hybrid keyword + semantic (RRF) | tsvector + pgvector hybrid |\n| **Embeddings** | None | Local ONNX (23MB, no API key) | Any provider + HNSW index |\n| **Multi-table** | No | No | Yes (`UNION ALL`) |\n| **Best for** | Quick 
start, keyword-only | Semantic search without a server | Production, large doc sets |\n\n**Auto-detection:** Set `GNOSIS_MCP_DATABASE_URL` to `postgresql://...` and it uses PostgreSQL. Don't set it and it uses SQLite. Override with `GNOSIS_MCP_BACKEND=sqlite|postgres`.\n\n\u003cdetails\u003e\n\u003csummary\u003ePostgreSQL setup\u003c/summary\u003e\n\n```bash\npip install gnosis-mcp[postgres]\nexport GNOSIS_MCP_DATABASE_URL=\"postgresql://user:pass@localhost:5432/mydb\"\ngnosis-mcp init-db              # create tables + indexes\ngnosis-mcp ingest ./docs/       # load your markdown\ngnosis-mcp serve\n```\n\nFor hybrid semantic+keyword search, also enable pgvector:\n\n```sql\nCREATE EXTENSION IF NOT EXISTS vector;\n```\n\nThen backfill embeddings:\n\n```bash\ngnosis-mcp embed                        # via OpenAI (default)\ngnosis-mcp embed --provider ollama      # or use local Ollama\n```\n\n\u003c/details\u003e\n\n## Claude Code Plugin\n\nFor Claude Code users, install as a plugin to get the MCP server plus slash commands:\n\n```bash\nclaude plugin marketplace add nicholasglazer/gnosis-mcp\nclaude plugin install gnosis\n```\n\nThis gives you:\n\n| Component | What you get |\n|-----------|-------------|\n| **MCP server** | `gnosis-mcp serve` — auto-configured |\n| **`/gnosis:search`** | Search docs with keyword or `--semantic` hybrid mode |\n| **`/gnosis:status`** | Health check — connectivity, doc stats, troubleshooting |\n| **`/gnosis:manage`** | CRUD — add, delete, update metadata, bulk embed |\n\nThe plugin works with both SQLite and PostgreSQL backends.\n\n\u003cdetails\u003e\n\u003csummary\u003eManual setup (without plugin)\u003c/summary\u003e\n\nAdd to `.claude/mcp.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"gnosis\": {\n      \"command\": \"gnosis-mcp\",\n      \"args\": [\"serve\"]\n    }\n  }\n}\n```\n\nFor PostgreSQL, add `\"env\": {\"GNOSIS_MCP_DATABASE_URL\": \"postgresql://...\"}`.\n\n\u003c/details\u003e\n\n## Tools \u0026 Resources\n\nGnosis MCP 
exposes 9 tools and 3 resources over [MCP](https://modelcontextprotocol.io/). Your AI agent calls these automatically when it needs information from your docs.\n\n| Tool | What it does | Mode |\n|------|-------------|------|\n| `search_docs` | Search by keyword or hybrid semantic+keyword | Read |\n| `get_doc` | Retrieve a full document by path | Read |\n| `get_related` | Find linked/related documents (multi-hop, relation type filtering) | Read |\n| `search_git_history` | Search indexed git commit history | Read |\n| `get_context` | Usage-weighted context summary | Read |\n| `get_graph_stats` | Knowledge graph topology: orphans, hubs, relation distribution | Read |\n| `upsert_doc` | Create or replace a document | Write |\n| `delete_doc` | Remove a document and its chunks | Write |\n| `update_metadata` | Change title, category, tags | Write |\n\nRead tools are always available. Write tools require `GNOSIS_MCP_WRITABLE=true`.\n\n| Resource URI | Returns |\n|-----|---------|\n| `gnosis://docs` | All documents — path, title, category, chunk count |\n| `gnosis://docs/{path}` | Full document content |\n| `gnosis://categories` | Categories with document counts |\n\n### How search works\n\n```bash\n# Keyword search — works on both SQLite and PostgreSQL\ngnosis-mcp search \"stripe webhook\"\n\n# Hybrid search — keyword + semantic (requires [embeddings] or pgvector)\ngnosis-mcp search \"how does billing work\" --embed\n\n# Filtered — narrow results to a specific category\ngnosis-mcp search \"auth\" -c guides\n```\n\nWhen called via MCP, the agent passes a `query` string for keyword search. With embeddings configured, search automatically combines keyword and semantic results using Reciprocal Rank Fusion. 
Results include a `highlight` field with matched terms in `\u003cmark\u003e` tags.\n\n### Context Loading\n\nThe `get_context` tool provides usage-weighted document summaries — ideal for session startup or \"what matters most?\" queries.\n\n```python\n# Most-accessed docs (no topic)\nget_context(limit=10)\n\n# Topic-focused with access enrichment\nget_context(topic=\"deployment\", category=\"guides\")\n```\n\nBehind the scenes, Gnosis tracks which documents are accessed via `search_docs` and `get_doc`, then uses access frequency to rank importance. Disable tracking with `GNOSIS_MCP_ACCESS_LOG=false`.\n\n### Graph \u0026 Links\n\nGnosis automatically extracts links from your documentation — both frontmatter `relates_to` declarations and markdown links in content. Use the graph tools to explore connections:\n\n```python\n# Direct neighbors\nget_related(\"guides/auth.md\")\n\n# Multi-hop traversal (2 levels deep, with titles)\nget_related(\"guides/auth.md\", depth=2, include_titles=True)\n\n# Filter out noisy git history links\nget_related(\"guides/auth.md\", relation_type=\"relates_to\")\n\n# Graph topology: find orphans and hubs\nget_graph_stats()\n```\n\n**Relation types:** `relates_to` (frontmatter), `content_link` (body markdown links), `git_co_change` (commit co-occurrence), `git_ref` (git history → source file), `links_to` (web crawl).\n\n## Embeddings\n\nEmbeddings enable semantic search — finding docs by meaning, not just keywords.\n\n**Local ONNX (recommended)** — zero-config, no API key:\n\n```bash\npip install gnosis-mcp[embeddings]\ngnosis-mcp ingest ./docs/ --embed       # ingest + embed in one step\ngnosis-mcp embed                        # or embed existing chunks separately\n```\n\nUses [MongoDB/mdbr-leaf-ir](https://huggingface.co/MongoDB/mdbr-leaf-ir) (~23MB quantized, Apache 2.0). 
Auto-downloads on first run.\n\n**Remote providers** — OpenAI, Ollama, or any OpenAI-compatible endpoint:\n\n```bash\ngnosis-mcp embed --provider openai      # requires GNOSIS_MCP_EMBED_API_KEY\ngnosis-mcp embed --provider ollama      # uses local Ollama server\n```\n\n**Pre-computed vectors** — pass `embeddings` to `upsert_doc` or `query_embedding` to `search_docs` from your own pipeline.\n\n## Configuration\n\nAll settings via environment variables. Nothing required for SQLite — it works with zero config.\n\n| Variable | Default | Description |\n|----------|---------|-------------|\n| `GNOSIS_MCP_DATABASE_URL` | SQLite auto | PostgreSQL URL or SQLite file path |\n| `GNOSIS_MCP_BACKEND` | `auto` | Force `sqlite` or `postgres` |\n| `GNOSIS_MCP_WRITABLE` | `false` | Enable write tools |\n| `GNOSIS_MCP_TRANSPORT` | `stdio` | Transport: `stdio`, `sse`, or `streamable-http` |\n| `GNOSIS_MCP_EMBEDDING_DIM` | `1536` | Vector dimension for init-db |\n\n\u003cdetails\u003e\n\u003csummary\u003eAll configuration variables\u003c/summary\u003e\n\n**Database:** `GNOSIS_MCP_SCHEMA` (public), `GNOSIS_MCP_CHUNKS_TABLE` (documentation_chunks), `GNOSIS_MCP_LINKS_TABLE` (documentation_links), `GNOSIS_MCP_SEARCH_FUNCTION` (custom search on PG).\n\n**Search \u0026 chunking:** `GNOSIS_MCP_CONTENT_PREVIEW_CHARS` (200), `GNOSIS_MCP_CHUNK_SIZE` (4000), `GNOSIS_MCP_SEARCH_LIMIT_MAX` (20).\n\n**Connection pool (PostgreSQL):** `GNOSIS_MCP_POOL_MIN` (1), `GNOSIS_MCP_POOL_MAX` (3).\n\n**Webhooks:** `GNOSIS_MCP_WEBHOOK_URL`, `GNOSIS_MCP_WEBHOOK_TIMEOUT` (5s).\n\n**Embeddings:** `GNOSIS_MCP_EMBED_PROVIDER` (openai/ollama/custom/local), `GNOSIS_MCP_EMBED_MODEL`, `GNOSIS_MCP_EMBED_DIM` (384), `GNOSIS_MCP_EMBED_API_KEY`, `GNOSIS_MCP_EMBED_URL`, `GNOSIS_MCP_EMBED_BATCH_SIZE` (50).\n\n**Column overrides:** `GNOSIS_MCP_COL_FILE_PATH`, `GNOSIS_MCP_COL_TITLE`, `GNOSIS_MCP_COL_CONTENT`, `GNOSIS_MCP_COL_CHUNK_INDEX`, `GNOSIS_MCP_COL_CATEGORY`, `GNOSIS_MCP_COL_AUDIENCE`, `GNOSIS_MCP_COL_TAGS`, 
`GNOSIS_MCP_COL_EMBEDDING`, `GNOSIS_MCP_COL_TSV`, `GNOSIS_MCP_COL_SOURCE_PATH`, `GNOSIS_MCP_COL_TARGET_PATH`, `GNOSIS_MCP_COL_RELATION_TYPE`.\n\n**Logging:** `GNOSIS_MCP_LOG_LEVEL` (INFO).\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eCustom search function (PostgreSQL)\u003c/summary\u003e\n\nDelegate search to your own PostgreSQL function for custom ranking:\n\n```sql\nCREATE FUNCTION my_schema.my_search(\n    p_query_text text,\n    p_categories text[],\n    p_limit integer\n) RETURNS TABLE (\n    file_path text, title text, content text,\n    category text, combined_score double precision\n) ...\n```\n\n```bash\nGNOSIS_MCP_SEARCH_FUNCTION=my_schema.my_search\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eMulti-table mode (PostgreSQL)\u003c/summary\u003e\n\nQuery across multiple doc tables:\n\n```bash\nGNOSIS_MCP_CHUNKS_TABLE=documentation_chunks,api_docs,tutorial_chunks\n```\n\nAll tables must share the same schema. Reads use `UNION ALL`. 
Writes target the first table.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eCLI reference\u003c/summary\u003e\n\n```\ngnosis-mcp ingest \u003cpath\u003e [--dry-run] [--force] [--embed]    Load files into database\ngnosis-mcp ingest-git \u003crepo\u003e [--since] [--max-commits] [--include] [--exclude] [--dry-run] [--embed]\ngnosis-mcp crawl \u003curl\u003e [--sitemap] [--depth N] [--include] [--exclude] [--dry-run] [--force] [--embed]\ngnosis-mcp serve [--transport stdio|sse|streamable-http] [--ingest PATH] [--watch PATH]\ngnosis-mcp search \u003cquery\u003e [-n LIMIT] [-c CAT] [--embed]    Search docs\ngnosis-mcp stats                                           Document, chunk, and embedding counts\ngnosis-mcp check                                           Verify DB connection + sqlite-vec\ngnosis-mcp embed [--provider P] [--model M] [--dry-run]    Backfill embeddings\ngnosis-mcp init-db [--dry-run]                             Create tables + indexes\ngnosis-mcp export [-f json|markdown|csv] [-c CAT]          Export documents\ngnosis-mcp diff \u003cpath\u003e                                     Preview changes on re-ingest\ngnosis-mcp cleanup [--days N]                              Purge old access log entries\ngnosis-mcp fix-link-types                                  Migrate git-history links to proper types\n```\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eHow ingestion works\u003c/summary\u003e\n\n`gnosis-mcp ingest` scans a directory for supported files and loads them into the database:\n\n- **Multi-format** — Markdown native; `.txt`, `.ipynb`, `.toml`, `.csv`, `.json` auto-converted. 
Optional: `.rst` (`[rst]` extra), `.pdf` (`[pdf]` extra)\n- **Smart chunking** — splits by H2 headings (H3/H4 for oversized sections), never splits inside code blocks or tables\n- **Frontmatter** — extracts `title`, `category`, `audience`, `tags` from YAML frontmatter\n- **Auto-linking** — `relates_to` in frontmatter creates bidirectional links for `get_related`\n- **Auto-categorization** — infers category from parent directory name\n- **Incremental** — content hashing skips unchanged files (`--force` to override)\n- **Watch mode** — `gnosis-mcp serve --watch ./docs/` auto-re-ingests on changes\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003eArchitecture\u003c/summary\u003e\n\n```\nsrc/gnosis_mcp/\n├── backend.py         DocBackend protocol + create_backend() factory\n├── pg_backend.py      PostgreSQL — asyncpg, tsvector, pgvector\n├── sqlite_backend.py  SQLite — aiosqlite, FTS5, sqlite-vec hybrid search (RRF)\n├── sqlite_schema.py   SQLite DDL — tables, FTS5, triggers, vec0 virtual table\n├── config.py          Config from env vars, backend auto-detection\n├── db.py              Backend lifecycle + FastMCP lifespan\n├── server.py          FastMCP server — 9 tools, 3 resources, auto-embed queries\n├── ingest.py          File scanner + converters — multi-format, smart chunking\n├── crawl.py           Web crawler — sitemap/BFS, robots.txt, ETag caching\n├── parsers/           Non-file ingest sources (git history, future: schemas)\n│   └── git_history.py Git log → markdown documents per file\n├── watch.py           File watcher — mtime polling, auto-re-ingest\n├── schema.py          PostgreSQL DDL — tables, indexes, search functions\n├── embed.py           Embedding providers — OpenAI, Ollama, custom, local ONNX\n├── local_embed.py     Local ONNX embedding engine — HuggingFace model download\n└── cli.py             CLI — serve, ingest, crawl, search, embed, stats, check, cleanup\n```\n\n\u003c/details\u003e\n\n## Available On\n\n[MCP 
Registry](https://registry.modelcontextprotocol.io) (feeds VS Code MCP gallery and GitHub Copilot) · [PyPI](https://pypi.org/project/gnosis-mcp/) · [mcp.so](https://mcp.so) · [Glama](https://glama.ai) · [cursor.directory](https://cursor.directory)\n\n## AI-Friendly Docs\n\n| File | Purpose |\n|------|---------|\n| [`llms.txt`](llms.txt) | Quick overview — what it does, tools, config |\n| [`llms-full.txt`](llms-full.txt) | Complete reference in one file |\n| [`llms-install.md`](llms-install.md) | Step-by-step installation guide |\n\n## Performance\n\nBenchmarked on SQLite (in-memory) with keyword search (FTS5 + BM25):\n\n| Corpus | QPS | p50 | p95 | p99 | Hit Rate |\n|--------|-----|-----|-----|-----|----------|\n| 100 docs / 300 chunks | ~9,800 | 0.09ms | 0.16ms | 0.18ms | 100% |\n| 500 docs / 1,500 chunks | ~3,500 | 0.24ms | 0.51ms | 0.82ms | 100% |\n\nInstall size: ~23MB with `[embeddings]` (ONNX model). Base install is ~5MB.\n\nRun the benchmark yourself:\n\n```bash\npython tests/bench/bench_search.py                # 100 docs, 1000 queries\npython tests/bench/bench_search.py --docs 500     # larger corpus\npython tests/bench/bench_search.py --json          # machine-readable output\n```\n\n599+ tests, 10 eval cases (90% hit rate, 0.85 MRR on sample corpus). All tests run without a database.\n\n## Development\n\n```bash\ngit clone https://github.com/nicholasglazer/gnosis-mcp.git\ncd gnosis-mcp\npython -m venv .venv \u0026\u0026 source .venv/bin/activate\npip install -e \".[dev]\"\npytest                    # 599+ tests, no database needed\nruff check src/ tests/\n```\n\nAll tests run without a database. Keep it that way.\n\nGood first contributions: new embedding providers, export formats, ingestion for new file types (via optional extras). 
Open an issue first for larger changes.\n\n## Sponsors\n\nIf Gnosis MCP saves you time, consider [sponsoring the project](https://github.com/sponsors/nicholasglazer).\n\n## License\n\n[MIT](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicholasglazer%2Fgnosis-mcp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnicholasglazer%2Fgnosis-mcp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnicholasglazer%2Fgnosis-mcp/lists"}