{"id":47178795,"url":"https://github.com/dumkydewilde/mcp-memory-layer","last_synced_at":"2026-03-13T07:04:31.697Z","repository":{"id":343676768,"uuid":"1153101059","full_name":"dumkydewilde/mcp-memory-layer","owner":"dumkydewilde","description":"A template for building your own BI MCP with dbt, LLMs and multi-user corrections","archived":false,"fork":false,"pushed_at":"2026-03-11T09:37:29.000Z","size":279,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-11T16:34:54.105Z","etag":null,"topics":["bi","data","dbt","llm","mcp-server"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dumkydewilde.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-08T22:26:34.000Z","updated_at":"2026-03-11T09:37:33.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/dumkydewilde/mcp-memory-layer","commit_stats":null,"previous_names":["dumkydewilde/mcp-memory-layer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/dumkydewilde/mcp-memory-layer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumkydewilde%2Fmcp-memory-layer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumkydewilde%2Fmcp-memory-layer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumkydewilde%2Fmcp-memory-layer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumkydewilde%2Fmcp-memory-layer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dumkydewilde","download_url":"https://codeload.github.com/dumkydewilde/mcp-memory-layer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dumkydewilde%2Fmcp-memory-layer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30460818,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-13T06:34:02.089Z","status":"ssl_error","status_checked_at":"2026-03-13T06:33:49.182Z","response_time":60,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bi","data","dbt","llm","mcp-server"],"created_at":"2026-03-13T07:04:21.249Z","updated_at":"2026-03-13T07:04:31.692Z","avatar_url":"https://github.com/dumkydewilde.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MCP Memory Layer for Text-to-SQL\n\nAn [MCP](https://modelcontextprotocol.io/) server that wraps a DuckDB/MotherDuck data warehouse with a **memory layer** — corrections, dbt model context, and query popularity tracking — so LLMs write better SQL on the first try.\n\nComes with a ready-to-run **jaffle_shop** example and an **evaluation framework** for A/B testing memory features.\n\n## The problem\n\nLLMs generating SQL against a data warehouse hit the same mistakes over and over:\n\n- **Naming traps** — `raw_orders.customer` vs `stg_orders.customer_id`, amounts in cents vs dollars\n- **Stale tables** — denormalized snapshots that look useful but have incomplete data\n- **Missing business context** — column descriptions, upstream lineage, and pre-computed metrics that the raw schema doesn't reveal\n- **Reinventing joins** — ignoring common access patterns that dbt already optimized\n\nEach mistake costs a round-trip: the LLM writes bad SQL, gets an error, tries again. With complex schemas (100+ models), these round-trips add up fast.\n\n## How it works\n\nThe memory layer sits between the LLM and the database, providing three types of context that raw schema metadata can't:\n\n### 1. Corrections\n\nA version-controlled JSON file of schema \"gotchas\" — things an LLM can't infer from `DESCRIBE` alone:\n\n```\n\"raw_orders amounts (subtotal, tax_paid, order_total) are stored in CENTS.\n Use stg_orders which converts to dollars.\"\n```\n\n```\n\"AVOID the daily_revenue table — it is a stale snapshot covering only\n 3 of 6 locations. Use the orders table instead.\"\n```\n\nCorrections are matched to incoming questions by table name, column name, and keyword overlap. The LLM calls `get_corrections` before writing SQL and gets the top 3 relevant tips.\n\nNew corrections can be saved during a session (`save_correction`) when the LLM discovers something non-obvious — building institutional knowledge over time.\n\n### 2. dbt model context\n\nParses the dbt `manifest.json` to serve column descriptions, upstream lineage, tests, and (truncated) SQL — the same curated metadata your analytics engineers wrote, delivered as small token-efficient slices rather than dumping the entire catalog.\n\nTools: `list_dbt_models`, `get_dbt_context`, `get_model_sql`\n\n### 3. Query popularity tracking\n\nRecords which tables, columns, and join patterns are actually used in queries. Over time this builds a picture of common access patterns:\n\n```\norders: queried 47 times\n  → commonly joined to customers ON customer_id (INNER, 32x)\n  Popular columns: order_total (select), ordered_at (where)\n```\n\nThis steers the LLM toward proven patterns instead of inventing joins from scratch.\n\n## Quick start\n\nThe repo includes a complete jaffle_shop example — a DuckDB database with dbt models, pre-seeded corrections, and popularity data.\n\n```bash\nuv sync\nuv run mcp-memory\n```\n\n### Claude Desktop configuration\n\n```json\n{\n  \"mcpServers\": {\n    \"memory-layer\": {\n      \"command\": \"uv\",\n      \"args\": [\"run\", \"--directory\", \"/path/to/mcp-memory-layer\", \"mcp-memory\"]\n    }\n  }\n}\n```\n\n### Environment variables\n\n| Variable | Default | Description |\n|---|---|---|\n| `MCP_MEMORY_DATA_DIR` | `data/` | Base directory for data files |\n| `MCP_MEMORY_DUCKDB_PATH` | `data/jaffle_shop/jaffle_shop.duckdb` | DuckDB database path |\n| `MCP_MEMORY_MANIFEST_PATH` | `dbt_project/target/manifest.json` | dbt manifest.json path |\n| `MCP_MEMORY_CORRECTIONS_PATH` | `data/corrections.json` | Corrections JSON path |\n| `MCP_MEMORY_POPULARITY_DB` | `data/popularity.duckdb` | Popularity tracking database |\n| `MCP_MEMORY_CORRECTIONS` | `true` | Enable/disable corrections |\n| `MCP_MEMORY_DBT` | `true` | Enable/disable dbt context |\n| `MCP_MEMORY_POPULARITY` | `true` | Enable/disable popularity tracking |\n\n## Evaluation framework\n\nThe `eval/` directory contains an A/B testing harness that measures how each memory feature affects SQL quality:\n\n```bash\n# Compare baseline (no memory) vs all features\nuv run python -m eval.harness --config baseline --api openai\nuv run python -m eval.harness --config all_features --api openai\n\n# Generate comparison report\nuv run python -m eval.report eval/results/baseline.json eval/results/all_features.json\n```\n\nConfigurations: `baseline`, `corrections`, `dbt`, `popularity`, `all_features`\n\nQuestions include \"dead-end traps\" — stale tables that look correct but produce wrong results. These specifically test whether corrections can prevent the LLM from falling into schema traps.\n\n## Bring your own project\n\nThe memory layer works with any DuckDB/dbt project — not just the bundled jaffle_shop demo.\n\n### Quick setup\n\n```bash\n# Initialize config directory (~/.mcp-memory/)\nmcp-memory-cli init \\\n  --duckdb-path /path/to/your/database.duckdb \\\n  --manifest-path /path/to/your/dbt/target/manifest.json\n```\n\nThis creates:\n- `~/.mcp-memory/config.toml` — paths and feature flags\n- `~/.mcp-memory/corrections.json` — empty corrections store (grows as you use it)\n\nThen run the server:\n\n```bash\nmcp-memory    # reads config from ~/.mcp-memory/config.toml\n```\n\n### Connecting your dbt manifest\n\nThe `manifest` path supports multiple sources. The server resolves the manifest on startup:\n\n```toml\n[paths]\n# Local file — point to your dbt project's target directory\nmanifest = \"/path/to/your/dbt/target/manifest.json\"\n\n# URL — S3 presigned URL, GCS signed URL, or any HTTP endpoint\nmanifest = \"https://my-bucket.s3.amazonaws.com/dbt/manifest.json\"\n```\n\n**Local file** is the simplest: run `dbt compile` (or `dbt build`) and point to `target/manifest.json`.\n\n**URL** is useful for teams: upload the manifest to a shared bucket as part of your dbt CI/CD pipeline (e.g. `dbt build \u0026\u0026 aws s3 cp target/manifest.json s3://...`). The server caches fetched manifests locally and re-fetches when the cache is older than 1 hour.\n\n### Config file\n\nInstead of env vars, you can configure everything in `~/.mcp-memory/config.toml`:\n\n```toml\n[paths]\nduckdb = \"/path/to/your/database.duckdb\"\nmanifest = \"/path/to/your/dbt/target/manifest.json\"\ncorrections = \"~/.mcp-memory/corrections.json\"\n# popularity_db = \"~/.mcp-memory/popularity.duckdb\"\n\n[features]\nquery = true\ncorrections = true\ndbt = true\npopularity = true\n```\n\nPrecedence: **env vars \u003e config.toml \u003e defaults**. You can mix both — use the config file for stable paths and env vars for overrides.\n\n### What you need\n\n| Component | Required? | Notes |\n|-----------|-----------|-------|\n| DuckDB database | Yes | Local `.duckdb` file |\n| dbt manifest | Recommended | Local file or URL. Without it, `list_dbt_models` and `get_dbt_context` are disabled. |\n| corrections.json | No | Starts empty, grows via `save_correction` tool calls |\n| popularity seed | No | Popularity tracking auto-populates from real queries |\n\n### Without dbt\n\nIf you don't use dbt, set `dbt = false` in config (or `MCP_MEMORY_DBT=false`). The server runs with just the query tool + corrections + popularity tracking. You can still save corrections about your schema and benefit from popularity-based join suggestions.\n\n## Why a semantic/memory layer for MCP?\n\nMCP gives LLMs access to tools. But tools alone aren't enough — an LLM with `execute_query` and `list_tables` will still write bad SQL against an unfamiliar schema because:\n\n1. **Schema metadata is necessary but not sufficient.** Column names and types tell you *what exists*, not *how to use it correctly*. That `amounts are in cents` insight? It's not in the schema. It's in someone's head, a Slack thread, or a dbt description that the LLM never sees.\n\n2. **LLMs don't learn from their mistakes within a session.** If an LLM hits a naming trap in turn 1, it has no mechanism to avoid it in turn 10 (or in the next conversation). Corrections make that learning persistent and shareable.\n\n3. **dbt already solved the documentation problem.** Analytics teams invest heavily in documenting models — column descriptions, tests, lineage. A semantic layer surfaces that work directly to the LLM, rather than having it guess from raw `information_schema`.\n\n4. **Usage patterns encode tribal knowledge.** Which tables do people actually query? What joins work? Popularity tracking captures the implicit knowledge that experienced analysts have but schemas don't express.\n\nThe memory layer turns one-shot SQL generation into an iterative, self-improving system. Each session can contribute corrections. Each query refines popularity stats. The more you use it, the better it gets.\n\n## Development\n\n```bash\nuv sync                    # install dependencies\nuv run pytest              # run tests\nuv run ruff check .        # lint\n```\n\nRequires Python \u003e= 3.11. Uses `sqlglot` for SQL parsing (DuckDB dialect) and `FastMCP` for the MCP server.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdumkydewilde%2Fmcp-memory-layer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdumkydewilde%2Fmcp-memory-layer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdumkydewilde%2Fmcp-memory-layer/lists"}