{"id":50331625,"url":"https://github.com/kanenasingreece/shared_memory","last_synced_at":"2026-05-29T10:00:46.848Z","repository":{"id":359602637,"uuid":"1246621365","full_name":"KanenasInGreece/Shared_Memory","owner":"KanenasInGreece","description":"Three-tier semantic memory for local AI agents — Claude Code, Gemini CLI, and LM Studio sharing one persistent brain (Postgres/pgvector + Neo4j + BGE-M3).","archived":false,"fork":false,"pushed_at":"2026-05-22T15:13:16.000Z","size":218,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T20:17:53.146Z","etag":null,"topics":["ai-agents","claude-code","embeddings","gemini-cli","knowledge-graph","lm-studio","local-ai","mcp","memory","neo4j","pgvector","rag"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KanenasInGreece.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-22T11:30:47.000Z","updated_at":"2026-05-22T15:13:20.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/KanenasInGreece/Shared_Memory","commit_stats":null,"previous_names":["kanenasingreece/shared_memory"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/KanenasInGreece/Shared_Memory","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KanenasInGreece%2FShared_Me
mory","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KanenasInGreece%2FShared_Memory/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KanenasInGreece%2FShared_Memory/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KanenasInGreece%2FShared_Memory/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KanenasInGreece","download_url":"https://codeload.github.com/KanenasInGreece/Shared_Memory/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KanenasInGreece%2FShared_Memory/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33646428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai-agents","claude-code","embeddings","gemini-cli","knowledge-graph","lm-studio","local-ai","mcp","memory","neo4j","pgvector","rag"],"created_at":"2026-05-29T10:00:27.055Z","updated_at":"2026-05-29T10:00:46.815Z","avatar_url":"https://github.com/KanenasInGreece.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Shared Memory Framework\n\n**A local, private, shared brain for every AI agent on your workstation.**\nEvery insight one agent gains is available to 
every other — across sessions, across tools, across models. Knowledge stays yours.\n\nA unified semantic and relational memory layer built from first principles to survive the interference problem and scale safely to concurrent multi-agent workloads.\n\n![Claude Code](https://img.shields.io/badge/Claude_Code-Skill-blue)\n![Codex CLI](https://img.shields.io/badge/Codex_CLI-Skill-blue)\n![Grok](https://img.shields.io/badge/Grok-Skill-blue)\n![Gemini CLI](https://img.shields.io/badge/Gemini_CLI-Skill-blue)\n![LM Studio](https://img.shields.io/badge/LM_Studio-MCP-blue)\n![Neo4j](https://img.shields.io/badge/Neo4j-Graph-green)\n![Postgres](https://img.shields.io/badge/Postgres%2Bpgvector-Vector-green)\n![BGE-M3](https://img.shields.io/badge/BGE--M3-1024--dim-purple)\n![Local \u0026 Private](https://img.shields.io/badge/Local_%26_Private-yes-success)\n\n---\n\n## Table of Contents\n\n1. [The Vision: One Brain, Many Agents](#1-the-vision-one-brain-many-agents)\n2. [The Problem: Why RAG Systems Forget](#2-the-problem-why-rag-systems-forget)\n3. [Architecture Overview: Three Tiers](#3-architecture-overview-three-tiers)\n4. [OS Prerequisites — Fedora / Linux](#4-os-prerequisites--fedora--linux)\n5. [Infrastructure Setup: Docker Compose](#5-infrastructure-setup-docker-compose)\n6. [Database Schema](#6-database-schema)\n7. [Inference Backends (llama.cpp)](#7-inference-backends-llamacpp)\n8. [The Hive-Mind Gateway: Why It Exists](#8-the-hive-mind-gateway-why-it-exists)\n9. [Starting the Full Stack](#9-starting-the-full-stack)\n10. [Agent Integration: First-Time Setup](#10-agent-integration-first-time-setup)\n11. [Agent Access: CLI and MCP](#11-agent-access-cli-and-mcp)\n12. [The Save Path — From Artifact to Memory](#12-the-save-path--from-artifact-to-memory)\n13. [The Sleep Cycle — Consolidation](#13-the-sleep-cycle--consolidation)\n14. [Audit Logging](#14-audit-logging)\n15. [Retrieval: Three-Tier Lookup](#15-retrieval-three-tier-lookup)\n16. 
[LM Studio MCP Configuration](#16-lm-studio-mcp-configuration)\n17. [Testing](#17-testing)\n18. [Open Problems](#18-open-problems)\n19. [Development Roadmap — Multi-Agent Safe Workstation](#19-development-roadmap--multi-agent-safe-workstation)\n20. [References](#20-references)\n\n---\n\n## 1. The Vision: One Brain, Many Agents\n\nEvery AI workstation today runs several tools in parallel — a terminal agent, a desktop chat model, a coding assistant. Each of them works hard in a session, reasons through a problem, discovers something useful. Then the session ends, and all of that is gone. The next tool starts cold, the next session starts from zero. They do not talk to each other. They cannot.\n\nThis framework is built around one idea: those tools should share a brain. When Gemini CLI figures out why the proxy was failing, any other agent should already know the next time it is asked about the proxy. When LM Studio runs a consolidation on a set of architectural facts, those summaries should be there for any agent that searches next.\n\n**The consumers, and how they connect:**\n\n- **Claude Code** — uses `memory_bridge.py` packaged as a Claude skill (`/shared-memory`). Install the skill directory under `~/.claude/skills/`.\n\n- **Codex CLI** — uses `memory_bridge.py` packaged as a Codex skill (`$shared-memory`). Install the skill directory under `~/.codex/skills/`. SKILL.md frontmatter enables implicit invocation when the task description matches.\n\n- **Grok** — uses `memory_bridge.py` packaged as a Grok skill (`/shared-memory`). Install the skill directory under `~/.grok/skills/`.\n\n- **Gemini CLI** — uses `memory_bridge.py` packaged as a Gemini skill (`/activate shared-memory`). Install the skill directory under `~/.gemini/skills/`.\n\n- **LM Studio** — uses an MCP server (`vector-skill.py`), registered in `mcp.json`. 
The model calls `save_artifact` and `hybrid_search_and_rerank` as tools against the same backend.\n\nThe infrastructure underneath all agents is identical: one coordinator managing all Postgres and Neo4j connections, one embedding space enforced by BGE-M3, one consolidation daemon synthesising shared narratives. The agents differ; the memory layer does not.\n\nThe design is intentionally agent-agnostic: any tool that can make HTTP calls can reach the coordinator directly on port 8888. Adding a new agent type is a matter of packaging — not changing the backend.\n\n### Three diagnostic tests\n\nVishakha Gupta's *AI Memory \u0026 Cognition: The Architect's Playbook* (ApertureData, May 2026) proposes three questions that any serious AI memory system must be able to answer. They are reproduced here with the current state of this framework's answers — updated with every release.\n\n**The Retrieval Test:** *Can the agent explain why it retrieved a specific memory? Not just what was retrieved, but which specific context, session, and principal metadata informed the decision.*\n\n\u003e As of v0.3.4: search results carry `tier` (fact | community_summary), `score_normalized` (sigmoid of raw reranker logit → [0, 1]), `matched_entities` (intersection of the query string against the saved entity list), and `graph_context` as a structured list of `{rel_type, name, label}` triples. An agent can reason: *\"I returned a Tier-3 community synthesis — normalized score 0.91, matching entity OutboxPattern — alongside two Tier-1 precision hits.\"* **Gap remaining:** retrieval events are not yet audited (no record of who searched, when, from which agent); cross-encoder span attribution is not yet exposed.\n\n**The Consolidation Test:** *When the agent learns something new, does the system update a coherent knowledge base, or does it just accumulate versions? After six months, do you have one \"truth\" or three conflicting ones?*\n\n\u003e As of v0.3.4: one row per entity, not three. 
The `community_summaries` table uses `ON CONFLICT (metadata-\u003e\u003e'entity') DO UPDATE` — each consolidation cycle replaces the row with a cumulatively synthesised narrative (the LLM receives the prior summary as context). `summary_history JSONB` (migration 004) records the previous N versions before each overwrite, enabling drift auditing. The consolidation daemon now uses `AsyncGraphDatabase` + `loop.run_in_executor()` — `LISTEN/NOTIFY` signals are no longer dropped under write bursts. **Gap remaining:** consolidation is per-entity; two summaries for overlapping entities may diverge slightly if the LLM synthesises them in separate calls. No cross-entity reconciliation step yet.\n\n**The Lineage Test:** *Can I trace a decision back to the original source — the raw image, the specific video frame, or the precise document page — or just the text summary extracted from it?*\n\n\u003e As of v0.3.4: decisions trace fully to human (`WAS_ATTRIBUTED_TO`), AI agent (`WAS_ASSISTED_BY`), and project (`PROJECT_OF`). Community summaries link back to their source facts via `source_pg_ids`. The optional `source_ref` metadata key (e.g. `\"design-doc.pdf#p12\"`, `\"meeting.mp4@00:04:32\"`) propagates through the coordinator to the `Fact` Neo4j node. `HAD_OUTCOME` self-loop edges on Decision nodes close the forward trace: decision → outcome → rating + notes. `/memory/graph` enforces read-only access at the driver level (`default_access_mode=\"READ\"`); outbox rows cannot be double-processed during restart (atomic `in_progress` claim + startup recovery). **Gap remaining:** `source_ref` is not enforced — agents supply it when they can. 
No back-edge yet from a raw `Fact` to the `Decision` it influenced (planned for a later phase).\n\n### What we are building toward\n\nBeyond storing facts, the framework is evolving to answer questions that no other tool on your workstation can answer today:\n\n\u003e *\"Who decided on a consolidator on project shared\\_memory, when, and was that a good decision?\"*\n\nTarget answer shape: *\"Xenofon, using Claude Code, decided that project shared\\_memory should have a consolidator — to simulate dreaming — on 2026-05-20. The related document is ADR-001. He was using Postgres with pgvector as an outbox to achieve non-blocking Neo4j writes, giving optional consistency guarantees on Neo4j and hard guarantees on Postgres. Retrospective as of 2026-05-28: good — held up under multi-agent load.\"*\n\nThis requires not just storing knowledge, but storing **who decided what, with which tool, in which context, and whether it held up**. It requires a provenance layer with first-class nodes for people, AI agents, projects, and decisions — not just facts.\n\n### The signal we are saving\n\nThe governing rule: **save what GitHub cannot tell you.** Code is on GitHub. Git blame gives you what changed and when. What is permanently lost without explicit capture:\n\n| Save — signal | Skip — noise |\n|---|---|\n| Why a decision was made + alternatives rejected | The code that resulted from it |\n| What was known / unknown at decision time | Raw web search results |\n| Who participated and with which AI tool | Debug output, stack traces |\n| Milestones + the context that made them significant | Test results (unless they caused a decision) |\n| Retrospectives: was the decision right after N weeks? | Health checks, routine saves |\n| Abandoned approaches and why they were dropped | Intermediate build artifacts |\n\nEvery memory save should answer at least one of: **Who? Why? What was rejected? Was it right?**\n\n### Saving everything vs. 
saving what matters\n\nThis distinction is not cosmetic — it directly determines what you can query later.\n\nIf you adopt a \"save everything\" policy (logs, test output, status checks, raw search results), the shared memory fills with low-signal noise. Consolidation groups semantically similar content into community summaries, so noise consolidates into more noise: you end up with thematic summaries of debug sessions rather than thematic summaries of decisions. Retrieval accuracy degrades because high-density noisy clusters crowd out the sparse, high-signal facts.\n\n**What you can query with disciplined saves:**\n\n```\n# Who decided, when, under what conditions, and with which tool?\n\"Who decided to use an outbox for Neo4j writes on the shared_memory project?\"\n→ Xenofon, using Claude Code, on 2026-05-20.\n   Condition at the time: Neo4j had no native async write path compatible with asyncpg.\n   Rationale: non-blocking — Postgres guarantees hard consistency, Neo4j is eventual.\n   Alternatives rejected: synchronous writes (too slow), no Neo4j (lost graph queries).\n\n# Provenance chain — who + what AI assisted\n\"What decisions did Claude Code assist with on project shared_memory?\"\n→ Decision: Add outbox-as-WAL for Neo4j writes (2026-05-20)\n   Decision: Use FOREACH over UNWIND for empty-list safety in Cypher (2026-05-28)\n   Decision: Add consolidation daemon as a dreaming analogue (2026-05-20)\n\n# Reasoning behind a specific approach\n\"Why does the coordinator use FOREACH instead of UNWIND?\"\n→ UNWIND produces zero rows for an empty list — the write silently drops.\n   FOREACH handles empty lists safely. 
Saved 2026-05-28 by Claude Code.\n\n# What was abandoned and why\n\"What embedding models were considered before BGE-M3?\"\n→ MiniLM-384: rejected — too few dimensions for cross-agent coherence.\n   BGE-base-768: evaluated — acceptable, not best-in-class for multilingual.\n   BGE-M3-1024: selected — highest multilingual retrieval quality in class.\n\n# Was a past decision successful? (Phase C — retrospectives)\n\"Was the outbox-as-WAL approach a good decision for the shared_memory project?\"\n→ Retrospective 2026-06-15 (rating: 8/10): held up under multi-agent load.\n   Note: outbox replay on crash worked correctly; Neo4j lag \u003c 200 ms typical.\n   Suggested follow-up: add TTL pruning for applied rows \u003e 30 days.\n\n# Phase A (done): who decided + which AI + which project + why\n# Phase C (done): outcomes, retrospectives, was it right after N weeks?\n```\n\n**What you cannot query if you save noise:**\n\n```\n# Only works if the reasoning was explicitly saved\n\"Why was the consolidation threshold set to 5 facts?\"\n→ No result — this tuning choice was never recorded with rationale.\n   Fix: save a decision with rationale when the threshold is next changed.\n\n# Transient runtime state is never here\n\"What did the health check return yesterday at 14:30?\"\n→ Not in memory. Routine health checks are not saved — check Prometheus or logs.\n\n# Current code state lives in Git, not memory\n\"What is the current value of DENSITY_THRESHOLD in consolidation_loop.py?\"\n→ Read the file. Memory holds decisions about code, not code itself.\n\n# Retrospectives require save_retrospective (Phase C — now available)\n\"Was the BGE-M3 selection the right call?\"\n→ No retrospective saved yet. 
Use save_retrospective --pg-id \u003cid\u003e --rating high --notes \"...\"\n```\n\nThe governing heuristic: **if you can get the answer in 3 seconds from `git log`, `grep`, or `cat`, don't save it here.** Memory is for context that evaporates without capture — the why behind a decision, the options that were weighed, the outcome after the fact.\n\n### Local mounts — your work stays yours\n\nBoth databases are deployed via Docker Compose with host-mounted volumes. The data lives on your filesystem, not inside a container — you can back it up with any standard tool, and a container restart or upgrade does not lose what you have accumulated.\n\n```yaml\n# Postgres data on the host filesystem — survives container rebuilds\nvolumes:\n  - /your/databases/postgres/data:/var/lib/postgresql/data:z\n\n# Neo4j data on the host filesystem — same guarantee\nvolumes:\n  - /your/databases/neo4j/data:/data:z\n```\n\n\u003e **Note for Fedora/RHEL users:** The `:z` suffix is required — it sets the SELinux label so the container process can read and write the host directory. Without it, Neo4j and Postgres fail silently.\n\n### The binding element: 1024-dimensional BGE-M3\n\nWhat makes these tools a unified memory system rather than several separate stores is the embedding model. Every vector in the system — saved by Gemini CLI, saved by LM Studio, saved by any CLI agent, re-embedded by the consolidation daemon — was generated by the same BGE-M3 instance through the same gateway. The coordinate system is shared. Cosine similarity between a vector one agent saved last Tuesday and a query another agent is making right now is a genuine semantic comparison.\n\n---\n\n## 2. The Problem: Why RAG Systems Forget\n\nThe common assumption in RAG architectures is that you can save everything and the vector database will sort it out. This assumption has been formally disproved.\n\nBarman et al. 
(2026) in *\"The Geometry of Forgetting\"* expose what they call the **Dimensionality Illusion**: BGE-M3 is nominally 1024-dimensional but concentrates its variance in approximately 16 effective dimensions — a figure that holds across MiniLM at 384 dimensions and BGE-base at 768 as well, regardless of what the model card claims.\n\nAn agent navigating that space is not moving through a vast semantic landscape. It is moving through a narrow corridor, and every new memory saved into the same neighborhood is another body crowding that corridor. Retrieval accuracy does not dip gradually — it degrades as a power law with database size, driven by the mechanism the paper names: **semantic interference**. You are most vulnerable where you would expect to gain the most value from your memory.\n\nThis is the problem the Shared Memory Framework is designed to address. The solution has three parts: a dual-store architecture that separates episodic from structural memory, a consolidation loop that synthesises high-density clusters into a thematic semantic tier before interference pressure accumulates, and a single shared embedding space enforced across all agents.\n\n\u003e **The biological parallel:** The **Complementary Learning Systems** hypothesis (McClelland, McNaughton \u0026 O'Reilly, 1995) proposes that the hippocampus holds fast, episodic, pattern-separated traces while the neocortex extracts slow statistical patterns across episodes — abstract, generalizable, thematic. This transfer happens primarily during offline states, including sleep. The architecture here implements the same division: Neo4j as the hippocampus, `community_summaries` as the neocortex, and the consolidation daemon as the sleep cycle.\n\n---\n\n## 3. 
Architecture Overview: Three Tiers\n\n```\n┌──────────────────────────────────────────────────────────────────────────┐\n│                              AGENT LAYER                                 │\n│                                                                          │\n│  Claude Code   Grok    Gemini CLI   LM Studio (MCP)    Any HTTP          │\n│  (skill)       (skill) (skill)      vector-skill.py    client            │\n│  memory_bridge.py ←→ memory_bridge.py ←→ memory_bridge.py               │\n└─────────────┬──────────────────────┬─────────────────────┬──────────────┘\n              │                      │                     │\n              └──────────────────────▼─────────────────────┘\n                                     │ HTTP (all memory ops)\n                         ┌───────────▼────────────────────────┐\n                         │  Hive-Mind Gateway + Coordinator   │\n                         │  hive_mind_proxy.py  :8888         │\n                         │                                    │\n                         │  /memory/save   → coordinator.py  │\n                         │  /memory/search → coordinator.py  │\n                         │  /memory/graph  → coordinator.py  │\n                         │  /v1/embeddings → :8070 (BGE-M3)  │\n                         │  /v1/reranking  → :8071            │\n                         │  default        → :5000 (LLM)     │\n                         └────────┬──────────────────────────┘\n                                  │ spawns\n                         ┌────────▼───────────────┐\n                         │  Consolidation Daemon  │\n                         │  consolidation_loop.py │\n                         │  LISTEN new_artifact   │\n                         └────────┬───────────────┘\n                                  │ writes\n              ┌───────────────────▼───────────────────────────┐\n              │                MEMORY LAYER                    │\n              │                   
                             │\n              │  ┌──────────────────────┐  ┌───────────────┐  │\n              │  │  PostgreSQL+pgvector │  │    Neo4j      │  │\n              │  │                      │  │               │  │\n              │  │ technical_docs       │  │ Fact nodes    │  │\n              │  │  (Tier 1 — Episodic) │  │ Entity hubs   │  │\n              │  │                      │  │ MENTIONS edges│  │\n              │  │ community_summaries  │  │ CommunitySumm │  │\n              │  │  (Tier 3 — Semantic) │  │ SUMMARIZED_BY │  │\n              │  │                      │  └───────────────┘  │\n              │  │ neo4j_outbox         │                      │\n              │  │  (coordinator WAL)   │                      │\n              │  └──────────────────────┘                      │\n              └────────────────────────────────────────────────┘\n```\n\n| Tier | Store | Role | Biological Analogy |\n|---|---|---|---|\n| **1 — Episodic** | `technical_docs` (Postgres + pgvector) | Original facts, full content, surgical precision via cosine similarity | Hippocampus — fast, specific, pattern-separated |\n| **2 — Structural** | Neo4j `Fact` nodes (keyed by `pg_id`) | Relationships, provenance, `consolidated` flag, Entity hubs | Hippocampus — relational context cosine similarity cannot express |\n| **3 — Semantic** | `community_summaries` (Postgres + pgvector) | Consolidated thematic narratives; queried first on retrieval | Neocortex — slow, abstract, statistical regularities across episodes |\n\n**Retrieval always queries Tier 3 first** (thematic orientation), then Tier 1 (surgical precision), then expands through Neo4j (relational context). Artifacts saved by one agent become retrievable by all others once the consolidation daemon runs.\n\n---\n\n## 4. OS Prerequisites — Fedora / Linux\n\nAn agentic workstation running Neo4j, Postgres, LM Studio, and multiple MCP servers creates many more filesystem watchers than a standard desktop. 
Fedora's default kernel limits will cause failures under this load.\n\n### Raise inotify limits\n\n```bash\n# Create a persistent sysctl override\necho \"fs.inotify.max_user_instances=1024\" | sudo tee /etc/sysctl.d/90-inotify.conf\necho \"fs.inotify.max_user_watches=524288\" | sudo tee -a /etc/sysctl.d/90-inotify.conf\n\n# Apply immediately (no reboot required)\nsudo sysctl -p /etc/sysctl.d/90-inotify.conf\n\n# Verify\nsysctl fs.inotify.max_user_instances fs.inotify.max_user_watches\n```\n\nA stock Fedora workstation defaults to 128 instances and 65536 watches — adequate for a desktop, not for a workstation running five database services, two MCP runtimes, and a file watcher per active LLM tool.\n\n---\n\n## 5. Infrastructure Setup: Docker Compose\n\nNeo4j and Postgres are the two persistent stores. Both run in Docker. See `postgres_neo4j_limits.yaml` for the full compose file; the key structure is:\n\n```yaml\nservices:\n  neo4j:\n    image: neo4j:5-community\n    ports:\n      - \"7474:7474\"   # Browser UI\n      - \"7687:7687\"   # Bolt protocol\n    volumes:\n      - /your/databases/neo4j/data:/data:z\n    environment:\n      - NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}\n      - NEO4J_PLUGINS=[\"apoc\"]\n      # In the Neo4j Docker image, '.' in a setting name becomes '_' and a literal '_' becomes '__'\n      - NEO4J_server_memory_heap_max__size=2G\n      - NEO4J_server_memory_pagecache_size=2G\n    restart: always\n\n  postgres:\n    image: pgvector/pgvector:pg17\n    ports:\n      - \"5432:5432\"\n    volumes:\n      - /your/databases/postgres/data:/var/lib/postgresql/data:z\n    command: postgres -c shared_buffers=1GB -c work_mem=64MB\n    environment:\n      - POSTGRES_PASSWORD=${PG_PASSWORD}\n      - POSTGRES_DB=agent_data\n    restart: always\n```\n\n```bash\n# Start both services\ndocker compose -f postgres_neo4j_limits.yaml up -d\n\n# Verify\ndocker compose -f postgres_neo4j_limits.yaml ps\n```\n\nCredentials are read from environment variables — copy `.env.example` to `.env` and fill in 
`NEO4J_PASSWORD` and `PG_PASSWORD` before starting.\n\n---\n\n## 6. Database Schema\n\nRun these once against the Postgres instance to create the vector extension and both tables.\n\n```sql\n-- Connect: psql postgresql://postgres:${PG_PASSWORD}@localhost:5432/agent_data\n\nCREATE EXTENSION IF NOT EXISTS vector;\n\n-- Tier 1: episodic facts from all agents\nCREATE TABLE IF NOT EXISTS technical_docs (\n    id            SERIAL PRIMARY KEY,\n    content       TEXT NOT NULL,\n    metadata      JSONB,\n    embedding     vector(1024),\n    content_hash  TEXT UNIQUE\n);\nCREATE INDEX IF NOT EXISTS technical_docs_embedding_idx\n    ON technical_docs USING ivfflat (embedding vector_cosine_ops);\n\n-- Tier 3: consolidated thematic narratives\nCREATE TABLE IF NOT EXISTS community_summaries (\n    id             SERIAL PRIMARY KEY,\n    content        TEXT NOT NULL,\n    metadata       JSONB,\n    embedding      vector(1024),\n    source_pg_ids  integer[]       -- IDs of technical_docs rows that contributed to this summary\n);\nCREATE INDEX IF NOT EXISTS community_summaries_embedding_idx\n    ON community_summaries USING ivfflat (embedding vector_cosine_ops);\n```\n\n### Neo4j constraints\n\n```cypher\n// Run in Neo4j Browser or cypher-shell\nCREATE CONSTRAINT fact_pg_id IF NOT EXISTS FOR (f:Fact) REQUIRE f.pg_id IS UNIQUE;\nCREATE CONSTRAINT entity_name IF NOT EXISTS FOR (e:Entity) REQUIRE e.name IS UNIQUE;\nCREATE CONSTRAINT summary_pg_id IF NOT EXISTS FOR (s:CommunitySummary) REQUIRE s.pg_id IS UNIQUE;\n```\n\n\u003e **Key schema rule:** Every fact saved must include `\"entities\": [\"Name1\", \"Name2\"]` in its metadata. The saver creates `Entity` nodes and `MENTIONS` edges for each name. Without them the fact is stored and retrievable by vector search, but the consolidation daemon will never cluster it into Tier 3. 
The graph layer is the prerequisite for the semantic layer.\n\nFull schema with all Neo4j labels and relationship types: [`shared-memory/Documentation/schema.md`](shared-memory/Documentation/schema.md)\n\n### Ontology configuration\n\nAll Neo4j label names and relationship types are defined in `ontology.yaml` at the repo root. The defaults match the schema above. Override any value to adapt the graph to your naming conventions without touching Python source — then restart the scripts.\n\n```yaml\n# ontology.yaml — excerpt showing defaults\nlabels:\n  fact: Fact\n  entity: Entity\n  community_summary: CommunitySummary\n  # Provenance layer (Phase A)\n  decision: Decision       # architectural / design decision\n  human: Human             # person who owns a decision\n  ai_agent: AIAgent        # AI tool that assisted\n  project: Project         # project scope\n  activity: Activity       # work session context\n  milestone: Milestone     # significant achievement marker\n\nrelationships:\n  entity_link: MENTIONS          # Fact → Entity, written on save\n  entity_link_alias: REPORTS_ON  # legacy alias accepted by consolidation\n  summarized_by: SUMMARIZED_BY\n  # Provenance relationships (Phase A)\n  was_attributed_to: WAS_ATTRIBUTED_TO  # Decision → Human\n  was_assisted_by: WAS_ASSISTED_BY      # Decision → AIAgent\n  project_of: PROJECT_OF                # Decision → Project\n  supersedes: SUPERSEDES                # Decision → Decision\n  informed_by: INFORMED_BY              # Decision → Decision\n  had_outcome: HAD_OUTCOME              # Decision → (self or Milestone)\n\nconsolidation:\n  density_threshold: 5        # unconsolidated Facts per Entity to trigger synthesis\n```\n\nSet `SMEM_ONTOLOGY_PATH=/path/to/your/ontology.yaml` to load from a non-default location. If the file is absent the stack starts with the built-in defaults — no configuration required for a standard deployment.\n\n---\n\n## 7. 
Inference Backends (llama.cpp)\n\nTwo models serve the embedding and reranking paths. Both are hosted via `llama-server` on separate ports. A third port (5000) hosts the reasoning LLM — typically LM Studio's inference server, but any OpenAI-compatible endpoint works.\n\n```bash\n# BGE-M3 — embedding model, port 8070\nllama-server --model /path/to/bge-m3-Q8_0.gguf --port 8070 --embedding --pooling mean\n\n# BGE-Reranker-v2-m3 — reranking model, port 8071\nllama-server --model /path/to/bge-reranker-v2-m3.gguf --port 8071 --reranking\n```\n\n\u003e **Never call ports 8070 or 8071 directly.** All agents must go through the Hive-Mind Gateway on port 8888. The gateway is what enforces the shared embedding space: if any agent bypasses it, the 1024-dim consistency guarantee no longer holds in practice.\n\n---\n\n## 8. The Hive-Mind Gateway: Why It Exists\n\n### The hardcoded embedder problem\n\nMany tools in this stack are built around the OpenAI API. LM Studio's internal agent tooling and other OpenAI-compatible clients accept an API base URL and call `/v1/embeddings` against it. Without a gateway, the options are all bad:\n\n- Point every tool individually at port 8070: fragile, and it breaks reranking, which lives on 8071\n- Accept that each tool calls whatever model it prefers — produces different vector spaces, destroying cross-agent retrieval\n- Let credentials leak to the real OpenAI API if a tool ignores the local override\n\nThe gateway avoids all three. Every tool points at `http://localhost:8888/v1`. The gateway routes internally:\n\n| Path | Backend |\n|---|---|\n| `/v1/embeddings` | Port 8070 (BGE-M3, 1024-dim) |\n| `/v1/reranking` | Port 8071 (BGE-Reranker-v2-m3) |\n| All other requests | Port 5000 (reasoning LLM) |\n\nOne endpoint. All agents. 
Same vector space.\n\n### From ThreadingHTTPServer to async aiohttp — why streaming required a rewrite\n\nThe first versions of the gateway used Python's stdlib `http.server.ThreadingHTTPServer` with `urllib` for upstream calls. This worked for embedding and reranking (which return quickly), but it broke fundamentally for LLM streaming: `urllib` buffers the entire upstream response before returning. A 4,000-token generation at 20 tokens/second takes 200 seconds, delivered as a single write — that is not streaming.\n\nThe v6 async rewrite replaced the entire implementation with `aiohttp.web` + `aiohttp.ClientSession`. Key properties:\n\n- **True streaming:** `iter_any()` pipes upstream chunks to the client as they arrive. The first token reaches the client in milliseconds.\n- **RFC 7230 hop-by-hop filtering:** `Transfer-Encoding`, `Content-Length`, `Connection`, and other hop-by-hop headers are stripped from both request and response. Forwarding a stale `Content-Length` alongside a chunked stream causes clients to truncate or hang.\n- **`auto_decompress=False`:** aiohttp decompresses upstream responses by default but still forwards `Content-Encoding: gzip`. A client receiving decompressed bytes labelled as compressed double-decompresses — corruption. Disabled so compressed bytes and headers travel together.\n- **`CancelledError` always re-raised:** swallowing it leaves tasks as zombies; graceful shutdown stalls indefinitely.\n- **Self-defusing signal handler:** after the first SIGINT/SIGTERM, both handlers are removed. A second Ctrl+C falls back to Python's default `KeyboardInterrupt` — emergency hard-abort if the drain stalls on a hung backend.\n- **HTTP 503 for unreachable backends, 504 for connect timeout:** correct semantics for client retry logic.\n\n---\n\n## 9. Starting the Full Stack\n\nThe startup sequence is order-dependent. The gateway must be up before any embedding or save operation. 
Starting the gateway also starts the consolidation daemon — you do not need to manage them separately.\n\n**1. Start databases**\n```bash\ndocker compose -f postgres_neo4j_limits.yaml up -d\n```\n\n**2. Start BGE-M3 and BGE-Reranker-v2-m3** (llama-server, ports 8070 and 8071)\n\n**3. Start the reasoning LLM** (LM Studio or any OpenAI-compatible server on port 5000)\n\n**4. Start the Hive-Mind Gateway** — this also starts the consolidation daemon automatically\n```bash\nuv run --with aiohttp python shared-memory/scripts/hive_mind_proxy.py 8888\n```\n\nYou will see three log lines confirming that the proxy, the daemon, and its Postgres listener are up:\n```\nINFO  ### Hive-Mind Proxy on :8888 [aiohttp]\nINFO  Consolidation daemon started (pid XXXXX)\nINFO  Listening for 'new_artifact' notifications...\n```\n\n**5. LM Studio** — start the application; it will pick up the MCP servers from `mcp.json` automatically.\n\nStep 4 is the only manual step required after databases and models are running. The proxy starts the daemon; the daemon registers its Postgres listener; both shut down cleanly when the proxy receives SIGINT or SIGTERM.\n\n**Verify the full stack is healthy:**\n```bash\ncurl http://localhost:8888/health\n# {\"status\":\"ok\",\"embedder\":\"ok\",\"reranker\":\"ok\",\"llm\":\"ok\",\"daemon\":\"running\"}\n```\n\nHTTP 200 means the save/search path (embedder + reranker) is operational. HTTP 503 means at least one critical backend is down — do not attempt saves until resolved. The `llm` and `daemon` fields are informational; their degradation affects consolidation only.\n\n**Daemon watchdog:** the gateway automatically restarts the consolidation daemon if it crashes, with exponential backoff and a circuit breaker (5 crashes / 10 min). If the circuit breaker trips, restart the gateway.\n\n\u003e **Network exposure:** The gateway binds to `127.0.0.1:8888` by default — localhost only. Set `PROXY_BIND=0.0.0.0` in `.env` to opt into all-interfaces binding (e.g. inside an isolated Docker or VM network). 
The coordinator API is unauthenticated — do not expose port 8888 on an untrusted network. See [SECURITY.md](SECURITY.md) for details.\n\n---\n\n## 10. Agent Integration: First-Time Setup\n\nThis section covers where to place files and how to register each agent. For runtime usage (commands and examples) see [§11: Agent Access: CLI and MCP](#11-agent-access-cli-and-mcp).\n\n### Clone the repository and set up the environment\n\n```bash\ngit clone https://github.com/KanenasInGreece/Shared_Memory.git\ncd Shared_Memory\ncp .env.example .env\n# Edit .env — fill in NEO4J_PASSWORD and PG_PASSWORD\n```\n\nCreate a virtual environment and install dependencies:\n\n```bash\npython -m venv .venv\nsource .venv/bin/activate\n\n# Runtime only\npip install -r requirements.txt\n\n# Runtime + test dependencies\npip install -r requirements-dev.txt\n```\n\nActivate the venv in every new shell session before running any script:\n\n```bash\nsource .venv/bin/activate\n```\n\n\u003e **uv users:** all commands in this README use `uv run --with ...` which handles dependencies automatically without a venv. Both approaches work — use whichever fits your workflow.\n\n### Smoke-test the bridge\n\nAfter the full stack is running, verify the bridge works from any shell:\n\n```bash\nuv run --with httpx --with python-dotenv \\\n  python /path/to/Shared_Memory/shared-memory/scripts/memory_bridge.py search \"test\" 3\n```\n\n### Claude Code\n\nClaude Code loads skills from `~/.claude/skills/`. 
Create the skill directory with a symlink so scripts always stay in sync with the repo:\n\n```bash\nmkdir -p ~/.claude/skills/shared-memory\n\n# Symlink scripts — always in sync with the repo\nln -s /path/to/Shared_Memory/shared-memory/scripts ~/.claude/skills/shared-memory/scripts\n\n# Copy SKILL.md (or symlink it too)\ncp shared-memory-skill/shared-memory/SKILL.md ~/.claude/skills/shared-memory/SKILL.md\n```\n\nInvoke in any Claude Code session:\n\n```\n/shared-memory\n```\n\n### Grok\n\nGrok loads skills from `~/.grok/skills/`. Same symlink pattern:\n\n```bash\nmkdir -p ~/.grok/skills/shared-memory\n\n# Symlink scripts — always in sync with the repo\nln -s /path/to/Shared_Memory/shared-memory/scripts ~/.grok/skills/shared-memory/scripts\n\n# Copy SKILL.md\ncp shared-memory-skill/shared-memory/SKILL.md ~/.grok/skills/shared-memory/SKILL.md\n```\n\nInvoke in any Grok session:\n\n```\n/shared-memory\n```\n\n### Codex CLI\n\nCodex CLI loads skills from `~/.codex/skills/` (global) or `.agents/skills/` (project-level). Install globally so the skill is available in every project:\n\n```bash\nmkdir -p ~/.codex/skills/shared-memory\n\n# Symlink scripts — always in sync with the repo\nln -s /path/to/Shared_Memory/shared-memory/scripts ~/.codex/skills/shared-memory/scripts\n\n# Copy SKILL.md\ncp shared-memory/SKILL.md ~/.codex/skills/shared-memory/SKILL.md\n```\n\nInvoke explicitly in any Codex CLI session:\n\n```\n$shared-memory\n```\n\nCodex CLI also supports **implicit invocation**: if the description in SKILL.md's frontmatter matches the task, the skill is loaded automatically without an explicit `$` call.\n\n\u003e **AGENTS.md:** Codex CLI reads `AGENTS.md` at the project root before each session (their equivalent of `CLAUDE.md`). This repo provides `AGENTS.md` alongside `AGENT.md` — both contain the same architectural guidance.\n\n### Gemini CLI\n\nGemini CLI loads skills from `~/.gemini/skills/`. 
Drop the `shared-memory` skill directory there:\n\n```bash\nmkdir -p ~/.gemini/skills\n\n# Copy (standalone — updates require a re-copy)\ncp -r shared-memory-skill/shared-memory ~/.gemini/skills/shared-memory\n\n# Or symlink (always in sync with the repo)\nln -s /path/to/Shared_Memory/shared-memory-skill/shared-memory ~/.gemini/skills/shared-memory\n```\n\nActivate in any Gemini CLI session:\n\n```\n/activate shared-memory\n```\n\n### LM Studio\n\nLM Studio integrates through two files: an MCP config (`mcp.json`) and the MCP server script (`vector-skill.py`).\n\n**Step 1 — Place `vector-skill.py`**\n\nPut it anywhere that stays accessible, for example:\n\n```bash\nmkdir -p ~/ai/shared-memory\ncp vector-skill.py ~/ai/shared-memory/vector-skill.py\n```\n\nLM Studio does not manage this path — you reference it by absolute path in `mcp.json`.\n\n**Step 2 — Configure and place `mcp.json`**\n\nEdit `mcp.json` from this repo: replace all `YOUR_*` placeholders with real values and update the absolute path to `vector-skill.py` in the `rag-orchestrator` entry. Then save it to LM Studio's MCP config location (`~/.lmstudio/mcp.json` on Linux and macOS).\n\n**Step 3 — Configure and load the system prompt**\n\n`system-prompt.md` is the operational contract for the LM Studio model. It defines:\n\n- **Search-first directive** — the model must call `rag-orchestrator` → `hybrid_search_and_rerank` as the first tool on every query. `rag-orchestrator` already includes Neo4j graph expansion internally; no separate graph MCP is needed.\n- **Gateway mandate** — the architectural context explicitly states that all embedding and reranking calls route through port 8888; the model must never reference 8070 or 8071 directly.\n- **Consolidation awareness** — the model knows that every save triggers a Postgres `pg_notify` and that the consolidation daemon (auto-started with the gateway) synthesises Tier 3 summaries. 
It also knows to warn you if the daemon is not running.\n- **Memory cycle** — when to absorb (end of task, new decision) and that `\"entities\"` in save metadata is required for Tier 3 eligibility.\n\nBefore importing, fill in the `[YOUR ...]` placeholder fields at the top (name, location, hardware, OS). Then import in LM Studio: **Settings → System Prompt → Import**.\n\n**Step 4 — Verify**\n\nStart LM Studio. The `rag-orchestrator` MCP server should appear in the tool panel. If it shows an error, confirm the full stack is running (gateway on :8888, databases up) and that there are no remaining `YOUR_*` placeholders in `mcp.json`.\n\n---\n\n## 11. Agent Access: CLI and MCP\n\n| Consumer | Interface | Entry point | Consolidation trigger |\n|---|---|---|---|\n| **Claude Code** | CLI (skill `/shared-memory`) | `~/.claude/skills/shared-memory/scripts/memory_bridge.py` | via coordinator → `pg_notify` |\n| **Codex CLI** | CLI (skill `$shared-memory`) | `~/.codex/skills/shared-memory/scripts/memory_bridge.py` | via coordinator → `pg_notify` |\n| **Grok** | CLI (skill `/shared-memory`) | `~/.grok/skills/shared-memory/scripts/memory_bridge.py` | via coordinator → `pg_notify` |\n| **Gemini CLI** | CLI (skill `/activate shared-memory`) | `~/.gemini/skills/shared-memory/scripts/memory_bridge.py` | via coordinator → `pg_notify` |\n| **LM Studio** | MCP (FastMCP) | `vector-skill.py` → `rag-orchestrator` in `mcp.json` | via coordinator → `pg_notify` |\n| **Any HTTP client** | REST | `POST http://localhost:8888/memory/save\\|search\\|graph` | via coordinator → `pg_notify` |\n\nAll three interfaces (CLI, MCP, and REST) route through the coordinator on port 8888. The coordinator owns all Postgres and Neo4j connections — agents no longer connect to the databases directly. 
The Hive-Mind Gateway must be running before any save or search.\n\n### CLI usage\n\n```bash\n# Check the framework version\npython shared-memory/scripts/memory_bridge.py --version\n# → {\"version\": \"0.3.0\", \"tool\": \"shared-memory-framework\"}\n\n# Search — semantic + rerank + Neo4j expansion\nuv run --with httpx \\\n  python shared-memory/scripts/memory_bridge.py search \"bgem3 interference problem\" 5\n\n# Save — always include source and entities\nuv run --with httpx \\\n  python shared-memory/scripts/memory_bridge.py save \\\n  \"The proxy routes all embeddings through :8888 to enforce 1024-dim consistency.\" \\\n  '{\"source\":\"claude-code\",\"entities\":[\"hive_mind_proxy\",\"BGE-M3\",\"SharedMemory\"]}'\n\n# Save a decision — structured flags, no JSON blob required\nuv run --with httpx \\\n  python shared-memory/scripts/memory_bridge.py save_decision \\\n  --title \"Route all embeddings through the gateway\" \\\n  --decided-by \"Xenofon\" \\\n  --project \"shared-memory\" \\\n  --rationale \"Enforces 1024-dim consistency across all agents; prevents dimension mismatch on retrieval\" \\\n  --assisted-by \"claude-sonnet-4-6\" \\\n  --alternatives \"direct port 8070 calls, per-agent embedding models\" \\\n  --confidence \"high\" \\\n  --entities \"BGE-M3,hive_mind_proxy,SharedMemory\"\n\n# Query decisions — who decided what, with which AI, on which project\nuv run --with httpx \\\n  python shared-memory/scripts/memory_bridge.py graph \\\n  \"MATCH (h:Human)-[:WAS_ATTRIBUTED_TO]-(d:Decision)-[:PROJECT_OF]-\u003e(p:Project)\n   OPTIONAL MATCH (d)-[:WAS_ASSISTED_BY]-\u003e(ai:AIAgent)\n   RETURN h.name, d.title, d.rationale, d.date, p.name, ai.name\n   ORDER BY d.date DESC LIMIT 5\"\n\n# Graph query — entity hub sizes (top referenced concepts)\nuv run --with httpx \\\n  python shared-memory/scripts/memory_bridge.py graph \\\n  \"MATCH (e:Entity)\u003c-[:MENTIONS]-(f:Fact) RETURN e.name, count(f) AS refs ORDER BY refs DESC LIMIT 10\"\n```\n\n### Coordinator 
HTTP API\n\nThe coordinator exposes four endpoints on port 8888. These can be called directly by any HTTP client — agents, scripts, or future tools.\n\n| Method | Path | Body | Response |\n|---|---|---|---|\n| `POST` | `/memory/save` | `{content, metadata, agent_id?, scope?, visibility?}` | `{status, pg_id, neo4j, message}` |\n| `POST` | `/memory/search` | `{query, limit?, scope?, agent_id?}` | `{status, results[]}` |\n| `POST` | `/memory/graph` | `{cypher, params?}` | `{status, records[]}` |\n| `GET` | `/memory/status/{pg_id}` | — | `{pg_id, neo4j, retries, applied_at}` |\n\n\u003e **`/memory/graph` is read-only enforced.** Queries containing `CREATE`, `DELETE`, `DETACH DELETE`, `SET`, `MERGE`, `CALL`, `LOAD CSV`, or `DROP` are rejected with HTTP 400 before reaching Neo4j. Use it for `MATCH`/`RETURN`/`WITH`/`WHERE` exploration only.\n\n**Write acknowledgment:** saves return `200 OK` once the fact is committed to Postgres. The outbox row for Neo4j is written in the same transaction; Neo4j application is asynchronous. Use `GET /memory/status/{pg_id}` to confirm Neo4j application, or pass `?consistency=neo4j` (Phase 2) to block until the outbox row is applied.\n\n### Skill activation\n\n```\n/shared-memory          # Claude Code and Grok\n$shared-memory          # Codex CLI (explicit); also auto-matched via SKILL.md description\n/activate shared-memory # Gemini CLI\n```\n\n---\n\n## 12. The Save Path — From Artifact to Memory\n\nThe save path runs inside the coordinator (`coordinator.py`) on every `POST /memory/save`:\n\n```\ncaller: POST /memory/save {content, metadata, agent_id, scope, visibility}\n       ↓\nembed(content) via :8888 — retry with exponential backoff (4 attempts)\n       ↓ 503 if all retries fail — hard mandate: no save without a vector\nacquire per-entity asyncio.Lock for each name in metadata[\"entities\"]\n       ↓ serializes concurrent writes to the same entity cluster\nBEGIN TRANSACTION\n  INSERT INTO technical_docs ... 
ON CONFLICT (content_hash) DO UPDATE\n       ↓ idempotent: SHA-256 hash prevents duplicates; agent_id/scope/visibility stored\n  INSERT INTO neo4j_outbox (pg_id, cypher_params)\n       ↓ outbox row committed atomically — Phase 2 worker drains this\n  SELECT pg_notify('new_artifact', {\"pg_id\": id})\nCOMMIT  ← 200 OK returned to caller here (Postgres-ack)\n       ↓\nMERGE (f:Fact {pg_id}) in Neo4j  [Phase 1 — direct write; replaced by outbox worker in Phase 2]\nfor each entity name in metadata[\"entities\"]:\n    MERGE (e:Entity {name})\n    MERGE (f)-[:MENTIONS]-\u003e(e)\n       ↓\ndaemon receives NOTIFY → adds pg_id to pending_pg_ids → idle timer starts\n```\n\n\u003e **Hard Mandate — Embedding Integrity:** Saves return 503 if the embedding service is unreachable after all retries. An artifact without a vector is invisible to semantic search — this failure must surface, never be swallowed.\n\n\u003e **Per-entity write serialization:** Concurrent saves targeting the same entity are serialized via `asyncio.Lock[entity_name]`. This prevents duplicate `Entity` hub creation under agent-swarm concurrency and ensures the consolidation daemon sees a consistent cluster. (Phase 4 replaces this with Postgres advisory locks for multi-process deployment.)\n\n\u003e **Cross-DB atomicity:** The outbox row is written in the same Postgres transaction as the fact. If the process crashes after commit, the outbox row survives and the Phase 2 worker replays the Neo4j write on restart. The ADR-001 dangling-Fact window is eliminated in Phase 2.\n\n\u003e **Audit logging:** Every event in the save path — coordinator unreachable, malformed metadata, missing entities, Neo4j sync failures, and successful saves — is optionally logged based on `MEMORY_LOG_LEVEL`. See [§14: Audit Logging](#14-audit-logging).\n\n---\n\n## 13. The Sleep Cycle — Consolidation\n\nThe consolidation daemon is the neocortical layer of the architecture. 
It does not poll — polling would compete with inference workloads that need full GPU headroom. It waits for a Postgres `NOTIFY`, then applies a dual gate before acting: an idle timer and a graph density check.\n\n### Trigger logic\n\n- Each `pg_notify` adds the artifact's `pg_id` to `pending_pg_ids` and resets a 15-minute idle timer.\n- After 15 minutes with no new notifications (idle threshold), consolidation runs.\n- A 45-minute hard backstop fires during continuous ingestion even if notifications never stop — preventing indefinite deferral.\n\n### Per-community consolidation\n\nThe daemon uses the queued `pg_id`s as entry points into Neo4j, not as the consolidation targets themselves. From each entry point it traverses to Entity hubs and counts unconsolidated Fact neighbors. Communities with fewer than 5 unconsolidated Facts wait — sparse neighborhoods are not ready for synthesis.\n\nFor each community that meets the threshold:\n\n1. Fetch the most recent `CommunitySummary` for that Entity from Postgres (if any).\n2. Call the LLM via `:8888 → :5000` to integrate new facts into the existing narrative — **cumulative**, not a new isolated snapshot. This prevents content drift from parallel summary fragments about the same entity.\n3. Re-embed the new narrative via BGE-M3 through `:8888`.\n4. Write to `community_summaries`; create/update `CommunitySummary` node in Neo4j; link source Facts via `SUMMARIZED_BY`; set `Fact.consolidated = true`.\n\n\u003e **Why centroid averaging is not used:** The obvious compression approach — averaging related embeddings into a centroid — collapses the angular distinctions that cosine similarity depends on (Vangara \u0026 Gopinath, 2026, *\"The Geometry of Consolidation\"*). The LLM instead generates new language representing the theme of the cluster, which is then re-embedded from scratch. This produces a new semantic point that did not exist before — not a mathematical blend. 
Retrievable volume grows O(log n) with LLM-based consolidation versus O(n) without it.\n\n### Re-consolidation\n\nThe `consolidated` flag is not permanent. If future ingestion introduces enough unflagged Facts to pull previously consolidated Facts back into a candidate community that meets the density threshold, the entire cluster becomes eligible again.\n\n---\n\n## 14. Audit Logging\n\nThe save path in both `memory_bridge.py` and `vector-skill.py` writes structured JSON log entries to per-tool files. Logging is **off by default** — enable it by setting `MEMORY_LOG_LEVEL` in `.env`.\n\n### Configuration\n\n| Variable | Default | Description |\n|---|---|---|\n| `MEMORY_LOG_LEVEL` | `0` (off) | Controls which events are logged |\n| `MEMORY_LOG_PATH` | `~/.shared-memory/logs` | Directory where log files are written |\n\n### Log levels\n\n| Level | Events logged |\n|---|---|\n| `0` | Nothing (default) |\n| `1` | **Warnings** — save succeeded but `entities` missing; fact is stored but ineligible for consolidation |\n| `2` | Warnings + **errors** — gateway down (save aborted), malformed metadata JSON, non-dict metadata, Neo4j sync failure |\n| `3` | All above + **successful saves** — records `pg_id`, `source`, and entity count on every completed save |\n| `4` | All above + **full content copy** — includes the complete `content` field in each entry; warns if content exceeds 10 KB |\n\n### Per-tool log files\n\nEach entry point writes to its own file. Concurrent writes from CLI agents (all using `memory_bridge.py`) are safe in practice: with `O_APPEND`, Linux positions each `write()` at end-of-file and performs it as a single atomic step on local filesystems. The 4096-byte `PIPE_BUF` figure is strictly the atomicity limit for pipes rather than regular files, but it serves as a conservative bound, and individual log lines are normally well within it. 
Rotation is excluded from the writing tools to eliminate any write/rotate race condition.\n\n| Tool | Log file |\n|---|---|\n| CLI tools / Gemini CLI | `{MEMORY_LOG_PATH}/memory_bridge.log` |\n| LM Studio MCP | `{MEMORY_LOG_PATH}/vector_skill.log` |\n\n### Log format\n\nEach line is a self-contained JSON object:\n\n```json\n{\"ts\": \"2026-05-24T14:32:01.123456\", \"tool\": \"memory_bridge\", \"event\": \"no_entities\", \"pg_id\": 42, \"source\": \"gemini_cli\"}\n{\"ts\": \"2026-05-24T14:35:17.891234\", \"tool\": \"memory_bridge\", \"event\": \"save_success\", \"pg_id\": 43, \"source\": \"gemini_cli\", \"entity_count\": 2}\n{\"ts\": \"2026-05-24T14:41:03.552109\", \"tool\": \"vector_skill\",   \"event\": \"gateway_down\", \"content_preview\": \"Architectural dec...\"}\n```\n\n`event` is one of: `gateway_down`, `bad_metadata`, `bad_metadata_type`, `neo4j_sync_failed`, `no_entities`, `save_success`.\n\n### Daily merge by the consolidation daemon\n\nThe consolidation daemon runs `merge_logs()` once per calendar day on the first 1-second poll of a new day. It uses the logrotate pattern:\n\n1. Rename `memory_bridge.log` → `memory_bridge.log.rotating` and `vector_skill.log` → `vector_skill.log.rotating`. Writing tools create fresh files on next open.\n2. Parse all entries from both rotating files, sort by timestamp, group by calendar date.\n3. For each date, merge with any existing archive and write `shared_memory_YYYY-MM-DD.log.gz` (atomic `os.replace`).\n4. 
Delete the `.rotating` files.\n\nThe `shared_memory_` prefix distinguishes merged archives from agent memory files in the same directory.\n\n```\n~/.shared-memory/logs/\n  memory_bridge.log               ← active, append-only\n  vector_skill.log                ← active, append-only\n  shared_memory_2026-05-23.log.gz ← yesterday, merged\n  shared_memory_2026-05-22.log.gz ← two days ago, merged\n```\n\nIf the daemon is not running, per-tool logs accumulate; entries from multiple days are correctly split into separate dated archives on the next merge run.\n\n---\n\n## 15. Retrieval: Three-Tier Lookup\n\nBoth the MCP tool (`hybrid_search_and_rerank` in `vector-skill.py`) and the CLI (`memory_bridge.py search`) implement the same retrieval chain:\n\n1. **Embed the query** via BGE-M3 through `:8888`.\n2. **Global context scan:** query `community_summaries` — top-1 thematic match. This orients the result set toward the most relevant synthesised narrative.\n3. **Semantic hit:** query `technical_docs` — top-20 candidates by cosine similarity.\n4. **Rerank:** BGE-Reranker-v2-m3 via `:8888` scores all 20 candidates against the original query and returns the top-N by cross-encoder relevance.\n5. **Relational expansion:** for each top-N hit, query Neo4j for related entities and facts — surfaces structural context that vector similarity cannot express.\n\nVector retrieval and graph traversal fail differently. Cosine similarity degrades with semantic crowding. Graph traversal executes structural logic — path length, relationship type, graph density — and does not degrade with interference. As `technical_docs` accumulates interference pressure, facts that become harder to surface through vector retrieval remain fully reachable through graph traversal. The two layers compensate for each other's weaknesses.\n\n---\n\n## 16. LM Studio MCP Configuration\n\nEdit `mcp.json` — replace all `YOUR_*` placeholders with real values and update the absolute path to `vector-skill.py`. 
Save it to `~/.lmstudio/mcp.json` (or wherever LM Studio reads MCP config on your system).\n\nThe `rag-orchestrator` entry runs the custom MCP server for this framework. It is the only memory MCP server needed — it covers semantic retrieval (Tier 1 + Tier 3) and Neo4j graph expansion in a single call, and routes all writes through the coordinator's atomicity and locking guarantees.\n\n\u003e **Why no separate graph MCP?** A direct-bolt Neo4j MCP server (e.g. `neo4j-agent-memory`) bypasses the coordinator entirely: no per-entity locks, no outbox atomicity, no SHA-256 deduplication, and no read-only Cypher guard. Any write it makes produces orphaned Neo4j nodes with no corresponding Postgres record — invisible to semantic search and outside the consolidation pipeline. `rag-orchestrator` already includes Neo4j graph expansion; a separate graph MCP adds ambiguity and write-safety risk without adding capability.\n\n```json\n{\n  \"mcpServers\": {\n    \"rag-orchestrator\": {\n      \"command\": \"uv\",\n      \"args\": [\n        \"run\", \"--with\", \"fastmcp\",\n        \"--with\", \"httpx\",\n        \"--with\", \"psycopg2-binary\",\n        \"--with\", \"neo4j\",\n        \"--with\", \"python-dotenv\",\n        \"python\", \"/path/to/your/vector-skill.py\"\n      ]\n    },\n    \"tavily-mcp\": {\n      \"command\": \"npx\",\n      \"args\": [\"-y\", \"tavily-mcp@latest\"],\n      \"env\": {\n        \"TAVILY_API_KEY\": \"YOUR_TAVILY_API_KEY\"\n      }\n    }\n  }\n}\n```\n\n### Web search — choose your provider\n\nThe framework treats web search as a pluggable MCP slot. The `mcp.json` above uses Tavily; Brave Search is a fully local-key alternative with no per-query metering. 
Use whichever fits your setup — the rest of the stack does not care which one is registered, as long as the tool name you reference in your system prompt matches the MCP server key.\n\n**Tavily** (default — advanced search, image results, 15-result depth):\n```json\n\"tavily-mcp\": {\n  \"command\": \"npx\",\n  \"args\": [\"-y\", \"tavily-mcp@latest\"],\n  \"env\": {\n    \"TAVILY_API_KEY\": \"YOUR_TAVILY_API_KEY\",\n    \"DEFAULT_PARAMETERS\": \"{\\\"include_images\\\": true, \\\"max_results\\\": 15, \\\"search_depth\\\": \\\"advanced\\\"}\"\n  }\n}\n```\nGet a key at [tavily.com](https://tavily.com).\n\n**Brave Search** (alternative — privacy-focused, independent index, no per-query cost on paid plans):\n```json\n\"brave-search\": {\n  \"command\": \"npx\",\n  \"args\": [\"-y\", \"@modelcontextprotocol/server-brave-search\"],\n  \"env\": {\n    \"BRAVE_API_KEY\": \"YOUR_BRAVE_API_KEY\"\n  }\n}\n```\nGet a key at [brave.com/search/api](https://brave.com/search/api).\n\n\u003e **Adjust your system prompt to match.** The `COGNITIVE HIERARCHY` section in `system-prompt.md` references the search tool by its MCP server key. If you switch from `tavily-mcp` to `brave-search`, update that reference so the model knows which tool to call for web lookups.\n\n---\n\n## 17. Testing\n\nAll tests are fully mocked — no live database or gateway required. 
Run from the project root.\n\n```bash\n# Full suite\nuv run --with pytest --with pytest-asyncio --with fastmcp \\\n       --with psycopg2-binary --with httpx --with neo4j \\\n       pytest tests/ -v\n\n# Single file\nuv run --with pytest --with pytest-asyncio --with fastmcp \\\n       --with psycopg2-binary --with httpx --with neo4j \\\n       pytest tests/test_vector_skill.py\n\n# Single test case\nuv run --with pytest --with pytest-asyncio --with fastmcp \\\n       --with psycopg2-binary --with httpx --with neo4j \\\n       pytest tests/test_vector_skill.py::test_mcp_save_artifact_success\n\n# Skip LLM calls in consolidation tests\nMOCK_LLM=1 uv run --with pytest --with pytest-asyncio --with fastmcp \\\n           --with psycopg2-binary --with httpx --with neo4j \\\n           pytest tests/test_consolidation_e2e.py\n```\n\n| Test file | Coverage |\n|---|---|\n| `test_memory_bridge.py` | Embedding hard mandate, save idempotency, search + rerank + fallback, Neo4j expansion |\n| `test_vector_skill.py` | MCP tool contracts (save, search, health check, reasoning trace) |\n| `test_consolidation_e2e.py` | Consolidation cycle with mock LLM, density threshold, community summary write, `source_pg_ids` populated |\n| `test_logging.py` | `_append_log` level filtering, per-tool file routing, content size warnings; `save_artifact` logging at each event type; `merge_logs` sort order, multi-tool merge, malformed line handling, daily archive merge, logrotate cleanup |\n\n---\n\n## 18. Open Problems\n\n### Stored Prompt Injection (partially mitigated)\n\nWeb-retrieved content enters the same ingestion pipeline as internally authored facts. A crafted document can embed near a legitimate fact cluster and — after consolidation — contaminate `community_summaries` as trusted context for all agents.\n\n**Implemented:** `[BEGIN/END RETRIEVED FACTS]` delimiters and a \"treat as DATA\" preamble in consolidation prompts harden the Tier 3 synthesis path. 
Tier 1 retrieval (raw facts in agent context windows) remains unprotected.\n\n**Planned:** ingestion boundary sanitisation; counterfactual simulation pass. Full details in [SECURITY.md](SECURITY.md).\n\n**Do not ingest external or web-retrieved content at volume before implementing the remaining defences.**\n\n### Agent Authentication (planned — Phase 2C)\n\n`agent_id` is self-reported in the request body. Any caller can impersonate any agent. `AGENT_TOKENS` env var and `Authorization: Bearer \u003ctoken\u003e` middleware are designed; implementation is the next security PR. See `.env.example` for the token format.\n\n### Entity Resolution\n\nThe consolidation daemon clusters facts by entity names supplied by callers. Two callers using different names for the same concept (`\"hive_mind_proxy\"` vs `\"Hive-Mind Gateway\"`) produce separate clusters and separate community summaries. As the agent population grows, entity resolution — merging synonymous nodes — becomes a real structural problem. Not implemented.\n\n### Consolidation Quality\n\nThe daemon trusts the LLM to synthesise accurately. There is no quantitative signal for whether a generated narrative is a sharp thematic abstraction or a lossy blur. Without a quality measure, tuning the density threshold or summarisation prompt is guesswork.\n\n### Density Threshold Calibration\n\n`density_threshold` in `ontology.yaml` (default 5) is architecturally necessary but empirically uncalibrated. Configurable without code changes; the right value for a given corpus requires empirical tuning.\n\n### Observability\n\nPer-save audit logging (§14) records gateway failures, missing entities, and Neo4j sync errors. What it does not provide is a system-level signal for whether consolidation is improving retrieval quality over time.\n\n---\n\n## 19. 
Development Roadmap — Multi-Agent Safe Workstation\n\nThis framework is actively evolving toward a workstation where any number of AI agents can read and write shared memory concurrently without corrupting each other's state, impersonating each other, or poisoning shared narratives. The table below tracks where that transition stands.\n\n### Completed\n\n| Phase | Milestone | Status |\n|---|---|---|\n| **Foundation** | Three-tier storage (Postgres + Neo4j), BGE-M3 gateway, consolidation daemon, save/search/graph CLI | ✅ Done |\n| **Consolidation pipeline** | LISTEN/NOTIFY trigger, explicit entity contract, gateway routing for re-embedding, cumulative narrative synthesis | ✅ Done |\n| **Coordinator** | asyncpg connection pool, per-entity `asyncio.Lock`, outbox pattern — all Postgres and Neo4j I/O centralised, ADR-001 cross-DB atomicity risk eliminated | ✅ Done |\n| **Concurrency hardening** | FOR UPDATE SKIP LOCKED, atomic retry increment, single UNWIND batch query, acquired-lock tracking, ON CONFLICT upsert for community_summaries, embedding refresh on re-save, LISTEN reconnect, event-loop non-blocking poll | ✅ Done |\n| **Security baseline** | Read-only Cypher guard, localhost-only bind (PROXY_BIND opt-in), opaque error responses, bounded limit, ONT label validation at startup, prompt injection delimiters | ✅ Done |\n| **Configurable ontology — Path A** | All Neo4j labels and relationship types in `ontology.yaml`; ONT singleton with validation; falls back to hardcoded defaults; density threshold configurable | ✅ Done |\n| **Agent integration** | Claude Code, Grok, Gemini CLI, LM Studio (MCP), Codex CLI — all 5 agents live, SKILL.md carries YAML frontmatter for implicit Codex invocation, `AGENTS.md` project context file added | ✅ Done |\n| **Schema migrations** | Migration runner; 001 (multi-agent schema: agent_id, scope, visibility, neo4j_outbox); 002 (concurrency hardening: unique index on community_summaries, covering index on outbox); 003 (source provenance: 
`source_pg_ids integer[]` on community_summaries, back-fill from metadata) | ✅ Done |\n| **Provenance layer — Phase A** | PROV-O-inspired ontology: 6 new node labels (`Decision`, `Human`, `AIAgent`, `Project`, `Activity`, `Milestone`) and 8 provenance relationships (`WAS_ATTRIBUTED_TO`, `WAS_ASSISTED_BY`, `WAS_GENERATED_BY`, `PROJECT_OF`, `ACTED_ON_BEHALF_OF`, `SUPERSEDES`, `INFORMED_BY`, `HAD_OUTCOME`). Coordinator ingress validates `type:decision` saves (rejects missing `decided_by` / `project` / `rationale` before the row touches the outbox WAL). Outbox dispatches decision rows to a dedicated `_apply_decision_outbox_row` that materialises the full PROV-O subgraph in a single atomic Neo4j session. Plain `Fact` saves unchanged. | ✅ Done |\n| **Provenance layer — Phase B** | `save_decision` subcommand in `memory_bridge.py` (named flags — `--title`, `--decided-by`, `--project`, `--rationale` required; `--assisted-by`, `--alternatives`, `--confidence`, `--entities` optional) and `save_decision` MCP tool in `vector-skill.py`. `build_decision_metadata()` pure helper. `--version` flag added to `memory_bridge.py`. | ✅ Done |\n| **Three-test fixes (v0.3.1)** | Retrieval visibility: search results carry `tier`, `score_normalized` (sigmoid), `matched_entities`, structured `graph_context` list. Consolidation history: `summary_history JSONB` column on `community_summaries` (migration 004) — prior summary appended before each `DO UPDATE`, capped at 20. Lineage: `source_ref` optional metadata key flows from coordinator to Neo4j `Fact.source_ref` property. 14 new tests added. `schema.md` \"appends new rows\" inaccuracy corrected. 
| ✅ Done |\n| **Provenance layer — Phase D (v0.3.3)** | Four named query shortcuts in `memory_bridge.py query \u003ctemplate\u003e`: `who-decided`, `agent-decisions`, `retrospectives`, `why-to-check`. Filter values sanitised before Cypher interpolation. Raw `graph` subcommand preserved for custom traversals. SKILL.md Task 3 restructured to document both paths. 7 new tests — 91 total. | ✅ Done |\n\n### In Progress / Planned\n\n| Phase | Milestone | Notes |\n|---|---|---|\n| **Provenance layer — Phase C** | Retrospective layer: `HAD_OUTCOME` edge written as a dated edge property (not a node) so lineage is preserved without node explosion; Why-To loop — agents query past retrospectives before executing new work in the same area | Phase B is the prerequisite. |\n| **Provenance layer — Phase E** | Separate `pruning_loop.py` on a slow cron; enforces the information foraging heuristic (save if retrieval utility + decision impact \u003e storage cost); `type:decision` and `decision_impact`-flagged rows are unconditionally shielded; plain facts compete on retrieval frequency × age | Decoupled from the consolidation daemon — different cadence. |\n| **Agent authentication (Phase 2C)** | `AGENT_TOKENS` env var; `Authorization: Bearer \u003ctoken\u003e` middleware; server-side `agent_id` enforcement; scope isolation by verified identity | Next security PR. Format documented in `.env.example`. Requires coordinated rollout across agents. |\n| **Ontology as graph (Path B)** | Bootstrap `(:Class)` nodes + `SCO` relationships from `ontology.yaml` into Neo4j on startup; replace `ONT.*` string constants with startup-cached dict read from graph; enables live ontology inspection and Neosemantics (n10s) forward compatibility | Path A is the prerequisite ✅. Does not replace `ontology.yaml` — yaml stays the human-editable source; graph is a materialised copy. 
|\n| **Entity type enrichment** | Apply Neo4j multi-label to distinguish entity kinds — `:Entity:Person`, `:Entity:System`, `:Entity:Tool`, `:Entity:Decision` etc. — without breaking existing queries | Path A + Path B are the prerequisites. Enables richer graph traversal and type-aware consolidation clustering. |\n| **Entity resolution** | Detect and merge synonymous Entity nodes (`\"hive_mind_proxy\"` ≡ `\"Hive-Mind Gateway\"`); maintain a canonical name + alias set; re-link Fact nodes on merge | The entity contract (explicit caller-supplied names) makes this tractable. Implementation is a background reconciliation job, not a save-path change. |\n| **Horizontal agent expansion** | Packaging guides and integration templates for additional agent types (VS Code extensions, Claude Desktop, any MCP-capable tool, REST-only agents) | The coordinator's HTTP API is already agent-agnostic. New agents require packaging only — no backend changes. |\n| **Ingestion boundary sanitisation** | Trust-tier tagging for web-retrieved content; strip instructional patterns; quarantine external facts before Tier 3 promotion | Security prerequisite for ingesting external content at volume. |\n| **Counterfactual simulation pass** | Before committing a consolidated narrative, verify every claim traces to a source Fact node; reject narratives that introduce unsourced claims | Completes the stored-injection defence. |\n| **Python packaging** | Rename `shared-memory/` → `shared_memory/`, add `__init__.py` files and `pyproject.toml`; replace `sys.path` hack in `vector-skill.py` with `from shared_memory.scripts.ontology import ONT` | Low urgency; enables clean imports when the codebase grows. |\n\n---\n\n## 20. 
References\n\n- **AI Memory \u0026 Cognition: The Architect's Playbook** (Vishakha Gupta, ApertureData, May 2026) — Proposes the KMC Blueprint (Knowledge · Memory · Context) and the three diagnostic tests used in the [§1 Vision](#1-the-vision-one-brain-many-agents) section: Retrieval, Consolidation, and Lineage. [aperturedata.io/resources/ai-memory-cognition-the-architects-playbook](https://www.aperturedata.io/resources/ai-memory-cognition-the-architects-playbook)\n- **The Geometry of Forgetting** (Barman et al., 2026) — *Exposing the Dimensionality Illusion*. arXiv:2604.06222\n- **The Geometry of Consolidation** (Vangara \u0026 Gopinath, 2026) — NeurIPS 2026 submission. Proves centroid averaging collapses retrieval identity.\n- **Active Dreaming Memory (ADM)** (Dudekula Kasim Vali, 2025) — Biologically-Inspired Episodic Consolidation. engrXiv preprint, DOI: 10.31224/5919\n- **Complementary Learning Systems** (McClelland, McNaughton \u0026 O'Reilly, 1995) — *Psychological Review* 102(3):419–457\n\n---\n\n*Neo4j · PostgreSQL/pgvector · BGE-M3 · aiohttp · FastMCP · Docker*\n\n---\n\n## Connect\n\nIf this framework is useful to you, or you are building something in the same space — local AI memory, multi-agent architectures, or knowledge graph systems — I would be glad to connect.\n\nI write about these projects and the ideas behind them on LinkedIn and X. Follow for articles, updates, and the reasoning behind architectural decisions that do not fit in a README.\n\n- **LinkedIn:** [linkedin.com/in/xsmotsenigos](https://www.linkedin.com/in/xsmotsenigos/)\n- **X:** [x.com/xsmotsenigos](https://x.com/xsmotsenigos/)\n\n---\n\nCopyright 2026 Xenofon S. Motsenigos. 
Licensed under the [Apache License, Version 2.0](LICENSE).\nIf you reuse or build on this work, attribution to the original author is appreciated.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkanenasingreece%2Fshared_memory","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkanenasingreece%2Fshared_memory","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkanenasingreece%2Fshared_memory/lists"}