{"id":31900953,"url":"https://github.com/aaronsb/knowledge-graph-system","last_synced_at":"2025-10-13T12:45:05.037Z","repository":{"id":318367854,"uuid":"1070011302","full_name":"aaronsb/knowledge-graph-system","owner":"aaronsb","description":null,"archived":false,"fork":false,"pushed_at":"2025-10-06T20:02:47.000Z","size":911,"stargazers_count":0,"open_issues_count":11,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-06T20:34:27.378Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aaronsb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-05T04:33:16.000Z","updated_at":"2025-10-06T20:02:50.000Z","dependencies_parsed_at":"2025-10-06T20:34:31.373Z","dependency_job_id":"0a4c4ebd-3ad0-4c47-bdfa-9ed7d378bf56","html_url":"https://github.com/aaronsb/knowledge-graph-system","commit_stats":null,"previous_names":["aaronsb/knowledge-graph-system"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/aaronsb/knowledge-graph-system","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fknowledge-graph-system","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fknowledge-graph-system/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fknowledge-graph-system/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fknowledge-graph-system/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aaronsb","download_url":"https://codeload.github.com/aaronsb/knowledge-graph-system/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aaronsb%2Fknowledge-graph-system/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279015063,"owners_count":26085643,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-13T02:00:06.723Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-13T12:45:00.498Z","updated_at":"2025-10-13T12:45:05.029Z","avatar_url":"https://github.com/aaronsb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Knowledge Graph System\n\n**Transform documents into queryable concept networks. Not retrieval - understanding.**\n\n## What This Does\n\nThis system extracts concepts and relationships from documents, building a persistent knowledge graph you can explore semantically. Instead of searching for similar text, you discover how ideas connect across your entire corpus.\n\nFeed it research papers, meeting notes, code commits, or philosophical texts. The system identifies concepts, understands their relationships, and preserves evidence trails back to source material. Query by meaning, not keywords. Traverse connections between ideas, not just similarity scores.\n\n**The difference matters:** Traditional RAG retrieves text chunks that match your query. Knowledge graphs reveal how concepts *relate* - what enables what, what contradicts what, what emerges from what. The graph grows smarter with each document, automatically connecting new concepts to existing knowledge.\n\n**Built on:** Apache AGE (PostgreSQL graph extension), FastAPI REST architecture, modular LLM providers (OpenAI/Anthropic), TypeScript client tooling, and Model Context Protocol integration.\n\n## How It Works\n\nDocuments flow through smart chunking that respects natural boundaries\n  ↓\nLLM extraction identifies concepts, relationships, and evidence quotes\n  ↓\nGraph construction in PostgreSQL with Apache AGE extension\n  ↓\nVector embeddings enable semantic search across concepts\n  ↓\nQuery interface reveals connections and provides provenance\n\n**The iterative pattern:** Each chunk queries recent concepts before processing. The LLM sees what the graph already knows, enabling cross-chunk relationship detection. Early chunks populate the graph. Later chunks connect to existing concepts. Hit rates climb from 0% to 60%+ as the graph learns your domain.\n\n**Multi-document synthesis:** Concepts automatically merge across files when semantically similar. A term mentioned in chapter 1 links to the same concept in chapter 10, even across different documents in the same ontology.\n\n## Why This Matters\n\nYou've invested time (and API tokens) extracting knowledge from documents. Traditional systems rebuild that understanding on every query. This system *remembers*.\n\n**Persistent concept extraction** → Ideas become first-class entities with labels, search terms, and relationships\n\n**Relationship modeling** → Concepts ENABLE, SUPPORT, CONTRADICT, IMPLY each other with confidence scores\n\n**Graph traversal** → Explore connections between ideas across document boundaries\n\n**Evidence provenance** → Every concept links to source quotes with paragraph references\n\n**Cross-ontology enrichment** → Ingest related documents into different ontologies; shared concepts bridge them naturally\n\n**Time as emergent property** → Causal relationships (CAUSES, RESULTS_FROM, ENABLES) create observable time arrows without explicit timestamps\n\n## Quick Start\n\n**Prerequisites:** Docker, Python 3.11+, Node.js 18+\n\n```bash\n# 1. Setup infrastructure\n./scripts/setup.sh\n\n# 2. Configure AI provider\n./scripts/configure-ai.sh\n\n# 3. Start API server\nsource venv/bin/activate\nuvicorn src.api.main:app --reload --port 8000\n\n# 4. Install TypeScript client\ncd client \u0026\u0026 npm install \u0026\u0026 npm run build \u0026\u0026 ./install.sh \u0026\u0026 cd ..\n\n# 5. Ingest documents\nkg ingest file document.txt --ontology \"My Research\"\n\n# 6. Query concepts\nkg search query \"recursive patterns\"\nkg ontology list\nkg database stats\n```\n\n**For Claude Desktop/Code integration:** See [MCP Setup Guide](docs/guides/MCP_SETUP.md)\n\n## Live Example\n\nAfter ingesting project commit history and pull requests into separate ontologies:\n\n```bash\n# Search across both ontologies\nkg search query \"Apache AGE migration\"\n\n# Result: \"Apache AGE Migration\" concept\n#   - 6 evidence instances\n#   - Found in: \"Knowledge Graph Project History\" (commits)\n#   - Found in: \"Knowledge Graph Project Pull Requests\" (PRs)\n#   - Relationships:\n#       ENABLES → RBAC Capabilities\n#       PREVENTS → Dual Database Complexity\n#       RESULTS_FROM → Unified Architecture\n```\n\nThe system understood commits and PRs describe the same architectural change from different perspectives. It merged evidence, enriched relationships, and revealed the strategic narrative without explicit linking.\n\n## When To Use This\n\n**Research exploration** → Navigate philosophical texts by concept relationships, not linear reading\n\n**Codebase understanding** → Trace architectural decisions across commits, PRs, and documentation\n\n**Meeting analysis** → Extract action items, decisions, and dependencies across discussion threads\n\n**Knowledge synthesis** → Discover connections between documents you didn't know were related\n\n**Historical reconstruction** → Build timelines from causal relationships (CAUSES, PRECEDES, EVOLVES_INTO)\n\n**Financial analysis** → Track entities and relationships across transaction records\n\n**Travel journals** → Map locations, experiences, and insights across trip logs\n\nThe pattern generalizes: any structured record content can become a queryable knowledge graph.\n\n## Architecture Highlights\n\n- **Apache AGE (PostgreSQL extension)** - Graph database with openCypher query support and production RBAC\n- **Unified PostgreSQL architecture** - Graph data, job queue, and application state in one database\n- **Job approval workflow** - Pre-ingestion cost estimates, manual or auto-approval, lifecycle management\n- **Modular AI providers** - Swap between OpenAI, Anthropic, or implement custom extractors\n- **Content deduplication** - SHA-256 hashing prevents reprocessing identical documents\n- **Ontology management** - Group related documents; rename or delete with cascading job cleanup\n- **Vector search + graph traversal** - Semantic similarity finds concepts, relationships explain connections\n- **Evidence preservation** - Every concept links to source quotes with document and paragraph references\n- **TypeScript client** - Unified CLI and future MCP server mode for multi-agent access\n- **Dry-run capabilities** - Preview ingestion operations before committing API tokens\n\n## What Makes This Different\n\nNot a vector database. Not a new embedding model. A synthesis:\n\n**LLM-powered extraction** → Understands concepts and relationships, not just word patterns\n\n**Graph storage** → Models how ideas connect, not just where they appear\n\n**Evidence-based retrieval** → Provides source quotes with provenance, not isolated chunks\n\n**Persistent knowledge** → Builds understanding over time, not ephemeral query-time synthesis\n\n**Multi-dimensional querying** → Semantic search finds concepts, graph traversal explains relationships\n\n**Emergent temporal structure** → Causal relationships create observable time arrows without explicit ordering\n\n## Technology Stack\n\n- **PostgreSQL 16 + Apache AGE** - Graph database with openCypher support\n- **FastAPI** - Async REST API server with job queue\n- **Python 3.11+** - Ingestion pipeline, LLM extraction, graph operations\n- **TypeScript/Node.js 18+** - Unified client (CLI + future MCP mode)\n- **OpenAI / Anthropic** - Modular LLM provider abstraction\n- **Docker Compose** - Infrastructure orchestration\n\n## Current Status\n\n**Working (Phase 1):**\n- ✅ Apache AGE graph database with vector search\n- ✅ FastAPI REST API with async job queue\n- ✅ TypeScript CLI (`kg` command)\n- ✅ Background processing with progress tracking\n- ✅ Content-based deduplication (SHA-256)\n- ✅ Cost tracking and pre-ingestion estimates\n- ✅ Job approval workflow with auto-approve option\n- ✅ Ontology management (create, rename, delete with cascade)\n- ✅ Dry-run mode for directory ingestion\n\n**Planned (Phase 2):**\n- [ ] MCP server mode in unified TypeScript client\n- [ ] Graph query endpoints in API\n- [ ] Real-time updates (WebSocket/SSE)\n- [ ] API authentication \u0026 authorization\n- [ ] Rate limiting \u0026 request validation\n\n**Future Explorations:**\n- [ ] Advanced graph algorithms (PageRank, community detection)\n- [ ] Web UI for visual graph exploration\n- [ ] Export to GraphML/JSON formats\n- [ ] Incremental updates (avoid reprocessing)\n- [ ] Phrase-based path finding between concepts\n\n## Learn More\n\nNavigate the documentation by purpose:\n\n**Getting Started:**\n- [Quick Start Guide](docs/guides/QUICKSTART.md) - Get running in 5 minutes\n- [MCP Setup Guide](docs/guides/MCP_SETUP.md) - Claude Desktop/Code integration\n- [AI Provider Configuration](docs/guides/AI_PROVIDERS.md) - OpenAI, Anthropic, or custom\n\n**Understanding the System:**\n- [Architecture Overview](docs/architecture/ARCHITECTURE.md) - How components fit together\n- [Concept Deep Dive](docs/reference/CONCEPT.md) - Why knowledge graphs vs RAG\n- [Enrichment Journey](docs/reference/ENRICHMENT_JOURNEY.md) - How the graph learns from multiple perspectives\n- [Concepts \u0026 Terminology](docs/reference/CONCEPTS_AND_TERMINOLOGY.md) - Ontologies, stitching, pruning, integrity\n\n**Using the System:**\n- [Examples \u0026 Demos](docs/guides/EXAMPLES.md) - Real queries with actual results\n- [Backup \u0026 Restore](docs/guides/BACKUP_RESTORE.md) - Protecting your token investment\n- [Documentation Index](docs/README.md) - Browse all documentation by category\n\n**Technical Details:**\n- [ADR-016: Apache AGE Migration](docs/architecture/ADR-016-apache-age-migration.md) - Why PostgreSQL + AGE\n- [ADR-014: Job Approval Workflow](docs/architecture/ADR-014-job-approval-workflow.md) - Ingestion lifecycle\n- [Development Guide](CLAUDE.md) - For contributors and developers\n\n## Project Structure\n\n```\nknowledge-graph-system/\n├── src/api/              # FastAPI REST server\n│   ├── lib/              # Shared ingestion library\n│   │   ├── ai_providers.py    # Modular LLM abstraction\n│   │   ├── llm_extractor.py   # Concept extraction\n│   │   ├── age_client.py      # Apache AGE operations\n│   │   └── ingestion.py       # Chunk processing\n│   ├── routes/           # REST API endpoints\n│   ├── services/         # Job queue, scheduler, deduplication\n│   └── workers/          # Background ingestion workers\n│\n├── client/               # TypeScript unified client\n│   └── src/\n│       ├── cli/          # CLI commands\n│       ├── api/          # HTTP client\n│       └── mcp/          # MCP server mode (future)\n│\n├── scripts/              # Management utilities\n│   ├── setup.sh          # Infrastructure setup\n│   ├── start-api.sh      # Start API server\n│   └── configure-ai.sh   # AI provider config\n│\n├── schema/\n│   └── init.sql          # Apache AGE schema\n│\n└── docs/                 # Documentation\n    ├── architecture/     # ADRs and design\n    ├── guides/          # User guides\n    ├── reference/       # Concepts and terminology\n    └── development/     # Dev journals\n```\n\n## Contributing\n\nThis is an experimental exploration of knowledge graphs, LLM extraction, and semantic understanding. Feedback, issues, and contributions welcome.\n\n## Acknowledgments\n\nBuilt with:\n- [Apache AGE](https://age.apache.org/) - PostgreSQL graph extension\n- [Model Context Protocol](https://modelcontextprotocol.io/) - LLM integration standard\n- [OpenAI](https://openai.com/) / [Anthropic](https://anthropic.com/) - LLM providers\n- [FastAPI](https://fastapi.tiangolo.com/) - Modern Python API framework\n\n---\n\n*Not just retrieval. Understanding.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronsb%2Fknowledge-graph-system","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faaronsb%2Fknowledge-graph-system","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faaronsb%2Fknowledge-graph-system/lists"}