{"id":15020151,"url":"https://github.com/jallermax/knowledge-nexus","last_synced_at":"2025-04-09T19:50:25.232Z","repository":{"id":249673830,"uuid":"819614193","full_name":"Jallermax/knowledge-nexus","owner":"Jallermax","description":"GraphRAG for Second Brain. Ingest knowledge -\u003e build knowledge graphs -\u003e Query relevant knowledge | Explore connections","archived":false,"fork":false,"pushed_at":"2025-02-10T15:11:59.000Z","size":2942,"stargazers_count":14,"open_issues_count":5,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-23T21:45:50.282Z","etag":null,"topics":["graphrag","graphs","knowledge-graph","notion","second-brain"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jallermax.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-24T21:42:09.000Z","updated_at":"2025-03-03T06:03:15.000Z","dependencies_parsed_at":"2024-07-30T00:52:15.842Z","dependency_job_id":"b8df7e7d-db7b-418c-8988-aefe2d4b08d8","html_url":"https://github.com/Jallermax/knowledge-nexus","commit_stats":null,"previous_names":["jallermax/knowledge-nexus"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jallermax%2Fknowledge-nexus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jallermax%2Fknowledge-nexus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jallermax%2Fknowledge-nexus/releases","manifests_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jallermax%2Fknowledge-nexus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jallermax","download_url":"https://codeload.github.com/Jallermax/knowledge-nexus/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248103807,"owners_count":21048238,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graphrag","graphs","knowledge-graph","notion","second-brain"],"created_at":"2024-09-24T19:54:38.975Z","updated_at":"2025-04-09T19:50:25.208Z","avatar_url":"https://github.com/Jallermax.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Knowledge Nexus: Your AI-Powered Personal Knowledge Discovery Engine\n[![CI](https://github.com/Jallermax/knowledge-nexus/actions/workflows/ci.yml/badge.svg)](https://github.com/Jallermax/knowledge-nexus/actions/workflows/ci.yml)\n\n## 🛠 Getting Started\n\n### Running Data Ingestion:\n\n#### Using Python environment\n\n1. install neo4j\n2. install python\n3. make and configure `.env` in the root directory from `.env.example`\n4. adjust options in `config/config.yaml` if necessary\n5. `pip install -r requirements.txt`\n6. `python main.py`\n\n#### Alternative: Using docker-compose \n\n1. install docker and docker-compose\n2. make and configure `.env` in the root directory from `.env.example`\n3. adjust options in `config/config.yaml` if necessary\n4. 
run `docker-compose up -d --build` from the root\n\u003c/br\u003e\n\n\u003e ⚠️ Current cache limitations:\n\u003e - **Notion-API cache:** Designed for session scope caching, using FS cache with long TTL will prevent fetching updated pages\n\u003e - **Processed pages and links cache:** Designed for rapid test and development. Prevents sync or removal of already processed and cached pages and links from the graph\n\n### Running Q\u0026A app:\n\n1. Prerequisites: running Neo4j instance with processed data\n2. `pip install -r requirements.txt`\n3. `python -m streamlit run app_st.py`\n\n## 🌟 Project Overview\n\nKnowledge Nexus is an advanced personal knowledge management system that transforms the way individuals organize,\nprocess, and discover insights from their digital content. By leveraging the power of AI and graph databases, this\nproject addresses the challenge of information overload and disconnected data silos that many knowledge workers face in\ntoday's digital landscape.\n\nUnlike traditional note-taking or knowledge management tools that rely heavily on manual organization, Knowledge Nexus\nautomates the process of extracting key concepts, generating insights, and creating meaningful connections across your\npersonal knowledge base.\n\n1. Data Ingestion:\n![ingestion visualization](/docs/ingestion.png)\n2. 
Talking to your data graph: \n![ingestion visualization](/docs/streamlit.png)\n\n### High-level architecture:\n```mermaid\nflowchart TB\n\n    subgraph IngestionApp [\"Ingestion Module\"]\n        direction LR\n\t    subgraph Ingestion [\"Pluggable Data providers\"]\n\t        direction TB\n\t        NotionProvider\u003e\"Notion Provider\"]\n\t        TodoistProvider\u003e\"Todoist Provider\"]\n\t        WebProvider\u003e\"Web Scrapper\"]\n\t        CustomProviders\u003e\"Custom Providers\"]\n\t    end\n\n\t    subgraph ProcessingPipeline [\"Processing Pipeline\"]\n\t        direction LR\n\t        Chunking[[\"Content Chunking\"]]\n\t        EntityExtraction[[\"Entity Extraction\"]]\n\t        TopicModeling[[\"Topic Modeling\"]]\n\t        Clusterization[[\"Clusterization\"]]\n\t        Embedding[[\"Embedding Generation\"]]\n\t    end\n\n\t    UnifiedData[/\"Raw Graph\n\t    (based on data structure)\"/]\n\n\t    KnowledgeGraph[/\"Enriched Knowledge Graph\n\t    (structure + semantic relations)\"/]\n\n    end\n    subgraph QnAApp [\"Q\u0026A Module\"]\n        QueryProcessor[\"Query Processor\"]\n        StreamlitInterface[\"Streamlit Interface\"]\n    end\n\n    subgraph DataSources [\"External Data Sources\"]\n        Notion[\"Notion API\"]\n        APIs[\"Other APIs/resources\"]\n    end\n\n    subgraph Flow [\"User Flow\"]\n\t    User(\"👤 2. User\")\n\t    Prepare(\"🧠📩 1. 
Prepare knowledge\")\n    end\n    Neo4j[(Neo4j Graph Database)]\n\n\n    %% Connections\n    Prepare--\u003e|Initiate Data Ingestion|Ingestion\n    DataSources--\u003e|Fetching Data|Ingestion\n    Ingestion--\u003eUnifiedData\n    UnifiedData--\u003eProcessingPipeline\n    ProcessingPipeline--\u003eKnowledgeGraph\n    KnowledgeGraph--\u003eNeo4j\n    QueryProcessor\u003c--\u003eNeo4j\n    StreamlitInterface\u003c--\u003eQueryProcessor\n    User\u003c--\u003e|Asks question|StreamlitInterface\n\n\n    %% Legend\n    subgraph Legend\n        Implemented[\"Implemented\"]\n        Future[\"Planned\"]\n    end\n\n    %% Styling\n    classDef implemented fill:#90EE90,stroke:#333,color:#000,stroke-width:2px;\n    classDef future fill:#FFB6C1,stroke:#333,color:#000,stroke-width:2px,stroke-dasharray: 5 5;\n    classDef transparent fill:#E6E6FA,fill-opacity:0.1,stroke:#333,stroke-width:5px;\n\n    class Prepare,Notion,NotionProvider,Chunking,Embedding,QueryProcessor,KnowledgeGraph,Neo4j,User,UnifiedData,StreamlitInterface implemented;\n    class Todoist,TodoistProvider,APIs,CustomProviders,Web,WebProvider,EntityExtraction,TopicModeling,Clusterization future;\n    class Implemented implemented;\n    class Future future;\n    class IngestionApp,QnAApp,Flow transparent;\n```\n\n## 🎯 Key Challenges Addressed\n\n1. **Information Overload**: Knowledge Nexus cuts through the noise by automatically extracting key entities and\n   insights from various content sources, helping you focus on what's important.\n\n2. **Manual Processing Overhead**: Traditional tools require significant manual effort to organize and connect\n   information. Knowledge Nexus automates this process, saving you time and cognitive effort.\n\n3. **Limited Contextual Understanding**: While tools like Obsidian or Roam Research rely on explicit links, Knowledge\n   Nexus uses AI to understand semantic and topical relationships, creating a richer, more nuanced knowledge graph.\n\n4. 
**Disconnected Data Silos**: By importing and processing data from various sources into a single, interconnected\n   knowledge graph, Knowledge Nexus bridges the gaps between your different information repositories.\n\n5. **Difficulty in Discovering New Connections**: The AI-powered system can uncover non-obvious relationships between\n   different pieces of information, potentially leading to new insights or ideas that you might have missed.\n\n## 🚀 Key Features\n\n- **Multi-Source Data Integration**: Import content from Notion, Pocket, web pages, and more (extensible architecture\n  for adding new sources).\n- **AI-Powered Entity and Topic Extraction**: Automatically identify and extract key entities and topics from processed\n  content.\n- **Intelligent Insight Generation**: Leverage AI to generate concise insights from your personal knowledge management system (PKMS).\n- **Semantic Knowledge Graph Construction**: Build a comprehensive, interconnected graph of entities, topics, and\n  content using Neo4j, reflecting not just explicit links but semantic relationships.\n- **Contextual Querying and Exploration**: Easily retrieve relevant content and explore connections within your\n  knowledge graph.\n- **Personalized Knowledge Assistant**: Tailored to your specific needs and preferences, helping you find tools,\n  frameworks, and best practices aligned with your views.\n\n## 📊 Project Status and Roadmap\n\n### ✅ Implemented\n- Modular pipeline for data ingestion, processing, and graph building, with configurable caching of processed data.\n- [Notion API](https://developers.notion.com/reference/get-database) integration with configurable request caching: ingests documents from a Notion knowledge base (all pages, or only those under a specified root page). Repeated ingestion processes only updated pages. 
\n- Basic Graph Construction: Creating graph connections based on knowledge base organizational structure and explicit page mentions.\n- Semantic Search: Implemented content embeddings for advanced search capabilities.\n- Basic Streamlit app for querying the graph and visualizing connections.\n\n\u003cdetails\u003e \n   \u003csummary\u003eClick to see supported Notion Links 🔗\u003c/summary\u003e\n   \u003cbr\u003e\n\n| Type                                     | Parse Markdown Text | Parse References | Recursive Parsing |\n|------------------------------------------|:-------------------:|:----------------:|:-----------------:|\n| **Page Properties**                      |\n| Title                                    |          ✅          |        ✅         |         ✅         |\n| Rich Text                                |          ✅          |        ✅         |         ✅         |\n| Select                                   |          ✅          |       N/A        |        N/A        |\n| Status                                   |          ✅          |       N/A        |        N/A        |\n| Multi-select                             |          ✅          |       N/A        |        N/A        |\n| Number                                   |          ✅          |       N/A        |        N/A        |\n| Date                                     |          ✅          |       N/A        |        N/A        |\n| People                                   |          ✅          |       N/A        |        N/A        |\n| Files                                    |          ✅          |        ❌         |        N/A        |\n| Checkbox                                 |          ✅          |       N/A        |        N/A        |\n| URL                                      |          ✅          |        ✅         |         ❌         |\n| Email                                    |          ✅          |       N/A        |        N/A        |\n| Phone Number                     
        |          ✅          |       N/A        |        N/A        |\n| Formula                                  |          ✅          |       N/A        |        N/A        |\n| Relation                                 |          ✅          |        ✅         |         ✅         |\n| Rollup                                   |          ✅          |       N/A        |        N/A        |\n| Created Time                             |          ✅          |       N/A        |        N/A        |\n| Created By                               |          ✅          |       N/A        |        N/A        |\n| Last Edited Time                         |          ✅          |       N/A        |        N/A        |\n| Last Edited By                           |          ✅          |       N/A        |        N/A        |\n| Unique ID                                |          ✅          |       N/A        |        N/A        |\n| Verification                             |          ✅          |       N/A        |        N/A        |\n| **Database Properties**                  |\n| Title                                    |          ✅          |        ❌         |         ❌         |\n| Rich Text                                |         N/A         |       N/A        |        N/A        |\n| Select                                   |          ❌          |       N/A        |        N/A        |\n| Multi-select                             |          ❌          |       N/A        |        N/A        |\n| Date                                     |         N/A         |       N/A        |        N/A        |\n| People                                   |         N/A         |       N/A        |        N/A        |\n| Files                                    |         N/A         |       N/A        |        N/A        |\n| Checkbox                                 |         N/A         |       N/A        |        N/A        |\n| URL                                      |         N/A       
  |       N/A        |        N/A        |\n| Email                                    |         N/A         |       N/A        |        N/A        |\n| Phone Number                             |         N/A         |       N/A        |        N/A        |\n| Formula                                  |         N/A         |       N/A        |        N/A        |\n| Relation                                 |          ❌          |        ❌         |         ❌         |\n| Rollup                                   |         N/A         |       N/A        |        N/A        |\n| Created Time                             |          ❌          |       N/A        |        N/A        |\n| Created By                               |          ❌          |       N/A        |        N/A        |\n| Last Edited Time                         |          ❌          |       N/A        |        N/A        |\n| Last Edited By                           |          ❌          |       N/A        |        N/A        |\n| **Blocks**                               |\n| Paragraph                                |          ✅          |        ✅         |         ✅         |\n| Heading 1                                |          ✅          |        ✅         |         ✅         |\n| Heading 2                                |          ✅          |        ✅         |         ✅         |\n| Heading 3                                |          ✅          |        ✅         |         ✅         |\n| Bulleted List Item                       |          ✅          |        ✅         |         ✅         |\n| Numbered List Item                       |          ✅          |        ✅         |         ✅         |\n| To-do                                    |          ✅          |        ✅         |         ✅         |\n| Toggle                                   |          ✅          |        ✅         |         ✅         |\n| Code                                     |          ✅          |        ✅         |      
  N/A        |\n| Quote                                    |          ✅          |        ✅         |         ✅         |\n| Callout                                  |          ✅          |        ✅         |         ✅         |\n| Mention (except mentions of page blocks) |          ✅          |        ✅         |        N/A        |\n| Equation                                 |          ✅          |       N/A        |        N/A        |\n| Bookmark                                 |          ✅          |        ✅         |        N/A        |\n| Image                                    |          ✅          |        ❌         |        N/A        |\n| Video                                    |          ✅          |        ❌         |        N/A        |\n| Audio                                    |          ✅          |        ❌         |        N/A        |\n| File                                     |          ✅          |        ❌         |        N/A        |\n| PDF                                      |          ✅          |        ❌         |        N/A        |\n| Embed                                    |          ✅          |        ✅         |        N/A        |\n| Link Preview                             |          ✅          |        ✅         |        N/A        |\n| Divider                                  |          ✅          |       N/A        |        N/A        |\n| Table of Contents                        |          ✅          |       N/A        |        N/A        |\n| Breadcrumb                               |          ✅          |       N/A        |        N/A        |\n| Column List                              |          ✅          |       N/A        |        N/A        |\n| Column                                   |          ✅          |       N/A        |        N/A        |\n| Synced Block                             |          ✅          |        ✅         |         ✅         |\n| Template                                 |          ✅   
       |        ✅         |         ✅         |\n| Link to Page                             |          ✅          |        ✅         |         ✅         |\n| Table                                    |          ✅          |       N/A        |        N/A        |\n| Table Row                                |          ✅          |       N/A        |        N/A        |\n| Child Page                               |          ✅          |        ✅         |         ✅         |\n| Child Database (except linked and views) |          ✅          |        ✅         |         ✅         |\n| **Comments**                             |          ❌          |        ❌         |         ❌         |\n\n\u003c/details\u003e\n\n### 🛠️ In Development\n- Multi-Source Data Integration: Expanding beyond Notion to include Pocket, web pages, and more. \nMake these integrations easy to plug in. \n- Semantic Layer: Adding connections based on topics and ideas using semantic entity extraction\n  - Use core entity/node types (Page, Database, Topic, Person, Location) as well as domain-specific (Project, Task, Tool, Goal)  \n- Node Clustering: Implementing clustering for better organization and insight discovery.\n- Comprehensive RAG Mechanism: Developing an advanced retrieval-augmented generation system. \u003cdetails\u003e \u003csummary\u003eClick to see draft implementation details\u003c/summary\u003e\n  1. Generate query questions to the graph from user requests\n  2. Retrieve semantically similar pages\n  3. Fetch close neighbors of these pages based on semantic proximity\n  4. Provide LLM with context from the closest pages (semantically)\n  5. 
Visualize the graph showing found pages, their semantic scores, neighbors, connections, and topic clusters\u003c/details\u003e\n\n- Achieve 90%+ test coverage\n\n### 🔮 Future Plans\n- Streamlit chat interface with dashboard for visualizing insights and connections (InfraNodus-like).\n- Add cross-source coreference resolution to merge the same entities from different sources (leverage string matching, embedding similarity, and context analysis).\n  - disambiguate entities with the same name but different meanings. Consider entity context and graph relationships.\n- Add evaluation mechanism (langfuse?) for entity extraction and graph building with different models, contexts, and prompts.\n- Add evaluation mechanism (RAGAS?) for RAG with different embedding models, query generations, and retrieval flows.\n- Dynamic Topic and Cluster Recalculation: Efficiently update topics and clusters upon ingestion of new sources.\n- Advanced Visualization: Develop more sophisticated options for exploring the knowledge graph.\n- Self-hosted LLM Options: Provide alternatives to OpenAI's API for enhanced privacy.\n- Enhanced Personalization: Implement adaptive learning of user preferences and interests.\n- Implement token-cost [estimation](https://github.com/AgentOps-AI/tokencost).\n\n## 👥 Who Is It For?\n\nKnowledge Nexus is primarily designed for individual users who:\n\n- Deal with large amounts of information from various sources\n- Seek to uncover new insights and connections within their knowledge base\n- Want to reduce the cognitive overhead of manual knowledge management\n- Are looking for a personal research assistant to aid in complex tasks or decision-making\n\n## 📚 Resources and inspirations\n\n- [Awesome-LLM-KG](https://github.com/RManLuo/Awesome-LLM-KG) - A collection of papers and resources about unifying\n  large language models (LLMs) and knowledge graphs (KGs).\n- [GraphRAG](https://github.com/microsoft/graphrag) -Microsoft's GraphRAG research paper and 
implementation\n\n## 🤝 Contributing\n\nCurrently, Knowledge Nexus is a personal project, but ideas and suggestions are welcome! Feel free to open an issue for\ndiscussion or submit a pull request with proposed changes.\n\n## 🔒 Privacy and Data Handling\n\nKnowledge Nexus is designed with privacy in mind. All data is stored locally on your machine. The only external\nservice currently used is OpenAI's API for AI processing, which is subject to OpenAI's privacy policy and data handling practices.\nAdapters for other LLMs, including self-hosted ones, are planned.\n\n---\n\nEmpower your mind, uncover hidden insights, and navigate your personal sea of knowledge with unprecedented ease. Welcome\nto Knowledge Nexus – where your information comes to life!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjallermax%2Fknowledge-nexus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjallermax%2Fknowledge-nexus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjallermax%2Fknowledge-nexus/lists"}