{"id":24244467,"url":"https://github.com/kuzudb/graph-rag-workshop","last_synced_at":"2025-06-20T12:15:23.693Z","repository":{"id":259258284,"uuid":"877020622","full_name":"kuzudb/graph-rag-workshop","owner":"kuzudb","description":"Graph RAG workshop using Kùzu and LanceDB for hybrid RAG","archived":false,"fork":false,"pushed_at":"2024-11-27T20:15:47.000Z","size":4311,"stargazers_count":8,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-10T02:52:27.403Z","etag":null,"topics":["graph-rag","graphdb","hybrid-rag","kuzu","kuzudb","lancedb","vector-search"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kuzudb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-23T00:24:08.000Z","updated_at":"2024-12-29T05:31:52.000Z","dependencies_parsed_at":"2024-10-23T23:31:51.874Z","dependency_job_id":"2bc7a633-7640-4c34-8c4c-f76d61ba427c","html_url":"https://github.com/kuzudb/graph-rag-workshop","commit_stats":null,"previous_names":["kuzudb/google-devfest-graph-rag","kuzudb/graph-rag-workshop"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuzudb%2Fgraph-rag-workshop","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuzudb%2Fgraph-rag-workshop/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuzudb%2Fgraph-rag-workshop/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/kuzudb%2Fgraph-rag-workshop/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kuzudb","download_url":"https://codeload.github.com/kuzudb/graph-rag-workshop/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":233971372,"owners_count":18759212,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graph-rag","graphdb","hybrid-rag","kuzu","kuzudb","lancedb","vector-search"],"created_at":"2025-01-14T22:49:00.328Z","updated_at":"2025-01-14T22:49:02.630Z","avatar_url":"https://github.com/kuzudb.png","language":"Python","readme":"# Graph RAG and Hybrid RAG Workshop\n\nWorkshop on showing the benefits of Graph RAG and its combination with Vector RAG (Hybrid RAG).\n\nThe following stack is used:\n\n- Graph database: [Kùzu](https://kuzudb.com/)\n- Vector database: [LanceDB](https://lancedb.com/)\n- LLM prompting: [ell](https://docs.ell.so/), a language model programming framework\n- Embedding model: OpenAI `text-embedding-3-small`\n- Entity \u0026 relationship extraction: [LlamaIndex](https://docs.llamaindex.ai/) + OpenAI `gpt-4o-mini`\n- Generation model: OpenAI `gpt-4o-mini`\n- Reranking: Cohere [reranker](https://docs.cohere.com/v2/reference/rerank)\n\nThe system we'll be building has the following high-level architecture:\n\n![](./assets/hybrid-rag.png)\n\n## Dataset\n\nThe dataset used in this workshop is the [BlackRock founders dataset](./data/blackrock), which\nare three small text files containing information about the founders of the asset management firm\nBlackRock.\n\nThe 
aim of the workshop is to show how we can build a hybrid RAG system that utilizes a graph\ndatabase and a vector database to answer questions about the dataset.\n\n## Setup environment\n\nWe will be using the Python API of Kùzu and a combination of scripts that utilize the required\ndependencies.\n\n### `uv` package manager\n\nIt's recommended to use Astral's [`uv` package manager](https://docs.astral.sh/uv/) to manage both\nPython and its dependencies. You can install the required version of Python (3.12) using `uv` with\nthe following command:\n\n```bash\nuv python install 3.12\n```\n\nAll the dependencies are indicated in the `pyproject.toml` file and the associated `uv.lock` file\nprovided in this repo. Simply sync the dependencies to your local virtual environment with the\nfollowing command:\n\n```bash\n# Sync dependencies and allow uv to create a local .venv\nuv sync\n\n# Run scripts\nuv run crud.py\nuv run graph_rag.py\nuv run vector_rag.py\nuv run hybrid_rag.py\n```\n\n\n### If using system Python\n\n\u003e [!NOTE]\n\u003e Alternatively you can use your system's Python installation and pip to install the dependencies\n\u003e via `requirements.txt`.\n\n```bash\n# Activate virtual environment\npython -m venv .venv\nsource .venv/bin/activate\npython -m pip install -r requirements.txt\n\n# Run scripts\npython crud.py\npython graph_rag.py\npython vector_rag.py\npython hybrid_rag.py\n```\n\n## Description of steps\n\n### 1. 
Construct the graph\n\nThe script `crud.py` extracts entities and relationships from the provided\n[BlackRock founders dataset](./data/blackrock) and constructs a graph that is stored in Kùzu.\n\n```bash\nuv run crud.py\n```\n\nThe script `crud.py` does the following:\n- Chunk the text, generate embeddings, and store the embeddings in [LanceDB](https://lancedb.com/),\nan embedded vector database\n- Use the LlamaIndex framework and its\n[property graph index](https://docs.llamaindex.ai/en/stable/module_guides/indexing/lpg_index_guide/)\nto extract entities and relationships from the unstructured text\n- Store the extracted entities and relationships in Kùzu, an embedded graph database\n- Augment the graph with additional entities and relationships obtained from external sources\n\n### 2. Traditional RAG (via vector search)\n\nThe script `vector_rag.py` runs retrieval-augmented generation (RAG) that leverages semantic\n(vector) search. To retrieve from the vector database, the script first embeds the question and then\nsearches for the nearest neighbors using cosine similarity. It then retrieves the context (chunks of\ntext) that are most similar to the question. The script finally uses the LLM to generate a response\nusing the retrieved context.\n\n```bash\nuv run vector_rag.py\n```\n\n### 3. Graph RAG\n\nThe script `graph_rag.py` runs retrieval-augmented generation (RAG) that leverages the graph\ndatabase to answer questions. To retrieve from the graph database, the script first translates\nthe question into a Cypher query, which is then executed against the graph database. The retrieved\nentities and relationships are then used as context to generate a response using the LLM.\n\n```bash\nuv run graph_rag.py\n```\n\n### 4. Hybrid RAG\n\nThe script `hybrid_rag.py` runs retrieval-augmented generation (RAG) that leverages *both* the\nvector database and the graph database. 
The vector and graph retrieval contexts are concatenated\ntogether and passed to the LLM to generate a response.\n\n```bash\nuv run hybrid_rag.py\n```\n\n## Workshop exercises\n\nIn this section, we'll go through the workshop exercises.\n\n### 1. Traditional RAG\n\nYou can use the script `vector_rag.py`, that performs naive chunking of the text, creates vector\nembeddings of the chunks, and stores them in a vector database.\n\nWe'll answer the following questions using traditional RAG.\n\n#### Q1: Who are the founders of BlackRock? Return the names as a numbered list.\n```\nThe founders of BlackRock are:\n\n1. Larry Fink\n2. Robert Kapito\n3. Susan Wagner\n```\n\n\u003e [!NOTE]\n\u003e The above list of the BlackRock founders is not exhaustive. Five more cofounders exist, as is\n\u003e revealed in a simple Google search. To address this, you can augment the graph (in the next step) with the additional information to improve the relevance and factual accuracy of the response.\n\n#### Q2: Where did Larry Fink graduate from?\n```\nLarry Fink graduated from UCLA, where he earned a BA in political science in 1974 and an MBA in 1976.\n```\n\n#### Q3: When was Susan Wagner born?\n```\nThe relevant context does not provide information about Susan Wagner's birth date. Therefore, I cannot answer the question about when Susan Wagner was born.\n```\n\n\u003e [!NOTE]\n\u003e In the given text data, Susan Wagner's birth date is not mentioned. You can use a graph\n\u003e with the additional information added in it to improve the relevance and factual accuracy of the response.\n\n#### Q4: How did Larry Fink and Rob Kapito meet?\n```\nLarry Fink and Rob Kapito first met while working at First Boston in 1979, where Kapito served in the Public Finance department. This initial meeting laid the foundation for their future partnership when they later co-founded BlackRock in 1988.\n```\n\n### 2. 
Graph RAG\n\nYou can use the script `graph_rag.py` to answer the following questions using graph RAG. During\ngraph construction, we added additional cofounders of BlackRock to the graph, as well as some of\ntheir birth dates.\n\nJust like earlier, we'll answer the following questions using Graph RAG.\n\n#### Q1: Who are the founders of BlackRock? Return the names as a numbered list.\n```\n1. Barbara Novick  \n2. Hugh Frater  \n3. Keith Anderson  \n4. Ralph Schlosstein  \n5. Robert Kapito  \n6. Larry Fink  \n7. Ben Golub  \n8. Susan Wagner  \n```\n\n\u003e [!NOTE]\n\u003e Because we augmented the graph with external knowledge, the list of founders is now exhaustive.\n\n#### Q2: Where did Larry Fink graduate from?\n\n```\nLarry Fink graduated from UCLA.\n```\n\n\u003e [!NOTE]\n\u003e Unlike vector retrieval-based RAG, the graph only stored the name of the university that Larry Fink\n\u003e graduated from, so the LLM only has this context to answer the question as it did.\n\n#### Q3: When was Susan Wagner born?\n\n```\nSusan Wagner was born on May 26, 1961.\n```\n\n\u003e [!NOTE]\n\u003e The graph stored the birth dates of Larry Fink and Susan Wagner, so the LLM was able to answer the\n\u003e question correctly where the vector search did not.\n\n#### Q4: How did Larry Fink and Rob Kapito meet?\n\n```\nThe relevant context does not provide any information about how Larry Fink and Rob Kapito met. Therefore, I cannot answer the question.\n```\n\n\u003e [!NOTE]\n\u003e The graph did not store any information about how Larry Fink and Rob Kapito met, so the LLM\n\u003e was unable to answer the question in this case.\n\n\n### 3. Hybrid RAG\n\nYou can use the script `hybrid_rag.py` to answer the following questions using hybrid RAG. The two\nretrieval contexts (vector and graph) are concatenated together and passed to the LLM to generate\na response.\n\n#### Q1: Who are the founders of BlackRock? 
Return the names as a numbered list.\n```\nThe founders of BlackRock are:\n\n1. Barbara Novick  \n2. Hugh Frater  \n3. Keith Anderson  \n4. Ralph Schlosstein  \n5. Robert Kapito  \n6. Larry Fink  \n7. Ben Golub  \n8. Susan Wagner  \n```\n\n#### Q2: Where did Larry Fink graduate from?\n\n```\nLarry Fink graduated from UCLA, where he earned both a BA in political science in 1974 and an MBA in 1976.\n```\n\n#### Q3: When was Susan Wagner born?\n\n```\nSusan Wagner was born on May 26, 1961.\n```\n\n#### Q4: How did Larry Fink and Rob Kapito meet?\n\n```\nLarry Fink and Rob Kapito first met while working at First Boston in 1979, where Kapito served in the Public Finance department. This meeting marked the beginning of their professional relationship, which later led them to become partners in founding BlackRock in 1988.\n```\n\n## Additional exercises\n\n- Try to answer questions about data that is not present in the original text. For example, you\ncan try to ask \"When was Barbara Novick born?\"\n- Try to answer questions that require some commonsense reasoning based on the text provided.\nFor example, \"Does Susan Wagner still work at BlackRock?\"\n- Try to answer questions that require reasoning over multiple sentences. For example, \"Which of\nBlackRock's cofounders also worked at First Boston, and where were they born?\"\n\n\n## Conclusions\n\nThe hybrid RAG methodology (with reranking) provides factually accurate\nresponses in all four cases. In cases where the graph didn't contain the answer, the vector search\nprovided relevant context that allowed the LLM to generate a response. In cases where the graph\ncontained the answer but the raw text didn't, hybrid RAG was able to rerank the results from the\ngraph and vector search in a way that, on average, provided relevant responses.\n\nNote that the hybrid RAG system is not perfect. 
If the information is not present in the text (either explicitly\nor implicitly), it cannot provide an answer to the question because the LLM cannot\nreason over the required information to formulate a response.\n\nThe key takeaways are:\n- Graphs can be a helpful tool for factual (extractive) question-answering tasks in RAG\n- Traditional RAG (vector-based) is useful for abstractive question-answering tasks, where the\ninformation is not explicitly stated in the exact words of the question\n- Just as data quality is of paramount importance in any retrieval system, for hybrid or Graph\nRAG, the quality of the graph (entities and relationships) is crucial to the quality of the responses\ngenerated\n\nFeel free to clone/fork this repo and try out the workflow on your own datasets!","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuzudb%2Fgraph-rag-workshop","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkuzudb%2Fgraph-rag-workshop","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuzudb%2Fgraph-rag-workshop/lists"}