{"id":17526850,"url":"https://github.com/ksachdeva/langchain-graphrag","last_synced_at":"2025-04-05T09:04:46.397Z","repository":{"id":250345492,"uuid":"832364803","full_name":"ksachdeva/langchain-graphrag","owner":"ksachdeva","description":"GraphRAG / From Local to Global: A Graph RAG Approach to Query-Focused Summarization","archived":false,"fork":false,"pushed_at":"2024-10-31T16:15:22.000Z","size":525,"stargazers_count":101,"open_issues_count":2,"forks_count":13,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-23T09:09:10.947Z","etag":null,"topics":["graphrag","langchain","llm","rag"],"latest_commit_sha":null,"homepage":"https://langchain-graphrag.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ksachdeva.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-07-22T21:56:48.000Z","updated_at":"2025-03-20T10:40:17.000Z","dependencies_parsed_at":"2024-08-02T22:57:26.158Z","dependency_job_id":"5c3d3a5f-bf09-4acf-9be5-fa6fc4ea3a65","html_url":"https://github.com/ksachdeva/langchain-graphrag","commit_stats":null,"previous_names":["ksachdeva/langchain-graphrag"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksachdeva%2Flangchain-graphrag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksachdeva%2Flangchain-graphrag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksachdeva%2Flangchain-graphrag/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ksachdeva%2Flangchain-graphrag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ksachdeva","download_url":"https://codeload.github.com/ksachdeva/langchain-graphrag/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247312068,"owners_count":20918344,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["graphrag","langchain","llm","rag"],"created_at":"2024-10-20T15:02:34.083Z","updated_at":"2025-04-05T09:04:46.355Z","avatar_url":"https://github.com/ksachdeva.png","language":"Python","readme":"# GraphRAG - Powered by LangChain\n\n[![Documentation build status](https://readthedocs.org/projects/langchain-graphrag/badge/?version=latest\n)](https://langchain-graphrag.readthedocs.io/en/latest/)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)\n\n\nThis is an implementation of GraphRAG as described in\n\nhttps://arxiv.org/pdf/2404.16130\n\nFrom Local to Global: A Graph RAG Approach to Query-Focused Summarization\n\nOfficial implementation by the authors of the paper is available at:\n\nhttps://github.com/microsoft/graphrag/\n\n## Guides\n\n- [Text Unit Extraction](https://langchain-graphrag.readthedocs.io/en/latest/guides/text_units_extraction/)\n- [Graph Extraction](https://langchain-graphrag.readthedocs.io/en/latest/guides/graph_extraction/)\n\n## Why re-implementation 🤔?\n\n### Personal Preference\n\nWhile I generally prefer utilizing and refining 
existing implementations, as re-implementation often isn't optimal, I decided to take a different approach after encountering several challenges with the official version.\n\n### Issues with the Official Implementation\n\n- Lacks integration with popular frameworks like LangChain, LlamaIndex, etc.\n- Limited to OpenAI and AzureOpenAI models, with no support for other providers.\n\n### Why rely on established frameworks like LangChain?\n\nUsing an established foundation like LangChain offers numerous benefits. It abstracts various providers, whether related to LLMs, embeddings, vector stores, etc., allowing for easy component swapping without altering core logic or adding complex support. More importantly, a solid foundation like this lets you focus on the problem's core logic rather than reinventing the wheel.\n\nLangChain also supports advanced features like batching and streaming, provided your components align with the framework’s guidelines. For instance, using chains (LCEL) allows you to take full advantage of these capabilities.\n\n### Modularity \u0026 extensibility-focused design\n\nThe APIs are designed to be modular and extensible. You can replace any component with your own implementation as long as it implements the required interface.\n\nGiven the nature of the domain, this is important for conducting experiments by swapping out various components.\n\n## Install\n\n```bash\npip install langchain-graphrag\n```\n\n## Projects\n\nThere are two projects in this repo:\n\n### `langchain_graphrag`\n\nThis is the core library that implements the GraphRAG paper. It is built on top of the `langchain` library.\n\n#### Example code for local search using the API\n\nBelow is a snippet taken from the `example-app` to show the style of API\nand extensibility offered by the library.\n\nAlmost all the components (classes/functions) can be replaced with your own\nimplementations. 
The library is designed to be modular and extensible.\n\n```python\n# Reload the vector Store that stores\n# the entity name \u0026 description embeddings\nentities_vector_store = ChromaVectorStore(\n    collection_name=\"entity_name_description\",\n    persist_directory=str(vector_store_dir),\n    embedding_function=make_embedding_instance(\n        embedding_type=embedding_type,\n        model=embedding_model,\n        cache_dir=cache_dir,\n    ),\n)\n\n# Build the Context Selector using the default\n# components; You can supply the various components\n# and achieve as much extensibility as you want\n# Below builds the one using default components.\ncontext_selector = ContextSelector.build_default(\n    entities_vector_store=entities_vector_store,\n    entities_top_k=10,\n    community_level=cast(CommunityLevel, level),\n)\n\n# Context Builder is responsible for taking the\n# result of Context Selector \u0026 building the\n# actual context to be inserted into the prompt\n# Keeping these two separate further increases\n# extensibility \u0026 maintainability\ncontext_builder = ContextBuilder.build_default(\n    token_counter=TiktokenCounter(),\n)\n\n# load the artifacts\nartifacts = load_artifacts(artifacts_dir)\n\n# Make a langchain retriever that relies on\n# context selection \u0026 builder\nretriever = LocalSearchRetriever(\n    context_selector=context_selector,\n    context_builder=context_builder,\n    artifacts=artifacts,\n)\n\n# Build the LocalSearch object\nlocal_search = LocalSearch(\n    prompt_builder=LocalSearchPromptBuilder(),\n    llm=make_llm_instance(llm_type, llm_model, cache_dir),\n    retriever=retriever,\n)\n\n# it's a callable that returns the chain\nsearch_chain = local_search()\n\n# you could invoke\n# print(search_chain.invoke(query))\n\n# or, you could stream\nfor chunk in search_chain.stream(query):\n    print(chunk, end=\"\", flush=True)\n```\n\n\n#### Clone the repo\n\n```bash\ngit clone 
https://github.com/ksachdeva/langchain-graphrag.git\n```\n\n#### Open in VSCode devcontainer (Recommended)\n\nThe devcontainer will install all the dependencies.\n\n#### If not using devcontainer\n\nMake sure you have `rye` installed. See https://rye.astral.sh/\n\n```bash\n# sync all the dependencies\nrye sync\n```\n\n### `examples/simple-app`\n\nThis is a simple `typer`-based CLI app.\n\nIn terms of configuration, it is limited to the command-line options it exposes.\n\nThat said, the way the core library is written, you can easily replace any component with\nyour own implementation, i.e. your choice of LLM, embedding models, etc., and even some of\nthe classes, as long as they implement the required interface.\n\n**Note**:\n\nMake sure to rename `.env.example` to `.env` if you are using OpenAI or AzureOpenAI\nand fill in the necessary environment variables.\n\n#### Indexing\n\n```bash\nrye run simple-app-indexer --llm-type azure_openai --llm-model gpt-4o --embedding-type azure_openai --embedding-model text-embedding-3-small\n```\n\n```bash\n# To see more options\n$ rye run simple-app-indexer --help                  \nUsage: main.py indexer index [OPTIONS]                                                                                            \n                                                                                                                                   \n╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ *  --input-file                                     FILE                          [default: None] [required]                    │\n│ *  --output-dir                                     DIRECTORY                     [default: None] [required]                    │\n│ *  --cache-dir                                      DIRECTORY                     [default: None] [required]                    │\n│ *  --llm-type                                       
[openai|azure_openai|ollama]  [default: None] [required]                    │\n│ *  --llm-model                                      TEXT                          [default: None] [required]                    │\n│ *  --embedding-type                                 [openai|azure_openai|ollama]  [default: None] [required]                    │\n│ *  --embedding-model                                TEXT                          [default: None] [required]                    │\n│    --chunk-size                                     INTEGER                       Chunk size for text splitting [default: 1200] │\n│    --chunk-overlap                                  INTEGER                       Chunk overlap for text splitting              │\n│                                                                                   [default: 100]                                │\n│    --ollama-num-context                             INTEGER                       Context window size for ollama model          │\n│                                                                                   [default: None]                               │\n│    --enable-langsmith      --no-enable-langsmith                                  Enable Langsmith                              │\n│                                                                                   [default: no-enable-langsmith]                │\n│    --help                                                                         Show this message and exit.                   
│\n╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n```\n\n#### Global Search\n\n```bash\nrye run simple-app-global-search --llm-type azure_openai --llm-model gpt-4o --query \"What are the top themes in this story?\"\n```\n\n```bash\n$ rye run simple-app-global-search --help\nUsage: main.py query global-search [OPTIONS]\n                                                                                                                                            \n╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ *  --output-dir                                     DIRECTORY                     [default: None] [required]                              │\n│ *  --cache-dir                                      DIRECTORY                     [default: None] [required]                              │\n│ *  --llm-type                                       [openai|azure_openai|ollama]  [default: None] [required]                              │\n│ *  --llm-model                                      TEXT                          [default: None] [required]                              │\n│ *  --query                                          TEXT                          [default: None] [required]                              │\n│    --level                                          INTEGER                       Community level to search [default: 2]                  │\n│    --ollama-num-context                             INTEGER                       Context window size for ollama model [default: None]    │\n│    --enable-langsmith      --no-enable-langsmith                                  Enable Langsmith [default: no-enable-langsmith]         │\n│    --help                                                                         Show this message and exit.                             
│\n╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n```\n\n#### Local Search\n\n```bash\nrye run simple-app-local-search --llm-type azure_openai --llm-model gpt-4o --query \"Who is Scrooge, and what are his main relationships?\" --embedding-type azure_openai --embedding-model text-embedding-3-small\n```\n\n```bash\n$ rye run simple-app-local-search --help\nUsage: main.py query local-search [OPTIONS]                                                                                                 \n                                                                                                                                             \n╭─ Options ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮\n│ *  --output-dir                                     DIRECTORY                     [default: None] [required]                              │\n│ *  --cache-dir                                      DIRECTORY                     [default: None] [required]                              │\n│ *  --llm-type                                       [openai|azure_openai|ollama]  [default: None] [required]                              │\n│ *  --llm-model                                      TEXT                          [default: None] [required]                              │\n│ *  --query                                          TEXT                          [default: None] [required]                              │\n│    --level                                          INTEGER                       Community level to search [default: 2]                  │\n│ *  --embedding-type                                 [openai|azure_openai|ollama]  [default: None] [required]                              │\n│ *  --embedding-model                                TEXT                          [default: None] [required]        
                      │\n│    --ollama-num-context                             INTEGER                       Context window size for ollama model [default: None]    │\n│    --enable-langsmith      --no-enable-langsmith                                  Enable Langsmith [default: no-enable-langsmith]         │\n│    --help                                                                         Show this message and exit.                             │\n╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯\n```\n\nSee `examples/simple-app/README.md` for more details.\n\n## Roadmap / Things to do\n\nThe state of the library is far from complete. \n\nHere are some of the things that need to be done to make it more useful:\n\n- [ ] Add more guides\n- [ ] Document the APIs\n- [ ] Add more tests","funding_links":[],"categories":["Python"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fksachdeva%2Flangchain-graphrag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fksachdeva%2Flangchain-graphrag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fksachdeva%2Flangchain-graphrag/lists"}