https://github.com/gusye1234/nano-graphrag

A simple, easy-to-hack GraphRAG implementation
gpt gpt-4o graphrag learning-by-doing llm rag

nano-GraphRAG


A simple, easy-to-hack GraphRAG implementation


⚠️ It's still under development and not ready yet ⚠️









😭 [GraphRAG](https://arxiv.org/pdf/2404.16130) is good and powerful, but the official [implementation](https://github.com/microsoft/graphrag/tree/main) is difficult/painful to **read or hack**.

😊 This project provides a **smaller, faster, cleaner GraphRAG** while retaining the core functionality.

🎁 Excluding `tests` and prompts, `nano-graphrag` is about **700 lines of code**.

👌 Small yet **scalable**, **asynchronous** and **fully typed**

## TODO before publishing

- [x] Index
  - [x] Chunking
  - [x] Entity extraction
  - [x] Entity summary
  - [x] Compute communities
  - [x] Entities Embedding
  - [x] Community Report
- [ ] Query
  - [ ] Global
  - [ ] Local

## Install

**Install from PyPI**

```shell
pip install nano-graphrag
```

**Install from source**

```shell
# clone this repo first
cd nano-graphrag
pip install -e .
```

## Quick Start - Not yet

Download a copy of A Christmas Carol by Charles Dickens:

```shell
curl https://raw.githubusercontent.com/gusye1234/nano-graphrag/main/tests/mock_data.txt > ./book.txt
```

Then run the Python snippet below:

```python
from nano_graphrag import GraphRAG

graph_func = GraphRAG(working_dir="./dickens")

# Read the whole book and index it into the working directory
with open("./book.txt") as f:
    graph_func.insert(f.read())

print(graph_func.query("What are the top themes in this story?"))
```

The next time you initialize a `GraphRAG` with the same `working_dir`, it reloads all the existing contexts automatically.

### Async Support

For each method `NAME(...)` , there is a corresponding async method `aNAME(...)`

```python
await graph_func.ainsert(...)
await graph_func.aquery(...)
...
```
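The naming convention above can be illustrated with a minimal, self-contained sketch. `ToyRAG` is a hypothetical stand-in, not the real `GraphRAG` class, and wrapping the coroutine with `asyncio.run` is only one plausible way to provide the synchronous methods; the library may do it differently.

```python
import asyncio
from typing import List


class ToyRAG:
    """Hypothetical stand-in used only to show the NAME / aNAME pairing."""

    def __init__(self) -> None:
        self.docs: List[str] = []

    async def ainsert(self, text: str) -> None:
        await asyncio.sleep(0)  # placeholder for real async I/O (LLM calls, storage)
        self.docs.append(text)

    async def aquery(self, question: str) -> str:
        await asyncio.sleep(0)  # placeholder for real async I/O
        return f"{question} -> {len(self.docs)} doc(s) indexed"

    def insert(self, text: str) -> None:
        # Sync wrapper: run the async version to completion (assumed design).
        asyncio.run(self.ainsert(text))

    def query(self, question: str) -> str:
        return asyncio.run(self.aquery(question))


async def ask_many(rag: ToyRAG, questions: List[str]) -> List[str]:
    # Inside a running event loop, await the a-prefixed methods directly;
    # gather lets several queries proceed concurrently.
    return list(await asyncio.gather(*(rag.aquery(q) for q in questions)))
```

In a script, the synchronous names are the convenient choice. Inside an existing event loop (a web server or a notebook, for example), awaiting the `a`-prefixed methods avoids starting a nested loop.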

### Available Parameters

In an IDE such as VS Code, hover your cursor over `GraphRAG` to see all the available parameters.

## Advanced - Prompts

`nano-graphrag` uses prompts from the `nano_graphrag.prompt.PROMPTS` dict object. You can play with it and replace any prompt inside.
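Replacing a prompt amounts to assigning a new value to a key of that dict. The sketch below wraps the assignment in a small helper that rejects unknown keys, so a typo does not silently add an unused entry. The helper `override_prompt`, the stand-in dict, and the key `example_key` are all hypothetical, and the sketch assumes the prompt values are plain strings. Inspect `PROMPTS.keys()` for the real names.

```python
from typing import Dict


def override_prompt(prompts: Dict[str, str], key: str, template: str) -> str:
    """Replace prompts[key] in place and return the previous template."""
    if key not in prompts:
        raise KeyError(f"unknown prompt key {key!r}; available: {sorted(prompts)}")
    previous = prompts[key]
    prompts[key] = template
    return previous


# Stand-in dict so the sketch runs without the library installed.
# With nano-graphrag installed you would pass nano_graphrag.prompt.PROMPTS instead.
demo_prompts = {"example_key": "Original template for {input_text}"}  # hypothetical key
old = override_prompt(demo_prompts, "example_key", "Custom template for {input_text}")
```

Keeping the returned previous template makes it easy to restore the default after an experiment.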

## Advanced - Storage

You can replace any storage-related component with your own implementation. `nano-graphrag` mainly uses three kinds of storage:

- `base.BaseKVStorage` for storing key-json pairs of data.
  - By default we use disk file storage as the backend.
  - `GraphRAG(.., key_string_value_json_storage_cls=YOURS,...)`
- `base.BaseVectorStorage` for indexing embeddings.
  - By default we use [`milvus-lite`](https://github.com/milvus-io/milvus-lite) as the backend.
  - `GraphRAG(.., vector_db_storage_cls=YOURS,...)`
- `base.BaseGraphStorage` for storing the knowledge graph.
  - By default we use [`networkx`](https://github.com/networkx/networkx) as the backend.
  - `GraphRAG(.., graph_storage_cls=YOURS,...)`
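As a rough illustration of what a replacement key-json store might look like, here is an in-memory sketch. It deliberately does not subclass `base.BaseKVStorage`, because that interface is not shown in this README. The class name and the method names `all_keys`, `get_by_id`, and `upsert` are assumptions, so check the base class for the methods it actually requires before adapting this.

```python
import asyncio
from typing import Any, Dict, List, Optional


class InMemoryKVStorage:
    """Hypothetical key-json store; method names are assumptions, not the real interface."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    async def all_keys(self) -> List[str]:
        # Keys come back in insertion order (dicts preserve it).
        return list(self._data)

    async def get_by_id(self, key: str) -> Optional[Dict[str, Any]]:
        # Returns None when the key is absent instead of raising.
        return self._data.get(key)

    async def upsert(self, items: Dict[str, Dict[str, Any]]) -> None:
        # Insert new keys and overwrite existing ones in a single call.
        self._data.update(items)
```

The methods are async to match the library's asynchronous design. A real backend would also need to persist data under `working_dir` so that contexts can be reloaded later.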

## Benchmark - Not yet

...