https://github.com/dreden33773/embergraph
To query a `multi directed graph` with `label + attribute filtering` efficiently, support Neo4j and SQLite3 as storage layer.
graduation-project graph-database neo4j parallel-computing polars python3 rust sqlite3 subgraph-matching tokio
- Host: GitHub
- URL: https://github.com/dreden33773/embergraph
- Owner: DrEden33773
- License: mit
- Created: 2025-03-21T21:18:20.000Z (about 1 year ago)
- Default Branch: master
- Last Pushed: 2025-04-08T16:55:29.000Z (about 1 year ago)
- Last Synced: 2025-04-08T17:38:49.841Z (about 1 year ago)
- Topics: graduation-project, graph-database, neo4j, parallel-computing, polars, python3, rust, sqlite3, subgraph-matching, tokio
- Language: Rust
- Homepage:
- Size: 464 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# EmberGraph (pre-published)
Efficiently queries a pattern on a `directed multigraph` with `label / attribute filtering`, supporting `Neo4j` and `SQLite3` as the storage layer.
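
To make `label / attribute filtering` on a directed multigraph concrete, here is a minimal sketch. It is not taken from the EmberGraph source: `AttrValue`, `Edge`, `filter_edges` and `sample_edges` are hypothetical names, and the in-memory `Vec` stands in for whatever the storage layer actually returns.

```rust
use std::collections::HashMap;

/// Attribute values, kept deliberately small for this sketch (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub enum AttrValue {
    Int(i64),
    Text(String),
}

/// One directed edge. Parallel edges between the same endpoints are allowed,
/// which is what makes the graph a multigraph.
#[derive(Debug, Clone)]
pub struct Edge {
    pub src: u64,
    pub dst: u64,
    pub label: String,
    pub attrs: HashMap<String, AttrValue>,
}

/// Returns the edges that carry `label` and whose attributes satisfy `pred`.
pub fn filter_edges<'a, F>(edges: &'a [Edge], label: &str, pred: F) -> Vec<&'a Edge>
where
    F: Fn(&HashMap<String, AttrValue>) -> bool,
{
    edges
        .iter()
        .filter(|e| e.label == label && pred(&e.attrs))
        .collect()
}

/// Builds a tiny example: two parallel `knows` edges from 1 to 2, plus one `likes` edge.
pub fn sample_edges() -> Vec<Edge> {
    let edge = |src: u64, dst: u64, label: &str, since: i64| Edge {
        src,
        dst,
        label: label.to_string(),
        attrs: HashMap::from([("since".to_string(), AttrValue::Int(since))]),
    };
    vec![
        edge(1, 2, "knows", 2010),
        edge(1, 2, "knows", 2020),
        edge(2, 3, "likes", 2015),
    ]
}
```

Calling `filter_edges(&sample_edges(), "knows", ...)` with a predicate on `since` keeps only the parallel edges that pass both the label check and the attribute check.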
> ## ❤️ Please give a `Star` / `Follow` if you like this project
## To begin with
The project is still under development; however, it has already been verified to be a correct implementation.
The test dataset is `LDBC-SNB-SF0.1`, and the examples used for testing all come from `LDBC-SNB Business Intelligence (BI 1 ~ 20)`. You can find the actual `query` statements in `./resources/cypher`.
`EmberGraph` is only one step away from being formally published:
> To **optimize** the `matching order generation algorithm` (most likely with `iterative dynamic programming`, just as `Neo4j` has done since its `2.2` version)
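
To illustrate what a dynamic-programming search for a matching order can look like, here is a minimal sketch. It uses plain dynamic programming over vertex subsets, not the iterative variant, and it is neither EmberGraph's planner nor Neo4j's. The function names, the cost model (sum of estimated intermediate result sizes) and the single `sel` selectivity per pattern edge are all assumptions made for the example.

```rust
/// Estimated result size for the pattern-vertex set `set` (a bitmask): the product
/// of per-vertex candidate counts, scaled by `sel` once per adjacent vertex pair in the set.
/// This cost model is an assumption for illustration only.
fn estimated_size(set: usize, card: &[f64], adj: &[usize], sel: f64) -> f64 {
    let mut size = 1.0;
    for v in 0..card.len() {
        if set & (1 << v) != 0 {
            // Only look at lower-numbered neighbours so each pair is counted once.
            let lower = set & adj[v] & ((1 << v) - 1);
            size *= card[v] * sel.powi(lower.count_ones() as i32);
        }
    }
    size
}

/// Finds a connected matching order that minimises the summed estimated size of
/// all prefixes. Returns `None` if the pattern is disconnected.
/// Intended for small patterns only: the tables have 2^n entries.
pub fn matching_order(card: &[f64], edges: &[(usize, usize)], sel: f64) -> Option<Vec<usize>> {
    let n = card.len();
    // Adjacency as bitmasks; parallel pattern edges collapse into one bit here.
    let mut adj = vec![0usize; n];
    for &(a, b) in edges {
        adj[a] |= 1 << b;
        adj[b] |= 1 << a;
    }
    let full = (1usize << n) - 1;
    let mut best = vec![f64::INFINITY; full + 1]; // cheapest cost to match each subset
    let mut last = vec![usize::MAX; full + 1]; // vertex matched last in that cheapest plan
    best[0] = 0.0;
    for set in 1..=full {
        let size = estimated_size(set, card, &adj, sel);
        for v in 0..n {
            let bit = 1 << v;
            if set & bit == 0 {
                continue;
            }
            let rest = set & !bit;
            // `v` may come last only if it starts the order or touches a matched vertex.
            if rest != 0 && adj[v] & rest == 0 {
                continue;
            }
            let cost = best[rest] + size;
            if cost < best[set] {
                best[set] = cost;
                last[set] = v;
            }
        }
    }
    if best[full].is_infinite() {
        return None;
    }
    // Walk the `last` table backwards to recover the order.
    let mut order = Vec::with_capacity(n);
    let mut set = full;
    while set != 0 {
        let v = last[set];
        order.push(v);
        set &= !(1 << v);
    }
    order.reverse();
    Some(order)
}
```

For a path pattern `0 - 1 - 2` with candidate counts `[1000.0, 10.0, 100.0]` and `sel = 0.01`, the sketch starts from the most selective vertex `1` and leaves the largest vertex `0` for last. An iterative variant would bound the subset size per round to keep the tables small on larger patterns.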
## What's most impressive?
Even though we're still working on that last step (switching to a **better** `matching-order seeking strategy`), the performance of this `pre-published version` is approximately **equivalent** to that of `Neo4j`. (Oh, even **faster** in some cases!)
## How to build
Install `git-lfs` first so that the project is cloned correctly:
```bash
git clone https://github.com/DrEden33773/EmberGraph
cd EmberGraph # `git lfs pull` and `cargo build` must run inside the cloned repository
git lfs pull
cargo build # This will automatically install uv if needed, and then initialize `./ember-graph-import`
```
Then, perform the following initialization steps manually:
1. Transform the original dataset into a `Neo4j-import-friendly` one. (Using `./ember-graph-import/raw_data_formatter.py`)
2. Generate basic `label-based` statistics. (Using `./ember-graph-import/statistics_generator.py`)
3. Import the `Neo4j-import-friendly` dataset into `Neo4j` or `SQLite3`. (Using `./ember-graph-import/neo4j_admin_import.py` or `./ember-graph-import/sqlite_import.py`)
All of the `python scripts` mentioned above can be executed via the command:
```bash
cd ./ember-graph-import
source ./.venv/bin/activate
uv run
```
At this point, you can check the query results of `bi_1` to `bi_20` by running:
```bash
cargo run --example bi_<x> # (where x in [1..=20])
```
## Important notes on building in `release` mode
As you might have guessed, building in `release` mode gives the highest performance.
However, I have to remind you that the build itself can be `TOO SLOW`.
So, if you really don't mind, here are several better options:
- Linux:
```bash
cargo build --release --all-targets -j $(nproc)
```
- Mac:
```bash
cargo build --release --all-targets -j $(sysctl -n hw.ncpu)
```
- Windows (PowerShell):
```powershell
cargo build --release --all-targets -j $env:NUMBER_OF_PROCESSORS
```
- Windows (CMD):
```cmd
cargo build --release --all-targets -j %NUMBER_OF_PROCESSORS%
```
After all, that's a short-term pain for long-term gain😂.
Then, you can run the command below to check the query results of `bi_1` to `bi_20`:
```bash
cargo run --release --example bi_<x> # (where x in [1..=20])
```