{"id":44308733,"url":"https://github.com/mstryoda/goraphdb","last_synced_at":"2026-02-13T05:01:20.718Z","repository":{"id":336856014,"uuid":"1151263475","full_name":"mstrYoda/goraphdb","owner":"mstrYoda","description":"A graph database provides Cypher query, fluent builder and management UI implemented in Golang.","archived":false,"fork":false,"pushed_at":"2026-02-11T16:29:31.000Z","size":1020,"stargazers_count":51,"open_issues_count":0,"forks_count":2,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-12T11:52:08.898Z","etag":null,"topics":["database","golang","graph","graph-db","nosql","nosql-database"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mstrYoda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-02-06T08:50:30.000Z","updated_at":"2026-02-11T21:10:24.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mstrYoda/goraphdb","commit_stats":null,"previous_names":["mstryoda/goraphdb"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mstrYoda/goraphdb","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstrYoda%2Fgoraphdb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstrYoda%2Fgoraphdb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstrYoda%2Fgoraphdb/releases","manifests_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/mstrYoda%2Fgoraphdb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mstrYoda","download_url":"https://codeload.github.com/mstrYoda/goraphdb/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mstrYoda%2Fgoraphdb/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29396847,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-13T04:26:15.637Z","status":"ssl_error","status_checked_at":"2026-02-13T04:16:29.732Z","response_time":78,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","golang","graph","graph-db","nosql","nosql-database"],"created_at":"2026-02-11T03:34:50.068Z","updated_at":"2026-02-13T05:01:20.710Z","avatar_url":"https://github.com/mstrYoda.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GraphDB\n\nA high-performance, embeddable graph database written in Go. 
Built on top of [bbolt](https://github.com/etcd-io/bbolt) (B+tree key-value store), it supports concurrent queries, secondary indexes, optional hash-based sharding, and a subset of the Cypher query language — all in a single dependency-free binary.\n\n![Query Editor Screenshot](assets/query.png)\n\n## Features\n\n- **Directed labeled graph** — nodes and edges with arbitrary JSON-like properties  \n  `alice ---follows---\u003e bob`, `server ---consumes---\u003e queue`\n- **Node labels** — first-class `:Person`, `:Movie` labels with dedicated index and Cypher support (`MATCH (n:Person)`)\n- **Concurrent reads** — fully parallel BFS, DFS, Cypher, and query-builder calls via MVCC\n- **50 GB+ ready** — bbolt memory-mapped storage with configurable `MmapSize`\n- **Graph algorithms** — BFS, DFS, Shortest Path (unweighted \u0026 Dijkstra), All Paths, Connected Components, Topological Sort\n- **Fluent query builder** — chainable Go API with filtering, pagination, and direction control\n- **Secondary indexes** — O(log N) property lookups with auto-maintenance on single writes\n- **Composite indexes** — multi-property indexes for fast compound lookups (`CreateCompositeIndex(\"city\", \"age\")`)\n- **Cypher query language** — read and write support with index-aware execution, LIMIT push-down, ORDER BY + LIMIT heap, query plan caching, `OPTIONAL MATCH`, `EXPLAIN`/`PROFILE`, parameterized queries, and `CREATE` for inserting nodes and edges\n- **Query timeout** — `CypherContext`/`CypherWithParamsContext` accept `context.Context` for deadline-based cancellation at scan loop boundaries\n- **Transactions** — `Begin`/`Commit`/`Rollback` API for multi-statement atomic operations with read-your-writes semantics\n- **EXPLAIN / PROFILE** — query plan tree with operator types; `PROFILE` adds per-operator row counts and wall-clock timing\n- **OPTIONAL MATCH** — left-outer-join semantics for graph patterns (unmatched bindings become `nil`)\n- **Byte-budgeted node cache** — sharded 
concurrent LRU cache with memory-based eviction (default 128 MB); predictable memory footprint regardless of node sizes\n- **Data integrity** — CRC32 (Castagnoli) checksums on all node/edge data, verified on every read, with a `VerifyIntegrity()` full scan\n- **Binary encoding** — MessagePack property serialization (3–5× faster, 30–50% smaller than JSON) with backward-compatible format detection\n- **Structured logging** — `log/slog` integration for all write operations, errors, and lifecycle events\n- **Parameterized queries** — `$param` tokens in Cypher for safe substitution and plan reuse\n- **Prepared statement caching** — bounded LRU query cache (10K entries) with `PrepareCypher`/`ExecutePrepared`/`ExecutePreparedWithParams` API and server-side `/api/cypher/prepare` + `/api/cypher/execute` endpoints\n- **Streaming results** — `CypherStream()` returns a lazy `RowIterator` for O(1) memory on non-sorted queries; NDJSON streaming via `POST /api/cypher/stream`\n- **Slow query log** — configurable threshold (default 100ms); queries exceeding the threshold are logged at WARN level with duration, row count, and truncated query text\n- **Cursor pagination** — O(limit) cursor-based `ListNodes`/`ListEdges`/`ListNodesByLabel` APIs; no offset scanning. 
Server endpoints: `GET /api/nodes/cursor`, `GET /api/edges/cursor`\n- **Prometheus metrics** — dependency-free atomic counters with Prometheus text exposition at `GET /metrics`; tracks queries, slow queries, cache hits/misses, node/edge CRUD, index lookups, and live gauges\n- **Batch operations** — `AddNodeBatch` / `AddEdgeBatch` for bulk loading with single-fsync transactions\n- **Worker pool** — built-in goroutine pool for concurrent query execution\n- **Optional sharding** — hash-based partitioning across multiple bbolt files; edges co-located with source nodes for single-shard traversals\n- **Management UI** — built-in web console with a Cypher query editor, interactive graph visualization (cytoscape.js), index management, and a node/edge explorer\n\n## Installation\n\n```bash\ngo get github.com/mstrYoda/goraphdb\n```\n\n## Quick Start\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n    \"log\"\n\n    graphdb \"github.com/mstrYoda/goraphdb\"\n)\n\nfunc main() {\n    // Open (or create) a database.\n    db, err := graphdb.Open(\"./my.db\", graphdb.DefaultOptions())\n    if err != nil {\n        log.Fatal(err)\n    }\n    defer db.Close()\n\n    // Add nodes with arbitrary properties.\n    alice, _ := db.AddNode(graphdb.Props{\"name\": \"Alice\", \"age\": 30})\n    bob, _ := db.AddNode(graphdb.Props{\"name\": \"Bob\", \"age\": 25})\n\n    // Add a directed labeled edge.\n    db.AddEdge(alice, bob, \"follows\", graphdb.Props{\"since\": \"2024\"})\n\n    // Query neighbors.\n    neighbors, _ := db.NeighborsLabeled(alice, \"follows\")\n    for _, n := range neighbors {\n        fmt.Println(n.GetString(\"name\")) // Bob\n    }\n\n    // BFS traversal.\n    results, _ := db.BFSCollect(alice, 3, graphdb.Outgoing)\n    for _, r := range results {\n        fmt.Printf(\"depth=%d  %s\\n\", r.Depth, r.Node.GetString(\"name\"))\n    }\n\n    // Cypher query.\n    ctx := context.Background()\n    res, _ := db.Cypher(ctx, `MATCH (a {name: \"Alice\"})-[:follows]-\u003e(b) 
RETURN b.name`)\n    for _, row := range res.Rows {\n        fmt.Println(row[\"b.name\"]) // Bob\n    }\n}\n```\n\n## Configuration\n\n```go\nopts := graphdb.Options{\n    ShardCount:         1,                      // 1 = single process (default), N = hash-sharded\n    WorkerPoolSize:     8,                      // goroutines for concurrent query execution\n    CacheBudget:        128 * 1024 * 1024,      // 128 MB byte-budget LRU cache for hot nodes\n    SlowQueryThreshold: 100 * time.Millisecond, // log queries slower than this (0 = disabled)\n    NoSync:             false,                  // true = skip fsync (faster writes, risk of data loss)\n    ReadOnly:           false,                  // open in read-only mode\n    MmapSize:           256 * 1024 * 1024,      // 256 MB initial mmap\n}\ndb, err := graphdb.Open(\"./data\", opts)\n```\n\nUse `graphdb.DefaultOptions()` for sensible defaults tuned for ~50 GB datasets.\n\n## API Reference\n\n### Node Operations\n\n```go\n// Create\nid, err := db.AddNode(graphdb.Props{\"name\": \"Alice\", \"age\": 30})\nids, err := db.AddNodeBatch([]graphdb.Props{...})    // bulk insert (single tx)\n\n// Read\nnode, err := db.GetNode(id)\nname := node.GetString(\"name\")      // \"Alice\"\nage  := node.GetFloat(\"age\")        // 30.0\nexists, err := db.NodeExists(id)\ncount := db.NodeCount()\n\n// Update\nerr = db.UpdateNode(id, graphdb.Props{\"age\": 31})    // merge\nerr = db.SetNodeProps(id, graphdb.Props{\"name\": \"A\"}) // full replace\n\n// Delete\nerr = db.DeleteNode(id)              // also removes all connected edges\n\n// Scan \u0026 Filter\nnodes, err := db.FindNodes(func(n *graphdb.Node) bool {\n    return n.GetFloat(\"age\") \u003e 25\n})\nerr = db.ForEachNode(func(n *graphdb.Node) error {\n    fmt.Println(n.Props)\n    return nil\n})\n```\n\n### Node Labels\n\n```go\n// Create a node with labels\nid, err := db.AddNodeWithLabels([]string{\"Person\", \"Employee\"}, graphdb.Props{\"name\": \"Alice\"})\n\n// Add / 
remove labels on existing nodes\nerr = db.AddLabel(id, \"Admin\")\nerr = db.RemoveLabel(id, \"Employee\")\n\n// Query labels\nlabels, err := db.GetLabels(id)           // [\"Person\", \"Admin\"]\nhas, err := db.HasLabel(id, \"Person\")     // true\n\n// Find all nodes with a label (index-backed)\npeople, err := db.FindByLabel(\"Person\")\n```\n\n### Transactions\n\n```go\n// Multi-statement atomic operations with read-your-writes semantics.\ntx, err := db.Begin()\n\nalice, _ := tx.AddNode(graphdb.Props{\"name\": \"Alice\"})\nbob, _ := tx.AddNode(graphdb.Props{\"name\": \"Bob\"})\ntx.AddEdge(alice, bob, \"follows\", nil)\n\n// Read uncommitted data within the same transaction.\nnode, _ := tx.GetNode(alice) // visible before commit\n\nerr = tx.Commit()   // atomically persists all changes\n// — or —\nerr = tx.Rollback() // discards all changes\n```\n\n### Edge Operations\n\n```go\n// Create  —  alice ---follows---\u003e bob\nedgeID, err := db.AddEdge(alice, bob, \"follows\", graphdb.Props{\"since\": \"2024\"})\nids, err := db.AddEdgeBatch([]graphdb.Edge{...})\n\n// Read\nedge, err := db.GetEdge(edgeID)\noutEdges, err := db.OutEdges(alice)                   // all outgoing\ninEdges, err := db.InEdges(bob)                       // all incoming\nallEdges, err := db.Edges(alice)                      // both directions\nlabeled, err := db.OutEdgesLabeled(alice, \"follows\")  // by label\nbyLabel, err := db.EdgesByLabel(\"follows\")            // all edges with label\ncount := db.EdgeCount()\n\n// Update\nerr = db.UpdateEdge(edgeID, graphdb.Props{\"weight\": 1.5})\n\n// Delete\nerr = db.DeleteEdge(edgeID)\n\n// Predicates\nhas, err := db.HasEdge(alice, bob)\nhas, err := db.HasEdgeLabeled(alice, bob, \"follows\")\ndeg, err := db.Degree(alice, graphdb.Outgoing)\n\n// Neighbors\nnodes, err := db.Neighbors(alice)                          // outgoing neighbors\nnodes, err := db.NeighborsLabeled(alice, \"follows\")        // filtered by label\nnodes, err := 
db.NeighborsDirection(alice, graphdb.Both)   // both directions\n```\n\n### Traversal Algorithms\n\n```go\n// BFS — breadth-first search with visitor callback\nerr = db.BFS(startID, maxDepth, graphdb.Outgoing, edgeFilter, func(r *graphdb.TraversalResult) bool {\n    fmt.Printf(\"depth=%d node=%v\\n\", r.Depth, r.Node.Props[\"name\"])\n    return true // return false to stop early\n})\n\n// Convenience collectors\nresults, err := db.BFSCollect(startID, 3, graphdb.Outgoing)\nresults, err := db.DFSCollect(startID, 3, graphdb.Outgoing)\n\n// Filtered traversals\nresults, err := db.BFSFiltered(startID, 3, graphdb.Outgoing, edgeFilter, nodeFilter)\nresults, err := db.DFSFiltered(startID, 3, graphdb.Outgoing, edgeFilter, nodeFilter)\n```\n\n### Pathfinding\n\n```go\n// Shortest path (unweighted BFS)\npath, err := db.ShortestPath(from, to)\npath, err := db.ShortestPathLabeled(from, to, \"follows\")\n\n// Dijkstra (weighted)\npath, err := db.ShortestPathWeighted(from, to, \"weight\", 1.0)\n\n// All paths (up to maxDepth)\npaths, err := db.AllPaths(from, to, 5)\n\n// Connectivity\nexists, err := db.HasPath(from, to)\ncomponents, err := db.ConnectedComponents()\nsorted, err := db.TopologicalSort()  // Kahn's algorithm, errors on cycles\n```\n\n### Secondary Indexes\n\n```go\n// Create an index on a property (scans existing nodes)\nerr = db.CreateIndex(\"name\")\n\n// Fast lookup — O(log N) via B+tree prefix scan\nnodes, err := db.FindByProperty(\"name\", \"Alice\")\n\n// Check if a property is indexed\nindexed := db.HasIndex(\"name\")\n\n// Drop / rebuild\nerr = db.DropIndex(\"name\")\nerr = db.ReIndex(\"name\")\n```\n\n\u003e **Index maintenance**: `AddNode`, `UpdateNode`, `SetNodeProps`, and `DeleteNode` automatically update indexes within the same transaction (zero extra fsync). 
`AddNodeBatch` skips auto-indexing for performance — call `CreateIndex()` or `ReIndex()` after batch inserts.\n\n### Composite Indexes\n\n```go\n// Create a composite index on multiple properties (scans existing nodes)\nerr = db.CreateCompositeIndex(\"city\", \"age\")\n\n// Fast compound lookup — O(log N) via B+tree prefix scan\nnodes, err := db.FindByCompositeIndex(map[string]any{\"city\": \"Istanbul\", \"age\": 30})\n\n// Cypher queries use composite indexes automatically\n// MATCH (n {city: \"Istanbul\", age: 30}) RETURN n  → composite index seek\n\n// Management\nhas := db.HasCompositeIndex(\"city\", \"age\")\nindexes := db.ListCompositeIndexes() // [][]string\nerr = db.DropCompositeIndex(\"city\", \"age\")\n```\n\n### Prepared Statements \u0026 Query Cache\n\n```go\n// Prepare a parameterized query (parsed once, cached)\npq, err := db.PrepareCypher(\"MATCH (n {name: $name}) RETURN n\")\n\nctx := context.Background()\n\n// Execute with different parameters — no re-parsing\nresult, err := db.ExecutePreparedWithParams(ctx, pq, map[string]any{\"name\": \"Alice\"})\nresult, err = db.ExecutePreparedWithParams(ctx, pq, map[string]any{\"name\": \"Bob\"})\n\n// Execute without parameters\nresult, err = db.ExecutePrepared(ctx, pq)\n\n// Query cache statistics (bounded LRU, default 10K entries)\nstats := db.QueryCacheStats()\nfmt.Printf(\"hits=%d misses=%d entries=%d\\n\", stats.Hits, stats.Misses, stats.Entries)\n```\n\n### Streaming Results (Iterator)\n\n```go\nctx := context.Background()\n\n// CypherStream returns a lazy RowIterator — O(1) memory for non-sorted queries.\niter, err := db.CypherStream(ctx, \"MATCH (n) RETURN n.name LIMIT 100\")\nif err != nil {\n    log.Fatal(err)\n}\ndefer iter.Close()\n\nfor iter.Next() {\n    row := iter.Row()\n    fmt.Println(row[\"n.name\"])\n}\nif err := iter.Err(); err != nil {\n    log.Fatal(err)\n}\n\n// Parameterized streaming\niter, err = db.CypherStreamWithParams(ctx,\n    \"MATCH (n {city: $city}) RETURN n.name\",\n    
map[string]any{\"city\": \"Istanbul\"},\n)\n```\n\n### Write Cypher (CREATE)\n\n```go\nctx := context.Background()\n\n// CREATE works through the unified Cypher() API — no separate function needed.\nresult, err := db.Cypher(ctx, `CREATE (n:Person {name: \"Alice\", age: 30}) RETURN n`)\nnode := result.Rows[0][\"n\"].(*graphdb.Node) // access created node\n\n// Create two nodes and an edge in one statement.\ndb.Cypher(ctx, `CREATE (a:Person {name: \"Alice\"})-[:FOLLOWS]-\u003e(b:Person {name: \"Bob\"})`)\n\n// Multiple comma-separated patterns.\ndb.Cypher(ctx, `CREATE (a:City {name: \"Istanbul\"}), (b:City {name: \"Ankara\"})`)\n\n// CREATE without RETURN — fire-and-forget.\ndb.Cypher(ctx, `CREATE (n:Movie {title: \"The Matrix\", year: 1999})`)\n\n// Dedicated API with creation statistics.\ncr, _ := db.CypherCreate(ctx, `CREATE (n:Person {name: \"Eve\"}) RETURN n`)\nfmt.Println(cr.Stats.NodesCreated) // 1\nfmt.Println(cr.Stats.LabelsSet)    // 1\nfmt.Println(cr.Stats.PropsSet)     // 1\n```\n\n### Query Timeout \u0026 Cancellation\n\n```go\n// All query methods accept context.Context as the first argument for\n// timeout/cancellation. 
The context is checked at key iteration points\n// (full scans, edge traversals, index scans).\n\nctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)\ndefer cancel()\n\nresult, err := db.Cypher(ctx, `MATCH (n) RETURN n`)\nif errors.Is(err, context.DeadlineExceeded) {\n    log.Println(\"query timed out\")\n}\n\n// Parameterized queries also accept context:\nresult, err = db.CypherWithParams(ctx,\n    \"MATCH (n {name: $name}) RETURN n\",\n    map[string]any{\"name\": \"Alice\"},\n)\n```\n\n### Cursor Pagination\n\n```go\n// List nodes with cursor-based pagination — O(limit) per page, no offset scan.\npage, err := db.ListNodes(0, 20) // first page, 20 nodes\nfor _, n := range page.Nodes {\n    fmt.Printf(\"id=%d name=%s\\n\", n.ID, n.GetString(\"name\"))\n}\n\n// Next page: pass the cursor from the previous page.\nif page.HasMore {\n    page2, _ := db.ListNodes(page.NextCursor, 20)\n    // ...\n}\n\n// Edges and label-filtered nodes also supported.\nedgePage, _ := db.ListEdges(0, 50)\nlabelPage, _ := db.ListNodesByLabel(\"Person\", 0, 20)\n```\n\n### Slow Query Log\n\n```go\n// Queries exceeding SlowQueryThreshold are logged at WARN level automatically.\nopts := graphdb.DefaultOptions()\nopts.SlowQueryThreshold = 50 * time.Millisecond // default: 100ms, 0 = disabled\n\n// Log output (slog):\n// WARN slow query detected query=\"MATCH (n) RETURN n\" duration=152ms rows=50000\n```\n\n### Prometheus Metrics\n\n```go\n// All metrics are atomic counters — zero contention, no external dependencies.\nm := db.Metrics()\n\n// Programmatic access\nsnap := m.Snapshot() // map[string]any with all counters + live gauges\n\n// Prometheus text exposition (for /metrics endpoint or manual use)\nm.WritePrometheus(os.Stdout)\n// Output:\n// # HELP graphdb_queries_total Total number of Cypher query executions\n// # TYPE graphdb_queries_total counter\n// graphdb_queries_total 42\n// ...\n```\n\nAvailable metrics:\n- **Counters**: `graphdb_queries_total`, 
`graphdb_slow_queries_total`, `graphdb_query_errors_total`, `graphdb_cache_hits_total`, `graphdb_cache_misses_total`, `graphdb_nodes_created_total`, `graphdb_nodes_deleted_total`, `graphdb_edges_created_total`, `graphdb_edges_deleted_total`, `graphdb_index_lookups_total`\n- **Gauges**: `graphdb_nodes_current`, `graphdb_edges_current`, `graphdb_node_cache_bytes_used`, `graphdb_node_cache_budget_bytes`, `graphdb_query_cache_entries`, `graphdb_query_cache_capacity`\n\n### Fluent Query Builder\n\n```go\nresult, err := db.NewQuery().\n    From(alice).\n    FollowEdge(\"follows\").\n    Dir(graphdb.Outgoing).\n    Depth(3).\n    Where(func(n *graphdb.Node) bool {\n        return n.GetFloat(\"age\") \u003e 25\n    }).\n    Limit(10).\n    Offset(0).\n    UseBFS().        // or .UseDFS()\n    Execute()\n\nfor _, node := range result.Nodes {\n    fmt.Println(node.GetString(\"name\"))\n}\n```\n\n### Concurrent Queries\n\n```go\nctx := context.Background()\n\n// Run multiple queries in parallel using the built-in worker pool.\nresults, err := db.NewConcurrentQuery().\n    Add(db.NewQuery().From(alice).FollowEdge(\"follows\").Depth(2)).\n    Add(db.NewQuery().From(bob).FollowEdge(\"follows\").Depth(2)).\n    Execute(ctx)\n\n// Or run arbitrary functions concurrently.\nvalues, errs := db.ExecuteFunc(ctx,\n    func() (interface{}, error) { return db.ShortestPath(alice, charlie) },\n    func() (interface{}, error) { return db.BFSCollect(bob, 3, graphdb.Outgoing) },\n)\n```\n\n### Cypher Query Language\n\nGraphDB supports a subset of the [Cypher](https://neo4j.com/docs/cypher-manual/current/) query language (read queries plus `CREATE` for writes) with index-aware execution, LIMIT push-down, and query plan caching.\n\n#### Supported Patterns\n\n```go\nctx := context.Background()\n\n// 1. All nodes\nres, _ := db.Cypher(ctx, `MATCH (n) RETURN n`)\n\n// 2. Property filter (uses index if available)\nres, _ = db.Cypher(ctx, `MATCH (n {name: \"Alice\"}) RETURN n`)\n\n// 3. 
WHERE clause\nres, _ = db.Cypher(ctx, `MATCH (n) WHERE n.age \u003e 25 RETURN n`)\n\n// 4. Single-hop pattern match\nres, _ = db.Cypher(ctx, `MATCH (a)-[:follows]-\u003e(b) RETURN a, b`)\n\n// 5. Filtered traversal with property projection\nres, _ = db.Cypher(ctx, `MATCH (a {name: \"Alice\"})-[:follows]-\u003e(b) RETURN b.name`)\n\n// 6. Variable-length path (1 to 3 hops)\nres, _ = db.Cypher(ctx, `MATCH (a)-[:follows*1..3]-\u003e(b) RETURN b`)\n\n// 7. Any edge type with type() function\nres, _ = db.Cypher(ctx, `MATCH (a)-[r]-\u003e(b) RETURN type(r), b`)\n\n// 8. Label-based matching (index-backed)\nres, _ = db.Cypher(ctx, `MATCH (n:Person) RETURN n`)\nres, _ = db.Cypher(ctx, `MATCH (a:Person)-[:follows]-\u003e(b:Person) RETURN a, b`)\n\n// 9. OPTIONAL MATCH — left-outer-join (nil when no match)\nres, _ = db.Cypher(ctx, `MATCH (n:Person) OPTIONAL MATCH (n)-[r:WROTE]-\u003e(b) RETURN n.name, b`)\n```\n\n#### EXPLAIN / PROFILE\n\n```go\n// EXPLAIN — returns the query plan without executing (zero I/O)\nres, _ := db.Cypher(ctx, `EXPLAIN MATCH (n:Person) WHERE n.age \u003e 25 RETURN n`)\nfmt.Println(res.Plan.String())\n// EXPLAIN:\n// └── ProduceResults (n)\n//     └── Filter (WHERE clause)\n//         └── NodeByLabelScan (n:Person)\n\n// PROFILE — executes and returns the plan annotated with actual row counts + timing\nres, _ = db.Cypher(ctx, `PROFILE MATCH (n:Person) RETURN n`)\nfmt.Println(res.Plan.String())\n// PROFILE:\n// └── ProduceResults (n) [rows=42, time=150µs]\n//     └── NodeByLabelScan (n:Person) [rows=42]\n\n// The actual query results are still available:\nfor _, row := range res.Rows {\n    fmt.Println(row[\"n\"])\n}\n```\n\n#### Parameterized Queries\n\n```go\n// Use $param tokens to prevent injection and enable plan caching.\nres, _ := db.CypherWithParams(ctx,\n    `MATCH (n {name: $name}) WHERE n.age \u003e $minAge RETURN n`,\n    map[string]any{\"name\": \"Alice\", \"minAge\": 25},\n)\n```\n\n#### ORDER BY, LIMIT, Prepared Queries\n\n```go\nctx := 
context.Background()\n\n// ORDER BY + LIMIT — uses a min-heap for top-K efficiency\nres, _ := db.Cypher(ctx, `MATCH (n) WHERE n.age \u003e 20 RETURN n.name ORDER BY n.age DESC LIMIT 5`)\n\n// LIMIT push-down — stops scanning early when no ORDER BY is present\nres, _ = db.Cypher(ctx, `MATCH (n) RETURN n LIMIT 10`)\n\n// Prepared queries — parse once, execute many times\npq, _ := db.PrepareCypher(`MATCH (n {name: \"Alice\"})-[:follows]-\u003e(b) RETURN b.name`)\nres1, _ := db.ExecutePrepared(ctx, pq)\nres2, _ := db.ExecutePrepared(ctx, pq) // no re-parsing\n\n// Results\nfor _, row := range res.Rows {\n    fmt.Println(row[\"n.name\"], row[\"n.age\"])\n}\n```\n\n### Statistics\n\n```go\nstats, err := db.Stats()\nfmt.Printf(\"Nodes: %d, Edges: %d, Shards: %d, Disk: %.1f MB\\n\",\n    stats.NodeCount, stats.EdgeCount, stats.ShardCount,\n    float64(stats.DiskSizeBytes)/1024/1024)\n```\n\n### Data Integrity\n\n```go\n// Verify all node and edge data across all shards (CRC32 checksums).\nreport, err := db.VerifyIntegrity()\nfmt.Printf(\"Checked %d nodes, %d edges\\n\", report.NodesChecked, report.EdgesChecked)\n\nif report.OK() {\n    fmt.Println(\"All data intact!\")\n} else {\n    for _, e := range report.Errors {\n        fmt.Println(e) // \"shard 0, nodes[00000001]: props checksum mismatch ...\"\n    }\n}\n```\n\n## Replication\n\nGraphDB supports **single-leader replication** with automatic failover. 
One node accepts all writes (the leader), while multiple read replicas (followers) serve read queries for horizontal read scaling.\n\n### Architecture\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│                      Replication Cluster                        │\n│                                                                 │\n│  ┌──────────────┐   gRPC StreamWAL   ┌──────────────┐          │\n│  │    Leader     │ ────────────────► │  Follower 1   │          │\n│  │              │                    │              │          │\n│  │  Writes ──► WAL ──────────────► │  Applier ──► DB│          │\n│  │              │                    └──────────────┘          │\n│  │              │   gRPC StreamWAL   ┌──────────────┐          │\n│  │              │ ────────────────► │  Follower 2   │          │\n│  │              │                    │              │          │\n│  │              │                    │  Applier ──► DB│          │\n│  └──────┬───────┘                    └──────────────┘          │\n│         │                                                       │\n│         │ Raft (leader election only)                           │\n│         └──────── heartbeats ──────── all nodes                 │\n│                                                                 │\n│  Query Router:                                                  │\n│    MATCH  → any node (local)                                    │\n│    CREATE → leader (forwarded via HTTP if received by follower) │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### Components\n\n| Component | File(s) | Purpose |\n|---|---|---|\n| **WAL** | `wal.go`, `wal_entry.go` | Append-only log of all committed mutations. Segmented (64 MB), CRC32 checksums, msgpack encoding. Monotonic LSN for ordering. |\n| **Applier** | `applier.go` | Replays WAL entries on followers. Deterministic (uses leader's IDs), idempotent (skips duplicate LSNs), sequential. 
|\n| **Log Shipping** | `replication/server.go`, `replication/client.go` | gRPC server-streaming RPC. Leader streams WAL entries; follower applies them via the Applier. Auto-reconnect with exponential backoff. |\n| **Leader Election** | `replication/election.go` | hashicorp/raft integration for automatic leader election and failover. Raft is used only for election — data flows through the WAL pipeline. |\n| **Query Router** | `replication/router.go` | Routes reads locally, forwards writes to the leader via HTTP. Integrates with election for dynamic leader discovery. |\n| **Write Handler** | `replication/write_handler.go` | HTTP endpoint on the leader that accepts forwarded write operations from follower routers. |\n\n### Configuration\n\n```go\n// Leader node\nleader, _ := graphdb.Open(\"./data\", graphdb.Options{\n    ShardCount: 4,\n    EnableWAL:  true,\n    Role:       \"leader\",\n})\n\n// Follower node\nfollower, _ := graphdb.Open(\"./data-replica\", graphdb.Options{\n    ShardCount: 4,\n    Role:       \"follower\",  // rejects all direct writes\n})\n```\n\n### Roles\n\n- **`\"\"` or `\"standalone\"`** — default, no replication. WAL is optional.\n- **`\"leader\"`** — accepts writes, records to WAL, ships entries to followers.\n- **`\"follower\"`** — read-only. All public write methods return `ErrReadOnlyReplica`. 
Only the internal Applier can write.\n\nRoles can be changed at runtime via `db.SetRole(\"leader\")` — used by the Raft election callback when leadership changes.\n\n### WAL Format\n\n```\n┌──────────┬──────────────────┬──────────┐\n│ 4B length│ msgpack WALEntry  │ 4B CRC32 │  ← one frame\n└──────────┴──────────────────┴──────────┘\n```\n\n- **Segment files**: `wal-0000000000.log`, `wal-0000000001.log`, …\n- **Rotation**: new segment at 64 MB\n- **16 operation types**: AddNode, AddNodeBatch, UpdateNode, SetNodeProps, DeleteNode, AddEdge, AddEdgeBatch, DeleteEdge, UpdateEdge, AddNodeWithLabels, AddLabel, RemoveLabel, CreateIndex, DropIndex, CreateCompositeIndex, DropCompositeIndex\n- **Tailing**: WALReader supports live tailing — returns `io.EOF` at the end of the active segment, resumes on next call when new data is appended\n\n### Write Forwarding\n\nWhen a follower's Router receives a write request:\n1. The local DB rejects it with `ErrReadOnlyReplica`\n2. The Router serializes the operation as JSON\n3. The operation is forwarded to the leader's `/api/write` HTTP endpoint\n4. The leader executes it locally and returns the result\n5. 
The mutation flows back to followers via the WAL → gRPC pipeline\n\n## Architecture\n\n```\n┌──────────────────────────────────────────────────────────────┐\n│                     Management UI                            │\n│   React + TypeScript + Tailwind · cytoscape.js · CodeMirror  │\n│   Query Editor · Dashboard · Indexes · Explorer              │\n├──────────────────────────────────────────────────────────────┤\n│                    HTTP / JSON API                            │\n│   /api/cypher · /api/nodes · /api/edges · /api/indexes       │\n│   /api/stats · /api/write (forwarded writes) · CORS          │\n├──────────────────────────────────────────────────────────────┤\n│                     Replication Layer                         │\n│   WAL · gRPC Log Shipping · Applier · Raft Election          │\n│   Query Router · Write Forwarding · Role Management          │\n├──────────────────────────────────────────────────────────────┤\n│                        Public API                            │\n│   Node/Edge CRUD · Labels · Transactions (Begin/Commit)      │\n│   BFS/DFS · Paths · Query Builder · VerifyIntegrity          │\n├──────────────────────────────────────────────────────────────┤\n│                     Cypher Engine                            │\n│   Lexer → Parser → AST → Executor (index-aware)             │\n│   EXPLAIN/PROFILE · OPTIONAL MATCH · Parameterized ($param)  │\n│   Query plan cache · LIMIT push-down · Top-K heap           │\n├──────────────────────────────────────────────────────────────┤\n│                    Shard Manager                             │\n│   Hash-based routing · Cross-shard edge handling             │\n│   Worker pool · Sharded LRU node cache                       │\n├──────────────────────────────────────────────────────────────┤\n│                   Storage Layer                              │\n│   bbolt (B+tree) · Memory-mapped files · MVCC               │\n│   MessagePack encoding · CRC32 checksums · Labels index    
  │\n├──────────────────────────────────────────────────────────────┤\n│  nodes│edges│adj_*│idx_prop│idx_edge_type│node_labels│idx_lbl│\n└──────────────────────────────────────────────────────────────┘\n```\n\n### Storage Layout (bbolt buckets)\n\n| Bucket | Key | Value | Purpose |\n|---|---|---|---|\n| `nodes` | `uint64 nodeID` | MessagePack props + CRC32 | Node data |\n| `edges` | `uint64 edgeID` | Binary edge + CRC32 | Edge data (from, to, label, props) |\n| `adj_out` | `nodeID \\| edgeID` | `targetID \\| label` | Outgoing adjacency list |\n| `adj_in` | `nodeID \\| edgeID` | `sourceID \\| label` | Incoming adjacency list |\n| `idx_prop` | `\"prop:value\" \\| nodeID` | ∅ | Secondary property index |\n| `idx_edge_type` | `\"label\" \\| edgeID` | ∅ | Edge type index |\n| `node_labels` | `uint64 nodeID` | MessagePack `[]string` | Node label storage |\n| `idx_node_label` | `\"label\" \\| nodeID` | ∅ | Label → node index |\n| `meta` | `\"node_counter\"` / `\"edge_counter\"` | `uint64` | ID allocation counters |\n\n### Concurrency Model\n\n- **Reads** are fully parallel — `GetNode`, `BFS`, `Cypher`, etc. never acquire a mutex. 
bbolt's MVCC provides snapshot isolation.\n- **Writes** are serialized per-shard by bbolt's single-writer lock.\n- The `closed` flag is an `atomic.Bool` — checked by every operation without locking.\n- A built-in worker pool (default 8 goroutines) dispatches concurrent queries.\n\n### Sharding\n\nWhen `ShardCount \u003e 1`, node IDs are hash-partitioned (`nodeID % shardCount`) across separate bbolt files:\n\n- **Edges** are co-located with their source node → `OutEdges(x)` hits **1 shard**.\n- **Incoming adjacency** is stored in the target node's shard → `InEdges(x)` hits **1 shard**.\n- Cross-shard edge creation uses two separate transactions (two fsyncs) instead of one.\n\nFor most use cases, `ShardCount: 1` (default) is sufficient and avoids cross-shard overhead.\n\n## Management UI\n\nGraphDB ships with a built-in web-based management console for exploring your data visually.\n\n### Running the UI\n\n```bash\n# Build the React frontend (one-time)\ncd ui \u0026\u0026 npm install \u0026\u0026 npm run build \u0026\u0026 cd ..\n\n# Start the server (serves both the API and the UI)\ngo run ./cmd/graphdb-ui/ -db ./my-data.db -ui ./ui/dist\n# → Open http://localhost:7474\n```\n\nFor development with hot-reload:\n\n```bash\n# Terminal 1 — Go API server\ngo run ./cmd/graphdb-ui/ -db ./my-data.db\n\n# Terminal 2 — React dev server (auto-proxies API calls)\ncd ui \u0026\u0026 npm run dev\n# → Open http://localhost:5173\n```\n\n### Pages\n\n| Page | What it does |\n|---|---|\n| **Query Editor** | Write and run Cypher queries with syntax highlighting. Results are shown as a table or as an interactive graph. Includes example queries and keeps a history of past queries. Press `Ctrl+Enter` to execute. |\n| **Dashboard** | See your database at a glance — total nodes, edges, shard count, disk usage, and which indexes are active. Quick links to other pages. |\n| **Index Management** | Create, drop, or rebuild property indexes through the UI. 
Each index is shown with its type (B+tree) and status. |\n| **Graph Explorer** | Browse all nodes in a paginated list. Click any node to see its properties and a visual graph of its direct connections. Click nodes in the graph to navigate and explore further. |\n\n### REST API\n\nThe UI communicates through a JSON API that you can also use directly:\n\n```bash\n# Database stats\ncurl http://localhost:7474/api/stats\n\n# Run a Cypher query\ncurl -X POST http://localhost:7474/api/cypher \\\n  -d '{\"query\": \"MATCH (n) RETURN n LIMIT 10\"}'\n\n# List indexes\ncurl http://localhost:7474/api/indexes\n\n# Create an index\ncurl -X POST http://localhost:7474/api/indexes \\\n  -d '{\"property\": \"name\"}'\n\n# List nodes (paginated); quote the URL so the shell does not treat the ampersand as a background operator\ncurl 'http://localhost:7474/api/nodes?limit=20\u0026offset=0'\n\n# Get a node's neighborhood (node + neighbors + edges)\ncurl http://localhost:7474/api/nodes/1/neighborhood\n\n# Create a node\ncurl -X POST http://localhost:7474/api/nodes \\\n  -d '{\"props\": {\"name\": \"Alice\", \"age\": 30}}'\n\n# Create an edge\ncurl -X POST http://localhost:7474/api/edges \\\n  -d '{\"from\": 1, \"to\": 2, \"label\": \"follows\"}'\n\n# Cursor pagination (O(limit) per page)\ncurl 'http://localhost:7474/api/nodes/cursor?limit=20'\ncurl 'http://localhost:7474/api/nodes/cursor?cursor=42\u0026limit=20'\ncurl 'http://localhost:7474/api/edges/cursor?limit=50'\n\n# Prepare and execute a statement\ncurl -X POST http://localhost:7474/api/cypher/prepare \\\n  -d '{\"query\": \"MATCH (n {name: $name}) RETURN n\"}'\ncurl -X POST http://localhost:7474/api/cypher/execute \\\n  -d '{\"stmt_id\": \"abc123\", \"params\": {\"name\": \"Alice\"}}'\n\n# NDJSON streaming\ncurl -X POST http://localhost:7474/api/cypher/stream \\\n  -d '{\"query\": \"MATCH (n) RETURN n.name LIMIT 100\"}'\n\n# Prometheus metrics\ncurl http://localhost:7474/metrics\n\n# Query cache stats\ncurl http://localhost:7474/api/cache/stats\n```\n\n### Tech Stack\n\n- **Frontend**: React 18, TypeScript, Vite, 
Tailwind CSS, [cytoscape.js](https://js.cytoscape.org/) (graph rendering), [CodeMirror](https://codemirror.net/) (query editor), [Lucide](https://lucide.dev/) (icons)\n- **Backend**: Go `net/http` with the standard library router (Go 1.22+), no external web framework\n\n## Examples\n\n```bash\n# Minimal quickstart\ngo run ./cmd/graphdb/\n\n# Social network — CRUD, traversals, paths, query builder, concurrency\ngo run ./examples/social/\n\n# Cypher query patterns — all 7 read patterns + ORDER BY + LIMIT\ngo run ./examples/cypher/\n\n# Benchmark — 100K nodes, batch insert, index-aware Cypher performance\ngo run ./examples/benchmark/\n\n# Labels, transactions, parameterized queries\ngo run ./examples/labels_tx/\n\n# EXPLAIN/PROFILE — query plan inspection and profiling\ngo run ./examples/explain_profile/\n\n# OPTIONAL MATCH — left-outer-join semantics\ngo run ./examples/optional_match/\n\n# Data integrity — CRC32 checksums + VerifyIntegrity scan\ngo run ./examples/integrity/\n```\n\n## Benchmarks\n\nRun the built-in benchmarks:\n\n```bash\ngo test -bench=. -benchmem\n```\n\nOr run the 100K-node performance example:\n\n```bash\ngo run ./examples/benchmark/\n```\n\nTypical results on Apple M-series:\n\n| Operation | Time |\n|---|---|\n| AddNodeBatch (100K nodes) | ~120 ms |\n| CreateIndex (100K nodes) | ~180 ms |\n| FindByProperty (indexed) | \u003c 1 ms |\n| Cypher property filter (indexed, 100K) | \u003c 1 ms |\n| Cypher 1-hop traversal (indexed) | \u003c 1 ms |\n| Cypher ORDER BY + LIMIT 10 (100K) | ~60 ms |\n| 1000× repeated Cypher (cached) | ~200 ms |\n\n## Testing\n\n```bash\ngo test -v ./...\ngo test -race ./...        # race detector\ngo test -bench=. 
-benchmem # benchmarks\n```\n\n## Roadmap\n\n### Phase 1 — Foundation\n- [ ] **Hot Backup / Restore** — consistent snapshot using bbolt's built-in `WriteTo`, zero downtime\n- [x] **Write-Ahead Log (WAL)** — append-only segmented log (64 MB segments, CRC32, msgpack) with WALReader tailing support\n- [x] **Write Cypher** — `CREATE` support in the Cypher engine\n- [x] **Prometheus Metrics** — atomic counters with Prometheus text exposition\n\n### Phase 2 — Replication \u0026 Reliability\n- [x] **Single-Leader Replication** — WAL → gRPC log shipping → follower Applier pipeline with 16 operation types\n- [x] **Leader Election** — hashicorp/raft integration for automatic failover with dynamic role switching\n- [x] **Query Router** — read/write routing with HTTP write forwarding from followers to leader\n- [x] **Read-Only Replicas** — `writeGuard` on all 16 public write methods, `ErrReadOnlyReplica` sentinel\n- [ ] **Point-in-Time Recovery** — replay WAL from a backup snapshot to restore data to any past timestamp\n- [ ] **Change Data Capture (CDC)** — streaming API for external consumers to subscribe to graph mutations in real time\n- [ ] **Authentication \u0026 TLS** — user/password auth and encrypted connections for network-exposed deployments\n\n### Phase 3 — Distributed Cluster\n- [x] **gRPC Inter-Node Protocol** — `StreamWAL` server-streaming RPC for replication with auto-reconnect\n- [ ] **Cluster Membership** — node discovery and health checking via gossip protocol (`hashicorp/memberlist`)\n- [ ] **Shard Placement Manager** — catalog of shard→node assignments, stored in its own Raft group\n- [ ] **Distributed Query Coordinator** — route Cypher queries to the correct node(s), scatter-gather for cross-shard queries, merge results\n- [ ] **Distributed Edge Writes** — two-phase commit for edges that span different cluster nodes\n- [ ] **Shard Rebalancing \u0026 Migration** — move shards between nodes when a node joins or leaves the cluster\n- [ ] **Cluster-Aware UI** 
— topology view, per-node stats, shard distribution map, leader/follower status\n\n### Phase 4 — Production Hardening\n- [ ] **Range Indexes** — B+tree range scans for numerical/date properties (`WHERE n.age \u003e 25` without full scan)\n- [ ] **Graph Partitioning** — smarter shard placement (METIS/Fennel) to minimize cross-shard edges\n- [ ] **Bloom Filters** — fast `HasEdge()` checks without touching disk\n- [ ] **Schema Constraints** — unique properties, required fields, edge cardinality rules\n- [x] **Query Timeout \u0026 Cancellation** — context-based cancellation for long-running queries\n- [ ] **Connection Pooling \u0026 Rate Limiting** — protect against runaway queries in multi-tenant setups\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmstryoda%2Fgoraphdb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmstryoda%2Fgoraphdb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmstryoda%2Fgoraphdb/lists"}