# gsql2rsql

[![PyPI version](https://badge.fury.io/py/gsql2rsql.svg)](https://badge.fury.io/py/gsql2rsql)
[![CI](https://github.com/devmessias/gsql2rsql/actions/workflows/ci.yml/badge.svg)](https://github.com/devmessias/gsql2rsql/actions/workflows/ci.yml)
[![Documentation](https://img.shields.io/badge/docs-mkdocs-blue)](https://devmessias.github.io/gsql2rsql)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Query your Delta Tables as a Graph**

gsql2rsql is an OpenCypher to Databricks/Spark SQL transpiler. You don't need a separate graph database: write OpenCypher queries and get Databricks SQL automatically.

> **Why Databricks?**
>
> Databricks tables are designed for massive scale. They can store and query tens of billions of triples efficiently and support features such as time travel. No ETL or migration is needed: you query your data lake as a graph. Databricks recently added support for recursive queries (`WITH RECURSIVE`), which makes SQL warehouses usable for graph-style traversals.

---

## Why gsql2rsql?

| Challenge | Solution |
|-----------|----------|
| Graph queries require complex SQL with `WITH RECURSIVE` | Write 5 lines of Cypher instead |
| Need to maintain a separate graph database | Query Delta Lake directly |
| LLM-generated complex SQL is hard to audit | Human-readable Cypher + deterministic transpilation (optionally pass the generated SQL to an LLM for final optimization) |
| Scaling to tens of billions of triples is costly in graph DBs | Delta Lake stores billions of triples efficiently, with Spark scalability |

## See It in Action

```bash
pip install gsql2rsql
```

```python
from gsql2rsql import GraphContext

# Point to your existing Delta tables - no migration needed
graph = GraphContext(
    nodes_table="catalog.fraud.nodes",
    edges_table="catalog.fraud.edges",
)

# Write graph queries with familiar Cypher syntax
sql = graph.transpile("""
    MATCH path = (origin:Person {id: 12345})-[:TRANSACTION*1..4]->(dest:Person)
    WHERE dest.risk_score > 0.8
    RETURN dest.id, dest.name, dest.risk_score, length(path) AS depth
    ORDER BY depth, dest.risk_score DESC
    LIMIT 3
""")

print(sql)
```

**5 lines of Cypher → optimized Databricks SQL with recursive CTEs**

<details>
<summary>Click to see the generated SQL (auto-generated from transpiler)</summary>

```sql
WITH RECURSIVE
  paths_1 AS (
-- Base case: direct edges (depth = 1)
SELECT
  e.src AS start_node,
  e.dst AS end_node,
  1 AS depth,
  ARRAY(e.src, e.dst) AS path,
  ARRAY(NAMED_STRUCT('src', e.src, 'dst', e.dst, 'amount', e.amount, 'timestamp', e.timestamp)) AS path_edges,
  ARRAY(e.src) AS visited
FROM catalog.fraud.edges e
JOIN catalog.fraud.nodes src ON src.id = e.src
WHERE (relationship_type = 'TRANSACTION') AND (src.id) = (12345)

UNION ALL

-- Recursive case: extend paths
SELECT
  p.start_node,
  e.dst AS end_node,
  p.depth + 1 AS depth,
  CONCAT(p.path, ARRAY(e.dst)) AS path,
  ARRAY_APPEND(p.path_edges, NAMED_STRUCT('src', e.src, 'dst', e.dst, 'amount', e.amount, 'timestamp', e.timestamp)) AS path_edges,
  CONCAT(p.visited, ARRAY(e.src)) AS visited
FROM paths_1 p
JOIN catalog.fraud.edges e
  ON p.end_node = e.src
WHERE p.depth < 4
  AND NOT ARRAY_CONTAINS(p.visited, e.dst)
  AND (relationship_type = 'TRANSACTION')
  )
SELECT
   _gsql2rsql_dest_id AS id
  ,_gsql2rsql_dest_name AS name
  ,_gsql2rsql_dest_risk_score AS risk_score
  ,(SIZE(_gsql2rsql_path_id) - 1) AS depth
FROM (
  SELECT
 sink.id AS _gsql2rsql_dest_id
,sink.name AS _gsql2rsql_dest_name
,sink.risk_score AS _gsql2rsql_dest_risk_score
,source.id AS _gsql2rsql_origin_id
,source.name AS _gsql2rsql_origin_name
,source.risk_score AS _gsql2rsql_origin_risk_score
,p.start_node
,p.end_node
,p.depth
,p.path AS _gsql2rsql_path_id
,p.path_edges AS _gsql2rsql_path_edges
  FROM paths_1 p
  JOIN catalog.fraud.nodes sink
ON sink.id = p.end_node
  JOIN catalog.fraud.nodes source
ON source.id = p.start_node
  WHERE p.depth >= 1 AND p.depth <= 4 AND (sink.risk_score) > (0.8)
) AS _proj
ORDER BY depth ASC, _gsql2rsql_dest_risk_score DESC
LIMIT 3
```

</details>
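
To make the recursive CTE easier to audit, here is a minimal in-memory Python sketch of what its base case and recursive case compute. It assumes edges are plain dicts carrying the `src`, `dst`, and `relationship_type` columns that appear in the generated SQL. Node-property filters such as `risk_score` are left out. This is an illustration of the semantics, not the transpiler's own code. `expand_paths` is a hypothetical name, and the sketch uses a simplified cycle check against every node already on the path, whereas the generated SQL maintains a separate `visited` array.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for rows of an edges table.
Edge = Dict[str, Any]


def expand_paths(edges: List[Edge], start: Any, rel_type: str,
                 min_depth: int, max_depth: int) -> List[List[Any]]:
    """Return node paths from `start`, mimicking the CTE's two branches."""
    typed = [e for e in edges if e["relationship_type"] == rel_type]

    # Base case: direct edges leaving the start node (depth = 1).
    frontier = [[e["src"], e["dst"]] for e in typed if e["src"] == start]
    all_paths = list(frontier)

    # Recursive case: extend each path by one edge while depth < max_depth.
    depth = 1
    while frontier and depth < max_depth:
        next_frontier = []
        for path in frontier:
            for e in typed:
                # Join on p.end_node = e.src; skip nodes already on the path.
                if e["src"] == path[-1] and e["dst"] not in path:
                    next_frontier.append(path + [e["dst"]])
        all_paths.extend(next_frontier)  # UNION ALL of every level
        frontier = next_frontier
        depth += 1

    # Outer depth filter; length(path) is the node count minus one.
    return [p for p in all_paths if min_depth <= len(p) - 1 <= max_depth]


edges = [
    {"src": 1, "dst": 2, "relationship_type": "TRANSACTION"},
    {"src": 2, "dst": 3, "relationship_type": "TRANSACTION"},
    {"src": 3, "dst": 1, "relationship_type": "TRANSACTION"},  # closes a cycle
    {"src": 2, "dst": 4, "relationship_type": "KNOWS"},        # other type, ignored
]
paths = expand_paths(edges, start=1, rel_type="TRANSACTION", min_depth=1, max_depth=4)
print(paths)  # [[1, 2], [1, 2, 3]]
```

The edge `3 -> 1` is rejected because node 1 is already on the path. This is the role the `ARRAY_CONTAINS` check plays in the SQL. Without that check, the recursion would revisit the cycle until the depth limit is reached.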
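
---

> **Early Stage Project: Not for OLTP or end-user queries**
>
> This project is in **early development**. APIs may change, features may be incomplete, and bugs are expected. Contributions and feedback are welcome!
>
> This transpiler is intended for **internal analytics and exploration** (data science, engineering, analysis). It is not designed for OLTP workloads. If you plan to expose transpiled queries to end users, implement input validation, rate limiting, and appropriate security controls.

## Real-World Examples

### Fraud Detection

```cypher
// Find fraud rings: accounts connected through suspicious transactions
MATCH path = (a:Account)-[:TRANSFER*2..4]->(b:Account)
WHERE a.flagged = true AND b.flagged = true
RETURN DISTINCT a.id, b.id, length(path) AS hops
```

[See more fraud detection queries →](examples/fraud.md)

### Credit Analysis

```cypher
// Analyze credit exposure through guarantor chains
MATCH path = (borrower:Customer)-[:GUARANTEES*1..3]->(guarantor:Customer)
WHERE borrower.credit_score < 600
RETURN borrower.id, COLLECT(guarantor.id) AS chain
```

[See more credit analysis queries →](examples/credit.md)

### Social Network

```cypher
// Friends of friends who work at tech companies
MATCH (me:Person {id: 123})-[:KNOWS*1..2]->(friend)-[:WORKS_AT]->(c:Company)
WHERE c.industry = 'Technology'
RETURN DISTINCT friend.name, c.name
```

[See all feature examples →](examples/features.md)

---

**That's it!** No schema boilerplate, no complex setup.

[Full User Guide →](user-guide.md)

---

## Low-Level API (Without GraphContext)

For advanced use cases or non-triple-store schemas, use the components directly:

```python
from gsql2rsql import OpenCypherParser, LogicalPlan, SQLRenderer
from gsql2rsql.common.schema import NodeSchema, EdgeSchema, EntityProperty
from gsql2rsql.renderer.schema_provider import SimpleSQLSchemaProvider, SQLTableDescriptor

# 1. Define schema (SimpleSQLSchemaProvider)
schema = SimpleSQLSchemaProvider()

person = NodeSchema(
    name="Person",
    node_id_property=EntityProperty("id", int),
    properties=[EntityProperty("name", str)],
)
schema.add_node(
    person,
    SQLTableDescriptor(table_name="people", node_id_columns=["id"]),
)

knows = EdgeSchema(
    name="KNOWS",
    source_node_id="Person",
    sink_node_id="Person",
)
schema.add_edge(
    knows,
    SQLTableDescriptor(table_name="friendships"),
)

# 2. Transpile
parser = OpenCypherParser()
ast = parser.parse("MATCH (p:Person)-[:KNOWS]->(f:Person) RETURN p.name, f.name")
plan = LogicalPlan.process_query_tree(ast, schema)
plan.resolve(original_query="...")

renderer = SQLRenderer(db_schema_provider=schema)
sql = renderer.render_plan(plan)
```

---

## Key Features

| Feature | Description |
|---------|-------------|
| **Variable-length paths** | `[:REL*1..5]` via `WITH RECURSIVE` |
| **Cycle detection** | Automatic `ARRAY_CONTAINS` checks |
| **Path functions** | `length(path)`, `nodes(path)`, `relationships(path)` |
| **No-label nodes** | `(a)-[:REL]->(b:Label)` matches any node type for `a` |
| **Inline filters** | `(n:Person {id: 123})` pushes predicates to source |
| **Undirected edges** | `(a)-[:KNOWS]-(b)` via optimized UNION ALL |
| **Aggregations** | COUNT, SUM, AVG, COLLECT, etc. |
| **Type safety** | Schema validation before SQL generation |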
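
The undirected-edge row in the table can be illustrated with a small Python sketch. It assumes edges are plain `(src, dst)` tuples. Matching `(a)-[:KNOWS]-(b)` binds each stored edge in both orientations, which is what a `UNION ALL` of the forward and reversed edge sets produces. `undirected_matches` is a hypothetical helper. It shows only the semantics, not how gsql2rsql optimizes the union in SQL.

```python
from typing import List, Tuple

# Hypothetical representation of a directed KNOWS edge: (src, dst).
Edge = Tuple[int, int]


def undirected_matches(edges: List[Edge]) -> List[Edge]:
    """All (a, b) bindings for the undirected pattern (a)-[:KNOWS]-(b)."""
    forward = [(src, dst) for src, dst in edges]   # (a)-[:KNOWS]->(b)
    backward = [(dst, src) for src, dst in edges]  # (a)<-[:KNOWS]-(b)
    return forward + backward                      # UNION ALL keeps both


knows = [(1, 2), (2, 3)]
print(undirected_matches(knows))  # [(1, 2), (2, 3), (2, 1), (3, 2)]

# Neighbors of node 2 in either direction.
print([b for a, b in undirected_matches(knows) if a == 2])  # [3, 1]
```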

---

## Architecture

gsql2rsql uses a **4-phase pipeline** for correctness:

```
OpenCypher → Parser → Planner → Resolver → Renderer → SQL
```

1. **Parser**: Cypher → AST (syntax only, no schema)
2. **Planner**: AST → Logical operators (semantics)
3. **Resolver**: Validate columns & types against schema
4. **Renderer**: Operators → Databricks SQL

This separation ensures each phase has clear responsibilities and can be tested independently.
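
The following toy Python pipeline illustrates why this separation helps. It wires four independent functions together and supports only a single pattern of the form `MATCH (v:Label) RETURN v.prop`. Every name here (`parse`, `plan`, `resolve`, `render`, `transpile`, the `SCHEMA` dict, and the operator tuples) is hypothetical and unrelated to gsql2rsql's real `OpenCypherParser`, `LogicalPlan`, and `SQLRenderer` classes. The point is only that each phase consumes the previous phase's output and can be tested on its own.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical schema: label -> (table name, known columns).
SCHEMA: Dict[str, Tuple[str, List[str]]] = {"Person": ("people", ["id", "name"])}
Ops = List[Tuple[str, str, str]]


def parse(query: str) -> Dict[str, str]:
    """Phase 1 (syntax only): build a tiny AST without touching the schema."""
    m = re.fullmatch(r"MATCH \((\w+):(\w+)\) RETURN (\w+)\.(\w+)", query.strip())
    if m is None:
        raise SyntaxError(f"unsupported query: {query!r}")
    var, label, ret_var, prop = m.groups()
    return {"var": var, "label": label, "ret_var": ret_var, "prop": prop}


def plan(ast: Dict[str, str]) -> Ops:
    """Phase 2 (semantics): turn the AST into logical operators."""
    if ast["ret_var"] != ast["var"]:
        raise ValueError(f"unbound variable {ast['ret_var']}")
    return [("scan", ast["label"], ast["var"]), ("project", ast["var"], ast["prop"])]


def resolve(ops: Ops, schema: Dict[str, Tuple[str, List[str]]]) -> Ops:
    """Phase 3: validate labels and columns against the schema."""
    (_, label, _), (_, _, prop) = ops
    if label not in schema:
        raise ValueError(f"unknown label {label}")
    if prop not in schema[label][1]:
        raise ValueError(f"unknown property {label}.{prop}")
    return ops  # validated, unchanged


def render(ops: Ops, schema: Dict[str, Tuple[str, List[str]]]) -> str:
    """Phase 4: emit SQL from already-validated operators."""
    (_, label, var), (_, _, prop) = ops
    table = schema[label][0]
    return f"SELECT {var}.{prop} FROM {table} AS {var}"


def transpile(query: str) -> str:
    return render(resolve(plan(parse(query)), SCHEMA), SCHEMA)


print(transpile("MATCH (p:Person) RETURN p.name"))  # SELECT p.name FROM people AS p

try:
    transpile("MATCH (p:Person) RETURN p.age")
except ValueError as err:
    print("resolver rejected:", err)  # resolver rejected: unknown property Person.age
```

The parser never sees the schema, and the renderer never re-validates. A schema error such as `p.age` therefore surfaces in exactly one place (the resolver), before any SQL is generated.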

---

## Documentation

| Section | Description |
|---------|-------------|
| [**User Guide**](user-guide.md) | Getting started, GraphContext, schema setup |
| [**Examples**](examples/index.md) | 69 complete queries with generated SQL |

---

## Project Status

> **Research Project**

**Contributions welcome!**

- [GitHub Repository](https://github.com/devmessias/gsql2rsql)
- [Issue Tracker](https://github.com/devmessias/gsql2rsql/issues)
- [Contributing Guide](contributing.md)

---

## License

MIT License. See [LICENSE](https://github.com/devmessias/gsql2rsql/blob/main/LICENSE.md).

---