{"id":19376644,"url":"https://github.com/gavinray97/graphqlcalcite","last_synced_at":"2025-10-05T07:49:37.372Z","repository":{"id":43641644,"uuid":"440668037","full_name":"GavinRay97/GraphQLCalcite","owner":"GavinRay97","description":"Generating Federated GraphQL API's from Datasources with Apache Calcite","archived":false,"fork":false,"pushed_at":"2022-02-21T18:59:57.000Z","size":1959,"stargazers_count":35,"open_issues_count":1,"forks_count":4,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-10-05T07:49:35.357Z","etag":null,"topics":["apache-calcite","calcite","graphql","graphql-to-sql"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GavinRay97.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-12-21T22:34:43.000Z","updated_at":"2025-10-04T02:51:29.000Z","dependencies_parsed_at":"2022-09-15T02:01:17.249Z","dependency_job_id":null,"html_url":"https://github.com/GavinRay97/GraphQLCalcite","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/GavinRay97/GraphQLCalcite","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GavinRay97%2FGraphQLCalcite","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GavinRay97%2FGraphQLCalcite/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GavinRay97%2FGraphQLCalcite/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GavinRay97%2FGraphQLCalcite/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GavinRay97",
"download_url":"https://codeload.github.com/GavinRay97/GraphQLCalcite/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GavinRay97%2FGraphQLCalcite/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278425499,"owners_count":25984686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-calcite","calcite","graphql","graphql-to-sql"],"created_at":"2024-11-10T08:44:41.612Z","updated_at":"2025-10-05T07:49:37.336Z","avatar_url":"https://github.com/GavinRay97.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Apache Calcite \u003c-\u003e Distributed, Federated GraphQL API\n\n- [Apache Calcite \u003c-\u003e Distributed, Federated GraphQL API](#apache-calcite---distributed-federated-graphql-api)\n- [Goals](#goals)\n- [Roadmap and Current Progress](#roadmap-and-current-progress)\n    - [The Roadmap](#the-roadmap)\n    - [Walkthrough of Current Progress](#walkthrough-of-current-progress)\n- [Technical Architecture](#technical-architecture)\n    - [Approach and Design](#approach-and-design)\n    - [YouTube Presentation](#youtube-presentation)\n    - [Why Kotlin](#why-kotlin)\n- [Related Projects and Reference Material](#related-projects-and-reference-material)\n\nThis repo contains a 
work-in-progress prototype and research project on using Apache Calcite as the backbone of GraphQL\nservices.\n\nSimilar work has been done by LinkedIn on [Coral](https://github.com/linkedin/coral), though its GraphQL implementation\nis not yet publicly available, and Coral uses its own internal intermediate representation (IR), a slightly modified form of Calcite's.\nAdditionally, Coral's goals and use cases are somewhat different from those of this project.\n\n# Goals\n\n**As a user, I want:**\n\n- To be able to access and modify data stored anywhere, provided it has a well-defined structure (schema)\n    - This includes relational databases, non-relational databases, and data stored on disk (CSV, JSON, etc.)\n- To be able to query data from multiple databases in a single query (distributed/federated querying)\n- To be able to join across data sources by defining virtual relationships\n- Performance\n    - Low latency: p90 query latency competitive with a hand-written implementation\n    - Federated queries that complete quickly enough to be usable in day-to-day client-side operations\n\n**As an engineer, I want:**\n\n- An industrial-grade query planner and optimizer backing query execution\n- Extensibility: the ability to easily write new engines/adapters to run operations against\n    - E.g., exposing Airtable's API as a GraphQL/SQL-queryable source\n- To work with a widely used and mature set of tools\n- Standardization: not to invent anything myself, because:\n    - A) I'm probably not qualified to do so\n    - B) There are likely people who have spent the length of my lifetime thinking about and solving a similar problem\n\n# Roadmap and Current Progress\n\nAs of today, given a Calcite schema (e.g., from a JDBC connection or any other adapter), this project can generate the\ncorresponding GraphQL schema and execute queries successfully. 
(See \"Walkthrough of Current Progress\" below.)\n\nGraphQL queries against this schema can be converted into their corresponding Calcite Relational Algebra\nexpressions, then executed against the Calcite adapter, returning the correct results.\n\n## The Roadmap\n\n- [x] Convert a Calcite `Schema` into the corresponding GraphQL schema types\n    - [x] Generate GraphQL object types for tables\n    - [x] Generate a `where` boolean expression type to allow filtering\n- [x] Convert a GraphQL query AST into the matching Calcite Relational Algebra expression (`RelNode`) for the table\n    - [x] Where (critical/most important)\n    - [x] Limit\n    - [x] Offset\n    - [ ] Distinct\n    - [ ] Group By/Aggregations (nice to have)\n- [x] Execute the converted Relational Algebra expression against the data source, returning correct results\n- [x] Automatically generate resolvers for the generated GraphQL query operations that perform the execution of the\n  query (current execution is invoked manually)\n- [x] Support Queries\n- [ ] Support Mutations\n- [x] Figure out whether it is possible to support Subscriptions with Calcite's adapter model\n    - [ ] Support subscriptions if so (nice to have)\n- [x] Support `JOIN` / nested GraphQL field access and queries\n- [x] Design a system for dynamic Calcite schema registration and modification while the program is running\n- [ ] Figure out how to let users implement their own data sources via HTTP (or similar)\n\n## Walkthrough of Current Progress\n\nTo give a sense of what exactly the above translates into, here's an illustration of the current functionality.\n\nGiven the schema of a Calcite data source table such as:\n\n```sql\nCREATE TABLE \"EMP\"\n(\n    EMPNO    int,\n    DEPTNO   int,\n    ENAME    text,\n    HIREDATE timestamptz,\n    JOB      text,\n    MGR      int,\n    SAL      numeric,\n    COMM     numeric\n);\n```\n\nthe corresponding GraphQL schema is generated:\n\n```graphql\ntype Query {\n    EMP(limit: Int, offset: Int, order_by: String, where: 
EMP_bool_exp): [EMP!]\n}\n\ntype EMP {\n    EMPNO: Int\n    # other fields...\n}\n\ninput EMP_bool_exp {\n    EMPNO: Int_comparison_exp\n    # other fields...\n    _and: [EMP_bool_exp!]\n    _not: EMP_bool_exp\n    _or: [EMP_bool_exp!]\n}\n```\n\nNow suppose we write a GraphQL query against this generated schema, such as the one below:\n\n```graphql\nquery {\n    EMP(\n        limit: 2,\n        offset: 1,\n        where: {\n            _or: [\n                { DEPTNO: { _eq: 20 } },\n                { DEPTNO: { _eq: 30 } }\n            ]\n            _and: [\n                { SAL: { _gte: 1500 } }\n                {\n                    _or: [\n                        { JOB: { _eq: \"SALESMAN\" } },\n                        { JOB: { _eq: \"MANAGER\" } }\n                    ]\n                }\n            ]\n        }\n    ) {\n        EMPNO\n        ENAME\n        JOB\n        MGR\n        HIREDATE\n        SAL\n        COMM\n        DEPTNO\n    }\n}\n```\n\nWe can execute it, and because the query is now a Calcite relational expression, we can also inspect it with Calcite's tooling. For instance, here is the query plan at various stages\nof planning:\n\n```\n-- Logical Plan\nLogicalSort(offset=[1], fetch=[2])\n  LogicalFilter(condition=[AND(SEARCH($7, Sarg[20, 30]), \u003e=($5, 1500), SEARCH($2, Sarg['MANAGER':CHAR(8), 'SALESMAN']:CHAR(8)))])\n    JdbcTableScan(table=[[JDBC_SCOTT, EMP]])\n\n-- Mid Plan\nLogicalSort(subset=[rel#9:RelSubset#2.ENUMERABLE.[]], offset=[1], fetch=[2])\n  LogicalFilter(subset=[rel#6:RelSubset#1.NONE.[]], condition=[AND(SEARCH($7, Sarg[20, 30]), \u003e=($5, 1500), SEARCH($2, Sarg['MANAGER':CHAR(8), 'SALESMAN']:CHAR(8)))])\n    JdbcTableScan(subset=[rel#4:RelSubset#0.JDBC.JDBC_SCOTT.[]], table=[[JDBC_SCOTT, EMP]])\n\n-- Best Plan\nEnumerableLimit(offset=[1], fetch=[2])\n  JdbcToEnumerableConverter\n    JdbcFilter(condition=[AND(SEARCH($7, Sarg[20, 30]), \u003e=($5, 1500), SEARCH($2, Sarg['MANAGER':CHAR(8), 'SALESMAN']:CHAR(8)))])\n      JdbcTableScan(table=[[JDBC_SCOTT, EMP]])\n```\n\nAnd here is the relational expression built from the GraphQL query, represented as a (simplified + optimized) 
SQL\nquery. This is immensely useful for debugging and for understanding what's happening:\n\n```sql\nSELECT *\nFROM \"SCOTT\".\"EMP\"\nWHERE \"DEPTNO\" IN (20, 30)\n  AND \"SAL\" \u003e= 1500\n  AND \"JOB\" IN ('MANAGER', 'SALESMAN')\nOFFSET 1 ROWS FETCH NEXT 2 ROWS ONLY\n```\n\nAnd finally, the results of our query:\n\n```csv\nEMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO\n7566,JONES,MANAGER,7839,1981-02-04,2975.00,null,20\n7698,BLAKE,MANAGER,7839,1981-01-05,2850.00,null,30\n```\n\nWe can also use Calcite's query plan visualizer to understand what's going on:\n\n![calcite query plan web visualizer sample](./readme-images/calcite-query-plan-example.png)\n\nThere is still much left to prove out, but hopefully this gives some insight into \"what the heck is it you're\ntrying to build?\"\n\n# Technical Architecture\n\n## Approach and Design\n\nThe high-level approach pursued here is best explained with a few images.\n\n\u003e Note: These images are taken from [Stamatis Zampetakis](http://people.apache.org/~zabetak/)'s fantastic presentation on\n\u003e YouTube:\n\u003e\n\u003e [_\"An introduction to query processing \u0026 Apache Calcite\"_](https://www.youtube.com/watch?v=p1O3E33FIs8)\n\nThe following image gives a high-level overview of the components found in most query processors:\n\n![query-processor-general-architecture](./readme-images/calcite-query-processor-architecture.png)\n\nBelow, we can see how these high-level pieces map to Calcite's API classes and interfaces, as well as the boundaries\nbetween the \"core\" pieces and the pieces that developers can write as extensions.\n\nCircled in blue are the two areas we are most interested in:\n\n- The region depicting `SqlParser` and `SqlToRelConverter` shows how regular SQL queries are translated\n  into Relational Algebra expressions. 
In theory (and in practice), we can do the same thing to convert\n  GraphQL queries into relational expressions.\n- The region on the right-hand side, containing `Schema` and `CatalogReader`, has been circled to call attention to how\n  the server's GraphQL API is auto-generated. We can ask Calcite for the schema of any of its data sources and\n  use that metadata to generate GraphQL types and resolvers.\n\n![query-processor-calcite-interfaces](./readme-images/calcite-query-processor-graphql-architecture.png)\n\nWith these pieces in place, you can see below how, rather than a SQL query, we can write a GraphQL query with\nidentical semantics and continue using Calcite as though it were a \"plain-old SQL query\".\n\n![sql-query-to-calcite-ast-example](./readme-images/graphql-query-to-relnode.png)\n\nVisualized in an image, the process is roughly:\n\n![query-pipeline](./readme-images/gql-calcite-query-pipeline-2.png)\n\nThis is only possible because of some restrictions and assumptions about the shape of the GraphQL API and the corresponding\nqueries.\n\nThat is, the generated GraphQL schema only allows queries whose semantics can be mapped 1-to-1\nto SQL.\n\nWith this, we can restrict the \"domain\" of GraphQL to the \"domain\" of standard SQL, and our work reduces to\nwriting the facades/conversions between the two.\n\nThis is a 10,000-foot view of the technical architecture and the approach this project takes to the problem. For details,
For details,\nsee code.\n\n## Youtube Presentation\n\nVideo from the January 2022 Apache Calcite online Meetup where a very early draft of this work was presented:\n\n[![Apache Calcite 2022 Meetup January, GraphQL Presentation](./readme-images/graphql-apache-calcite-cover.png)](https://youtu.be/ae95vICkOnc \"Apache Calcite 2022 Meetup January, GraphQL Presentation\")\n\n## Why Kotlin\n\nThis project is written in Kotlin.\n\nIf you were to check the commit history, you would find that there were, at one point, functioning prototypes in both\nJava and Scala 3 too.\n\nUltimately, Kotlin struck the best balance between language features and tooling + support + ecosystem. It is an\nincredibly productive and pragmatic language.\n\nThis project is performance-critical, and benchmarks showed that the latest Kotlin (1.6.10) is competitive with the\nlatest Java (JDK 17) in terms of performance, and in some cases idiomatic code would perform somewhat better than the\nJava equivalent.\n\nIf further developments show negative performance impact from using a language other than Java, I will rewrite it all in\nJava.\n\n# Related Projects and Reference Material\n\n- [Substrait](https://github.com/substrait-io/substrait)\n- [Apache Drill](https://drill.apache.org/)\n- [Trino (formerly Presto)](https://trino.io/)\n- [Coral](https://github.com/linkedin/coral)\n    - https://engineering.linkedin.com/blog/2020/coral\n    - https://www.youtube.com/watch?v=C5t3QYch1Tk\n    - https://www.dremio.com/subsurface/live/session/coral-and-transport-udfs-building-blocks-of-a-postmodern-data-warehouse/\n  \n  \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgavinray97%2Fgraphqlcalcite","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgavinray97%2Fgraphqlcalcite","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgavinray97%2Fgraphqlcalcite/lists"}