{"id":29874963,"url":"https://github.com/datasqrl/sqrl","last_synced_at":"2026-04-09T14:06:37.193Z","repository":{"id":64465435,"uuid":"386832979","full_name":"DataSQRL/sqrl","owner":"DataSQRL","description":"Data Streaming Framework to build data APIs, data lakes, and LLM tooling with SQL.","archived":false,"fork":false,"pushed_at":"2025-07-28T18:52:09.000Z","size":60603,"stargazers_count":122,"open_issues_count":146,"forks_count":16,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-07-28T19:08:29.484Z","etag":null,"topics":["api","data-pipeline","database","event-driven","event-driven-microservices","streaming"],"latest_commit_sha":null,"homepage":"https://datasqrl.github.io/sqrl","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/DataSQRL.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-07-17T04:11:30.000Z","updated_at":"2025-07-28T17:19:51.000Z","dependencies_parsed_at":"2023-11-07T01:16:53.831Z","dependency_job_id":"17aa3121-e940-49bf-913d-d660773065a1","html_url":"https://github.com/DataSQRL/sqrl","commit_stats":null,"previous_names":[],"tags_count":35,"template":false,"template_full_name":null,"purl":"pkg:github/DataSQRL/sqrl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fsqrl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fsqrl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fsqrl/releases","manifests_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/repositories/DataSQRL%2Fsqrl/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/DataSQRL","download_url":"https://codeload.github.com/DataSQRL/sqrl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/DataSQRL%2Fsqrl/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267973560,"owners_count":24174408,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","data-pipeline","database","event-driven","event-driven-microservices","streaming"],"created_at":"2025-07-31T01:45:48.232Z","updated_at":"2026-02-10T21:16:15.962Z","avatar_url":"https://github.com/DataSQRL.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# DataSQRL\n\n[![CircleCI](https://dl.circleci.com/status-badge/img/gh/DataSQRL/sqrl/tree/main.svg?style=svg)](https://dl.circleci.com/status-badge/redirect/gh/DataSQRL/sqrl/tree/main)\n[![Docs](https://img.shields.io/badge/docs-available-brightgreen.svg)](https://datasqrl.github.io/sqrl)\n\u003c!--[![codecov](https://codecov.io/gh/datasqrl/sqrl/branch/main/graph/badge.svg)](https://codecov.io/gh/datasqrl/sqrl) --\u003e\n[![License](https://img.shields.io/github/license/datasqrl/sqrl.svg)](LICENSE)\n[![Docker Image 
Version](https://img.shields.io/docker/v/datasqrl/cmd?sort=semver)](https://hub.docker.com/r/datasqrl/cmd/tags)\n[![Maven Central](https://img.shields.io/maven-central/v/com.datasqrl/sqrl-root)](https://repo1.maven.org/maven2/com/datasqrl/sqrl-root/)\n\nDataSQRL is a data automation framework for building reliable data pipelines, data APIs (REST, MCP, GraphQL), and data products in SQL using open-source technologies.\n\nDataSQRL provides three key elements for AI-assisted data platform automation:\n1. **World Model:** DataSQRL builds a source-to-sink computational graph of the data processing, including schemas, connectors, and mappings, which serves as a comprehensive world model for grounding generative AI.\n2. **Simulation:** DataSQRL includes a runtime and testing framework that ensures data integrity and acts as a simulator in iterative refinement loops with real-world feedback.\n3. **Verification:** Since the entire data pipeline is defined in SQL, it is easy to understand and verify. DataSQRL produces detailed execution plans and lineage graphs to assist automated and manual analysis.
\n\nDataSQRL generates the deployment artifacts that run the entire pipeline on open-source technologies like PostgreSQL, Apache Kafka, Apache Flink, and Apache Iceberg, using your existing infrastructure with Docker, Kubernetes, or cloud-managed services.\n\n![DataSQRL Pipeline Architecture](/documentation/static/img/diagrams/automation_overview.png)\n\nDataSQRL models data pipelines with the following requirements:\n\n* 🛡️ **Data Consistency Guarantees:** Exactly-once processing, data consistency across all outputs, schema alignment, and data lineage tracking.\n* 🔒 **Production-grade Reliability:** Robust, highly available, scalable, secure, access-controlled, and observable data pipelines.\n* 🚀 **Developer Workflow Integration:** Local development, quick iteration with feedback, CI/CD support, and a comprehensive testing framework.\n\nTo learn more about DataSQRL, check out [the documentation](https://docs.datasqrl.com/).\n\nTo see how DataSQRL provides feedback and guides AI coding agents to build data products autonomously, [view this demo video](https://www.youtube.com/watch?v=RfMzdrtrEqQ).\n\n## Getting Started\n\nTo create a new data project with DataSQRL, run the `init` command in an empty folder:\n\n```bash\ndocker run --rm -v $PWD:/build datasqrl/cmd init api messenger\n```\n(In PowerShell on Windows, use `${PWD}` instead of `$PWD`.)\n\nThis creates a new data API project called `messenger` with some sample data sources and a simple data processing script called `messenger.sqrl`.\n\nRun the project with:\n```bash\ndocker run -it --rm -p 8888:8888 -p 8081:8081 -v $PWD:/build datasqrl/cmd run messenger-prod-package.json\n```\n\nThis launches the entire data pipeline for ingesting, processing, storing, and serving messages.\nYou can access the API in your browser at [http://localhost:8888/v1/graphiql/](http://localhost:8888/v1/graphiql/) and add messages with the following mutation:\n\n```graphql\nmutation {\n  Messages(event: {message: \"Hello World\"}) {\n    message_time\n  
}\n}\n```\n\nQuery messages with:\n```graphql\n{\n  Messages {\n    message\n    message_time\n  }\n}\n```\n\nAlternatively, you can query messages through [REST](http://localhost:8888/v1/rest) or [MCP](http://localhost:8888/v1/mcp).\nOnce you are done, terminate the pipeline with `CTRL-C`.\n\nFor additional data processing, edit the `messenger.sqrl` script. For example, to aggregate messages:\n```sql\nTotalMessages := SELECT COUNT(*) as num_messages, MAX(message_time) as latest_timestamp\n                 FROM Messages LIMIT 1;\n```\n\nTo run the test case, execute:\n```bash\ndocker run -it --rm -v $PWD:/build datasqrl/cmd test messenger-test-package.json\n```\n\nTo build the deployment assets for the data pipeline, execute:\n```bash\ndocker run --rm -v $PWD:/build datasqrl/cmd compile messenger-prod-package.json\n```\n\nThe `build/deploy` directory contains the compiled Flink plan, Kafka topic definitions, PostgreSQL schema and view definitions, server queries, MCP tool definitions, and the GraphQL data model.\nThese assets can be deployed in containerized environments (e.g., via Kubernetes) or on cloud-managed services.\n\nRead the [full Getting Started tutorial](https://docs.datasqrl.com//docs/getting-started) or check out the [DataSQRL Examples repository](https://github.com/DataSQRL/datasqrl-examples/) for more examples of creating MCP servers, data APIs, Iceberg views, and more.\n\n## Why DataSQRL?\n\nAI-driven data platform automation is within reach. 
However, trustworthy automation requires more than generative AI.\nIt requires a [world model](https://lingo.csail.mit.edu/blog/world_models/) that understands your data landscape, enforces constraints, and provides the grounding and feedback loops needed for safe, reliable automation.\n\nDataSQRL is an open-source world model for data platform automation.\nAs a modular framework, it provides the building blocks for a customized world model that gives AI in your organization a set of guardrails, ensuring that generated solutions are safe, reliable, and perform well in production.\n\n## How DataSQRL Works\n\n![Example Data Processing DAG](documentation/static/img/screenshots/dag_example.png)\n\nDataSQRL is a modular compiler framework that deterministically automates much of the plumbing code in data pipelines.\nThis significantly reduces the complexity of AI-assisted (i.e., probabilistic) automation and provides feedback through deep introspection of the pipeline code.\n\nAs a result, you can generate data processing logic in SQL with any AI coding tool or agent.\nDataSQRL compiles the SQL into a data processing DAG (Directed Acyclic Graph) according to the provided configuration.\nThe analyzer traverses the DAG to detect potential data inconsistencies as well as performance and scalability issues.\nThe cost-based optimizer cuts the DAG into segments executed by different engines (e.g., Flink, Kafka, Postgres, Vert.x) and generates the necessary physical plans, schemas, and connectors for a fully integrated, reliable, and consistent data pipeline.\nThe compiled artifacts are fed back to the AI for iterative, incremental refinement of the solution.\n\nIn addition, the compiled deployment assets can be executed locally with Docker, on Kubernetes, or by a managed cloud service. 
DataSQRL comes with a testing framework for simulating the data pipeline.\nThe simulation provides real-world feedback on results and operational characteristics, which feeds into the iterative refinement loop.\n\nDataSQRL gives you full visibility and control over the generated data pipeline. Since the entire pipeline is implemented in SQL, it is easy to understand and verify manually.\n\nDataSQRL uses proven open-source technologies to execute the generated deployment assets. You can use your existing infrastructure or cloud services at runtime; DataSQRL is only used at compile time.\n\nDataSQRL has a rich [function library](https://docs.datasqrl.com/docs/functions) and provides [connectors](https://docs.datasqrl.com/docs/connectors/) for many popular data systems (Kafka, Iceberg, Postgres, and many more).\nIn addition, DataSQRL is extensible: you can add custom functions, source/sink connectors, and even entire execution engines.\n\n\u003c!--\n[DataSQRL Cloud](https://www.datasqrl.com) is a managed service that runs DataSQRL pipelines with no operational overhead and integrates directly with GitHub for simple deployments.\n--\u003e\n\nRead an [in-depth explanation of DataSQRL](https://docs.datasqrl.com/blog/data-platform-automation) or view the [full documentation](https://docs.datasqrl.com/) to learn more.\n\n## Contributing\n\n![Contribute to DataSQRL](documentation/static/img/undraw/code.svg)\n\nOur goal is to automate data platforms by building a world model that provides the necessary guardrails and feedback. We believe anyone who can read SQL should be empowered to build complex data systems that are robust and reliable.\nYour feedback is invaluable in achieving this goal. Let us know what works and what doesn't by filing GitHub issues or starting discussions.\n\nWe welcome code contributions. 
For more details, check out [`CONTRIBUTING.md`](CONTRIBUTING.md).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatasqrl%2Fsqrl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatasqrl%2Fsqrl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatasqrl%2Fsqrl/lists"}