{"id":13439614,"url":"https://github.com/sql-machine-learning/sqlflow","last_synced_at":"2025-05-13T22:04:22.733Z","repository":{"id":37432519,"uuid":"151525500","full_name":"sql-machine-learning/sqlflow","owner":"sql-machine-learning","description":"Brings SQL and AI together.","archived":false,"fork":false,"pushed_at":"2024-04-18T08:08:51.000Z","size":29014,"stargazers_count":5152,"open_issues_count":251,"forks_count":706,"subscribers_count":165,"default_branch":"develop","last_synced_at":"2025-05-06T23:39:11.146Z","etag":null,"topics":["ai","databases","deep-learning","machine-learning","sql-syntax","sqlflow","transpiler"],"latest_commit_sha":null,"homepage":"https://sqlflow.org","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sql-machine-learning.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-10-04T06:00:50.000Z","updated_at":"2025-05-04T04:43:35.000Z","dependencies_parsed_at":"2024-06-18T18:42:51.372Z","dependency_job_id":null,"html_url":"https://github.com/sql-machine-learning/sqlflow","commit_stats":{"total_commits":2098,"total_committers":47,"mean_commits":"44.638297872340424","dds":0.847950428979981,"last_synced_commit":"6c492098320875427b08ad82ce3f874c0b6aaa7a"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sql-machine-learning%2Fsqlflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sql-machine-learning%2Fsqlflow/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/sql-machine-learning%2Fsqlflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sql-machine-learning%2Fsqlflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sql-machine-learning","download_url":"https://codeload.github.com/sql-machine-learning/sqlflow/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254036813,"owners_count":22003653,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","databases","deep-learning","machine-learning","sql-syntax","sqlflow","transpiler"],"created_at":"2024-07-31T03:01:15.607Z","updated_at":"2025-05-13T22:04:22.687Z","avatar_url":"https://github.com/sql-machine-learning.png","language":"Go","readme":"# SQLFlow\n\n[![CI](https://github.com/sql-machine-learning/sqlflow/workflows/CI/badge.svg)](https://github.com/sql-machine-learning/sqlflow/actions)\n[![codecov](https://codecov.io/gh/sql-machine-learning/sqlflow/branch/develop/graph/badge.svg)](https://codecov.io/gh/sql-machine-learning/sqlflow)\n[![GoDoc](https://godoc.org/github.com/sql-machine-learning/sqlflow?status.svg)](https://godoc.org/github.com/sql-machine-learning/sqlflow) \n[![License](https://img.shields.io/badge/license-Apache%202-blue.svg)](LICENSE) \n[![Go Report Card](https://goreportcard.com/badge/github.com/sql-machine-learning/sqlflow)](https://goreportcard.com/report/github.com/sql-machine-learning/sqlflow)\n\n## What is SQLFlow\n\nSQLFlow is a compiler that compiles a SQL program to a workflow that runs on 
Kubernetes. The input is a SQL program that is written in our extended SQL grammar to support AI jobs including training, prediction, model evaluation, model explanation, custom jobs, and mathematical programming. The output is an [Argo](https://argoproj.github.io/) workflow that runs distributed on a Kubernetes cluster.\n\nSQLFlow supports various database systems like MySQL, MariaDB, [TiDB](https://pingcap.com/en/), Hive, and [MaxCompute](https://www.aliyun.com/product/odps), and many machine learning toolkits like [TensorFlow](https://github.com/tensorflow/tensorflow), [Keras](https://keras.io/), and [XGBoost](https://github.com/dmlc/xgboost).\n\nTry SQLFlow **NOW** in our playground https://playground.sqlflow.tech/ and check out the handy tutorials in it.\n\n![](https://github.com/sql-machine-learning/sql-machine-learning.github.io/raw/master/assets/instruction.gif)\n\n## Motivation\n\nThe current experience of developing ML-based applications requires a team of data engineers, data scientists, and business analysts, as well as a proliferation of advanced languages and programming tools like Python, SQL, SAS, SASS, Julia, and R. The fragmentation of tooling and development environments adds engineering difficulty to model training and tuning. What if we marry the most widely used data management/processing language, SQL, with ML/system capabilities and let engineers with SQL skills develop advanced ML-based applications?\n\nThere is already some work in progress in the industry. We can write simple machine learning prediction (or scoring) algorithms in SQL using operators like [`DOT_PRODUCT`](https://thenewstack.io/sql-fans-can-now-develop-ml-applications/). However, this requires copying and pasting model parameters from the training program into SQL statements. 
In the commercial world, we see some proprietary SQL engines providing extensions to support machine learning capabilities.\n\n- [Microsoft SQL Server](https://docs.microsoft.com/en-us/sql/advanced-analytics/?view=sql-server-2017): Microsoft SQL Server has a machine learning service that runs machine learning programs in R or Python as an external script.\n- [Teradata SQL for DL](https://www.linkedin.com/pulse/sql-deep-learning-sql-dl-omri-shiv): Teradata also provides a RESTful service, which is callable from the extended SQL SELECT syntax.\n- [Google BigQuery](https://cloud.google.com/bigquery/docs/bigqueryml-intro): Google BigQuery enables machine learning in SQL by introducing the `CREATE MODEL` statement.\n\nNone of the existing solutions solves our pain point; we want a solution that is fully extensible.\n\n1. This solution should be compatible with many SQL engines, rather than tied to a specific version or type.\n1. It should support sophisticated machine learning models, including TensorFlow for deep learning and [XGBoost](https://github.com/dmlc/xgboost) for trees.\n1. We also want the flexibility to configure and run cutting-edge ML algorithms, including specifying [feature crosses](https://www.tensorflow.org/api_docs/python/tf/feature_column/crossed_column), with, at the least, no Python or R code embedded in the SQL statements, and full integration with hyperparameter estimation.\n\n## Quick Overview\n\nHere are examples of training a TensorFlow [DNNClassifier](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNClassifier) model using the sample data Iris.train, and running prediction using the trained model. 
You can see how cool it is to write some elegant ML code using SQL:\n\n```sql\nsqlflow\u003e SELECT *\nFROM iris.train\nTO TRAIN DNNClassifier\nWITH model.n_classes = 3, model.hidden_units = [10, 20]\nCOLUMN sepal_length, sepal_width, petal_length, petal_width\nLABEL class\nINTO sqlflow_models.my_dnn_model;\n\n...\nTraining set accuracy: 0.96721\nDone training\n```\n\n```sql\nsqlflow\u003e SELECT *\nFROM iris.test\nTO PREDICT iris.predict.class\nUSING sqlflow_models.my_dnn_model;\n\n...\nDone predicting. Predict table : iris.predict\n```\n\n## How to use SQLFlow\n\n- [Quick Start](/doc/quick_start.md)\n- [Language Guide](/doc/language_guide.md)\n- Interactive Examples\n    * [DNN Classification example on Iris dataset](https://dsw-dev.data.aliyun.com/?fileUrl=http://cdn.sqlflow.tech/sqlflow/tutorials/latest/iris-dnn.ipynb\u0026fileName=iris-dnn.ipynb#/)\n    * [DNN Classification example on fraud detection](https://dsw-dev.data.aliyun.com/?fileUrl=http://cdn.sqlflow.tech/sqlflow/tutorials/latest/fraud-dnn.ipynb\u0026fileName=fraud-dnn.ipynb#/)\n    * [Housing Price Prediction with XGBoost](https://dsw-dev.data.aliyun.com/?fileUrl=http://cdn.sqlflow.tech/sqlflow/tutorials/latest/housing-xgboost.ipynb\u0026fileName=housing-xgboost.ipynb#/)\n    * [Housing Price Prediction Explanation](https://dsw-dev.data.aliyun.com/?fileUrl=http://cdn.sqlflow.tech/sqlflow/tutorials/latest/housing-explain.ipynb\u0026fileName=housing-explain.ipynb#/)\n    * [Mathematical Optimization Guide](https://dsw-dev.data.aliyun.com/?fileUrl=http://cdn.sqlflow.tech/sqlflow/tutorials/latest/optimization_guide.ipynb\u0026fileName=optimization_guide.ipynb#/)\n\n## Contributing Guidelines\n\n- [Build and Test](/doc/build.md)\n- [Walkthrough the Project](/doc/walkthrough.md)\n\n## Roadmap\n\nSQLFlow would love to support as many mainstream ML frameworks and data sources as possible, but we feel the expansion would be hard to do on our own, so we would love to hear your opinions on what 
ML frameworks and data sources you are currently using and building upon. Please refer to our [roadmap](https://github.com/sql-machine-learning/sqlflow/issues/327) for specific timelines. Also, let us know your current scenarios and interests around the SQLFlow project so we can prioritize based on feedback from the community.\n\n## Feedback\n\nYour feedback is our motivation to move forward. Please let us know your questions, concerns, and issues by [filing GitHub Issues](https://github.com/sql-machine-learning/sqlflow/issues).\n\n## License\n\n[Apache License 2.0](https://github.com/sql-machine-learning/sqlflow/blob/develop/LICENSE)\n\n## Published\n\n- An arXiv paper at https://arxiv.org/abs/2001.06846\n- Demo Videos\n  1. 01/19/2020: https://www.youtube.com/watch?v=qUjQn7ePbto\n  1. 10/04/2019: https://www.youtube.com/watch?v=zIkwOQ_davw\n  1. 04/01/2019: https://www.youtube.com/watch?v=zIkwOQ_davw\n","funding_links":[],"categories":["Go","开源类库","Ecosystem Projects","Open source library","其他_机器学习与深度学习","Data Processing \u0026 Analytics","🚀 MLOps","Machine Learning"],"sub_categories":["机器学习","Machine Learning","Tools","Compare"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsql-machine-learning%2Fsqlflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsql-machine-learning%2Fsqlflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsql-machine-learning%2Fsqlflow/lists"}