{"id":31939796,"url":"https://github.com/alibaba/pilotscope","last_synced_at":"2026-03-14T06:39:05.805Z","repository":{"id":213093256,"uuid":"704303020","full_name":"alibaba/pilotscope","owner":"alibaba","description":"PilotScope is a middleware to bridge the gaps of deploying AI4DB (Artificial Intelligence for Databases) algorithms into actual database systems.","archived":false,"fork":false,"pushed_at":"2024-07-12T06:20:01.000Z","size":127957,"stargazers_count":140,"open_issues_count":3,"forks_count":17,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-08-07T15:54:30.607Z","etag":null,"topics":["ai4db","cardinality-estimation","database","index-recommendation","knob-tuning","middleware","postgresql","query-optimizer","spark"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alibaba.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-13T01:22:15.000Z","updated_at":"2024-08-05T19:00:22.000Z","dependencies_parsed_at":"2024-07-09T14:09:21.216Z","dependency_job_id":null,"html_url":"https://github.com/alibaba/pilotscope","commit_stats":null,"previous_names":["alibaba/pilotscope"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/alibaba/pilotscope","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Fpilotscope","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Fpilotscope/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Fpilotsco
pe/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Fpilotscope/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alibaba","download_url":"https://codeload.github.com/alibaba/pilotscope/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2Fpilotscope/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279018304,"owners_count":26086345,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-14T02:00:06.444Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai4db","cardinality-estimation","database","index-recommendation","knob-tuning","middleware","postgresql","query-optimizer","spark"],"created_at":"2025-10-14T08:47:20.944Z","updated_at":"2026-03-14T06:39:05.765Z","avatar_url":"https://github.com/alibaba.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# \u003ccenter\u003e\u003cfont color=green size=10\u003ePilotScope\u003c/font\u003e\u003c/center\u003e\n\n\u003cdiv style=\"text-align:center\"\u003e\n  \u003cimg src=\"fig/banner.png\", alt=\"PilotScope\" 
/\u003e\n\u003c/div\u003e\n\n![](https://img.shields.io/badge/language-Python-blue.svg)\n![](https://img.shields.io/badge/language-C-blue.svg)\n![](https://img.shields.io/badge/language-Scala-blue.svg)\n![](https://img.shields.io/badge/license-Apache_2.0-000000.svg) \n![](https://img.shields.io/badge/contributions-Welcome-brightgreen.svg)\n\n\n[![](https://img.shields.io/badge/docs-Usage_Guideline-purple.svg)](https://woodybryant.github.io/PilotScopeDoc.io/)\n[![](https://img.shields.io/badge/docs-Develop_Guideline-purple.svg)](https://woodybryant.github.io/PilotScopeDoc.io/references/core_modules.html)\n[![](https://img.shields.io/badge/docs-API_Reference-purple.svg)](https://woodybryant.github.io/PilotScopeDoc.io/references/api.html)\n\n[![](https://img.shields.io/badge/AI4DB_driver-Knob_Tuning-4E29FF.svg)](https://woodybryant.github.io/PilotScopeDoc.io/references/example.html#knob-tuning-task-example)\n[![](https://img.shields.io/badge/AI4DB_driver-Index_Recommendation-4E29FF.svg)](https://woodybryant.github.io/PilotScopeDoc.io/references/example.html#index-recommendation-task-example)\n[![](https://img.shields.io/badge/AI4DB_driver-Cardinality_Estimation-4E29FF.svg)](https://woodybryant.github.io/PilotScopeDoc.io/references/example.html#cardinality-estimation-task-example)\n[![](https://img.shields.io/badge/AI4DB_driver-E2E_Query_Optimizer-4E29FF.svg)](https://woodybryant.github.io/PilotScopeDoc.io/references/example.html#query-optimizer-task-example)\n\n[![](https://img.shields.io/badge/database-PostgreSQL_13.1-FFD21E.svg)](https://www.postgresql.org/)\n[![](https://img.shields.io/badge/database-Spark_3.3.2-FFD21E.svg)](https://spark.apache.org/)\n\n**PilotScope** is a middleware to bridge the gaps of deploying AI4DB (Artificial Intelligence for Databases) algorithms\ninto actual database systems. It aims to hide the underlying details of different databases so that an AI4DB driver\ncould steer any database in a unified manner. 
By applying PilotScope, we obtain the following benefits:\n\n* The DB users could experience any AI4DB algorithm as a plug-in unit on their databases with little cost. The cloud\n  computing service providers could operate and maintain AI4DB algorithms on their database products as a service to\n  users. **(More Convenient for Usage! 👏👏👏)**\n\n* The ML researchers could easily benchmark and iterate their AI4DB algorithms in practical scenarios. **(Much Faster to\n  Iterate! ⬆️⬆️⬆️)**\n\n* The ML and DB developers are freed from learning the details of the other side. They could play to their own strengths\n  and write code on their own side. **(More Freedom to Develop! 🏄‍♀️🏄‍♀️🏄‍♀️)**\n\n\n* All contributors could extend PilotScope to support more AI4DB algorithms, more databases and more functions. **(We\n  highly encourage this! 😊😊😊)**\n\n| [Code Structure](#code-structure) | [Installation](#installation) | [Feature Overview](#feature-overview) |\n [Documentation](#documentation) | [License](#license) | [Reference](#reference)\n| [Contributing](#contributing) |\n\n---\n**News**\n\n* 🎉 [2023-12-15] Our **[paper](paper)** on PilotScope has been accepted by VLDB 2024!\n\n---\n\u003c!-- ## News --\u003e\n\n## Code Structure\n\n```\nPilotScope/\n├── algorithm_examples                         # Algorithm examples\n├── fig                                        # Figures\n├── paper                                 \n│   ├── PilotScope.pdf                         # Paper of PilotScope\n├── pilotscope\n│   ├── Anchor                                 # Base push and pull anchors for implementing push and pull operators       \n│   │   ├── AnchorHandler.py\n│   │   ├── AnchorEnum.py\n│   │   ├── AnchorTransData.py\n│   │   ├── ...\n│   ├── Common                                 # Useful tools for PilotScope\n│   │   ├── Index.py\n│   │   ├── CardMetricCalc.py                   \n│   │   ├── ...\n│   ├── DBController                           # The implementation 
of DB controllers for different databases\n│   │   ├── BaseDBController.py\n│   │   ├── PostgreSQLController.py\n│   │   ├── ...\n│   ├── DBInteractor                           # Functionality for interacting with the database\n│   │   ├── HttpInteractorReceiver.py\n│   │   ├── PilotDataInteractor.py\n│   │   ├── ...\n│   ├── DataManager                            # The management of data\n│   │   ├── DataManager.py\n│   │   └── TableVisitedTracker.py\n│   ├── Dataset                                # An easy-to-use API for loading benchmarks\n│   │   ├── BaseDataset.py\n│   │   ├── Imdb\n│   │   ├── ...\n│   ├── Exception                              # Exceptions that may occur in the lifecycle of PilotScope\n│   │   └── Exception.py\n│   ├── Factory                                # Factory patterns\n│   │   ├── AnchorHandlerFactory.py\n│   │   ├── DBControllerFectory.py\n│   │   ├── ...\n│   ├── PilotConfig.py                         # Configurations of PilotScope\n│   ├── PilotEnum.py                           # Some related enumeration types\n│   ├── PilotEvent.py                          # Some predefined events\n│   ├── PilotModel.py                          # Base models of PilotScope \n│   ├── PilotScheduler.py                      # Scheduling training, inference, data collection, push-and-pull, and so on\n│   ├── PilotSysConfig.py                      # System configuration of PilotScope \n│   └── PilotTransData.py                      # A unified data object for data collection\n├── requirements.txt                           # Requirements for PilotScope\n├── setup.py                                   # Setup for PilotScope\n├── test_example_algorithms                    # Examples of some tasks, such as index recommendation, knob tuning, etc.\n└── test_pilotscope                            # Unit tests of PilotScope\n```\n\n## Installation\n\nRequired Software Versions:\n\n- Python: 3.8\n- PostgreSQL: 13.1\n- Apache Spark: 3.3.2\n\nYou can install 
PilotScope Core and modified databases (e.g., PostgreSQL and Spark) by following\nthe [documentation](https://woodybryant.github.io/PilotScopeDoc.io/).\n\n## Feature Overview\n\nThe components of PilotScope Core on the ML side can be divided into two categories: Database Components and Deployment\nComponents. The Database Components are used to facilitate data exchange and control over the database, while the Deployment\nComponents are used to facilitate the automatic application of custom AI algorithms to each incoming SQL query.\n\nA high-level overview of the PilotScope Core components is shown in the following figure.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"fig/pilotscope_module_framework.png\" alt=\"PilotScope\" style=\"width: 80%;\" /\u003e\n\u003c/div\u003e\n\nThe Database Components are highlighted in yellow, while the Deployment Components are highlighted in green. Each of\nthese components is discussed in detail in the [documentation](https://woodybryant.github.io/PilotScopeDoc.io/).\n\n### An Example for Data Interaction with Database\nThe `PilotConfig` class is used to configure the PilotScope application, such as the database credentials for\nestablishing a connection.\nWe first create a `PilotConfig` instance and specify the database credentials and the name of the database to\nconnect to, i.e., stats_tiny.\n\n```python\n# Example of PilotConfig\nconfig: PilotConfig = PostgreSQLConfig(host=\"localhost\", port=\"5432\", user=\"postgres\", pwd=\"postgres\")\n# You can also instantiate a PilotConfig for other DBMSes. e.g. \n# config:PilotConfig = SparkConfig()\nconfig.db = \"stats_tiny\"\n# Configure PilotScope here, e.g. changing the name of database you want to connect to.\n```\n\nThe PilotDataInteractor class provides a flexible workflow for data exchange. 
It includes three main functions: `push`,\n`pull`, and `execute`.\nThese functions assist the user in collecting data (pull operators) after setting additional data (push operators) in a\nsingle query execution process.\n\nFor instance, suppose the user wants to collect the execution time, estimated cost, and cardinality of all sub-queries within\na query. Here is some example code:\n\n```python\nsql = \"select count(*) from votes as v, badges as b, users as u where u.id = v.userid and v.userid = b.userid and u.downvotes\u003e=0 and u.downvotes\u003c=0\"\ndata_interactor = PilotDataInteractor(config)\ndata_interactor.pull_estimated_cost()\ndata_interactor.pull_subquery_card()\ndata_interactor.pull_execution_time()\ndata = data_interactor.execute(sql)\nprint(data)\n```\n\nThe `execute` function returns a `PilotTransData` object named `data`, which serves as a placeholder for the collected\ndata.\nEach member of this object represents a specific data point, and the values corresponding to the previously\nregistered `pull` operators will be filled in, while the other values will remain as None.\n\n```\nexecution_time: 0.00173\nestimated_cost: 98.27\nsubquery_2_card: {'select count(*) from votes v': 3280.0, 'select count(*) from badges b': 798.0, 'select count(*) from users u where u.downvotes \u003e= 0 and u.downvotes \u003c= 0': 399.000006, 'select count(*) from votes v, badges b where v.userid = b.userid;': 368.609177, 'select count(*) from votes v, users u where v.userid = u.id and u.downvotes \u003e= 0 and u.downvotes \u003c= 0;': 333.655156, 'select count(*) from badges b, users u where b.userid = u.id and u.downvotes \u003e= 0 and u.downvotes \u003c= 0;': 425.102804, 'select count(*) from votes v, badges b, users u where v.userid = u.id and v.userid = b.userid and u.downvotes \u003e= 0 and u.downvotes \u003c= 0;': 37.536205}\nbuffercache: None\n...\n```\n\nIn certain scenarios, when the user wants to collect the execution time of a SQL query after applying a 
new\ncardinality (e.g., scaling the original cardinality by 100) for all sub-queries within the SQL,\nthe PilotDataInteractor provides a `push` function to achieve this.\nHere is some example code:\n\n```python\n# Example of PilotDataInteractor (registering operators again and execution)\ndata_interactor.push_card({k: v * 100 for k, v in data.subquery_2_card.items()})\ndata_interactor.pull_estimated_cost()\ndata_interactor.pull_execution_time()\nnew_data = data_interactor.execute(sql)\nprint(new_data)\n```\n\nBy default, each call to the `execute` function resets any previously registered operators.\nTherefore, we need to push these new cardinalities and re-register the pull operators to collect the estimated cost and\nexecution time.\nIn this scenario, the new cardinalities will replace the ones estimated by the database's cardinality estimator.\nAs a result, some fields of the `new_data` object differ significantly from those of the `data`\nobject,\nmainly because of the changed cardinality values.\n\n```\nexecution_time: 0.00208\nestimated_cost: 37709.05\n...\n```\nFor more functionality, please refer to the [documentation](https://woodybryant.github.io/PilotScopeDoc.io/).\n\n\n## Documentation\n\nThe classes and methods of PilotScope have been well documented. 
You can find the details\nin the [documentation](https://woodybryant.github.io/PilotScopeDoc.io/).\n\n## License\n\nPilotScope is released under the Apache License 2.0.\n\n## Reference\n\nIf you find our work useful for your research or development, please cite the following:\n\n\t@article{zhu2023pilotscope,\n  \t\ttitle={PilotScope: Steering Databases with Machine Learning Drivers},\n  \t\tauthor={Rong Zhu and Lianggui Weng and Wenqing Wei and Di Wu and Jiazhen Peng and Yifan Wang and Bolin Ding and Defu Lian and Bolong Zheng and Jingren Zhou},\n  \t\tjournal = {Proceedings of the VLDB Endowment},\n  \t\tyear={2024}}\n\n\n## Contributing\n\nPilotScope is an open-source project, and we greatly appreciate any contribution! ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Fpilotscope","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falibaba%2Fpilotscope","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Fpilotscope/lists"}