{"id":13671424,"url":"https://github.com/wyfunique/DBSim","last_synced_at":"2025-04-27T18:31:15.823Z","repository":{"id":47440072,"uuid":"473765466","full_name":"wyfunique/DBSim","owner":"wyfunique","description":"The codebase for DBSim","archived":false,"fork":false,"pushed_at":"2023-03-08T03:50:24.000Z","size":175,"stargazers_count":15,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-11-11T09:43:41.550Z","etag":null,"topics":["data-science","database","in-database","in-database-analytics","query-optimizer","sql-parser","sql-query"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wyfunique.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-03-24T20:36:34.000Z","updated_at":"2023-10-02T14:33:31.000Z","dependencies_parsed_at":"2024-04-01T00:43:39.816Z","dependency_job_id":null,"html_url":"https://github.com/wyfunique/DBSim","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wyfunique%2FDBSim","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wyfunique%2FDBSim/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wyfunique%2FDBSim/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wyfunique%2FDBSim/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wyfunique","download_url":"https://codeload.github.com/wyfunique/DBSim/tar.gz/re
fs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251187101,"owners_count":21549582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","database","in-database","in-database-analytics","query-optimizer","sql-parser","sql-query"],"created_at":"2024-08-02T09:01:09.412Z","updated_at":"2025-04-27T18:31:15.264Z","avatar_url":"https://github.com/wyfunique.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# DBSim\n\nDBSim is an extensible database simulator written in pure Python for quickly prototyping in-database algorithms. It provides a simulated RDBMS environment, related tools, and high extensibility \u0026 flexibility so that data scientists can quickly develop, verify, and analyze prototypes of in-database analytic algorithms. \n\nImplementing a new in-database algorithm directly in a real database kernel without verifying it first is risky and can waste time. If you agree, try DBSim! Spend a little time prototyping your in-database algorithms in DBSim to convince yourself they work before building them into a real-world DBMS. \n\n### Features\n\n* Includes all the major components of a general RDBMS query engine: SQL parser, relational operators, logical and physical plan generators, query optimizer, etc. 
\n* High extensibility and flexibility: DBSim provides APIs to extend almost any of the DB components, including but not limited to adding new keywords to the query syntax, implementing custom operators, writing custom optimization rules, extending physical plan executors, etc.\n* Debugging and analysis tools: query plan visualizer, GUI, etc.\n* Low learning cost for users: being written in Python makes it much easier to learn than real-world DBMSs developed in C/C++/Java. \n\n### TODO\n\n* Add more optimization rules that are commonly applied by general RDBMSs\n* Implement a Volcano-style optimizer (currently DBSim only supports a heuristic (rule-based) optimizer)\n* Develop a rule definition language to support simpler declarations of custom optimization rules\n* Integrate more performance analysis and visualization tools \n\n### Install \n\nDBSim requires Python 3.6+. To install the necessary dependencies, run the following command:\n\n```\npip install -r requirements.txt\n```\n\n### Run the demo\n\n```\npython demo.py\n```\n\n### Examples\n\nWe use some basic examples to briefly present DBSim here. Please see `dbsim/examples/` and `dbsim/tests/` for more detailed examples.\n\n1. 
**Execute a query end-to-end**\n\n   (1) without query optimizer\n\n   ```python\n   from dbsim import dataset as ds\n   from dbsim.tests.fixtures.demo_adapter import DemoAdapter\n   from dbsim.query_parser import parse_statement\n   from dbsim.query import Query\n   \n   dataset = ds.DataSet()\n   dataset.add_adapter(DemoAdapter())\n   sql = \"\"\"\n       SELECT musical.title, musical.year\n       FROM \n         (SELECT * \n           FROM \n             (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) \n           WHERE \n             animation.mid \u003c 3000\n         )\n       WHERE musical.year \u003e 1960\n   \"\"\"\n   for row in Query(dataset, parse_statement(sql)):\n     print(row)\n   ```\n\n   (2) with query optimizer\n\n   ```python\n   from dbsim import dataset as ds\n   from dbsim.tests.fixtures.demo_adapter import DemoAdapter\n   from dbsim.query_parser import parse_statement\n   from dbsim.query import Query\n   from dbsim.planners import rules\n   from dbsim.planners.heuristic.heuristic_planner import HeuristicPlanner\n   \n   dataset = ds.DataSet()\n   dataset.add_adapter(DemoAdapter())\n   planner = HeuristicPlanner(max_limit = float('Inf'))\n   planner.addRule(rules.FilterMergeRule())\n   planner.addRule(rules.FilterPushDownRule())\n   planner.addRule(rules.Selection_SimSelection_Swap_Rule())\n   \n   sql = \"\"\"\n       SELECT musical.title, musical.year\n       FROM \n         (SELECT * \n           FROM \n             (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) \n           WHERE \n             animation.mid \u003c 3000\n         )\n       WHERE musical.year \u003e 1960\n   \"\"\"\n   for row in Query(dataset, parse_statement(sql), optimizer=planner):\n     print(row)\n   ```\n\n   \n\n2. 
**Parse a query into an abstract syntax tree (AST) and visualize the AST**\n\n   ```python\n   from dbsim.query_parser import parse_statement\n   from dbsim.utils.visualizer import LogicalPlanViz \n   \n   sql = \"\"\"\n       SELECT musical.title, musical.year\n       FROM musical\n       WHERE musical.year \u003e 1960\n   \"\"\"\n   ast = parse_statement(sql)\n   LogicalPlanViz.show(ast, view=True)\n   ```\n\n   \n\n3. **Manually optimize a logical plan and visualize the resulting plan** \n\n   ```python\n   from dbsim import dataset as ds\n   from dbsim.tests.fixtures.demo_adapter import DemoAdapter\n   from dbsim.query_parser import parse_statement\n   from dbsim.query import Query\n   from dbsim.planners import rules\n   from dbsim.planners.heuristic.heuristic_planner import HeuristicPlanner\n   from dbsim.utils.visualizer import LogicalPlanViz \n   \n   dataset = ds.DataSet()\n   dataset.add_adapter(DemoAdapter())\n   \n   planner = HeuristicPlanner(max_limit = float('Inf'))\n   planner.addRule(rules.FilterMergeRule())\n   planner.addRule(rules.FilterPushDownRule())\n   planner.addRule(rules.Selection_SimSelection_Swap_Rule())\n   \n   sql = \"\"\"\n       SELECT musical.title, musical.year\n       FROM \n         (SELECT * \n           FROM \n             (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) \n           WHERE \n             animation.mid \u003c 3000\n         )\n       WHERE musical.year \u003e 1960\n   \"\"\"\n   plan = Query(dataset, parse_statement(sql)).getPlan()\n   best_plan = planner.findBestPlan(plan)\n   LogicalPlanViz.show(best_plan, view=True)\n   ```\n\n   \n\n4. **Note:** We have already extended the standard SQL syntax in DBSim as an example of its extensibility. Please see the documentation for further tutorials. 
For now, you can quickly try an extended-syntax query like the one below in any of the code snippets above:\n\n   ```python\n   sql = \"\"\"\n       SELECT musical.title, musical.year\n       FROM \n         (SELECT * \n           FROM \n             (SELECT * FROM animation, musical WHERE animation.mid = musical.mid) \n           WHERE \n             animation.embedding to [1,2,3,4] \u003c 10\n         )\n       WHERE musical.year \u003e 1960\n   \"\"\"\n   ```\n\n   \n\n5. **More examples:**\n\n   To try more examples, run the following command (do NOT cd into `dbsim/` or `dbsim/examples`; run the command here, i.e., in the root path of this repo):\n\n   ```bash\n   python dbsim/examples/\u003cexample_filename\u003e.py\n   ```\n\n   To run the tests in `dbsim/tests/`, you can use Pytest or manually call each test function.\n\n### Run the GUI\n\nIn the repo root path, run the following command to start the GUI:\n\n```bash\npython dbsim/gui/backend/server.py\n```\n\nThen open your browser and go to `localhost:8888`. Please read our paper [Extensible Database Simulator for Fast Prototyping In-Database Algorithms](https://arxiv.org/abs/2204.09819) for more details on how to use the GUI. The GUI has only been tested on Microsoft Edge, and currently we do not guarantee its compatibility with other browsers.\n\n**Note:** \n\n(1) To run a query, you need to let the query input box lose focus, for example by clicking anywhere outside the box.\n\n(2) We provide example datasets and queries in the GUI when it is started. You can try them or use your own datasets and queries.\n\n### Documentation\n\nSee the [Wiki](https://github.com/wyfunique/DBSim/wiki) of this repo.\n\n### Acknowledgement \n\nDBSim is developed based on Splicer ([trivio/splicer: Splicer - adds relation querying (SQL) to any python project (github.com)](https://github.com/trivio/splicer)). 
We also borrowed some ideas from Apache Calcite.\n\n### Citation\n\nIf you use this codebase, or otherwise found our work valuable, please cite:\n\n```\n@misc{https://doi.org/10.48550/arxiv.2204.09819,\n  doi = {10.48550/ARXIV.2204.09819},\n  url = {https://arxiv.org/abs/2204.09819},\n  author = {Wang, Yifan and Wang, Daisy Zhe},\n  title = {Extensible Database Simulator for Fast Prototyping In-Database Algorithms},\n  publisher = {arXiv},\n  year = {2022},\n  copyright = {arXiv.org perpetual, non-exclusive license}\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwyfunique%2FDBSim","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwyfunique%2FDBSim","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwyfunique%2FDBSim/lists"}