{"id":24972443,"url":"https://github.com/tuannh982/query-planner-guide","last_synced_at":"2025-04-11T06:30:29.825Z","repository":{"id":196859126,"uuid":"682891716","full_name":"tuannh982/query-planner-guide","owner":"tuannh982","description":"build your own query planner","archived":false,"fork":false,"pushed_at":"2024-11-06T08:02:17.000Z","size":3557,"stargazers_count":26,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-25T04:26:32.321Z","etag":null,"topics":["awesome","cascades","database","database-design","database-management","from-scratch","good-first-issue","guide","help-wanted","looking-for-contributors","query-engine","query-optimization","query-optimizer","query-planner","query-planning","scala","scratch","sql","tutorial","volcano"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tuannh982.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-08-25T05:45:36.000Z","updated_at":"2025-03-14T21:29:55.000Z","dependencies_parsed_at":"2024-11-06T09:17:48.861Z","dependency_job_id":"f61e5b99-6812-467b-b2cf-725643f3cd31","html_url":"https://github.com/tuannh982/query-planner-guide","commit_stats":null,"previous_names":["tuannh982/query-planner-guide"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuannh982%2Fquery-planner-guide","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuannh982%2Fquery-pl
anner-guide/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuannh982%2Fquery-planner-guide/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tuannh982%2Fquery-planner-guide/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tuannh982","download_url":"https://codeload.github.com/tuannh982/query-planner-guide/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248355280,"owners_count":21089978,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awesome","cascades","database","database-design","database-management","from-scratch","good-first-issue","guide","help-wanted","looking-for-contributors","query-engine","query-optimization","query-optimizer","query-planner","query-planning","scala","scratch","sql","tutorial","volcano"],"created_at":"2025-02-03T17:09:33.377Z","updated_at":"2025-04-11T06:30:29.763Z","avatar_url":"https://github.com/tuannh982.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"[placeholder]\n\n## Introduction\n\nA query planner is a component of a database management system (DBMS) that is responsible for generating a plan for\nexecuting a database query. The query plan specifies the steps that the DBMS will take to retrieve the data requested by\nthe query. 
The goal of the query planner is to generate a plan that is as efficient as possible, meaning that it will\nreturn the data to the user as quickly as possible.\n\nQuery planners are complex pieces of software, and they can be difficult to understand. This guide to implementing a\ncost-based query planner gives you a step-by-step overview of the process and shows how to implement your own\ncost-based query planner, while still covering the basic concepts of query planning.\n\n\u003e Written by AI, edited by human\n\n## Targeted audiences\n\nThis guide is written for people who:\n\n- have worked with query engines\n- are curious and want to build their own\n- want to learn database internals but hate math\n\nGoals:\n\n- Understand the basics of query planning\n- Be able to write your own query planner\n\n## Basic architecture of a query engine\n\n```mermaid\ngraph TD\n    user((user))\n    parser[Query Parser]\n    planner[Query Planner]\n    executor[Query Processor]\n    user -- text query --\u003e parser\n    parser -- AST --\u003e planner\n    planner -- physical plan --\u003e executor\n```\n\nThe basic architecture of a query engine consists of these components:\n\n- **Query parser:** used to parse the user's query input, usually in a human-readable text format (such as SQL)\n- **Query planner:** used to generate the plan/strategy to execute the query. 
Normally the query planner will choose the\n  best plan among several plans generated from a single query\n- **Query processor:** used to execute the query plan produced by the query planner\n\n## Types of query planners\n\nQuery planners are normally divided into 2 types:\n\n- heuristic planner\n- cost-based planner\n\nA heuristic planner uses pre-defined rules to generate the query plan.\n\nA cost-based planner generates the query plan based on cost: it tries to find the optimal plan according to the\nestimated cost of executing the input query.\n\nWhile a heuristic planner usually finds the best plan by applying a transformation rule whenever it knows that the\ntransformed plan is better, a cost-based planner finds the best plan by enumerating equivalent plans and picking the\ncheapest one among them.\n\n### Cost based query planner\n\nA cost-based query planner is usually composed of two phases:\n\n- Plan Enumerations\n- Query Optimization\n\nIn the Plan Enumerations phase, the planner will enumerate the possible equivalent plans.\n\nAfter that, in the Query Optimization phase, the planner will search for the best plan from the list of enumerated plans.\nThe best plan is the plan with the lowest cost, as defined by the cost model (or cost function).\n\nBecause a logical plan naturally has a tree-like structure, you can think of the optimization/search as\na tree-search problem. There are lots of tree-search algorithms out there:\n\n- Exhaustive search, such as deterministic dynamic programming. The algorithm keeps searching for the best plan until\n  a search termination condition is met\n- Randomized search, such as randomized tree search. The algorithm keeps searching for the best plan until\n  a search termination condition is met\n\n**notes:** in theory it's possible to use any kind of tree-search algorithm. 
However, in practice this is not always feasible,\nsince the search time grows as the search algorithm becomes more complex\n\n**notes:** the search termination conditions usually are:\n\n- search exhaustion (when there are no more plans to visit)\n- cost threshold (when a plan is found whose cost is lower than a specified cost threshold)\n- time (when the search phase has been running for too long)\n\n### Volcano query planner\n\nThe Volcano query planner (or Volcano optimizer generator) is a cost-based query planner.\n\nThe Volcano planner uses a dynamic programming approach to find the best query plan from the list of enumerated plans.\n\ndetails: https://ieeexplore.ieee.org/document/344061 (I'm too lazy to explain the paper here)\n\nHere is a great explanation: https://15721.courses.cs.cmu.edu/spring2017/slides/15-optimizer2.pdf#page=9\n\n## Drafting our cost-based query planner\n\nOur query planner is a cost-based query planner that follows the basic idea of the Volcano query planner.\nIt consists of 2 main phases:\n\n- exploration/search phase\n- implementation/optimization phase\n\n```mermaid\ngraph LR\n    ast((AST))\n    logical_plan[Plan]\n    explored_plans[\"`\n        Plan #1\n        ...\n        Plan #N\n    `\"]\n    implementation_plan[\"Plan #X (best plan)\"]\n    ast -- convert to logical plan --\u003e logical_plan\n    logical_plan -- exploration phase --\u003e explored_plans\n    explored_plans -- optimization phase --\u003e implementation_plan\n    linkStyle 1,2 color: orange, stroke: orange, stroke-width: 5px\n```\n\n#### Glossary\n\n##### Logical plan\n\nA logical plan is the data structure that holds an abstract description of the transformation steps required to execute\nthe query.\n\nHere is an example of a logical plan:\n\n```mermaid\ngraph TD\n    1[\"PROJECT tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"];\n    2[\"JOIN\"];\n    3[\"SCAN tbl1\"];\n    4[\"JOIN\"];\n    5[\"SCAN tbl2\"];\n    6[\"SCAN tbl3\"];\n    1 --\u003e 2;\n    2 --\u003e 
3;\n    2 --\u003e 4;\n    4 --\u003e 5;\n    4 --\u003e 6;\n```\n\n##### Physical plan\n\nWhile a logical plan only holds the abstraction, a physical plan is the data structure that holds the implementation details.\nEach logical plan can have multiple physical plans. For example, a logical JOIN might have many physical plans such as\nHASH JOIN, MERGE JOIN, BROADCAST JOIN, etc.\n\n##### Equivalent Group\n\nAn equivalent group is a group of equivalent expressions (that is, expressions whose logical plans are logically\nequivalent)\n\ne.g.\n\n```mermaid\ngraph TD\n    subgraph Group#8\n        Expr#8[\"SCAN tbl2 (field1, field2, id)\"]\n    end\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#11\n        Expr#11[\"JOIN\"]\n    end\n    Expr#11 --\u003e Group#7\n    Expr#11 --\u003e Group#10\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#7\n        Expr#7[\"SCAN tbl1 (id, field1)\"]\n    end\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#10\n        Expr#10[\"JOIN\"]\n    end\n    Expr#10 --\u003e Group#8\n    Expr#10 --\u003e Group#9\n    subgraph Group#9\n        Expr#9[\"SCAN tbl3 (id, field2)\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#6\n        Expr#12[\"PROJECT tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#12 --\u003e Group#11\n    Expr#6 --\u003e Group#5\n```\n\nHere we can see that `Group#6` has 2 equivalent expressions, both representing the same query (one\nscans the tables and then projects, the other pushes the projection down to the SCAN nodes).\n\n##### 
Transformation rule\n\nA transformation rule transforms one logical plan into another, logically equivalent logical plan\n\nFor example, the plan:\n\n```mermaid\ngraph TD\n    1[\"PROJECT tbl1.id, tbl1.field1, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"];\n    2[\"JOIN\"];\n    3[\"SCAN tbl1\"];\n    4[\"JOIN\"];\n    5[\"SCAN tbl2\"];\n    6[\"SCAN tbl3\"];\n    1 --\u003e 2;\n    2 --\u003e 3;\n    2 --\u003e 4;\n    4 --\u003e 5;\n    4 --\u003e 6;\n```\n\nwhen the projection pushdown transformation is applied, is transformed to:\n\n```mermaid\ngraph TD\n    1[\"PROJECT *.*\"];\n    2[\"JOIN\"];\n    3[\"SCAN tbl1 (id, field1)\"];\n    4[\"JOIN\"];\n    5[\"SCAN tbl2 (field1, field2)\"];\n    6[\"SCAN tbl3 (id, field2, field2)\"];\n    1 --\u003e 2;\n    2 --\u003e 3;\n    2 --\u003e 4;\n    4 --\u003e 5;\n    4 --\u003e 6;\n```\n\nA transformation rule can be affected by logical traits/properties such as table schema, data statistics, etc.\n\n##### Implementation rule\n\nAn implementation rule returns the physical plans for a given logical plan.\n\nAn implementation rule can be affected by physical traits/properties such as data layout (sorted or not), etc.\n\n#### Exploration phase\n\nIn the exploration phase, the planner will apply transformation rules, generating equivalent logical plans\n\nFor example, the plan:\n\n```mermaid\ngraph TD\n    1326583549[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"];\n    -425111028[\"JOIN\"];\n    -349388609[\"SCAN tbl1\"];\n    1343755644[\"JOIN\"];\n    -1043437086[\"SCAN tbl2\"];\n    -1402686787[\"SCAN tbl3\"];\n    1326583549 --\u003e -425111028;\n    -425111028 --\u003e -349388609;\n    -425111028 --\u003e 1343755644;\n    1343755644 --\u003e -1043437086;\n    1343755644 --\u003e -1402686787;\n```\n\nAfter applying the transformation rules, we get the following graph:\n\n```mermaid\ngraph TD\n    subgraph Group#8\n        
Expr#8[\"SCAN tbl2 (id, field1, field2)\"]\n    end\n    subgraph Group#11\n        Expr#11[\"JOIN\"]\n        Expr#14[\"JOIN\"]\n    end\n    Expr#11 --\u003e Group#7\n    Expr#11 --\u003e Group#10\n    Expr#14 --\u003e Group#8\n    Expr#14 --\u003e Group#12\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n        Expr#16[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    Expr#16 --\u003e Group#2\n    Expr#16 --\u003e Group#13\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#13\n        Expr#15[\"JOIN\"]\n    end\n    Expr#15 --\u003e Group#1\n    Expr#15 --\u003e Group#3\n    subgraph Group#7\n        Expr#7[\"SCAN tbl1 (id, field1)\"]\n    end\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#10\n        Expr#10[\"JOIN\"]\n    end\n    Expr#10 --\u003e Group#8\n    Expr#10 --\u003e Group#9\n    subgraph Group#9\n        Expr#9[\"SCAN tbl3 (id, field2)\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#12\n        Expr#13[\"JOIN\"]\n    end\n    Expr#13 --\u003e Group#7\n    Expr#13 --\u003e Group#9\n    subgraph Group#6\n        Expr#12[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#12 --\u003e Group#11\n    Expr#6 --\u003e Group#5\n```\n\nHere we can see that the projection pushdown rule and the join reorder rule have been applied.\n\n#### Optimization phase\n\nThe optimization phase traverses the tree expanded during the exploration phase to find the best\nplan for our query.\n\nThis is \"actually\" a tree-search optimization, so you can use any tree search algorithm you can imagine (but you have to\nmake sure it's correct).\n\nHere is 
an example of the physical plan generated after the optimization phase:\n\n```mermaid\n\ngraph TD\n    Group#6[\"\n    Group #6\nSelected: PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\nOperator: ProjectOperator\nCost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)\n\"]\nGroup#6 --\u003e Group#11\nGroup#11[\"\nGroup #11\nSelected: JOIN\nOperator: HashJoinOperator\nCost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)\n\"]\nGroup#11 --\u003e Group#7\nGroup#11 --\u003e Group#10\nGroup#7[\"\nGroup #7\nSelected: SCAN tbl1 (id, field1)\nOperator: NormalScanOperator\nCost: Cost(cpu=400.00, mem=400000.00, time=1000.00)\n\"]\nGroup#10[\"\nGroup #10\nSelected: JOIN\nOperator: MergeJoinOperator\nTraits: SORTED\nCost: Cost(cpu=640000.00, mem=20000012.00, time=1100000.00)\n\"]\nGroup#10 --\u003e Group#8\nGroup#10 --\u003e Group#9\nGroup#8[\"\nGroup #8\nSelected: SCAN tbl2 (id, field1, field2)\nOperator: NormalScanOperator\nTraits: SORTED\nCost: Cost(cpu=600000.00, mem=12.00, time=1000000.00)\n\"]\nGroup#9[\"\nGroup #9\nSelected: SCAN tbl3 (id, field2)\nOperator: NormalScanOperator\nTraits: SORTED\nCost: Cost(cpu=40000.00, mem=20000000.00, time=100000.00)\n\"]\n```\n\nFor each group, the generated plan shows the selected logical plan, the physical operator, and the estimated cost\n\n#### Optimize/search termination\n\nOur planner will perform an exhaustive search to find the best plan\n\n## Diving into the codes\n\nSince the planner's code base is big, I will not write a step-by-step guide; instead, I will explain every piece of the\ncode\n\n### The query language\n\nHere we will define the query language that is used throughout this tutorial\n\n```sql\nSELECT emp.id,\n       emp.code,\n       dept.dept_name,\n       emp_info.name,\n       emp_info.origin\nFROM emp\n         JOIN dept ON emp.id = dept.emp_id\n         JOIN emp_info ON dept.emp_id = emp_info.id\n```\n\nThe query language we will implement is a SQL-like 
language.\nHowever, for the sake of simplicity, we will restrict its functionality and syntax.\n\nThe language takes the form\n\n```sql\nSELECT tbl.field, [...]\nFROM tbl JOIN [...]\n```\n\nIt only supports `SELECT` and `JOIN`. Fields in the SELECT statement must be fully qualified (in the form\nof `table.field`). All other functionality is unsupported\n\n#### The AST\n\nFirst, we have to define the AST for our language. An AST (\nor [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree)) is a tree used to represent the syntactic\nstructure of a text.\n\nSince our language is so simple, we can define the AST structure in a few lines of code:\n\n```scala\nsealed trait Identifier\n\ncase class TableID(id: String) extends Identifier\n\ncase class FieldID(table: TableID, id: String) extends Identifier\n\nsealed trait Statement\n\ncase class Table(table: TableID) extends Statement\n\ncase class Join(left: Statement, right: Statement, on: Seq[(FieldID, FieldID)]) extends Statement\n\ncase class Select(fields: Seq[FieldID], from: Statement) extends Statement\n\n```\n\nFor example, a query\n\n```sql\nSELECT tbl1.id,\n       tbl1.field1,\n       tbl2.id,\n       tbl2.field1,\n       tbl2.field2,\n       tbl3.id,\n       tbl3.field2,\n       tbl3.field2\nFROM tbl1\n         JOIN tbl2 ON tbl1.id = tbl2.id\n         JOIN tbl3 ON tbl2.id = tbl3.id\n```\n\ncan be represented as\n\n```scala\nSelect(\n  Seq(\n    FieldID(TableID(\"tbl1\"), \"id\"),\n    FieldID(TableID(\"tbl1\"), \"field1\"),\n    FieldID(TableID(\"tbl2\"), \"id\"),\n    FieldID(TableID(\"tbl2\"), \"field1\"),\n    FieldID(TableID(\"tbl2\"), \"field2\"),\n    FieldID(TableID(\"tbl3\"), \"id\"),\n    FieldID(TableID(\"tbl3\"), \"field2\"),\n    FieldID(TableID(\"tbl3\"), \"field2\")\n  ),\n  Join(\n    Table(TableID(\"tbl1\")),\n    Join(\n      Table(TableID(\"tbl2\")),\n      Table(TableID(\"tbl3\")),\n      Seq(\n        FieldID(TableID(\"tbl2\"), \"id\") -\u003e 
FieldID(TableID(\"tbl3\"), \"id\")\n      )\n    ),\n    Seq(\n      FieldID(TableID(\"tbl1\"), \"id\") -\u003e FieldID(TableID(\"tbl2\"), \"id\")\n    )\n  )\n)\n```\n\n#### A simple query parser\n\nAfter defining the AST structure, we have to write the query parser, which converts the text query into\nAST form.\n\nSince this guide uses Scala for the implementation, we will\nuse [scala-parser-combinators](https://github.com/scala/scala-parser-combinators) to create our query parser.\n\nQuery parser class:\n\n```scala\nobject QueryParser extends ParserWithCtx[QueryExecutionContext, Statement] with RegexParsers {\n\n  override def parse(in: String)(implicit ctx: QueryExecutionContext): Either[Throwable, Statement] = {\n    Try(parseAll(statement, in) match {\n      case Success(result, _) =\u003e Right(result)\n      case NoSuccess(msg, _) =\u003e Left(new Exception(msg))\n    }) match {\n      case util.Failure(ex) =\u003e Left(ex)\n      case util.Success(value) =\u003e value\n    }\n  }\n\n  private def select: Parser[Select] = ??? 
// we will implement it in a later section\n\n  private def statement: Parser[Statement] = select\n}\n\n```\n\nThen define some parse rules:\n\n```scala\n// common\nprivate def str: Parser[String] = \"\"\"[a-zA-Z0-9_]+\"\"\".r\nprivate def fqdnStr: Parser[String] = \"\"\"[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+\"\"\".r\n\n// identifier\nprivate def tableId: Parser[TableID] = str ^^ (s =\u003e TableID(s))\n\nprivate def fieldId: Parser[FieldID] = fqdnStr ^^ { s =\u003e\n  val identifiers = s.split('.')\n  if (identifiers.length != 2) {\n    throw new Exception(\"should never happen\")\n  } else {\n    val table = identifiers.head\n    val field = identifiers(1)\n    FieldID(TableID(table), field)\n  }\n}\n```\n\nThese two rules are used to parse the identifiers: `TableID` and `FieldID`.\n\nA table ID (or table name) usually contains only letters, digits and underscores (`_`), so we will use a simple\nregex `[a-zA-Z0-9_]+` to identify the table name.\n\nOn the other hand, a field ID (or field qualifier) in our language is a fully qualified field name. It takes the form\nof `table.field`, and a field name also usually contains only letters, digits and underscores, so we will use the\nregex `[a-zA-Z0-9_]+\\.[a-zA-Z0-9_]+` to parse the field name.\n\nAfter defining the rules for parsing the identifiers, we can now define the rules to parse a query statement:\n\n```scala\n// statement\nprivate def table: Parser[Table] = tableId ^^ (t =\u003e Table(t))\nprivate def subQuery: Parser[Statement] = \"(\" ~\u003e select \u003c~ \")\"\n```\n\nThe `table` rule is simple: it just creates a `Table` node from the `TableID` parsed by the `tableId` rule.\n\nThe `subQuery` rule parses a sub-query. In SQL, we can write a query that looks like this:\n\n```sql\nSELECT a\nFROM (SELECT b FROM c) d\n```\n\nHere `SELECT b FROM c` is the sub-query in the statement above. 
Here, in our simple query language, we treat a\nstatement as a sub-query if it is enclosed in a pair of parentheses (`()`). Since our language only has the SELECT\nstatement, we can write the parse rule as follows:\n\n```scala\ndef subQuery: Parser[Statement] = \"(\" ~\u003e select \u003c~ \")\"\n```\n\nNow we will define the parse rules for the SELECT statement:\n\n```scala\nprivate def fromSource: Parser[Statement] = table ||| subQuery\n\nprivate def select: Parser[Select] =\n  \"SELECT\" ~ rep1sep(fieldId, \",\") ~ \"FROM\" ~ fromSource ~ rep(\n    \"JOIN\" ~ fromSource ~ \"ON\" ~ rep1(fieldId ~ \"=\" ~ fieldId)\n  ) ^^ {\n    case _ ~ fields ~ _ ~ src ~ joins =\u003e\n      val p = if (joins.nonEmpty) {\n        def chain(left: Statement, right: Seq[(Statement, Seq[(FieldID, FieldID)])]): Join = {\n          if (right.isEmpty) {\n            throw new Exception(\"should never happen\")\n          } else if (right.length == 1) {\n            val next = right.head\n            Join(left, next._1, next._2)\n          } else {\n            val next = right.head\n            Join(left, chain(next._1, right.tail), next._2)\n          }\n        }\n\n        val temp = joins.map { join =\u003e\n          val statement = join._1._1._2\n          val joinOn = join._2.map(on =\u003e on._1._1 -\u003e on._2)\n          statement -\u003e joinOn\n        }\n        chain(src, temp)\n      } else {\n        src\n      }\n      Select(fields, p)\n  }\n```\n\nIn SQL, we can use a sub-query as a JOIN source. 
For example:\n\n```sql\nSELECT *.*\nFROM tbl1\n    JOIN (SELECT *.* FROM tbl2)\n    JOIN tbl3\n```\n\nOur parser therefore also needs to parse a sub-query in the JOIN part of the statement, which is why we have the\nparse rule:\n\n```scala\n\"SELECT\" ~ rep1sep(fieldId, \",\") ~ \"FROM\" ~ fromSource ~ rep(\"JOIN\" ~ fromSource ~ \"ON\" ~ rep1(fieldId ~ \"=\" ~ fieldId))\n```\n\nSee [QueryParser.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fql%2FQueryParser.scala) for the full implementation\n\n#### Testing our query parser\n\nSee [QueryParserSpec.scala](core%2Fsrc%2Ftest%2Fscala%2Fcore%2Fql%2FQueryParserSpec.scala)\n\n### Logical plan\n\nAfter generating the AST from the text query, we can directly convert it to the logical plan\n\nFirst, let's define the interface for our logical plan:\n\n```scala\nsealed trait LogicalPlan {\n  def children(): Seq[LogicalPlan]\n}\n\n```\n\n`children` is the list of child logical plans. For example:\n\n```mermaid\ngraph TD\n    1326583549[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"];\n    -425111028[\"JOIN\"];\n    -349388609[\"SCAN tbl1\"];\n    1343755644[\"JOIN\"];\n    -1043437086[\"SCAN tbl2\"];\n    -1402686787[\"SCAN tbl3\"];\n    1326583549 --\u003e -425111028;\n    -425111028 --\u003e -349388609;\n    -425111028 --\u003e 1343755644;\n    1343755644 --\u003e -1043437086;\n    1343755644 --\u003e -1402686787;\n```\n\nThe only child of the `PROJECT` node is the first `JOIN` node. The first `JOIN` node has 2 children, which are the\nsecond `JOIN` node and the `SCAN tbl1` node. 
And so on.\n\nSince our query language is simple, we only need 3 types of logical node:\n\n- PROJECT: represents the projection operator in relational algebra\n- JOIN: represents the logical join\n- SCAN: represents the table scan\n\n```scala\ncase class Scan(table: ql.TableID, projection: Seq[String]) extends LogicalPlan {\n  override def children(): Seq[LogicalPlan] = Seq.empty\n}\n\ncase class Project(fields: Seq[ql.FieldID], child: LogicalPlan) extends LogicalPlan {\n  override def children(): Seq[LogicalPlan] = Seq(child)\n}\n\ncase class Join(left: LogicalPlan, right: LogicalPlan, on: Seq[(ql.FieldID, ql.FieldID)]) extends LogicalPlan {\n  override def children(): Seq[LogicalPlan] = Seq(left, right)\n}\n\n```\n\nThen we can write the function to convert the AST into a logical plan:\n\n```scala\ndef toPlan(node: ql.Statement): LogicalPlan = {\n  node match {\n    case ql.Table(table) =\u003e Scan(table, Seq.empty)\n    case ql.Join(left, right, on) =\u003e Join(toPlan(left), toPlan(right), on)\n    case ql.Select(fields, from) =\u003e Project(fields, toPlan(from))\n  }\n}\n```\n\nSee [LogicalPlan.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Flogicalplan%2FLogicalPlan.scala) for the full\nimplementation\n\n### The equivalent groups\n\n#### Group\n\nWe can define the classes for Group as follows:\n\n```scala\ncase class Group(\n                  id: Long,\n                  equivalents: mutable.HashSet[GroupExpression]\n                ) {\n  val explorationMark: ExplorationMark = new ExplorationMark\n  var implementation: Option[GroupImplementation] = None\n}\n\ncase class GroupExpression(\n                            id: Long,\n                            plan: LogicalPlan,\n                            children: mutable.MutableList[Group]\n                          ) {\n  val explorationMark: ExplorationMark = new ExplorationMark\n  val appliedTransformations: mutable.HashSet[TransformationRule] = mutable.HashSet()\n}\n\n```\n\n`Group` is the set of plans 
which are logically equivalent.\n\nEach `GroupExpression` represents a logical plan node. A logical plan node has a list of child\nnodes (as defined in the previous section), and each child can itself be replaced by any plan equivalent to it. Since a\n`Group` represents a set of equivalent plans, the children of a `GroupExpression` are a list of `Group`\n\ne.g.\n\n```mermaid\ngraph TD\n    subgraph Group#8\n        Expr#8\n    end\n    subgraph Group#2\n        Expr#2\n    end\n    subgraph Group#11\n        Expr#11\n    end\n    Expr#11 --\u003e Group#7\n    Expr#11 --\u003e Group#10\n    subgraph Group#5\n        Expr#5\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#7\n        Expr#7\n    end\n    subgraph Group#1\n        Expr#1\n    end\n    subgraph Group#10\n        Expr#10\n    end\n    Expr#10 --\u003e Group#8\n    Expr#10 --\u003e Group#9\n    subgraph Group#9\n        Expr#9\n    end\n    subgraph Group#3\n        Expr#3\n    end\n    subgraph Group#6\n        Expr#12\n        Expr#6\n    end\n    Expr#12 --\u003e Group#11\n    Expr#6 --\u003e Group#5\n```\n\nAs we can see here, `Group#6` has 2 equivalent expressions, `Expr#12` and `Expr#6`, and the only child of `Expr#12`\nis `Group#11`\n\n**notes:** We will implement multi-round transformation in the exploration phase, so each `Group`\nand `GroupExpression` has\nan `ExplorationMark` indicating its exploration status.\n\n```scala\nclass ExplorationMark {\n  private var bits: Long = 0\n\n  def get: Long = bits\n\n  def isExplored(round: Int): Boolean = BitUtils.getBit(bits, round)\n\n  def markExplored(round: Int): Unit = bits = BitUtils.setBit(bits, round, on = true)\n\n  def markUnexplored(round: Int): Unit = bits = BitUtils.setBit(bits, round, on = false)\n}\n\n```\n\n`ExplorationMark` is just a bitset wrapper class: it sets the i-th bit to 
1 if the i-th round has been explored, and to 0\notherwise.\n\n`ExplorationMark` can also be used to visualize the exact transformation,\nsee [visualization](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Futils%2Fvisualization) for more details\n\n#### Memo\n\nMemo is a collection of helpers for constructing the equivalent groups. It consists of several hash maps that cache the\ngroups and group expressions, and it also provides methods to register a new group or group expression.\n\n```scala\nclass Memo(\n            groupIdGenerator: Generator[Long] = new LongGenerator,\n            groupExpressionIdGenerator: Generator[Long] = new LongGenerator\n          ) {\n  val groups: mutable.HashMap[Long, Group] = mutable.HashMap[Long, Group]()\n  val parents: mutable.HashMap[Long, Group] = mutable.HashMap[Long, Group]() // lookup group from group expression ID\n  val groupExpressions: mutable.HashMap[LogicalPlan, GroupExpression] = mutable.HashMap[LogicalPlan, GroupExpression]()\n\n  def getOrCreateGroupExpression(plan: LogicalPlan): GroupExpression = {\n    val children = plan.children()\n    val childGroups = children.map(child =\u003e getOrCreateGroup(child))\n    groupExpressions.get(plan) match {\n      case Some(found) =\u003e found\n      case None =\u003e\n        val id = groupExpressionIdGenerator.generate()\n        val children = mutable.MutableList() ++ childGroups\n        val expression = GroupExpression(\n          id = id,\n          plan = plan,\n          children = children\n        )\n        groupExpressions += plan -\u003e expression\n        expression\n    }\n  }\n\n  def getOrCreateGroup(plan: LogicalPlan): Group = {\n    val exprGroup = getOrCreateGroupExpression(plan)\n    val group = parents.get(exprGroup.id) match {\n      case Some(group) =\u003e\n        group.equivalents += exprGroup\n        group\n      case None =\u003e\n        val id = groupIdGenerator.generate()\n        val equivalents = mutable.HashSet() + exprGroup\n        val group = Group(\n          id = 
id,\n          equivalents = equivalents\n        )\n        groups.put(id, group)\n        group\n    }\n    parents += exprGroup.id -\u003e group\n    group\n  }\n}\n\n```\n\nSee [Memo.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Fmemo%2FMemo.scala) for the full implementation\n\n### Initialization\n\nThe first step inside the planner is initialization\n\n```mermaid\ngraph LR\n    query((query))\n    ast((ast))\n    root_plan((rootPlan))\n    root_group((rootGroup))\n    query -- \" QueryParser.parse(query) \" --\u003e ast\n    ast -- \" LogicalPlan.toPlan(ast) \" --\u003e root_plan\n    root_plan -- \" memo.getOrCreateGroup(rootPlan) \" --\u003e root_group\n```\n\nFirst, the query is parsed into an AST. The AST is then converted to a logical plan, called the `root plan`, and the\ngroup initialized from the `root plan` is called the `root group`.\n\n```scala\ndef initialize(query: Statement)(implicit ctx: VolcanoPlannerContext): Unit = {\n  ctx.query = query\n  ctx.rootPlan = LogicalPlan.toPlan(ctx.query)\n  ctx.rootGroup = ctx.memo.getOrCreateGroup(ctx.rootPlan)\n  // assuming this is the first exploration round,\n  // by marking the initialRound(0) as explored,\n  // it will be easier to visualize the difference between rounds (added nodes, added connections)\n  ctx.memo.groups.values.foreach(_.explorationMark.markExplored(initialRound))\n  ctx.memo.groupExpressions.values.foreach(_.explorationMark.markExplored(initialRound))\n}\n```\n\nSee [VolcanoPlanner.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2FVolcanoPlanner.scala) for more details\n\nFor example, the query:\n\n```sql\nSELECT tbl1.id,\n       tbl1.field1,\n       tbl2.id,\n       tbl2.field1,\n       tbl2.field2,\n       tbl3.id,\n       tbl3.field2,\n       tbl3.field2\nFROM tbl1\n         JOIN tbl2 ON tbl1.id = tbl2.id\n         JOIN tbl3 ON tbl2.id = tbl3.id\n```\n\nafter initialization, the groups will look like this:\n\n```mermaid\ngraph TD\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n   
 end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#6\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#6 --\u003e Group#5\n```\n\nHere you can see that every group has exactly one equivalent expression.\n\n### Exploration phase\n\nAfter initialization comes the exploration phase, which explores all possible equivalent plans.\n\nThe exploration method is quite simple:\n\n- For each group, apply transformation rules to find all equivalent group expressions and add them to the equivalent set\n  until no new equivalent plan can be found\n- For each group expression, explore all child groups\n\n#### Transformation rule\n\nBefore diving into the exploration code, let's talk about transformation rules first.\n\nA transformation rule transforms a logical plan into another equivalent logical plan if the plan matches the rule's\ncondition.\n\nHere is the interface of a transformation rule:\n\n```scala\ntrait TransformationRule {\n  def `match`(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): Boolean\n\n  def transform(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): GroupExpression\n}\n\n```\n\nSince the logical plan is a tree-like data structure, the `match` implementation of a transformation rule is pattern\nmatching on a tree.\n\nFor example, here is a `match` that matches a PROJECT node while also checking that its descendants contain JOIN and\nSCAN nodes only:\n\n```scala\noverride def `match`(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): Boolean = {\n  val plan = expression.plan\n  plan match {\n    
case Project(_, child) =\u003e check(child)\n    case _ =\u003e false\n  }\n}\n\n// check if the tree only contains SCAN and JOIN nodes\nprivate def check(node: LogicalPlan): Boolean = {\n  node match {\n    case Scan(_, _) =\u003e true\n    case Join(left, right, _) =\u003e check(left) \u0026\u0026 check(right)\n    case _ =\u003e false\n  }\n}\n```\n\nThis plan is \"matched\":\n\n```mermaid\ngraph TD\n    subgraph Group#2\n        Expr#2[\"SCAN\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#1\n        Expr#1[\"SCAN\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN\"]\n    end\n    subgraph Group#6\n        Expr#6[\"PROJECT\"]\n    end\n    Expr#6 --\u003e Group#5\n```\n\nWhile this plan is not, because it has a PROJECT node below a JOIN:\n\n```mermaid\ngraph TD\n    subgraph Group#2\n        Expr#2[\"SCAN\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#3\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"SCAN\"]\n    end\n    subgraph Group#7\n        Expr#7[\"PROJECT\"]\n    end\n    Expr#7 --\u003e Group#6\n    subgraph Group#1\n        Expr#1[\"SCAN\"]\n    end\n    subgraph Group#3\n        Expr#3[\"PROJECT\"]\n    end\n    Expr#3 --\u003e Group#2\n    subgraph Group#6\n        Expr#6[\"JOIN\"]\n    end\n    Expr#6 --\u003e Group#1\n    Expr#6 --\u003e Group#5\n```\n\n#### Plan enumerations\n\nAs we've said before, the exploration method is:\n\n- For each group, apply transformation rules to find all equivalent group expressions and add them to the equivalent set\n  until no new equivalent plan can be found\n- For each group expression, explore all child groups\n\nAnd here is the exploration code (quite simple, huh):\n\n```scala\nprivate def exploreGroup(\n                          group: Group,\n                          rules: 
Seq[TransformationRule],\n                          round: Int\n                        )(implicit ctx: VolcanoPlannerContext): Unit = {\n  while (!group.explorationMark.isExplored(round)) {\n    group.explorationMark.markExplored(round)\n    // explore all child groups\n    group.equivalents.foreach { equivalent =\u003e\n      if (!equivalent.explorationMark.isExplored(round)) {\n        equivalent.explorationMark.markExplored(round)\n        equivalent.children.foreach { child =\u003e\n          exploreGroup(child, rules, round)\n          if (equivalent.explorationMark.isExplored(round) \u0026\u0026 child.explorationMark.isExplored(round)) {\n            equivalent.explorationMark.markExplored(round)\n          } else {\n            equivalent.explorationMark.markUnexplored(round)\n          }\n        }\n      }\n      // fire transformation rules to explore all the possible transformations\n      rules.foreach { rule =\u003e\n        if (!equivalent.appliedTransformations.contains(rule) \u0026\u0026 rule.`match`(equivalent)) {\n          val transformed = rule.transform(equivalent)\n          if (!group.equivalents.contains(transformed)) {\n            group.equivalents += transformed\n            transformed.explorationMark.markUnexplored(round)\n            group.explorationMark.markUnexplored(round)\n          }\n        }\n      }\n      if (group.explorationMark.isExplored(round) \u0026\u0026 equivalent.explorationMark.isExplored(round)) {\n        group.explorationMark.markExplored(round)\n      } else {\n        group.explorationMark.markUnexplored(round)\n      }\n    }\n  }\n}\n```\n\nSee [VolcanoPlanner.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2FVolcanoPlanner.scala) for more details\n\n#### Implement some transformation rules\n\nNow it's time to implement some transformation rules\n\n##### Projection pushdown\n\nProjection pushdown is a simple transformation rule, used to push the projection down to storage layer.\n\nFor example, 
the query\n\n```sql\nSELECT field1, field2\nFROM tbl\n```\n\nhas the plan\n\n```mermaid\ngraph LR\n    project[PROJECT field1, field2]\n    scan[SCAN tbl]\n    project --\u003e scan\n```\n\nWith this plan, rows are fetched in full from the storage layer (under SCAN) at execution time, and the unnecessary\nfields are then dropped (PROJECT). The unnecessary data still has to move from the SCAN node to the PROJECT node, so\nsome effort is wasted here.\n\nWe can do better by simply telling the storage layer to fetch only the necessary fields. The plan is then\ntransformed to:\n\n```mermaid\ngraph LR\n    project[PROJECT field1, field2]\n    scan[\"SCAN tbl(field1, field2)\"]\n    project --\u003e scan\n```\n\nLet's go into the code:\n\n```scala\noverride def `match`(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): Boolean = {\n  val plan = expression.plan\n  plan match {\n    case Project(_, child) =\u003e check(child)\n    case _ =\u003e false\n  }\n}\n\n// check if the tree only contains SCAN and JOIN nodes\nprivate def check(node: LogicalPlan): Boolean = {\n  node match {\n    case Scan(_, _) =\u003e true\n    case Join(left, right, _) =\u003e check(left) \u0026\u0026 check(right)\n    case _ =\u003e false\n  }\n}\n```\n\nOur projection pushdown rule matches the plan when its root is a PROJECT node and all of its descendants are SCAN\nand JOIN nodes only.\n\n**notes:** A real projection pushdown match is more complex, but for the sake of simplicity, the match rule\nhere is just a PROJECT node with SCAN and JOIN descendants.\n\nAnd here is the transform code:\n\n```scala\noverride def transform(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): GroupExpression = {\n  val plan = expression.plan.asInstanceOf[Project]\n  val pushDownProjection = mutable.ListBuffer[FieldID]()\n  extractProjections(plan, pushDownProjection)\n  val newPlan = Project(plan.fields, pushDown(pushDownProjection.distinct, plan.child))\n  
ctx.memo.getOrCreateGroupExpression(newPlan)\n}\n\nprivate def extractProjections(node: LogicalPlan, buffer: mutable.ListBuffer[FieldID]): Unit = {\n  node match {\n    case Scan(_, _) =\u003e (): Unit\n    case Project(fields, parent) =\u003e\n      buffer ++= fields\n      extractProjections(parent, buffer)\n    case Join(left, right, on) =\u003e\n      buffer ++= on.map(_._1) ++ on.map(_._2)\n      extractProjections(left, buffer)\n      extractProjections(right, buffer)\n  }\n}\n\nprivate def pushDown(pushDownProjection: Seq[FieldID], node: LogicalPlan): LogicalPlan = {\n  node match {\n    case Scan(table, tableProjection) =\u003e\n      val filteredPushDownProjection = pushDownProjection.filter(_.table == table).map(_.id)\n      val updatedProjection =\n        if (filteredPushDownProjection.contains(\"*\") || filteredPushDownProjection.contains(\"*.*\")) {\n          Seq.empty\n        } else {\n          (tableProjection ++ filteredPushDownProjection).distinct\n        }\n      Scan(table, updatedProjection)\n    case Join(left, right, on) =\u003e Join(pushDown(pushDownProjection, left), pushDown(pushDownProjection, right), on)\n    case _ =\u003e throw new Exception(\"should never happen\")\n  }\n}\n```\n\nThe transform code will first find all projections from the root PROJECT node, and then push them down to all SCAN nodes\nunder it.\n\nVisualizing our rule, for example, the plan\n\n```mermaid\ngraph TD\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#6\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n   
 end\n    Expr#6 --\u003e Group#5\n```\n\nafter applying the projection pushdown transformation, will result in a new equivalent plan with the projections pushed\ndown to the SCAN operations (the new plan is the tree with orange-bordered nodes).\n\n```mermaid\ngraph TD\n    subgraph Group#8\n        Expr#8[\"SCAN tbl2 (id, field1, field2)\"]\n    end\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#11\n        Expr#11[\"JOIN\"]\n    end\n    Expr#11 --\u003e Group#7\n    Expr#11 --\u003e Group#10\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#7\n        Expr#7[\"SCAN tbl1 (id, field1)\"]\n    end\n    subgraph Group#10\n        Expr#10[\"JOIN\"]\n    end\n    Expr#10 --\u003e Group#8\n    Expr#10 --\u003e Group#9\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#9\n        Expr#9[\"SCAN tbl3 (id, field2)\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#6\n        Expr#12[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#12 --\u003e Group#11\n    Expr#6 --\u003e Group#5\n    style Expr#12 stroke-width: 4px, stroke: orange\n    style Expr#8 stroke-width: 4px, stroke: orange\n    style Expr#10 stroke-width: 4px, stroke: orange\n    style Expr#9 stroke-width: 4px, stroke: orange\n    style Expr#11 stroke-width: 4px, stroke: orange\n    style Expr#7 stroke-width: 4px, stroke: orange\n    linkStyle 0 stroke-width: 4px, stroke: orange\n    linkStyle 1 stroke-width: 4px, stroke: orange\n    linkStyle 6 stroke-width: 4px, stroke: orange\n    linkStyle 7 stroke-width: 4px, stroke: 
orange\n    linkStyle 8 stroke-width: 4px, stroke: orange\n```\n\nSee [ProjectionPushDown.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Frules%2Ftransform%2FProjectionPushDown.scala)\nfor the full implementation.\n\n##### Join reorder\n\nJoin reorder is also one of the best-known transformations in the world of query planners, so our planner will\nimplement a join reorder transformation rule as well.\n\nSince join reorder is not easy to implement in the real world, we will implement a simple, stripped-down version of the\njoin reorder rule here.\n\nFirst, the rule `match`:\n\n```scala\n// check if the tree only contains SCAN and JOIN nodes, and also extract all SCAN nodes and JOIN conditions\nprivate def checkAndExtract(\n                             node: LogicalPlan,\n                             buffer: mutable.ListBuffer[Scan],\n                             joinCondBuffer: mutable.ListBuffer[(ql.FieldID, ql.FieldID)]\n                           ): Boolean = {\n  node match {\n    case node@Scan(_, _) =\u003e\n      buffer += node\n      true\n    case Join(left, right, on) =\u003e\n      joinCondBuffer ++= on\n      checkAndExtract(left, buffer, joinCondBuffer) \u0026\u0026 checkAndExtract(right, buffer, joinCondBuffer)\n    case _ =\u003e false\n  }\n}\n\nprivate def buildInterchangeableJoinCond(conditions: Seq[(ql.FieldID, ql.FieldID)]): Seq[Seq[ql.FieldID]] = {\n  val buffer = mutable.ListBuffer[mutable.Set[ql.FieldID]]()\n  conditions.foreach { cond =\u003e\n    val set = buffer.find { set =\u003e\n      set.contains(cond._1) || set.contains(cond._2)\n    } match {\n      case Some(set) =\u003e set\n      case None =\u003e\n        val set = mutable.Set[ql.FieldID]()\n        buffer += set\n        set\n    }\n    set += cond._1\n    set += cond._2\n  }\n  buffer.map(_.toSeq)\n}\n\noverride def `match`(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): Boolean = {\n  val plan = expression.plan\n  plan match {\n    case node@Join(_, _, _) 
=\u003e\n      val buffer = mutable.ListBuffer[Scan]()\n      val joinCondBuffer = mutable.ListBuffer[(ql.FieldID, ql.FieldID)]()\n      if (checkAndExtract(node, buffer, joinCondBuffer)) {\n        // only match if the join is a 3-table join\n        if (buffer.size == 3) {\n          var check = true\n          val interChangeableCond = buildInterchangeableJoinCond(joinCondBuffer)\n          interChangeableCond.foreach { c =\u003e\n            check \u0026= c.size == 3\n          }\n          check\n        } else {\n          false\n        }\n      } else {\n        false\n      }\n    case _ =\u003e false\n  }\n}\n```\n\nOur rule only matches a 3-way JOIN (the number of tables involved must be 3, and the join\ncondition must be 3-way, such as `tbl1.field1 = tbl2.field2 = tbl3.field3`).\n\nFor example,\n\n```sql\ntbl1\n    JOIN tbl2 ON tbl1.field1 = tbl2.field2\n    JOIN tbl3 ON tbl1.field1 = tbl3.field3\n```\n\nThe join statement here will be \"matched\" since it's a 3-way JOIN (it's the join between `tbl1`, `tbl2`, `tbl3`, and the\ncondition is `tbl1.field1 = tbl2.field2 = tbl3.field3`).\n\nNext is the transform code:\n\n```scala\noverride def transform(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): GroupExpression = {\n  val plan = expression.plan.asInstanceOf[Join]\n  val buffer = mutable.ListBuffer[Scan]()\n  val joinCondBuffer = mutable.ListBuffer[(ql.FieldID, ql.FieldID)]()\n  checkAndExtract(plan, buffer, joinCondBuffer)\n  val interChangeableCond = buildInterchangeableJoinCond(joinCondBuffer)\n  //\n  val scans = buffer.toList\n  implicit val ord: Ordering[Scan] = new Ordering[Scan] {\n    override def compare(x: Scan, y: Scan): Int = {\n      val xStats = ctx.statsProvider.tableStats(x.table.id)\n      val yStats = ctx.statsProvider.tableStats(y.table.id)\n      xStats.estimatedTableSize.compareTo(yStats.estimatedTableSize)\n    }\n  }\n\n  def getJoinCond(left: Scan, right: Scan): Seq[(ql.FieldID, ql.FieldID)] = 
{\n    val leftFields = interChangeableCond.flatMap { c =\u003e\n      c.filter(p =\u003e p.table == left.table)\n    }\n    val rightFields = interChangeableCond.flatMap { c =\u003e\n      c.filter(p =\u003e p.table == right.table)\n    }\n    if (leftFields.length != rightFields.length) {\n      throw new Exception(s\"leftFields.length(${leftFields.length}) != rightFields.length(${rightFields.length})\")\n    } else {\n      leftFields zip rightFields\n    }\n  }\n\n  val sorted = scans.sorted\n  val newPlan = Join(\n    sorted(0),\n    Join(\n      sorted(1),\n      sorted(2),\n      getJoinCond(sorted(1), sorted(2))\n    ),\n    getJoinCond(sorted(0), sorted(1))\n  )\n  ctx.memo.getOrCreateGroupExpression(newPlan)\n}\n```\n\nThe transform code here reorders the tables by their estimated size, smallest first.\n\nFor example, if we have 3 tables A, B, C with estimated sizes of 300b, 100b, 200b and a JOIN statement `A JOIN B JOIN C`,\nthen it will be transformed into `B JOIN C JOIN A`.\n\n**notes:** You might notice that this code makes use of table statistics as a hint for transforming the plan.\nIn practice, the planner can use all sorts of statistics to aid its transformations, such as table size, row size, null\ncount, histogram, etc.\n\nVisualizing our rule, for example, the plan\n\n```mermaid\ngraph TD\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#6\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#6 --\u003e Group#5\n```\n\nafter the join reorder transformation, will result in\n\n```mermaid\ngraph 
TD\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n        Expr#8[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    Expr#8 --\u003e Group#2\n    Expr#8 --\u003e Group#7\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#7\n        Expr#7[\"JOIN\"]\n    end\n    Expr#7 --\u003e Group#1\n    Expr#7 --\u003e Group#3\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#6\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#6 --\u003e Group#5\n    style Expr#8 stroke-width: 4px, stroke: orange\n    style Expr#7 stroke-width: 4px, stroke: orange\n    linkStyle 2 stroke-width: 4px, stroke: orange\n    linkStyle 6 stroke-width: 4px, stroke: orange\n    linkStyle 3 stroke-width: 4px, stroke: orange\n    linkStyle 7 stroke-width: 4px, stroke: orange\n```\n\nWe can see that `tbl2 JOIN tbl1 JOIN tbl3` is generated from `tbl1 JOIN tbl2 JOIN tbl3` by the\ntransformation (the newly added nodes and edges are indicated by orange lines).\n\nSee [X3TableJoinReorderBySize.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Frules%2Ftransform%2FX3TableJoinReorderBySize.scala)\nfor the full implementation.\n\n##### Putting all transformations together\n\nNow we can put our transformation rules in one place:\n\n```scala\nprivate val transformationRules: Seq[Seq[TransformationRule]] = Seq(\n  Seq(new ProjectionPushDown),\n  Seq(new X3TableJoinReorderBySize)\n)\n```\n\nAnd run them to explore the equivalent groups:\n\n```scala\nfor (r \u003c- transformationRules.indices) {\n  exploreGroup(ctx.rootGroup, transformationRules(r), r + 1)\n}\n```\n\nFor example, the plan\n\n```mermaid\ngraph TD\n    subgraph Group#2\n        
Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#6\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#6 --\u003e Group#5\n```\n\nafter being explored, will result in this graph\n\n```mermaid\ngraph TD\n    subgraph Group#8\n        Expr#8[\"SCAN tbl2 (id, field1, field2)\"]\n    end\n    subgraph Group#11\n        Expr#11[\"JOIN\"]\n        Expr#14[\"JOIN\"]\n    end\n    Expr#11 --\u003e Group#7\n    Expr#11 --\u003e Group#10\n    Expr#14 --\u003e Group#8\n    Expr#14 --\u003e Group#12\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n        Expr#16[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    Expr#16 --\u003e Group#2\n    Expr#16 --\u003e Group#13\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#13\n        Expr#15[\"JOIN\"]\n    end\n    Expr#15 --\u003e Group#1\n    Expr#15 --\u003e Group#3\n    subgraph Group#7\n        Expr#7[\"SCAN tbl1 (id, field1)\"]\n    end\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#10\n        Expr#10[\"JOIN\"]\n    end\n    Expr#10 --\u003e Group#8\n    Expr#10 --\u003e Group#9\n    subgraph Group#9\n        Expr#9[\"SCAN tbl3 (id, field2)\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#12\n        Expr#13[\"JOIN\"]\n    end\n    Expr#13 --\u003e Group#7\n    Expr#13 --\u003e Group#9\n    subgraph Group#6\n        Expr#12[\"PROJECT 
tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#12 --\u003e Group#11\n    Expr#6 --\u003e Group#5\n    style Expr#12 stroke-width: 4px, stroke: orange\n    style Expr#8 stroke-width: 4px, stroke: orange\n    style Expr#10 stroke-width: 4px, stroke: orange\n    style Expr#13 stroke-width: 4px, stroke: orange\n    style Expr#14 stroke-width: 4px, stroke: orange\n    style Expr#11 stroke-width: 4px, stroke: orange\n    style Expr#9 stroke-width: 4px, stroke: orange\n    style Expr#15 stroke-width: 4px, stroke: orange\n    style Expr#7 stroke-width: 4px, stroke: orange\n    style Expr#16 stroke-width: 4px, stroke: orange\n    linkStyle 0 stroke-width: 4px, stroke: orange\n    linkStyle 15 stroke-width: 4px, stroke: orange\n    linkStyle 12 stroke-width: 4px, stroke: orange\n    linkStyle 1 stroke-width: 4px, stroke: orange\n    linkStyle 16 stroke-width: 4px, stroke: orange\n    linkStyle 13 stroke-width: 4px, stroke: orange\n    linkStyle 2 stroke-width: 4px, stroke: orange\n    linkStyle 6 stroke-width: 4px, stroke: orange\n    linkStyle 3 stroke-width: 4px, stroke: orange\n    linkStyle 10 stroke-width: 4px, stroke: orange\n    linkStyle 7 stroke-width: 4px, stroke: orange\n    linkStyle 14 stroke-width: 4px, stroke: orange\n    linkStyle 11 stroke-width: 4px, stroke: orange\n```\n\nSee [VolcanoPlanner.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2FVolcanoPlanner.scala) for more details\n\n### Optimization phase\n\nAfter exploration phase, we now have a fully expanded tree containing all possible plans, now is the optimization phase.\n\nIn this phase, we will find the best plan for our root group. 
The optimization process is as follows:\n\n- For each group, we will find the best implementation by choosing the group expression with the lowest cost\n- For each group expression, first we will enumerate the physical implementations from the logical plan. Then for each\n  physical implementation, we will calculate its cost using its child group costs.\n\nHere is an example:\n\n```mermaid\ngraph TD\n    subgraph Group#2[\"Group#2(cost=1)\"]\n        Expr#2[\"Expr#2(cost=1)\"]\n    end\n    subgraph Group#5[\"Group#5(cost=3)\"]\n        Expr#5[\"Expr#5(cost=max(3,2)=3)\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    subgraph Group#4[\"Group#4(cost=2)\"]\n        Expr#4[\"Expr#4(cost=max(1,2)=2)\"]\n        Expr#7[\"Expr#7(cost=1+2=3)\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#1[\"Group#1(cost=3)\"]\n        Expr#1[\"Expr#1(cost=3)\"]\n    end\n    subgraph Group#3[\"Group#3(cost=2)\"]\n        Expr#3[\"Expr#3(cost=2)\"]\n    end\n    subgraph Group#6[\"Group#6(cost=4.5)\"]\n        Expr#6[\"Expr#6(cost=3*1.5=4.5)\"]\n    end\n    Expr#6 --\u003e Group#5\n    subgraph Group#8[\"Group#8(cost=1)\"]\n        Expr#8[\"Expr#8(cost=1)\"]\n    end\n    subgraph Group#9[\"Group#9(cost=2)\"]\n        Expr#9[\"Expr#9(cost=2)\"]\n    end\n    Expr#7 --\u003e Group#8\n    Expr#7 --\u003e Group#9\n```\n\nFor example, the cost of `Expr#4` is calculated from its child group costs (`Group#2` and `Group#3`) using the `max` function.\nAnother example is `Group#4`: its cost is the minimum of the costs of its equivalent expressions (`Expr#4` and\n`Expr#7`), so `min(2, 3) = 2`.\n\n#### Physical plan\n\nThe goal of the optimization phase is to produce the best physical plan given the explored group expressions. 
We can\ndefine the physical plan as follows:\n\n```scala\nsealed trait PhysicalPlan {\n  def operator(): Operator\n\n  def children(): Seq[PhysicalPlan]\n\n  def cost(): Cost\n\n  def estimations(): Estimations\n\n  def traits(): Set[String]\n}\n\n```\n\nThe `operator` is the physical operator, which is used to execute the plan; we will cover it in a later section.\nThen `children` is the list of child plan nodes, which participate in the cost calculation. The\nthird attribute is `cost`, an object holding cost information (such as CPU cost, memory cost, IO cost,\netc.). `estimations` is the property holding estimated statistics about the plan (such as row count, row size, etc.);\nit also participates in the cost calculation. Finally, `traits` is a set of physical traits, which influence the\nimplementation rules and thereby the physical plan generation process.\n\nNext, we can implement the physical node classes:\n\n```scala\ncase class Scan(\n                 operator: Operator,\n                 cost: Cost,\n                 estimations: Estimations,\n                 traits: Set[String] = Set.empty\n               ) extends PhysicalPlan {\n  override def children(): Seq[PhysicalPlan] = Seq.empty // scan does not have any children\n}\n\ncase class Project(\n                    operator: Operator,\n                    child: PhysicalPlan,\n                    cost: Cost,\n                    estimations: Estimations,\n                    traits: Set[String] = Set.empty\n                  ) extends PhysicalPlan {\n  override def children(): Seq[PhysicalPlan] = Seq(child)\n}\n\ncase class Join(\n                 operator: Operator,\n                 leftChild: PhysicalPlan,\n                 rightChild: PhysicalPlan,\n                 cost: Cost,\n                 estimations: Estimations,\n                 traits: Set[String] = Set.empty\n               ) extends PhysicalPlan {\n  override def children(): Seq[PhysicalPlan] = Seq(leftChild, 
rightChild)\n}\n\n```\n\nSee [PhysicalPlan.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Fphysicalplan%2FPhysicalPlan.scala) for\nthe full implementation.\n\n#### Implementation rule\n\nThe first thing to do in the optimization phase is to implement the implementation rules. An implementation rule\nconverts a logical plan to physical plans without executing them.\n\nSince we're not directly executing the physical plan in the planner, we will return a physical plan builder\ninstead, which also makes it easier to customize the cost function for each node.\n\nHere is the interface of an implementation rule:\n\n```scala\ntrait PhysicalPlanBuilder {\n  def build(children: Seq[PhysicalPlan]): Option[PhysicalPlan]\n}\n\ntrait ImplementationRule {\n  def physicalPlanBuilders(expression: GroupExpression)(implicit ctx: VolcanoPlannerContext): Seq[PhysicalPlanBuilder]\n}\n\n```\n\nHere the `PhysicalPlanBuilder` is the interface used to build the physical plan, given the child physical plans.\n\nFor example, the logical JOIN has 2 physical implementations: HASH JOIN and MERGE JOIN.\n\n```mermaid\ngraph TD\n    child#1[\"child#1\"]\n    child#2[\"child#2\"]\n    child#3[\"child#3\"]\n    child#4[\"child#4\"]\n    hash_join[\"`HASH JOIN \n    cost=f(cost(child#1),cost(child#2))\n    `\"]\n    merge_join[\"`MERGE JOIN\n    cost=g(cost(child#3),cost(child#4))\n    `\"]\n    hash_join --\u003e child#1\n    hash_join --\u003e child#2\n    merge_join --\u003e child#3\n    merge_join --\u003e child#4\n```\n\nThe HASH JOIN uses function `f()` to calculate its cost, and the MERGE JOIN uses function `g()`;\nboth take the costs of their children as function parameters. 
So it's easier to code if the implementation rule returns just the physical plan\nbuilder instead of the physical plan.\n\n#### Finding the best plan\n\nAs we've said before, the optimization process is as follows:\n\n- For each group, we will find the best implementation by choosing the group expression with the lowest cost\n- For each group expression, first we will enumerate the physical implementations from the logical plan. Then for each\n  physical implementation, we will calculate its cost using its child group costs.\n\nAnd here is the code:\n\n```scala\nprivate def implementGroup(group: Group, combinedRule: ImplementationRule)(\n  implicit ctx: VolcanoPlannerContext\n): GroupImplementation = {\n  group.implementation match {\n    case Some(implementation) =\u003e implementation\n    case None =\u003e\n      var bestImplementation = Option.empty[GroupImplementation]\n      group.equivalents.foreach { equivalent =\u003e\n        val physicalPlanBuilders = combinedRule.physicalPlanBuilders(equivalent)\n        val childPhysicalPlans = equivalent.children.map { child =\u003e\n          val childImplementation = implementGroup(child, combinedRule)\n          child.implementation = Option(childImplementation)\n          childImplementation.physicalPlan\n        }\n        // calculate the implementation, and update the best cost for group\n        physicalPlanBuilders.flatMap(_.build(childPhysicalPlans)).foreach { physicalPlan =\u003e\n          val cost = physicalPlan.cost()\n          bestImplementation match {\n            case Some(currentBest) =\u003e\n              if (ctx.costModel.isBetter(currentBest.cost, cost)) {\n                bestImplementation = Option(\n                  GroupImplementation(\n                    physicalPlan = physicalPlan,\n                    cost = cost,\n                    selectedEquivalentExpression = equivalent\n                  )\n                )\n              }\n            case None 
=\u003e\n              bestImplementation = Option(\n                GroupImplementation(\n                  physicalPlan = physicalPlan,\n                  cost = cost,\n                  selectedEquivalentExpression = equivalent\n                )\n              )\n          }\n        }\n      }\n      bestImplementation.get\n  }\n}\n```\n\nThis code is an exhaustive search, which uses a recursive function to traverse all nodes. At each node (group),\nthe best plan is computed once, together with the optimal cost of that group.\n\nFinally, the best plan for our query is the best plan of the root group:\n\n```scala\nval implementationRules = new ImplementationRule {\n\n  override def physicalPlanBuilders(\n                                     expression: GroupExpression\n                                   )(implicit ctx: VolcanoPlannerContext): Seq[PhysicalPlanBuilder] = {\n    expression.plan match {\n      case node@Scan(_, _) =\u003e implement.Scan(node)\n      case node@Project(_, _) =\u003e implement.Project(node)\n      case node@Join(_, _, _) =\u003e implement.Join(node)\n    }\n  }\n}\n\nctx.rootGroup.implementation = Option(implementGroup(ctx.rootGroup, implementationRules))\n```\n\nSee [VolcanoPlanner.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2FVolcanoPlanner.scala) for the full\nimplementation.\n\nHere is an example of the plan after optimization; it shows the selected logical node, the selected physical operator,\nand the estimated cost:\n\n```mermaid\n\ngraph TD\n    Group#6[\"\n    Group #6\nSelected: PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\nOperator: ProjectOperator\nCost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)\n\"]\nGroup#6 --\u003e Group#11\nGroup#11[\"\nGroup #11\nSelected: JOIN\nOperator: HashJoinOperator\nCost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)\n\"]\nGroup#11 --\u003e Group#7\nGroup#11 --\u003e 
Group#10\nGroup#7[\"\nGroup #7\nSelected: SCAN tbl1 (id, field1)\nOperator: NormalScanOperator\nCost: Cost(cpu=400.00, mem=400000.00, time=1000.00)\n\"]\nGroup#10[\"\nGroup #10\nSelected: JOIN\nOperator: MergeJoinOperator\nTraits: SORTED\nCost: Cost(cpu=640000.00, mem=20000012.00, time=1100000.00)\n\"]\nGroup#10 --\u003e Group#8\nGroup#10 --\u003e Group#9\nGroup#8[\"\nGroup #8\nSelected: SCAN tbl2 (id, field1, field2)\nOperator: NormalScanOperator\nTraits: SORTED\nCost: Cost(cpu=600000.00, mem=12.00, time=1000000.00)\n\"]\nGroup#9[\"\nGroup #9\nSelected: SCAN tbl3 (id, field2)\nOperator: NormalScanOperator\nTraits: SORTED\nCost: Cost(cpu=40000.00, mem=20000000.00, time=100000.00)\n\"]\n```\n\n#### Implement the implementation rules\n\nNext, we will implement some implementation rules.\n\n##### PROJECT\n\nThe first, easiest one is the implementation rule of logical PROJECT\n\n```scala\nobject Project {\n\n  def apply(node: logicalplan.Project)(implicit ctx: VolcanoPlannerContext): Seq[PhysicalPlanBuilder] = {\n    Seq(\n      new ProjectionImpl(node.fields)\n    )\n  }\n}\n\nclass ProjectionImpl(projection: Seq[ql.FieldID]) extends PhysicalPlanBuilder {\n\n  override def build(children: Seq[PhysicalPlan]): Option[PhysicalPlan] = {\n    val child = children.head\n    val selfCost = Cost(\n      estimatedCpuCost = 0,\n      estimatedMemoryCost = 0,\n      estimatedTimeCost = 0\n    ) // assuming the cost of projection is 0\n    val cost = Cost(\n      estimatedCpuCost = selfCost.estimatedCpuCost + child.cost().estimatedCpuCost,\n      estimatedMemoryCost = selfCost.estimatedMemoryCost + child.cost().estimatedMemoryCost,\n      estimatedTimeCost = selfCost.estimatedTimeCost + child.cost().estimatedTimeCost\n    )\n    val estimations = Estimations(\n      estimatedLoopIterations = child.estimations().estimatedLoopIterations,\n      estimatedRowSize = child.estimations().estimatedRowSize // just guessing the value\n    )\n    Some(\n      Project(\n        operator = 
ProjectOperator(projection, child.operator()),\n        child = child,\n        cost = cost,\n        estimations = estimations,\n        traits = child.traits()\n      )\n    )\n  }\n}\n\n```\n\nThe implementation rule for logical PROJECT returns a single physical plan builder, `ProjectionImpl`. The cost\ncalculation in `ProjectionImpl` is simple: it just inherits the cost from the child node (because the projection does not\nperform any intensive operation). Besides that, it also sets the estimations (in this code, the estimations are also inherited from the\nchild node).\n\nSee [Project.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Frules%2Fimplement%2FProject.scala) for the full\nimplementation\n\n##### JOIN\n\nWriting the implementation rule for logical JOIN is much harder than for PROJECT.\n\nThe first reason is that a logical JOIN has many\nphysical implementations, such as HASH JOIN, MERGE JOIN, BROADCAST JOIN, etc.\n\nThe second reason is that estimating the cost of a\nphysical JOIN is also hard, because it depends on many factors such as row count, row size, data histogram, indexes,\ndata layout, etc.\n\nSo, to keep everything simple in this guide, I will only implement 2 physical JOINs: HASH JOIN and MERGE JOIN. The cost\nestimation functions are fictional (they only show how the mechanism works; I'm not trying to make them accurate). 
And in the MERGE JOIN, all\ndata is assumed to be sorted by the join key.\n\nHere is the code:\n\n```scala\nobject Join {\n\n  def apply(node: logicalplan.Join)(implicit ctx: VolcanoPlannerContext): Seq[PhysicalPlanBuilder] = {\n    val leftFields = node.on.map(_._1).map(f =\u003e s\"${f.table.id}.${f.id}\")\n    val rightFields = node.on.map(_._2).map(f =\u003e s\"${f.table.id}.${f.id}\")\n    Seq(\n      new HashJoinImpl(leftFields, rightFields),\n      new MergeJoinImpl(leftFields, rightFields)\n    )\n  }\n}\n\n```\n\nThe HASH JOIN:\n\n```scala\nclass HashJoinImpl(leftFields: Seq[String], rightFields: Seq[String]) extends PhysicalPlanBuilder {\n\n  private def viewSize(plan: PhysicalPlan): Long = {\n    plan.estimations().estimatedLoopIterations * plan.estimations().estimatedRowSize\n  }\n\n  //noinspection ZeroIndexToHead,DuplicatedCode\n  override def build(children: Seq[PhysicalPlan]): Option[PhysicalPlan] = {\n    // reorder the child nodes, the left child is the child with smaller view size (smaller than the right child if we store all of them in memory)\n    val (leftChild, rightChild) = if (viewSize(children(0)) \u003c viewSize(children(1))) {\n      (children(0), children(1))\n    } else {\n      (children(1), children(0))\n    }\n    val estimatedLoopIterations = Math.max(\n      leftChild.estimations().estimatedLoopIterations,\n      rightChild.estimations().estimatedLoopIterations\n    ) // just guessing the value\n    val estimatedOutRowSize = leftChild.estimations().estimatedRowSize + rightChild.estimations().estimatedRowSize\n    val selfCost = Cost(\n      estimatedCpuCost = leftChild.estimations().estimatedLoopIterations, // cost to hash all record from the smaller view\n      estimatedMemoryCost = viewSize(leftChild), // hash the smaller view, we need to hold the hash table in memory\n      estimatedTimeCost = rightChild.estimations().estimatedLoopIterations\n    )\n    val childCosts = Cost(\n      estimatedCpuCost = 
leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,\n      estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,\n      estimatedTimeCost = 0\n    )\n    val estimations = Estimations(\n      estimatedLoopIterations = estimatedLoopIterations,\n      estimatedRowSize = estimatedOutRowSize\n    )\n    val cost = Cost(\n      estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,\n      estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,\n      estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost\n    )\n    Some(\n      Join(\n        operator = HashJoinOperator(\n          leftChild.operator(),\n          rightChild.operator(),\n          leftFields,\n          rightFields\n        ),\n        leftChild = leftChild,\n        rightChild = rightChild,\n        cost = cost,\n        estimations = estimations,\n        traits = Set.empty // don't inherit trait from children since we're hash join\n      )\n    )\n  }\n}\n\n```\n\nWe can see that the cost function of HASH JOIN is composed of its children costs and estimations\n\n```scala\nval selfCost = Cost(\n  estimatedCpuCost = leftChild.estimations().estimatedLoopIterations, // cost to hash all record from the smaller view\n  estimatedMemoryCost = viewSize(leftChild), // hash the smaller view, we need to hold the hash table in memory\n  estimatedTimeCost = rightChild.estimations().estimatedLoopIterations\n)\n\nval childCosts = Cost(\n  estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,\n  estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,\n  estimatedTimeCost = 0\n)\n\nval estimations = Estimations(\n  estimatedLoopIterations = estimatedLoopIterations,\n  estimatedRowSize = estimatedOutRowSize\n)\n\nval cost = Cost(\n  estimatedCpuCost = selfCost.estimatedCpuCost + 
childCosts.estimatedCpuCost,\n  estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,\n  estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost\n)\n```\n\nNext, the MERGE JOIN:\n\n```scala\nclass MergeJoinImpl(leftFields: Seq[String], rightFields: Seq[String]) extends PhysicalPlanBuilder {\n\n  //noinspection ZeroIndexToHead,DuplicatedCode\n  override def build(children: Seq[PhysicalPlan]): Option[PhysicalPlan] = {\n    val (leftChild, rightChild) = (children(0), children(1))\n    if (leftChild.traits().contains(\"SORTED\") \u0026\u0026 rightChild.traits().contains(\"SORTED\")) {\n      val estimatedTotalRowCount =\n        leftChild.estimations().estimatedLoopIterations +\n          rightChild.estimations().estimatedLoopIterations\n      val estimatedLoopIterations = Math.max(\n        leftChild.estimations().estimatedLoopIterations,\n        rightChild.estimations().estimatedLoopIterations\n      ) // just guessing the value\n      val estimatedOutRowSize = leftChild.estimations().estimatedRowSize + rightChild.estimations().estimatedRowSize\n      val selfCost = Cost(\n        estimatedCpuCost = 0, // no additional cpu cost, just scan from child iterator\n        estimatedMemoryCost = 0, // no additional memory cost\n        estimatedTimeCost = estimatedTotalRowCount\n      )\n      val childCosts = Cost(\n        estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,\n        estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,\n        estimatedTimeCost = 0\n      )\n      val estimations = Estimations(\n        estimatedLoopIterations = estimatedLoopIterations,\n        estimatedRowSize = estimatedOutRowSize\n      )\n      val cost = Cost(\n        estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,\n        estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,\n        
estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost\n      )\n      Some(\n        Join(\n          operator = MergeJoinOperator(\n            leftChild.operator(),\n            rightChild.operator(),\n            leftFields,\n            rightFields\n          ),\n          leftChild = leftChild,\n          rightChild = rightChild,\n          cost = cost,\n          estimations = estimations,\n          traits = leftChild.traits() ++ rightChild.traits()\n        )\n      )\n    } else {\n      None\n    }\n  }\n}\n\n```\n\nAs with HASH JOIN, MERGE JOIN also uses its children's costs and estimations to calculate its cost, but with a different\nformula:\n\n```scala\nval selfCost = Cost(\n  estimatedCpuCost = 0, // no additional cpu cost, just scan from child iterator\n  estimatedMemoryCost = 0, // no additional memory cost\n  estimatedTimeCost = estimatedTotalRowCount\n)\n\nval childCosts = Cost(\n  estimatedCpuCost = leftChild.cost().estimatedCpuCost + rightChild.cost().estimatedCpuCost,\n  estimatedMemoryCost = leftChild.cost().estimatedMemoryCost + rightChild.cost().estimatedMemoryCost,\n  estimatedTimeCost = 0\n)\n\nval estimations = Estimations(\n  estimatedLoopIterations = estimatedLoopIterations,\n  estimatedRowSize = estimatedOutRowSize\n)\n\nval cost = Cost(\n  estimatedCpuCost = selfCost.estimatedCpuCost + childCosts.estimatedCpuCost,\n  estimatedMemoryCost = selfCost.estimatedMemoryCost + childCosts.estimatedMemoryCost,\n  estimatedTimeCost = selfCost.estimatedTimeCost + childCosts.estimatedTimeCost\n)\n```\n\nSee [HashJoinImpl.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Fphysicalplan%2Fbuilder%2FHashJoinImpl.scala)\nand [MergeJoinImpl.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Fphysicalplan%2Fbuilder%2FMergeJoinImpl.scala)\nfor the full implementation\n\n##### Others\n\nYou can see other rules and physical plan builders here:\n\n- 
[implement](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Frules%2Fimplement)\n- [builder](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fplanner%2Fvolcano%2Fphysicalplan%2Fbuilder)\n\n#### Putting all pieces together\n\nNow that the implementation rules are done, we can find our best plan. Let's start over from the user\nquery\n\n```sql\nSELECT tbl1.id,\n       tbl1.field1,\n       tbl2.id,\n       tbl2.field1,\n       tbl2.field2,\n       tbl3.id,\n       tbl3.field2,\n       tbl3.field2\nFROM tbl1\n         JOIN tbl2 ON tbl1.id = tbl2.id\n         JOIN tbl3 ON tbl2.id = tbl3.id\n```\n\nIt will be converted to the logical plan\n\n```mermaid\ngraph TD\n    1326583549[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"];\n    -425111028[\"JOIN\"];\n    -349388609[\"SCAN tbl1\"];\n    1343755644[\"JOIN\"];\n    -1043437086[\"SCAN tbl2\"];\n    -1402686787[\"SCAN tbl3\"];\n    1326583549 --\u003e -425111028;\n    -425111028 --\u003e -349388609;\n    -425111028 --\u003e 1343755644;\n    1343755644 --\u003e -1043437086;\n    1343755644 --\u003e -1402686787;\n```\n\nAfter the exploration phase, the planner has generated lots of equivalent plans\n\n```mermaid\ngraph TD\n    subgraph Group#8\n        Expr#8[\"SCAN tbl2 (id, field1, field2)\"]\n    end\n    subgraph Group#11\n        Expr#11[\"JOIN\"]\n        Expr#14[\"JOIN\"]\n    end\n    Expr#11 --\u003e Group#7\n    Expr#11 --\u003e Group#10\n    Expr#14 --\u003e Group#8\n    Expr#14 --\u003e Group#12\n    subgraph Group#2\n        Expr#2[\"SCAN tbl2\"]\n    end\n    subgraph Group#5\n        Expr#5[\"JOIN\"]\n        Expr#16[\"JOIN\"]\n    end\n    Expr#5 --\u003e Group#1\n    Expr#5 --\u003e Group#4\n    Expr#16 --\u003e Group#2\n    Expr#16 --\u003e Group#13\n    subgraph Group#4\n        Expr#4[\"JOIN\"]\n    end\n    Expr#4 --\u003e Group#2\n    Expr#4 --\u003e Group#3\n    subgraph Group#13\n        Expr#15[\"JOIN\"]\n    end\n    Expr#15 --\u003e Group#1\n    
Expr#15 --\u003e Group#3\n    subgraph Group#7\n        Expr#7[\"SCAN tbl1 (id, field1)\"]\n    end\n    subgraph Group#1\n        Expr#1[\"SCAN tbl1\"]\n    end\n    subgraph Group#10\n        Expr#10[\"JOIN\"]\n    end\n    Expr#10 --\u003e Group#8\n    Expr#10 --\u003e Group#9\n    subgraph Group#9\n        Expr#9[\"SCAN tbl3 (id, field2)\"]\n    end\n    subgraph Group#3\n        Expr#3[\"SCAN tbl3\"]\n    end\n    subgraph Group#12\n        Expr#13[\"JOIN\"]\n    end\n    Expr#13 --\u003e Group#7\n    Expr#13 --\u003e Group#9\n    subgraph Group#6\n        Expr#12[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n        Expr#6[\"PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\"]\n    end\n    Expr#12 --\u003e Group#11\n    Expr#6 --\u003e Group#5\n    style Expr#12 stroke-width: 4px, stroke: orange\n    style Expr#8 stroke-width: 4px, stroke: orange\n    style Expr#10 stroke-width: 4px, stroke: orange\n    style Expr#13 stroke-width: 4px, stroke: orange\n    style Expr#14 stroke-width: 4px, stroke: orange\n    style Expr#11 stroke-width: 4px, stroke: orange\n    style Expr#9 stroke-width: 4px, stroke: orange\n    style Expr#15 stroke-width: 4px, stroke: orange\n    style Expr#7 stroke-width: 4px, stroke: orange\n    style Expr#16 stroke-width: 4px, stroke: orange\n    linkStyle 0 stroke-width: 4px, stroke: orange\n    linkStyle 15 stroke-width: 4px, stroke: orange\n    linkStyle 12 stroke-width: 4px, stroke: orange\n    linkStyle 1 stroke-width: 4px, stroke: orange\n    linkStyle 16 stroke-width: 4px, stroke: orange\n    linkStyle 13 stroke-width: 4px, stroke: orange\n    linkStyle 2 stroke-width: 4px, stroke: orange\n    linkStyle 6 stroke-width: 4px, stroke: orange\n    linkStyle 3 stroke-width: 4px, stroke: orange\n    linkStyle 10 stroke-width: 4px, stroke: orange\n    linkStyle 7 stroke-width: 4px, stroke: orange\n    linkStyle 14 stroke-width: 4px, 
stroke: orange\n    linkStyle 11 stroke-width: 4px, stroke: orange\n```\n\nAnd at the optimization phase, a final best plan is chosen\n\n```mermaid\ngraph TD\n    Group#6[\"\n    Group #6\nSelected: PROJECT tbl1.id, tbl1.field1, tbl2.id, tbl2.field1, tbl2.field2, tbl3.id, tbl3.field2, tbl3.field2\nOperator: ProjectOperator\nCost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)\n\"]\nGroup#6 --\u003e Group#11\nGroup#11[\"\nGroup #11\nSelected: JOIN\nOperator: HashJoinOperator\nCost: Cost(cpu=641400.00, mem=1020400012.00, time=1000000.00)\n\"]\nGroup#11 --\u003e Group#7\nGroup#11 --\u003e Group#10\nGroup#7[\"\nGroup #7\nSelected: SCAN tbl1 (id, field1)\nOperator: NormalScanOperator\nCost: Cost(cpu=400.00, mem=400000.00, time=1000.00)\n\"]\nGroup#10[\"\nGroup #10\nSelected: JOIN\nOperator: MergeJoinOperator\nTraits: SORTED\nCost: Cost(cpu=640000.00, mem=20000012.00, time=1100000.00)\n\"]\nGroup#10 --\u003e Group#8\nGroup#10 --\u003e Group#9\nGroup#8[\"\nGroup #8\nSelected: SCAN tbl2 (id, field1, field2)\nOperator: NormalScanOperator\nTraits: SORTED\nCost: Cost(cpu=600000.00, mem=12.00, time=1000000.00)\n\"]\nGroup#9[\"\nGroup #9\nSelected: SCAN tbl3 (id, field2)\nOperator: NormalScanOperator\nTraits: SORTED\nCost: Cost(cpu=40000.00, mem=20000000.00, time=100000.00)\n\"]\n```\n\n### Bonus: query execution\n\nNow we've finished building a functional query planner which can optimize the user's query, but a query plan cannot\nrun by itself. That's why we will now implement a query processor to test out our query plan.\n\nBasically, the query processor receives a physical plan from the query planner and executes it\n\n```mermaid\ngraph LR\n    plan((\"Physical Plan\"))\n    storage[(\"Storage Layer\")]\n    processor[\"Query Processor\"]\n    plan -- execute --\u003e processor\n    storage -- fetch --\u003e processor\n```\n\n#### Volcano/Iterator model\n\nThe Volcano/iterator model is a query processing model that is widely used in many DBMSs. 
It is a pipeline architecture,\nwhich means that the data is processed in stages, with each stage passing its output to the next\nstage.\n\nEach stage in the pipeline is represented by an operator. Operators are functions that perform a specific operation on\nthe data, such as selecting rows, filtering rows, or aggregating rows.\n\nUsually, operators can be formed directly from the query plan. For example, the query\n\n```sql\nSELECT field_1\nFROM tbl\nWHERE field = 1\n```\n\nwill have the plan\n\n```mermaid\ngraph TD\n  project[\"PROJECT: field_1\"]\n  filter[\"FILTER: field = 1\"]\n  scan[\"SCAN: tbl\"]\n  project --\u003e filter\n  filter --\u003e scan\n```\n\nwhich will create a chain of operators like this (the filter runs before the projection, because it needs the column `field`, which the projection drops):\n\n```text\nscan = {\n    next() // fetch next row from table \"tbl\"\n}\n\nfilter = {\n    next() = {\n        next_row = {}\n        do {\n            next_row = scan.next() // fetch next row from scan operator\n        } while (next_row[\"field\"] != 1)\n        return next_row\n    }\n}\n\nproject = {\n    next() = {\n        next_row = filter.next() // fetch next row from filter operator\n        projected = next_row[\"field_1\"]\n        return projected\n    }\n}\n\nresults = []\nwhile (row = project.next()) {\n    results.append(row)\n}\n```\n\n**Note**: this pseudocode does not handle the end of the result stream\n\n#### The operators\n\nThe basic interface of an operator is as follows:\n\n```scala\ntrait Operator {\n  def next(): Option[Seq[Any]]\n}\n\n```\n\nSee [Operator.scala](core%2Fsrc%2Fmain%2Fscala%2Fcore%2Fexecution%2FOperator.scala) for the full implementation of all\noperators\n\n#### Testing a simple query\n\nLet's define a query\n\n```sql\nSELECT emp.id,\n       emp.code,\n       dept.dept_name,\n       emp_info.name,\n       emp_info.origin\nFROM emp\n         JOIN dept ON emp.id = dept.emp_id\n         JOIN emp_info ON dept.emp_id = emp_info.id\n```\n\nwith some data and stats\n\n```scala\nval table1: Datasource 
= Datasource(\n  table = \"emp\",\n  catalog = TableCatalog(\n    Seq(\n      \"id\" -\u003e classOf[String],\n      \"code\" -\u003e classOf[String]\n    ),\n    metadata = Map(\"sorted\" -\u003e \"true\") // assumes rows are already sorted by id\n  ),\n  rows = Seq(\n    Seq(\"1\", \"Emp A\"),\n    Seq(\"2\", \"Emp B\"),\n    Seq(\"3\", \"Emp C\")\n  ),\n  stats = TableStats(\n    estimatedRowCount = 3,\n    avgColumnSize = Map(\"id\" -\u003e 10, \"code\" -\u003e 32)\n  )\n)\n\nval table2: Datasource = Datasource(\n  table = \"dept\",\n  catalog = TableCatalog(\n    Seq(\n      \"emp_id\" -\u003e classOf[String],\n      \"dept_name\" -\u003e classOf[String]\n    ),\n    metadata = Map(\"sorted\" -\u003e \"true\") // assumes rows are already sorted by emp_id (this is just a fake trait to demonstrate how trait works)\n  ),\n  rows = Seq(\n    Seq(\"1\", \"Dept 1\"),\n    Seq(\"1\", \"Dept 2\"),\n    Seq(\"2\", \"Dept 3\"),\n    Seq(\"3\", \"Dept 3\")\n  ),\n  stats = TableStats(\n    estimatedRowCount = 4,\n    avgColumnSize = Map(\"emp_id\" -\u003e 10, \"dept_name\" -\u003e 255)\n  )\n)\n\nval table3: Datasource = Datasource(\n  table = \"emp_info\",\n  catalog = TableCatalog(\n    Seq(\n      \"id\" -\u003e classOf[String],\n      \"name\" -\u003e classOf[String],\n      \"origin\" -\u003e classOf[String]\n    ),\n    metadata = Map(\"sorted\" -\u003e \"true\") // assumes rows are already sorted by id (this is just a fake trait to demonstrate how trait works)\n  ),\n  rows = Seq(\n    Seq(\"1\", \"AAAAA\", \"Country A\"),\n    Seq(\"2\", \"BBBBB\", \"Country A\"),\n    Seq(\"3\", \"CCCCC\", \"Country B\")\n  ),\n  stats = TableStats(\n    estimatedRowCount = 3,\n    avgColumnSize = Map(\"id\" -\u003e 10, \"name\" -\u003e 255, \"origin\" -\u003e 255)\n  )\n)\n```\n\nThe cost model is optimized for CPU\n\n```scala\nval costModel: CostModel = (currentCost: Cost, newCost: Cost) =\u003e {\n  currentCost.estimatedCpuCost \u003e newCost.estimatedCpuCost\n}\n```\n\nNow, 
execute the query by running this code:\n\n```scala\nval planner = new VolcanoPlanner\nQueryParser.parse(query) match {\n  case Left(err) =\u003e throw err\n  case Right(parsed) =\u003e\n    val operator = planner.getPlan(parsed)\n    val result = Utils.execute(operator)\n    // print result\n    println(result._1.mkString(\",\"))\n    result._2.foreach(row =\u003e println(row.mkString(\",\")))\n}\n```\n\nIt will print:\n\n```text\nemp.id,emp.code,dept.dept_name,emp_info.name,emp_info.origin\n1,Emp A,Dept 1,AAAAA,Country A\n1,Emp A,Dept 2,AAAAA,Country A\n2,Emp B,Dept 3,BBBBB,Country A\n3,Emp C,Dept 3,CCCCC,Country B\n```\n\nVoila, we've finished building a fully functional query planner and query engine :). You can start writing one of your own,\ngood luck\n\nSee [Demo.scala](demo%2Fsrc%2Fmain%2Fscala%2Fdemo%2FDemo.scala) for the full demo code\n\n# Thanks\n\nThanks for reading. This guide is quite long and not fully correct, but I've tried my best to write it as\nunderstandably as possible :beers:\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuannh982%2Fquery-planner-guide","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftuannh982%2Fquery-planner-guide","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftuannh982%2Fquery-planner-guide/lists"}