{"id":37425535,"url":"https://github.com/liancheng/spear","last_synced_at":"2026-01-16T06:13:13.405Z","repository":{"id":37664152,"uuid":"47673843","full_name":"liancheng/spear","owner":"liancheng","description":"A playground for experimenting ideas that may apply to Spark SQL/Catalyst","archived":false,"fork":false,"pushed_at":"2018-07-05T05:02:50.000Z","size":1926,"stargazers_count":137,"open_issues_count":10,"forks_count":54,"subscribers_count":13,"default_branch":"master","last_synced_at":"2023-10-20T21:14:23.895Z","etag":null,"topics":["query-optimizer","sql"],"latest_commit_sha":null,"homepage":null,"language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/liancheng.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-12-09T06:44:57.000Z","updated_at":"2023-10-08T08:54:27.000Z","dependencies_parsed_at":"2022-09-09T04:01:06.555Z","dependency_job_id":null,"html_url":"https://github.com/liancheng/spear","commit_stats":null,"previous_names":[],"tags_count":0,"template":null,"template_full_name":null,"purl":"pkg:github/liancheng/spear","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liancheng%2Fspear","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liancheng%2Fspear/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liancheng%2Fspear/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/liancheng%2Fspear/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/liancheng","download_url":"https://codeload.github.com/liancheng/spear/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/liancheng%2Fspear/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28477633,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-16T03:13:13.607Z","status":"ssl_error","status_checked_at":"2026-01-16T03:11:47.863Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["query-optimizer","sql"],"created_at":"2026-01-16T06:13:12.667Z","updated_at":"2026-01-16T06:13:13.386Z","avatar_url":"https://github.com/liancheng.png","language":"Scala","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Overview\n\n[![Build Status][travis-ci-badge]][travis-ci] [![codecov.io][codecov-badge]][codecov]\n\n![Codecov.io][codecov-history]\n\n[travis-ci-badge]: https://travis-ci.org/liancheng/spear.svg?branch=master\n[travis-ci]: https://travis-ci.org/liancheng/spear\n[codecov-badge]: https://codecov.io/github/liancheng/spear/coverage.svg?branch=master\n[codecov]: https://codecov.io/github/liancheng/spear?branch=master\n[codecov-history]: https://codecov.io/github/liancheng/spear/branch.svg?branch=master\n\nThis project is a sandbox and playground of mine for experimenting with ideas and potential improvements to Spark SQL. 
It consists of:\n\n- A parser that parses a small SQL dialect into unresolved logical plans\n- A semantic analyzer that resolves unresolved logical plans into resolved ones\n- A query optimizer that optimizes resolved query plans into equivalent but more performant ones\n- A query planner that turns (optimized) logical plans into executable physical plans\n\nCurrently Spear only works with local Scala collections.\n\n# Build\n\nBuilding Spear is as easy as:\n\n```\n$ ./build/sbt package\n```\n\n# Run the REPL\n\nSpear has an Ammonite-based REPL for interactive experiments. To start it:\n\n```\n$ ./build/sbt spear-repl/run\n```\n\nLet's create a simple DataFrame of numbers:\n\n```scala\n@ context range 10 show ()\n```\n\n```\n╒══╕\n│id│\n├──┤\n│ 0│\n│ 1│\n│ 2│\n│ 3│\n│ 4│\n│ 5│\n│ 6│\n│ 7│\n│ 8│\n│ 9│\n╘══╛\n```\n\nA sample query using the DataFrame API:\n\n```scala\n@ context.\n    range(10).\n    select('id as 'key, (rand(42) * 100) cast IntType as 'value).\n    where('value % 2 === 0).\n    orderBy('value.desc).\n    show()\n```\n\n```\n╒═══╤═════╕\n│key│value│\n├───┼─────┤\n│  5│   90│\n│  9│   78│\n│  0│   72│\n│  1│   68│\n│  4│   66│\n│  8│   46│\n│  6│   36│\n│  2│   30│\n╘═══╧═════╛\n```\n\nEquivalent sample query using SQL:\n\n```scala\n@ context range 10 asTable 't // Registers a temporary table first\n\n@ context.sql(\n    \"\"\"SELECT * FROM (\n      |  SELECT id AS key, CAST(RAND(42) * 100 AS INT) AS value FROM t\n      |) s\n      |WHERE value % 2 = 0\n      |ORDER BY value DESC\n      |\"\"\".stripMargin\n  ).show()\n```\n\n```\n╒═══╤═════╕\n│key│value│\n├───┼─────┤\n│  5│   90│\n│  9│   78│\n│  0│   72│\n│  1│   68│\n│  4│   66│\n│  8│   46│\n│  6│   36│\n│  2│   30│\n╘═══╧═════╛\n```\n\nWe can also check the query plan using `explain()`:\n\n```scala\n@ context.\n    range(10).\n    select('id as 'key, (rand(42) * 100) cast IntType as 'value).\n    where('value % 2 === 0).\n    orderBy('value.desc).\n    explain(true)\n```\n\n```\n# Logical 
plan\nSort: order=[$0] ⇒ [?output?]\n│ ╰╴$0: `value` DESC NULLS FIRST\n╰╴Filter: condition=$0 ⇒ [?output?]\n  │ ╰╴$0: ((`value` % 2:INT) = 0:INT)\n  ╰╴Project: projectList=[$0, $1] ⇒ [?output?]\n    │ ├╴$0: (`id` AS `key`#11)\n    │ ╰╴$1: (CAST((RAND(42:INT) * 100:INT) AS INT) AS `value`#12)\n    ╰╴LocalRelation: data=\u003clocal-data\u003e ⇒ [`id`#10:BIGINT!]\n\n# Analyzed plan\nSort: order=[$0] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n│ ╰╴$0: `value`#12:INT! DESC NULLS FIRST\n╰╴Filter: condition=$0 ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n  │ ╰╴$0: ((`value`#12:INT! % 2:INT) = 0:INT)\n  ╰╴Project: projectList=[$0, $1] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n    │ ├╴$0: (`id`#10:BIGINT! AS `key`#11)\n    │ ╰╴$1: (CAST((RAND(CAST(42:INT AS BIGINT)) * CAST(100:INT AS DOUBLE)) AS INT) AS `value`#12)\n    ╰╴LocalRelation: data=\u003clocal-data\u003e ⇒ [`id`#10:BIGINT!]\n\n# Optimized plan\nSort: order=[$0] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n│ ╰╴$0: `value`#12:INT! DESC NULLS FIRST\n╰╴Filter: condition=$0 ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n  │ ╰╴$0: ((`value`#12:INT! % 2:INT) = 0:INT)\n  ╰╴Project: projectList=[$0, $1] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n    │ ├╴$0: (`id`#10:BIGINT! AS `key`#11)\n    │ ╰╴$1: (CAST((RAND(42:BIGINT) * 100.0:DOUBLE) AS INT) AS `value`#12)\n    ╰╴LocalRelation: data=\u003clocal-data\u003e ⇒ [`id`#10:BIGINT!]\n\n# Physical plan\nSort: order=[$0] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n│ ╰╴$0: `value`#12:INT! DESC NULLS FIRST\n╰╴Filter: condition=$0 ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n  │ ╰╴$0: ((`value`#12:INT! % 2:INT) = 0:INT)\n  ╰╴Project: projectList=[$0, $1] ⇒ [`key`#11:BIGINT!, `value`#12:INT!]\n    │ ├╴$0: (`id`#10:BIGINT! 
AS `key`#11)\n    │ ╰╴$1: (CAST((RAND(42:BIGINT) * 100.0:DOUBLE) AS INT) AS `value`#12)\n    ╰╴LocalRelation: data=\u003clocal-data\u003e ⇒ [`id`#10:BIGINT!]\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliancheng%2Fspear","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fliancheng%2Fspear","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fliancheng%2Fspear/lists"}