{"id":13506079,"url":"https://github.com/RCHowell/Sift","last_synced_at":"2025-03-30T03:30:30.238Z","repository":{"id":83301868,"uuid":"383863555","full_name":"RCHowell/Sift","owner":"RCHowell","description":"Sift is a basic, Relational Algebra based query engine built on top of Apache Arrow. It draws inspiration from Andy Grove's KQuery.","archived":false,"fork":false,"pushed_at":"2022-05-01T21:37:01.000Z","size":444,"stargazers_count":21,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-11-01T04:35:00.437Z","etag":null,"topics":["query","relation-algebra","sql"],"latest_commit_sha":null,"homepage":"","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/RCHowell.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-07-07T16:38:14.000Z","updated_at":"2024-04-26T10:36:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"4460e930-aa9f-466a-b5aa-d4b622276666","html_url":"https://github.com/RCHowell/Sift","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RCHowell%2FSift","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RCHowell%2FSift/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RCHowell%2FSift/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/RCHowell%2FSift/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/RCHowell","download_url":"https://codeload.github.com/RCHowe
ll/Sift/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246273533,"owners_count":20750904,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["query","relation-algebra","sql"],"created_at":"2024-08-01T01:00:34.210Z","updated_at":"2025-03-30T03:30:29.541Z","avatar_url":"https://github.com/RCHowell.png","language":"Kotlin","readme":"## Preface\n\nI built this as an exercise while studying [Database Systems: The Complete Book](http://infolab.stanford.edu/~ullman/dscb.html) (DSCB) by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom. I also wanted to experiment with Apache Arrow, and I found Andy Grove's [KQuery](https://github.com/andygrove/how-query-engines-work); much of this work is modelled after his engine, and I have left notes where I use some of his constructs. This exercise was more about studying the execution of queries, so little effort was put into the parser and planner. There are currently no plan optimizations, and the language is simply syntactic sugar over the operators of relational algebra discussed in DSCB.\n\n## Operations\n- Scan\n- Selection\n- Projection\n- Limit\n- Grouping/Aggregation\n- Distinct\n- Sort (TODO)\n- Join / Union / Difference / Intersection (TODO)\n\n## Language\n\n\u003e Full details in *sift.lang/README.md*\n\nThe purpose of the Sift language is to have a query language that maps nearly 1:1 to the operators of the extended relational algebra discussed in section 5.2 of Garcia-Molina et al. 
It is literally an inversion of the query expression tree using the F# (and Elixir) pipe operator to simplify writing nested transformations.\n\nLimitations in the language come from my inability to dedicate time to the parser. Right now, I'm more interested in learning about parser generators. The purpose of the hand-written lexer and parser was to learn some basics.\n\nA query is formed with a relation production followed by transformations. All type data is provided by the **Schema** of a data **Source** that is registered with the query execution environment. The full BNF is at the bottom.\n\n### Shell Example\n\n![](https://i.imgur.com/1RGvkLm.png)\n\n![](https://i.imgur.com/s2yIvwl.png)\n\n### Relation Productions\n\nLet *R(A, B, C)* and *S(B, C, D)* be two relations. Here are some example relation productions, including subqueries.\n```\n# simple scan\n'R'\n\n# joins\n'R' JOIN 'S'\n'R' OUTER JOIN 'S'\n'R' JOIN 'S' ON A = D\n\n# equivalent to the previous join\n'R' X 'S' |\u003e SELECT A = D\n\n# project tuples to same domain prior to union\n('R' |\u003e PROJECT B, C) UNION ('S' |\u003e PROJECT B, C)\n\n# Let T(X, Y) and V(X, Y) be two relations\n'T' X 'V' # cross\n'T' U 'V' # union\n'T' \\ 'V' # difference\n'T' \u0026 'V' # intersection\n```\n\n### Examples\n\n```\nQ: Select all titles produced by Paramount between 1979 and 1982\n\n'Movies'\n  |\u003e SELECT (1979 \u003c= Year \u0026\u0026 Year \u003c= 1982) \u0026\u0026 Studio = 'Paramount'\n  |\u003e PROJECT Title\n```\n\n```\nQ: Get the average, min, and max heights of all players by age and position\n\n'Players'\n  |\u003e PROJECT Height, Age, Position\n  |\u003e GROUP AVG(Height) -\u003e Avg, MIN(Height) -\u003e Shortest, MAX(Height) -\u003e Tallest BY Age, Position\n```\n\n## Execution\n\n\u003e Run 'gradle run --console plain' to start the interactive query shell\n\n### Sample Data\n\nThe sample data is a collection of some fuzzy 
friends.\n\n```\n┌─────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐\n│Name         │Age         │Gender      │Weight      │Type        │Breed       │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Ramona       │2.00        │F           │8.00        │Cat         │Mini Coon   │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Mochi        │2.00        │F           │45.00       │Dog         │Samoyed     │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Cali         │7.00        │F           │30.00       │Dog         │Vizsla      │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Gretchen     │13.00       │F           │50.00       │Dog         │English     │\n│             │            │            │            │            │Bulldog     │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Cooper       │6.00        │M           │30.00       │Dog         │Beagle      │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Eleanor      │5.00        │F           │24.00       │Dog         │Cocker      │\n│             │            │            │            │            │Spaniel     │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Huckleberry  │7.00        │M           │20.00       │Cat         │Medium Coon │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Madman Mochi │3.00        │M           │14.00       │Cat         │Unknown     │\n└─────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘\n```\n\n### Selection\n\n\u003e You can see I have a bug in the precedence of parsing, but I don't care much about the parser\n\n```\n'Pets' |\u003e SELECT (Type = 'Dog') \u0026\u0026 (Gender = 
'F')\n\n┌─────────────┬────────────┬────────────┬────────────┬────────────┬────────────┐\n│Name         │Age         │Gender      │Weight      │Type        │Breed       │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Mochi        │2.00        │F           │45.00       │Dog         │Samoyed     │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Cali         │7.00        │F           │30.00       │Dog         │Vizsla      │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Gretchen     │13.00       │F           │50.00       │Dog         │English     │\n│             │            │            │            │            │Bulldog     │\n├─────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤\n│Eleanor      │5.00        │F           │24.00       │Dog         │Cocker      │\n│             │            │            │            │            │Spaniel     │\n└─────────────┴────────────┴────────────┴────────────┴────────────┴────────────┘\n```\n\n### Projection\n\n\n```\n'Pets'\n  |\u003e SELECT Type = 'Cat'\n  |\u003e PROJECT Name + ' is a ' + Breed + ' kitty cat' -\u003e Greeting\n\n┌──────────────────────────────────────────────────────────────────────────────┐\n│Greeting                                                                      │\n├──────────────────────────────────────────────────────────────────────────────┤\n│Ramona is a Mini Coon kitty cat                                               │\n├──────────────────────────────────────────────────────────────────────────────┤\n│Huckleberry is a Medium Coon kitty cat                                        │\n├──────────────────────────────────────────────────────────────────────────────┤\n│Madman Mochi is a Unknown kitty cat                                           │\n└──────────────────────────────────────────────────────────────────────────────┘\n```\n\n\n### 
Aggregations\n\n```\n'Pets' |\u003e GROUP MAX(Weight) -\u003e Thiccest BY Type\n\n┌───────────────────────────────────────┬──────────────────────────────────────┐\n│Type                                   │Thiccest                              │\n├───────────────────────────────────────┼──────────────────────────────────────┤\n│Cat                                    │20.00                                 │\n├───────────────────────────────────────┼──────────────────────────────────────┤\n│Dog                                    │50.00                                 │\n└───────────────────────────────────────┴──────────────────────────────────────┘\n```\n\n---\n\n## SiftQL BNF\n\n```\n# Tokens\n\u003cID\u003e      ::= [A-Za-z\\-_]+  # operators, relation and field identifiers\n\u003cSTRING\u003e  ::= '[A-Za-z0-9\\s]+'\n\u003cNUM\u003e     ::= [0-9]+(.[0-9]+)?\n\u003cBOOL\u003e    ::= (TRUE|FALSE|UNKOWN)\n\u003cNULL\u003e    ::= NULL\n\n\u003cQUERY\u003e ::= \u003cRELATION-PRODUCTION\u003e \u003cTRANSFORMS\u003e\n\n\u003cRELATION-PRODUCTION\u003e ::= \u003cRELATION\u003e\n                       |  \u003cJOIN\u003e\n                       |  \u003cCROSS\u003e\n                       |  \u003cUNION\u003e\n                       |  \u003cDIFF\u003e\n                       |  \u003cINTERSECT\u003e\n\n\u003cRELATION\u003e  ::= '\u003cID\u003e'        # quoted identifier\n             |  ( \u003cQUERY\u003e )     # sub-query\n\n\u003cJOIN\u003e      ::= \u003cRELATION\u003e (AS \u003cID\u003e)? (OUTER|LEFT|RIGHT)? JOIN \u003cRELATION\u003e (AS \u003cID\u003e)? 
(ON \u003cEXPR\u003e)?\n\u003cCROSS\u003e     ::= \u003cRELATION\u003e (X|CROSS) \u003cRELATION\u003e\n\u003cUNION\u003e     ::= \u003cRELATION\u003e (U|UNION) \u003cRELATION\u003e\n\u003cDIFF\u003e      ::= \u003cRELATION\u003e (\\|DIFF) \u003cRELATION\u003e\n\u003cINTERSECT\u003e ::= \u003cRELATION\u003e (\u0026|INTERSECT) \u003cRELATION\u003e\n\n\u003cTRANSFORMS\u003e ::= (|\u003e \u003cTRANSFORM\u003e)*\n\u003cTRANSFORM\u003e  ::= \u003cSELECT\u003e\n              |  \u003cPROJECT\u003e\n              |  \u003cGROUP\u003e\n              |  \u003cSORT\u003e (BY \u003cIDS\u003e)? (ASC|DESC)\n              |  LIMIT \u003cNUM\u003e\n              |  DISTINCT\n              \n\u003cSELECT\u003e ::= SELECT \u003cEXPR\u003e\n\n\u003cPROJECT\u003e ::= PROJECT \u003cFUNCS\u003e\n\u003cFUNC\u003e    ::= \u003cID\u003e\n           |  \u003cEXPR\u003e -\u003e \u003cID\u003e\n           \n\u003cGROUP\u003e    ::= GROUP \u003cAGGS\u003e (BY \u003cIDS\u003e)?\n\u003cAGG\u003e      ::= \u003cAGG_FUNC\u003e -\u003e \u003cID\u003e\n\u003cAGG_FUNC\u003e ::= \\#\u003cID\u003e(\u003cID\u003e)\n\n\u003cEXPR\u003e    ::= \u003cFACTOR\u003e\n           |  \u003cFACTOR\u003e \u003cOP\u003e \u003cEXPR\u003e\n           |  ( \u003cEXPR\u003e )\n\u003cFACTOR\u003e  ::= \u003cID\u003e            # field reference\n           |  \\#\u003cID\u003e(\u003cEXPR\u003e)  # functions\n           |  \u003cLITERAL\u003e\n\u003cLITERAL\u003e ::= (\u003cSTRING\u003e|\u003cNUM\u003e|\u003cBOOL\u003e|\u003cNULL\u003e)\n```\n\n## Shell\n\nTry Graal\n","funding_links":[],"categories":["Relational Algebra"],"sub_categories":["Custom databases"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRCHowell%2FSift","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FRCHowell%2FSift","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FRCHowell%2FSift/lists"}