{"id":20813966,"url":"https://github.com/alexiszamanidis/sql_query_executor","last_synced_at":"2026-04-09T22:00:07.891Z","repository":{"id":103980644,"uuid":"240541742","full_name":"alexiszamanidis/sql_query_executor","owner":"alexiszamanidis","description":"A Parallel SQL Query Executor that parses and executes SQL queries using a Thread pool. It also rearranges the predicates by frequency to reduce execution time.","archived":false,"fork":false,"pushed_at":"2020-08-24T00:37:35.000Z","size":6322,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-17T05:40:29.551Z","etag":null,"topics":["bash","c","cpp","parallelization","sql-query-executor","thread-pool"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alexiszamanidis.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-02-14T15:41:03.000Z","updated_at":"2023-07-02T12:08:32.000Z","dependencies_parsed_at":"2023-04-28T01:46:10.624Z","dependency_job_id":null,"html_url":"https://github.com/alexiszamanidis/sql_query_executor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/alexiszamanidis/sql_query_executor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexiszamanidis%2Fsql_query_executor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexiszamanidis%2Fsql_query_executor/tags","releases_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/alexiszamanidis%2Fsql_query_executor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexiszamanidis%2Fsql_query_executor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alexiszamanidis","download_url":"https://codeload.github.com/alexiszamanidis/sql_query_executor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alexiszamanidis%2Fsql_query_executor/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267617643,"owners_count":24116208,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-29T02:00:12.549Z","response_time":2574,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bash","c","cpp","parallelization","sql-query-executor","thread-pool"],"created_at":"2024-11-17T21:09:02.985Z","updated_at":"2026-04-09T22:00:02.816Z","avatar_url":"https://github.com/alexiszamanidis.png","language":"C++","readme":"## SQL Query Executor\r\n\r\nIn this project we implemented a Parallel SQL Query Executor that parses and executes SQL queries. \r\nIt also rearranges the predicates by frequency to reduce execution time.\r\n\r\n### Methodology\r\n\r\nAt the beginning you are given a number of file paths, each of which contains the data of one relation. \r\nThe data of each relation is read and stored in memory. 
For this reason we keep one array with as many \r\ncells as the relations that are initially given. Each cell of this array holds metadata such as the number of rows of the \r\nrelation and the number of columns. Each cell also keeps an array of pointers to the memory locations of the relation's columns. \r\nThe first cell of the array holds the information for the first relation, the second for the second, etc. When we refer to \r\nrelation 0 we mean the relation that is stored in cell 0, and so on. The representation of this structure is shown in the figure below:\r\n\r\n#### File Array\r\n\r\n![file_array](https://user-images.githubusercontent.com/48658768/89515187-75aa5f00-d7df-11ea-966a-e738b5ab4e82.png)\r\n\r\n#### Query Parsing\r\nAfter loading all relations into memory, we parse and execute queries in batches.\r\n\r\n**Query input format example**\r\n\r\nRelations | Predicates | Projections\r\n```c\r\n0 2 4|0.1=1.2\u00261.0=2.1\u00260.1\u003e3000|0.0 1.1\r\n```\r\n\r\nTranslated to SQL:\r\n```sql\r\nSELECT SUM(\"0\".c0), SUM(\"1\".c1)\r\nFROM r0 \"0\", r2 \"1\", r4 \"2\"\r\nWHERE 0.c1=1.c2 and 1.c0=2.c1 and 0.c1\u003e3000\r\n```\r\n\r\n#### Sort Merge Join\r\n\r\nThe idea of Sort Merge Join is to sort both relations on their join key and then perform the join with a simple \r\nmerge, in a single pass over the sorted tables.\r\n\r\nAt the Sort stage, the relations are sorted with the following radix-sort method. The data of each relation is \r\npartitioned **into buckets the same size as the L1 cache of the processor (at most 64 KB)**. The new table results from \r\nthe following procedure: we allocate a new table the size of the original, and in this table we store the data of \r\neach bucket contiguously. To do this we need to know where in the new table the data of each bucket starts. 
For this reason we create a table (histogram) of \r\n256 positions, where in each position we hold the number of elements that fall in that bucket. Then we compute its cumulative \r\nhistogram, which gives the position where each bucket starts. Having this histogram, we can write the data to its correct \r\nplace in the new table with a single pass over the original table.\r\n\r\n**This process is repeated** recursively for each bucket until the bucket size is less than the desired size (64 KB). When this happens, \r\neach bucket can be sorted using an efficient sort method (quicksort).\r\n\r\n**In the end** we have two relations fully sorted on the join key. To get the final results, \r\nwe scan the data of the two relations in parallel (a merge pass) and extract the final results.\r\n\r\n#### Results\r\n\r\n**Because we do not know the volume of the results in advance**, the results of the join are written to a new structure which is **in the form of a queue**. \r\n**Each bucket of the queue holds a 1 MB table and a pointer to the next bucket**. The logic is that we write data to a table; once the table is full, \r\nwe append a new bucket to the queue, and so on. This structure has the form shown in the figure below: \r\n\r\n![results](https://user-images.githubusercontent.com/48658768/89527981-ee66e680-d7f2-11ea-9c14-12cfbed4de94.png)\r\n\r\n### Optimizations\r\n\r\n**Reordering queries**: We execute the filters first, then the joins over relations that were used in filters or that appear more often.\r\n\r\n**Parallel Query Execution**: As noted above, we parse and execute queries in batches, so the main thread pushes all the queries \r\nof a batch to the Queue.\r\n\r\n**Parallel Histogram Calculation**: We break the histogram computation into two pieces and push them to the Queue. 
We tried different split counts, \r\nfor example breaking the histogram into as many pieces as there are threads in the thread pool, but that was not optimal for our application, \r\nbecause calculating the histogram is not expensive.\r\n\r\n**Parallel Sorting between the 2 relations that are joined**: The two relations are not interdependent, so we can sort \r\neach one separately. If either of them was already sorted by the immediately preceding predicate, we do not push it to the Queue, \r\nbecause it has already been sorted and saved in the intermediate results.\r\n\r\n**Parallel Join between elements with the same prefix**: We take the prefixes of each relation and push a Join Job \r\nif the prefixes match.\r\n\r\n### Execution Instructions\r\n\r\nAn example of how you can run the program:\r\n\r\n```c\r\nmake\r\n./join 1 \u003c ../inputfiles/inputfile_small\r\n```\r\n\r\nYou can of course change the thread-count parameter as well as the input file.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexiszamanidis%2Fsql_query_executor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falexiszamanidis%2Fsql_query_executor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falexiszamanidis%2Fsql_query_executor/lists"}