{"id":28274141,"url":"https://github.com/clflushopt/sigmod-2025","last_synced_at":"2025-07-17T23:32:45.993Z","repository":{"id":292408795,"uuid":"970222400","full_name":"clflushopt/sigmod-2025","owner":"clflushopt","description":"My solution to the SIGMOD 2025 contest (non-registered).","archived":false,"fork":false,"pushed_at":"2025-05-09T19:49:43.000Z","size":409,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-16T05:44:41.595Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/clflushopt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-21T17:13:47.000Z","updated_at":"2025-05-09T19:49:48.000Z","dependencies_parsed_at":"2025-05-09T20:35:13.914Z","dependency_job_id":null,"html_url":"https://github.com/clflushopt/sigmod-2025","commit_stats":null,"previous_names":["clflushopt/sigmod-2025"],"tags_count":0,"template":false,"template_full_name":"transactionalblog/sigmod-contest-2025","purl":"pkg:github/clflushopt/sigmod-2025","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clflushopt%2Fsigmod-2025","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clflushopt%2Fsigmod-2025/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clflushopt%2Fsigmod-2025/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clflushopt%2Fsigmod-2025/manifests","owner_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/clflushopt","download_url":"https://codeload.github.com/clflushopt/sigmod-2025/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/clflushopt%2Fsigmod-2025/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265678530,"owners_count":23810114,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-21T01:16:23.868Z","updated_at":"2025-07-17T23:32:45.969Z","avatar_url":"https://github.com/clflushopt.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SIGMOD Contest 2025 (Solved)\n\nThis repository contains my solutions for the SIGMOD Contest 2025.\n\nAll timings were recorded on an AMD Zen 5 9950x (16 cores) see `hardware__talos.h`\nfor a full description of the hardware capabilities.\n\n```\nAll queries succeeded: true\nTotal Runtime in ms: 18249 ms\n```\n\n## Approach\n\nThe approach was to keep all processing in columnar format unlike what is done in\nthe baseline solution with on-demand de-serialization. 
For the core hash-join I again\nkept it simple, using a parallel fixed-size partitioned implementation of both the build\nand probe phases. Even with `std::unordered_map` this scored a 27x performance gain, getting\nus down to a total runtime of 20 seconds. I followed this by replacing it with a dense hash-map\nimplementation (note that I didn't use the parallel_hashmap implementation).\n\nInitially I implemented the hashmap described in [Simple, Efficient and Robust Hash Tables for Join Processing](https://db.in.tum.de/~birler/papers/hashtable.pdf),\nbut because the paper isn't quite clear on some implementation details (for example, I didn't implement the pre-computed directory lookups)\nthe performance was almost the same, so I resorted to using an off-the-shelf implementation\ninstead (note I only spent a couple of weekends' worth of time on this).\n\nOverall I am quite happy with the current implementation, and since the leaderboard was down\nfrom May 7th I didn't feel like spending more time on this (but I am super eager to see the\nresults announced at the SIGMOD 2025 conference in June).\n\nHere's a list of avenues I didn't explore that could be worthwhile to investigate:\n\n- Work-stealing thread pool: Currently we spawn one thread per partition (the number of partitions is the number of cores)\n  and we join the threads to collect the individual partitions before merging them. Instead,\n  we could spawn the same number of threads but use smaller partitions that can fit in cache\n  and implement work stealing a la HyPer (this approach is described in the [Morsel-Driven Parallelism](https://db.in.tum.de/~leis/papers/morsels.pdf) paper).\n  Note that I didn't benchmark this on a single-socket machine, and the paper's system was designed\n  for multi-socket machines, but it is worth exploring nonetheless.\n\n- Bloom filters: When building the partitions one can also build a Bloom filter to speed up\n  the probe phase. I didn't look at what the data distribution for the actual 
joins looks like,\n  so I am not sure what the gains would be here.\n\n- Radix Joins: I didn't consider Radix Joins because they are more complicated\n  to implement correctly. I also don't know if they are currently used in any popular systems\n  that I could benchmark against first.\n\n- Adaptive Joins: In theory we could decide on a different join approach (Sort Merge, Loop Join...) at\n  runtime once we know what the data looks like; some could be faster than a Hash Join.\n\n## Changes\n\nThis is a list of changes (divergences) from the main challenge repository.\n\n- Added a hardware definition file for the machine I used.\n- Ran `clang-format` on all files.\n- Fixed an issue due to a missing include for `hardware.h` (replaced by `hardware__talos.h`).\n\n## Baseline\n\n```\n$ time ./build/run plans.json\nQuery 1a \u003e\u003e              Runtime: 1837 ms - Result correct: true\nQuery 1b \u003e\u003e              Runtime: 963 ms - Result correct: true\nQuery 1c \u003e\u003e              Runtime: 354 ms - Result correct: true\nQuery 1d \u003e\u003e              Runtime: 1374 ms - Result correct: true\nQuery 2a \u003e\u003e              Runtime: 2593 ms - Result correct: true\nQuery 2b \u003e\u003e              Runtime: 2383 ms - Result correct: true\nQuery 2c \u003e\u003e              Runtime: 2315 ms - Result correct: true\nQuery 2d \u003e\u003e              Runtime: 2842 ms - Result correct: true\nQuery 3a \u003e\u003e              Runtime: 1255 ms - Result correct: true\nQuery 3b \u003e\u003e              Runtime: 816 ms - Result correct: true\nQuery 3c \u003e\u003e              Runtime: 1658 ms - Result correct: true\nQuery 4a \u003e\u003e              Runtime: 1434 ms - Result correct: true\nQuery 4b \u003e\u003e              Runtime: 858 ms - Result correct: true\nQuery 4c \u003e\u003e              Runtime: 1767 ms - Result correct: true\nQuery 5a \u003e\u003e              Runtime: 543 ms - Result correct: true\nQuery 5b \u003e\u003e       
       Runtime: 341 ms - Result correct: true\nQuery 5c \u003e\u003e              Runtime: 1234 ms - Result correct: true\nQuery 6a \u003e\u003e              Runtime: 14609 ms - Result correct: true\nQuery 6b \u003e\u003e              Runtime: 12257 ms - Result correct: true\nQuery 6c \u003e\u003e              Runtime: 12120 ms - Result correct: true\nQuery 6d \u003e\u003e              Runtime: 13744 ms - Result correct: true\nQuery 6e \u003e\u003e              Runtime: 13765 ms - Result correct: true\nQuery 6f \u003e\u003e              Runtime: 15426 ms - Result correct: true\nQuery 7a \u003e\u003e              Runtime: 13434 ms - Result correct: true\nQuery 7b \u003e\u003e              Runtime: 11958 ms - Result correct: true\nQuery 7c \u003e\u003e              Runtime: 13523 ms - Result correct: true\nQuery 8a \u003e\u003e              Runtime: 2342 ms - Result correct: true\nQuery 8b \u003e\u003e              Runtime: 651 ms - Result correct: true\nQuery 8c \u003e\u003e              Runtime: 16555 ms - Result correct: true\nQuery 8d \u003e\u003e              Runtime: 15376 ms - Result correct: true\nQuery 9a \u003e\u003e              Runtime: 2771 ms - Result correct: true\nQuery 9b \u003e\u003e              Runtime: 2303 ms - Result correct: true\nQuery 9c \u003e\u003e              Runtime: 4154 ms - Result correct: true\nQuery 9d \u003e\u003e              Runtime: 4902 ms - Result correct: true\nQuery 10a \u003e\u003e             Runtime: 2480 ms - Result correct: true\nQuery 10b \u003e\u003e             Runtime: 2673 ms - Result correct: true\nQuery 10c \u003e\u003e             Runtime: 3619 ms - Result correct: true\nQuery 11a \u003e\u003e             Runtime: 1574 ms - Result correct: true\nQuery 11b \u003e\u003e             Runtime: 888 ms - Result correct: true\nQuery 11c \u003e\u003e             Runtime: 2394 ms - Result correct: true\nQuery 11d \u003e\u003e             Runtime: 2757 ms - Result correct: true\nQuery 12a \u003e\u003e             Runtime: 
999 ms - Result correct: true\nQuery 12b \u003e\u003e             Runtime: 5668 ms - Result correct: true\nQuery 12c \u003e\u003e             Runtime: 1567 ms - Result correct: true\nQuery 13a \u003e\u003e             Runtime: 7351 ms - Result correct: true\nQuery 13b \u003e\u003e             Runtime: 5019 ms - Result correct: true\nQuery 13c \u003e\u003e             Runtime: 5113 ms - Result correct: true\nQuery 13d \u003e\u003e             Runtime: 8276 ms - Result correct: true\nQuery 14a \u003e\u003e             Runtime: 1252 ms - Result correct: true\nQuery 14b \u003e\u003e             Runtime: 935 ms - Result correct: true\nQuery 14c \u003e\u003e             Runtime: 1819 ms - Result correct: true\nQuery 15a \u003e\u003e             Runtime: 1990 ms - Result correct: true\nQuery 15b \u003e\u003e             Runtime: 1445 ms - Result correct: true\nQuery 15c \u003e\u003e             Runtime: 2683 ms - Result correct: true\nQuery 15d \u003e\u003e             Runtime: 2726 ms - Result correct: true\nQuery 16a \u003e\u003e             Runtime: 14995 ms - Result correct: true\nQuery 16b \u003e\u003e             Runtime: 17242 ms - Result correct: true\nQuery 16c \u003e\u003e             Runtime: 14447 ms - Result correct: true\nQuery 16d \u003e\u003e             Runtime: 14539 ms - Result correct: true\nQuery 17a \u003e\u003e             Runtime: 14593 ms - Result correct: true\nQuery 17b \u003e\u003e             Runtime: 14245 ms - Result correct: true\nQuery 17c \u003e\u003e             Runtime: 13983 ms - Result correct: true\nQuery 17d \u003e\u003e             Runtime: 13948 ms - Result correct: true\nQuery 17e \u003e\u003e             Runtime: 15813 ms - Result correct: true\nQuery 17f \u003e\u003e             Runtime: 14637 ms - Result correct: true\nQuery 18a \u003e\u003e             Runtime: 7003 ms - Result correct: true\nQuery 18b \u003e\u003e             Runtime: 1503 ms - Result correct: true\nQuery 18c \u003e\u003e             Runtime: 2868 ms - 
Result correct: true\nQuery 19a \u003e\u003e             Runtime: 2643 ms - Result correct: true\nQuery 19b \u003e\u003e             Runtime: 1890 ms - Result correct: true\nQuery 19c \u003e\u003e             Runtime: 3405 ms - Result correct: true\nQuery 19d \u003e\u003e             Runtime: 8991 ms - Result correct: true\nQuery 20a \u003e\u003e             Runtime: 14085 ms - Result correct: true\nQuery 20b \u003e\u003e             Runtime: 12573 ms - Result correct: true\nQuery 20c \u003e\u003e             Runtime: 14521 ms - Result correct: true\nQuery 21a \u003e\u003e             Runtime: 2195 ms - Result correct: true\nQuery 21b \u003e\u003e             Runtime: 1721 ms - Result correct: true\nQuery 21c \u003e\u003e             Runtime: 2156 ms - Result correct: true\nQuery 22a \u003e\u003e             Runtime: 1800 ms - Result correct: true\nQuery 22b \u003e\u003e             Runtime: 1665 ms - Result correct: true\nQuery 22c \u003e\u003e             Runtime: 1994 ms - Result correct: true\nQuery 22d \u003e\u003e             Runtime: 2303 ms - Result correct: true\nQuery 23a \u003e\u003e             Runtime: 2162 ms - Result correct: true\nQuery 23b \u003e\u003e             Runtime: 2215 ms - Result correct: true\nQuery 23c \u003e\u003e             Runtime: 2387 ms - Result correct: true\nQuery 24a \u003e\u003e             Runtime: 3494 ms - Result correct: true\nQuery 24b \u003e\u003e             Runtime: 3210 ms - Result correct: true\nQuery 25a \u003e\u003e             Runtime: 3093 ms - Result correct: true\nQuery 25b \u003e\u003e             Runtime: 2062 ms - Result correct: true\nQuery 25c \u003e\u003e             Runtime: 3334 ms - Result correct: true\nQuery 26a \u003e\u003e             Runtime: 14454 ms - Result correct: true\nQuery 26b \u003e\u003e             Runtime: 14347 ms - Result correct: true\nQuery 26c \u003e\u003e             Runtime: 15066 ms - Result correct: true\nQuery 27a \u003e\u003e             Runtime: 2135 ms - Result correct: 
true\nQuery 27b \u003e\u003e             Runtime: 857 ms - Result correct: true\nQuery 27c \u003e\u003e             Runtime: 2141 ms - Result correct: true\nQuery 28a \u003e\u003e             Runtime: 2214 ms - Result correct: true\nQuery 28b \u003e\u003e             Runtime: 1547 ms - Result correct: true\nQuery 28c \u003e\u003e             Runtime: 1898 ms - Result correct: true\nQuery 29a \u003e\u003e             Runtime: 3340 ms - Result correct: true\nQuery 29b \u003e\u003e             Runtime: 3371 ms - Result correct: true\nQuery 29c \u003e\u003e             Runtime: 4896 ms - Result correct: true\nQuery 30a \u003e\u003e             Runtime: 2837 ms - Result correct: true\nQuery 30b \u003e\u003e             Runtime: 1989 ms - Result correct: true\nQuery 30c \u003e\u003e             Runtime: 3487 ms - Result correct: true\nQuery 31a \u003e\u003e             Runtime: 3630 ms - Result correct: true\nQuery 31b \u003e\u003e             Runtime: 1943 ms - Result correct: true\nQuery 31c \u003e\u003e             Runtime: 4906 ms - Result correct: true\nQuery 32a \u003e\u003e             Runtime: 3207 ms - Result correct: true\nQuery 32b \u003e\u003e             Runtime: 3278 ms - Result correct: true\nQuery 33a \u003e\u003e             Runtime: 3654 ms - Result correct: true\nQuery 33b \u003e\u003e             Runtime: 3016 ms - Result correct: true\nQuery 33c \u003e\u003e             Runtime: 4185 ms - Result correct: true\n\nreal    11m7.193s\nuser    12m19.108s\nsys     2m47.299s\n```\n\nMost of the low-hanging fruit performance-wise comes from exploiting the columnar format and using\na parallel hash join. In this case `std::unordered_map` scores a 20 second total runtime;\nreplacing it with an off-the-shelf, Abseil-like `flat_hash_map` implementation gets us down\nto 18 seconds on the same machine.\n\n```\nclflushopt@workstation:~/Projects/Experiments/sigmod-2025$ ./build/run plans.json\nQuery 1a \u003e\u003e              Runtime: 97 ms - Result correct: true\nQuery 1b 
\u003e\u003e              Runtime: 72 ms - Result correct: true\nQuery 1c \u003e\u003e              Runtime: 13 ms - Result correct: true\nQuery 1d \u003e\u003e              Runtime: 92 ms - Result correct: true\nQuery 2a \u003e\u003e              Runtime: 98 ms - Result correct: true\nQuery 2b \u003e\u003e              Runtime: 107 ms - Result correct: true\nQuery 2c \u003e\u003e              Runtime: 88 ms - Result correct: true\nQuery 2d \u003e\u003e              Runtime: 103 ms - Result correct: true\nQuery 3a \u003e\u003e              Runtime: 41 ms - Result correct: true\nQuery 3b \u003e\u003e              Runtime: 17 ms - Result correct: true\nQuery 3c \u003e\u003e              Runtime: 56 ms - Result correct: true\nQuery 4a \u003e\u003e              Runtime: 50 ms - Result correct: true\nQuery 4b \u003e\u003e              Runtime: 17 ms - Result correct: true\nQuery 4c \u003e\u003e              Runtime: 71 ms - Result correct: true\nQuery 5a \u003e\u003e              Runtime: 23 ms - Result correct: true\nQuery 5b \u003e\u003e              Runtime: 10 ms - Result correct: true\nQuery 5c \u003e\u003e              Runtime: 46 ms - Result correct: true\nQuery 6a \u003e\u003e              Runtime: 130 ms - Result correct: true\nQuery 6b \u003e\u003e              Runtime: 121 ms - Result correct: true\nQuery 6c \u003e\u003e              Runtime: 116 ms - Result correct: true\nQuery 6d \u003e\u003e              Runtime: 156 ms - Result correct: true\nQuery 6e \u003e\u003e              Runtime: 152 ms - Result correct: true\nQuery 6f \u003e\u003e              Runtime: 352 ms - Result correct: true\nQuery 7a \u003e\u003e              Runtime: 482 ms - Result correct: true\nQuery 7b \u003e\u003e              Runtime: 192 ms - Result correct: true\nQuery 7c \u003e\u003e              Runtime: 922 ms - Result correct: true\nQuery 8a \u003e\u003e              Runtime: 87 ms - Result correct: true\nQuery 8b \u003e\u003e              Runtime: 25 ms - Result correct: 
true\nQuery 8c \u003e\u003e              Runtime: 631 ms - Result correct: true\nQuery 8d \u003e\u003e              Runtime: 431 ms - Result correct: true\nQuery 9a \u003e\u003e              Runtime: 141 ms - Result correct: true\nQuery 9b \u003e\u003e              Runtime: 112 ms - Result correct: true\nQuery 9c \u003e\u003e              Runtime: 336 ms - Result correct: true\nQuery 9d \u003e\u003e              Runtime: 479 ms - Result correct: true\nQuery 10a \u003e\u003e             Runtime: 100 ms - Result correct: true\nQuery 10b \u003e\u003e             Runtime: 228 ms - Result correct: true\nQuery 10c \u003e\u003e             Runtime: 354 ms - Result correct: true\nQuery 11a \u003e\u003e             Runtime: 56 ms - Result correct: true\nQuery 11b \u003e\u003e             Runtime: 16 ms - Result correct: true\nQuery 11c \u003e\u003e             Runtime: 127 ms - Result correct: true\nQuery 11d \u003e\u003e             Runtime: 121 ms - Result correct: true\nQuery 12a \u003e\u003e             Runtime: 26 ms - Result correct: true\nQuery 12b \u003e\u003e             Runtime: 608 ms - Result correct: true\nQuery 12c \u003e\u003e             Runtime: 52 ms - Result correct: true\nQuery 13a \u003e\u003e             Runtime: 810 ms - Result correct: true\nQuery 13b \u003e\u003e             Runtime: 66 ms - Result correct: true\nQuery 13c \u003e\u003e             Runtime: 67 ms - Result correct: true\nQuery 13d \u003e\u003e             Runtime: 705 ms - Result correct: true\nQuery 14a \u003e\u003e             Runtime: 55 ms - Result correct: true\nQuery 14b \u003e\u003e             Runtime: 17 ms - Result correct: true\nQuery 14c \u003e\u003e             Runtime: 61 ms - Result correct: true\nQuery 15a \u003e\u003e             Runtime: 45 ms - Result correct: true\nQuery 15b \u003e\u003e             Runtime: 26 ms - Result correct: true\nQuery 15c \u003e\u003e             Runtime: 62 ms - Result correct: true\nQuery 15d \u003e\u003e             Runtime: 75 ms - 
Result correct: true\nQuery 16a \u003e\u003e             Runtime: 238 ms - Result correct: true\nQuery 16b \u003e\u003e             Runtime: 631 ms - Result correct: true\nQuery 16c \u003e\u003e             Runtime: 158 ms - Result correct: true\nQuery 16d \u003e\u003e             Runtime: 192 ms - Result correct: true\nQuery 17a \u003e\u003e             Runtime: 320 ms - Result correct: true\nQuery 17b \u003e\u003e             Runtime: 131 ms - Result correct: true\nQuery 17c \u003e\u003e             Runtime: 76 ms - Result correct: true\nQuery 17d \u003e\u003e             Runtime: 85 ms - Result correct: true\nQuery 17e \u003e\u003e             Runtime: 276 ms - Result correct: true\nQuery 17f \u003e\u003e             Runtime: 252 ms - Result correct: true\nQuery 18a \u003e\u003e             Runtime: 654 ms - Result correct: true\nQuery 18b \u003e\u003e             Runtime: 32 ms - Result correct: true\nQuery 18c \u003e\u003e             Runtime: 131 ms - Result correct: true\nQuery 19a \u003e\u003e             Runtime: 26 ms - Result correct: true\nQuery 19b \u003e\u003e             Runtime: 10 ms - Result correct: true\nQuery 19c \u003e\u003e             Runtime: 53 ms - Result correct: true\nQuery 19d \u003e\u003e             Runtime: 575 ms - Result correct: true\nQuery 20a \u003e\u003e             Runtime: 215 ms - Result correct: true\nQuery 20b \u003e\u003e             Runtime: 154 ms - Result correct: true\nQuery 20c \u003e\u003e             Runtime: 267 ms - Result correct: true\nQuery 21a \u003e\u003e             Runtime: 49 ms - Result correct: true\nQuery 21b \u003e\u003e             Runtime: 51 ms - Result correct: true\nQuery 21c \u003e\u003e             Runtime: 74 ms - Result correct: true\nQuery 22a \u003e\u003e             Runtime: 61 ms - Result correct: true\nQuery 22b \u003e\u003e             Runtime: 56 ms - Result correct: true\nQuery 22c \u003e\u003e             Runtime: 86 ms - Result correct: true\nQuery 22d \u003e\u003e             
Runtime: 84 ms - Result correct: true\nQuery 23a \u003e\u003e             Runtime: 52 ms - Result correct: true\nQuery 23b \u003e\u003e             Runtime: 54 ms - Result correct: true\nQuery 23c \u003e\u003e             Runtime: 64 ms - Result correct: true\nQuery 24a \u003e\u003e             Runtime: 102 ms - Result correct: true\nQuery 24b \u003e\u003e             Runtime: 94 ms - Result correct: true\nQuery 25a \u003e\u003e             Runtime: 133 ms - Result correct: true\nQuery 25b \u003e\u003e             Runtime: 65 ms - Result correct: true\nQuery 25c \u003e\u003e             Runtime: 147 ms - Result correct: true\nQuery 26a \u003e\u003e             Runtime: 330 ms - Result correct: true\nQuery 26b \u003e\u003e             Runtime: 254 ms - Result correct: true\nQuery 26c \u003e\u003e             Runtime: 397 ms - Result correct: true\nQuery 27a \u003e\u003e             Runtime: 47 ms - Result correct: true\nQuery 27b \u003e\u003e             Runtime: 21 ms - Result correct: true\nQuery 27c \u003e\u003e             Runtime: 74 ms - Result correct: true\nQuery 28a \u003e\u003e             Runtime: 97 ms - Result correct: true\nQuery 28b \u003e\u003e             Runtime: 49 ms - Result correct: true\nQuery 28c \u003e\u003e             Runtime: 72 ms - Result correct: true\nQuery 29a \u003e\u003e             Runtime: 47 ms - Result correct: true\nQuery 29b \u003e\u003e             Runtime: 44 ms - Result correct: true\nQuery 29c \u003e\u003e             Runtime: 153 ms - Result correct: true\nQuery 30a \u003e\u003e             Runtime: 102 ms - Result correct: true\nQuery 30b \u003e\u003e             Runtime: 67 ms - Result correct: true\nQuery 30c \u003e\u003e             Runtime: 144 ms - Result correct: true\nQuery 31a \u003e\u003e             Runtime: 135 ms - Result correct: true\nQuery 31b \u003e\u003e             Runtime: 64 ms - Result correct: true\nQuery 31c \u003e\u003e             Runtime: 211 ms - Result correct: true\nQuery 32a \u003e\u003e    
         Runtime: 125 ms - Result correct: true\nQuery 32b \u003e\u003e             Runtime: 126 ms - Result correct: true\nQuery 33a \u003e\u003e             Runtime: 134 ms - Result correct: true\nQuery 33b \u003e\u003e             Runtime: 113 ms - Result correct: true\nQuery 33c \u003e\u003e             Runtime: 195 ms - Result correct: true\nAll queries succeeded: true\nTotal Runtime in ms: 18249 ms\n```\n\n## Task\n\nGiven the joining pipeline and the pre-filtered input data, your task is to implement an efficient joining algorithm to reduce the execution time of the joining pipeline. Specifically, you need to implement the following function in `src/execute.cpp`:\n\n```C++\nColumnarTable execute(const Plan\u0026 plan, void* context);\n```\n\nOptionally, you can implement these two functions as well to prepare any global context (e.g., thread pool) to accelerate the execution.\n\n```C++\nvoid* build_context();\nvoid destroy_context(void*);\n```\n\n### Input format\n\nThe input plan in the above function is defined as the following struct.\n\n```C++\nstruct ScanNode {\n    size_t base_table_id;\n};\n\nstruct JoinNode {\n    bool   build_left;\n    size_t left;\n    size_t right;\n    size_t left_attr;\n    size_t right_attr;\n};\n\nstruct PlanNode {\n    std::variant\u003cScanNode, JoinNode\u003e          data;\n    std::vector\u003cstd::tuple\u003csize_t, DataType\u003e\u003e output_attrs;\n};\n\nstruct Plan {\n    std::vector\u003cPlanNode\u003e      nodes;\n    std::vector\u003cColumnarTable\u003e inputs;\n    size_t root;\n};\n```\n\n**Scan**:\n- The `base_table_id` member refers to which input table in the `inputs` member of a plan is used by the Scan node.\n- Each item in the `output_attrs` indicates which column in the base table should be output and what type it is.\n\n**Join**:\n- The `build_left` member refers to which side the hash table should be built on, where `true` indicates building the hash table on the left child, and `false` 
indicates the opposite.\n- The `left` and `right` members are the indexes of the left and right child of the Join node in the `nodes` member of a plan, respectively.\n- The `left_attr` and `right_attr` members are the join condition of the Join node. Suppose there are two records, `left_record` and `right_record`, from the intermediate results of the left and right child, respectively. The members indicate that the two records should be joined when `left_record[left_attr] == right_record[right_attr]`.\n- Each item in the `output_attrs` indicates which column in the result of the children should be output and what type it is. Supposing that the left child has $n_l$ columns and the right child has $n_r$ columns, the index satisfies $i \\in \\{0, \\dots, n_l + n_r - 1\\}$, where the ranges $\\{0, \\dots, n_l - 1\\}$ and $\\{n_l, \\dots, n_l + n_r - 1\\}$ indicate that the output column is from the left and right child, respectively.\n\n**Root**: The `root` member of a plan indicates which node is the root node of the execution plan tree.\n\n### Data format\n\nThe input and output data both follow a simple columnar data format.\n\n```C++\nenum class DataType {\n    INT32,       // 4-byte integer\n    INT64,       // 8-byte integer\n    FP64,        // 8-byte floating point\n    VARCHAR,     // string of arbitrary length\n};\n\nconstexpr size_t PAGE_SIZE = 8192;\n\nstruct alignas(8) Page {\n    std::byte data[PAGE_SIZE];\n};\n\nstruct Column {\n    DataType           type;\n    std::vector\u003cPage*\u003e pages;\n};\n\nstruct ColumnarTable {\n    size_t              num_rows;\n    std::vector\u003cColumn\u003e columns;\n};\n```\n\nA `ColumnarTable` first stores how many rows the table has in the `num_rows` member, then stores each column separately as a `Column`. Each `Column` has a type and stores the items of the column in several pages. Each page is 8192 bytes. 
In each page:\n\n- The first 2 bytes are a `uint16_t` which is the number of rows $n_r$ in the page.\n- The following 2 bytes are a `uint16_t` which is the number of non-`NULL` values $n_v$ in the page.\n- The first $n_r$ bits in the last $\\left\\lfloor\\frac{(n_r + 7)}{8}\\right\\rfloor$ bytes form a bitmap indicating whether the corresponding row has a value or is `NULL`.\n\n**Fixed-length attribute**: There are $n_v$ contiguous values beginning at the first aligned position. For example, in a `Page` of `INT32`, the first value is at `data + 4`, while in a `Page` of `INT64` or `FP64`, the first value is at `data + 8`.\n\n**Variable-length attribute**: There are $n_v$ contiguous offsets (`uint16_t`) beginning at `data + 4` in a `Page`, followed by the content of the varchars, which begins at `char_begin = data + 4 + n_r * 2`. Each offset indicates the ending offset of the corresponding `VARCHAR` with respect to `char_begin`.\n\n**Long string**: When the length of a string is longer than `PAGE_SIZE - 7`, it cannot fit in a normal page. Special pages will be used to store such a string. If $n_r$ `== 0xffff` or $n_r$ `== 0xfffe`, the `Page` is a special page for a long string. `0xffff` means the page is the first page of a long string and `0xfffe` means the page is a following page of a long string. The following 2 bytes are a `uint16_t` indicating the number of chars in the page, beginning at `data + 4`.\n\n## Requirement\n\n- You can only modify the file `src/execute.cpp` in the project.\n- You must not use any third-party libraries. If you are using libraries for development (e.g., for logging), be sure to remove them before the final submission.\n- The joining pipeline (including order and build side) is optimized by PostgreSQL for `Hash Join` only. However, in the `execute` function, you are free to use other algorithms and change the pipeline, as long as the result is equivalent.\n- For any struct listed above, all of their members are public. 
You can manipulate them in free functions as desired as long as the original files are not changed and the manipulated objects can be destructed properly.\n- Your program will be evaluated on an unpublished benchmark sampled from the original JOB benchmark. You will not be able to access the test benchmark.\n\n## Quick start\n\n\u003e [!TIP]\n\u003e Run all the following commands in the root directory of this project.\n\nFirst, download the IMDB dataset.\n\n```bash\n./download_imdb.sh\n```\n\nSecond, build the project.\n\n```bash\ncmake -S . -B build -DCMAKE_BUILD_TYPE=Release -Wno-dev\ncmake --build build -- -j $(nproc)\n```\n\nThird, prepare the DuckDB database for correctness checking.\n\n```bash\n./build/build_database imdb.db\n```\n\nNow, you can run the tests:\n\n```bash\n./build/run plans.json\n```\n\n\u003e [!TIP]\n\u003e If you want to use `Ninja Multi-Config` as the generator, the commands will look like:\n\u003e\n\u003e ```bash\n\u003e cmake -S . -B build -Wno-dev -G \"Ninja Multi-Config\"\n\u003e cmake --build build --config Release -- -j $(nproc)\n\u003e ./build/Release/build_database imdb.db\n\u003e ./build/Release/run plans.json\n\u003e ```\n\n## Hardware\n\nThe evaluation is automatically executed on four different servers. On multi-socket machines, the benchmarks are bound to a single socket (using `numactl -m 0 -N 0`).\n\n * **Intel #1**\n    * CPU: 4x Intel Xeon E7-4880 v2 (SMT 2, 15 cores, 30 threads)\n    * Main memory: 512 GB\n * **AMD #1**\n    * CPU: 2x AMD EPYC 7F72 (SMT 2, 24 cores, 48 threads)\n    * Main memory: 256 GB\n * **IBM #1**\n    * CPU: 8x IBM Power8 (SMT 8, 12 cores, 96 threads)\n    * Main memory: 1024 GB\n * **ARM #1**\n    * CPU: 1x Ampere Altra Max (SMT 1, 128 cores, 128 threads)\n    * Main memory: 512 GB\n\n\nFor the final evaluation after the submission deadline, four additional servers will be included. 
These additional servers cover the same platforms but might differ in the supported feature set as they can be significantly older or newer than the initial servers.\nAll servers run Ubuntu Linux with versions ranging from 20.04 to 24.04. Code is compiled with Clang 18.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclflushopt%2Fsigmod-2025","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclflushopt%2Fsigmod-2025","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclflushopt%2Fsigmod-2025/lists"}