{"id":20294187,"url":"https://github.com/grab/grab-query-traces","last_synced_at":"2026-03-08T13:37:21.506Z","repository":{"id":97825902,"uuid":"351312717","full_name":"grab/grab-query-traces","owner":"grab","description":null,"archived":false,"fork":false,"pushed_at":"2021-03-30T04:41:34.000Z","size":10177,"stargazers_count":11,"open_issues_count":0,"forks_count":2,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-01-14T09:36:36.998Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/grab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-25T04:52:32.000Z","updated_at":"2024-03-12T12:36:39.000Z","dependencies_parsed_at":null,"dependency_job_id":"635ac430-9daa-49cc-b11d-d597fca81519","html_url":"https://github.com/grab/grab-query-traces","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grab%2Fgrab-query-traces","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grab%2Fgrab-query-traces/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grab%2Fgrab-query-traces/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/grab%2Fgrab-query-traces/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/grab","download_url":"https://codeload.github.com/grab/grab-query-traces/tar.gz/refs/heads/master","host":{"name"
:"GitHub","url":"https://github.com","kind":"github","repositories_count":241789345,"owners_count":20020459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-14T15:28:10.328Z","updated_at":"2026-03-08T13:37:16.440Z","avatar_url":"https://github.com/grab.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Grab-Traces \u0026 TPC-DS Presto query plans\n\n## TL;DR\nGrab-Traces is (to the best of our knowledge) the largest publicly available, industry-based dataset of query plans for research.\n\nIn contrast to the open-source TPC benchmarks, Grab-Traces offers greater plan diversity: it spans a wide range of large \u0026 small query plans that reflect the query patterns seen in large-scale companies. All Grab-Traces query plans are based on real Presto queries executed \u0026 profiled over Grab's data lake.\n\nTo highlight the difference between the query plans in the TPC benchmarks and in Grab-Traces, we plotted a sample of 245,849 logical plans, collected over 2 consecutive months at Grab, by their node count and maximum tree depth. We contrasted these plans with the TPC-DS \u0026 TPC-H query templates. The maximum (size, depth) observed, where size is the node count of a plan, was (477, 38) for TPC-H, (883, 73) for TPC-DS, and (4969, 321) for Grab.
\n\n![grab-traces-query-plans](Pics/grab_query_traces.png)\n\nFrom the figure, two things are clear:\n- Grab's query plans are diverse: we observed a range of very large and very small plans issued to our Presto clusters.\n- Query volumes are large: we observed many distinct queries issued to our Presto clusters. At this scale, many existing query featurization techniques may be highly inefficient.\n\n## Dataset\nWe are releasing both our Grab-Traces \u0026 TPC-DS datasets as part of our conference submission to SIGMOD 2021.\n\nThere are two query plan datasets in this repository:\n- Grab-Traces: please see [grab-traces](Grab-Traces/)\n- TPC-DS: please see [tpc-ds](TPC-DS/)\n\n## Licensing\nAll data is subject to the MIT open-source license.\nFor more details, please see [licensing](LICENSE).\n\n## Related Publications\n- Johan Kok Zhi Kang, Gaurav, Sien Yi Tan, Feng Cheng, Shixuan Sun, Bingsheng He. Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query Workload. ACM SIGMOD 2021.\n\n## Citations\nTODO: Fill in this section once the paper is published.\n\n## Acknowledgement\n- Grab-NUS AI Lab","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrab%2Fgrab-query-traces","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgrab%2Fgrab-query-traces","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgrab%2Fgrab-query-traces/lists"}