{"id":28231803,"url":"https://github.com/mohammadzainabbas/database-system-architecture-project","last_synced_at":"2025-08-31T22:35:38.386Z","repository":{"id":45483131,"uuid":"428430869","full_name":"mohammadzainabbas/database-system-architecture-project","owner":"mohammadzainabbas","description":"Database System Architecture's Project - Selectivity \u0026 Join estimations for Range Types in PostgreSQL ✨","archived":false,"fork":false,"pushed_at":"2023-12-15T05:28:46.000Z","size":1385,"stargazers_count":2,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-21T04:46:05.825Z","etag":null,"topics":["c-language","database","postgresql","sql","vscode"],"latest_commit_sha":null,"homepage":"","language":"PLpgSQL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mohammadzainabbas.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-11-15T21:45:48.000Z","updated_at":"2022-04-29T02:36:22.000Z","dependencies_parsed_at":"2025-06-21T04:36:26.305Z","dependency_job_id":"3cb0c350-ad20-4180-9c43-868429c00d81","html_url":"https://github.com/mohammadzainabbas/database-system-architecture-project","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mohammadzainabbas/database-system-architecture-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammadzainabbas%2Fdatabase-system-architecture-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammadzai
nabbas%2Fdatabase-system-architecture-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammadzainabbas%2Fdatabase-system-architecture-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammadzainabbas%2Fdatabase-system-architecture-project/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mohammadzainabbas","download_url":"https://codeload.github.com/mohammadzainabbas/database-system-architecture-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mohammadzainabbas%2Fdatabase-system-architecture-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273051832,"owners_count":25037073,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-language","database","postgresql","sql","vscode"],"created_at":"2025-05-18T19:10:52.589Z","updated_at":"2025-08-31T22:35:38.380Z","avatar_url":"https://github.com/mohammadzainabbas.png","language":"PLpgSQL","readme":"### Database System Architecture Project 👨🏻‍💻\n\n\u003c/br\u003e\n\n\u003cdiv\u003e\n  \u003ca href=\"https://open.vscode.dev/mohammadzainabbas/database-system-architecture-project\" target=\"_blank\" style=\"cursor: pointer;\"\u003e \n    
\u003cimg src=\"https://open.vscode.dev/badges/open-in-vscode.svg\" style=\"cursor: pointer;\"/\u003e\n  \u003c/a\u003e\n\u003c/div\u003e\n\n### Table of contents\n\n- [Introduction](#introduction)\n- [Project Overview](#project-overview)\n  * [Statistics](#statistics)\n  * [Selectivity Estimations](#selectivity-estimations)\n  * [Join Estimations](#join-estimations)\n- [Debugging Guide](#debugging-guide)\n- [Benchmark Guide](#benchmark-guide)\n\n---\n\n\u003ca id=\"introduction\" /\u003e\n\n#### 1. Introduction\n\nRange types are data types representing a range of values of some element type (called the range's _subtype_). For example, _ranges of timestamp_ might be used to represent the ranges of _time_ that a meeting room is reserved. In this case the data type is `tsrange` (short for “_timestamp range_”), and timestamp is the subtype. The subtype must have a total order so that it is well-defined whether element values are within, before, or after a range of values.\n\n---\n\n\u003ca id=\"project-overview\" /\u003e\n\n#### 2. Project Overview\n\nThe aim of this project is to improve the overall scheme of statistics collection and cardinality estimation for range types in PostgreSQL.\n\nThe project is divided into three parts:\n\n- [x] Statistics\n- [x] Selectivity Estimations\n- [x] Join Estimations\n\n\u003e Note: Development targets the stable release of PostgreSQL 13.\n\n---\n\n\u003ca id=\"statistics\" /\u003e\n\n##### 2.1. Statistics\n\nIn the current implementation, the `range_typeanalyze` function is called whenever you run `vacuum analyze` on a relation/table that has a range-type column. 
You can find the function definition for `range_typeanalyze` [here](https://github.com/postgres/postgres/blob/f76fd05bae047103cb36ef5fb82137c8995142c1/src/backend/utils/adt/rangetypes_typanalyze.c?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L43).\n\n\nThis function sets a few configuration values and installs a handler for computing range-type statistics:\n\n```cpp\nstats-\u003ecompute_stats = compute_range_stats;\n```\n\nHere, `compute_range_stats` is registered as the callback that is invoked later to compute the statistics.\n\nIf you look into the `compute_range_stats` function (see [here](https://github.com/postgres/postgres/blob/f76fd05bae047103cb36ef5fb82137c8995142c1/src/backend/utils/adt/rangetypes_typanalyze.c?_pjax=%23js-repo-pjax-container%2C%20div%5Bitemtype%3D%22http%3A%2F%2Fschema.org%2FSoftwareSourceCode%22%5D%20main%2C%20%5Bdata-pjax-container%5D#L97)), you will notice that three different kinds of histograms are calculated:\n\n- _Lower Bound Histogram_\n- _Upper Bound Histogram_\n- _Length based Histogram_\n\nAll of these are stored in `pg_statistic` for later use.\n\n\u003e Note: These stats are usually recalculated once per million or so inserts, or when you run a `vacuum analyze` statement.\n\nThe first goal of our project is to determine which statistics we should calculate during the analysis phase, and whether anything extra is needed to produce better selectivity and join estimates for range types.\n\n---\n\n\u003ca id=\"selectivity-estimations\" /\u003e\n\n##### 2.2. Selectivity Estimations\n\nThe second goal of our project is to implement selectivity estimation functions for the _overlaps_ `\u0026\u0026` and the _strictly left of_ `\u003c\u003c` predicates/operators.\n\n---\n\n\u003ca id=\"join-estimations\" /\u003e\n\n##### 2.3. 
Join Estimations\n\nThe final and main goal of our project is to implement join cardinality estimation for the _overlaps_ `\u0026\u0026` predicate/operator on range types.\n\n---\n\n\u003ca id=\"debugging-guide\" /\u003e\n\n#### 3. Debugging Guide\n\nPlease refer to the [debugging guide](https://github.com/mohammadzainabbas/database-system-architecture-project/blob/main/docs/DEBUG.md) for more details.\n\n---\n\n\u003ca id=\"benchmark-guide\" /\u003e\n\n#### 4. Benchmark Guide\n\nTo run the benchmarks on different range types, follow these steps:\n\n1. Clone this repo\n\n```bash\ngit clone https://github.com/mohammadzainabbas/database-system-architecture-project.git\ncd database-system-architecture-project\n```\n\n2. Run the benchmark script\n\n```bash\nsh scripts/run_benchmark.sh -d test\n```\n\n\u003e Note: Replace `test` with the name of your database.\n\n\u003e Note: If you see the error `Binary 'psql' not found'`, run the following command and retry:\n\n```bash\necho 'export PATH=\"/usr/local/pgsql/bin:$PATH\"' \u003e\u003e ~/.bashrc \u0026\u0026 source ~/.bashrc\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohammadzainabbas%2Fdatabase-system-architecture-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmohammadzainabbas%2Fdatabase-system-architecture-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmohammadzainabbas%2Fdatabase-system-architecture-project/lists"}