{"id":19458456,"url":"https://github.com/postgrespro/aqo","last_synced_at":"2025-05-15T17:08:52.575Z","repository":{"id":38523745,"uuid":"72944971","full_name":"postgrespro/aqo","owner":"postgrespro","description":"Adaptive query optimization for PostgreSQL","archived":false,"fork":false,"pushed_at":"2025-03-18T11:45:25.000Z","size":7418,"stargazers_count":455,"open_issues_count":9,"forks_count":55,"subscribers_count":22,"default_branch":"master","last_synced_at":"2025-05-11T18:11:25.499Z","etag":null,"topics":["adaptive","optimization","postgresql","query-optimizer"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"agpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/postgrespro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2016-11-05T18:18:53.000Z","updated_at":"2025-05-01T10:40:37.000Z","dependencies_parsed_at":"2022-07-14T04:30:37.199Z","dependency_job_id":"a518a37a-76fd-442c-b86a-e4ff855724b8","html_url":"https://github.com/postgrespro/aqo","commit_stats":{"total_commits":253,"total_committers":19,"mean_commits":13.31578947368421,"dds":"0.23715415019762842","last_synced_commit":"2a99c0256f62f4cb3df5a605ecf2780a54371fa6"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Faqo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Faqo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Faqo/releases","manifests_url":"https://repos.eco
syste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Faqo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/postgrespro","download_url":"https://codeload.github.com/postgrespro/aqo/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254384989,"owners_count":22062422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["adaptive","optimization","postgresql","query-optimizer"],"created_at":"2024-11-10T17:27:08.834Z","updated_at":"2025-05-15T17:08:47.564Z","avatar_url":"https://github.com/postgrespro.png","language":"C","funding_links":[],"categories":["Optimization","C"],"sub_categories":[],"readme":"# Adaptive query optimization\n\nAdaptive query optimization is an extension of the standard PostgreSQL cost-based\nquery optimizer. Its basic principle is to use query execution statistics\nfor improving cardinality estimation. Experimental evaluation shows that this\nimprovement sometimes provides an enormously large speed-up for rather\ncomplicated queries.\n\n## Installation\n\nThe module works with PostgreSQL 9.6 and above.\nTo avoid compatibility issues, the following branches in the git-repository are allocated:\n* `stable9_6`.\n* `stable11` - for PG v10 and v11.\n* `stable12` - for PG v12.\n* `stable13` - for PG v13.\n* `stable14` - for PG v14.\n* `stable15` - for PG v15.\n* the `master` branch of the AQO repository works correctly with PG v15 and the PostgreSQL `master` branch.\n\nThe module contains a patch and an extension. The patch has to be applied to the\nsources of PostgreSQL. 
The patch affects header files, which is why PostgreSQL\nmust be rebuilt completely after applying the patch (`make clean` and\n`make install`).\nThe extension has to be unpacked into the contrib directory and then compiled and\ninstalled with `make install`.\n\n```\ncd postgresql-9.6                                                # enter postgresql source directory\ngit clone https://github.com/postgrespro/aqo.git contrib/aqo        # clone aqo into contrib\npatch -p1 --no-backup-if-mismatch \u003c contrib/aqo/aqo_pg\u003cversion\u003e.patch  # patch postgresql\nmake clean \u0026\u0026 make \u0026\u0026 make install                               # recompile postgresql\ncd contrib/aqo                                                   # enter aqo directory\nmake \u0026\u0026 make install                                             # install aqo\nmake check                                              # check whether it works correctly (optional)\n```\n\nThe `version` tag in the patch name corresponds to the matching PostgreSQL release.\nFor PostgreSQL 9.6 use the `aqo_pg9_6.patch` file; for PostgreSQL 10 use `aqo_pg10.patch`; for PostgreSQL 11 use `aqo_pg11.patch`, and so on.\nYou can also check the git tags on the master branch to identify the\nsuitable PostgreSQL version more precisely.\n\nIn your database:\n\n`CREATE EXTENSION aqo;`\n\nModify your postgresql.conf:\n\n`shared_preload_libraries = 'aqo'`\n\nand restart PostgreSQL.\n\nIt is essential that the library is preloaded during server startup, because\nadaptive query optimization must be enabled on a per-cluster basis rather\nthan per-database.\n\n## Usage\n\nThe typical case is as follows: you have a complicated query that takes too\nlong. 
`EXPLAIN ANALYZE` shows, that the possible reason is bad cardinality\nestimation.\n\nExample:\n```\n                                                                             QUERY PLAN\n--------------------------------------------------------------------------------------------------------------------------------------------------------------------\n Aggregate  (cost=15028.15..15028.16 rows=1 width=96) (actual time=8168.188..8168.189 rows=1 loops=1)\n   -\u003e  Nested Loop  (cost=8.21..15028.14 rows=1 width=48) (actual time=199.500..8167.708 rows=88 loops=1)\n         -\u003e  Nested Loop  (cost=7.78..12650.75 rows=5082 width=37) (actual time=0.682..3015.721 rows=785477 loops=1)\n               Join Filter: (t.id = ci.movie_id)\n               -\u003e  Nested Loop  (cost=7.21..12370.11 rows=148 width=41) (actual time=0.666..404.791 rows=14165 loops=1)\n                     -\u003e  Nested Loop  (cost=6.78..12235.17 rows=270 width=20) (actual time=0.645..146.855 rows=35548 loops=1)\n                           -\u003e  Seq Scan on keyword k  (cost=0.00..3632.40 rows=8 width=20) (actual time=0.126..29.117 rows=8 loops=1)\n                                 Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[]))\n                                 Rows Removed by Filter: 134162\n                           -\u003e  Bitmap Heap Scan on movie_keyword mk  (cost=6.78..1072.32 rows=303 width=8) (actual time=0.919..13.800 rows=4444 loops=8)\n                                 Recheck Cond: (keyword_id = k.id)\n                                 Heap Blocks: exact=23488\n                                 -\u003e  Bitmap Index Scan on keyword_id_movie_keyword  (cost=0.00..6.71 rows=303 width=0) (actual time=0.535..0.535 rows=4444 loops=8)\n                                       Index Cond: (keyword_id = k.id)\n                     -\u003e  Index Scan using title_pkey on title t  (cost=0.43..0.49 rows=1 width=21) 
(actual time=0.007..0.007 rows=0 loops=35548)\n                           Index Cond: (id = mk.movie_id)\n                           Filter: (production_year \u003e 2000)\n                           Rows Removed by Filter: 1\n               -\u003e  Index Scan using movie_id_cast_info on cast_info ci  (cost=0.56..1.47 rows=34 width=8) (actual time=0.009..0.168 rows=55 loops=14165)\n                     Index Cond: (movie_id = mk.movie_id)\n         -\u003e  Index Scan using name_pkey on name n  (cost=0.43..0.46 rows=1 width=19) (actual time=0.006..0.006 rows=0 loops=785477)\n               Index Cond: (id = ci.person_id)\n               Filter: (name ~~ '%Downey%Robert%'::text)\n               Rows Removed by Filter: 1\n Planning time: 40.047 ms\n Execution time: 8168.373 ms\n(26 rows)\n```\n\nThen you can use the following pattern:\n```\nBEGIN;\nSET aqo.mode = 'learn';\nEXPLAIN ANALYZE \u003cquery\u003e;\nRESET aqo.mode;\n-- ... do EXPLAIN ANALYZE \u003cquery\u003e while cardinality estimations in the plan are bad\n--                                      and the plan is bad\nCOMMIT;\n```\n**_Warning:_** execute query until plan stops changing!\n\nWhen the plan stops changing, you can often observe performance improvement:\n```\n                                                                              QUERY PLAN\n-----------------------------------------------------------------------------------------------------------------------------------------------------------------------\n Aggregate  (cost=112883.89..112883.90 rows=1 width=96) (actual time=738.731..738.731 rows=1 loops=1)\n   -\u003e  Nested Loop  (cost=1.85..112883.23 rows=88 width=48) (actual time=73.826..738.618 rows=88 loops=1)\n         -\u003e  Nested Loop  (cost=1.43..110496.69 rows=5202 width=36) (actual time=72.917..723.994 rows=5202 loops=1)\n               Join Filter: (t.id = mk.movie_id)\n               -\u003e  Nested Loop  (cost=0.99..110046.39 rows=306 width=40) (actual 
time=72.902..720.310 rows=306 loops=1)\n                     -\u003e  Nested Loop  (cost=0.56..109820.42 rows=486 width=19) (actual time=72.856..717.429 rows=486 loops=1)\n                           -\u003e  Seq Scan on name n  (cost=0.00..107705.93 rows=2 width=19) (actual time=72.819..717.148 rows=2 loops=1)\n                                 Filter: (name ~~ '%Downey%Robert%'::text)\n                                 Rows Removed by Filter: 4167489\n                           -\u003e  Index Scan using person_id_cast_info on cast_info ci  (cost=0.56..1054.82 rows=243 width=8) (actual time=0.024..0.091 rows=243 loops=2)\n                                 Index Cond: (person_id = n.id)\n                     -\u003e  Index Scan using title_pkey on title t  (cost=0.43..0.45 rows=1 width=21) (actual time=0.005..0.006 rows=1 loops=486)\n                           Index Cond: (id = ci.movie_id)\n                           Filter: (production_year \u003e 2000)\n                           Rows Removed by Filter: 0\n               -\u003e  Index Scan using movie_id_movie_keyword on movie_keyword mk  (cost=0.43..1.26 rows=17 width=8) (actual time=0.004..0.008 rows=17 loops=306)\n                     Index Cond: (movie_id = ci.movie_id)\n         -\u003e  Index Scan using keyword_pkey on keyword k  (cost=0.42..0.45 rows=1 width=20) (actual time=0.003..0.003 rows=0 loops=5202)\n               Index Cond: (id = mk.keyword_id)\n               Filter: (keyword = ANY ('{superhero,sequel,second-part,marvel-comics,based-on-comic,tv-special,fight,violence}'::text[]))\n               Rows Removed by Filter: 1\n Planning time: 51.333 ms\n Execution time: 738.904 ms\n(23 rows)\n```\n\nThe settings system in AQO works with normalised queries, i. e. queries with\nremoved constants. 
For example, the normalised version of\n`SELECT * FROM tbl WHERE a \u003c 25 AND b = 'str';`\nis\n`SELECT * FROM tbl WHERE a \u003c CONST and b = CONST;`\n\nSo the queries have equal normalisation if and only if they differ only\nin their constants.\n\nEach normalised query has its own hash. The correspondence between normalised\nquery hash and query text is stored in aqo_query_texts table:\n```\nSELECT * FROM aqo_query_texts;\n```\n```\n query_hash  |                                query_text\n-------------+----------------------------------------------------------------------------\n           0 | COMMON feature space (do not delete!)\n -1104999304 | SELECT                                                                    +\n             |     MIN(k.keyword) AS movie_keyword,                                      +\n             |     MIN(n.name) AS actor_name,                                            +\n             |     MIN(t.title) AS hero_movie                                            +\n             | FROM                                                                      +\n             |     cast_info AS ci,                                                      +\n             |     keyword AS k,                                                         +\n             |     movie_keyword AS mk,                                                  +\n             |     name AS n, title AS t                                                 +\n             | WHERE                                                                     +\n             |     k.keyword in ('superhero', 'sequel', 'second-part', 'marvel-comics',  +\n             |                   'based-on-comic', 'tv-special', 'fight', 'violence') AND+\n             |     n.name LIKE '%Downey%Robert%' AND                                     +\n             |     t.production_year \u003e 2000 AND                                          +\n             |     k.id = mk.keyword_id AND                  
                            +\n             |     t.id = mk.movie_id AND                                                +\n             |     t.id = ci.movie_id AND                                                +\n             |     ci.movie_id = mk.movie_id AND                                         +\n             |     n.id = ci.person_id;\n(2 rows)\n```\n\nThe most useful settings are `learn_aqo` and `use_aqo`. In the example pattern\nabove, if you want to freeze the plan and prevent AQO from further learning\nfrom query execution statistics (which is not recommended, especially\nif the data tends to change significantly), you can run\n`UPDATE aqo_queries SET learn_aqo=false WHERE query_hash = \u003cquery_hash\u003e;`\nbefore commit.\n\nThe extension includes two GUCs to display the executed cardinality predictions for a query.\nSetting `aqo.show_details = 'on'` (default - off) lets you see the AQO cardinality prediction results for each node of a query plan and an AQO summary.\nSetting `aqo.show_hash = 'on'` (default - off) prints the hash signature for each plan node and for the overall query. 
It is system-specific information and should be used for situational analysis.\n\nA more detailed reference of the AQO settings mechanism is given below.\n\n## Advanced tuning\n\nAQO has two kinds of settings: per-query-type settings are stored in the\n`aqo_queries` table in the database, and there is also the GUC variable `aqo.mode`.\n\nIf `aqo.mode = 'disabled'`, AQO is disabled for all queries, so PostgreSQL uses\nits own cardinality estimations during query optimization.\nThis is useful if you want to disable AQO for all queries temporarily in the\ncurrent session or for the whole cluster\nwithout removing or changing collected statistics and settings.\n\nOtherwise, if the normalized query hash is stored in `aqo_queries`, AQO uses\nthe settings from there to process the query.\n\nThose settings are:\n\nThe `learn_aqo` setting shows whether AQO collects statistics for the next execution of\nthe same query type. Enabling it may add computational overhead,\nbut it is essential when the AQO model does not fit the data. This happens when\nAQO starts working with a new query type or when the data distribution in the database\nchanges.\n\nThe `use_aqo` setting shows whether AQO cardinality predictions are used for the next\nexecution of this query type. Disabling AQO usage is reasonable for those\ncases in which query execution time increases after applying AQO. This happens\nsometimes because the cost models are incomplete.\n\nThe `fs` setting is for extra advanced AQO tuning. It may be changed manually\nto optimize a number of queries using the same model. 
It may decrease the\namount of memory for models and even the query execution time, but it\nmay also cause bad AQO behavior, so please use it only if you know exactly\nwhat you are doing.\n\nThe `auto_tuning` setting identifies whether AQO tries to tune the `learn_aqo` and `use_aqo`\nsettings for the query on its own.\n\nIf the normalized query hash is not stored in `aqo_queries`, AQO behaviour depends\non `aqo.mode`.\n\nIf `aqo.mode` is `'controlled'`, the unknown query is just ignored, i. e. the\nstandard PostgreSQL optimizer is used and the query execution statistics are\nignored.\n\nIf `aqo.mode` is `'learn'`, then the normalized query hash is appended to `aqo_queries`\nwith the default settings `learn_aqo=true`, `use_aqo=true`, `auto_tuning=false`, and\n`fs = queryid`, which means that AQO uses a separate machine learning\nmodel to optimize this query type. After that the query is processed as if\nit were already in `aqo_queries`.\n\n`aqo.mode = 'intelligent'` behaves similarly. The only difference is that the default\n`auto_tuning` value in this case is `true`.\n\nIf `aqo.mode` is `'forced'`, the query is not appended to the `aqo_queries` table. Instead, AQO uses the\nspecial `COMMON` feature space with identifier `fspace=0` for the query\noptimization and updates the `COMMON` machine learning model with the execution\nstatistics of this query.\n\n## Comments on AQO modes\n\n`'controlled'` mode is the default mode to use in production, because it uses the\nstandard PostgreSQL optimizer for all unknown query types and uses\npredefined settings for the known ones.\n\n`'learn'` mode is the base mode needed to memorize a new normalized query. 
The usage\npattern is as follows:\n```\nSET aqo.mode='learn';\n\u003cquery\u003e\nSET aqo.mode='controlled';\n\u003cquery\u003e\n\u003cquery\u003e\n...\n-- until convergence\n```\n\n`'learn'` mode is not recommended for permanent use on the whole cluster,\nbecause it enables AQO for every query type, even for those that don't need\nit, and that may lead to unnecessary computational overhead and performance\ndegradation.\n\n`'intelligent'` mode is an attempt to do machine learning optimizations completely\nautomatically in a self-tuning manner, i.e. to determine for which queries it is\nreasonable to use machine learning models and for which it is not. If you want to\nrely completely on it, you may use it on a per-cluster basis: just add the line\n`aqo.mode = 'intelligent'` to your postgresql.conf.\nNevertheless, it may still not work very well, so we do not recommend using it\nin production.\n\nFor handling workloads with dynamically generated query structures, the forced\nmode `aqo.mode = 'forced'` is provided.\nWe cannot guarantee overall performance improvement with this mode, but you\nmay try it nevertheless.\nOn the one hand, it lacks intelligent tuning, so the performance for some queries\nmay even decrease; on the other hand, it may work for dynamic workloads and consumes\nless memory than the `'intelligent'` mode.\n\n## Recipes\n\nIf you want to freeze the optimizer's behavior (i. e. 
disable learning under\nworkload), use\n\n`UPDATE aqo_queries SET learn_aqo=false, auto_tuning=false;`.\n\nIf you want to disable AQO for all queries, you may use\n\n`UPDATE aqo_queries SET use_aqo=false, learn_aqo=false, auto_tuning=false;`.\n\nIf you want to disable AQO for all queries temporarily in the current session\nor for the whole cluster\nwithout removing or changing collected statistics and settings,\nyou may use the disabled mode:\n\n`SET aqo.mode = 'disabled';`\n\nor\n\n`ALTER SYSTEM SET aqo.mode = 'disabled';`.\n\n## Limitations\n\nNote that the extension doesn't work with any kind of temporary objects, because\nin query normalization AQO uses the internal OIDs of objects, which are different\nfor dynamically generated objects, even if their names are equal. That is why the\n`'intelligent'`, `'learn'` and `'forced'` AQO modes cannot be used as the system setting\nwith such objects in the workload. In this case you can use `aqo.mode='controlled'`\nand use another `aqo.mode` inside the transaction to store settings for the queries\nwithout temporary objects.\n\nThe extension doesn't collect statistics on replicas, because replicas are\nread-only. It may nevertheless use query execution statistics from the master if the replica is\nbinary. A version which overcomes the replica usage limitations\nis coming soon.\n\n`'learn'` and `'intelligent'` modes are not supposed to work on a per-cluster basis\nwith queries that have a dynamically generated structure, because they memorize all\nnormalized query hashes, which are different for all queries in such a workload.\nDynamically generated constants are okay.\n\n## License\n\n© [Postgres Professional](https://postgrespro.com/), 2016-2022. 
Licensed under\n[The PostgreSQL License](LICENSE).\n\n## Reference\n\nThe paper on the proposed method is also under development, but the draft version\nwith experiments is available [here](https://arxiv.org/abs/1711.08330).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Faqo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpostgrespro%2Faqo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Faqo/lists"}