{"id":19853986,"url":"https://github.com/eugene-khyst/postgresql-performance-essentials","last_synced_at":"2025-05-02T01:30:26.107Z","repository":{"id":75018062,"uuid":"331945782","full_name":"eugene-khyst/postgresql-performance-essentials","owner":"eugene-khyst","description":"PostgreSQL performance essentials in 1 hour","archived":false,"fork":false,"pushed_at":"2022-07-10T08:34:37.000Z","size":73,"stargazers_count":61,"open_issues_count":0,"forks_count":14,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-06T20:25:35.723Z","etag":null,"topics":["bitmap-scans","btree-indexes","database-perfomance","gin-indexes","hash-indexes","hash-join","index-scan","merge-join","multicolumn-indexes","nested-loops","partial-indexes","postgres","postgresql","query-optimization","query-plan","sequential-scan","slow-queries","sql","sql-join","table-partitioning"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eugene-khyst.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2021-01-22T12:54:03.000Z","updated_at":"2025-03-11T18:22:30.000Z","dependencies_parsed_at":"2023-02-28T22:45:52.171Z","dependency_job_id":null,"html_url":"https://github.com/eugene-khyst/postgresql-performance-essentials","commit_stats":null,"previous_names":["eugene-khyst/postgresql-performance-essentials"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugene-khyst%2Fpostgresql-performance-essentials","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugene-khyst%2Fpostgresql-p
erformance-essentials/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugene-khyst%2Fpostgresql-performance-essentials/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eugene-khyst%2Fpostgresql-performance-essentials/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eugene-khyst","download_url":"https://codeload.github.com/eugene-khyst/postgresql-performance-essentials/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251969232,"owners_count":21673180,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bitmap-scans","btree-indexes","database-perfomance","gin-indexes","hash-indexes","hash-join","index-scan","merge-join","multicolumn-indexes","nested-loops","partial-indexes","postgres","postgresql","query-optimization","query-plan","sequential-scan","slow-queries","sql","sql-join","table-partitioning"],"created_at":"2024-11-12T14:08:15.479Z","updated_at":"2025-05-02T01:30:25.864Z","avatar_url":"https://github.com/eugene-khyst.png","language":null,"readme":"# PostgreSQL Performance Essentials in 1 Hour\n\n* [Run PostgreSQL and pgAdmin](#31bb545671c6f28bf4d0c32e9ff42e10)\n* [Create sample schema with data](#46c030b410d17f0142ec5c684020c12b)\n* [Query plan](#d3f45a1dbf52313215de2d72335b08c9)\n* [Seq Scan](#5077602cb415acf28d7b838fd00c4c86)\n* [Index selectivity](#8ab5ea27dde1f77a2a3059f626b52ca5)\n* [B-Tree indexes](#a4a9f6a9d3e60a151754f45f91f5978c)\n  * [Index Scan](#9e0a689058c255f7dbc8a3303cec18a8)\n  * [Bitmap 
Scans](#0aca2dd64d94c0de22b10b1249f3d461)\n* [Multicolumn indexes](#063cd14ca3da4c69df873907b8cd09f2)\n  * [Index Only Scan](#0bc6a1f1a4c2655e1ad0f09702ca813d)\n* [Unique indexes](#e0b12e6f9f94a86f5c585a73a7824d4a)\n* [Partial indexes](#a9961238a26d752aadf86e48184eef90)\n* [Expression indexes](#41e31e7b92b335c5f475794040a589a4)\n* [GIN indexes](#1d7f7e5f39be8319be61e0509a9e0090)\n* [Hash indexes](#c39d79c0c6f99d7ff585ac9408522a22)\n* [Create indexes on foreign keys](#9604d17d77116450f47e7f7d7cb37c58)\n* [Use more joins](#7fc107be8e1d492f8fc88bec58ff7740)\n* [Don't over-index](#76ccb0862b9d9d30188b32c852396b28)\n* [Keep statistics updated](#2b1d46eeefdef5de7e4b4d5ce62849c4)\n* [Detect slow queries](#8f264ba3b501b66608e02c8c5efc2c58)\n* [Cluster a table according to an index](#ae39d16394b02eaba62f114a8d4231b2)\n* [Use table partitioning](#88bb68b618474d18c73e4ada7256cfd9)\n\nThis example covers the most important topics related to PostgreSQL performance.\n\n## \u003ca name=\"31bb545671c6f28bf4d0c32e9ff42e10\"\u003e\u003c/a\u003eRun PostgreSQL and pgAdmin\n\n1. Make sure Docker and Docker Compose are installed and up to date\n\n2. Run `docker-compose up -d`, wait for it to initialize completely\n\n3. Log in to pgAdmin 4 at `http://localhost:8080` as `admin@example.com/s3cr3t`\n\n4. Connect to `Local PostgreSQL` as `admin/s3cr3t`\n\n## \u003ca name=\"46c030b410d17f0142ec5c684020c12b\"\u003e\u003c/a\u003eCreate sample schema with data\n\n![ER-diagram](img/tables.png)\n\n1. Create schema by executing DDL statements from [`tables.sql`](tables.sql)\n\n2. Create test data by executing DML statements from [`data.sql`](data.sql)\n\n3. 
Generate more test data\n    ```sql\n    INSERT INTO book (isbn, title, publication_date, rating)\n    SELECT SUBSTR(MD5(RANDOM()::TEXT), 0, 14), \n           MD5(RANDOM()::TEXT), \n           DATE '2010-01-01' + CAST(RANDOM() * (DATE '2020-01-01' - DATE '2010-01-01') AS INT),\n           ROUND((1 + RANDOM() * 4)::numeric, 3)\n      FROM generate_series(1, 100000);\n    ```\n\n---\n\n* https://www.postgresql.org/docs/13/functions-srf.html\n\n## \u003ca name=\"d3f45a1dbf52313215de2d72335b08c9\"\u003e\u003c/a\u003eQuery plan\n\nPostgreSQL has a *planner/optimizer* that creates an optimal execution plan.\n\nTo understand how to optimize an SQL query, we need to know its execution plan.\n\n`EXPLAIN ANALYZE` followed by `SELECT ...`, `UPDATE ...`, or `DELETE ...` \nexecutes the statement and provides a query plan with details about the execution.\n\n```sql\nEXPLAIN ANALYZE\nSELECT * FROM book;\n```\n```\nQUERY PLAN\n----------\n\"Seq Scan on book  (cost=0.00..3667.07 rows=100007 width=89) (actual time=0.003..6.143 rows=100007 loops=1)\"\n\"Planning Time: 0.100 ms\"\n\"Execution Time: 8.622 ms\"\n```\n\nA query plan shows what type of scanning was used for the query:\n* Seq Scan\n* Bitmap Index Scan and Bitmap Heap Scan\n* Index Scan\n* Index Only Scan\n\nTo display a query plan as a diagram with additional information in pgAdmin 4, use `EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON)`.\n\n---\n\n* https://www.postgresql.org/docs/13/planner-optimizer.html\n* https://www.postgresql.org/docs/13/sql-explain.html\n\n## \u003ca name=\"5077602cb415acf28d7b838fd00c4c86\"\u003e\u003c/a\u003eSeq Scan\n\nSeq Scan is a full table scan and always reads everything in the table. \nIt scans through every page of data sequentially.\n\n1. 
Seq Scan is efficient when a large proportion of the rows is retrieved from the table\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Seq Scan on book b  (cost=0.00..3667.07 rows=100007 width=57) (actual time=0.008..13.369 rows=100007 loops=1)\"\n    \"Planning Time: 0.035 ms\"\n    \"Execution Time: 15.806 ms\"\n    ```\n\n2. Seq Scan can filter rows while reading them\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date = '1994-11-10';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Seq Scan on book b  (cost=0.00..3917.09 rows=27 width=57) (actual time=0.008..8.575 rows=1 loops=1)\"\n    \"  Filter: (publication_date = '1994-11-10'::date)\"\n    \"  Rows Removed by Filter: 100006\"\n    \"Planning Time: 0.056 ms\"\n    \"Execution Time: 8.592 ms\"\n    ```\n\n3. When a small number of rows is returned by the query, a scan that uses an index is more efficient.\n    If there is no suitable index for the query, Seq Scan is the only available option.\n    \n    Create indexes if queries will typically retrieve less than 15% of the table's rows.\n\n4. Reset statistics counters to zeros\n    ```sql\n    SELECT pg_stat_reset();\n    ```\n\n5. Let the system work for some time after resetting statistics.\n\n6. Get suggestions on which tables need an index by looking at `seq_scan` and `seq_tup_read`. 
These columns show which tables were used in sequential scans\n    ```sql\n    SELECT schemaname,\n           relname as table_name,\n           seq_scan, -- Number of sequential scans initiated on this table\n           seq_tup_read, -- Number of live rows fetched by sequential scans\n           idx_scan, -- Number of index scans initiated on this table\n           idx_tup_fetch -- Number of live rows fetched by index scans\n      FROM pg_stat_user_tables\n     WHERE seq_scan \u003e 0\n     ORDER BY seq_tup_read DESC;\n    ```\n\n## \u003ca name=\"8ab5ea27dde1f77a2a3059f626b52ca5\"\u003e\u003c/a\u003eIndex selectivity\n\nThe number of distinct values in the indexed column divided by the number of records in the table is called the selectivity of the index.\n\nSelectivity is one of the factors influencing the type of scan the planner/optimizer will use for a query (Index Scan or Bitmap Index Scan).\n\n1. Calculate the selectivity of the single column indexes you want to create\n    ```sql\n    SELECT ROUND(COUNT(DISTINCT rating)::NUMERIC / COUNT(*), 2) AS selectivity\n      FROM BOOK;\n    ```\n    ```\n    selectivity\n    -----------\n    0.04\n    ```\n    ```sql\n    SELECT ROUND(COUNT(DISTINCT publication_date)::NUMERIC / count(*), 2) AS selectivity\n      FROM BOOK;\n    ```\n    ```\n    selectivity\n    -----------\n    0.04\n    ```\n2. The best selectivity is 1. 
Only unique indexes on NOT NULL columns are *guaranteed* to have such selectivity.\n    ```sql\n    SELECT ROUND(COUNT(DISTINCT isbn)::NUMERIC / COUNT(*), 2) AS selectivity\n      FROM BOOK;\n    ```\n    ```\n    selectivity\n    -----------\n    1\n    ```\n\nPrefer indexing columns with selectivity greater than 0.85.\n\n## \u003ca name=\"a4a9f6a9d3e60a151754f45f91f5978c\"\u003e\u003c/a\u003eB-Tree indexes\n\n### \u003ca name=\"9e0a689058c255f7dbc8a3303cec18a8\"\u003e\u003c/a\u003eIndex Scan\n\nIndex Scan uses an index to find rows matching a predicate.\nIt finds each row in the index and then reads the actual data from the table.\n\n1. Create a single column B-Tree index\n    ```sql\n    CREATE INDEX idx_book_title ON book (title);\n    ```\n\n2. Execute the query\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.title = 'Patterns of Enterprise Application Architecture';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Index Scan using idx_book_title on book b  (cost=0.42..8.44 rows=1 width=57) (actual time=0.195..0.197 rows=1 loops=1)\"\n    \"  Index Cond: ((title)::text = 'Patterns of Enterprise Application Architecture'::text)\"\n    \"Planning Time: 0.340 ms\"\n    \"Execution Time: 0.209 ms\"\n    ```\n\n3. Index Scan was used because the index `idx_book_title` has good selectivity (1).\n\n4. 
If there is an additional predicate in the query (unindexed columns), the Index Scan can filter rows while reading them, just like a sequential scan\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.title = 'Patterns of Enterprise Application Architecture'\n       AND b.rating \u003e 4.5; --unindexed column\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Index Scan using idx_book_title on book b  (cost=0.42..8.44 rows=1 width=57) (actual time=0.013..0.013 rows=0 loops=1)\"\n    \"  Index Cond: ((title)::text = 'Patterns of Enterprise Application Architecture'::text)\"\n    \"  Filter: (rating \u003e 4.5)\"\n    \"  Rows Removed by Filter: 1\"\n    \"Planning Time: 0.303 ms\"\n    \"Execution Time: 0.026 ms\"\n    ```\n\n5. Postgres will switch to a Seq Scan instead of an Index Scan when a large proportion of the table (approximately more than 15% of the table's rows) is retrieved (a non-selective predicate).\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2012-01-01';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Seq Scan on book b  (cost=0.00..3917.09 rows=80034 width=57) (actual time=0.012..16.944 rows=79900 loops=1)\"\n    \"  Filter: (publication_date \u003e '2012-01-01'::date)\"\n    \"  Rows Removed by Filter: 20107\"\n    \"Planning Time: 0.204 ms\"\n    \"Execution Time: 19.364 ms\"\n    ```\n\n### \u003ca name=\"0aca2dd64d94c0de22b10b1249f3d461\"\u003e\u003c/a\u003eBitmap Scans\n\nIf index selectivity is bad (approximately less than 0.85), \nthe planner/optimizer will use a Bitmap Scan instead of an Index Scan.\n\nA Bitmap Scan sits in the middle between a Seq Scan and an Index Scan.\nIt is useful when you need a lot of rows from a table and these rows are located in different pages 
(blocks).\n\nA Bitmap Scan always consists of at least 2 nodes:\na Bitmap Index Scan at the bottom and a Bitmap Heap Scan above it.\n\n1. Create an index on a column with bad selectivity\n    ```sql\n    CREATE INDEX idx_book_pub_date ON book (publication_date);\n    ```\n\n2. Execute the query\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2019-01-01';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=116.61..2906.71 rows=9848 width=57) (actual time=1.485..5.062 rows=9869 loops=1)\"\n    \"  Recheck Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"  Heap Blocks: exact=1334\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_pub_date  (cost=0.00..114.15 rows=9848 width=0) (actual time=1.324..1.326 rows=9869 loops=1)\"\n    \"        Index Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"Planning Time: 0.409 ms\"\n    \"Execution Time: 5.430 ms\"\n    ```\n\n3. Bitmap Scan was used because the index `idx_book_pub_date` has bad selectivity (0.04) and thus the query returns a lot of rows.\n\n4. 
Bitmap scans are capable of combining multiple indexes using bitmap AND (\u0026) and OR (|).\n    ```sql\n    CREATE INDEX idx_book_rating ON book (rating);\n    ```\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2019-01-01'\n       AND b.rating \u003e 4.9;\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=166.48..927.55 rows=256 width=57) (actual time=0.537..0.858 rows=253 loops=1)\"\n    \"  Recheck Cond: ((rating \u003e 4.9) AND (publication_date \u003e '2019-01-01'::date))\"\n    \"  Heap Blocks: exact=232\"\n    \"  -\u003e  BitmapAnd  (cost=166.48..166.48 rows=256 width=0) (actual time=0.515..0.515 rows=0 loops=1)\"\n    \"        -\u003e  Bitmap Index Scan on idx_book_rating  (cost=0.00..51.95 rows=2604 width=0) (actual time=0.194..0.194 rows=2547 loops=1)\"\n    \"              Index Cond: (rating \u003e 4.9)\"\n    \"        -\u003e  Bitmap Index Scan on idx_book_pub_date  (cost=0.00..114.15 rows=9848 width=0) (actual time=0.281..0.281 rows=9869 loops=1)\"\n    \"              Index Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"Planning Time: 0.074 ms\"\n    \"Execution Time: 0.881 ms\"\n    ```\n\n## \u003ca name=\"063cd14ca3da4c69df873907b8cd09f2\"\u003e\u003c/a\u003eMulticolumn indexes\n\n1. Sometimes two or more columns with poor selectivity can be combined to form a multicolumn index with good selectivity\n    ```sql\n    SELECT ROUND((\n      SELECT COUNT(*) AS count_distinct FROM (\n        SELECT DISTINCT publication_date, rating FROM book\n      ) AS t)::NUMERIC / COUNT(*), 2) AS selectivity\n    FROM book;\n    ```\n    ```\n    selectivity\n    -----------\n    1.00\n    ```\n\n2. Drop the single column indexes \n    ```sql\n    DROP INDEX idx_book_pub_date, \n               idx_book_rating;\n    ```\n\n3. 
Create a multicolumn index\n    ```sql\n    CREATE INDEX idx_book_pub_date_rating ON book (publication_date, rating);\n    ```\n\n4. Execute the query\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2019-01-01'\n       AND b.rating \u003e 4.9;\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=254.96..1016.03 rows=256 width=57) (actual time=0.658..0.896 rows=253 loops=1)\"\n    \"  Recheck Cond: ((publication_date \u003e '2019-01-01'::date) AND (rating \u003e 4.9))\"\n    \"  Heap Blocks: exact=232\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_pub_date_rating  (cost=0.00..254.90 rows=256 width=0) (actual time=0.592..0.592 rows=253 loops=1)\"\n    \"        Index Cond: ((publication_date \u003e '2019-01-01'::date) AND (rating \u003e 4.9))\"\n    \"Planning Time: 0.074 ms\"\n    \"Execution Time: 0.922 ms\"\n    ```\n\n5. The order of predicates in a query is not important. 
The planner/optimizer will use the index `idx_book_pub_date_rating` for both `WHERE b.publication_date \u003e $1 AND b.rating \u003e $2` and `WHERE b.rating \u003e $2 AND b.publication_date \u003e $1`\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.rating \u003e 4.9\n       AND b.publication_date \u003e '2019-01-01';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=254.96..1016.03 rows=256 width=57) (actual time=0.732..1.010 rows=253 loops=1)\"\n    \"  Recheck Cond: ((publication_date \u003e '2019-01-01'::date) AND (rating \u003e 4.9))\"\n    \"  Heap Blocks: exact=232\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_pub_date_rating  (cost=0.00..254.90 rows=256 width=0) (actual time=0.694..0.694 rows=253 loops=1)\"\n    \"        Index Cond: ((publication_date \u003e '2019-01-01'::date) AND (rating \u003e 4.9))\"\n    \"Planning Time: 0.102 ms\"\n    \"Execution Time: 1.044 ms\"\n    ```\n\n6. The multicolumn index will also be used in queries referencing only the left part of the indexed columns in the `WHERE` clause, e.g. 
`WHERE b.publication_date \u003e $1`\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2019-01-01';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=232.74..3022.84 rows=9848 width=57) (actual time=0.592..4.204 rows=9869 loops=1)\"\n    \"  Recheck Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"  Heap Blocks: exact=1334\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_pub_date_rating  (cost=0.00..230.28 rows=9848 width=0) (actual time=0.461..0.462 rows=9869 loops=1)\"\n    \"        Index Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"Planning Time: 0.054 ms\"\n    \"Execution Time: 4.449 ms\"\n    ```\n\n7. However, the multicolumn index will *not* be used in queries referencing only the right part of the indexed columns in the `WHERE` clause, e.g. `WHERE b.rating \u003e 4`\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.rating \u003e 4;\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Seq Scan on book b  (cost=0.00..3917.09 rows=24873 width=57) (actual time=0.011..21.136 rows=24998 loops=1)\"\n    \"  Filter: (rating \u003e '4'::numeric)\"\n    \"  Rows Removed by Filter: 75009\"\n    \"Planning Time: 0.118 ms\"\n    \"Execution Time: 21.976 ms\"\n    ```\n\n8. 
The multicolumn index will *not* be used in queries where predicates are combined with `OR`\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.rating \u003e 4.9\n        OR b.publication_date \u003e '2019-01-01';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Seq Scan on book b  (cost=0.00..4167.10 rows=12196 width=57) (actual time=0.752..15.237 rows=12163 loops=1)\"\n    \"  Filter: ((rating \u003e 4.9) OR (publication_date \u003e '2019-01-01'::date))\"\n    \"  Rows Removed by Filter: 87844\"\n    \"Planning Time: 0.067 ms\"\n    \"Execution Time: 15.564 ms\"\n    ```\n\n### \u003ca name=\"0bc6a1f1a4c2655e1ad0f09702ca813d\"\u003e\u003c/a\u003eIndex Only Scan\n\n[Index Only Scan](https://www.postgresql.org/docs/current/indexes-index-only-scans.html) fetches data directly from the index without reading the table data at all.\n\nIndex Only Scan is the most efficient type of scanning.\n\nTo use an Index Only Scan, the query must select only columns included in the index\n```sql\nEXPLAIN ANALYZE\nSELECT b.publication_date,\n       b.rating\n  FROM book b\n WHERE b.publication_date = '1994-11-10';\n```\n```\nQUERY PLAN\n----------\n\"Index Only Scan using idx_book_pub_date_rating on book b  (cost=0.42..4.89 rows=27 width=10) (actual time=0.383..0.386 rows=1 loops=1)\"\n\"  Index Cond: (publication_date = '1994-11-10'::date)\"\n\"  Heap Fetches: 0\"\n\"Planning Time: 0.186 ms\"\n\"Execution Time: 0.472 ms\"\n```\n\n## \u003ca name=\"e0b12e6f9f94a86f5c585a73a7824d4a\"\u003e\u003c/a\u003eUnique indexes\n\nA unique index guarantees that the table column values won't have duplicates.\n\nA unique index can be defined manually as\n```sql\nCREATE UNIQUE INDEX book_isbn_key ON book (isbn);\n```\n\nPostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table.\n\nThe column `isbn` of the `book` table has a unique constraint (`isbn VARCHAR(14) UNIQUE`), so the planner/optimizer will use the automatically created unique index `book_isbn_key` for the query\n```sql\nEXPLAIN ANALYZE\nSELECT b.isbn, \n       b.title,\n       b.publication_date,\n       b.rating\n  FROM book b\n WHERE b.isbn = '978-1449373320';\n```\n```\nQUERY PLAN\n----------\n\"Index Scan using book_isbn_key on book b  (cost=0.42..8.44 rows=1 width=57) (actual time=0.111..0.111 rows=1 loops=1)\"\n\"  Index Cond: ((isbn)::text = '978-1449373320'::text)\"\n\"Planning Time: 0.063 ms\"\n\"Execution Time: 0.124 ms\"\n```\n\n---\n\n* https://www.postgresql.org/docs/13/indexes-unique.html\n\n## \u003ca name=\"a9961238a26d752aadf86e48184eef90\"\u003e\u003c/a\u003ePartial indexes\n\nA partial index is an index with a `WHERE` clause.\n\nOnly rows matching the supplied predicate are indexed.\nUse partial indexes to exclude rows from an index that are not likely to be queried.\n\n1. Create the partial index\n    ```sql\n    CREATE INDEX idx_book_pub_date_rating_part on book (publication_date) WHERE rating \u003e 4;\n    ```\n\n2. 
The partial index will be used by queries with the predicate `b.rating \u003e 4`, like\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2019-01-01'\n       AND b.rating \u003e 4;\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=35.27..2791.45 rows=2449 width=57) (actual time=0.268..1.500 rows=2436 loops=1)\"\n    \"  Recheck Cond: ((publication_date \u003e '2019-01-01'::date) AND (rating \u003e '4'::numeric))\"\n    \"  Heap Blocks: exact=1130\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_pub_date_rating_part  (cost=0.00..34.66 rows=2449 width=0) (actual time=0.166..0.166 rows=2436 loops=1)\"\n    \"        Index Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"Planning Time: 0.128 ms\"\n    \"Execution Time: 1.574 ms\"\n    ```\n\n3. The partial index will *not* be used by queries with a predicate `b.rating \u003e $1` where `$1` does not imply `rating \u003e 4` (e.g. `$1 = 3`), like\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2019-01-01'\n       AND b.rating \u003e 3;\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=115.39..2930.11 rows=4953 width=57) (actual time=0.557..4.687 rows=4979 loops=1)\"\n    \"  Recheck Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"  Filter: (rating \u003e '3'::numeric)\"\n    \"  Rows Removed by Filter: 4890\"\n    \"  Heap Blocks: exact=1334\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_pub_date  (cost=0.00..114.15 rows=9848 width=0) (actual time=0.410..0.411 rows=9869 loops=1)\"\n    \"        Index Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"Planning Time: 0.106 ms\"\n    \"Execution Time: 4.857 ms\"\n    ```\n\n---\n\n* https://www.postgresql.org/docs/13/indexes-partial.html\n\n## 
\u003ca name=\"41e31e7b92b335c5f475794040a589a4\"\u003e\u003c/a\u003eExpression indexes\n\nExpression indexes are useful for queries using a function in the `WHERE` clause.\n\n1. A sequential scan is used for the query that matches on the `LOWER` function (lowercase)\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE LOWER(b.title) = 'patterns of enterprise application architecture';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Seq Scan on book b  (cost=0.00..4167.10 rows=500 width=57) (actual time=0.175..82.824 rows=1 loops=1)\"\n    \"  Filter: (lower((title)::text) = 'patterns of enterprise application architecture'::text)\"\n    \"  Rows Removed by Filter: 100006\"\n    \"Planning Time: 0.128 ms\"\n    \"Execution Time: 82.846 ms\"\n    ```\n\n2. Create an expression index\n    ```sql\n    CREATE INDEX idx_book_lower_title on book (LOWER(title));\n    ```\n\n3. Repeat the `SELECT` query\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=20.29..1290.40 rows=500 width=57) (actual time=0.189..0.190 rows=1 loops=1)\"\n    \"  Recheck Cond: (lower((title)::text) = 'patterns of enterprise application architecture'::text)\"\n    \"  Heap Blocks: exact=1\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_lower_title  (cost=0.00..20.17 rows=500 width=0) (actual time=0.183..0.183 rows=1 loops=1)\"\n    \"        Index Cond: (lower((title)::text) = 'patterns of enterprise application architecture'::text)\"\n    \"Planning Time: 0.526 ms\"\n    \"Execution Time: 0.212 ms\"\n    ```\n\n---\n\n* https://www.postgresql.org/docs/13/indexes-expressional.html\n\n## \u003ca name=\"1d7f7e5f39be8319be61e0509a9e0090\"\u003e\u003c/a\u003eGIN indexes\n\nGIN indexes are \"inverted indexes\".\nAn inverted index contains a separate entry for each component value, and can efficiently handle queries that test for the presence of specific component 
values.\n\n1. A B-Tree index is not used in queries with `LIKE`\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.title LIKE 'Patterns%';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Seq Scan on book b  (cost=0.00..3917.09 rows=10 width=57) (actual time=0.017..18.060 rows=1 loops=1)\"\n    \"  Filter: ((title)::text ~~ 'Patterns%'::text)\"\n    \"  Rows Removed by Filter: 100006\"\n    \"Planning Time: 0.733 ms\"\n    \"Execution Time: 18.096 ms\"\n    ```\n\n2. To use trigram GIN indexes, create the extension\n    ```sql\n    CREATE EXTENSION IF NOT EXISTS pg_trgm;\n    ```\n\n3. Create the trigram GIN index\n    ```sql\n    CREATE INDEX idx_book_title_trgm ON book USING gin (title gin_trgm_ops);\n    ```\n\n4. Repeat the `SELECT` query\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=100.08..138.37 rows=10 width=57) (actual time=0.123..0.125 rows=1 loops=1)\"\n    \"  Recheck Cond: ((title)::text ~~ 'Patterns%'::text)\"\n    \"  Rows Removed by Index Recheck: 3\"\n    \"  Heap Blocks: exact=1\"\n    \"  -\u003e  Bitmap Index Scan on idx_book_title_trgm  (cost=0.00..100.08 rows=10 width=0) (actual time=0.112..0.113 rows=4 loops=1)\"\n    \"        Index Cond: ((title)::text ~~ 'Patterns%'::text)\"\n    \"Planning Time: 0.509 ms\"\n    \"Execution Time: 0.167 ms\"\n    ```\n\n5. 
A GIN index can be combined with other indexes using bitmap operations\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.publication_date \u003e '2019-01-01'\n       AND b.title LIKE 'a%';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Bitmap Heap Scan on book b  (cost=184.16..1618.23 rows=597 width=57) (actual time=1.916..3.272 rows=621 loops=1)\"\n    \"  Recheck Cond: (((title)::text ~~ 'a%'::text) AND (publication_date \u003e '2019-01-01'::date))\"\n    \"  Heap Blocks: exact=505\"\n    \"  -\u003e  BitmapAnd  (cost=184.16..184.16 rows=597 width=0) (actual time=1.772..1.773 rows=0 loops=1)\"\n    \"        -\u003e  Bitmap Index Scan on idx_book_title_trgm  (cost=0.00..69.46 rows=6061 width=0) (actual time=1.077..1.077 rows=6209 loops=1)\"\n    \"              Index Cond: ((title)::text ~~ 'a%'::text)\"\n    \"        -\u003e  Bitmap Index Scan on idx_book_pub_date  (cost=0.00..114.15 rows=9848 width=0) (actual time=0.599..0.599 rows=9869 loops=1)\"\n    \"              Index Cond: (publication_date \u003e '2019-01-01'::date)\"\n    \"Planning Time: 0.182 ms\"\n    \"Execution Time: 3.334 ms\"\n    ```\n\n---\n\n* https://www.postgresql.org/docs/13/indexes-types.html\n\n## \u003ca name=\"c39d79c0c6f99d7ff585ac9408522a22\"\u003e\u003c/a\u003eHash indexes\n\nA Hash index is a flat structure, unlike a B-Tree.\n\nHash indexes can only handle simple equality comparisons (using the `=` operator).\n\nThe main advantage of Hash indexes over B-Tree is space.\nOn very large data sets Hash indexes take less space than B-Tree indexes and allow fast lookups.\n\nOnly B-Tree indexes can be unique; Hash indexes don't support this feature.\n\nTo use Hash indexes, PostgreSQL 10+ is required.\n\n1. Drop the B-Tree index `idx_book_title`\n    ```sql\n    DROP INDEX idx_book_title;\n    ```\n\n2. 
Create a Hash index\n    ```sql\n    CREATE INDEX idx_book_title_hash ON book USING HASH (title);\n    ```\n\n3. Execute the query\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n     WHERE b.title = 'Patterns of Enterprise Application Architecture';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Index Scan using idx_book_title_hash on book b  (cost=0.00..8.02 rows=1 width=57) (actual time=0.029..0.029 rows=1 loops=1)\"\n    \"  Index Cond: ((title)::text = 'Patterns of Enterprise Application Architecture'::text)\"\n    \"Planning Time: 0.122 ms\"\n    \"Execution Time: 0.043 ms\"\n    ```\n\n## \u003ca name=\"9604d17d77116450f47e7f7d7cb37c58\"\u003e\u003c/a\u003eCreate indexes on foreign keys\n\nMake sure every foreign key has a matching index.\n\nThere are a few exceptions when an index on a foreign key is unnecessary:\n* if the table with the foreign key is small, because a sequential scan will probably be cheaper,\n* if you will never join tables on this key,\n* if you will never delete a row or update a key column in the referenced table.\n\n1. 
Execute the query without indexing the foreign keys\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT b.isbn, \n           b.title,\n           b.publication_date,\n           b.rating\n      FROM book b\n      JOIN publisher p\n        ON p.publisher_id = b.publisher_id\n     WHERE b.publication_date = '1994-11-10';\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Hash Join  (cost=5.57..105.83 rows=27 width=57) (actual time=0.753..0.760 rows=1 loops=1)\"\n    \"  Hash Cond: (b.publisher_id = p.publisher_id)\"\n    \"  -\u003e  Bitmap Heap Scan on book b  (cost=4.50..104.69 rows=27 width=73) (actual time=0.456..0.459 rows=1 loops=1)\"\n    \"        Recheck Cond: (publication_date = '1994-11-10'::date)\"\n    \"        Heap Blocks: exact=1\"\n    \"        -\u003e  Bitmap Index Scan on idx_book_pub_date  (cost=0.00..4.50 rows=27 width=0) (actual time=0.436..0.437 rows=1 loops=1)\"\n    \"              Index Cond: (publication_date = '1994-11-10'::date)\"\n    \"  -\u003e  Hash  (cost=1.03..1.03 rows=3 width=16) (actual time=0.124..0.126 rows=3 loops=1)\"\n    \"        Buckets: 1024  Batches: 1  Memory Usage: 9kB\"\n    \"        -\u003e  Seq Scan on publisher p  (cost=0.00..1.03 rows=3 width=16) (actual time=0.114..0.116 rows=3 loops=1)\"\n    \"Planning Time: 1.978 ms\"\n    \"Execution Time: 0.926 ms\"\n    ```\n\n2. Create the index matching the foreign key\n    ```sql\n    CREATE INDEX idx_fk_book_publisher ON book (publisher_id);\n    ```\n\n3. 
Repeat the `SELECT` query and compare query plans\n    ```\n    QUERY PLAN\n    ----------\n    \"Nested Loop  (cost=0.29..26.36 rows=27 width=57) (actual time=0.177..0.193 rows=1 loops=1)\"\n    \"  -\u003e  Seq Scan on publisher p  (cost=0.00..1.03 rows=3 width=16) (actual time=0.008..0.010 rows=3 loops=1)\"\n    \"  -\u003e  Index Scan using idx_fk_book_publisher on book b  (cost=0.29..8.43 rows=1 width=73) (actual time=0.057..0.058 rows=0 loops=3)\"\n    \"        Index Cond: (publisher_id = p.publisher_id)\"\n    \"        Filter: (publication_date = '1994-11-10'::date)\"\n    \"        Rows Removed by Filter: 2\"\n    \"Planning Time: 0.720 ms\"\n    \"Execution Time: 0.226 ms\"\n    ```\n\n4. Drop the index `idx_fk_book_publisher`\n    ```sql\n    DROP INDEX idx_fk_book_publisher;\n    ```\n\n5. Run the following query to find all unindexed foreign key constraints\n    ```sql\n    SELECT c.conname AS fk_constraint_name, \n           c.conrelid::REGCLASS AS table, \n           c.confrelid::REGCLASS AS referenced_table,\n           ARRAY_AGG(a.attname) AS columns,\n           FORMAT('CREATE INDEX ON %s (%s);',\n           c.conrelid::REGCLASS,\n           STRING_AGG(a.attname, ',')) AS sql\n      FROM pg_constraint c\n      JOIN pg_attribute a \n        ON a.attrelid = c.conrelid \n       AND a.attnum = ANY(c.conkey)\n     WHERE c.contype = 'f'\n       AND NOT EXISTS (\n           SELECT 1\n             FROM pg_index i \n            WHERE i.indrelid = c.conrelid\n              AND c.conkey \u003c@ STRING_TO_ARRAY(i.indkey::TEXT, ' ')::SMALLINT[])\n     GROUP BY c.conname, c.conrelid, c.confrelid;\n    ```\n    ```\n    fk_constraint_name        table   referenced_table  columns           sql\n    -------------------------------------------------------------------------\n    \"book_publisher_id_fkey\"  \"book\"  \"publisher\"       \"{publisher_id}\"  \"CREATE INDEX ON book (publisher_id);\"\n    ```\n\n6. 
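On a busy production system, consider building the missing indexes with `CREATE INDEX CONCURRENTLY`, which avoids holding a lock that blocks writes while the index is built (a sketch using the index from this walkthrough; note it cannot run inside a transaction block and takes longer than a plain `CREATE INDEX`):

    ```sql
    CREATE INDEX CONCURRENTLY idx_fk_book_publisher ON book (publisher_id);
    ```
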
Take the values of the `sql` column from the result set and execute them to create the missing indexes.\n\n## \u003ca name=\"7fc107be8e1d492f8fc88bec58ff7740\"\u003e\u003c/a\u003eUse more joins\n\nPostgreSQL is good at joining multiple tables.\nQueries with joins are usually better than subqueries.\nA query with 5 joins is completely acceptable.\n\n```sql\nEXPLAIN ANALYZE\nSELECT b.isbn, \n       b.title,\n       b.publication_date,\n       a.full_name,\n       c.name\n  FROM book b\n  JOIN book_author ba \n    ON b.book_id = ba.book_id\n  JOIN author a \n    ON a.author_id = ba.author_id\n  JOIN book_category bc \n    ON b.book_id = bc.book_id\n  JOIN category c \n    ON c.category_id = bc.category_id\n WHERE b.title LIKE '%Patterns%';\n```\n\n## \u003ca name=\"76ccb0862b9d9d30188b32c852396b28\"\u003e\u003c/a\u003eDon't over-index\n\nMaintaining a lot of indexes has its price.\nIndexes consume disk space.\nThe more indexes you have, the slower `INSERT`, `UPDATE`, and `DELETE` statements become.\n\nMake sure you don't have unused indexes.\n\n1. Reset statistics counters to zero\n    ```sql\n    SELECT pg_stat_reset();\n    ```\n\n2. Let the system work for some time after resetting statistics.\n\n3. 
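To see how long the counters have been accumulating, check the reset timestamp first:

    ```sql
    SELECT stats_reset
      FROM pg_stat_database
     WHERE datname = current_database();
    ```
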
Find indexes that have never been used\n    ```sql\n    SELECT s.schemaname,\n           s.relname AS table_name,\n           s.indexrelname AS index_name,\n           s.idx_scan AS times_used,\n           pg_size_pretty(pg_relation_size(t.relid)) AS table_size,\n           pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,\n           idx.indexdef AS index_ddl\n      FROM pg_stat_user_indexes s\n      JOIN pg_stat_user_tables t \n        ON s.relname = t.relname\n      JOIN pg_index i\n        ON s.indexrelid = i.indexrelid\n      JOIN pg_indexes AS idx \n        ON s.indexrelname = idx.indexname\n       AND s.schemaname = idx.schemaname\n     WHERE s.idx_scan = 0 -- no scans\n       AND 0 \u003c\u003e ALL(i.indkey) -- 0 in the array means this is an expression index\n       AND NOT i.indisunique -- not a unique index\n     ORDER BY pg_relation_size(s.indexrelid) DESC;\n    ```\n\n## \u003ca name=\"2b1d46eeefdef5de7e4b4d5ce62849c4\"\u003e\u003c/a\u003eKeep statistics updated\n\nThe planner/optimizer relies on statistics about the contents of tables in order to generate good plans for queries.\n\nMake sure to run `VACUUM ANALYZE` to keep statistics up to date and to recover or reuse disk space occupied by updated or deleted rows.\n\nPostgreSQL has an *autovacuum* daemon automating the execution of the `VACUUM` and `ANALYZE` commands.\nIt is highly recommended to keep the autovacuum feature enabled.\n\nCheck when tables were last vacuumed and analyzed\n```sql\nSELECT schemaname, \n       relname, \n       last_vacuum, \n       vacuum_count, \n       last_analyze, \n       analyze_count \n  FROM pg_stat_user_tables;\n```\n\n---\n\n* https://www.postgresql.org/docs/13/routine-vacuuming.html\n\n## \u003ca name=\"8f264ba3b501b66608e02c8c5efc2c58\"\u003e\u003c/a\u003eDetect slow queries\n\nTo detect slow queries, the `pg_stat_statements` extension is required.\nIt has to be preloaded using `shared_preload_libraries=pg_stat_statements`.\n\n1. 
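If the parameter has not been configured yet, one way to set it is via `ALTER SYSTEM` (the setting takes effect only after a server restart):

    ```sql
    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
    -- restart the PostgreSQL server for the change to take effect
    ```
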
Check that `pg_stat_statements` was preloaded\n    ```sql\n    SHOW shared_preload_libraries;\n    ```\n    ```\n    shared_preload_libraries\n    ------------------------\n    \"pg_stat_statements\"\n    ```\n\n2. Create extension for tracking planning and execution statistics of all executed SQL statements\n    ```sql\n    CREATE EXTENSION IF NOT EXISTS pg_stat_statements;\n    ```\n\n3. Get 20 slowest SQL queries\n    ```sql\n    SELECT SUBSTRING(query, 1, 40) AS short_query,\n           ROUND(( 100 * total_exec_time / SUM(total_exec_time) OVER ())::NUMERIC, 2) AS percent,\n           ROUND(total_exec_time::numeric, 2) AS total_exec_time,\n           calls,\n           ROUND(mean_exec_time::numeric, 2) AS mean,\n           query\n      FROM pg_stat_statements\n     ORDER BY total_exec_time DESC\n     LIMIT 20;\n    ```\n\n---\n\n* https://www.postgresql.org/docs/13/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-PRELOAD\n\n## \u003ca name=\"ae39d16394b02eaba62f114a8d4231b2\"\u003e\u003c/a\u003eCluster a table according to an index\n\nWhen a table is clustered, it is physically reordered based on the index information.\n\nPostgreSQL Documentation:\n\n\u003e In cases where you are accessing single rows randomly within a table, the actual order of the data in the table is unimportant. However, if you tend to access some data more than others, and there is an index that groups them together, you will benefit from using CLUSTER. 
If you are requesting a range of indexed values from a table, or a single indexed value that has multiple rows that match, CLUSTER will help because once the index identifies the table page for the first row that matches, all other rows that match are probably already on the same table page, and so you save disk accesses and speed up the query.\n\nA table can be clustered according to an index using\n```sql\nCLUSTER book USING book_isbn_key;\n```\n\nClustering is a one-time operation: when the table is subsequently updated, the changes are not clustered.\n\n---\n\n* https://www.postgresql.org/docs/13/sql-cluster.html\n\n## \u003ca name=\"88bb68b618474d18c73e4ada7256cfd9\"\u003e\u003c/a\u003eUse table partitioning\n\nPartitioning refers to splitting what is logically one large table into smaller physical pieces.\n\nPartitioning benefits:\n\n* Partitioning helps to scale PostgreSQL by splitting large logical tables into smaller physical tables that can be stored on different storage media based on usage;\n\n* Dividing a large table into smaller tables reduces table scans and memory swap problems and thus increases performance;\n\n* Partitioning reduces index size and makes it more likely that the heavily-used parts of the indexes fit in memory.\n\nYou will benefit from partitioning only when a table is very large.\nVery large means that the table is at least larger than the physical memory of the database server.\n\nTo use declarative partitioning, PostgreSQL 10+ is required.\n\n1. Make sure *partition pruning* is enabled. Partition pruning is a query optimization technique that improves performance for declaratively partitioned tables.\n    ```sql\n    SHOW enable_partition_pruning;\n    ```\n    ```\n    enable_partition_pruning\n    ------------------------\n    on\n    ```\n\n2. Drop all tables by executing DDL statements from [`drop-all.sql`](drop-all.sql)\n\n3. 
Create a table `book` that will be partitioned by the `publication_date` column\n    ```sql\n    CREATE EXTENSION IF NOT EXISTS \"uuid-ossp\";\n\n    CREATE TABLE book (\n        book_id UUID DEFAULT uuid_generate_v4(),\n        isbn VARCHAR(14),\n        title VARCHAR(255) NOT NULL,\n        publication_date DATE NOT NULL,\n        rating NUMERIC(4, 3),\n        PRIMARY KEY (book_id, publication_date),\n        UNIQUE (isbn, publication_date)\n    ) PARTITION BY RANGE (publication_date);\n    ```\n\n4. Note: the primary key and unique constraints on a partitioned table must include all partitioning columns.\n\n5. Create a tablespace for the partition with the most recent data. \n    A tablespace defines a location in the file system where the files representing database objects can be stored. \n    This way partitions can be stored on different storage media.\n    ```sql\n    CREATE TABLESPACE fasttablespace LOCATION '/ssd1/postgresql/data/fasttablespace';\n    ```\n\n6. Create partitions for the data from the past in the default tablespace\n    ```sql\n    CREATE TABLE book_y1990 PARTITION OF book\n      FOR VALUES FROM ('1990-01-01') TO ('2000-01-01');\n\n    CREATE TABLE book_y2000 PARTITION OF book\n      FOR VALUES FROM ('2000-01-01') TO ('2010-01-01');\n\n    CREATE TABLE book_y2010 PARTITION OF book\n      FOR VALUES FROM ('2010-01-01') TO ('2020-01-01');\n    ```\n\n7. Create a partition for the most recent data in the `fasttablespace` tablespace located on different storage media\n    ```sql\n    CREATE TABLE book_y2020 PARTITION OF book\n      FOR VALUES FROM ('2020-01-01') TO ('2030-01-01')\n      TABLESPACE fasttablespace;\n    ```\n\n8. Create an index on the partition key column. This automatically creates one index on each partition.\n    ```sql\n    CREATE INDEX idx_book_part_key ON book (publication_date);\n    ```\n\n9. 
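Before loading data, the partition layout can be verified; PostgreSQL 12+ provides the `pg_partition_tree` function for this:

    ```sql
    SELECT relid, parentrelid, isleaf, level
      FROM pg_partition_tree('book');
    ```
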
Generate some test data\n    ```sql\n    INSERT INTO book (isbn, title, publication_date, rating)\n    SELECT SUBSTR(MD5(RANDOM()::TEXT), 0, 14), \n           MD5(RANDOM()::TEXT), \n           DATE '2010-01-01' + CAST(RANDOM() * (DATE '2021-01-01' - DATE '2010-01-01') AS INT),\n           ROUND((1 + RANDOM() * 4)::numeric, 3)\n      FROM generate_series(1, 100000);\n    ```\n\n10. Depending on a predicate, only a single partition (smaller index) can be queried, improving query performance\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT EXTRACT(YEAR FROM b.publication_date) AS pub_year,\n           COUNT(*)\n      FROM book b\n     WHERE b.publication_date \u003e '2020-01-01' -- Only book_y2020 will be queried\n     GROUP BY pub_year;\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"HashAggregate  (cost=294.23..297.23 rows=200 width=16) (actual time=4.898..4.900 rows=2 loops=1)\"\n    \"  Group Key: date_part('year'::text, (b.publication_date)::timestamp without time zone)\"\n    \"  Batches: 1  Memory Usage: 40kB\"\n    \"  -\u003e  Index Only Scan using book_y2020_publication_date_idx on book_y2020 b  (cost=0.29..248.79 rows=9089 width=8) (actual time=0.021..2.606 rows=9092 loops=1)\"\n    \"        Index Cond: (publication_date \u003e '2020-01-01'::date)\"\n    \"        Heap Fetches: 0\"\n    \"Planning Time: 0.346 ms\"\n    \"Execution Time: 5.112 ms\"\n    ```\n\n11. 
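The distribution of the generated rows across the partitions can be inspected with the `tableoid` system column:

    ```sql
    SELECT tableoid::REGCLASS AS partition,
           COUNT(*)
      FROM book
     GROUP BY partition
     ORDER BY partition;
    ```
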
Depending on the predicate, multiple partitions can be queried in parallel and the results combined into a single result set\n    ```sql\n    EXPLAIN ANALYZE\n    SELECT EXTRACT(YEAR FROM b.publication_date) AS pub_year,\n           COUNT(*)\n      FROM book b\n     WHERE b.publication_date \u003e '2010-01-01' -- Both book_y2010 and book_y2020 will be queried\n     GROUP BY pub_year;\n    ```\n    ```\n    QUERY PLAN\n    ----------\n    \"Finalize GroupAggregate  (cost=3455.77..3507.44 rows=200 width=16) (actual time=20.181..21.293 rows=12 loops=1)\"\n    \"  Group Key: (date_part('year'::text, (b.publication_date)::timestamp without time zone))\"\n    \"  -\u003e  Gather Merge  (cost=3455.77..3502.44 rows=400 width=16) (actual time=20.174..21.282 rows=33 loops=1)\"\n    \"        Workers Planned: 2\"\n    \"        Workers Launched: 2\"\n    \"        -\u003e  Sort  (cost=2455.75..2456.25 rows=200 width=16) (actual time=17.239..17.241 rows=11 loops=3)\"\n    \"              Sort Key: (date_part('year'::text, (b.publication_date)::timestamp without time zone))\"\n    \"              Sort Method: quicksort  Memory: 25kB\"\n    \"              Worker 0:  Sort Method: quicksort  Memory: 25kB\"\n    \"              Worker 1:  Sort Method: quicksort  Memory: 25kB\"\n    \"              -\u003e  Partial HashAggregate  (cost=2445.11..2448.11 rows=200 width=16) (actual time=17.210..17.213 rows=11 loops=3)\"\n    \"                    Group Key: (date_part('year'::text, (b.publication_date)::timestamp without time zone))\"\n    \"                    Batches: 1  Memory Usage: 40kB\"\n    \"                    Worker 0:  Batches: 1  Memory Usage: 40kB\"\n    \"                    Worker 1:  Batches: 1  Memory Usage: 40kB\"\n    \"                    -\u003e  Parallel Append  (cost=0.29..2236.82 rows=41657 width=8) (actual time=0.031..10.669 rows=33327 loops=3)\"\n    \"                          -\u003e  Parallel Index Only Scan using book_y2010_publication_date_idx on book_y2010 b_1  
(cost=0.29..1835.51 rows=53449 width=8) (actual time=0.024..7.661 rows=30290 loops=3)\"\n    \"                                Index Cond: (publication_date \u003e '2010-01-01'::date)\"\n    \"                                Heap Fetches: 0\"\n    \"                          -\u003e  Parallel Index Only Scan using book_y2020_publication_date_idx on book_y2020 b_2  (cost=0.29..193.03 rows=5360 width=8) (actual time=0.027..1.176 rows=4556 loops=2)\"\n    \"                                Index Cond: (publication_date \u003e '2010-01-01'::date)\"\n    \"                                Heap Fetches: 0\"\n    \"Planning Time: 0.173 ms\"\n    \"Execution Time: 21.418 ms\"\n    ```\n\n---\n\n* https://www.postgresql.org/docs/13/ddl-partitioning.html\n
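A further benefit of range partitioning is cheap removal of old data: instead of a slow bulk `DELETE`, an entire partition can be detached or dropped (a sketch using the partitions created above):

```sql
-- keep the old data around as a standalone table
ALTER TABLE book DETACH PARTITION book_y1990;

-- or discard it entirely afterwards
DROP TABLE book_y1990;
```
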