{"id":15066765,"url":"https://github.com/pawurb/python-pg-extras","last_synced_at":"2025-12-13T18:14:32.068Z","repository":{"id":39280675,"uuid":"301553570","full_name":"pawurb/python-pg-extras","owner":"pawurb","description":"Python PostgreSQL database performance insights. Locks, index usage, buffer cache hit ratios, vacuum stats and more. ","archived":true,"fork":false,"pushed_at":"2022-06-01T10:04:53.000Z","size":35,"stargazers_count":37,"open_issues_count":1,"forks_count":5,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-01-08T17:07:23.139Z","etag":null,"topics":["databases","performance","postgresql","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pawurb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-10-05T22:20:46.000Z","updated_at":"2025-01-07T05:25:32.000Z","dependencies_parsed_at":"2022-09-01T08:02:05.722Z","dependency_job_id":null,"html_url":"https://github.com/pawurb/python-pg-extras","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pawurb%2Fpython-pg-extras","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pawurb%2Fpython-pg-extras/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pawurb%2Fpython-pg-extras/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pawurb%2Fpython-pg-extras/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pawurb","download_url":"https://codeload.github
.com/pawurb/python-pg-extras/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":235360895,"owners_count":18977594,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["databases","performance","postgresql","python"],"created_at":"2024-09-25T01:11:53.650Z","updated_at":"2025-10-05T03:31:22.897Z","avatar_url":"https://github.com/pawurb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Python PG Extras [![PyPI version](https://badge.fury.io/py/pg-extras.svg)](https://badge.fury.io/py/pg-extras)\n\nPython port of [Heroku PG Extras](https://github.com/heroku/heroku-pg-extras) with several additions and improvements. The goal of this project is to provide powerful insights into the PostgreSQL database for Python apps that are not using the Heroku PostgreSQL plugin.\n\nQueries can be used to obtain information about a Postgres instance, that may be useful when analyzing performance issues. This includes information about locks, index usage, buffer cache hit ratios and vacuum statistics. Python API enables developers to easily integrate the tool into e.g. 
automatic monitoring tasks.\n\nYou can check out this blog post for a detailed step-by-step tutorial on how to [optimize PostgreSQL using PG Extras library](https://pawelurbanek.com/postgresql-fix-performance).\n\nAlternative versions:\n\n- [Ruby](https://github.com/pawurb/ruby-pg-extras)\n\n- [Ruby on Rails](https://github.com/pawurb/rails-pg-extras)\n\n- [NodeJS](https://github.com/pawurb/node-postgres-extras)\n\n- [Elixir](https://github.com/pawurb/ecto_psql_extras)\n\n- [Haskell](https://github.com/pawurb/haskell-pg-extras)\n\n## Installation\n\n```bash\npip install pg-extras\n```\n\nSome of the queries (e.g., `calls` and `outliers`) require the [pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html) extension to be enabled.\n\nYou can check if it is enabled in your database by running:\n\n```python\nPGExtras.query('extensions')\n```\nYou should see a line similar to the following in the output:\n\n```bash\n| pg_stat_statements  | 1.7  | 1.7 | track execution statistics of all SQL statements executed |\n```\n\n## Usage\n\nThe package expects the `os.environ['DATABASE_URL']` value in the following format:\n\n```python\n\"postgresql://postgres:secret@localhost:5432/database_name\"\n```\n\nAlternatively you can pass it directly to the method:\n\n```python\nPGExtras.query('cache_hit', database_url=\"postgresql://postgres:secret@localhost:5432/database_name\")\n```\n\nYou can run queries using a simple Python API:\n\n```python\nfrom pg_extras import PGExtras\n\nPGExtras.query('cache_hit')\n```\n```bash\n+----------------+------------------------+\n|        Index and table hit rate         |\n+----------------+------------------------+\n| name           | ratio                  |\n+----------------+------------------------+\n| index hit rate | 0.97796610169491525424 |\n| table hit rate | 0.96724294813466787989 |\n+----------------+------------------------+\n```\n\nBy default the ASCII table is displayed. 
Alternatively you can return the raw query object:\n\n```python\nresult = PGExtras.query('cache_hit', output='raw')\n\ntype(result) # =\u003e \u003cclass 'sqlalchemy.engine.result.ResultProxy'\u003e\nresult.keys() # =\u003e ['name', 'ratio']\nresult.fetchall() # =\u003e [('index hit rate', Decimal('0.939...')), ('table hit rate', Decimal('0.986...'))]\n\n```\n\n## Available methods\n\n### `cache_hit`\n\n```python\n\nPGExtras.query('cache_hit')\n\n      name      |         ratio\n----------------+------------------------\n index hit rate | 0.99957765013541945832\n table hit rate |                   1.00\n(2 rows)\n```\n\nThis command provides information on the efficiency of the buffer cache, for both index reads (`index hit rate`) as well as table reads (`table hit rate`). A low buffer cache hit ratio can be a sign that the Postgres instance is too small for the workload.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#cache-hit)\n\n### `index_cache_hit`\n\n```python\n\nPGExtras.query('index_cache_hit')\n\n| name                  | buffer_hits | block_reads | total_read | ratio             |\n+-----------------------+-------------+-------------+------------+-------------------+\n| teams                 | 187665      | 109         | 187774     | 0.999419514948821 |\n| subscriptions         | 5160        | 6           | 5166       | 0.99883855981417  |\n| plans                 | 5718        | 9           | 5727       | 0.998428496595076 |\n(truncated results for brevity)\n```\n\nThe same as `cache_hit` with each table's indexes cache hit info displayed separately.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#cache-hit)\n\n### `table_cache_hit`\n\n```python\n\nPGExtras.query('table_cache_hit')\n\n| name                  | buffer_hits | block_reads | total_read | ratio             |\n+-----------------------+-------------+-------------+------------+-------------------+\n| plans                 | 32123       | 2           | 32125 
     | 0.999937743190662 |\n| subscriptions         | 95021       | 8           | 95029      | 0.999915815172211 |\n| teams                 | 171637      | 200         | 171837     | 0.99883610631005  |\n(truncated results for brevity)\n```\n\nThe same as `cache_hit` with each table's cache hit info displayed separately.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#cache-hit)\n\n### `index_usage`\n\n```python\n\nPGExtras.query('index_usage')\n\n       relname       | percent_of_times_index_used | rows_in_table\n---------------------+-----------------------------+---------------\n events              |                          65 |       1217347\n app_infos           |                          74 |        314057\n app_infos_user_info |                           0 |        198848\n user_info           |                           5 |         94545\n delayed_jobs        |                          27 |             0\n(5 rows)\n```\n\nThis command provides information on the efficiency of indexes, represented as what percentage of total scans were index scans. 
A low percentage can indicate under indexing, or wrong data being indexed.\n\n### `locks`\n\n```python\n\nPGExtras.query('locks')\n\n procpid | relname | transactionid | granted |     query_snippet     | mode             |       age\n---------+---------+---------------+---------+-----------------------+-------------------------------------\n   31776 |         |               | t       | \u003cIDLE\u003e in transaction | ExclusiveLock    |  00:19:29.837898\n   31776 |         |          1294 | t       | \u003cIDLE\u003e in transaction | RowExclusiveLock |  00:19:29.837898\n   31912 |         |               | t       | select * from hello;  | ExclusiveLock    |  00:19:17.94259\n    3443 |         |               | t       |                      +| ExclusiveLock    |  00:00:00\n         |         |               |         |    select            +|                  |\n         |         |               |         |      pg_stat_activi   |                  |\n(4 rows)\n```\n\nThis command displays queries that have taken out an exclusive lock on a relation. Exclusive locks typically prevent other operations on that relation from taking place, and can be a cause of \"hung\" queries that are waiting for a lock to be granted.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#deadlocks)\n\n### `all_locks`\n\n```python\n\nPGExtras.query('all_locks')\n\n```\n\nThis command displays all the current locks, regardless of their type.\n\n### `outliers`\n\n```python\n\nPGExtras.query('outliers')\n\n                   qry                   |    exec_time     | prop_exec_time |   ncalls    | sync_io_time\n-----------------------------------------+------------------+----------------+-------------+--------------\n SELECT * FROM archivable_usage_events.. | 154:39:26.431466 | 72.2%          | 34,211,877  | 00:00:00\n COPY public.archivable_usage_events (.. | 50:38:33.198418  | 23.6%          | 13          | 13:34:21.00108\n COPY public.usage_events (id, reporte.. 
| 02:32:16.335233  | 1.2%           | 13          | 00:34:19.784318\n INSERT INTO usage_events (id, retaine.. | 01:42:59.436532  | 0.8%           | 12,328,187  | 00:00:00\n SELECT * FROM usage_events WHERE (alp.. | 01:18:10.754354  | 0.6%           | 102,114,301 | 00:00:00\n UPDATE usage_events SET reporter_id =.. | 00:52:35.683254  | 0.4%           | 23,786,348  | 00:00:00\n INSERT INTO usage_events (id, retaine.. | 00:49:24.952561  | 0.4%           | 21,988,201  | 00:00:00\n(truncated results for brevity)\n```\n\nThis command displays statements, obtained from `pg_stat_statements`, ordered by the amount of time to execute in aggregate. This includes the statement itself, the total execution time for that statement, the proportion of total execution time for all statements that statement has taken up, the number of times that statement has been called, and the amount of time that statement spent on synchronous I/O (reading/writing from the file system).\n\nTypically, an efficient query will have an appropriate ratio of calls to total execution time, with as little time spent on I/O as possible. Queries that have a high total execution time but low call count should be investigated to improve their performance. Queries that have a high proportion of execution time being spent on synchronous I/O should also be investigated.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#missing-indexes)\n\n### `calls`\n\n```python\n\nPGExtras.query('calls')\n\n                   qry                   |    exec_time     | prop_exec_time |   ncalls    | sync_io_time\n-----------------------------------------+------------------+----------------+-------------+--------------\n SELECT * FROM usage_events WHERE (alp.. 
| 01:18:11.073333  | 0.6%           | 102,120,780 | 00:00:00\n BEGIN                                   | 00:00:51.285988  | 0.0%           | 47,288,662  | 00:00:00\n COMMIT                                  | 00:00:52.31724   | 0.0%           | 47,288,615  | 00:00:00\n SELECT * FROM  archivable_usage_event.. | 154:39:26.431466 | 72.2%          | 34,211,877  | 00:00:00\n UPDATE usage_events SET reporter_id =.. | 00:52:35.986167  | 0.4%           | 23,788,388  | 00:00:00\n INSERT INTO usage_events (id, retaine.. | 00:49:25.260245  | 0.4%           | 21,990,326  | 00:00:00\n INSERT INTO usage_events (id, retaine.. | 01:42:59.436532  | 0.8%           | 12,328,187  | 00:00:00\n(truncated results for brevity)\n```\n\nThis command is much like `outliers`, but ordered by the number of times a statement has been called.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#missing-indexes)\n\n### `blocking`\n\n```python\n\nPGExtras.query('blocking')\n\n blocked_pid |    blocking_statement    | blocking_duration | blocking_pid |                                        blocked_statement                           | blocked_duration\n-------------+--------------------------+-------------------+--------------+------------------------------------------------------------------------------------+------------------\n         461 | select count(*) from app | 00:00:03.838314   |        15682 | UPDATE \"app\" SET \"updated_at\" = '2013-03-04 15:07:04.746688' WHERE \"id\" = 12823149 | 00:00:03.821826\n(1 row)\n```\n\nThis command displays statements that are currently holding locks that other statements are waiting on. 
This can be used in conjunction with `locks` to determine which statements need to be terminated in order to resolve lock contention.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#deadlocks)\n\n### `total_index_size`\n\n```python\n\nPGExtras.query('total_index_size')\n\n  size\n-------\n 28194 MB\n(1 row)\n```\n\nThis command displays the total size of all indexes on the database, in MB. It is calculated by taking the number of pages (reported in `relpages`) and multiplying it by the page size (8192 bytes).\n\n### `index_size`\n\n```python\n\nPGExtras.query('index_size')\n\n                             name                              |  size\n---------------------------------------------------------------+---------\n idx_activity_attemptable_and_type_lesson_enrollment           | 5196 MB\n index_enrollment_attemptables_by_attempt_and_last_in_group    | 4045 MB\n index_attempts_on_student_id                                  | 2611 MB\n enrollment_activity_attemptables_pkey                         | 2513 MB\n index_attempts_on_student_id_final_attemptable_type           | 2466 MB\n attempts_pkey                                                 | 2466 MB\n index_attempts_on_response_id                                 | 2404 MB\n index_attempts_on_enrollment_id                               | 1957 MB\n index_enrollment_attemptables_by_enrollment_activity_id       | 1789 MB\n enrollment_activities_pkey                                    |  458 MB\n(truncated results for brevity)\n```\n\nThis command displays the size of each index in the database, in MB. 
It is calculated by taking the number of pages (reported in `relpages`) and multiplying it by the page size (8192 bytes).\n\n### `table_size`\n\n```python\n\nPGExtras.query('table_size')\n\n                             name                              |  size\n---------------------------------------------------------------+---------\n learning_coaches                                              |  196 MB\n states                                                        |  145 MB\n grade_levels                                                  |  111 MB\n charities_customers                                           |   73 MB\n charities                                                     |   66 MB\n(truncated results for brevity)\n```\n\nThis command displays the size of each table and materialized view in the database, in MB. It is calculated by using the system administration function `pg_table_size()`, which includes the size of the main data fork, free space map, visibility map and TOAST data.\n\n### `table_indexes_size`\n\n```python\n\nPGExtras.query('table_indexes_size')\n\n                             table                             | indexes_size\n---------------------------------------------------------------+--------------\n learning_coaches                                              |    153 MB\n states                                                        |    125 MB\n charities_customers                                           |     93 MB\n charities                                                     |     16 MB\n grade_levels                                                  |     11 MB\n(truncated results for brevity)\n```\n\nThis command displays the total size of indexes for each table and materialized view, in MB. 
It is calculated by using the system administration function `pg_indexes_size()`.\n\n### `total_table_size`\n\n```python\n\nPGExtras.query('total_table_size')\n\n                             name                              |  size\n---------------------------------------------------------------+---------\n learning_coaches                                              |  349 MB\n states                                                        |  270 MB\n charities_customers                                           |  166 MB\n grade_levels                                                  |  122 MB\n charities                                                     |   82 MB\n(truncated results for brevity)\n```\n\nThis command displays the total size of each table and materialized view in the database, in MB. It is calculated by using the system administration function `pg_total_relation_size()`, which includes table size, total index size and TOAST data.\n\n### `unused_indexes`\n\n```python\n\nPGExtras.query('unused_indexes')\n\n          table      |                       index                | index_size | index_scans\n---------------------+--------------------------------------------+------------+-------------\n public.grade_levels | index_placement_attempts_on_grade_level_id | 97 MB      |           0\n public.observations | observations_attrs_grade_resources         | 33 MB      |           0\n public.messages     | user_resource_id_idx                       | 12 MB      |           0\n(3 rows)\n```\n\nThis command displays indexes that have \u003c 50 scans recorded against them, and are greater than 5 pages in size, ordered by size relative to the number of index scans. 
This command is generally useful for eliminating indexes that are unused, which can impact write performance, as well as read performance should they occupy space in memory.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#unused-indexes)\n\n### `null_indexes`\n\n```python\n\nPGExtras.query('null_indexes')\n\n   oid   |         index      | index_size | unique | indexed_column | null_frac | expected_saving\n---------+--------------------+------------+--------+----------------+-----------+-----------------\n  183764 | users_reset_token  | 1445 MB    | t      | reset_token    |   97.00%  | 1401 MB\n   88732 | plan_cancelled_at  | 539 MB     | f      | cancelled_at   |    8.30%  | 44 MB\n 9827345 | users_email        | 18 MB      | t      | email          |   28.67%  | 5160 kB\n\n```\n\nThis command displays indexes that contain `NULL` values. A high ratio of `NULL` values means that using a partial index excluding them will be beneficial in case they are not used for searching.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#null-indexes)\n\n### `seq_scans`\n\n```python\n\nPGExtras.query('seq_scans')\n\n\n               name                |  count\n-----------------------------------+----------\n learning_coaches                  | 44820063\n states                            | 36794975\n grade_levels                      | 13972293\n charities_customers               |  8615277\n charities                         |  4316276\n messages                          |  3922247\n contests_customers                |  2915972\n classroom_goals                   |  2142014\n(truncated results for brevity)\n```\n\nThis command displays the number of sequential scans recorded against all tables, descending by count of sequential scans. 
Tables that have very high numbers of sequential scans may be under-indexed, and it may be worth investigating queries that read from these tables.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#missing-indexes)\n\n### `long_running_queries`\n\n```python\n\nPGExtras.query('long_running_queries')\n\n\n  pid  |    duration     |                                      query\n-------+-----------------+---------------------------------------------------------------------------------------\n 19578 | 02:29:11.200129 | EXPLAIN SELECT  \"students\".* FROM \"students\"  WHERE \"students\".\"id\" = 1450645 LIMIT 1\n 19465 | 02:26:05.542653 | EXPLAIN SELECT  \"students\".* FROM \"students\"  WHERE \"students\".\"id\" = 1889881 LIMIT 1\n 19632 | 02:24:46.962818 | EXPLAIN SELECT  \"students\".* FROM \"students\"  WHERE \"students\".\"id\" = 1581884 LIMIT 1\n(truncated results for brevity)\n```\n\nThis command displays currently running queries, that have been running for longer than 5 minutes, descending by duration. Very long running queries can be a source of multiple issues, such as preventing DDL statements completing or vacuum being unable to update `relfrozenxid`.\n\n### `records_rank`\n\n```python\n\nPGExtras.query('records_rank')\n\n               name                | estimated_count\n-----------------------------------+-----------------\n tastypie_apiaccess                |          568891\n notifications_event               |          381227\n core_todo                         |          178614\n core_comment                      |          123969\n notifications_notification        |          102101\n django_session                    |           68078\n (truncated results for brevity)\n```\n\nThis command displays an estimated count of rows per table, descending by estimated count. The estimated count is derived from `n_live_tup`, which is updated by vacuum operations. Due to the way `n_live_tup` is populated, sparse vs. 
dense pages can result in estimations that are significantly out from the real count of rows.\n\n### `bloat`\n\n```python\n\nPGExtras.query('bloat')\n\n\n type  | schemaname |           object_name         | bloat |   waste\n-------+------------+-------------------------------+-------+----------\n table | public     | bloated_table                 |   1.1 | 98 MB\n table | public     | other_bloated_table           |   1.1 | 58 MB\n index | public     | bloated_table::bloated_index  |   3.7 | 34 MB\n table | public     | clean_table                   |   0.2 | 3808 kB\n table | public     | other_clean_table             |   0.3 | 1576 kB\n (truncated results for brevity)\n```\n\nThis command displays an estimation of table \"bloat\" – space allocated to a relation that is full of dead tuples, that has yet to be reclaimed. Tables that have a high bloat ratio, typically 10 or greater, should be investigated to see if vacuuming is aggressive enough, and can be a sign of high table churn.\n\n[More info](https://pawelurbanek.com/postgresql-fix-performance#bloat)\n\n### `vacuum_stats`\n\n```python\n\nPGExtras.query('vacuum_stats')\n\n schema |         table         | last_vacuum | last_autovacuum  |    rowcount    | dead_rowcount  | autovacuum_threshold | expect_autovacuum\n--------+-----------------------+-------------+------------------+----------------+----------------+----------------------+-------------------\n public | log_table             |             | 2013-04-26 17:37 |         18,030 |              0 |          3,656       |\n public | data_table            |             | 2013-04-26 13:09 |             79 |             28 |             66       |\n public | other_table           |             | 2013-04-26 11:41 |             41 |             47 |             58       |\n public | queue_table           |             | 2013-04-26 17:39 |             12 |          8,228 |             52       | yes\n public | picnic_table          |             |                
  |             13 |              0 |             53       |\n (truncated results for brevity)\n```\n\nThis command displays statistics related to vacuum operations for each table, including an estimation of dead rows, last autovacuum and the current autovacuum threshold. This command can be useful when determining if current vacuum thresholds require adjustments, and to determine when the table was last vacuumed.\n\n### `kill_all`\n\n```python\n\nPGExtras.query('kill_all')\n\n```\n\nThis command kills all currently active connections to the database. It can be useful as a last resort when your database is stuck in a deadlock.\n\n### `extensions`\n\n```python\n\nPGExtras.query('extensions')\n\n```\n\nThis command lists all the currently installed and available PostgreSQL extensions.\n\n### `mandelbrot`\n\n```python\n\nPGExtras.query('mandelbrot')\n\n```\n\nThis command outputs the Mandelbrot set, calculated through SQL.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpawurb%2Fpython-pg-extras","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpawurb%2Fpython-pg-extras","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpawurb%2Fpython-pg-extras/lists"}