{"id":13703804,"url":"https://github.com/Senzing/postgresql-performance","last_synced_at":"2025-05-05T07:31:57.671Z","repository":{"id":39588636,"uuid":"449466831","full_name":"Senzing/postgresql-performance","owner":"Senzing","description":"Tweaks to PostgreSQL and the Senzing DDL","archived":false,"fork":false,"pushed_at":"2024-11-11T22:48:26.000Z","size":301,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":6,"default_branch":"main","last_synced_at":"2024-11-11T23:31:24.713Z","etag":null,"topics":["documentation","senzing-gdev"],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Senzing.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-01-18T22:24:09.000Z","updated_at":"2024-11-11T22:48:30.000Z","dependencies_parsed_at":"2023-12-18T16:02:47.116Z","dependency_job_id":"d4df6dbf-545c-40f5-bcc8-66c4bff72921","html_url":"https://github.com/Senzing/postgresql-performance","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fpostgresql-performance","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fpostgresql-performance/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fpostgresql-performance/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Senzing%2Fpostgr
esql-performance/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Senzing","download_url":"https://codeload.github.com/Senzing/postgresql-performance/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":224431295,"owners_count":17310086,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["documentation","senzing-gdev"],"created_at":"2024-08-02T21:01:00.263Z","updated_at":"2025-05-05T07:31:57.664Z","avatar_url":"https://github.com/Senzing.png","language":null,"funding_links":[],"categories":["Documentation"],"sub_categories":[],"readme":"# postgresql-performance\n\nThis repository is to document specific tweaks to PostgreSQL and the Senzing DDL that may be useful for larger installations.  Definitely add your own comments/experiences as GitHub issues in this repository.\n\nIf you haven't already taken a look at the general performance document, please do: https://github.com/Senzing/performance-general/blob/main/README.md\n\n\n## Fundamentals\nA DBA needs to tune PostgreSQL for the available hardware... shared_buffers, worker memory, IO, etc.  The one \"unusual\" thing about Senzing is that it largely runs in auto-commit mode, which means that commit performance has a lot to do with overall performance, 10x+ so.  You can check single connection insert performance with `G2Command` and the `checkDBPerf -s 3` command.  Ideally, you should get \u003c.5ms per insert or 6000 inserts in 3 seconds.  
Many systems, even cloud systems, will achieve .1-.3ms.\n\nYou can also connect with psql to the PostgreSQL database from the same environment where you are running the Senzing API to check network performance.  Use `\\timing` to enable timings and `select 1;` to check the roundtrip performance.  Since this does not leverage the Senzing schema or the database IO subsystem, if this is slow it is pure networking/connection overhead.\n\nThe primary configuration parameter to improve commit performance is to turn off disk flushes on commit with `synchronous_commit=off`.\n\nThere is more to pay attention to on your system though.  For instance, if replication is done synchronously then you end up with the same problem.  On AWS Aurora, replication to a different AZ forces synchronous commit back on.  As you are looking at the design of your database infrastructure, you will need to take these things into consideration.  To simplify things, customers will often do the initial historical loads without replication and set it up afterward when the DB performance needs tend to be much lower.\n\nSee the [hardware setup](test_hardware.md) used for this testing.\n\n\n## PostgreSQL 17\nI am liking PostgreSQL 17 a lot.  It looks to be the best release since 14.  I suspect the improvements are a result of the WAL efficiency improvements and streaming IO.\n\nThis was the first time I've been able to run a non-partitioned schema (used the index mods) and it finished faster than prior (\u003c17) partitioned runs.  In the past, I've never let a non-partitioned run finish as the downtime from XID pauses almost completely stalled processing.  With v17 I was still processing ~100M records a day on the test system even with the pauses.  This change was helpful:\n```\nautovacuum_work_mem = 1GB\n```\n\nA reminder, you do NOT partition for speed but only for operational needs.  With the partitioned schema, I see 20% more IO and the box is CPU saturated... 
resulting in 20% lower peak performance rates and even a select count(*) on a table gives a notable performance hit near the end.  I will likely be reducing the number of partitions in the default mods as v17 performs better.\n\n\n## PostgreSQL 16\nFirst, I wouldn't move here yet.  I have been eager to try it as it allows you to do an explain on generic query plans which is precisely what the PostgreSQL optimizer typically uses for Senzing's prepared statements.  That particular feature was immediately valuable.\n\nIf you do move, make sure to set this to let autovacuum freely use the shared buffers.\n```\nvacuum_buffer_usage_limit=0\n```\n\n\n## PostgreSQL 14\nThis version has specific improvements to the handling of transactions and idle connections that can give substantial benefits for running OLTP applications like Senzing.  You can see if you are being impacted by this on previous versions of PostgreSQL by running `perf top` and looking for GetSnapshotData using significant CPU while the system is under load.  In my tests, this function was the largest consumer of CPU on the entire system.  This optimization is automatically enabled when you install 14.\n\nlz4 TOAST compression may be a small win as it has significantly higher compression speeds.  In theory, this should reduce latency.  It can be enabled with `default_toast_compression='lz4'` on a new system.\n\n\n## Autovacuum experiments\n\nTake a look [here](autovacuum.md) for some new work on autovacuum tuning with Senzing.\n\n\n## Recommended Senzing Configuration changes\nNormally, I don't change the Senzing default configuration unless I want to add data sources, features, keys (e.g., NAMEADDR_KEY), etc.  Starting with Senzing 3.8.0, I do recommend that people with large datasets make one change to NOT have NAME_KEYs create a redo.  Prior to 3.8, new configurations had this disabled by default.  
The reason was simple: Senzing doesn't make decisions solely on name, and the majority of NAME_KEYs end up generic, so you generate 25-50x the amount of redo during processing.\n\nSo why did this change?  NAME_KEYs are actually based on only NAME or NAME in combination with other things (DOB, POSTAL_CODE, ADDR_CITY, LAST4_ID, etc.).  Except in the NAME+DOB situation, we didn't make resolution or relationship decisions based on any of those combinations so it was nearly impossible that anything might be sitting around based on a now generic value.  In fact, in the entire history of Senzing, I'm only aware of one instance where a customer noticed such a decision on NAME+DOB.\n\nWhat changed is that in 3.5 we started building relationships on close unique names to find more connections on smaller datasets where unique names are common.  We also have some people experimenting with resolving on some very loose/questionable criteria.  This caused [now] generic name-based decisions to be more common and in 3.8 a new configuration defaults to having redo enabled.\n\nWhy turn it off on large systems?  Simply put, the negative far outweighs the questionable benefit.  With large data, 1) you probably aren't configuring loose decision making and 2) the velocity/volume of data quickly takes care of things like relationships on now generic names... so you would likely never see it, and anything you did see would be something like a name only possible match (weak) that is now generic.\n\nThe change is simple.  Run G2ConfigTool.py and execute:\n```\nsetGenericThreshold {\"plan\": \"INGEST\", \"feature\": \"all\", \"behavior\": \"NAME\", \"candidateCap\": 10, \"scoringCap\":-1, \"sendToRedo\": \"No\"}\nsave\n```\n\n## Well JIT!\n![image](https://github.com/Senzing/postgresql-performance/assets/24964308/a1f8a41d-5863-4d8f-a4a0-29bc38689964)\n\nSo I got an error \"53200FATAL: fatal llvm error: Unable to allocate section memory!\" from PostgreSQL 14 at about 710M records into a test.  
No problem, the consumers began to restart and the load hardly skipped a beat.  In that same minute, I googled the error and found that it is the JIT compiler, which is enabled by default, using that memory.  My understanding is the JIT compiles reused SQL statements into binary code so they execute more efficiently.  It can be turned off in postgresql.conf with `jit=off` and `select pg_reload_conf()` can be used in psql to live reload that setting.  Immediately, performance went from a steady state of about 870/s to 1500/s.\n\nNot believing this could be true, I turned it back on and immediately it dropped back down.  Then I turned it off again and the performance went back up.\n\nIn terms of database system behavior, the CPU for the select processes dropped dramatically while performance nearly doubled.  I'm still not sure why, but the test results are clear.\n\n## pg_stat_statements\nThis is really nice if you want to monitor what SQL statements the system is really spending time on and why.  Google how to enable it on your system.  
I like to watch this SQL statement, which consolidates each type of Senzing SQL statement into one row:\n\n```\nwatch -n 10 psql -p 5432 -U postgres -w -h 127.0.0.1 g2 -c \"\\\"select count(*), sum(calls) as calls, cast(sum(total_exec_time) as bigint) as total_exec_time, sum(rows) as rows, cast(sum(blk_read_time) as bigint) as blk_read_time, cast(sum(blk_write_time) as bigint) as blk_write_time, sum(shared_blks_dirtied) as shared_blks_dirtied, sum(local_blks_dirtied) as local_blks_dirtied, sum(wal_records) as wal_records, cast(sum(io_time) as bigint) as io_time, trimmed_query from ( select  (case when strpos(query,'2') \u003e0 then left(query,strpos(query,'2')) else query end ) as trimmed_query, blk_read_time+blk_write_time as io_time,* from pg_stat_statements) as s group by trimmed_query order by io_time desc;\"\\\"\n```\n\n## BufferMapping and waits\nThere are nice wait queries that I like to run during loads to see what the database is waiting on.\n\n```\nwatch psql -U postgres -w -h 127.0.0.1 g2 -c \"\\\"select extract('epoch' from now()-xact_start) as duration, wait_event_type, wait_event, state, query from pg_stat_activity where state != 'idle' and (wait_event_type != '' or query like '%vacuum%') order by duration desc\\\"\"\nwatch psql -U postgres -w -h 127.0.0.1 g2 -c \"\\\"select count(*) as cnt, wait_event_type, wait_event from pg_stat_activity where state != 'idle'  and wait_event_type != '' group by wait_event_type, wait_event having count(*) \u003e 1 order by cnt desc\\\"\"\n```\n\nOne thing you will find is that once you are seeing lots of LWLock:BufferMapping waits, then adding more connections to the database is unlikely to scale.  PostgreSQL has 128 \"buffer partitions,\" and when a select is looking to allocate memory to return results, it locks one of those partitions.  
This means that once your workload shows heavy LWLock:BufferMapping waits, you need to look at a few options to continue scaling:\n* Move to DB clustering: Using either https://senzing.zendesk.com/hc/en-us/articles/360010599254-Scaling-Out-Your-Database-With-Clustering or some tables may work well with more traditional database clustering\n* Reduce the size of selects: If you increased generics thresholds for ingest and/or have keys causing entities to be highly related, you may want to revisit it\n\nHere is an example of a load where the number of loading threads was changed from 768 to 384 to 192.  The dips are when the loaders were restarted with new settings.  Only at 192 did BufferMapping essentially disappear from being a wait event with performance dropping \u003c5%, which could likely be recovered by increasing the threads slightly.\n![image](https://github.com/Senzing/postgresql-performance/assets/24964308/de79e8f2-96f3-41b7-87cb-7ca00071ef43)\n\nThe other side effect of monitoring BufferMapping is with autovacuum.  Autovacuum (before v16) leverages those same buffers, and contention on them severely impacts the ability of the autovacuum to keep up.  In the load above, the autovacuum was taking several times longer when there was contention.\n\n\n## Partitioning\nPartitioning can be very effective for Senzing.  Autovacuum, backup, and restore are all single-threaded operations per table in PostgreSQL.  By partitioning Senzing, you can achieve substantially better parallelization of these operations.  Some obvious tables to partition are:\n\n```\nRES_FEAT_EKEY\nRES_FEAT_STAT\nRES_ENT\nDSRC_RECORD\n```\n\nIn this repository, you will find a `partitioning_mods.sql` file for the latest.\n\n\n## Governor\nRecommend setting the thresholds to 1.2B/1.5B to allow for more time to vacuum.  
Also, the smaller difference in the values can help prevent the cost of an expensive \"double vacuum\" where a \"pause\" is needed immediately after the initial vacuum as the XID is not dropped far enough.  I saw a reduction from 2-4 hours to \u003c1hr in wait time on average by doing this.\n\nThe best setting for you may be different depending on the system you have.  I run pretty aggressively like:\n```\nsynchronous_commit=off\n\nlock_timeout = 500000\nidle_in_transaction_session_timeout=600000\n\ncheckpoint_timeout = 2min\ncheckpoint_completion_target = 0.9\nmax_wal_size = 80GB\n\nfull_page_writes = off\nwal_init_zero = off\nwal_level = minimal\nwal_writer_delay = 10000ms\nwal_recycle = off\nmax_wal_senders = 0\n\neffective_io_concurrency = 1000\nmaintenance_io_concurrency = 1000\nmax_parallel_maintenance_workers = 16\nmax_parallel_workers_per_gather = 16\nmax_worker_processes = 16\nmax_parallel_workers = 16\n\nautovacuum_max_workers=16\nautovacuum_vacuum_cost_limit = 10000\nvacuum_cost_page_hit = 0\t\t# 0-10000 credits\nvacuum_cost_page_miss = 1\t\t# 0-10000 credits\nvacuum_cost_page_dirty = 1\t\t# 0-10000 credits\nvacuum_freeze_table_age=1000000000\nvacuum_freeze_min_age=200000000\nautovacuum_freeze_max_age = 1200000000\nautovacuum_multixact_freeze_max_age = 1500000000\nautovacuum_vacuum_scale_factor = 0.01\nautovacuum_vacuum_insert_scale_factor = 0.01\nautovacuum_vacuum_cost_delay = 0\nautovacuum_naptime = 1min\n\n\ndefault_toast_compression = 'lz4'       # 'pglz' or 'lz4'\nenable_seqscan = off\nrandom_page_cost = 1.1\n```\n\n\n## Auto-vacuuming\nKeep the system in regular vacuum as much as possible.  The aggressive vacuum causes massive IO, making the cost 100x more expensive.  Partitioning of the hottest tables can help regular autovacuum keep up longer.\n\n\n## Fillfactor\nWhen PostgreSQL updates a record it creates a new version (a copy) of the record with the update.  
If this can be done 1) without modifying an index and 2) by putting the copy in the same page as the old version, then this change does not contribute to additional vacuum workload.  In fact, the old copies can even be cleaned up during select operations.\n\nThe problem is that PostgreSQL by default fills each table page to 100%.  This means that there likely won't be room for this operation and some Senzing tables are updated frequently.  The negative of reducing the fillfactor is that it may increase disk space usage.  You may want to experiment with this yourself but for performance runs, I set the following:\n\n```\nALTER TABLE RES_RELATE SET ( fillfactor = 100 );\nALTER TABLE LIB_FEAT SET ( fillfactor = 100 );\nALTER TABLE RES_FEAT_STAT SET ( fillfactor = 90 );\nALTER TABLE RES_FEAT_EKEY SET ( fillfactor = 90 );\nALTER TABLE RES_ENT SET ( fillfactor = 90 );\nALTER TABLE OBS_ENT SET ( fillfactor = 75 );\nALTER TABLE RES_ENT_OKEY SET ( fillfactor = 75 );\nALTER TABLE DSRC_RECORD SET ( fillfactor = 90 );\n```\n\nNOTE: If you have partitioned tables, this must be done on each partition.\n\n\n## Memory issues\nWhen trying 450M records in the heavily partitioned schema, I found PostgreSQL triggering the Linux OOM killer around 300M records.  This would happen repeatedly but made no sense as this dedicated DB server has 1.5TB RAM and 100GB of shared_buffers.  In doing some reviews, it appears that the kernel overcommit settings/algorithms just aren't good for this.  Oddly, the issue did not occur with lesser partitioning with the exact same data.\n\nSetting these kernel parameters resolved the issue:\n\n```\nvm.overcommit_memory=2\nvm.overcommit_ratio=90\n```\n\n## LUKS disk encryption\nI do my performance runs with full disk encryption using Linux LUKS on LVM, mdraid0, etc.  This tries to characterize real-world rather than ideal workloads.  
There are some parameters to the crypt devices that can be helpful and more coming in newer kernels.\n\n** Note that I tried this and it works really well for a short period of time.  The problem is that the newer settings make the encryption happen in process.  This works great EXCEPT for kernel page flushing and checkpointing which are single process/thread operations.  The database can actually run out of space because WAL logs fill up the disk.  Depending on your write performance it may be beneficial to set no_read_workqueue but leave the write queue alone.\n\n```\nUbuntu 20.04 w/ 5.4 kernel\ncryptsetup --allow-discards --persistent refresh \u003cdevice\u003e\n\nNewer kernels (to be tested):\ncryptsetup --allow-discards --perf-no_read_workqueue --perf-no_write_workqueue --persistent refresh \u003cdevice\u003e\n```\n\n## IO concurrency\n```\neffective_io_concurrency = 1000\nmaintenance_io_concurrency = 1000\n```\nI tend to use the above settings.  It is also important to set the block device read-ahead too.  I will set the flash devices to 16 and DM devices to 256.  Since our access pattern is very random, I generally don't like read-ahead at all BUT PostgreSQL vacuum performs 3-4x better with readahead since it is a heavy sequential scan operation.  Hopefully PostgreSQL one day will support Direct IO and AsyncIO. 
The read-ahead can be set with something like this:\n\n```\nblockdev --report\nblockdev --setra 256 /dev/dm-* ## can also probably leave this as what the OS defaults to\nblockdev --setra 16 /dev/nvme*n1\nblockdev --report\n```\n\n## Drop extra tables and indexes\n```\nDROP TABLE DSRC_RECORD_HKEY, LIB_FEAT_HKEY, OBS_ENT_SKEY, RES_ENT_RKEY, RES_FEAT_LKEY;\nDROP TABLE OBS_FEAT_EKEY; -- ONLY IF THE TABLE IS COMPLETELY EMPTY\nDROP INDEX DSRC_RECORD_HK; -- 3.6 and newer\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSenzing%2Fpostgresql-performance","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FSenzing%2Fpostgresql-performance","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FSenzing%2Fpostgresql-performance/lists"}