{"id":20217818,"url":"https://github.com/matrix-org/rust-synapse-compress-state","last_synced_at":"2025-05-16T09:06:33.596Z","repository":{"id":34611961,"uuid":"148348518","full_name":"matrix-org/rust-synapse-compress-state","owner":"matrix-org","description":"A tool to compress some state in a Synapse instance's database","archived":false,"fork":false,"pushed_at":"2025-04-29T09:08:55.000Z","size":380,"stargazers_count":156,"open_issues_count":39,"forks_count":33,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-04-29T10:24:28.279Z","etag":null,"topics":["matrix-org","synapse"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/synapse-auto-compressor/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/matrix-org.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-09-11T16:41:27.000Z","updated_at":"2025-04-29T09:08:52.000Z","dependencies_parsed_at":"2024-07-23T11:26:01.781Z","dependency_job_id":"a43c2026-ae4a-4d89-9eef-cda0c6cce29c","html_url":"https://github.com/matrix-org/rust-synapse-compress-state","commit_stats":{"total_commits":93,"total_committers":19,"mean_commits":4.894736842105263,"dds":0.6666666666666667,"last_synced_commit":"13882d7654b8045dfe28ba7473b1021eb405e60f"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matrix-org%2Frust-synapse-compress-state","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matrix-org%2Frust-synapse-compress-state/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matrix-org%2Frust-synapse-compress-state/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/matrix-org%2Frust-synapse-compress-state/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/matrix-org","download_url":"https://codeload.github.com/matrix-org/rust-synapse-compress-state/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254501558,"owners_count":22081528,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["matrix-org","synapse"],"created_at":"2024-11-14T06:35:47.223Z","updated_at":"2025-05-16T09:06:28.581Z","avatar_url":"https://github.com/matrix-org.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# Compress Synapse State Tables\n\nThis workspace contains experimental tools that attempt to reduce the number of\nrows in the `state_groups_state` table inside a Synapse PostgreSQL database.\n\n# Automated tool: synapse_auto_compressor\n\n## Introduction\n\nThis tool is significantly simpler to use than the manual tool (described below).\nIt scans through all of the rows in the `state_groups` database table from the start. When\nit finds a group that hasn't been compressed, it runs the compressor for a while on that\ngroup's room, saving where it got up to. 
After compressing a number of these chunks it stops,\nsaving where it got up to for the next run of the `synapse_auto_compressor`.\n\nIt creates three extra tables in the database: `state_compressor_state`, which stores the\ninformation needed to stop and start the compressor for each room; `state_compressor_progress`,\nwhich stores the most recently compressed state group for each room; and `state_compressor_total_progress`,\nwhich stores how far through the `state_groups` table the compressor has scanned.\n\nThe tool can be run manually when you are running out of space, or scheduled to run\nperiodically.\n\n## Building\n\nThis tool requires `cargo` to be installed. See https://www.rust-lang.org/tools/install\nfor instructions on how to do this.\n\nThis project follows the deprecation policy of [Synapse](https://matrix-org.github.io/synapse/latest/deprecation_policy.html)\nfor Rust versions: it assumes a recent stable version of Rust and the ability to fetch a more recent one if necessary.\n\nTo build `synapse_auto_compressor`, clone this repository and navigate to the\n`synapse_auto_compressor/` subdirectory. Then execute `cargo build`.\n\nThis will create an executable and store it in\n`synapse_auto_compressor/target/debug/synapse_auto_compressor`.\n\n## Example usage\n\nCompress 100 chunks of size 500, connecting to PostgreSQL with a connection URL:\n```\n$ synapse_auto_compressor -p postgresql://user:pass@localhost/synapse -c 500 -n 100\n```\n\nCompress 100 chunks of size 500 using a local PostgreSQL Unix socket:\n```\n$ sudo -u postgres synapse_auto_compressor -p \"user=postgres dbname=matrix-synapse host=/var/run/postgresql\" -c 500 -n 100\n```\n\n## Running Options\n\n- -p [POSTGRES_LOCATION] **Required**\nThe configuration for connecting to the Postgres database. 
This should be of the form\n`\"postgresql://username:password@mydomain.com/database\"` or a key-value pair\nstring: `\"user=username password=password dbname=database host=mydomain.com\"`.\nSee https://docs.rs/tokio-postgres/0.7.2/tokio_postgres/config/struct.Config.html\nfor the full details.\n\n- -c [CHUNK_SIZE] **Required**\nThe number of state groups to work on at once. All of the entries from `state_groups_state` are\nrequested from the database for the state groups being worked on, so small chunk\nsizes may be needed on machines with low memory. Note: if the compressor fails to find\nspace savings on the chunk as a whole (which may well happen in rooms with a lot of backfilled\nhistory), then the entire chunk is skipped.\n\n- -n [CHUNKS_TO_COMPRESS] **Required**\n*CHUNKS_TO_COMPRESS* chunks of size *CHUNK_SIZE* will be compressed. The higher this\nnumber, the longer the compressor will run.\n\n- -l [LEVELS]\nSizes of each new level in the compression algorithm, as a comma-separated list.\nThe first entry in the list is for the lowest, most granular level, with each\nsubsequent entry being for the next highest level. The number of entries in the\nlist determines the number of levels that will be used. The sum of the sizes of\nthe levels affects the performance of fetching the state from the database, as the\nsum of the sizes is the upper bound on the number of iterations needed to fetch a\ngiven set of state. [defaults to \"100,50,25\"]\n\n## Scheduling the compressor\n\nThe automatic tool may put some strain on the database, so it might be best to schedule\nit to run at a quiet time for the server. 
This could be done by creating an executable\nscript and scheduling it with something like\n[cron](https://www.man7.org/linux/man-pages/man1/crontab.1.html).\n\n# Manual tool: synapse_compress_state\n\n## Introduction\n\nA manual tool that reads in the rows from the `state_groups_state` and `state_group_edges`\ntables for a specified room and calculates changes that will (hopefully) significantly\nreduce the number of rows.\n\nThis tool currently *does not* write to the database by default, so it should be\nsafe to run. If the `-o` option is specified then SQL will be written to the\ngiven file that would change the tables to match the calculated state. (Note\nthat if `-t` is given then each change to a particular state group is wrapped\nin a transaction.) If you do wish to send the changes to the database automatically\nthen the `-c` flag can be set.\n\nThe SQL generated is safe to apply against the database with Synapse running.\nThis is because the `state_groups` and `state_groups_state` tables are append-only:\nonce written to the database, they are never modified. There is therefore no danger\nof a modification racing against a running Synapse. Further, this script makes its\nchanges within atomic transactions, and each transaction should not affect the results\nfrom any of the queries that Synapse performs.\n\nBefore generating any SQL, the tool also checks that the generated state deltas give\nthe same state as the existing state deltas.\n\n## Building\n\nThis tool requires `cargo` to be installed. 
See https://www.rust-lang.org/tools/install\nfor instructions on how to do this.\n\nTo build `synapse_compress_state`, clone this repository and then execute `cargo build`.\n\nThis will create an executable and store it in `target/debug/synapse_compress_state`.\n\n## Example usage\n\n```\n$ synapse_compress_state -p \"postgresql://localhost/synapse\" -r '!some_room:example.com' -o out.sql -t\nFetching state from DB for room '!some_room:example.com'...\nGot initial state from database. Checking for any missing state groups...\nNumber of state groups: 73904\nNumber of rows in current table: 2240043\nNumber of rows after compression: 165754 (7.40%)\nCompression Statistics:\n  Number of forced resets due to lacking prev: 34\n  Number of compressed rows caused by the above: 17092\n  Number of state groups changed: 2748\nNew state map matches old one\n\n# It's finished, so we can now go and rewrite the DB\n$ psql synapse \u003c out.sql\n```\n\n## Running Options\n\n- -p [POSTGRES_LOCATION] **Required**\nThe configuration for connecting to the Postgres database. 
This should be of the form\n`\"postgresql://username:password@mydomain.com/database\"` or a key-value pair\nstring: `\"user=username password=password dbname=database host=mydomain.com\"`.\nSee https://docs.rs/tokio-postgres/0.7.2/tokio_postgres/config/struct.Config.html\nfor the full details.\n\n- -r [ROOM_ID] **Required**\nThe room to process. This is the value found in the `rooms` table of the database,\nnot the common name for the room; it should look like `!wOlkWNmgkAZFxbTaqj:matrix.org`.\n\n- -b [MIN_STATE_GROUP]\nThe state group to start processing from (exclusive).\n\n- -n [GROUPS_TO_COMPRESS]\nHow many groups to load into memory to compress (starting\nfrom the first group in the room or the group specified by `-b`).\n\n- -l [LEVELS]\nSizes of each new level in the compression algorithm, as a comma-separated list.\nThe first entry in the list is for the lowest, most granular level, with each\nsubsequent entry being for the next highest level. The number of entries in the\nlist determines the number of levels that will be used. The sum of the sizes of\nthe levels affects the performance of fetching the state from the database, as the\nsum of the sizes is the upper bound on the number of iterations needed to fetch a\ngiven set of state. [defaults to \"100,50,25\"]\n\n- -m [COUNT]\nIf the compressor cannot save this many rows from the database then it will stop early.\n\n- -s [MAX_STATE_GROUP]\nIf a max_state_group is specified then only state groups with IDs lower than this\nnumber can be compressed.\n\n- -o [FILE]\nFile to output the SQL transactions to (for later running on the database).\n\n- -t\nIf this flag is set then each change to a particular state group is wrapped in a\ntransaction. This should be done if you wish to apply the changes while Synapse is\nstill running.\n\n- -c\nIf this flag is set then the changes the compressor makes will be committed to the\ndatabase. 
This should be safe to use while Synapse is running, as it wraps the changes\nto every state group in its own transaction (as if the transaction flag were set).\n\n- -g\nIf this flag is set then the tool outputs the node and edge information for the state_group\ndirected graph built up from the predecessor state_group links. These can be viewed\nin a tool such as Gephi (https://gephi.org).\n\n# Running tests\n\nThere are integration tests for these tools stored in `compressor_integration_tests/`.\n\nTo run the integration tests, you first need to start up a Postgres database\nfor the library to talk to. There is a docker-compose file that sets one up\nwith all of the correct tables. The tests can therefore be run as follows:\n\n```\n$ cd compressor_integration_tests/\n$ docker-compose up -d\n$ cargo test --workspace\n$ docker-compose down\n```\n\n# Using the synapse_compress_state library\n\nIf you want to use the compressor in another project, it is recommended that you\nuse [jemalloc](https://github.com/tikv/jemallocator).\n\nTo prevent the progress bars from being shown, use the `no-progress-bars` feature.\n(See `synapse_auto_compressor/Cargo.toml` for an example.)\n\n# Troubleshooting\n\n## Connecting to database\n\n### From local machine\n\nIf you set up Synapse using the instructions on https://matrix-org.github.io/synapse/latest/postgres.html\nyou should have a username and password to use to log in to the Postgres database. To run the compressor\nfrom the machine where Postgres is running, the URL will be the following:\n\n`postgresql://synapse_user:synapse_password@localhost/synapse`\n\n### From remote machine\n\nIf you wish to connect from a different machine, you'll need to edit your Postgres settings to allow\nremote connections. 
This requires updating the\n[`pg_hba.conf`](https://www.postgresql.org/docs/current/auth-pg-hba-conf.html) and the `listen_addresses`\nsetting in [`postgresql.conf`](https://www.postgresql.org/docs/current/runtime-config-connection.html).\n\n## Printing debugging logs\n\nThe amount of output the tools produce can be altered by setting the `RUST_LOG`\nenvironment variable to a log level such as `debug`.\n\nTo get more logs when running the synapse_auto_compressor tool, try the following:\n\n```\n$ RUST_LOG=debug synapse_auto_compressor -p postgresql://user:pass@localhost/synapse -c 50 -n 100\n```\n\nIf you want to suppress the debugging output from the Postgres client while keeping it\nfor the compressor, try:\n\n```\n$ RUST_LOG=synapse_auto_compressor=debug,synapse_compress_state=debug synapse_auto_compressor [etc.]\n```\n\nThis will only print the debugging information from those two packages. For more info see\nhttps://docs.rs/env_logger/0.9.0/env_logger/.\n\n## Building difficulties\n\nBuilding the `openssl-sys` dependency crate requires the OpenSSL development headers to be installed,\nand building on Linux will also require `pkg-config`.\n\nThese can be installed on Ubuntu with: `$ apt-get install libssl-dev pkg-config`\n\nNote that building requires quite a lot of memory and out-of-memory errors might not be\nobvious. 
It's recommended that you only build these tools on machines with at least 2GB of RAM.\n\n## Auto Compressor skips chunks when running on an already compressed room\n\nIf you have used the compressor before, with certain config options, the automatic tool will\nproduce lots of warnings of the form: `The compressor tried to increase the number of rows in ...`\n\nTo fix this, ensure that the chunk_size is set to at least the L1 level size (so if the level\nsizes are \"100,50,25\" then the chunk_size should be at least 100).\n\nNote: if the level sizes used when rerunning differ from those used previously,\nthis might lead to less efficient compression and thus chunks being skipped, but this shouldn't\nbe a large problem.\n\n## Compressor is trying to increase the number of rows\n\nBackfilling can lead to issues with compression. The synapse_auto_compressor will\nskip chunks whose size it can't reduce, which should help it jump over the backfilled\nstate_groups. Lots of state resolution might also impact the ability to use the compressor.\n\nTo examine the state_group hierarchy, run the manual tool on a room with the `-g` option\nand look at the graphs.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatrix-org%2Frust-synapse-compress-state","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmatrix-org%2Frust-synapse-compress-state","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmatrix-org%2Frust-synapse-compress-state/lists"}