{"id":13513483,"url":"https://github.com/postgrespro/zson","last_synced_at":"2025-04-05T13:09:36.241Z","repository":{"id":50717501,"uuid":"69675750","full_name":"postgrespro/zson","owner":"postgrespro","description":"ZSON is a PostgreSQL extension for transparent JSONB compression","archived":false,"fork":false,"pushed_at":"2023-04-14T20:26:36.000Z","size":126,"stargazers_count":549,"open_issues_count":0,"forks_count":21,"subscribers_count":37,"default_branch":"master","last_synced_at":"2025-03-28T10:02:07.499Z","etag":null,"topics":["compression","extensions","json","jsonb","postgresql"],"latest_commit_sha":null,"homepage":"http://eax.me/postgresql-extensions/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/postgrespro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-09-30T14:53:56.000Z","updated_at":"2025-03-06T10:31:17.000Z","dependencies_parsed_at":"2024-01-13T19:23:29.632Z","dependency_job_id":"25fac425-4bcc-4eba-887f-c674f0319c86","html_url":"https://github.com/postgrespro/zson","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fzson","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fzson/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fzson/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postgrespro%2Fzson/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/postgrespro","d
ownload_url":"https://codeload.github.com/postgrespro/zson/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247339158,"owners_count":20923014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","extensions","json","jsonb","postgresql"],"created_at":"2024-08-01T05:00:28.570Z","updated_at":"2025-04-05T13:09:36.216Z","avatar_url":"https://github.com/postgrespro.png","language":"C","funding_links":[],"categories":["C","Utilities"],"sub_categories":[],"readme":"# ZSON\n\n![ZSON Logo](img/zson-logo.png)\n\n## About\n\nZSON is a PostgreSQL extension for transparent JSONB compression. Compression is\nbased on a shared dictionary of strings most frequently used in specific JSONB\ndocuments (not only keys, but also values, array elements, etc).\n\nIn some cases ZSON can save half of your disk space and give you about 10% more\nTPS. Memory is saved as well. See [docs/benchmark.md](docs/benchmark.md).\nEverything depends on your data and workload, though. 
Don't believe any\nbenchmarks, re-check everything on your data, configuration, hardware, workload\nand PostgreSQL version.\n\nZSON was originally created in 2016 by [Postgres Professional][pgpro] team:\nresearched and coded by [Aleksander Alekseev][me]; ideas, code review, testing,\netc by [Alexander Korotkov][ak] and [Teodor Sigaev][ts].\n\n[me]: https://eax.me/\n[ak]: https://akorotkov.github.io/\n[ts]: http://www.sigaev.ru/\n[pgpro]: https://postgrespro.com/\n\nSee also discussions on [pgsql-general@][gen], [Hacker News][hn], [Reddit][rd]\nand [HabraHabr][habr].\n\n[gen]: https://www.postgresql.org/message-id/flat/20160930185801.38654a1c%40e754\n[hn]: https://news.ycombinator.com/item?id=12633486\n[rd]: https://www.reddit.com/r/PostgreSQL/comments/55mr4r/zson_postgresql_extension_for_transparent_jsonb/\n[habr]: https://habr.com/ru/company/postgrespro/blog/312006/\n\n## Install\n\nBuild and install ZSON:\n\n```\ncd /path/to/zson/source/code\nmake\nsudo make install\n```\n\nRun tests:\n\n```\nmake installcheck\n```\n\nConnect to PostgreSQL:\n\n```\npsql my_database\n```\n\nEnable extension:\n\n```\ncreate extension zson;\n```\n\n## Uninstall\n\nDisable extension:\n\n```\ndrop extension zson;\n```\n\nUninstall ZSON:\n\n```\ncd /path/to/zson/source/code\nsudo make uninstall\n```\n\n## Usage\n\nFirst ZSON should be *trained* on common data using zson\\_learn procedure:\n\n```\nzson_learn(\n    tables_and_columns text[][],\n    max_examples int default 10000,\n    min_length int default 2,\n    max_length int default 128,\n    min_count int default 2\n)\n```\n\nExample:\n\n```\nselect zson_learn('{{\"table1\", \"col1\"}, {\"table2\", \"col2\"}}');\n```\n\nYou can create a temporary table and write some common JSONB documents into it\nmanually or use the existing tables. The idea is to provide a subset of real\ndata.  Let's say some document *type* is twice as frequent as another document\ntype.  
ZSON expects that there will be twice as many documents of the first type\nas those of the second one in a learning set.\n\nThe resulting dictionary can be examined using this query:\n\n```\nselect * from zson_dict;\n```\n\nNow the ZSON type can be used as a complete and transparent replacement for the JSONB\ntype:\n\n```\nzson_test=# create table zson_example(x zson);\nCREATE TABLE\n\nzson_test=# insert into zson_example values ('{\"aaa\": 123}');\nINSERT 0 1\n\nzson_test=# select x -\u003e 'aaa' from zson_example;\n-[ RECORD 1 ]-\n?column? | 123\n```\n\n## Migrating to a new dictionary\n\nWhen the schema of JSONB documents evolves, ZSON can be *re-learned*:\n\n```\nselect zson_learn('{{\"table1\", \"col1\"}, {\"table2\", \"col2\"}}');\n```\n\nThis time a *second* dictionary will be created. Dictionaries are cached in memory,\nso it will take about a minute before ZSON realizes that there is a new\ndictionary. After that, old documents will be decompressed using the old\ndictionary and new documents will be compressed and decompressed using the new\ndictionary.\n\nTo find out which dictionary is used for a given ZSON document, use the zson\\_info\nprocedure:\n\n```\nzson_test=# select zson_info(x) from test_compress where id = 1;\n-[ RECORD 1 ]---------------------------------------------------\nzson_info | zson version = 0, dict version = 1, ...\n\nzson_test=# select zson_info(x) from test_compress where id = 2;\n-[ RECORD 1 ]---------------------------------------------------\nzson_info | zson version = 0, dict version = 0, ...\n```\n\nIf **all** ZSON documents are migrated to the new dictionary, the old one can\nbe safely removed:\n\n```\ndelete from zson_dict where dict_id = 0;\n```\n\nIn general, it's safer to keep old dictionaries just in case. 
Gaining a few KB\nof disk space is not worth the risk of losing data.\n\n## When is it time to re-learn?\n\nUnfortunately, it's hard to recommend a general approach.\n\nA good heuristic could be:\n\n```\nselect pg_table_size('tt') / (select count(*) from tt)\n```\n\n... i.e. average document size. When it suddenly starts to grow, it's time to\nre-learn.\n\nHowever, developers usually know when they change a schema significantly. It's\nalso easy to re-check whether the current schema differs a lot from the original\none using the zson\\_dict table.\n\n## Known limitations\n\nInstalling ZSON in a schema other than `public` is not supported (i.e. `CREATE EXTENSION zson WITH SCHEMA ...`).\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Fzson","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpostgrespro%2Fzson","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostgrespro%2Fzson/lists"}