{"id":21034431,"url":"https://github.com/zpoint/model-compressor","last_synced_at":"2025-12-27T12:08:07.350Z","repository":{"id":113778137,"uuid":"269117298","full_name":"zpoint/model-compressor","owner":"zpoint","description":"compressor json like orm models to binary format before cache to backend such as redis","archived":false,"fork":false,"pushed_at":"2020-08-04T10:09:34.000Z","size":126,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"develop","last_synced_at":"2025-01-20T15:53:58.867Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zpoint.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-03T14:57:15.000Z","updated_at":"2020-08-04T10:09:37.000Z","dependencies_parsed_at":"2023-03-13T23:01:38.028Z","dependency_job_id":null,"html_url":"https://github.com/zpoint/model-compressor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zpoint%2Fmodel-compressor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zpoint%2Fmodel-compressor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zpoint%2Fmodel-compressor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zpoint%2Fmodel-compressor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zpoint","download_url":"https
://codeload.github.com/zpoint/model-compressor/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243475370,"owners_count":20296713,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T13:04:27.420Z","updated_at":"2025-12-27T12:08:07.296Z","avatar_url":"https://github.com/zpoint.png","language":"Python","readme":"# model-compressor\n\n\n## Problem\n\nFor a relational database, we may build a big table and need to cache as much of its data as possible.\n\nWe can directly stringify our model and store it in a cache backend such as redis:\n\n```json\n{\"firstName\": \"Bill\", \"lastName\": \"Gates\", \"house\": \"111\", \"married\": true, \"has_child\": false, \"id\": \"063dc500-cbb4-4512-acdd-240596567e65\"}\n```\n\nOr we can rely on a language's built-in serializer, such as `pickle` in Python:\n\n```python3\npickle.dumps(a)\n'(dp0\\nVfirstName\\np1\\nVBill\\np2\\nsVhas_child\\np3\\nI00\\nsVlastName\\np4\\nVGates\\np5\\nsVmarried\\np6\\nI01\\nsVhouse\\np7\\nV111\\np8\\nsVid\\np9\\nV063dc500-cbb4-4512-acdd-240596567e65\\np10\\ns.'\n```\n\nBut what if we have a few hundred different fields?
And what if we have millions of hot records that need to be cached?\n\nEach copy of this hot data takes a few hundred GB of memory, and in a modern HA setup more than one copy has to be stored on different servers.\n\nIf we also back up the data to a different region, our bandwidth may suffer.\n\n## Solution\n\nWhat if we compress our data to a binary format before storing it in the cache backend?\n\nSince our records live in a database, the columns are fixed, and half of the columns are of type `boolean`, `datetime`, or `uuid`:\n\n1. omit all the key names and store only the values\n2. a `boolean` can be represented as 0 or 1, so 8 `boolean` fields can be packed together into 1 byte\n3. a `datetime` can be represented as a timestamp: a 4-byte integer instead of the 19-character string `2020-06-01 12:11:00`\n4. a `uuid` can be represented as two 8-byte integers instead of the 36-character string `\"063dc500-cbb4-4512-acdd-240596567e65\"`\n5. for string/unicode fields:\n   * For non-English text, scan all words in the database, collect the high-frequency words to build a word-to-binary map, and design a mixed binary/unicode format to represent these values (we can't lose any data)\n     * we could try deflate/LZW or other popular compression algorithms, but they may not work well on a single record field\n   * For English text, a huffman tree will be enough\n\nFor a real record, manual compression produced a result 25 times smaller than the original stringified record.\n\n\n\n## RoadMap\n\n* [ ] a framework supporting multiple ORM models and multiple cache backends in Python\n\n* [ ] a pattern file that can be generated from ORM models automatically or written by hand\n* [ ] a core compressor that encodes models according to the pattern file and source data, written in C/C++ for performance\n\n![design](./compress.png)\n\n\n\n## Usage\n\nTo be 
continued...","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzpoint%2Fmodel-compressor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzpoint%2Fmodel-compressor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzpoint%2Fmodel-compressor/lists"}