{"id":26205198,"url":"https://github.com/davidepianca98/kmtfcompressor","last_synced_at":"2026-04-26T07:35:17.655Z","repository":{"id":70101299,"uuid":"461573031","full_name":"davidepianca98/kMTFCompressor","owner":"davidepianca98","description":"Context aware Move To Front Transform based compressor","archived":false,"fork":false,"pushed_at":"2023-02-27T16:52:58.000Z","size":234,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-12-27T17:22:49.429Z","etag":null,"topics":["compression-algorithm","entropy","kmer","move-to-front","stream-compression"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davidepianca98.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-02-20T18:07:59.000Z","updated_at":"2024-10-18T09:50:36.000Z","dependencies_parsed_at":"2023-02-21T19:45:35.826Z","dependency_job_id":null,"html_url":"https://github.com/davidepianca98/kMTFCompressor","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/davidepianca98/kMTFCompressor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidepianca98%2FkMTFCompressor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidepianca98%2FkMTFCompressor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidepi
anca98%2FkMTFCompressor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidepianca98%2FkMTFCompressor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davidepianca98","download_url":"https://codeload.github.com/davidepianca98/kMTFCompressor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davidepianca98%2FkMTFCompressor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32289926,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-26T06:26:00.361Z","status":"ssl_error","status_checked_at":"2026-04-26T06:25:58.791Z","response_time":129,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression-algorithm","entropy","kmer","move-to-front","stream-compression"],"created_at":"2025-03-12T04:37:28.914Z","updated_at":"2026-04-26T07:35:17.651Z","avatar_url":"https://github.com/davidepianca98.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# kMTFCompressor\nContext aware Move To Front Transform based compressor. 
Bachelor thesis project.\n\nC++ implementation of a compression algorithm based on the Move To Front (MTF) transform.\n\nThe thesis presents the theoretical basis and concepts, the implementation details and the experimental analysis comparing our algorithm with state-of-the-art compression software.\nThe result is a tunable and very fast stream algorithm with some theoretical guarantees. On the other hand, the algorithm needs a considerable amount of memory to work well and does not match the compression ratio of state-of-the-art compressors.\n\n## High-level description (translated excerpt of the thesis)\n\nThe generalized Move To Front transform uses a predetermined number of symbols preceding the current symbol (its context) to make a better prediction.\nThe classic MTF transform uses only the current symbol: it needs less memory and execution time, but gives worse results unless the input exhibits local correlations.\nFor this reason it is usually applied after other transforms; in Bzip2, for example, it follows the Burrows-Wheeler Transform, which improves its performance.\nThe aim is to avoid block-sorting transforms of this kind, which are costly and require the input to be processed in blocks.\n\nThe algorithm keeps track of the contexts seen so far in the data stream in a hash table, and maintains a dedicated MTF list for each context.\nNew contexts may overwrite existing ones, depending on the hash function, the load factor and the collision resolution technique applied.\nEach symbol is transformed using the list of its context: the transform returns the symbol's index in that list and moves the symbol to the front, ahead of all the other symbols.\nIf the symbol is not in the list, a special code is returned and the symbol is inserted into the list. The code is the symbol's value (interpreted as an integer) plus the maximum size of the list.\nThis is possible because the length of each list is capped to reduce memory usage, so every valid index is smaller than the maximum size. 
This works well because the longer the context, the fewer distinct symbols usually follow it (at least in structured data).\nThe decoder distinguishes a list index from a new symbol by simply comparing the value with the maximum size of the list.\n\n![Distribution of input symbols vs. output values](images/MTF.png?raw=true)\n\nThe figure shows the distribution of the input symbols versus the output values on a sample text file.\nMost output values are MTF list indices, i.e. values between 0 and 7, since the lists have length 8 in this example.\nThe other values are the codes emitted the first time each symbol is sent in each context.\nThe resulting data stream is easier to compress because a few values account for most of the output.\n\n## Additional compression steps\nFurther steps have been implemented to run after the Context aware Move To Front Transform:\n  * Run-Length Encoding\n  * Adaptive Huffman coding\n  * Adaptive arithmetic coding\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidepianca98%2Fkmtfcompressor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavidepianca98%2Fkmtfcompressor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavidepianca98%2Fkmtfcompressor/lists"}