{"id":31052938,"url":"https://github.com/timanema/msc-thesis-public","last_synced_at":"2026-04-19T03:03:04.405Z","repository":{"id":308898559,"uuid":"1004806168","full_name":"timanema/msc-thesis-public","owner":"timanema","description":"Repository containing a GPU-accelerated compressor based on FSST","archived":false,"fork":false,"pushed_at":"2025-08-30T07:08:01.000Z","size":21777,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-09-15T01:37:11.960Z","etag":null,"topics":["compression","cpp","cuda","gpu","thesis"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timanema.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-19T07:56:50.000Z","updated_at":"2025-08-11T14:47:45.000Z","dependencies_parsed_at":"2025-08-08T15:44:36.331Z","dependency_job_id":null,"html_url":"https://github.com/timanema/msc-thesis-public","commit_stats":null,"previous_names":["timanema/msc-thesis-public"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/timanema/msc-thesis-public","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timanema%2Fmsc-thesis-public","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timanema%2Fmsc-thesis-public/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timanema%2Fmsc-thesis-public/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/timanema%2Fmsc-thesis-public/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timanema","download_url":"https://codeload.github.com/timanema/msc-thesis-public/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timanema%2Fmsc-thesis-public/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31992823,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T20:23:30.271Z","status":"online","status_checked_at":"2026-04-19T02:00:07.110Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","cpp","cuda","gpu","thesis"],"created_at":"2025-09-15T01:36:57.550Z","updated_at":"2026-04-19T03:03:04.382Z","avatar_url":"https://github.com/timanema.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Thesis project Tim Anema\nThis GitHub repository contains my compression pipeline implementations and my evaluation data.\nIt can be used to validate the compression pipelines, check my results, and also to continue my work.\n\nNote that this repository does not include the version integrated with GSST, as this was done with a closed-source copy\nof the GSST decompressor. 
This version has a closed source license from Voltron Data.\n\n## Thesis abstract\nThis thesis presents a GPU-accelerated string compression algorithm based on FSST (_Fast Static Symbol Table_).\nThe proposed compressor leverages several advanced CUDA techniques to optimize performance, including a voting mechanism that maximizes memory bandwidth and an efficient gathering pipeline utilizing stream compaction.\nAdditionally, the algorithm uses GPU compute capacity to support a memory-efficient encoding table through a space-time tradeoff.\n\nThe compression task is parallelized by tiling input data and adapting the data layout.\nWe introduce multiple compression pipelines, each with distinct tradeoffs.\nTo maximize encoding kernel throughput, the design introduces sliding windows and output packing to optimize register use and maximize effective memory bandwidth.\nPipeline-level throughput is further enhanced by introducing pipelined transposition stages and stream compaction to remove intermediate padding efficiently.\n\nWe evaluate these pipelines across several benchmark datasets and compare the best-performing version against state-of-the-art GPU compression algorithms, including nvCOMP, GPULZ, and compressors generated using the LC framework.\nThe proposed compressor achieves a throughput of 74GB/s on an RTX4090 while maintaining compression ratios comparable to FSST.\nIn terms of compression ratio, it consistently outperforms ANS, Bitcomp, Cascaded, and GPULZ across all datasets.\nIts overall throughput exceeds that of GPULZ and all nvCOMP compressors except ANS, Bitcomp, Cascaded, and those produced by the LC framework.\nOur compressor lies on the Pareto frontier for all evaluated datasets, advancing the state-of-the-art toward ideal compression.\nIt achieves near-identical compression ratios to FSST (except for machine-readable datasets), while achieving a speedup of 42.06x.\nCompared to multithreaded CPU compression, it achieves a 6.45x speedup.\n\nTo 
assess end-to-end performance, we integrate the compressor with the GSST decompressor. The resulting (de)compression pipeline achieves a combined throughput of 55GB/s, outperforming uncompressed data transfer on links with a bandwidth up to 37.5 GB/s.\nIt also outperforms all state-of-the-art (de)compressors when the link bandwidth ranges between 3GB/s and 20GB/s.\n\nWhile further research is needed to enhance robustness and integrate the compressor into analytical engines, this work demonstrates a viable and Pareto-optimal alternative to existing string compression methods.\n\n## Instructions\nThe repository is organized as follows:\n```bash\n.\n├── data                    # Experimental results\n├── include                 # Header files\n│   ├── bench             # Benchmark files\n│   ├── compressors       # Actual compressors\n│   ├── fsst              # Modified version of FSST\n│   └── gtsst             # Encoding tables, symbols, shared code\n└── src                     # Source files\n    ├── bench\n    ├── compressors\n    └── fsst\n\n```\nEvery (interesting) compression pipeline has three header files: `*-compressor.cuh`, `*-defines.cuh`, and `*-encode.cuh`.\nThese contain the public methods, parameter definitions, and private definitions, respectively.\nAll compressors implement this template:\n```c++\nstruct CompressionManager\n{\n    virtual ~CompressionManager() = default;\n    virtual CompressionConfiguration configure_compression(size_t buf_size) = 0;\n    virtual GTSSTStatus compress(const uint8_t* src, uint8_t* dst, const uint8_t* sample_src, uint8_t* tmp,\n                                 CompressionConfiguration\u0026 config, size_t* out_size,\n                                 CompressionStatistics\u0026 stats) = 0;\n\n    virtual DecompressionConfiguration configure_decompression(size_t buf_size) = 0;\n\n    virtual DecompressionConfiguration configure_decompression_from_compress(\n        const size_t buf_size, 
CompressionConfiguration\u0026 config)\n    {\n        return DecompressionConfiguration{\n            .input_buffer_size = buf_size,\n            .decompression_buffer_size = config.input_buffer_size,\n        };\n    }\n\n    virtual GTSSTStatus decompress(const uint8_t* src, uint8_t* dst, DecompressionConfiguration\u0026 config,\n                                   size_t* out_size) = 0;\n\nprivate:\n    virtual GTSSTStatus validate_compression_buffers(const uint8_t* src, uint8_t* dst, uint8_t* tmp,\n                                                     CompressionConfiguration\u0026 config) = 0;\n};\n```\n\n### Building the project\nTo build the project, you need at least the [CUDA development library](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/) and CMake 3.25.2 installed, but a more complete C++/CUDA environment is recommended.\nYou can then build the project with the following commands:\n```bash\ncmake . -B build -DCMAKE_BUILD_TYPE=Release\ncd build/\nmake\n```\n\n### Running the project\nOnce you have built the project, the executable can simply be run with `./gtsst`.\n\nHowever, this will likely result in the following error:\n`Error: filesystem error: directory iterator cannot open directory: No such file or directory [../../thesis-testing/lineitem-1gb/]`\nThis is because, by default, the project uses this directory to load data.\nThe directories to use can be given as program arguments:\n`./gtsst ../../thesis-testing/lineitem-1gb/ ../../thesis-testing/lineitem-0.5gb/`\n\nBy default, the V5T pipeline is used to perform 100 compression iterations on all files in the given directories and\na single validation decompression. 
This can be changed by modifying the `main.cu` file:\n```c++\nint main(int argc, char* argv[]) {\n    const bool use_override = argc \u003e= 2;\n\n    // Set directories to use\n    std::vector\u003cstd::string\u003e directories = {\n        \"../../thesis-testing/lineitem-1gb/\",\n    };\n\n    if (use_override) {\n        directories.clear();\n\n        for (int i = 1; i \u003c argc; i++) {\n            directories.emplace_back(argv[i]);\n        }\n    }\n\n    // Uncomment the compressor you want to test (only one)\n    // gtsst::compressors::CompactionV1Compressor compressor;\n    // gtsst::compressors::CompactionV2Compressor compressor;\n    // gtsst::compressors::CompactionV3Compressor compressor;\n    // gtsst::compressors::CompactionV4Compressor compressor;\n    // gtsst::compressors::CompactionV3TCompressor compressor;\n    // gtsst::compressors::CompactionV4TCompressor compressor;\n    gtsst::compressors::CompactionV5TCompressor compressor;\n    // gtsst::compressors::CompactionV5TGSSTCompressor compressor; \u003c---- NOT PUBLICLY DISTRIBUTED DUE TO CLOSED LICENSE FROM VOLTRON DATA\n\n    // Set bench settings\n    constexpr int compression_iterations = 100;\n    constexpr int decompression_iterations = 1;\n    constexpr bool strict_checking = false; // Exit program when a single decompression mismatch occurs, otherwise only report it\n\n    // Run benchmark (use_dir=true if all files in the directory must be used, otherwise uses first file only)\n    const bool match = gtsst::bench::full_cycle_directory(directories, false, compression_iterations,\n                                                          decompression_iterations, compressor, false, strict_checking);\n    if (!match) {\n        std::cerr \u003c\u003c \"Cycle data mismatch.\" \u003c\u003c std::endl;\n        return 1;\n    }\n\n    return 0;\n}\n```\nThe default directories can be modified, the compressor can be chosen, and the 
number of iterations can be selected.\n\nNote that the current version expects that the data does not contain the byte `0xfe` (254).\nIf it does, it will break. This is expected behaviour and can be fixed relatively easily, but I didn't since it's not really an issue for text data (see the thesis 'Future work' section).\nYou can check if this is happening by running the compressor in debug mode: `cmake . -B build -DCMAKE_BUILD_TYPE=Debug`\n\n### Output data\nThe output will be in the following format:\n```\nStrict output checking is disabled, decompressed data might not match original data!\nencoding: 9.715\nencoding: 13.033\nencoding: 17.569\nencoding: 17.356\nencoding: 17.523\ndecomp: 32.744\ndecomp: 34.543\ndecomp: 35.853\ndecomp: 34.919\ndecomp: 34.763\nerror: 675819520 -\u003e 32 != 0\nerror: 675819521 -\u003e 97 != 0\nerror: 675819522 -\u003e 114 != 0\nCycles (5, 5) completed. Stats:\n\tParameters:\n\t\tBlock size: 2621440\n\t\tInput size: 988282880\n\t\tEffective table size: 62914560\n\t\tFile name: ../../thesis-testing/lineitem-1gb/lineitem-comments-1gb-1.txt\n\tCompression:\n\t\tDuration (us): 85116 \n\t\tThroughput (GB/s): 12.192\n\t\tCompressed size: 351846452\n\tDecompression:\n\t\tDuration (us): 28617\n\t\tThroughput (GB/s): 34.535\n\t\tRatio: 2.8088\n\tCompression phases:\n\t\tTable generation (GB/s, us): 182.880 (5404)\n\t\tPrecomputation (GB/s, us): 31880.093 (31)\n\t\tEncoding (GB/s, us): 14.235 (69428)\n\t\tPostprocessing (GB/s, us): 112.063 (8819)\n```\nThe first line indicates that `strict_checking` was set to `false` (required for GSST).\nThen the individual encoding and decompression throughputs are reported for every iteration.\nIf there are any differences in the decompressed data compared to the original data, their locations and values are reported.\nFinally, a summary is printed. 
This contains the (average) throughput for compression\n(and the individual stages) and decompression, as well as the compression ratio.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimanema%2Fmsc-thesis-public","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimanema%2Fmsc-thesis-public","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimanema%2Fmsc-thesis-public/lists"}