{"id":32139513,"url":"https://github.com/celerity/ndzip","last_synced_at":"2025-12-11T22:46:37.796Z","repository":{"id":37859429,"uuid":"292009161","full_name":"celerity/ndzip","owner":"celerity","description":"A High-Throughput Parallel Lossless Compressor for Scientific Data","archived":false,"fork":false,"pushed_at":"2023-01-22T20:08:32.000Z","size":643,"stargazers_count":73,"open_issues_count":7,"forks_count":16,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-12-07T07:01:17.171Z","etag":null,"topics":["compression","cuda","floating-point","gpu","simd","sycl"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/celerity.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-09-01T13:43:53.000Z","updated_at":"2025-10-24T02:30:04.000Z","dependencies_parsed_at":"2023-02-12T18:00:45.690Z","dependency_job_id":null,"html_url":"https://github.com/celerity/ndzip","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/celerity/ndzip","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celerity%2Fndzip","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celerity%2Fndzip/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celerity%2Fndzip/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celerity%2Fndzip/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/celerity","download_url":"https://codeload.github.com/celerity/ndzip/tar.gz/refs/heads/master","sbom_url":"https
://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/celerity%2Fndzip/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27672013,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-11T02:00:11.302Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compression","cuda","floating-point","gpu","simd","sycl"],"created_at":"2025-10-21T05:53:26.645Z","updated_at":"2025-12-11T22:46:37.767Z","avatar_url":"https://github.com/celerity.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ndzip: A High-Throughput Parallel Lossless Compressor for Scientific Data\n\nndzip compresses and decompresses multidimensional univariate grids of single- and double-precision IEEE 754\nfloating-point data. 
We implement\n\n- a single-threaded CPU compressor\n- an OpenMP-backed multi-threaded compressor\n- a SYCL-based GPU compressor (currently hipSYCL + NVIDIA only)\n- a CUDA-based GPU compressor\n\nAll variants generate and decode bit-identical compressed streams.\n\nndzip is currently a research project with the primary use case of speeding up distributed HPC applications by\nincreasing effective interconnect bandwidth.\n\n## Prerequisites\n\n- CMake \u003e= 3.15\n- Clang \u003e= 10.0.0\n- Linux (tested on x86_64 and POWER9)\n- Boost \u003e= 1.66\n- [Catch2](https://github.com/catchorg/Catch2) \u003e= 2.13.3 (optional, for unit tests and microbenchmarks)\n\n### Additionally, for GPU support\n\n- CUDA \u003e= 11.0 (not officially compatible with Clang 10/11, but a lower version will optimize insufficiently!)\n- An NVIDIA GPU with Compute Capability \u003e= 6.1\n- For the SYCL version: [hipSYCL](https://github.com/illuhad/hipSYCL) \u003e= `8756087f`\n\n## Building\n\nMake sure to set the right build type and enable the full instruction set of the target CPU architecture:\n\n```sh\n-DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=\"-march=native\"\n```\n\nIf unit tests and microbenchmarks should also be built, add\n\n```sh\n-DNDZIP_BUILD_TEST=YES\n```\n\nDepending on your system, you might have to configure the correct C/C++ compilers to use (Clang \u003e= 10.0 and\nGCC \u003e= 8.2 have been known to work in the past):\n\n```sh\n-DCMAKE_C_COMPILER=/path/to/cc -DCMAKE_CXX_COMPILER=/path/to/c++\n```\n\n### For GPU support with SYCL\n\n1. Build and install hipSYCL\n\n```\ngit clone https://github.com/illuhad/hipSYCL\ncd hipSYCL\ncmake -B build -DCMAKE_INSTALL_PREFIX=../hipSYCL-install -DWITH_CUDA_BACKEND=YES -DCMAKE_BUILD_TYPE=Release\ncmake --build build --target install -j\n```\n\n2. 
Build ndzip with SYCL\n\n```\ncmake -B build -DCMAKE_PREFIX_PATH='../hipSYCL-install/lib/cmake' -DHIPSYCL_PLATFORM=cuda -DCMAKE_CUDA_ARCHITECTURES=75 -DHIPSYCL_GPU_ARCH=sm_75 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=\"-U__FLOAT128__ -U__SIZEOF_FLOAT128__ -march=native\"\ncmake --build build -j\n```\n\nReplace `sm_75` and `75` with the string matching your GPU's Compute Capability. The `-U__FLOAT128__` define is required\ndue to an [open bug in Clang](https://bugs.llvm.org/show_bug.cgi?id=47559).\n\n### For GPU support with CUDA (experimental)\n\na) Either build ndzip with CUDA + NVCC ...\n\n```\ncmake -B build -DCMAKE_CUDA_ARCHITECTURES=75 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=\"-march=native\"\ncmake --build build -j\n```\n\nReplace `75` with the value matching your GPU's Compute Capability.\n\nIf `CMAKE_CXX_COMPILER` was redefined above, you also need to specify the CUDA host compiler:\n\n```sh\n-DCMAKE_CUDA_HOST_COMPILER=/path/to/c++\n```\n\nb) ... or with CUDA + Clang\n\n```\ncmake -B build -DCMAKE_CUDA_COMPILER=\"$(which clang++)\" -DCMAKE_CUDA_ARCHITECTURES=75 -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS=\"-U__FLOAT128__ -U__SIZEOF_FLOAT128__ -march=native\"\ncmake --build build -j\n```\n\nThe `-U__FLOAT128__` define is required due to an [open bug in Clang](https://bugs.llvm.org/show_bug.cgi?id=47559).\n\n## Compressing and decompressing files\n\n```sh\nbuild/compress -n \u003csize\u003e -i \u003cuncompressed-file\u003e -o \u003ccompressed-file\u003e [-t float|double]\nbuild/compress -d -n \u003csize\u003e -i \u003ccompressed-file\u003e -o \u003cdecompressed-file\u003e [-t float|double]\n```\n\n`\u003csize\u003e` consists of one to three arguments depending on the dimensionality of the input grid. In the multi-dimensional case,\nthe first number specifies the width of the slowest-iterating dimension.\n\nBy default, `compress` uses the single-threaded CPU compressor. 
Passing `-e cpu-mt` or `-e sycl` / `-e cuda` selects the\nmulti-threaded CPU compressor or the GPU compressor if available, respectively.\n\n## Running unit tests\n\nOnly available if tests have been enabled during build.\n\n```sh\nbuild/encoder_test\nbuild/sycl_bits_test  # only if built with SYCL support\nbuild/sycl_ubench     # GPU microbenchmarks, only if built with SYCL support\nbuild/cuda_bits_test  # only if built with CUDA support\n```\n\n## See also\n\n- [Benchmarking ndzip](docs/benchmarking.md)\n\n## References\n\nIf you are using ndzip as part of your research, we kindly ask you to cite\n\n- Fabian Knorr, Peter Thoman, and Thomas Fahringer. \"ndzip: A High-Throughput Parallel Lossless Compressor for\n  Scientific Data\". In _2021 Data Compression Conference (DCC)_, IEEE,\n  2021. [[DOI]](https://doi.org/10.1109/DCC50243.2021.00018) [[Preprint PDF]](https://dps.uibk.ac.at/~fabian/publications/2021-ndzip-a-high-throughput-parallel-lossless-compressor-for-scientific-data.pdf)\n\n- Fabian Knorr, Peter Thoman, and Thomas Fahringer. \"ndzip-gpu: efficient lossless compression of scientific floating-point data on GPUs\". In _SC'21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis_, ACM, 2021. [[DOI]](https://doi.org/10.1145/3458817.3476224) [[Preprint PDF]](https://dps.uibk.ac.at/~fabian/publications/2021-ndzip-gpu-efficient-lossless-compression-of-scientific-floating-point-data-on-gpus.pdf)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcelerity%2Fndzip","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcelerity%2Fndzip","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcelerity%2Fndzip/lists"}