{"id":27011661,"url":"https://github.com/eth-cscs/spfft","last_synced_at":"2025-06-17T21:39:36.054Z","repository":{"id":36193326,"uuid":"204918071","full_name":"eth-cscs/SpFFT","owner":"eth-cscs","description":"Sparse 3D FFT library with MPI, OpenMP, CUDA and ROCm support","archived":false,"fork":false,"pushed_at":"2024-03-25T07:50:11.000Z","size":661,"stargazers_count":47,"open_issues_count":2,"forks_count":11,"subscribers_count":12,"default_branch":"master","last_synced_at":"2024-03-25T08:56:58.300Z","etag":null,"topics":["cuda","fft","fft-library","gpu-acceleration","hpc","mpi","rocm"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/eth-cscs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-28T11:41:55.000Z","updated_at":"2024-07-29T20:25:16.672Z","dependencies_parsed_at":"2024-03-25T08:48:32.959Z","dependency_job_id":"0fdabd96-32ca-417f-8e04-79a6c43d21e6","html_url":"https://github.com/eth-cscs/SpFFT","commit_stats":{"total_commits":112,"total_committers":4,"mean_commits":28.0,"dds":"0.044642857142857095","last_synced_commit":"aa6653f044dc8f6dbf5dc7befe45db7ce353938e"},"previous_names":[],"tags_count":21,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-cscs%2FSpFFT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-cscs%2FSpFFT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-cscs%2FSpFFT/releases","manifests_url":"https://r
epos.ecosyste.ms/api/v1/hosts/GitHub/repositories/eth-cscs%2FSpFFT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/eth-cscs","download_url":"https://codeload.github.com/eth-cscs/SpFFT/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247173256,"owners_count":20896053,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","fft","fft-library","gpu-acceleration","hpc","mpi","rocm"],"created_at":"2025-04-04T11:36:25.214Z","updated_at":"2025-04-04T11:36:25.807Z","avatar_url":"https://github.com/eth-cscs.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![CI](https://github.com/eth-cscs/SpFFT/workflows/CI/badge.svg)](https://github.com/eth-cscs/SpFFT/actions?query=workflow%3ACI)\n[![conda-forge](https://img.shields.io/conda/vn/conda-forge/spfft.svg?style=flat)](https://anaconda.org/conda-forge/spfft)\n[![Documentation](https://readthedocs.org/projects/spfft/badge/?version=latest)](https://spfft.readthedocs.io/en/latest/?badge=latest)\n[![License](https://img.shields.io/badge/license-BSD-blue.svg)](https://raw.githubusercontent.com/eth-cscs/SpFFT/master/LICENSE)\n\n# SpFFT\nSpFFT - A 3D FFT library for sparse frequency domain data written in C++ with support for MPI, OpenMP, CUDA and ROCm.\n\nInspired by the need of some computational material science applications with spherical cutoff data in frequency domain, SpFFT provides Fast Fourier Transformations of sparse frequency domain data. 
For distributed computations with MPI, slab decomposition in space domain and pencil decomposition in frequency domain (sparse data within a pencil / column must be on one rank) are used.\n\n\u003cimg src=\"docs/images/sparse_to_dense.png\" alt=\"\" width=70% /\u003e\n\n***Fig. 1:*** Illustration of a transform, where data on each MPI rank is identified by color.\n\n### Design Goals\n- Sparse frequency domain input\n- Reuse of pre-allocated memory\n- Support for shifted indexing with centered zero-frequency\n- Optional parallelization and GPU acceleration\n- Unified interface for calculations on CPUs and GPUs\n- Support of Complex-To-Real and Real-To-Complex transforms, where the full hermitian symmetry property is utilized\n- C++, C and Fortran interfaces\n\n### Interface Design\nTo allow for pre-allocation and reuse of memory, the design is based on two classes:\n\n- **Grid**: Provides memory for transforms up to a given size.\n- **Transform**: Created with information on sparse input data and is associated with a *Grid*. Maximum size is limited by *Grid* dimensions. Internal reference counting to *Grid* objects guarantees a valid state until *Transform* object destruction.\n\nA transform can be computed in-place or out-of-place. Additionally, an internally allocated work buffer can optionally be used for input / output of space domain data.\n\n### New Features in v1.0\n- Support for externally allocated memory for space domain data including in-place and out-of-place transforms\n- Optional asynchronous computation when using GPUs\n- Simplified / direct transform handle creation if no resource reuse through grid handles is required\n\n## Documentation\nDocumentation can be found [here](https://spfft.readthedocs.io/en/latest/).\n\n## Requirements\n- C++ Compiler with C++17 support. 
Supported compilers are:\n  - GCC 7 and later\n  - Clang 5 and later\n  - ICC 19.0 and later\n- CMake 3.18 and later (3.21 for ROCm)\n- Library providing a FFTW 3.x interface (FFTW3 or Intel MKL)\n- For multi-threading: OpenMP support by the compiler\n- For compilation with GPU support:\n  - CUDA 11.0 and later for Nvidia hardware\n  - ROCm 5.0 and later for AMD hardware\n\n## Installation\nThe build system follows the standard CMake workflow. Example:\n```console\nmkdir build\ncd build\ncmake .. -DSPFFT_OMP=ON -DSPFFT_MPI=ON -DSPFFT_GPU_BACKEND=CUDA -DSPFFT_SINGLE_PRECISION=OFF -DCMAKE_INSTALL_PREFIX=/usr/local\nmake -j8 install\n```\n\n### CMake options\n| Option                 | Default | Description                                                  |\n|------------------------|---------|--------------------------------------------------------------|\n| SPFFT_MPI              | ON      | Enable MPI support                                           |\n| SPFFT_OMP              | ON      | Enable multi-threading with OpenMP                           |\n| SPFFT_GPU_BACKEND      | OFF     | Select GPU backend. Can be OFF, CUDA or ROCM                 |\n| SPFFT_GPU_DIRECT       | OFF     | Use GPU aware MPI with GPUDirect                             |\n| SPFFT_SINGLE_PRECISION | OFF     | Enable single precision support                              |\n| SPFFT_STATIC           | OFF     | Build as static library                                      |\n| SPFFT_FFTW_LIB         | AUTO    | Library providing a FFTW interface. 
Can be AUTO, MKL or FFTW |\n| SPFFT_BUILD_TESTS      | OFF     | Build test executables for development purposes              |\n| SPFFT_INSTALL          | ON      | Add library to install target                                |\n| SPFFT_FORTRAN          | OFF     | Build Fortran interface module                               |\n| SPFFT_BUNDLED_LIBS     | ON      | Download required libraries for building tests               |\n\n**_NOTE:_**  When compiling with CUDA or ROCM (HIP), the standard `CMAKE_CUDA_ARCHITECTURES` and `CMAKE_HIP_ARCHITECTURES` options should be defined as well. `HIP_HCC_FLAGS` is no longer in use.\n\n## Examples\nFurther examples for C++, C and Fortran can be found in the \"examples\" folder.\n```cpp\n#include \u003ccomplex\u003e\n#include \u003ciostream\u003e\n#include \u003cvector\u003e\n\n#include \"spfft/spfft.hpp\"\n\nint main(int argc, char** argv) {\n  const int dimX = 2;\n  const int dimY = 2;\n  const int dimZ = 2;\n\n  std::cout \u003c\u003c \"Dimensions: x = \" \u003c\u003c dimX \u003c\u003c \", y = \" \u003c\u003c dimY \u003c\u003c \", z = \" \u003c\u003c dimZ \u003c\u003c std::endl\n            \u003c\u003c std::endl;\n\n  // Use default OpenMP value\n  const int numThreads = -1;\n\n  // Use all elements in this example.\n  const int numFrequencyElements = dimX * dimY * dimZ;\n\n  // Slice length in space domain. 
Equivalent to dimZ for non-distributed case.\n  const int localZLength = dimZ;\n\n  // Interleaved complex numbers\n  std::vector\u003cdouble\u003e frequencyElements;\n  frequencyElements.reserve(2 * numFrequencyElements);\n\n  // Indices of frequency elements\n  std::vector\u003cint\u003e indices;\n  indices.reserve(dimX * dimY * dimZ * 3);\n\n  // Initialize frequency domain values and indices\n  double initValue = 0.0;\n  for (int xIndex = 0; xIndex \u003c dimX; ++xIndex) {\n    for (int yIndex = 0; yIndex \u003c dimY; ++yIndex) {\n      for (int zIndex = 0; zIndex \u003c dimZ; ++zIndex) {\n        // init with interleaved complex numbers\n        frequencyElements.emplace_back(initValue);\n        frequencyElements.emplace_back(-initValue);\n\n        // add index triplet for value\n        indices.emplace_back(xIndex);\n        indices.emplace_back(yIndex);\n        indices.emplace_back(zIndex);\n\n        initValue += 1.0;\n      }\n    }\n  }\n\n  std::cout \u003c\u003c \"Input:\" \u003c\u003c std::endl;\n  for (int i = 0; i \u003c numFrequencyElements; ++i) {\n    std::cout \u003c\u003c frequencyElements[2 * i] \u003c\u003c \", \" \u003c\u003c frequencyElements[2 * i + 1] \u003c\u003c std::endl;\n  }\n\n  // Create local Grid. 
For distributed computations, a MPI Communicator has to be provided\n  spfft::Grid grid(dimX, dimY, dimZ, dimX * dimY, SPFFT_PU_HOST, numThreads);\n\n  // Create transform.\n  // Note: A transform handle can be created without a grid if no resource sharing is desired.\n  spfft::Transform transform =\n      grid.create_transform(SPFFT_PU_HOST, SPFFT_TRANS_C2C, dimX, dimY, dimZ, localZLength,\n                            numFrequencyElements, SPFFT_INDEX_TRIPLETS, indices.data());\n\n\n  ///////////////////////////////////////////////////\n  // Option A: Reuse internal buffer for space domain\n  ///////////////////////////////////////////////////\n\n  // Transform backward\n  transform.backward(frequencyElements.data(), SPFFT_PU_HOST);\n\n  // Get pointer to buffer with space domain data. Is guaranteed to be castable to a valid\n  // std::complex pointer. Using the internal working buffer as input / output can help reduce\n  // memory usage.\n  double* spaceDomainPtr = transform.space_domain_data(SPFFT_PU_HOST);\n\n  std::cout \u003c\u003c std::endl \u003c\u003c \"After backward transform:\" \u003c\u003c std::endl;\n  for (int i = 0; i \u003c transform.local_slice_size(); ++i) {\n    std::cout \u003c\u003c spaceDomainPtr[2 * i] \u003c\u003c \", \" \u003c\u003c spaceDomainPtr[2 * i + 1] \u003c\u003c std::endl;\n  }\n\n  /////////////////////////////////////////////////\n  // Option B: Use external buffer for space domain\n  /////////////////////////////////////////////////\n\n  std::vector\u003cdouble\u003e spaceDomainVec(2 * transform.local_slice_size());\n\n  // Transform backward\n  transform.backward(frequencyElements.data(), spaceDomainVec.data());\n\n  // Transform forward\n  transform.forward(spaceDomainVec.data(), frequencyElements.data(), SPFFT_NO_SCALING);\n\n  // Note: In-place transforms are also supported by passing the same pointer for input and output.\n\n  std::cout \u003c\u003c std::endl \u003c\u003c \"After forward transform (without 
normalization):\" \u003c\u003c std::endl;\n  for (int i = 0; i \u003c numFrequencyElements; ++i) {\n    std::cout \u003c\u003c frequencyElements[2 * i] \u003c\u003c \", \" \u003c\u003c frequencyElements[2 * i + 1] \u003c\u003c std::endl;\n  }\n\n  return 0;\n}\n```\n\n## Acknowledgements\nThis work was supported by:\n\n\n|![ethz](docs/images/logo_ethz.png) | [**Swiss Federal Institute of Technology in Zurich**](https://www.ethz.ch/) |\n|:----:|:----:|\n|![cscs](docs/images/logo_cscs.png) | [**Swiss National Supercomputing Centre**](https://www.cscs.ch/)            |\n|![max](docs/images/logo_max.png)  | [**MAterials design at the eXascale**](http://www.max-centre.eu) \u003cbr\u003e (Horizon2020, grant agreement MaX CoE, No. 824143) |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-cscs%2Fspfft","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Feth-cscs%2Fspfft","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Feth-cscs%2Fspfft/lists"}