{"id":13474591,"url":"https://github.com/lattice/quda","last_synced_at":"2025-05-15T18:06:47.525Z","repository":{"id":1353192,"uuid":"1300564","full_name":"lattice/quda","owner":"lattice","description":"QUDA is a library for performing calculations in lattice QCD on GPUs.","archived":false,"fork":false,"pushed_at":"2025-03-31T14:18:39.000Z","size":106034,"stargazers_count":309,"open_issues_count":214,"forks_count":108,"subscribers_count":58,"default_branch":"develop","last_synced_at":"2025-03-31T23:33:40.043Z","etag":null,"topics":["c","c-plus-plus","cuda","gpu","mpi","multi-gpu","qcd"],"latest_commit_sha":null,"homepage":"https://lattice.github.io/quda","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lattice.png","metadata":{"files":{"readme":"README.md","changelog":"NEWS","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2011-01-27T21:11:16.000Z","updated_at":"2025-03-28T03:30:09.000Z","dependencies_parsed_at":"2024-03-27T23:22:19.151Z","dependency_job_id":"9ee5be62-14d4-4d50-a81f-26b3ad311011","html_url":"https://github.com/lattice/quda","commit_stats":{"total_commits":12120,"total_committers":99,"mean_commits":"122.42424242424242","dds":0.5308580858085808,"last_synced_commit":"a54595d45db1ef890fde497977ccb27125c758bc"},"previous_names":[],"tags_count":28,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lattice%2Fquda","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lattice%2Fquda/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lattic
e%2Fquda/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lattice%2Fquda/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lattice","download_url":"https://codeload.github.com/lattice/quda/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247755557,"owners_count":20990620,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","c-plus-plus","cuda","gpu","mpi","multi-gpu","qcd"],"created_at":"2024-07-31T16:01:13.421Z","updated_at":"2025-05-15T18:06:47.518Z","avatar_url":"https://github.com/lattice.png","language":"C++","readme":"# QUDA 1.1.0\n\n## Overview\n\nQUDA is a library for performing calculations in lattice QCD on graphics\nprocessing units (GPUs), leveraging NVIDIA's CUDA platform. The current\nrelease includes optimized Dirac operators and solvers for the following\nfermion actions:\n\n* Wilson \n* Clover-improved Wilson\n* Twisted mass (including non-degenerate pairs)\n* Twisted mass with a clover term \n* Staggered fermions\n* Improved staggered (asqtad or HISQ) \n* Domain wall (4-d or 5-d preconditioned)\n* Möbius fermion\n\nImplementations of CG, multi-shift CG, BiCGStab, BiCGStab(l), and\nDD-preconditioned GCR are provided, including robust mixed-precision\nvariants supporting combinations of double, single, half and quarter\nprecisions (where the latter two are 16-bit and 8-bit \"block floating\npoint\", respectively).  
The library also includes auxiliary routines\nnecessary for Hybrid Monte Carlo, such as HISQ link fattening, force\nterms and clover-field construction.  Use of many GPUs in parallel is\nsupported throughout, with communication handled by QMP or MPI.\n\nQUDA includes an implementation of adaptive multigrid for the Wilson,\nclover-improved, twisted-mass and twisted-clover fermion actions.  We\nnote, however, that this is undergoing continued evolution and\nimprovement, and we highly recommend that users of adaptive multigrid\nuse the latest develop branch.  More details can be found\n[here](https://github.com/lattice/quda/wiki/Multigrid-Solver).\n\nSupport for eigenvector deflation solvers is also included through\nthe Thick Restarted Lanczos Method (TRLM), and we offer an Implicitly\nRestarted Arnoldi method for computing non-Hermitian operator spectra.\nFor more details we refer the user to the wiki:\n[QUDA's eigensolvers](https://github.com/lattice/quda/wiki/QUDA%27s-eigensolvers)\n[Deflating coarse grid solves in Multigrid](https://github.com/lattice/quda/wiki/Multigrid-Solver#multigrid-inverter--lanczos)\n\n## Software Compatibility:\n\nThe library has been tested under Linux (CentOS 7 and Ubuntu 18.04)\nusing releases 10.1 through 11.4 of the CUDA toolkit.  Earlier versions\nof the CUDA toolkit will not work, and we highly recommend the use of\n11.x.  QUDA has been tested in conjunction with x86-64, IBM\nPOWER8/POWER9 and ARM CPUs.  
Both GCC and Clang host compilers are\nsupported, with the minimum recommended versions being 7.x and 6.x,\nrespectively.  CMake 3.15 or greater is required to build QUDA.\n\nSee also \"Known Issues\" below.\n\n\n## Hardware Compatibility:\n\nFor a list of supported devices, see\n\nhttp://developer.nvidia.com/cuda-gpus\n\nBefore building the library, you should determine the \"compute\ncapability\" of your card, either from NVIDIA's documentation or by\nrunning the deviceQuery example in the CUDA SDK, and pass the\nappropriate value to the `QUDA_GPU_ARCH` variable in cmake.\n\nQUDA 1.1.0 supports devices of compute capability 3.0 or greater.\nQUDA is no longer supported on the older Tesla (1.x) and Fermi (2.x)\narchitectures.\n\nSee also \"Known Issues\" below.\n\n\n## Installation:\n\nIt is recommended to build QUDA in a separate directory from the\nsource directory.  For instructions on how to build QUDA using cmake\nsee this page:\nhttps://github.com/lattice/quda/wiki/QUDA-Build-With-CMake. Note\nthat this requires cmake version 3.15 or later. You can obtain cmake\nfrom https://cmake.org/download/. On Linux the binary tar.gz archives\nunpack into a cmake directory and usually run fine from that\ndirectory.\n\nThe basic steps for building with cmake are:\n\n1. Create a build dir, outside of the quda source directory.\n2. In your build dir run `cmake \u003cpath-to-quda-src\u003e`.\n3. It is recommended to set options by calling `ccmake` in\nyour build dir. Alternatively you can use the `-DVARIABLE=value`\nsyntax in the previous step.\n4. Run `make -j \u003cN\u003e` to build with N\nparallel jobs.\n5. Now is a good time to get a coffee.\n\nYou are most likely to want to specify the GPU architecture of the\nmachine you are building for. Either configure `QUDA_GPU_ARCH` in step 3\nor specify e.g. 
-DQUDA_GPU_ARCH=sm_60 for a Pascal GPU in step 2.\n\n### Multi-GPU support\n\nQUDA supports using multiple GPUs through MPI and QMP, together with\nthe optional use of NVSHMEM GPU-initiated communication for improved\nstrong scaling of the Dirac operators.  To enable multi-GPU support\neither set `QUDA_MPI` or `QUDA_QMP` to ON when configuring QUDA\nthrough cmake.\n\nNote that in any case cmake will automatically try to detect your MPI\ninstallation. If you need to specify a particular MPI please set\n`MPI_C_COMPILER` and `MPI_CXX_COMPILER` in cmake.  See also\nhttps://cmake.org/cmake/help/v3.9/module/FindMPI.html for more help.\n\nFor QMP please set `QUDA_QMP_HOME` to the installation directory of QMP.\n\nFor more details see https://github.com/lattice/quda/wiki/Multi-GPU-Support\n\nTo enable NVSHMEM support set `QUDA_NVSHMEM` to ON, and set the\nlocation of the local NVSHMEM installation with `QUDA_NVSHMEM_HOME`.\nFor more details see\nhttps://github.com/lattice/quda/wiki/Multi-GPU-with-NVSHMEM\n\n### External dependencies\n\nThe eigenvector solvers (eigCG and incremental eigCG) will use Eigen\nby default; however, QUDA can be configured to use MAGMA if available\n(see https://github.com/lattice/quda/wiki/Deflated-Solvers for more\ndetails).  MAGMA is available from\nhttp://icl.cs.utk.edu/magma/index.html.  MAGMA is enabled using the\ncmake option `QUDA_MAGMA=ON`.\n\nVersion 1.1.0 of QUDA includes an interface to the external (P)ARPACK\nlibrary for eigenvector computation. (P)ARPACK is available, e.g., from\nhttps://github.com/opencollab/arpack-ng.  (P)ARPACK is enabled using\nthe CMake option `QUDA_ARPACK=ON`. 
Note that when a multi-GPU option is enabled, the\nbuild system will automatically use the PARPACK library.\n\nAutomatic download and installation of Eigen, (P)ARPACK, QMP and QIO\nis supported in QUDA through the CMake options `QUDA_DOWNLOAD_EIGEN`,\n`QUDA_DOWNLOAD_ARPACK`, and `QUDA_DOWNLOAD_USQCD`.\n\n### Application Interfaces\n\nBy default only the QDP and MILC interfaces are enabled.  For\ninterface support with QDPJIT, BQCD, CPS or TIFR, this should be\nenabled by setting the corresponding `QUDA_INTERFACE_\u003capplication\u003e`\nvariable, e.g., `QUDA_INTERFACE_BQCD=ON`.  To keep compilation time to\na minimum it is recommended to only enable those interfaces that are\nused by a given application.\n\n## Tuning\n\nThroughout the library, auto-tuning is used to select optimal launch\nparameters for most performance-critical kernels.  This tuning process\ntakes some time and will generally slow things down the first time a\ngiven kernel is called during a run.  To avoid this one-time overhead in\nsubsequent runs (using the same action, solver, lattice volume, etc.),\nthe optimal parameters are cached to disk.  For this to work, the\n`QUDA_RESOURCE_PATH` environment variable must be set, pointing to a\nwritable directory.  Note that since the tuned parameters are\nhardware-specific, this \"resource directory\" should not be shared between jobs\nrunning on different systems (e.g., two clusters with different GPUs\ninstalled).  Attempting to use parameters tuned for one card on a\ndifferent card may lead to unexpected errors.\n\nThis autotuning information can also be used to build up a first-order\nkernel profile: since the autotuner measures how long a kernel takes\nto run, if we simply keep track of the number of kernel calls, the\nproduct of these two quantities gives a time profile of a given\njob run.  If `QUDA_RESOURCE_PATH` is set, then this profiling\ninformation is output to the file \"profile.tsv\" in the specified\ndirectory.  
Optionally, the output filename can be specified using the\n`QUDA_PROFILE_OUTPUT` environment variable, to avoid overwriting\npreviously generated profile outputs.  In addition to the kernel\nprofile, a policy profile, e.g., collections of kernels and/or other\nalgorithms that are auto-tuned, is also output to the file\n\"profile_async.tsv\".  The policy profile, for example, includes\nthe entire multi-GPU dslash, whose style and order of communication is\nautotuned.  Hence while the dslash kernel entries appearing in the kernel\nprofile do not include communication time, the entries in the policy\nprofile include all constituent parts (halo packing, interior update,\ncommunication and exterior update).\n\n## Using the Library:\n\nInclude the header file `include/quda.h` in your application, link against\n`lib/libquda.so`, and study `tests/invert_test.cpp` (for Wilson, clover,\ntwisted-mass, or domain wall fermions) or\n`tests/staggered_invert_test.cpp` (for asqtad/HISQ fermions) for examples\nof the solver interface.  The various solver options are enumerated in\n`include/enum_quda.h`.\n\n\n## Known Issues:\n\n* When the auto-tuner is active in a multi-GPU run it may cause issues\nwith binary reproducibility of the run if domain-decomposition\npreconditioning is used. This is caused by the possibility of\ndifferent launch configurations being used on different GPUs\nsimultaneously during the tuning run. If binary reproducibility is strictly\nrequired, make sure that a run with active tuning has completed first. This\nwill ensure that the same launch configuration for a given kernel is\nused on all GPUs, guaranteeing binary reproducibility.\n\n## Getting Help:\n\nPlease visit http://lattice.github.io/quda for contact information. Bug\nreports are especially welcome.\n\n\n## Acknowledging QUDA:\n\n[![DOI](https://zenodo.org/badge/1300564.svg)](https://zenodo.org/badge/latestdoi/1300564)\n  \nIf you find this software useful in your work, please cite:\n\nM. A. Clark, R. Babich, K. Barros, R. 
Brower, and C. Rebbi, \"Solving\nLattice QCD systems of equations using mixed precision solvers on GPUs,\"\nComput. Phys. Commun. 181, 1517 (2010) [arXiv:0911.3191 [hep-lat]].\n\nWhen taking advantage of multi-GPU support, please also cite:\n\nR. Babich, M. A. Clark, B. Joo, G. Shi, R. C. Brower, and S. Gottlieb,\n\"Scaling lattice QCD beyond 100 GPUs,\" International Conference for High\nPerformance Computing, Networking, Storage and Analysis (SC), 2011\n[arXiv:1109.2935 [hep-lat]].\n\nWhen taking advantage of adaptive multigrid, please also cite:\n\nM. A. Clark, B. Joo, A. Strelchenko, M. Cheng, A. Gambhir, and R. Brower,\n\"Accelerating Lattice QCD Multigrid on GPUs Using Fine-Grained\nParallelization,\" International Conference for High Performance\nComputing, Networking, Storage and Analysis (SC), 2016\n[arXiv:1612.07873 [hep-lat]].\n\nWhen taking advantage of block CG, please also cite:\n\nM. A. Clark, A. Strelchenko, A. Vaquero, M. Wagner, and E. Weinberg,\n\"Pushing Memory Bandwidth Limitations Through Efficient\nImplementations of Block-Krylov Space Solvers on GPUs,\"\nComput. Phys. Commun. 233 (2018), 29-40 [arXiv:1710.09745 [hep-lat]].\n\nWhen taking advantage of the Möbius MSPCG solver, please also cite:\n\nJiqun Tu, M. A. 
Clark, Chulwoo Jung, Robert Mawhinney, \"Solving DWF\nDirac Equation Using Multi-splitting Preconditioned Conjugate Gradient\nwith Tensor Cores on NVIDIA GPUs,\" published in the Platform of\nAdvanced Scientific Computing (PASC21) [arXiv:2104.05615[hep-lat]].\n\n\n## Authors:\n\n*  Ronald Babich (NVIDIA)\n*  Simone Bacchio (Cyprus)\n*  Michael Baldhauf (Regensburg)\n*  Kipton Barros (Los Alamos National Laboratory)\n*  Richard Brower (Boston University) \n*  Nuno Cardoso (NCSA) \n*  Kate Clark (NVIDIA)\n*  Michael Cheng (Boston University)\n*  Carleton DeTar (Utah University)\n*  Justin Foley (NIH)\n*  Arjun Gambhir (William and Mary)\n*  Marco Garofalo (HISKP, University of Bonn)\n*  Joel Giedt (Rensselaer Polytechnic Institute) \n*  Steven Gottlieb (Indiana University) \n*  Anthony Grebe (Fermilab)\n*  Kyriakos Hadjiyiannakou (Cyprus)\n*  Ben Hoerz (Intel)\n*  Leon Hostetler (Indiana University)\n*  Dean Howarth (Cahill Center for Astronomy and Astrophysics, Caltech)\n*  Hwancheol Jeong (Indiana University)\n*  Xiangyu Jiang (ITP, Chinese Academy of Sciences)\n*  Balint Joo (OLCF, Oak Ridge National Laboratory, formerly Jefferson Lab)\n*  Rohith Karur (UC Berkeley, Lawrence Berkeley National Laboratory)\n*  Hyung-Jin Kim (Samsung Advanced Institute of Technology)\n*  Bartosz Kostrzewa (HPC/A-Lab, University of Bonn)\n*  Damon McDougall (AMD)\n*  Colin Morningstar (Carnegie Mellon University)\n*  James Osborn (Argonne National Laboratory)\n*  Ferenc Pittler (Cyprus)\n*  Claudio Rebbi (Boston University) \n*  Eloy Romero (William and Mary)\n*  Hauke Sandmeyer (Bielefeld)\n*  Mario Schröck (INFN)\n*  Aniket Sen (HISKP, University of Bonn)\n*  Guochun Shi (NCSA)\n*  James Simone (Fermi National Accelerator Laboratory)\n*  Alexei Strelchenko (Fermi National Accelerator Laboratory)\n*  Jiqun Tu (NVIDIA)\n*  Carsten Urbach (HISKP, University of Bonn)\n*  Alejandro Vaquero (Utah University)\n*  Michael Wagman (Fermilab)\n*  Mathias Wagner (NVIDIA)\n*  Andre Walker-Loud 
(Lawrence Berkeley National Laboratory)\n*  Evan Weinberg (NVIDIA)\n*  Frank Winter (Jefferson Lab)\n*  Yi-Bo Yang (ITP, Chinese Academy of Sciences)\n\n\nPortions of this software were developed at the Innovative Systems Lab,\nNational Center for Supercomputing Applications\nhttp://www.ncsa.uiuc.edu/AboutUs/Directorates/ISL.html\n\nDevelopment was supported in part by the U.S. Department of Energy under\ngrants DE-FC02-06ER41440, DE-FC02-06ER41449, and DE-AC05-06OR23177; the\nNational Science Foundation under grants DGE-0221680, PHY-0427646,\nPHY-0835713, OCI-0946441, and OCI-1060067; as well as the PRACE project\nfunded in part by the EU's 7th Framework Programme (FP7/2007-2013) under\ngrants RI-211528 and FP7-261557.  Any opinions, findings, and\nconclusions or recommendations expressed in this material are those of\nthe authors and do not necessarily reflect the views of the Department\nof Energy, the National Science Foundation, or the PRACE project.\n","funding_links":[],"categories":["C++"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flattice%2Fquda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flattice%2Fquda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flattice%2Fquda/lists"}