{"id":17777179,"url":"https://github.com/tudasc/cusan","last_synced_at":"2025-08-12T15:30:48.697Z","repository":{"id":259486996,"uuid":"859823352","full_name":"tudasc/cusan","owner":"tudasc","description":"A data race detector for CUDA C and C++ based on ThreadSanitizer","archived":false,"fork":false,"pushed_at":"2024-12-13T16:09:33.000Z","size":407,"stargazers_count":5,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-13T17:35:19.347Z","etag":null,"topics":["c","cpp","cuda","datarace","threadsanitizer"],"latest_commit_sha":null,"homepage":"","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tudasc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-19T10:42:57.000Z","updated_at":"2024-11-19T00:49:09.000Z","dependencies_parsed_at":"2024-12-13T17:23:26.581Z","dependency_job_id":"0c794086-0bfc-44aa-ab0c-b06469295915","html_url":"https://github.com/tudasc/cusan","commit_stats":{"total_commits":134,"total_committers":4,"mean_commits":33.5,"dds":0.6268656716417911,"last_synced_commit":"18fda7c4c12e10c0839ac59d83b4475417413cec"},"previous_names":["tudasc/cusan"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tudasc%2Fcusan","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tudasc%2Fcusan/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tudasc%2Fcusan/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/repositories/tudasc%2Fcusan/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tudasc","download_url":"https://codeload.github.com/tudasc/cusan/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":229691822,"owners_count":18108507,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","cpp","cuda","datarace","threadsanitizer"],"created_at":"2024-10-26T23:05:28.779Z","updated_at":"2025-08-12T15:30:48.612Z","avatar_url":"https://github.com/tudasc.png","language":"C","readme":"# CuSan  \u0026middot; [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)\n\n\nCuSan \\[[CU24](#ref-cusan-2024)\\] is a tool for detecting data races between (asynchronous) CUDA calls and the host.\n\nTo achieve this, we analyze and instrument CUDA API usage in the target code during compilation with Clang/LLVM to track CUDA-specific memory accesses and synchronization semantics.\nOur runtime then exposes this information to [ThreadSanitizer](https://clang.llvm.org/docs/ThreadSanitizer.html) (packaged with Clang/LLVM) for final data race analysis.\n\n\n## Usage\n\nUsing CuSan involves two main steps:\n\n1. **Compile your code** with one of the CuSan compiler wrappers, such as `cusan-clang++` or `cusan-mpic++`. 
This process:\n   - Analyzes and instruments the CUDA API, including kernel calls and specific memory access semantics (r/w).\n   - Automatically adds ThreadSanitizer instrumentation (`-fsanitize=thread`).\n   - Links the CuSan runtime library.\n2. **Execute the target program** for data race analysis. Our runtime calls ThreadSanitizer to expose CUDA synchronization and memory access semantics.\n\n##### Compilation limitations\nCurrently, the compilation must be serialized, e.g., `make -j 1`, to ensure consistent kernel memory access information.\nOur analysis writes its kernel-specific data into a specific `.yaml` file during device side compilation (`env CUSAN_KERNEL_DATA_FILE` or wrapper argument `--cusan-kernel-data=`).\nThis file is subsequently read during the host side compilation.\n\n\n#### Example usage\nGiven the file [02_event.c](test/runtime/02_event.c), to detect CUDA data races, execute the following:\n\n```bash\n# Set explicit location of kernel memory access data file\n$ export CUSAN_KERNEL_DATA_FILE=kernel-data.yaml\n# Compile code with CuSan\n$ cusan-clang -O3 -g -x cuda -gencode arch=compute_70,code=sm_70 02_event.c -o event.exe\n$ export TSAN_OPTIONS=ignore_noninstrumented_modules=1\n$ ./event.exe\n```\n\n### Checking CUDA-aware MPI applications\nTo check CUDA-aware MPI applications, use the MPI correctness checker [MUST](https://hpc.rwth-aachen.de/must/) or preload our MPI interceptor `libCusanMPIInterceptor.so`. 
\nThe latter has very limited capabilities and is used mostly for internal testing.\nThese libraries call ThreadSanitizer with MPI-specific access semantics, ensuring that combined CUDA and MPI semantics are properly exposed to ThreadSanitizer for data race detection between dependent MPI and CUDA calls.\n\n#### Example usage for MPI\nGiven the file [03_cuda_to_mpi.c](test/runtime/03_cuda_to_mpi.c), execute the following:\n\n```bash\n$ cusan-mpic++ -O3 -g -x cuda -gencode arch=compute_70,code=sm_70 03_cuda_to_mpi.c -o cuda_to_mpi.exe\n$ LD_PRELOAD=/path/to/libCusanMPIInterceptor.so mpirun -n 2 ./cuda_to_mpi.exe\n```\n\n*Note*: To avoid false positives, you may need ThreadSanitizer suppression files.\nSee [suppressions.txt](test/runtime/suppressions.txt), or refer to the [sanitizer special case lists documentation](https://clang.llvm.org/docs/SanitizerSpecialCaseList.html).\n\n\n#### Example report\nThe following is an example report for [03_cuda_to_mpi.c](test/runtime/03_cuda_to_mpi.c) of our test suite, where the necessary synchronization is missing:\n```c\nL.18  __global__ void kernel(int* arr, const int N)\n...\nL.53  int* d_data;\nL.54  cudaMalloc(\u0026d_data, size * sizeof(int));\nL.55\nL.56  if (world_rank == 0) {\nL.57    kernel\u003c\u003c\u003cblocksPerGrid, threadsPerBlock\u003e\u003e\u003e(d_data, size);\nL.58  #ifdef CUSAN_SYNC\nL.59    cudaDeviceSynchronize();  // CUSAN_SYNC needs to be defined\nL.60  #endif\nL.61    MPI_Send(d_data, size, MPI_INT, 1, 0, MPI_COMM_WORLD);\n```\n```\n==================\nWARNING: ThreadSanitizer: data race (pid=579145)\n  Read of size 8 at 0x7f1587200000 by main thread:\n    #0 main cusan/test/runtime/03_cuda_to_mpi.c:61:5 (03_cuda_to_mpi.c.exe+0xfad11)\n\n  Previous write of size 8 at 0x7f1587200000 by thread T6:\n    #0 __device_stub__kernel(int*, int) cusan/test/runtime/03_cuda_to_mpi.c:18:47 (03_cuda_to_mpi.c.exe+0xfaaed)\n\n  Thread T6 'cuda_stream 0' (tid=0, running) created by main thread at:\n    #0 
cusan::runtime::Runtime::register_stream(cusan::runtime::Stream) \u003cnull\u003e (libCusanRuntime.so+0x3b830)\n    #1 main cusan/test/runtime/03_cuda_to_mpi.c:54:3 (03_cuda_to_mpi.c.exe+0xfabc7)\n\nSUMMARY: ThreadSanitizer: data race cusan/test/runtime/03_cuda_to_mpi.c:61:5 in main\n==================\nThreadSanitizer: reported 1 warnings\n```\n\n#### Caveats ThreadSanitizer and OpenMPI\nOn the Lichtenberg HPC system, some issues may arise when using ThreadSanitizer with OpenMPI 4.1.6:\n- Intel Compute Runtime requires specific environment flags, see [Intel Compute Runtime issue 376](https://github.com/intel/compute-runtime/issues/376):\n  ```bash\n  export NEOReadDebugKeys=1\n  export DisableDeepBind=1\n  ```\n- OpenMPI's memory interceptor may conflict with the sanitizer's, see [OpenMPI issue 12819](https://github.com/open-mpi/ompi/issues/12819). The *patcher* needs to be disabled:\n  ```bash\n  export OMPI_MCA_memory=^patcher\n  ```\n\n### Using CuSan with CMake\nFor plain Makefiles, the wrapper replaces the Clang compiler variables, e.g., `CC` or `MPICC`. For CMake, we advise disabling the wrapper temporarily during configuration, because CMake executes internal compiler checks that do not need CuSan instrumentation:\n\n```bash\n# Temporarily disable wrapper with environment flag CUSAN_WRAPPER=OFF:\n$\u003e CUSAN_WRAPPER=OFF cmake -B build -DCMAKE_C_COMPILER=cusan-clang\n# Compile with cusan-clang:\n$\u003e cmake --build build --target install -- -j1\n```\n\n## Building CuSan\n\nCuSan is tested with LLVM versions 14, 18, and 19, and CMake version \u003e= 3.20. 
Use CMake presets `develop` or `release`\nto build.\n\n### Dependencies\nCuSan was tested on the TUDa Lichtenberg II cluster with:\n- System modules: `1) gcc/11.2.0 2) cuda/11.8 3) openmpi/4.1.6 4) git/2.40.0 5) python/3.10.10 6) clang/14.0.6 or 6) clang/18.1.8`\n- The MPI dependency is optional\n- Optional external libraries: [TypeART](https://github.com/tudasc/TypeART/tree/v1.9.0b-cuda.1), FiberPool (both default off)\n- Testing: llvm-lit, FileCheck\n- GPU: Tesla T4 and Tesla V100 (mostly: arch=sm_70)\n\n### Build example\n\nCuSan uses CMake to build. Example build recipe (release build, installs to default prefix\n`${cusan_SOURCE_DIR}/install/cusan`):\n\n```sh\n$\u003e cd cusan\n$\u003e cmake --preset release\n$\u003e cmake --build build --target install --parallel\n```\n\n#### Build options\n\n| Option                        | Default | Description                                                                                                        |\n|-------------------------------|:-------:|--------------------------------------------------------------------------------------------------------------------|\n| `CUSAN_TYPEART`               | `OFF`   | Use TypeART library to track memory allocations.                                                                   |\n| `CUSAN_FIBERPOOL`             | `OFF`   | Use external library to efficiently manage fiber creation.                                                         |\n| `CUSAN_SOFTCOUNTER`           | `OFF`   | Runtime stats for calls to ThreadSanitizer and CUDA-callbacks. Only use for stats collection, not race detection.  |\n| `CUSAN_DEVICE_SYNC_CALLBACKS` | `OFF`   | Adds a callback after each CUDA sync call (device, stream, event) to our runtime with the call's return value.     |\n| `CUSAN_SYNC_DETAIL_LEVEL`     | `ON`    | Analyze, e.g., memcpy and memcpyasync w.r.t. arguments to determine implicit sync.                                 
|\n| `CUSAN_LOG_LEVEL_RT`          | `0`     | Granularity of runtime logger. 3 is most verbose, 0 is least. For release, set to 0.                               |\n| `CUSAN_LOG_LEVEL_PASS`        | `3`     | Granularity of pass plugin logger. 3 is most verbose, 0 is least. For release, set to 0.                           |\n\n### Development \n\nFor debugging, additional (hidden) build options and environment flags exist.\n\n\n#### Build options\n| Option                       | Default | Description                                                                                       |\n|------------------------------|:-------:|---------------------------------------------------------------------------------------------------|\n| `CUSAN_TEST_WORKAROUNDS`              |  `ON`  | Will set environment flags as described in **Caveats ThreadSanitizer and OpenMPI** for testing.                                      |\n\n#### Environment flags\n\n| Environment Flag                       | Default | Description                                                                                       |\n|------------------------------|:-------:|---------------------------------------------------------------------------------------------------|\n| `CUSAN_DUMP_HOST_IR`              |  -  | Dumps module IR of host side during compilation to stdout after our transformations. Unsupported with TypeART.                                      |\n| `CUSAN_DUMP_DEVICE_IR`              |  -  | Dumps module IR of device during compilation to stdout after our analysis. This includes the applied transformation *mem2reg*. Note: Device analysis happens before host. Unsupported with TypeART.                                     
|\n\n## References\n\n\u003ctable style=\"border:0px\"\u003e\n\u003ctr\u003e\n    \u003ctd valign=\"top\"\u003e\u003ca name=\"ref-cusan-2024\"\u003e\u003c/a\u003e[CU24]\u003c/td\u003e\n    \u003ctd\u003eHück, Alexander and Ziegler, Tim and Schwitanski, Simon and Jenke, Joachim and Bischof, Christian,\n    \"Compiler-Aided Correctness Checking of CUDA-Aware MPI Applications\",\n    In \u003ci\u003eSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis\u003c/i\u003e,\n    pages 204-213. IEEE, 2024, doi: \u003ca href=https://doi.org/10.1109/SCW63240.2024.00032\u003e10.1109/SCW63240.2024.00032\u003c/a\u003e\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftudasc%2Fcusan","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftudasc%2Fcusan","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftudasc%2Fcusan/lists"}