{"id":50136281,"url":"https://github.com/bivex/cudahte","last_synced_at":"2026-05-23T22:03:23.645Z","repository":{"id":358745329,"uuid":"1242902554","full_name":"bivex/cudahte","owner":"bivex","description":null,"archived":false,"fork":false,"pushed_at":"2026-05-18T21:58:15.000Z","size":146,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-18T23:55:47.254Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bivex.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-18T21:39:09.000Z","updated_at":"2026-05-18T21:58:18.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/bivex/cudahte","commit_stats":null,"previous_names":["bivex/cudahte"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/bivex/cudahte","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bivex%2Fcudahte","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bivex%2Fcudahte/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bivex%2Fcudahte/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bivex%2Fcudahte/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bivex","download_url":"https://codeload.github.com/bivex/cuda
hte/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bivex%2Fcudahte/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33413624,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-23T18:09:33.147Z","status":"ssl_error","status_checked_at":"2026-05-23T18:09:31.380Z","response_time":53,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-23T22:03:19.140Z","updated_at":"2026-05-23T22:03:23.633Z","avatar_url":"https://github.com/bivex.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CUDA Code Smells Analyzer\n\nA Python-based static analysis tool designed to detect \"code smells\" and critical issues in NVIDIA CUDA (`.cu`, `.cuh`) files. 
\n\nThis project leverages [ANTLR4](https://www.antlr.org/) for accurate parsing of CUDA C++ grammar and is structured using **Clean Architecture** principles to ensure clear separation of concerns, high maintainability, and easy extensibility.\n\n## Features\n\n- Parse CUDA source code using an ANTLR4-generated AST.\n- Detect potentially unchecked CUDA API calls.\n- Detect naive potential memory leaks (mismatched `cudaMalloc` / `cudaFree` counts).\n- Clean Architecture (Domain, Application, Infrastructure layers).\n\n## Detected Code Smells\n\n| Rule Name | Description | Severity |\n| :--- | :--- | :--- |\n| **UncheckedCudaAPI** | CUDA API calls (e.g., `cudaMalloc`, `cudaMemcpy`, `cudaFree`, `cudaDeviceSynchronize`) should have their return values checked for errors. This rule detects if the call is not wrapped in a checking macro (like `CHECK`, `EXPECT`, `assert`) or assigned to a variable within 8 levels of the AST. | CRITICAL |\n| **PotentialMemoryLeak** | A naive heuristic that triggers if the number of `cudaMalloc` calls in a file is greater than the number of `cudaFree` calls. | WARNING |\n| **WarpDivergence** | Detects potential warp divergence by checking if an `if` statement's condition explicitly depends on `threadIdx` or `blockIdx` using equality or modulo operators. | CRITICAL |\n| **HostDeviceTransferInLoop** | Detects if `cudaMemcpy` is called inside an iteration statement (`for`, `while`, `do-while`), which leads to massive PCIe bottlenecking. | CRITICAL |\n| **SuboptimalGridBlock** | Checks kernel launch parameters `\u003c\u003c\u003cgrid, block\u003e\u003e\u003e`. Flags a warning if the `block` size is not a multiple of 32 (warp size), is hardcoded to `\u003c 128`, or if the `grid` is `1`. | WARNING |\n| **CudaDeviceSynchronizeInHotPath** | Detects if `cudaDeviceSynchronize()` is called inside a loop, which blocks the CPU from doing any other work and breaks async pipelines. 
| WARNING |\n| **SyncthreadsInDivergentCode** | Calling `__syncthreads()` inside a divergent branch (`if` statements depending on thread indices) can cause a deadlock. Ensures all threads reach the barrier. | CRITICAL |\n| **IntegerOverflowInIndex** | Detects global thread index calculation (`blockIdx.x * blockDim.x + threadIdx.x`) assigned to an `int`. For large arrays, this causes integer overflow. Recommends using `size_t`. | CRITICAL |\n| **KernelLaunchInLoop** | Detects kernel launches (`\u003c\u003c\u003c...\u003e\u003e\u003e`) inside loops. Launch overhead multiplies by iterations; recommends batching or using CUDA Graphs. | CRITICAL |\n| **DoubleUsage** | Detects usage of `double` precision. Double precision is significantly slower than float on most consumer GPUs. Recommends using `float` if precision is not critical. | WARNING |\n| **SlowMathFunction** | Detects standard math functions (e.g., `sin`, `cos`, `sqrt`) that might not be optimal. Recommends using intrinsic fast functions (e.g., `__sinf`) or compiling with `--use_fast_math`. | WARNING |\n| **MissingKernelErrorCheck** | Kernel launches `\u003c\u003c\u003c...\u003e\u003e\u003e` are asynchronous. Without calling `cudaGetLastError()` afterward, invalid grid/block configs fail silently. | CRITICAL |\n| **LargeSharedMemoryAllocation** | Detects static `__shared__` array allocations that are unusually large (\u003e8192 items) and might exceed the hardware 48KB limit without explicit dynamic opt-in. | CRITICAL |\n| **VolatileUsage** | Using `volatile` to synchronize threads or blocks across the GPU memory model is unsafe and undefined behavior. | CRITICAL |\n| **DefaultStreamUsage** | Detects kernel launches or `cudaMemcpyAsync` calls in the default (NULL) stream, which prevents concurrent execution with other operations. | WARNING |\n| **HardcodedDeviceId** | Detects `cudaSetDevice(0)` or other hardcoded IDs, which may fail on multi-GPU systems. 
| WARNING |\n| **DeprecatedAPI** | Detects usage of legacy CUDA APIs like `cudaThreadSynchronize` and recommends modern alternatives. | WARNING |\n| **UncoalescedMemoryAccess** | Detects non-coalesced memory access patterns like `ptr[threadIdx.x * stride]` where `stride \u003e 1`, which drastically reduces global memory throughput. | CRITICAL |\n| **SharedMemoryBankConflict** | Detects access to `__shared__` memory with a stride that is a multiple of 32, causing serialization of access within a warp. | WARNING |\n| **ArchitecturalCudaLeak** | Detects CUDA-specific code or headers inside Domain/Application layers, enforcing Clean Architecture boundaries. | CRITICAL |\n| **UnifiedMemoryWithoutPrefetch** | Detects `cudaMallocManaged` without explicit `cudaMemPrefetchAsync` in the same context, which triggers slow page faults. | WARNING |\n| **MissingBoundsCheckInKernel** | Detects kernel functions that compute a thread index and access arrays without a bounds guard (`if (tid \u003c N)`), causing out-of-bounds access when the grid is larger than the data. | CRITICAL |\n| **MissingSyncthreadsAfterSharedWrite** | Detects writes to `__shared__` memory followed by reads without an intervening `__syncthreads()`, causing data races and undefined values. | CRITICAL |\n| **IncorrectGridDimensionCalculation** | Detects grid dimension calculated with integer division (`N / blockSize`) instead of ceiling division, leaving trailing elements unprocessed. | WARNING |\n| **SharedMemoryUninitializedForAtomics** | Detects `__shared__` arrays used with atomic operations without prior zero-initialization, producing incorrect accumulation results. | WARNING |\n| **ConstantMemoryWrongCopyMethod** | Detects `cudaMemcpy` used with `__constant__` variables instead of `cudaMemcpyToSymbol`, which fails because constant memory resides in a separate address space. 
| WARNING |\n| **GlobalAtomicWithoutSharedIntermediate** | Detects atomic operations on global memory arrays without shared memory intermediate, causing severe thread contention when many threads compete for few locations. | WARNING |\n| **SynchronousMemcpyWithActiveStreams** | Detects synchronous `cudaMemcpy` when CUDA streams are active, which blocks the CPU and prevents copy/kernel overlap. Recommends `cudaMemcpyAsync`. | WARNING |\n| **MissingRestrictOnKernelPointers** | Detects kernel functions with multiple pointer parameters lacking `__restrict__` qualifiers, preventing compiler optimizations due to assumed pointer aliasing. | INFO |\n| **NonPowerOf2ReductionBlock** | Detects parallel reduction patterns where block size is not a power of 2, causing iterative halving (`i /= 2`) to skip elements and produce incorrect results. | WARNING |\n| **CudaEventResourceLeak** | Detects `cudaEventCreate` without a matching `cudaEventDestroy`, leaking finite GPU event resources. | WARNING |\n\n*Note: New rules can be easily added by implementing a new `CUDAParserVisitor` in `src/infrastructure/rules/` and registering it in `src/infrastructure/cli/main.py`.*\n\n## Architecture\n\nThis project strictly adheres to Clean Architecture:\n\n- **Domain (`src/domain/`)**: Contains core entities (`CodeSmell`, `Position`) and abstract ports (`CodeAnalyzerPort`). Contains NO dependencies on ANTLR or the CLI.\n- **Application (`src/application/`)**: Contains Use Cases (`AnalyzeFileUseCase`, `AnalyzeDirectoryUseCase`) that orchestrate the analysis workflow.\n- **Infrastructure (`src/infrastructure/`)**: \n  - **CLI**: The command-line interface (`main.py`).\n  - **Parsers**: `AntlrCudaAnalyzer`, an adapter that implements `CodeAnalyzerPort` and encapsulates the ANTLR parsing logic.\n  - **Rules**: Specific ANTLR visitors that traverse the AST to find code smells.\n\n## Setup and Installation\n\n1. 
**Clone the repository:**\n   Ensure you initialize the submodules to fetch the ANTLR grammar.\n   ```bash\n   git clone --recursive \u003crepository-url\u003e\n   cd \u003crepository-dir\u003e\n   ```\n\n2. **Set up a Python Virtual Environment:**\n   ```bash\n   python3 -m venv venv\n   source venv/bin/activate\n   ```\n\n3. **Install Dependencies:**\n   ```bash\n   pip install antlr4-python3-runtime==4.13.2\n   ```\n\n4. **(Optional) Re-generate the ANTLR Parser:**\n   If you modify the grammar in `parser/*.g4`, regenerate the Python parser using the `antlr4` tool:\n   ```bash\n   cd parser\n   antlr4 -Dlanguage=Python3 -visitor CUDALexer.g4 CUDAParser.g4\n   ```\n\n## Testing\n\nThe project includes a comprehensive test suite covering all detection rules.\n\n```bash\n# Activate environment\nsource venv/bin/activate\n\n# Run all tests\npython3 -m unittest discover tests\n\n# Run specific test suites\npython3 tests/test_new_rules.py      # Unit tests for new rules\npython3 tests/test_integration.py    # Integration test with PMP book examples\n```\n\n## Usage\n\nActivate your virtual environment before running the tool:\n\n```bash\nsource venv/bin/activate\n```\n\n### Analyze a Single File\n```bash\npython src/infrastructure/cli/main.py smells-file \u003cpath_to_file.cu\u003e\n```\n\n### Analyze a Directory\nRecursively scans for `.cu` and `.cuh` files.\n```bash\npython src/infrastructure/cli/main.py smells-dir \u003cpath_to_directory\u003e\n```\n\n### Example Output\n\n```text\n[CRITICAL] UncheckedCudaAPI at test.cu:6:4\n    CUDA API call 'cudaMalloc' is potentially unchecked. 
Always check the return value of CUDA API calls.\n\n[WARNING] PotentialMemoryLeak at test.cu:0:0\n    Found 1 'cudaMalloc' calls but only 0 'cudaFree' calls in file.\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbivex%2Fcudahte","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbivex%2Fcudahte","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbivex%2Fcudahte/lists"}