{"id":19867707,"url":"https://github.com/cwsmith/cudatesting","last_synced_at":"2026-06-13T10:32:44.605Z","repository":{"id":93888978,"uuid":"397296426","full_name":"cwsmith/cudaTesting","owner":"cwsmith","description":"simple test of cuda performance","archived":false,"fork":false,"pushed_at":"2021-08-23T13:54:03.000Z","size":25,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-01T00:39:14.047Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cwsmith.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-17T14:58:38.000Z","updated_at":"2021-08-23T13:54:06.000Z","dependencies_parsed_at":"2023-06-19T17:11:32.171Z","dependency_job_id":null,"html_url":"https://github.com/cwsmith/cudaTesting","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/cwsmith/cudaTesting","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cwsmith%2FcudaTesting","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cwsmith%2FcudaTesting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cwsmith%2FcudaTesting/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cwsmith%2FcudaTesting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cwsmith","download_url":"https://codeload.github.com/cwsmith/cudaTesting/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cwsmith%2FcudaTesting/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34281700,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T15:30:04.452Z","updated_at":"2026-06-13T10:32:44.580Z","avatar_url":"https://github.com/cwsmith.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"## contents\n\n- Makefile - a makefile; nothing more, nothing less\n- README.md - this file\n- simpleCUBLAS.cpp - test code, adapted from https://github.com/NVIDIA/cuda-samples.git @ 3342d60\n- helper_\\*.h - headers with helper functions \n\n## compile\n\nin general\n\n```\nmake SMS=\u003ccuda sm_##\u003e HOST_COMPILER=/path/to/host/c++/compiler\n```\n\n## run\n\n```\nfor i in 10 11 12 13 14; do ./simpleCUBLAS $((2**i)); done\n```\n\n## results\n\n### AiMOS\n\n- Rhel8\n- V100, 32GB\n- xl_r 16.1.1\n- CUDA 11.1\n\n```\n$ module load xl_r/16.1.1 spectrum-mpi/10.4 cuda/11.2\n$ make SMS=70 HOST_COMPILER=xlc++_r\n$ for i in 10 11 12 13 14; do ./simpleCUBLAS $((2**i)); done\n\nmatrix size 1048576\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 0.020986\nsimpleCUBLAS test passed.\nmatrix size 4194304\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 0.133679\nsimpleCUBLAS test passed (result not checked).\nmatrix size 16777216\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 0.998967\nsimpleCUBLAS test passed (result not checked).\nmatrix size 67108864\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 7.601731\nsimpleCUBLAS test passed (result not checked).\nmatrix size 268435456\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 61.422657\nsimpleCUBLAS test passed (result not checked).\n```\n\n- Rhel8\n- V100, 32GB\n- GCC 8.4.1\n- CUDA 11.1\n\n```\n$ module load gcc/8.4.1 spectrum-mpi/10.4 cuda/11.2\n$ make SMS=70 HOST_COMPILER=g++\n$ for i in 10 11 12 13 14; do ./simpleCUBLAS $((2**i)); done                                                                                                           \nmatrix size 1048576\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 0.022997\nsimpleCUBLAS test passed.\nmatrix size 4194304\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\n cublasSgemm time (seconds) 0.137055\nsimpleCUBLAS test passed (result not checked).\nmatrix size 16777216\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 0.980725\nsimpleCUBLAS test passed (result not checked).\nmatrix size 67108864\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 7.476899\nsimpleCUBLAS test passed (result not checked).\nmatrix size 268435456\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 60.654473\nsimpleCUBLAS test passed (result not checked).\n```\n\n### Summit \n\n- Rhel8\n- V100, 16GB\n- xl_r 16.1.1-10\n- CUDA 11.4.0 (CUDA 11.1 reported `nvcc fatal   : Unknown option '--threads'`)\n\n```\n$module load cuda/11.4.0\n$export CUDA_PATH=$CUDA_DIR\n$make SMS=70 HOST_COMPILER=xlc++_r\n#edit runSweepSummit.sh path and project settings\n$ bsub ./runSweepSummit.sh  \n#the results will be in sweep.\u003cjobid\u003e\nmatrix size 1048576\ncublasSgemm time (seconds) 0.115373\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\nsimpleCUBLAS test passed.\nmatrix size 4194304\ncublasSgemm time (seconds) 0.135333\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\nsimpleCUBLAS test passed (result not checked).\nmatrix size 16777216\ncublasSgemm time (seconds) 0.980264\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\nsimpleCUBLAS test passed (result not checked).\nmatrix size 67108864\ncublasSgemm time (seconds) 7.707417\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\nsimpleCUBLAS test passed (result not checked).\n\nmatrix size 268435456\ncublasSgemm time (seconds) 62.548403\nGPU Device 0: \"Volta\" with compute capability 7.0\n\nsimpleCUBLAS test running..\nsimpleCUBLAS test passed (result not checked).\n```\n\n\n\n### Cranium\n\n- 2060 Super, 8GB\n- GCC 7.3\n- CUDA 11.4 - CUDA 11.1 reported `nvcc fatal   : Unknown option '--threads'`\n\n```\n$ module load gcc/7.3.0-bt47fwr cuda/11.4\n$ make SMS=75 HOST_COMPILER=$CXX\n$ for i in 10 11 12 13 14; do ./simpleCUBLAS $((2**i)); done\nmatrix size 1048576\nGPU Device 0: \"Turing\" with compute capability 7.5\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 0.050958\nsimpleCUBLAS test passed.\nmatrix size 4194304\nGPU Device 0: \"Turing\" with compute capability 7.5\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 0.281673\nsimpleCUBLAS test passed (result not checked).\nmatrix size 16777216\nGPU Device 0: \"Turing\" with compute capability 7.5\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 1.808216\nsimpleCUBLAS test passed (result not checked).\nmatrix size 67108864\nGPU Device 0: \"Turing\" with compute capability 7.5\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 14.511971\nsimpleCUBLAS test passed (result not checked).\nmatrix size 268435456\nGPU Device 0: \"Turing\" with compute capability 7.5\n\nsimpleCUBLAS test running..\ncublasSgemm time (seconds) 123.049882\nsimpleCUBLAS test passed (result not checked).\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcwsmith%2Fcudatesting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcwsmith%2Fcudatesting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcwsmith%2Fcudatesting/lists"}