{"id":23317961,"url":"https://github.com/chrisdalvit/gpu-matrix-transpose","last_synced_at":"2026-04-17T10:31:33.049Z","repository":{"id":268936393,"uuid":"905880095","full_name":"chrisdalvit/gpu-matrix-transpose","owner":"chrisdalvit","description":"Implementation and benchmarking of different matrix transpose with CUDA","archived":false,"fork":false,"pushed_at":"2024-12-19T19:22:03.000Z","size":198,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-07T04:52:50.290Z","etag":null,"topics":["c","cpp","cuda","cuda-kernels","cuda-programming","gpu-acceleration","gpu-computing","gpu-programming","matrix-transpose","nvidia-gpu"],"latest_commit_sha":null,"homepage":"https://chrisdalvit.github.io/gpu-matrix-transpose","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chrisdalvit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-19T17:53:06.000Z","updated_at":"2024-12-19T19:40:56.000Z","dependencies_parsed_at":"2024-12-19T20:34:38.842Z","dependency_job_id":null,"html_url":"https://github.com/chrisdalvit/gpu-matrix-transpose","commit_stats":null,"previous_names":["chrisdalvit/gpu-matrix-transpose"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/chrisdalvit/gpu-matrix-transpose","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdalvit%2Fgpu-matrix-transpose","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdalvit%2Fgpu-matrix-transpose/tags","releases_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdalvit%2Fgpu-matrix-transpose/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdalvit%2Fgpu-matrix-transpose/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chrisdalvit","download_url":"https://codeload.github.com/chrisdalvit/gpu-matrix-transpose/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chrisdalvit%2Fgpu-matrix-transpose/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31925309,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-17T10:19:20.377Z","status":"ssl_error","status_checked_at":"2026-04-17T10:19:18.682Z","response_time":62,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","cpp","cuda","cuda-kernels","cuda-programming","gpu-acceleration","gpu-computing","gpu-programming","matrix-transpose","nvidia-gpu"],"created_at":"2024-12-20T17:14:38.407Z","updated_at":"2026-04-17T10:31:33.023Z","avatar_url":"https://github.com/chrisdalvit.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GPU Matrix Transpose\n\nThis repository implements and benchmarks different matrix transpose algorithms for GPUs using CUDA. 
Definitely check out the [corresponding blog post](https://chrisdalvit.github.io/gpu-matrix-transpose).\n\n## Repository structure\nThe ```data``` folder contains the original benchmark data from the tested architectures that was used in the experimental analyses. \n\nThe ```lib``` folder contains C functions used by all tested algorithms, with the corresponding header file.\n\nThe ```src``` folder contains C files with the different algorithms for matrix transposition.\n\n## Setup project\nAfter cloning the repository you can run \n```\nmake\n```\nThis compiles the files in ```src```, runs the benchmarks, and stores the results in the ```stats``` folder (which is created by the Makefile). __Warning: It can take a lot of time for the benchmarks to finish!__\n\nIf you only want to compile the files in ```src```, create a folder ```bin``` and run \n```\nmake compile_c\n```\nThis should compile all C files in ```src``` and store the binaries in the ```bin``` folder without starting the benchmarks. Compiled binaries follow the naming convention ```\u003cALGORITHM\u003e-\u003cOPTIMIZATION LEVEL\u003e```.\n\nTo run the experiments on the Marzola cluster, run \n```\nmake marzola\n```\nThe script compiles the source code and launches SLURM jobs on the cluster. Results are stored in the ```stats/``` folder.\n\n## Validate implementations\nThe correctness of the provided implementations can be verified by running the compiled binaries in 'debug mode'. After compilation you can run \n```\n./bin/\u003cBINARY\u003e \u003cMATRIX SIZE\u003e --debug\n```\nFor example, \n```\n./bin/naive-0 2 --debug\n```\nshould output a randomly initialized matrix with dimension 2^2 and the corresponding transposed matrix. 
Additionally, the execution time and the effective bandwidth are displayed.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisdalvit%2Fgpu-matrix-transpose","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchrisdalvit%2Fgpu-matrix-transpose","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchrisdalvit%2Fgpu-matrix-transpose/lists"}