{"id":18675648,"url":"https://github.com/miferreiro/cdap-cuda","last_synced_at":"2026-05-17T17:42:09.590Z","repository":{"id":102727747,"uuid":"274173260","full_name":"miferreiro/CDAP-CUDA","owner":"miferreiro","description":"CUDA exercises for the subject of \"Computación Distribuída e de Altas Prestacións\" in the Master Degree of Computer Engineering of the University of Vigo in 2020","archived":false,"fork":false,"pushed_at":"2020-06-22T15:21:38.000Z","size":593,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-18T12:08:35.301Z","etag":null,"topics":["c","cuda","scan"],"latest_commit_sha":null,"homepage":null,"language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/miferreiro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-22T15:21:14.000Z","updated_at":"2020-06-22T15:26:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"e675797a-78ef-4e10-aa57-c6dc8e4b04fb","html_url":"https://github.com/miferreiro/CDAP-CUDA","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/miferreiro/CDAP-CUDA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miferreiro%2FCDAP-CUDA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miferreiro%2FCDAP-CUDA/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miferreiro%2FCDAP-CUDA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mi
ferreiro%2FCDAP-CUDA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/miferreiro","download_url":"https://codeload.github.com/miferreiro/CDAP-CUDA/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/miferreiro%2FCDAP-CUDA/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266786798,"owners_count":23983871,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-24T02:00:09.469Z","response_time":99,"last_error":null,"robots_txt_status":null,"robots_txt_updated_at":null,"robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","cuda","scan"],"created_at":"2024-11-07T09:25:47.250Z","updated_at":"2026-05-17T17:41:59.576Z","avatar_url":"https://github.com/miferreiro.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Starting on CUDA\n\nThese three exercises were completed for the course \"Computación Distribuída e de Altas Prestacións\" in the Master's Degree in Computer Engineering at the University of Vigo in 2020.\n\n### Introduction to CUDA\n\nImplement a program that performs the following operations:\n1. Initialize a square matrix of 10,000 elements (100x100) containing random values.\n2. Launch a kernel on a grid composed of blocks of 16x16 threads.\n3. Each thread squares one element of the matrix and stores it in a result matrix.\n4. The program displays the original matrix and the result matrix (or part of them) on the screen.\n\n### Asteroid field on CUDA\n\nThe exercise consists of simulating a field of asteroids that interact pairwise through gravitational attraction. To generate input data, the program GenerateAsteroids.c is provided; it creates a data.txt file with randomly generated asteroid data. Two files are attached to the assignment statement, data256.txt and data1024.txt, with data for 256 and 1024 asteroids, respectively, located in a two-dimensional area 20 km on a side, with masses between 500 and 10 million tons.\n\nThis exercise is divided into three sections:\n\n#### Section A\n\nThe first task is to implement the asteroid field simulation using GPU computing resources. This first version uses a kernel launched as a single one-dimensional block of N threads, where N is the number of asteroids.\n\nEach thread of the kernel handles one asteroid. It is recommended to implement the main loop of the simulation outside the kernel, so that each kernel launch corresponds to one iteration.\n\n#### Section B\n\nTo increase the level of parallelism, a kernel is implemented in which each thread calculates the interaction between two asteroids. Each thread calculates the acceleration contributions on the x and y axes and accumulates them into the velocities. Notes on this implementation:\n\n1. The calculation cannot be performed with a single-block kernel, because it would exceed the maximum number of threads per block. For 256 asteroids, a two-dimensional grid of 8x8 blocks is recommended, each consisting of 32x32 threads. For 1024 asteroids, a two-dimensional grid of 32x32 blocks is recommended, each composed of 32x32 threads.\n\n2. Each thread calculates the accelerations ax and ay and accumulates them into the velocities vx and vy of the asteroids. This requires an atomic operation (atomicAdd), which is implemented for double variables on GPUs with compute capability 6.0 or higher (that of the lab machine is 6.1).\n\nThis implementation runs in about 11 seconds (256 asteroids) or 47 seconds (1024 asteroids) in the lab.\n\n#### Section C\n\nIn the implementation of the previous section, each thread calculates one interaction on both the x-axis and the y-axis. The program can be optimized further by increasing the degree of parallelism.\n\nThe proposal is to implement a kernel in which each block is three-dimensional, with a size of 2 along the z-axis. Threads with z=0 handle the projections of the interactions on the x-axis, while threads with z=1 handle the projections on the y-axis. With blocks of 32x32 threads, the maximum number of threads per block is already reached, so there are two options: (1) reduce the size of the blocks, or (2) use two-dimensional blocks with a three-dimensional grid.\n\n### SCAN-based algorithm on CUDA\n\nAn algorithm is implemented in CUDA to apply some kind of processing to a large vector. Using the SCAN algorithm, the processing can be performed in parallel. If the vector is large, the implementation must be divided into blocks.\n\nOnce the block-based algorithm is implemented, compare the time it takes to process a large vector serially with the time taken by the scan algorithm on the GPU.\n\nIn addition, the same program is implemented using the Thrust library.\n\nNOTE: Due to hardware limitations, the blocks are one-dimensional with 1024 threads each.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiferreiro%2Fcdap-cuda","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmiferreiro%2Fcdap-cuda","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmiferreiro%2Fcdap-cuda/lists"}