{"id":22349324,"url":"https://github.com/bd2720/accesspatterns","last_synced_at":"2026-05-16T08:02:45.546Z","repository":{"id":266453485,"uuid":"799681701","full_name":"bd2720/AccessPatterns","owner":"bd2720","description":"Comparing chunked vs. striped memory access patterns for CPU and GPU code using the CUDA toolkit in C.","archived":false,"fork":false,"pushed_at":"2024-05-12T21:12:26.000Z","size":2,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-26T11:22:07.845Z","etag":null,"topics":["c","cache","cuda","cuda-toolkit","performance-analysis","performance-testing","profiling"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bd2720.png","metadata":{"files":{"readme":"README.txt","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-12T21:11:07.000Z","updated_at":"2024-07-18T03:50:53.000Z","dependencies_parsed_at":"2024-12-04T11:08:00.515Z","dependency_job_id":"bcf14166-135c-4750-80b8-d94fc3033be3","html_url":"https://github.com/bd2720/AccessPatterns","commit_stats":null,"previous_names":["bd2720/accesspatterns"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bd2720/AccessPatterns","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bd2720%2FAccessPatterns","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bd2720%2FAccessPatterns/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bd2720%2FAccessPatterns/releases","manifests_url":"https://re
pos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bd2720%2FAccessPatterns/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bd2720","download_url":"https://codeload.github.com/bd2720/AccessPatterns/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bd2720%2FAccessPatterns/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278219817,"owners_count":25950358,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-03T02:00:06.070Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c","cache","cuda","cuda-toolkit","performance-analysis","performance-testing","profiling"],"created_at":"2024-12-04T11:07:58.613Z","updated_at":"2025-10-03T20:15:46.260Z","avatar_url":"https://github.com/bd2720.png","language":"Cuda","funding_links":[],"categories":[],"sub_categories":[],"readme":"access-cpu: Uses pthreads to demonstrate how chunked memory access is faster\nthan striped access on the CPU. This is because each thread is scheduled on\nthe CPU for a period of time, one thread after another. A given thread\ntherefore makes the best use of the cache when it accesses memory in\na sequential pattern (chunked). 
Striped access is slow because a given thread\nuses only a fraction (1 / NTHREADS) of each cache line it loads.\n\naccess-gpu: Uses CUDA to demonstrate how striped memory access is faster\nthan chunked access on the GPU. This is because GPU threads execute together\non a per-block basis and share the same cache. With an interleaved (striped)\nmemory access pattern, neighboring threads in a block read neighboring\naddresses at the same time, so they read from the same cache line.\n\nGeneral Findings:\n\npthread speedup 1 -\u003e 10 (bad access, striped):\t1.15x\npthread speedup 1 -\u003e 10 (good access, chunked):\t4-5x\n\ncuda speedup \u003c\u003c\u003c1,1\u003e\u003e\u003e -\u003e \u003c\u003c\u003c8,64\u003e\u003e\u003e (bad access, chunked):\t9x\ncuda speedup \u003c\u003c\u003c1,1\u003e\u003e\u003e -\u003e \u003c\u003c\u003c8,64\u003e\u003e\u003e (good access, striped):\t256x\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbd2720%2Faccesspatterns","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbd2720%2Faccesspatterns","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbd2720%2Faccesspatterns/lists"}