{"id":16540032,"url":"https://github.com/prince781/libgpublas","last_synced_at":"2026-04-29T15:02:14.477Z","repository":{"id":148103489,"uuid":"102565519","full_name":"Prince781/libgpublas","owner":"Prince781","description":"Drop-in GPU acceleration for linear algebra.","archived":false,"fork":false,"pushed_at":"2020-02-22T21:18:59.000Z","size":456,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-18T10:52:39.281Z","etag":null,"topics":["blas","blas-kernels","c","cblas","clblas","cuda","gpu","gpu-acceleration","hpc","interposition","linear-algebra","nvidia","opencl"],"latest_commit_sha":null,"homepage":"","language":"Fortran","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Prince781.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-09-06T05:14:10.000Z","updated_at":"2023-08-21T08:43:45.000Z","dependencies_parsed_at":"2023-05-19T04:00:22.759Z","dependency_job_id":null,"html_url":"https://github.com/Prince781/libgpublas","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Prince781/libgpublas","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prince781%2Flibgpublas","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prince781%2Flibgpublas/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prince781%2Flibgpublas/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prince781%2Flibgpublas/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Prince781","download_url":"https://codeload.github.com/Prince781/libgpublas/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Prince781%2Flibgpublas/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32430803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T13:34:34.882Z","status":"ssl_error","status_checked_at":"2026-04-29T13:34:29.830Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blas","blas-kernels","c","cblas","clblas","cuda","gpu","gpu-acceleration","hpc","interposition","linear-algebra","nvidia","opencl"],"created_at":"2024-10-11T18:51:24.820Z","updated_at":"2026-04-29T15:02:14.454Z","avatar_url":"https://github.com/Prince781.png","language":"Fortran","funding_links":[],"categories":[],"sub_categories":[],"readme":"# blas2cuda\n\nThis is a library to intercept calls to CPU BLAS kernels and run their\nequivalent on the GPU in a CUDA environment.\n\n## Compiling\n```\n$ export CUDA=...\n$ meson -DCUDA=$CUDA build \u0026\u0026 ninja -C build\n```\n\n## How it works\nTODO: expand this section\n\n### Allocation tracking\n- done once\n- track ALL object allocations\n    - remove object-specific information, giving us only calls to malloc()\n- saved to file\n\n### Object tracking\n- in code (blas2cuda):\n    - define custom object manager (`struct objmngr`)\n    - any time an allocation file is loaded, we use this memory manager\n- any time `malloc()` is called:\n    1. object tracker compares call info (from requested size and using\n       libunwind to get instruction pointer) with allocation list\n    2. if call info matches, allocate the object using the custom memory\n       manager defined for the call, and track the object\n    3. if call info doesn't match, act normally\n\n### Motivation for object tracking\n- Each time a kernel is called, we would have to copy data to GPU, invoke the\n  kernel, and copy it back to the CPU. For a series of calls to kernels that\n  aren't computation-intensive (Level 1 and Level 2 BLAS calls are vector-vector\n  and matrix-vector operations), throughput is significantly degraded as the\n  time to transfer data dominates computation.\n    - This is why [NVBLAS](https://docs.nvidia.com/cuda/nvblas/index.html), a\n      similar project, only intercepts computation-intensive Level 3\n      matrix-matrix operations, where the computation dominates data transfer.\n    - However, there's still this issue of copying back and forth.\n- blas2cuda uses object tracking to distinguish memory objects that are used in\n  BLAS kernels from other memory objects we don't care about.\n- When a call is made to `malloc()` that we should care about, we use\n  `cudaMallocManaged()` instead and return a memory address that is shared\n  between the CPU and GPU. This memory is a managed object, and a later call to\n  `free()` will use `cudaFree()` instead.\n- By intercepting the right calls, we can tell when these memory objects are\n  later used in kernels, and avoid copying.\n- Instead of explicit copying, a [page faulting mechanism is used to move data\n  between the CPU and GPU](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-data-migration).\n\n### (Outdated) Running a program\n`./blas2cuda.sh \u003cobjtrackfile\u003e \u003cprogram\u003e`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprince781%2Flibgpublas","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fprince781%2Flibgpublas","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fprince781%2Flibgpublas/lists"}