{"id":19418978,"url":"https://github.com/playform/blast","last_synced_at":"2025-02-25T03:43:29.774Z","repository":{"id":227834101,"uuid":"754845713","full_name":"PlayForm/Blast","owner":"PlayForm","description":null,"archived":false,"fork":false,"pushed_at":"2024-04-02T17:15:52.000Z","size":6725,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"Current","last_synced_at":"2025-02-23T01:17:31.754Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://playform.cloud","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PlayForm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null},"funding":{"github":"CNugteren"}},"created_at":"2024-02-08T21:47:38.000Z","updated_at":"2024-08-07T03:30:59.000Z","dependencies_parsed_at":"2024-03-15T11:48:14.461Z","dependency_job_id":"8e048e38-6549-4ff7-806b-74f2576a65e8","html_url":"https://github.com/PlayForm/Blast","commit_stats":null,"previous_names":["playform/blast"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayForm%2FBlast","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayForm%2FBlast/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayForm%2FBlast/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PlayForm%2FBlast/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PlayForm","download_url":"https://codeload.github.com/PlayF
orm/Blast/tar.gz/refs/heads/Current","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240599180,"owners_count":19826959,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T13:15:46.427Z","updated_at":"2025-02-25T03:43:29.755Z","avatar_url":"https://github.com/PlayForm.png","language":"C++","readme":"\nCLBlast: The tuned OpenCL BLAS library\n================\n\n| Platform | Build status |\n|-----|-----|\n| Windows | [![Build Status](https://ci.appveyor.com/api/projects/status/github/cnugteren/clblast?branch=master\u0026svg=true)](https://ci.appveyor.com/project/CNugteren/clblast) |\n| Linux/macOS | [![Build Status](https://github.com/cnugteren/clblast/actions/workflows/build_and_test.yml/badge.svg?branch=master)](https://github.com/CNugteren/CLBlast/actions/workflows/build_and_test.yml) |\n\n\n| Test machine (thanks to [ArrayFire](https://ci.arrayfire.org:8010/#/builders)) | Test status |\n|-----|-----|\n| clblast-linux-nvidia-a100 | [![Test Status](http://ci.arrayfire.org:8010/badges/clblast-linux-nvidia-a100.svg)](http://ci.arrayfire.org:8010/#/builders/clblast-linux-nvidia-a100) |\n| clblast-linux-nvidia-k80 | [![Test Status](http://ci.arrayfire.org:8010/badges/clblast-linux-nvidia-k80.svg)](http://ci.arrayfire.org:8010/#/builders/clblast-linux-nvidia-k80) |\n| clblast-linux-nvidia-p100 | [![Test Status](http://ci.arrayfire.org:8010/badges/clblast-linux-nvidia-p100.svg)](http://ci.arrayfire.org:8010/#/builders/clblast-linux-nvidia-p100) |\n| clblast-linux-nvidia-t4 | [![Test 
Status](http://ci.arrayfire.org:8010/badges/clblast-linux-nvidia-t4.svg)](http://ci.arrayfire.org:8010/#/builders/clblast-linux-nvidia-t4) |\n| clblast-linux-nvidia-v100 | [![Test Status](http://ci.arrayfire.org:8010/badges/clblast-linux-nvidia-v100.svg)](http://ci.arrayfire.org:8010/#/builders/clblast-linux-nvidia-v100) |\n| clblast-windows-amd-r9 | [![Test Status](http://ci.arrayfire.org:8010/badges/clblast-windows-amd-r9.svg)](http://ci.arrayfire.org:8010/#/builders/clblast-windows-amd-r9) |\n| clblast-windows-nvidia-m6000 | [![Test Status](http://ci.arrayfire.org:8010/badges/clblast-windows-nvidia-m6000.svg)](http://ci.arrayfire.org:8010/#/builders/clblast-windows-nvidia-m6000) |\n\nCLBlast is a lightweight, performant and tunable OpenCL BLAS library written in C++11. It is designed to leverage the full performance potential of a wide variety of OpenCL devices from different vendors, including desktop and laptop GPUs, embedded GPUs, and other accelerators. CLBlast implements BLAS routines: basic linear algebra subprograms operating on vectors and matrices. See [the CLBlast website](https://cnugteren.github.io/clblast) for performance reports on some devices.\n\nThe library is not tuned for all possible OpenCL devices: __if out-of-the-box performance is poor, please run the tuners first__. 
See [the docs for a list of already tuned devices](doc/tuning.md#already-tuned-for-devices) and [instructions on how to tune yourself](doc/tuning.md) and contribute to future releases of the CLBlast library.\n\n\nWhy CLBlast and not clBLAS or cuBLAS?\n-------------\n\nUse CLBlast instead of clBLAS:\n\n* When you care about achieving maximum performance.\n* When you want to be able to inspect the BLAS kernels or easily customize them to your needs.\n* When you run on exotic OpenCL devices for which you need to tune yourself.\n* When you are still running on OpenCL 1.1 hardware.\n* When you prefer a C++ API over a C API (C API also available in CLBlast).\n* When you value an organized and modern C++ codebase.\n* When you target Intel CPUs and GPUs or embedded devices.\n* When you can benefit from the increased performance of half-precision fp16 data-types.\n\nUse CLBlast instead of cuBLAS:\n\n* When you want your code to run on devices other than NVIDIA CUDA-enabled GPUs.\n* When you want to tune for a specific configuration (e.g. rectangular matrix-sizes).\n* When you sleep better if you know that the library you use is open-source.\n* When you are using OpenCL rather than CUDA.\n\nWhen not to use CLBlast:\n\n* When you run on NVIDIA's CUDA-enabled GPUs only and can benefit from cuBLAS's assembly-level tuned kernels.\n\n\nGetting started\n-------------\n\nCLBlast can be compiled with minimal dependencies (apart from OpenCL) in the usual CMake way, e.g.:\n\n    mkdir build \u0026\u0026 cd build\n    cmake ..\n    make\n\nDetailed instructions for various platforms can be found [here](doc/installation.md).\n\nLike clBLAS and cuBLAS, CLBlast also requires OpenCL device buffers as arguments to its routines. This means you'll have full control over the OpenCL buffers and the host-device memory transfers. CLBlast's API is designed to resemble clBLAS's C API as much as possible, requiring little integration effort in case clBLAS was previously used. 
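\n\nAs a preview of that integration effort, a single-precision row-major GEMM is a single call. This is a sketch rather than one of the official samples: the OpenCL command queue \u0060queue\u0060, the device buffers \u0060a_buf\u0060, \u0060b_buf\u0060 and \u0060c_buf\u0060, and the sizes \u0060m\u0060, \u0060n\u0060 and \u0060k\u0060 are assumed to be set up beforehand:\n\n    #include \u003cclblast.h\u003e\n\n    // Computes C = 1.0 * A * B + 0.0 * C for an (m x k) matrix A and a (k x n) matrix B\n    cl_event event = nullptr;\n    const auto status = clblast::Gemm(clblast::Layout::kRowMajor,\n                                      clblast::Transpose::kNo, clblast::Transpose::kNo,\n                                      m, n, k, 1.0f,\n                                      a_buf, 0, k,  // buffer, offset, leading dimension\n                                      b_buf, 0, n,\n                                      0.0f, c_buf, 0, n,\n                                      \u0026queue, \u0026event);\n    if (status == clblast::StatusCode::kSuccess) { clWaitForEvents(1, \u0026event); }\n\nThe event argument is optional; waiting on it turns the otherwise asynchronous call into a blocking one.\n\n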
Using CLBlast starts by including the C++ header:\n\n    #include \u003cclblast.h\u003e\n\nOr alternatively the plain C version:\n\n    #include \u003cclblast_c.h\u003e\n\nAfterwards, any of CLBlast's routines can be called directly: there is no need to initialize the library. The available routines and the required arguments are described in the above-mentioned include files and the included [API documentation](doc/api.md). The API is kept as close as possible to the Netlib BLAS and the cuBLAS/clBLAS APIs. For an overview of the supported routines, see [here](doc/routines.md).\n\nTo get started quickly, a couple of stand-alone example programs are included in the `samples` subfolder. They can optionally be compiled using the CMake infrastructure of CLBlast by providing the `-DSAMPLES=ON` flag, for example as follows:\n\n    cmake -DSAMPLES=ON ..\n\nAfterwards, you can optionally read more about running proper [benchmarks](doc/benchmarking.md) and [tuning the library](doc/tuning.md).\n\n\nFull documentation\n-------------\n\nMore detailed documentation is available in separate files:\n\n* [Building and installing](doc/installation.md)\n* [Supported routines overview](doc/routines.md)\n* [Performance measuring and benchmarking](doc/benchmarking.md)\n* [Tuning for better performance](doc/tuning.md)\n* [Testing the library for correctness](doc/testing.md)\n* [Bindings / wrappers for other languages](doc/bindings.md)\n* [More details on the GEMM kernel](doc/details_gemm.md)\n* [More details on the convolution implementation](doc/details_conv.md)\n* [Glossary with some terms explained](doc/glossary.md)\n* [Frequently asked questions (FAQ) and their answers](doc/faq.md)\n\n\nKnown issues\n-------------\n\nKnown performance-related issues:\n\n* Severe performance issues with Beignet v1.3.0 due to missing support for local memory. 
Please downgrade to v1.2.1 or upgrade to v1.3.1 or newer.\n\nOther known issues:\n\n* Routines returning an integer are currently not properly tested for half-precision FP16: IHAMAX/IHAMIN/IHMAX/IHMIN\n\n* Half-precision FP16 tests might sometimes fail depending on the order of multiplication, i.e. (a * b) * c != (c * b) * a\n\n* The AMD APP SDK has a bug causing a conflict with libstdc++, resulting in a segfault when initialising static variables. This has been reported to occur with the CLBlast tuners.\n\n* The AMD run-time compiler has a bug causing it to get stuck in an infinite loop. This is reported to happen occasionally when tuning the CLBlast GEMM routine.\n\n* AMD Southern Island GPUs might cause wrong results with the amdgpu-pro drivers. Configure CMake with the `AMD_SI_EMPTY_KERNEL_WORKAROUND` option to resolve the issue, [see issue #301](https://github.com/CNugteren/CLBlast/issues/301).\n\n* Tests might fail on an Intel IvyBridge GPU with the latest Beignet. Please downgrade Beignet to 1.2.1, [see issue #231](https://github.com/CNugteren/CLBlast/issues/231).\n\n\nContributing\n-------------\n\nContributions are welcome in the form of tuning results for previously untested OpenCL devices or pull requests. 
See [the contributing guidelines](CONTRIBUTING.md) for more details.\n\nThe main contributing authors (code, pull requests, testing) can be found in the list of [GitHub contributors](https://github.com/CNugteren/CLBlast/graphs/contributors).\n\nTuning and testing on a variety of OpenCL devices was made possible by:\n\n* [TU/e ES research group](http://www.es.ele.tue.nl/)\n* [ASCI DAS4 and DAS5](http://www.cs.vu.nl/das4/)\n* [dividiti](http://www.dividiti.com)\n* [SURFsara HPC center](http://www.surfsara.com)\n* [ArrayFire](http://arrayfire.org)\n* [TomTom](http://www.tomtom.com)\n* Everyone reporting [tuning results](https://github.com/CNugteren/CLBlast/issues/1)\n\nHardware/software for this project was contributed by:\n\n* [HPC research group at the University of Bristol](http://uob-hpc.github.io/zoo/) for access to their GPU zoo\n* [ArrayFire](http://arrayfire.org) for setting up and supporting Buildbot correctness tests on multiple platforms\n* [JetBrains](https://www.jetbrains.com/clion/) for supplying a free CLion IDE license for CLBlast developers\n* [Travis CI](https://travis-ci.org/CNugteren/CLBlast/branches) and [AppVeyor](https://ci.appveyor.com/project/CNugteren/clblast) for free automated build tests for open-source projects\n\n\nMore information\n-------------\n\nFurther information on CLBlast is available through the following links:\n\n* A 20-minute presentation of CLBlast was given at the GPU Technology Conference in May 2017. A recording is available on the [GTC on-demand website](http://on-demand.gputechconf.com/gtc/2017/video/s7280-nugteren-clblast.mp4) (though with poor audio quality) and a full slide-set is also available [as PDF](http://on-demand.gputechconf.com/gtc/2017/presentation/s7280-cedric-nugteren-clblast.pdf). An updated version was also presented at IWOCL in May 2018. 
The slide set can be found [here as PDF](https://cnugteren.github.io/downloads/CLBlastIWOCL18.pdf).\n* More in-depth information and experimental results are also available in a scientific paper titled [CLBlast: A Tuned OpenCL BLAS Library](https://arxiv.org/abs/1705.05249) (v1 May 2017, updated to v2 in April 2018). For CLTune, the inspiration for the included auto-tuner, see also the [CLTune: A Generic Auto-Tuner for OpenCL Kernels](https://arxiv.org/abs/1703.06503) paper.\n\nHow to cite this work:\n\n    Cedric Nugteren. CLBlast: A Tuned OpenCL BLAS Library. In IWOCL'18: International Workshop\n    on OpenCL. ACM, New York, NY, USA, 10 pages. 2018. https://doi.org/10.1145/3204919.3204924\n\n\nSupport us\n-------------\n\nThis project started in March 2015 as an evenings-and-weekends free-time project of Cedric Nugteren, next to a full-time job. You can find contact information on the [website of the main author](http://cnugteren.github.io).\n","funding_links":["https://github.com/sponsors/CNugteren"],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplayform%2Fblast","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fplayform%2Fblast","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fplayform%2Fblast/lists"}