An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with tensor-core

A curated list of projects in awesome lists tagged with tensor-core.

https://github.com/Bruce-Lee-LY/cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

cublas cuda gemm gpu hgemm matrix-multiply nvidia tensor-core

Last synced: 14 May 2025

https://github.com/bruce-lee-ly/cuda_hgemm

Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.

cublas cuda gemm gpu hgemm matrix-multiply nvidia tensor-core

Last synced: 05 Apr 2025

https://github.com/bruce-lee-ly/cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

cublas cuda cuda-core gemm gemv gpu hgemm hgemv matrix-multiply nvidia tensor-core

Last synced: 17 Jun 2025

https://github.com/Bruce-Lee-LY/cuda_hgemv

Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.

cublas cuda cuda-core gemm gemv gpu hgemm hgemv matrix-multiply nvidia tensor-core

Last synced: 14 May 2025

https://github.com/bruce-lee-ly/flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

cuda cutlass flash-attention flash-attention-2 gpu inference large-language-model llm mha multi-head-attention nvidia tensor-core

Last synced: 25 Aug 2025

https://github.com/bruce-lee-ly/cutlass_gemm

Multiple GEMM operators built with CUTLASS to support LLM inference.

cublas cublaslt cutlass gemm gpu llm matrix-multiply nvidia tensor-core

Last synced: 10 Oct 2025

https://github.com/bruce-lee-ly/cuda_back2back_hgemm

Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.

back2back-gemm back2back-hgemm cublas cuda fused-gemm fused-hgemm gemm gpu hgemm matrix-multiply nvidia tensor-core

Last synced: 13 Apr 2025