https://github.com/tlc-pack/cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
- Host: GitHub
- URL: https://github.com/tlc-pack/cutlass_fpA_intB_gemm
- Owner: tlc-pack
- License: apache-2.0
- Created: 2023-06-06T22:26:34.000Z
- Default Branch: main
- Last Pushed: 2025-02-22T02:11:00.000Z
- Last Synced: 2025-02-22T03:19:34.988Z
- Language: C++
- Size: 204 KB
- Stars: 88
- Watchers: 18
- Forks: 22
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-gemm: cutlass_fpA_intB_gemm - [`Apache License 2.0`](https://github.com/tlc-pack/cutlass_fpA_intB_gemm/blob/main/LICENSE) (Libraries / GPU Libraries)
README
CUTLASS GEMM kernels for fp16 activations (A) and int8/int4 quantized weights (B), extracted from FasterTransformer for easier integration into third-party projects. See the original code at the links below; a reference sketch of the computation follows them.
* https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/cutlass_extensions/include/cutlass_extensions
* https://github.com/NVIDIA/FasterTransformer/tree/main/src/fastertransformer/kernels/cutlass_kernels/fpA_intB_gemm
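
The kernels implement weight-only quantized GEMM: fp16 activations are multiplied against int8/int4 weights that are dequantized on the fly inside the GEMM main loop. The following is a minimal reference sketch of that computation, assuming per-output-channel dequantization scales (typical for FasterTransformer-style weight-only quantization); floats stand in for fp16 here, and nothing below reflects the actual kernel API or implementation.

```
#include <cstdint>
#include <cstdio>
#include <vector>

// Reference semantics of a weight-only-quantized GEMM:
//   C[m][n] = sum_k A[m][k] * (B_q[k][n] * scale[n])
// A models the fp16 activation, B_q the int8 quantized weight, and
// scale a per-output-channel dequantization factor. The CUTLASS kernel
// fuses the dequantization into its main loop; this loop only
// illustrates the math.
void gemm_fpA_intB_reference(const std::vector<float>& A,     // M x K
                             const std::vector<int8_t>& B_q,  // K x N
                             const std::vector<float>& scale, // N
                             std::vector<float>& C,           // M x N
                             int M, int N, int K) {
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                // Dequantize the int8 weight on the fly.
                acc += A[m * K + k] *
                       (static_cast<float>(B_q[k * N + n]) * scale[n]);
            }
            C[m * N + n] = acc;
        }
    }
}

int main() {
    const int M = 2, N = 3, K = 4;
    std::vector<float> A(M * K, 1.0f);
    std::vector<int8_t> B_q(K * N, 2);
    std::vector<float> scale(N, 0.5f);
    std::vector<float> C(M * N, 0.0f);
    gemm_fpA_intB_reference(A, B_q, scale, C, M, N, K);
    printf("C[0][0] = %.1f\n", C[0]); // 4 * (1.0 * 2 * 0.5) = 4.0
    return 0;
}
```

Fusing dequantization into the main loop is the point of the kernel: the weight matrix is read at int8/int4 width, cutting memory traffic, while the multiply-accumulate still happens at higher precision.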
Build with

```
mkdir build && cd build
cmake ..
make
```
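
To consume the kernels from a third-party project, a CMake setup along the following lines should work. The `third_party/cutlass_fpA_intB_gemm` path and the `fpA_intB_gemm` target name are assumptions for illustration; check this repository's CMakeLists.txt for the target it actually defines.

```
# Hypothetical consumer CMakeLists.txt.
cmake_minimum_required(VERSION 3.18)
project(my_app LANGUAGES CXX CUDA)

# Assumes this repository has been cloned (e.g., as a git submodule)
# into third_party/cutlass_fpA_intB_gemm.
add_subdirectory(third_party/cutlass_fpA_intB_gemm)

add_executable(my_app main.cu)
# Link against the GEMM library; target name is an assumption.
target_link_libraries(my_app PRIVATE fpA_intB_gemm)
```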