https://github.com/actypedef/mixedgemm
a mixed-precision gemm with quantize and reorder kernel.
- Host: GitHub
- URL: https://github.com/actypedef/mixedgemm
- Owner: actypedef
- Created: 2025-05-15T15:40:10.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2025-06-03T13:21:51.000Z (9 months ago)
- Last Synced: 2025-06-03T23:44:38.048Z (9 months ago)
- Topics: cuda, inference-acceleration, llm, mlsys, quantization
- Language: Cuda
- Homepage:
- Size: 25.3 MB
- Stars: 9
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# MixedGemm
**MixedGemm** is a mixed-precision GEMM with a quantize and reorder kernel, running on Blackwell GPUs (RTX 5090).
We use [CUTLASS](https://github.com/NVIDIA/cutlass) to perform the mxfp4, mxfp6, and mxfp8 GEMMs.
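For readers unfamiliar with the MX formats, here is a minimal pure-Python sketch of how one mxfp4 block is quantized under the OCP Microscaling convention: 32 elements share one power-of-two scale, and each element is stored as FP4 (E2M1). This is an illustration only, not the CUDA kernel from this repo; the function names are hypothetical and the tie-breaking rule is a simplification.

```python
import math

# Magnitudes representable by FP4 E2M1 (sign is stored separately).
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
MX_BLOCK = 32   # elements that share one scale in the MX formats
EMAX_E2M1 = 2   # exponent of the largest E2M1 power of two (4.0 = 2**2)


def quantize_block_mxfp4(block):
    """Return (shared_exp, fp4_values) for one block of floats.

    shared_exp is the power-of-two exponent of the block scale;
    fp4_values are the scaled elements snapped to the E2M1 grid.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # frexp gives amax = m * 2**e with 0.5 <= m < 1, so floor(log2(amax)) = e - 1.
    _, e = math.frexp(amax)
    shared_exp = (e - 1) - EMAX_E2M1
    scale = 2.0 ** shared_exp
    quantized = []
    for v in block:
        # Scaled magnitudes above 6.0 saturate to the largest FP4 value.
        mag = min(abs(v) / scale, FP4_E2M1_VALUES[-1])
        # Simplification: ties go to the smaller magnitude, not round-to-even.
        nearest = min(FP4_E2M1_VALUES, key=lambda c: abs(c - mag))
        quantized.append(math.copysign(nearest, v))
    return shared_exp, quantized


def dequantize_block_mxfp4(shared_exp, fp4_values):
    """Reconstruct approximate floats from the shared exponent and FP4 values."""
    scale = 2.0 ** shared_exp
    return [q * scale for q in fp4_values]
```

With this rule, a block whose largest magnitude is 6.0 gets a shared exponent of 0, so values already on the E2M1 grid survive the round trip unchanged. The mxfp6 and mxfp8 formats follow the same block-scale idea with wider element encodings.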
In this example, we quantize the weights to 100% mxfp4 and the activations to 62.5% mxfp4, 34.375% mxfp6, and 3.125% mxfp8 to achieve the best performance with a tolerable accuracy loss.
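The activation percentages are exact multiples of 1/32 (20/32, 11/32, and 1/32). The sketch below works out what that mix means in storage terms. It assumes the split is applied along the reduction (K) dimension and that each 32-element MX block carries one 8-bit shared scale; both are assumptions for illustration, and `K = 4096` is a hypothetical size, not a value taken from this repo.

```python
from fractions import Fraction

# Activation mix from the README: fraction of elements and bits per element.
ACTIVATION_MIX = {
    "mxfp4": (Fraction(20, 32), 4),  # 62.5%
    "mxfp6": (Fraction(11, 32), 6),  # 34.375%
    "mxfp8": (Fraction(1, 32), 8),   # 3.125%
}
MX_BLOCK = 32    # elements per shared scale (assumed, per the MX formats)
SCALE_BITS = 8   # bits per shared scale (assumed, per the MX formats)


def average_bits(mix, include_scale=True):
    """Average storage bits per activation element for a precision mix."""
    assert sum(frac for frac, _ in mix.values()) == 1
    bits = sum(frac * width for frac, width in mix.values())
    if include_scale:
        bits += Fraction(SCALE_BITS, MX_BLOCK)
    return bits


def channel_counts(mix, k):
    """Number of K-dimension channels assigned to each format."""
    counts = {}
    for name, (frac, _) in mix.items():
        n = frac * k
        assert n.denominator == 1, "k must be divisible by 32"
        counts[name] = int(n)
    return counts


K = 4096  # hypothetical hidden size
print(float(average_bits(ACTIVATION_MIX, include_scale=False)))  # 4.8125
print(float(average_bits(ACTIVATION_MIX)))                       # 5.0625
print(channel_counts(ACTIVATION_MIX, K))
# {'mxfp4': 2560, 'mxfp6': 1408, 'mxfp8': 128}
```

Under these assumptions the mixed activations cost about 4.8 bits per element before scales, only slightly more than pure mxfp4, while a small share of channels keeps 6-bit or 8-bit precision.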
[CUDA Toolkit 12.8.1](https://developer.nvidia.com/cuda-12-8-1-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=runfile_local) is required.
## Installation
1. Clone this repo and CUTLASS (make sure Git and Conda are installed)
```
git clone https://github.com/actypedef/MixedGemm.git
git clone https://github.com/NVIDIA/cutlass.git
cd MixedGemm
```
2. Prepare environment
```
sudo apt-get update
sudo apt-get install python3-dev
curl -s https://apt.kitware.com/keys/kitware-archive-latest.asc 2>/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg >/dev/null
sudo apt-add-repository "deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main"
sudo apt update
sudo apt install cmake
conda create -n mixedgemm python=3.12
conda activate mixedgemm
conda install pybind11
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```
3. Replace the following paths in CMakeLists.txt with your actual paths
```
CMAKE_PREFIX_PATH
torch_python PATHS
PYTHON ROOT
CUTLASS ROOT
```
4. Make and run
```
bash remake.sh
python main.py
```
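For step 3, the following helper may make it easier to find candidate values inside the activated `mixedgemm` Conda environment. It is a sketch under assumptions: the mapping from each printed value to the CMakeLists.txt entries is a guess (check it against the actual file), `candidate_paths` is a hypothetical name, and `../cutlass` assumes CUTLASS was cloned next to this repo as in step 1.

```python
import sys
import sysconfig
from pathlib import Path


def candidate_paths(cutlass_dir="../cutlass"):
    """Collect likely values for the paths edited in CMakeLists.txt."""
    paths = {
        # Root of the active Python / Conda environment.
        "PYTHON ROOT": sys.prefix,
        # Directory containing Python.h (installed by python3-dev or Conda).
        "Python include dir": sysconfig.get_paths()["include"],
        # Assumed location of the CUTLASS checkout from step 1.
        "CUTLASS ROOT": str(Path(cutlass_dir).resolve()),
    }
    try:
        import torch
        # Directory holding PyTorch's CMake package config files.
        paths["CMAKE_PREFIX_PATH"] = torch.utils.cmake_prefix_path
        # Directory holding the torch_python shared library.
        paths["torch_python PATHS"] = str(Path(torch.__file__).parent / "lib")
    except ImportError:
        paths["CMAKE_PREFIX_PATH"] = "<install torch first>"
        paths["torch_python PATHS"] = "<install torch first>"
    return paths


if __name__ == "__main__":
    for name, value in candidate_paths().items():
        print(f"{name}: {value}")
```

Run it from the `MixedGemm` directory after step 2 so that the relative CUTLASS path and the PyTorch install both resolve.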