{"id":28719156,"url":"https://github.com/actypedef/mixedgemm","last_synced_at":"2025-06-15T06:00:23.908Z","repository":{"id":293494332,"uuid":"984230164","full_name":"actypedef/MixedGemm","owner":"actypedef","description":"a mixed-precision gemm with quantize and reorder kernel.","archived":false,"fork":false,"pushed_at":"2025-06-03T13:21:51.000Z","size":26523,"stargazers_count":9,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-03T23:44:38.048Z","etag":null,"topics":["cuda","inference-acceleration","llm","mlsys","quantization"],"latest_commit_sha":null,"homepage":"","language":"Cuda","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/actypedef.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-15T15:40:10.000Z","updated_at":"2025-06-03T13:09:37.000Z","dependencies_parsed_at":"2025-05-15T16:54:38.563Z","dependency_job_id":null,"html_url":"https://github.com/actypedef/MixedGemm","commit_stats":null,"previous_names":["actypedef/mixedgemm"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/actypedef/MixedGemm","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/actypedef%2FMixedGemm","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/actypedef%2FMixedGemm/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/actypedef%2FMixedGemm/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/actypedef%2FMixedGemm/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/actypedef","download_url":"https://codeload.github.com/actypedef/MixedGemm/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/actypedef%2FMixedGemm/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259929946,"owners_count":22933527,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cuda","inference-acceleration","llm","mlsys","quantization"],"created_at":"2025-06-15T06:00:18.555Z","updated_at":"2025-06-15T06:00:23.789Z","avatar_url":"https://github.com/actypedef.png","language":"Cuda","readme":"# MixedGemm\n\n[![pV9V2gx.png](https://s21.ax1x.com/2025/05/30/pV9V2gx.png)](https://imgse.com/i/pV9V2gx)\n\n**MixedGemm** is a mixed-precision GEMM with a quantize-and-reorder kernel that runs on Blackwell GPUs (RTX 5090).\n\nWe use [CUTLASS](https://github.com/NVIDIA/cutlass) to perform the mxfp4, mxfp6, and mxfp8 GEMMs.\n\nIn this example, we quantize weights entirely to mxfp4 and activations to 62.5% mxfp4, 34.375% mxfp6, and 3.125% mxfp8 to achieve the best performance with tolerable accuracy loss.\n\n[CUDA Toolkit 12.8.1](https://developer.nvidia.com/cuda-12-8-1-download-archive?target_os=Linux\u0026target_arch=x86_64\u0026Distribution=Ubuntu\u0026target_version=22.04\u0026target_type=runfile_local) is required.\n\n## Installation\n\n1. Clone this repo and CUTLASS (make sure Git and Conda are installed)\n```\ngit clone https://github.com/actypedef/MixedGemm.git\ngit clone https://github.com/NVIDIA/cutlass.git\ncd MixedGemm\n```\n2. 
Prepare the environment\n```\nsudo apt-get update\nsudo apt-get install python3-dev\n\ncurl -s https://apt.kitware.com/keys/kitware-archive-latest.asc 2\u003e/dev/null | gpg --dearmor - | sudo tee /etc/apt/trusted.gpg.d/kitware.gpg \u003e/dev/null\nsudo apt-add-repository \"deb https://apt.kitware.com/ubuntu/ $(lsb_release -cs) main\"\nsudo apt update\nsudo apt install cmake\n\nconda create -n mixedgemm python=3.12\nconda activate mixedgemm\nconda install pybind11\npip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128\n```\n3. Replace the following paths in CMakeLists.txt with your actual paths\n```\nCMAKE_PREFIX_PATH\ntorch_python PATHS\nPYTHON ROOT\nCUTLASS ROOT\n```\n4. Build and run\n```\nbash remake.sh\npython main.py\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Factypedef%2Fmixedgemm","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Factypedef%2Fmixedgemm","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Factypedef%2Fmixedgemm/lists"}