https://github.com/neuralmagic/quant_kernel_benchmarks
Benchmarking code for running quantized kernels from vLLM and other libraries
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/neuralmagic/quant_kernel_benchmarks
- Owner: neuralmagic
- Created: 2024-09-21T23:42:31.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-12-03T21:02:33.000Z (over 1 year ago)
- Last Synced: 2025-06-05T08:45:10.668Z (about 1 year ago)
- Language: Python
- Size: 21.5 KB
- Stars: 5
- Watchers: 5
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: Readme.md
README
# Example Usage
Run the benchmark (this generates a .pkl file containing the results):
```
python benchmark_kernels.py --act-type bfloat16 --kernels torch_fp16,machete,fbgemm_i4,marlin,gemlite model_bench
```
Plot the results, passing the .pkl file generated by the benchmark run as the first argument:
```
python plot/plot_normalized_runtime.py .pkl --highlight machete
```
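The layout of the results file is not described here, so it can help to look at it before plotting. The following is a minimal sketch, assuming only that the file is a standard Python pickle; the helper name `summarize_pickle` is hypothetical and not part of this repository. If the file stores objects from other libraries (for example PyTorch), those libraries likely need to be installed for unpickling to succeed.

```python
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object.

    Hypothetical helper for illustration; it makes no assumption about
    what the benchmark actually stores. Only unpickle files you trust.
    """
    with Path(path).open("rb") as f:
        obj = pickle.load(f)

    summary = {"type": type(obj).__name__}
    if isinstance(obj, dict):
        # For a dict, list the keys so the structure is visible at a glance.
        summary["keys"] = list(obj.keys())
    elif isinstance(obj, (list, tuple)):
        # For a sequence, report how many entries it holds.
        summary["length"] = len(obj)
    return summary
```

Calling `summarize_pickle` with the path of the generated .pkl file returns a small dictionary naming the top-level type and, where applicable, its keys or length.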