https://github.com/neuralmagic/quant_kernel_benchmarks

Benchmarking code for running quantized kernels from vLLM and other libraries

Host: GitHub
URL: https://github.com/neuralmagic/quant_kernel_benchmarks
Owner: neuralmagic
Created: 2024-09-21T23:42:31.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-12-03T21:02:33.000Z (over 1 year ago)
Last Synced: 2025-06-05T08:45:10.668Z (about 1 year ago)
Language: Python
Size: 21.5 KB
Stars: 5
Watchers: 5
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: Readme.md

README

# Example Usage

Run the benchmark, which writes the results to a `.pkl` file:

```
python benchmark_kernels.py --act-type bfloat16 --kernels torch_fp16,machete,fbgemm_i4,marlin,gemlite model_bench
```
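
Before plotting, it can help to check what the benchmark wrote. The following is a minimal sketch for loading and summarizing a pickle file, not part of this repository: `load_results` and `summarize` are hypothetical helper names, and the layout of the results object is not documented here, so the code only reports its top-level type and size. If the file holds objects from a library such as PyTorch, that library presumably needs to be importable when unpickling. Only unpickle files you trust.

```python
import pickle
from pathlib import Path


def load_results(path):
    """Load a pickled results object from disk (hypothetical helper)."""
    with Path(path).open("rb") as f:
        return pickle.load(f)


def summarize(obj):
    """Describe the top-level structure without assuming a schema."""
    if isinstance(obj, dict):
        # Show at most the first five keys, sorted by their string form.
        keys = sorted(map(str, obj))[:5]
        return f"dict with {len(obj)} keys: {keys}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} with {len(obj)} items"
    return type(obj).__name__
```

For example, `print(summarize(load_results("results.pkl")))` prints a one-line description, where `results.pkl` is a placeholder for whatever file the benchmark produced.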

Plot the results from the generated `.pkl` file, highlighting the `machete` kernel:

```
python plot/plot_normalized_runtime.py .pkl --highlight machete
```