An open API service indexing awesome lists of open source software.

https://github.com/ericlbuehler/candle-benching

Benchmarking for Candle
https://github.com/ericlbuehler/candle-benching

Last synced: about 1 year ago
JSON representation

Benchmarking for Candle

Awesome Lists containing this project

README

          

# candle-benching

Benchmarking for Candle.

Just clone and then run one of:
- `cargo run --release`
- `cargo run --release --features cuda`
- `cargo run --release --features metal`

## **CUDA**: RTX 4070
| test | device | candle_time_per_pass | torch_time_per_pass | n | result |
|-----------------|--------|----------------------|---------------------|--------|-----------------------------------------|
| add | cuda | 19.496µs | 17.935µs | 100000 | ❌ Candle slower than Torch by 1.087x |
| matmul | cuda | 259.859µs | 234.579µs | 100000 | ❌ Candle slower than Torch by 1.108x |
| cublaslt_matmul | cuda | 136.350µs | 254.747µs | 100000 | ✅ Candle faster than Torch by 1.868x |
| relu | cuda | 17.804µs | 22.853µs | 100000 | ✅ Candle faster than Torch by 1.284x |
| gelu | cuda | 21.429µs | 10.031µs | 100000 | ❌ Candle slower than Torch by 2.136x |
| silu | cuda | 20.767µs | 10.429µs | 100000 | ❌ Candle slower than Torch by 1.991x |
| softmax | cuda | 22.955µs | 27.366µs | 100000 | ✅ Candle faster than Torch by 1.192x |
| reshape | cuda | 0.123µs | 9.104µs | 100000 | ✅ Candle faster than Torch by 73.951x |
| transpose | cuda | 0.094µs | 5.810µs | 100000 | ✅ Candle faster than Torch by 61.876x |
| narrow | cuda | 0.113µs | 11.722µs | 100000 | ✅ Candle faster than Torch by 104.156x |

## **CPU**: Intel Core Ultra 9 185H
| test | device | candle_time_per_pass | torch_time_per_pass | n | result |
|-----------|--------|----------------------|---------------------|------|-----------------------------------------|
| add | cpu | 133.909µs | 51.276µs | 1000 | ❌ Candle slower than Torch by 2.612x |
| matmul | cpu | 4464.338µs | 4658.558µs | 1000 | ✅ Candle faster than Torch by 1.044x |
| relu | cpu | 175.867µs | 50.372µs | 1000 | ❌ Candle slower than Torch by 3.491x |
| gelu | cpu | 2374.970µs | 10.533µs | 1000 | ❌ Candle slower than Torch by 225.487x |
| silu | cpu | 2115.940µs | 9.030µs | 1000 | ❌ Candle slower than Torch by 234.315x |
| softmax | cpu | 1655.149µs | 266.818µs | 1000 | ❌ Candle slower than Torch by 6.203x |
| reshape | cpu | 0.074µs | 13.229µs | 1000 | ✅ Candle faster than Torch by 179.150x |
| transpose | cpu | 0.120µs | 8.362µs | 1000 | ✅ Candle faster than Torch by 69.404x |
| narrow | cpu | 0.064µs | 12.545µs | 1000 | ✅ Candle faster than Torch by 197.377x |

## **CPU and MKL**: Intel Core Ultra 9 185H
| test | device | candle_time_per_pass | torch_time_per_pass | n | result |
|-----------|--------|----------------------|---------------------|------|-----------------------------------------|
| add | cpu | 34.672µs | 49.539µs | 1000 | ✅ Candle faster than Torch by 1.429x |
| matmul | cpu | 4718.913µs | 4705.201µs | 1000 | ❌ Candle slower than Torch by 1.003x |
| relu | cpu | 257.947µs | 65.408µs | 1000 | ❌ Candle slower than Torch by 3.944x |
| gelu | cpu | 2405.888µs | 12.113µs | 1000 | ❌ Candle slower than Torch by 198.618x |
| silu | cpu | 523.669µs | 9.747µs | 1000 | ❌ Candle slower than Torch by 53.725x |
| softmax | cpu | 1667.239µs | 272.704µs | 1000 | ❌ Candle slower than Torch by 6.114x |
| reshape | cpu | 0.132µs | 18.616µs | 1000 | ✅ Candle faster than Torch by 141.064x |
| transpose | cpu | 0.171µs | 7.215µs | 1000 | ✅ Candle faster than Torch by 42.137x |
| narrow | cpu | 0.107µs | 13.318µs | 1000 | ✅ Candle faster than Torch by 124.090x |