{"id":15902253,"url":"https://github.com/ericlbuehler/candle-benching","last_synced_at":"2025-03-20T19:31:10.598Z","repository":{"id":252303686,"uuid":"840033657","full_name":"EricLBuehler/candle-benching","owner":"EricLBuehler","description":"Benchmarking for Candle","archived":false,"fork":false,"pushed_at":"2024-08-19T17:31:48.000Z","size":29,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2024-10-07T11:23:21.609Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EricLBuehler.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-08T20:42:01.000Z","updated_at":"2024-08-22T06:40:07.000Z","dependencies_parsed_at":"2024-08-10T15:43:40.888Z","dependency_job_id":null,"html_url":"https://github.com/EricLBuehler/candle-benching","commit_stats":null,"previous_names":["ericlbuehler/candle-benching"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fcandle-benching","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fcandle-benching/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fcandle-benching/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricLBuehler%2Fcandle-benching/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EricLBuehler","download_url":"https://codeload.github.com/EricLBuehler/candle-benching/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":221796080,"owners_count":16881782,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-06T11:23:26.994Z","updated_at":"2024-10-28T07:03:15.673Z","avatar_url":"https://github.com/EricLBuehler.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# candle-benching\n\nBenchmarking for Candle.\n\nJust clone and then run one of:\n- `cargo run --release`\n- `cargo run --release --features cuda`\n- `cargo run --release --features metal`\n\n## **CUDA**: RTX 4070\n| test            | device | candle_time_per_pass | torch_time_per_pass | n      | result                                  |\n|-----------------|--------|----------------------|---------------------|--------|-----------------------------------------|\n| add             | cuda   | 19.496µs             | 17.935µs            | 100000 | ❌ Candle slower than Torch by 1.087x   |\n| matmul          | cuda   | 259.859µs            | 234.579µs           | 100000 | ❌ Candle slower than Torch by 1.108x   |\n| cublaslt_matmul | cuda   | 136.350µs            | 254.747µs           | 100000 | ✅ Candle faster than Torch by 1.868x   |\n| relu            | cuda   | 17.804µs             | 22.853µs            | 100000 | ✅ Candle faster than Torch by 1.284x   |\n| gelu            | cuda   | 21.429µs             | 10.031µs            | 100000 | ❌ Candle slower than Torch by 2.136x   |\n| silu            | cuda   | 20.767µs             | 10.429µs            | 100000 | ❌ Candle slower than Torch by 1.991x   |\n| softmax         | cuda   | 22.955µs             | 27.366µs            | 100000 | ✅ Candle faster than Torch by 1.192x   |\n| reshape         | cuda   | 0.123µs              | 9.104µs             | 100000 | ✅ Candle faster than Torch by 73.951x  |\n| transpose       | cuda   | 0.094µs              | 5.810µs             | 100000 | ✅ Candle faster than Torch by 61.876x  |\n| narrow          | cuda   | 0.113µs              | 11.722µs            | 100000 | ✅ Candle faster than Torch by 104.156x |\n\n## **CPU**: Intel Core Ultra 9 185H\n| test      | device | candle_time_per_pass | torch_time_per_pass | n    | result                                  |\n|-----------|--------|----------------------|---------------------|------|-----------------------------------------|\n| add       | cpu    | 133.909µs            | 51.276µs            | 1000 | ❌ Candle slower than Torch by 2.612x   |\n| matmul    | cpu    | 4464.338µs           | 4658.558µs          | 1000 | ✅ Candle faster than Torch by 1.044x   |\n| relu      | cpu    | 175.867µs            | 50.372µs            | 1000 | ❌ Candle slower than Torch by 3.491x   |\n| gelu      | cpu    | 2374.970µs           | 10.533µs            | 1000 | ❌ Candle slower than Torch by 225.487x |\n| silu      | cpu    | 2115.940µs           | 9.030µs             | 1000 | ❌ Candle slower than Torch by 234.315x |\n| softmax   | cpu    | 1655.149µs           | 266.818µs           | 1000 | ❌ Candle slower than Torch by 6.203x   |\n| reshape   | cpu    | 0.074µs              | 13.229µs            | 1000 | ✅ Candle faster than Torch by 179.150x |\n| transpose | cpu    | 0.120µs              | 8.362µs             | 1000 | ✅ Candle faster than Torch by 69.404x  |\n| narrow    | cpu    | 0.064µs              | 12.545µs            | 1000 | ✅ Candle faster than Torch by 197.377x |\n\n## **CPU and MKL**: Intel Core Ultra 9 185H\n| test      | device | candle_time_per_pass | torch_time_per_pass | n    | result                                  |\n|-----------|--------|----------------------|---------------------|------|-----------------------------------------|\n| add       | cpu    | 34.672µs             | 49.539µs            | 1000 | ✅ Candle faster than Torch by 1.429x   |\n| matmul    | cpu    | 4718.913µs           | 4705.201µs          | 1000 | ❌ Candle slower than Torch by 1.003x   |\n| relu      | cpu    | 257.947µs            | 65.408µs            | 1000 | ❌ Candle slower than Torch by 3.944x   |\n| gelu      | cpu    | 2405.888µs           | 12.113µs            | 1000 | ❌ Candle slower than Torch by 198.618x |\n| silu      | cpu    | 523.669µs            | 9.747µs             | 1000 | ❌ Candle slower than Torch by 53.725x  |\n| softmax   | cpu    | 1667.239µs           | 272.704µs           | 1000 | ❌ Candle slower than Torch by 6.114x   |\n| reshape   | cpu    | 0.132µs              | 18.616µs            | 1000 | ✅ Candle faster than Torch by 141.064x |\n| transpose | cpu    | 0.171µs              | 7.215µs             | 1000 | ✅ Candle faster than Torch by 42.137x  |\n| narrow    | cpu    | 0.107µs              | 13.318µs            | 1000 | ✅ Candle faster than Torch by 124.090x |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericlbuehler%2Fcandle-benching","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fericlbuehler%2Fcandle-benching","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericlbuehler%2Fcandle-benching/lists"}