{"id":24861527,"url":"https://github.com/gameofdimension/limulidae","last_synced_at":"2026-04-28T17:02:51.190Z","repository":{"id":273032308,"uuid":"918514068","full_name":"gameofdimension/limulidae","owner":"gameofdimension","description":"benchmark gpu/npu flops and bandwidth","archived":false,"fork":false,"pushed_at":"2025-01-21T02:54:34.000Z","size":17,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-15T05:40:03.652Z","etag":null,"topics":["910b","ascend","benchmark","flops","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/gameofdimension.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-18T05:52:45.000Z","updated_at":"2025-03-13T07:52:09.000Z","dependencies_parsed_at":"2025-01-18T07:18:36.995Z","dependency_job_id":"39cdf094-75d1-4f88-b7da-c759ba840caf","html_url":"https://github.com/gameofdimension/limulidae","commit_stats":null,"previous_names":["gameofdimension/limulidae"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/gameofdimension/limulidae","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gameofdimension%2Flimulidae","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gameofdimension%2Flimulidae/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gameofdimension%2Flimulidae/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gameofd
imension%2Flimulidae/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/gameofdimension","download_url":"https://codeload.github.com/gameofdimension/limulidae/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/gameofdimension%2Flimulidae/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32390067,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T14:34:11.604Z","status":"ssl_error","status_checked_at":"2026-04-28T14:32:37.009Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["910b","ascend","benchmark","flops","pytorch"],"created_at":"2025-01-31T22:05:58.163Z","updated_at":"2026-04-28T17:02:51.174Z","avatar_url":"https://github.com/gameofdimension.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# limulidae\n\n- Benchmarks the achieved compute throughput of NVIDIA GPUs and Ascend NPUs.\n- Benchmarks the achieved intra-node communication bandwidth of NVIDIA GPUs and Ascend NPUs.\n\n## Measured results\n\n### Compute throughput\n\n| Data type | Accelerator | Measured throughput (TFLOPS) |\n|----------|--------|------------------|\n| BF16     | A800   | 286              |\n| BF16     | 910B   | 328              |\n| FP32     | A800   | 19               |\n| FP32     | 910B   | 87               |\n\n### Intra-node bandwidth\n\n| Devices | Accelerator | all_gather bandwidth (GB/s) | all_reduce bandwidth (GB/s) |\n|------|--------|---------------------|---------------------|\n| 2    | A800   | 230          
       | 143                 |\n| 2    | 910B   | 38                  | 18                  |\n| 4    | A800   | 190                 | 104                 |\n| 4    | 910B   | 64                  | 30                  |\n| 8    | A800   | 173                 | 89                  |\n| 8    | 910B   | 149                 | 72                  |\n\n\n### Memory bandwidth\n\n| Operator | Accelerator | Memory bandwidth (GB/s) |\n|------|------|------|\n| `torch.exp` |A800|884|\n| `torch.exp` |910B|642|\n|`torch.nn.Sigmoid`|A800|887|\n|`torch.nn.Sigmoid`|910B|640|\n|$\\frac{1}{1+e^{-x}}$ (hand-written sigmoid)|A800|176|\n|$\\frac{1}{1+e^{-x}}$ (hand-written sigmoid)|910B|128|\n\n\n## Reproduction steps\n\n### Prerequisites\n1. On the 910B, install CANN (8.0.0.beta1), torch (cpu+2.4.0), torch_npu (2.4.0.post2), and related packages. See the [installation guide](https://www.hiascend.com/document/detail/zh/Pytorch/600/configandinstg/instg/insg_0001.html) for details.\n2. Install this project's dependencies.\n\n### Compute benchmark\n`python bench_flops.py ${dtype}`, where `dtype` is one of fp32/fp16/bf16.\n\n### Bandwidth benchmark\n`torchrun --nproc-per-node=${num_devices} bench_collective.py ${collective}`, where `collective` is currently either `all_reduce` or `all_gather`.\n\n## References\n\n- Compute benchmark: https://github.com/mag-/gpu_benchmark\n- Bandwidth benchmark: https://github.com/IBM/pytorch-communication-benchmarks\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgameofdimension%2Flimulidae","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgameofdimension%2Flimulidae","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgameofdimension%2Flimulidae/lists"}