{"id":28254712,"url":"https://github.com/nexusgpu/benchmark","last_synced_at":"2025-07-10T07:32:31.072Z","repository":{"id":293480960,"uuid":"980533733","full_name":"NexusGPU/benchmark","owner":"NexusGPU","description":"TensorFusion Remote/Local vGPU Benchmark","archived":false,"fork":false,"pushed_at":"2025-07-03T16:54:18.000Z","size":18,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-03T17:47:52.923Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Smarty","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NexusGPU.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-05-09T09:34:36.000Z","updated_at":"2025-07-03T16:54:21.000Z","dependencies_parsed_at":"2025-05-15T15:37:40.577Z","dependency_job_id":"4875d56a-469d-49f6-a47f-2933e2f27ae3","html_url":"https://github.com/NexusGPU/benchmark","commit_stats":null,"previous_names":["nexusgpu/benchmark"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/NexusGPU/benchmark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fbenchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fbenchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fbenchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fbenchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NexusGPU","download_url":"https://codeload.github.com/NexusGPU/benchmark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NexusGPU%2Fbenchmark/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":264545157,"owners_count":23625403,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-05-19T20:15:19.064Z","updated_at":"2025-07-10T07:32:31.060Z","avatar_url":"https://github.com/NexusGPU.png","language":"Smarty","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TensorFusion Remote/Local vGPU Benchmark Helm Chart\n\nThis Helm chart deploys the TensorFusion Remote/Local vGPU Benchmark application, which includes a deployment for running the benchmark tests and a cronjob for automated testing.\n\n## Benchmark Results\n\n### TorchBenchmark Results (2025-07-06)\n\nTo run the TorchBenchmark tests:\n```bash\ncd benchmark\npython3 test.py -k \"test_${model_name}_eval_cuda\" -t ${eval_times}\n```\n\n| Model | Native | NGPU Mode | Loss(NGPU) | Local | Loss(Local) | Same AZ | Loss(Same AZ) | Cross AZ | Loss(Cross AZ) |\n|-------|---------|------------|------------|--------|-------------|---------|--------------|----------|----------------|\n| basic_gnn_edgecnn | 41.15 s | 40.95 s | -0.49% | 43.48 s | 5.66% | 46.07 s | 11.96% | 54.97 s | 33.58% |\n| BERT_pytorch | 249.02 s | 248.84 s | -0.07% | 251.26 s | 0.90% | 253.71 s | 1.88% | 261.62 s | 5.06% |\n| basic_gnn_gcn | 15.05 s | 15.24 s | 1.26% | 19.63 s | 30.43% | 29.70 s | 97.34% | 64.39 s | 327.84% |\n| basic_gnn_gin | 9.47 s | 9.53 s | 0.63% | 9.78 s | 3.27% | 12.66 s | 33.69% | 21.83 s | 130.52% |\n| hf_Albert | 24.73 s | 24.00 s | -2.95% | 29.19 s | 18.03% | 39.19 s | 58.47% | 73.00 s | 195.19% |\n| hf_Bart | 39.88 s | 38.68 s | -3.01% | 54.96 s | 37.81% | 94.17 s | 136.13% | 211.68 s | 430.79% |\n| hf_Bert | 24.15 s | 24.35 s | 0.83% | 29.55 s | 22.36% | 42.00 s | 73.91% | 75.86 s | 214.12% |\n| llama | 39.91 s | 41.20 s | 3.23% | 42.90 s | 7.49% | 45.80 s | 14.76% | 52.55 s | 31.67% |\n| hf_distil_whisper | 170.61 s | 170.87 s | 0.15% | 172.16 s | 0.91% | 178.75 s | 4.77% | 189.45 s | 11.04% |\n| hf_clip | 191.60 s | 191.70 s | 0.05% | 194.52 s | 1.52% | 197.51 s | 3.08% | 208.90 s | 9.03% |\n| hf_Whisper | 58.98 s | 59.18 s | 0.34% | 63.50 s | 7.66% | 66.66 s | 13.02% | 72.63 s | 23.14% |\n| **Average Loss** | - | - | **0.00%** | - | **12.37%** | - | **40.82%** | - | **128.36%** |\n\n### MLPerf Results (2025-07-04)\n\nTo run the MLPerf benchmark:\n```bash\nmlcr run-mlperf,inference,_full,_r5.0-dev \\\n    --model=bert-99 \\\n    --implementation=reference \\\n    --framework=pytorch \\\n    --category=edge \\\n    --scenario=SingleStream \\\n    --execution_mode=valid \\\n    --device=cuda \\\n    --quiet --rerun\n```\n\n| Mode | Time | Loss |\n|------|------|------|\n| Native | 27.008 s | - |\n| Local | 29.930 s | 10.82% |\n| Same AZ | 33.341 s | 23.45% |\n| Cross AZ | 41.597 s | 54.02% |\n\n### Simulating AZ Latencies\n\nTo simulate different AZ (Availability Zone) network conditions, you can use the Linux Traffic Control (tc) tool to inject artificial network latency:\n\n1. Inject network latency:\n```bash\n# For Same AZ simulation (0.3ms latency)\ntc qdisc add dev lo root netem delay 0.3ms\n\n# For Cross AZ simulation (1ms latency)\ntc qdisc add dev lo root netem delay 1ms\n```\n\n2. Verify the latency:\n```bash\nping target_host\n```\n\n3. Remove the artificial latency when done:\n```bash\ntc qdisc del dev lo root\n```\n\n## Prerequisites\n\n- Kubernetes 1.19+\n- Helm 3.2.0+\n- PV provisioner support in the underlying infrastructure\n- A GPU node with NVIDIA drivers installed\n\n## Installing the Chart\n\nTo install the chart with the release name `my-release`:\n\n```bash\nhelm install my-release ./helm/torchbench\n```\n\nThe command deploys the benchmark application on the Kubernetes cluster with default configuration.\n\n## Configuration\n\nThe following table lists the configurable parameters of the chart and their default values.\n\n| Parameter | Description | Default |\n|-----------|-------------|---------|\n| `replicaCount` | Number of replicas | `1` |\n| `image.repository` | Image repository | `crpi-wpzfqfci37r0ad3n.cn-hangzhou.personal.cr.aliyuncs.com/tensorfusionrobin/tensorfusionrobin` |\n| `image.tag` | Image tag | `latest` |\n| `image.pullPolicy` | Image pull policy | `Always` |\n| `serviceAccount.create` | Create service account | `true` |\n| `serviceAccount.name` | Service account name | `cronjob-sa` |\n| `podAnnotations` | Pod annotations | See values.yaml |\n| `podLabels` | Pod labels | See values.yaml |\n| `resources` | Pod resource requests and limits | See values.yaml |\n| `nodeSelector` | Node selector | `kubernetes.io/hostname: gpu-2` |\n| `cronjob.schedule` | Cronjob schedule | `0 0 * * *` |\n| `cronjob.concurrencyPolicy` | Cronjob concurrency policy | `Allow` |\n| `cronjob.successfulJobsHistoryLimit` | Number of successful jobs to keep | `3` |\n| `cronjob.failedJobsHistoryLimit` | Number of failed jobs to keep | `1` |\n\n## Usage\n\n### Running the Benchmark\n\nThe benchmark will run automatically according to the cronjob schedule. You can also manually trigger a benchmark run by:\n\n1. Finding the cronjob:\n```bash\nkubectl get cronjob\n```\n\n2. Creating a job from the cronjob:\n```bash\nkubectl create job --from=cronjob/my-release-torchbench-test-runner manual-run\n```\n\n### Viewing Results\n\nTo view the benchmark results:\n\n```bash\nkubectl logs -l app=my-release-torchbench\n```\n\n### Customizing the Configuration\n\nTo customize the configuration, create a custom values file:\n\n```bash\nhelm install my-release ./helm/torchbench -f custom-values.yaml\n```\n\n## Uninstalling the Chart\n\nTo uninstall/delete the deployment:\n\n```bash\nhelm uninstall my-release\n```\n\n## Troubleshooting\n\nIf you encounter any issues:\n\n1. Check the pod status:\n```bash\nkubectl get pods -l app=my-release-torchbench\n```\n\n2. Check the pod logs:\n```bash\nkubectl logs -l app=my-release-torchbench\n```\n\n3. Check the cronjob status:\n```bash\nkubectl get cronjob\nkubectl get jobs\n```\n\n4. Check the service account:\n```bash\nkubectl get serviceaccount\n``` \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexusgpu%2Fbenchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnexusgpu%2Fbenchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnexusgpu%2Fbenchmark/lists"}