{"id":18843452,"url":"https://github.com/ai-hypercomputer/gpu-recipes","last_synced_at":"2025-06-25T09:04:04.556Z","repository":{"id":259310130,"uuid":"858928709","full_name":"AI-Hypercomputer/gpu-recipes","owner":"AI-Hypercomputer","description":"Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.","archived":false,"fork":false,"pushed_at":"2025-06-19T05:00:56.000Z","size":486,"stargazers_count":73,"open_issues_count":7,"forks_count":26,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-06-25T09:03:09.136Z","etag":null,"topics":["benchmarks","distributed-training","google-cloud-platform","gpu","serving"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AI-Hypercomputer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-17T19:19:54.000Z","updated_at":"2025-06-20T05:59:10.000Z","dependencies_parsed_at":"2024-12-07T01:20:46.268Z","dependency_job_id":"8086782f-55fd-44fe-8ff8-3c275869e856","html_url":"https://github.com/AI-Hypercomputer/gpu-recipes","commit_stats":null,"previous_names":["ai-hypercomputer/gpu-recipes"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AI-Hypercomputer/gpu-recipes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fgpu-recipes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fgpu-recipes/tags","releases_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fgpu-recipes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fgpu-recipes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AI-Hypercomputer","download_url":"https://codeload.github.com/AI-Hypercomputer/gpu-recipes/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fgpu-recipes/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261841926,"owners_count":23217911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarks","distributed-training","google-cloud-platform","gpu","serving"],"created_at":"2024-11-08T02:57:50.881Z","updated_at":"2025-06-25T09:04:04.495Z","avatar_url":"https://github.com/AI-Hypercomputer.png","language":"Python","readme":"\n# Reproducible benchmark recipes for GPUs\n\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)\n\nWelcome to the reproducible benchmark recipes repository for GPUs! This repository contains recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.\n\n## Overview\n\n1. **Identify your requirements:** Determine the model, GPU type, workload, framework, and orchestrator you are interested in.\n2. **Select a recipe:** Based on your requirements use the [Benchmark support matrix](#benchmarks-support-matrix) to find a recipe that meets your needs.\n3. 
Follow the recipe: each recipe provides procedures to complete the following tasks:\n   * Prepare your environment\n   * Run the benchmark\n   * Analyze the benchmark results. This includes not just the results but also detailed logs for further analysis\n\n## Benchmarks support matrix\n\n### Training benchmarks A3 Mega\n\nModels            | GPU Machine Type                                                                                          | Framework | Workload Type | Orchestrator | Link to the recipe\n----------------- | --------------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------\n**GPT3-175B**     | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo      | Pre-training  | GKE          | [Link](./training/a3mega/gpt3-175b/nemo-pretraining-gke/README.md)\n**Llama-3-70B**   | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo      | Pre-training  | GKE          | [Link](./training/a3mega/llama3-70b/nemo-pretraining-gke/README.md)\n**Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo      | Pre-training  | GKE          | [Link](./training/a3mega/llama3-1-70b/nemo-pretraining-gke/README.md)\n**Mixtral-8x7B**  | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo      | Pre-training  | GKE          | [Link](./training/a3mega/mixtral-8x7b/nemo-pretraining-gke/README.md)\n\n### Training benchmarks A3 Ultra\n\nModels             | GPU Machine Type                                                                                            | Framework | Workload Type | Orchestrator | Link to the recipe\n------------------ | 
----------------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------\n**Llama-3.1-70B**  | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText   | Pre-training  | GKE          | [Link](./training/a3ultra/llama3-1-70b/maxtext-pretraining-gke/README.md)\n**Llama-3.1-70B**  | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo      | Pre-training  | GKE          | [Link](./training/a3ultra/llama3-1-70b/nemo-pretraining-gke/README.md)\n**Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | MaxText   | Pre-training  | GKE          | [Link](./training/a3ultra/llama3-1-405b/maxtext-pretraining-gke/README.md)\n**Llama-3.1-405B** | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo      
| Pre-training  | GKE          | [Link](./training/a3ultra/llama3-1-405b/nemo-pretraining-gke/README.md)\n**Mixtral-8x7B**   | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms) | NeMo      | Pre-training  | GKE          | [Link](./training/a3ultra/mixtral-8x7b/nemo-pretraining-gke/README.md)\n\n### Training benchmarks A4\n\nModels             | GPU Machine Type                                                                                     | Framework | Workload Type | Orchestrator | Link to the recipe\n------------------ | ---------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------\n**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms)      | MaxText   | Pre-training  | GKE          | [Link](./training/a4/llama3-1-70b/maxtext-pretraining-gke/README.md)\n**Llama-3.1-70B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms)      | NeMo      | Pre-training  | GKE          | [Link](./training/a4/llama3-1-70b/nemo-pretraining-gke/README.md)\n**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms)      | MaxText   | Pre-training  | GKE          | [Link](./training/a4/llama3-1-405b/maxtext-pretraining-gke/README.md)\n**Llama-3.1-405B** | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms)      | NeMo      | Pre-training  | GKE          | [Link](./training/a4/llama3-1-405b/nemo-pretraining-gke/README.md)\n**Mixtral-8x7B**   | [A4 (NVIDIA B200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a4-vms)      | NeMo      | Pre-training  | GKE          | [Link](./training/a4/mixtral-8x7b/nemo-pretraining-gke/README.md)\n\n### Inference benchmarks A3 Mega\n\n| Models           | GPU 
Machine Type | Framework | Workload Type       | Orchestrator | Link to the recipe |\n| ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |\n| **Llama-4**      | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms)    | vLLM  | Inference   | GKE          | [Link](./inference/a3mega/llama-4/vllm-serving-gke/README.md)\n| **DeepSeek R1 671B**     | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms)    | SGLang  | Inference   | GKE          | [Link](./inference/a3mega/deepseek-r1-671b/sglang-serving-gke/README.md)\n| **DeepSeek R1 671B**     | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms)    | vLLM  | Inference   | GKE          | [Link](./inference/a3mega/deepseek-r1-671b/vllm-serving-gke/README.md)\n\n### Inference benchmarks A3 Ultra\n\n| Models           | GPU Machine Type | Framework | Workload Type       | Orchestrator | Link to the recipe |\n| ---------------- | ---------------- | --------- | ------------------- | ------------ | ------------------ |\n| **Llama-4**      | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms)    | vLLM  | Inference   | GKE          | [Link](./inference/a3ultra/single-host-serving/vllm/README.md#serving-llama-4-models)\n| **Llama-3.1-405B**     | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms)    | TensorRT-LLM  | Inference   | GKE          | [Link](./inference/a3ultra/single-host-serving/trtllm/README.md#serving-llama-3.1-405b-model)\n| **DeepSeek R1 671B**     | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms)    | SGLang  | Inference   | GKE          | 
[Link](./inference/a3ultra/single-host-serving/sglang/README.md#serving-deepseek-r1-671b-model)\n| **DeepSeek R1 671B**     | [A3 Ultra (NVIDIA H200)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-ultra-vms)    | vLLM  | Inference   | GKE          | [Link](./inference/a3ultra/single-host-serving/vllm/README.md#serving-deepseek-r1-671b-model)\n\n### Checkpointing benchmarks\n\nModels            | GPU Machine Type                                                                                          | Framework | Workload Type | Orchestrator | Link to the recipe\n----------------- | --------------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------\n**Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo      | Pre-training using Google Cloud Storage buckets for checkpoints  | GKE          | [Link](./training/a3mega/llama3-1-70b/nemo-pretraining-gke-gcs/README.md)\n**Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo      | Pre-training using Google Cloud Parallelstore for checkpoints  | GKE          | [Link](./training/a3mega/llama3-1-70b/nemo-pretraining-gke-parallelstore/README.md)\n\n### Goodput benchmarks\n\nModels            | GPU Machine Type                                                                                          | Framework | Workload Type | Orchestrator | Link to the recipe\n----------------- | --------------------------------------------------------------------------------------------------------- | --------- | ------------- | ------------ | ------------------\n**Llama-3.1-70B** | [A3 Mega (NVIDIA H100)](https://cloud.google.com/compute/docs/accelerator-optimized-machines#a3-mega-vms) | NeMo      | Pre-training using  the Google Cloud Resiliency library  | 
GKE          | [Link](./training/a3mega/llama3-1-70b/nemo-pretraining-gke-resiliency/README.md)\n\n## Repository structure\n\n* **[training/](./training)**: Contains recipes to reproduce training benchmarks with GPUs.\n* **[inference/](./inference)**: Contains recipes to reproduce inference benchmarks with GPUs.\n* **[src/](./src)**: Contains shared dependencies required to run benchmarks, such as Docker and Helm charts.\n* **[docs/](./docs)**: Contains supporting documentation for the recipes, such as explanations of benchmark methodologies and configurations.\n\n## Getting help\n\nIf you have any questions or find any problems with this repository, please report them through GitHub issues.\n\n## Disclaimer\n\nThis is not an officially supported Google product. The code in this repository is for demonstration purposes only.","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-hypercomputer%2Fgpu-recipes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fai-hypercomputer%2Fgpu-recipes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-hypercomputer%2Fgpu-recipes/lists"}