{"id":18843450,"url":"https://github.com/ai-hypercomputer/xpk","last_synced_at":"2026-01-23T14:47:13.476Z","repository":{"id":204051071,"uuid":"652846631","full_name":"AI-Hypercomputer/xpk","owner":"AI-Hypercomputer","description":"xpk (Accelerated Processing Kit, pronounced x-p-k,) is a software tool to help Cloud developers to orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.","archived":false,"fork":false,"pushed_at":"2025-02-26T19:51:01.000Z","size":2207,"stargazers_count":105,"open_issues_count":37,"forks_count":31,"subscribers_count":22,"default_branch":"main","last_synced_at":"2025-02-26T20:37:11.010Z","etag":null,"topics":["gcloud","gke","tpu"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AI-Hypercomputer.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"docs/contributing.md","funding":null,"license":"LICENSE","code_of_conduct":"docs/code-of-conduct.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-12T23:14:49.000Z","updated_at":"2025-02-25T20:13:00.000Z","dependencies_parsed_at":"2023-12-12T20:33:41.651Z","dependency_job_id":"add00a00-8552-4bbe-98b0-96b5b2a582a8","html_url":"https://github.com/AI-Hypercomputer/xpk","commit_stats":null,"previous_names":["google/xpk","ai-hypercomputer/xpk"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fxpk","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fxpk/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repo
sitories/AI-Hypercomputer%2Fxpk/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AI-Hypercomputer%2Fxpk/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AI-Hypercomputer","download_url":"https://codeload.github.com/AI-Hypercomputer/xpk/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247266563,"owners_count":20910836,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gcloud","gke","tpu"],"created_at":"2024-11-08T02:57:50.407Z","updated_at":"2026-01-23T14:47:13.464Z","avatar_url":"https://github.com/AI-Hypercomputer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!--\n Copyright 2025 Google LLC\n\n Licensed under the Apache License, Version 2.0 (the \"License\");\n you may not use this file except in compliance with the License.\n You may obtain a copy of the License at\n\n      https://www.apache.org/licenses/LICENSE-2.0\n\n Unless required by applicable law or agreed to in writing, software\n distributed under the License is distributed on an \"AS IS\" BASIS,\n WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n See the License for the specific language governing permissions and\n limitations under the License.\n --\u003e\n\n[![Build Tests](https://github.com/google/xpk/actions/workflows/build_tests.yaml/badge.svg?query=branch%3Amain)](https://github.com/google/xpk/actions/workflows/build_tests.yaml?query=branch%3Amain)\n[![Nightly 
Tests](https://github.com/google/xpk/actions/workflows/nightly_tests.yaml/badge.svg?query=branch%3Amain)](https://github.com/google/xpk/actions/workflows/nightly_tests.yaml?query=branch%3Amain)\n\n# Overview\n\nXPK (Accelerated Processing Kit, pronounced x-p-k) is a command-line interface that simplifies cluster creation and workload execution on Google Kubernetes Engine (GKE). XPK generates preconfigured, training-optimized clusters and allows easy workload scheduling without any Kubernetes expertise.\n\nXPK is recommended for quick creation of GKE clusters for proofs of concept and testing.\n\nXPK decouples provisioning capacity from running jobs. There are two structures: clusters (provisioned VMs) and workloads (training jobs). Clusters represent the physical resources you have available. Workloads represent training jobs: at any time, some will be completed, others will be running, and some will be queued, waiting for cluster resources to become available.\n\nThe ideal workflow starts by provisioning the clusters for all of the ML\nhardware you have reserved. Then, without re-provisioning, submit jobs as\nneeded. Because nothing is re-provisioned between jobs, and because jobs use Docker\ncontainers with pre-installed dependencies and cross ahead-of-time compilation,\nqueued jobs start with minimal delay. Further, because workloads\nreturn the hardware to the shared pool when they complete, developers can\nachieve better use of finite hardware resources. 
And automated tests can run\novernight while resources tend to be underutilized.\n\nXPK supports a variety of hardware accelerators.\n| Accelerator | Type | Recipes |\n| :--- | :--- | :--- |\n| **Ironwood** | tpu7x | [Run training workload with Ironwood and regular/gSC/DWS Calendar reservations using GCS Bucket storage](./docs/usage/tpu7x/recipes/reservation_gcs_bucket_recipe.md)\u003cbr\u003e[Run training workload with Ironwood with flex-start using Filestore storage](./docs/usage/tpu7x/recipes/flex_filestore_recipe.md)\u003cbr\u003e[Run training workload with Ironwood and flex-start using Lustre storage](./docs/usage/tpu7x/recipes/flex_lustre_recipe.md) |\n| **Trillium** | v6e | [Create Cluster](./docs/usage/clusters.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n| **TPU v5p** | v5p | [Create Cluster](./docs/usage/clusters.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n| **TPU v5e** | v5e | [Create Cluster](./docs/usage/clusters.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n| **TPU v4** | v4 | [Create Cluster](./docs/usage/clusters.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n| **GPU A4X** | gb200 | [Create Cluster](./docs/usage/gpu.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n| **GPU A4** | b200 | [Create Cluster](./docs/usage/clusters.md#provisioning-a3-ultra-a3-mega-and-a4-clusters-gpu-machines)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md#workloads-for-a3-ultra-a3-mega-and-a4-clusters-gpu-machines) |\n| **GPU A3 Ultra** | h200 | [Create Cluster](./docs/usage/clusters.md#provisioning-a3-ultra-a3-mega-and-a4-clusters-gpu-machines)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md#workloads-for-a3-ultra-a3-mega-and-a4-clusters-gpu-machines) |\n| **GPU A3 Mega** | h100-mega | [Create Cluster](./docs/usage/clusters.md#provisioning-a3-ultra-a3-mega-and-a4-clusters-gpu-machines)\u003cbr\u003e[Create 
Workload](./docs/usage/workloads.md#workloads-for-a3-ultra-a3-mega-and-a4-clusters-gpu-machines) |\n| **GPU A3 High** | h100 | [Create Cluster](./docs/usage/gpu.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n| **GPU A100** | A100 | [Create Cluster](./docs/usage/gpu.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n| **CPU** | n2-standard-32 | [Create Cluster](./docs/usage/cpu.md)\u003cbr\u003e[Create Workload](./docs/usage/workloads.md) |\n\nXPK also supports the following [Google Cloud Storage solutions](./docs/usage/storage.md):\n\n| Storage Type                               | Documentation                                                           |\n| ------------------------------------------ | ----------------------------------------------------------------------- |\n| Cloud Storage FUSE                         | [docs](./docs/usage/storage.md#fuse)                                    |\n| Filestore                                  | [docs](./docs/usage/storage.md#filestore)                               |\n| Parallelstore                              | [docs](./docs/usage/storage.md#parallelstore)                           |\n| Block storage (Persistent Disk, Hyperdisk) | [docs](./docs/usage/storage.md#block-storage-persistent-disk-hyperdisk) |\n\n# Documentation\n\n- [Permissions](./docs/permissions.md)\n- [Installation](./docs/installation.md)\n- Usage:\n  - [Clusters](./docs/usage/clusters.md)\n    - [GPU](./docs/usage/gpu.md)\n    - [CPU](./docs/usage/cpu.md)\n    - [Autoprovisioning](./docs/usage/autoprovisioning.md)\n  - [Workloads](./docs/usage/workloads.md)\n    - [Docker](./docs/usage/docker.md)\n  - [Storage](./docs/usage/storage.md)\n  - [Advanced](./docs/usage/advanced.md)\n  - [Inspector](./docs/usage/inspector.md)\n- [Troubleshooting](./docs/troubleshooting.md)\n\n# Dependencies\n\n| Dependency                                                                                                   | When used            
       |\n| ------------------------------------------------------------------------------------------------------------ | --------------------------- |\n| [Google Cloud SDK (gcloud)](https://cloud.google.com/sdk/docs/install)                                       | _always_                    |\n| [kubectl](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#install_kubectl) | _always_                    |\n| [ClusterToolkit](https://github.com/GoogleCloudPlatform/cluster-toolkit)                                     | Provisioning GPU clusters   |\n| [Kueue](https://github.com/kubernetes-sigs/kueue)                                                            | Scheduling workloads        |\n| [JobSet](https://github.com/kubernetes-sigs/jobset)                                                          | Workload creation           |\n| [Docker](https://docs.docker.com/engine/install/)                                                            | Building workload container |\n| [CoreDNS](https://github.com/coredns/deployment/tree/master/kubernetes)                                      | Cluster setup               |\n| [PathwaysJob](https://github.com/google/pathways-job)                                                        | Running Pathways workloads  |\n\n# Privacy notice\n\nTo help improve XPK, feature usage statistics are collected and sent to Google. You can opt in or out at any time by running\nthe following shell command:\n\n```shell\nxpk config set send-telemetry \u003ctrue/false\u003e\n```\n\nXPK telemetry is handled in accordance with the [Google Privacy Policy](https://policies.google.com/privacy). 
When\nyou use XPK to interact with or utilize GCP Services, your information is handled in accordance with the\n[Google Cloud Privacy Notice](https://cloud.google.com/terms/cloud-privacy-notice).\n\n# Contributing\n\nPlease read [`contributing.md`](./docs/contributing.md) for details on our code of conduct and the process for submitting pull requests.\n\n# Get involved\n\nWe'd love to hear from you! If you have questions or want to discuss ideas, join us on [GitHub Discussions](https://github.com/AI-Hypercomputer/xpk/discussions). Found a bug or have a feature request? Please let us know on [GitHub Issues](https://github.com/AI-Hypercomputer/xpk/issues).\n\n# License\n\nThis project is licensed under the Apache License 2.0. See the [`LICENSE`](./LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-hypercomputer%2Fxpk","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fai-hypercomputer%2Fxpk","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fai-hypercomputer%2Fxpk/lists"}