{"id":13560677,"url":"https://github.com/pytorch/xla","last_synced_at":"2025-05-16T00:00:42.980Z","repository":{"id":37542584,"uuid":"156293205","full_name":"pytorch/xla","owner":"pytorch","description":"Enabling PyTorch on XLA Devices (e.g. Google TPU)","archived":false,"fork":false,"pushed_at":"2025-05-08T23:02:53.000Z","size":80595,"stargazers_count":2596,"open_issues_count":798,"forks_count":518,"subscribers_count":54,"default_branch":"master","last_synced_at":"2025-05-08T23:34:10.468Z","etag":null,"topics":["compiler","deep-learning","pytorch","xla"],"latest_commit_sha":null,"homepage":"https://pytorch.org/xla","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/pytorch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2018-11-05T22:42:04.000Z","updated_at":"2025-05-08T22:25:15.000Z","dependencies_parsed_at":"2023-10-16T10:32:17.468Z","dependency_job_id":"325e3dd5-22d1-474c-bc9e-627c4e04f9df","html_url":"https://github.com/pytorch/xla","commit_stats":{"total_commits":5091,"total_committers":228,"mean_commits":22.32894736842105,"dds":0.784325279905716,"last_synced_commit":"ea8c47a345a29c6a1b1bdf4ee38a9159b07a980f"},"previous_names":[],"tags_count":54,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fxla","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fxla/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fxla/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/pytorch%2Fxla/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/pytorch","download_url":"https://codeload.github.com/pytorch/xla/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254442854,"owners_count":22071877,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["compiler","deep-learning","pytorch","xla"],"created_at":"2024-08-01T13:00:48.592Z","updated_at":"2025-05-16T00:00:42.920Z","avatar_url":"https://github.com/pytorch.png","language":"Python","readme":"# PyTorch/XLA\n\n\u003cb\u003eCurrent CI status:\u003c/b\u003e  ![GitHub Actions\nstatus](https://github.com/pytorch/xla/actions/workflows/build_and_test.yml/badge.svg)\n\nPyTorch/XLA is a Python package that uses the [XLA deep learning\ncompiler](https://www.tensorflow.org/xla) to connect the [PyTorch deep learning\nframework](https://pytorch.org/) and [Cloud\nTPUs](https://cloud.google.com/tpu/). 
You can try it right now, for free, on a single Cloud TPU VM with [Kaggle](https://www.kaggle.com/discussions/product-feedback/369338)!

Take a look at one of our [Kaggle notebooks](https://github.com/pytorch/xla/tree/master/contrib/kaggle) to get started:

* [Stable Diffusion with PyTorch/XLA 2.0](https://github.com/pytorch/xla/blob/master/contrib/kaggle/pytorch-xla-2-0-on-kaggle.ipynb)
* [Distributed PyTorch/XLA Basics](https://github.com/pytorch/xla/blob/master/contrib/kaggle/distributed-pytorch-xla-basics-with-pjrt.ipynb)

## Installation

### TPU

To install the PyTorch/XLA stable build in a new TPU VM:

```sh
pip install torch==2.7.0 'torch_xla[tpu]==2.7.0'
```

To install the PyTorch/XLA nightly build in a new TPU VM:

```sh
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
# Edit `cp310-cp310` to fit your desired Python version as needed
pip install 'torch_xla[tpu] @ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev-cp310-cp310-linux_x86_64.whl' \
  -f https://storage.googleapis.com/libtpu-wheels/index.html
```

### C++11 ABI builds

**As of 03/18/2025, and starting with the PyTorch/XLA 2.7 release, C++11 ABI builds are the default, and we no longer provide wheels built with the pre-C++11 ABI.**

For PyTorch/XLA 2.6, we provide wheels and docker images built with two C++ ABI flavors: C++11 and pre-C++11. Pre-C++11 is the default, to align with PyTorch upstream, but the C++11 ABI wheels and docker images have better lazy tensor tracing performance.

To install the C++11 ABI flavored 2.6 wheels (Python 3.10 example):

```sh
pip install torch==2.6.0+cpu.cxx11.abi \
  https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0%2Bcxx11-cp310-cp310-manylinux_2_28_x86_64.whl \
  'torch_xla[tpu]' \
  -f https://storage.googleapis.com/libtpu-releases/index.html \
  -f https://storage.googleapis.com/libtpu-wheels/index.html \
  -f https://download.pytorch.org/whl/torch
```

The above command works for Python 3.10. We additionally have Python 3.9 and 3.11 wheels:

- 3.9: https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0%2Bcxx11-cp39-cp39-manylinux_2_28_x86_64.whl
- 3.10: https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0%2Bcxx11-cp310-cp310-manylinux_2_28_x86_64.whl
- 3.11: https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0%2Bcxx11-cp311-cp311-manylinux_2_28_x86_64.whl

To access the C++11 ABI flavored docker image:

```
us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.6.0_3.10_tpuvm_cxx11
```

If your model is tracing-bound (e.g. the host CPU is busy tracing the model while the TPUs are idle), switching to the C++11 ABI wheels/docker images can improve performance.
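One quick way to check for this (a minimal sketch, not from the official docs; exact counter names vary across PyTorch/XLA versions) is to print the client-side metrics report after a few training steps and compare the time spent tracing against the time spent executing on the device:

```python
import torch_xla.debug.metrics as met

# Run a few training steps first, then inspect the report. If tracing-side
# counters dominate wall-clock time while device execution remains short,
# the model is likely tracing-bound.
print(met.metrics_report())
```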
Mixtral 8x7B benchmarking results on v5p-256, global batch size 1024:

- Pre-C++11 ABI MFU: 33%
- C++11 ABI MFU: 39%

## GitHub Doc Map

Our GitHub repository contains many useful docs on working with different aspects of PyTorch/XLA. Here is a list of useful docs spread around the repository:

- [docs/source/learn](https://github.com/pytorch/xla/tree/master/docs/source/learn): docs for learning concepts associated with XLA, troubleshooting, PJRT, eager mode, and dynamic shape.
- [docs/source/accelerators](https://github.com/pytorch/xla/tree/master/docs/source/accelerators): references to `GPU` and `TPU` accelerator documents.
- [docs/source/perf](https://github.com/pytorch/xla/tree/master/docs/source/perf): documentation about performance-specific aspects of PyTorch/XLA, such as `AMP`, `DDP`, `Dynamo`, fori loop, `FSDP`, quantization, recompilation, and `SPMD`.
- [docs/source/features](https://github.com/pytorch/xla/tree/master/docs/source/features): documentation on distributed torch, Pallas, scan, StableHLO, and Triton.
- [docs/source/contribute](https://github.com/pytorch/xla/tree/master/docs/source/contribute): documents on setting up PyTorch/XLA for development, and guides for lowering operations.
- PJRT plugins:
  - [CPU](https://github.com/pytorch/xla/blob/master/plugins/cpu/README.md)
  - [CUDA](https://github.com/pytorch/xla/blob/master/plugins/cuda/README.md)
- [torchax/docs](https://github.com/pytorch/xla/tree/master/torchax/docs): torchax documents
  - [torchax/examples](https://github.com/pytorch/xla/tree/master/torchax/examples): torchax examples

## Getting Started

The following guides cover two modes:

- Single process: one Python interpreter controlling a single GPU/TPU at a time
- Multi process: N Python interpreters are launched, corresponding to the N GPUs/TPUs found on the system

A third mode is SPMD, where one Python interpreter controls all N GPUs/TPUs found on the system. Multi processing is more complex, and is not compatible with SPMD. This tutorial does not dive into SPMD; for more on that, see our [SPMD guide](https://github.com/pytorch/xla/blob/master/docs/source/perf/spmd_basic.md).

### Simple single process

To update your existing training loop, make the following changes:

```diff
+import torch_xla

 def train(model, training_data, ...):
   ...
   for inputs, labels in train_loader:
+    with torch_xla.step():
+      inputs, labels = inputs.to('xla'), labels.to('xla')
       optimizer.zero_grad()
       outputs = model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
       optimizer.step()

+  torch_xla.sync()
   ...

 if __name__ == '__main__':
   ...
+  # Move the model parameters to your XLA device
+  model.to('xla')
   train(model, training_data, ...)
   ...
```

The changes above should get your model to train on the TPU.

### Multi processing

To update your existing training loop, make the following changes:

```diff
-import torch.multiprocessing as mp
+import torch_xla
+import torch_xla.core.xla_model as xm

 def _mp_fn(index):
   ...

+  # Move the model parameters to your XLA device
+  model.to(torch_xla.device())

   for inputs, labels in train_loader:
+    with torch_xla.step():
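+      # `torch_xla.step()` demarcates one lazily traced step: the ops
+      # recorded inside this block are compiled and dispatched to the
+      # device together when the block exits.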
+      # Transfer data to the XLA device. This happens asynchronously.
+      inputs, labels = inputs.to(torch_xla.device()), labels.to(torch_xla.device())
       optimizer.zero_grad()
       outputs = model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
-      optimizer.step()
+      # `xm.optimizer_step` combines gradients across replicas
+      xm.optimizer_step(optimizer)

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  # torch_xla.launch automatically selects the correct world size
+  torch_xla.launch(_mp_fn, args=())
```

If you're using `DistributedDataParallel`, make the following changes:

```diff
 import torch.distributed as dist
-import torch.multiprocessing as mp
+import torch_xla as xla
+import torch_xla.distributed.xla_backend

 def _mp_fn(rank):
   ...

-  os.environ['MASTER_ADDR'] = 'localhost'
-  os.environ['MASTER_PORT'] = '12355'
-  dist.init_process_group("gloo", rank=rank, world_size=world_size)
+  # Rank and world size are inferred from the XLA device runtime
+  dist.init_process_group("xla", init_method='xla://')
+
+  model.to(xla.device())
+  ddp_model = DDP(model, gradient_as_bucket_view=True)

-  model = model.to(rank)
-  ddp_model = DDP(model, device_ids=[rank])

   for inputs, labels in train_loader:
+    with xla.step():
+      inputs, labels = inputs.to(xla.device()), labels.to(xla.device())
       optimizer.zero_grad()
       outputs = ddp_model(inputs)
       loss = loss_fn(outputs, labels)
       loss.backward()
       optimizer.step()

 if __name__ == '__main__':
-  mp.spawn(_mp_fn, args=(), nprocs=world_size)
+  xla.launch(_mp_fn, args=())
```

Additional information on PyTorch/XLA, including a description of its semantics and functions, is available at [PyTorch.org](http://pytorch.org/xla/). See the [API Guide](API_GUIDE.md) for best practices when writing networks that run on XLA devices (TPU, CUDA, CPU, and more).

Our comprehensive user guides are available at:

[Documentation for the latest release](https://pytorch.org/xla)

[Documentation for master branch](https://pytorch.org/xla/master)

## PyTorch/XLA tutorials

* [Cloud TPU VM quickstart](https://cloud.google.com/tpu/docs/run-calculation-pytorch)
* [Cloud TPU Pod slice quickstart](https://cloud.google.com/tpu/docs/pytorch-pods)
* [Profiling on TPU VM](https://cloud.google.com/tpu/docs/pytorch-xla-performance-profiling-tpu-vm)
* [GPU guide](docs/gpu.md)

## Reference implementations

The [AI-Hypercomputer/tpu-recipes](https://github.com/AI-Hypercomputer/tpu-recipes) repository contains examples for training and serving many LLM and diffusion models.

## Available docker images and wheels

### Python packages

PyTorch/XLA releases starting with version r2.1 are available on PyPI. You can install the main build with `pip install torch_xla`.
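As a quick post-install sanity check (a minimal sketch, not from the official docs; it assumes an XLA runtime is reachable, e.g. a TPU VM or the CPU fallback), you can run a tiny computation on the XLA device:

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()        # default XLA device (TPU if available)
t = torch.ones(2, 2, device=device)
print(t.device)                 # e.g. xla:0
print((t + t).cpu())            # forces graph execution and copies the result back
```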
To also install the Cloud TPU plugin corresponding to your installed `torch_xla`, install the optional `tpu` dependencies after installing the main build with:

```
pip install torch_xla[tpu]
```

GPU release builds and GPU/TPU nightly builds are available in our public GCS bucket.

| Version | Cloud GPU VM Wheels |
| --- | ----------- |
| 2.7 (CUDA 12.6 + Python 3.9) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.6/torch_xla-2.7.0-cp39-cp39-manylinux_2_28_x86_64.whl` |
| 2.7 (CUDA 12.6 + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.6/torch_xla-2.7.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.7 (CUDA 12.6 + Python 3.11) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.6/torch_xla-2.7.0-cp311-cp311-manylinux_2_28_x86_64.whl` |
| nightly (Python 3.9) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev-cp39-cp39-linux_x86_64.whl` |
| nightly (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev-cp310-cp310-linux_x86_64.whl` |
| nightly (Python 3.11) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev-cp311-cp311-linux_x86_64.whl` |
| nightly (CUDA 12.6 + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.6/torch_xla-2.8.0.dev-cp310-cp310-linux_x86_64.whl` |

#### Use nightly build

You can also add a date suffix `yyyymmdd`, as in `torch_xla-2.8.0.devyyyymmdd`, to get the nightly wheel for a specific date (otherwise you get the latest dev version). Here is an example:

```
pip3 install torch==2.8.0.dev20250423+cpu --index-url https://download.pytorch.org/whl/nightly/cpu
pip3 install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev20250423-cp310-cp310-linux_x86_64.whl
```

The torch wheel version `2.8.0.dev20250423+cpu` can be found at https://download.pytorch.org/whl/nightly/torch/.

<details>

<summary>older versions</summary>

| Version | Cloud TPU VMs Wheel |
|---------|-------------------|
| 2.6 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.6.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.5 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.4 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.3 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.2 (Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.1 (XRT + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/xrt/tpuvm/torch_xla-2.1.0%2Bxrt-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.1 (Python 3.8) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.1.0-cp38-cp38-linux_x86_64.whl` |

<br/>

| Version | GPU Wheel |
| --- | ----------- |
| 2.5 (CUDA 12.1 + Python 3.9) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl` |
| 2.5 (CUDA 12.1 + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.5 (CUDA 12.1 + Python 3.11) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl` |
| 2.5 (CUDA 12.4 + Python 3.9) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp39-cp39-manylinux_2_28_x86_64.whl` |
| 2.5 (CUDA 12.4 + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.5 (CUDA 12.4 + Python 3.11) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.4/torch_xla-2.5.0-cp311-cp311-manylinux_2_28_x86_64.whl` |
| 2.4 (CUDA 12.1 + Python 3.9) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp39-cp39-manylinux_2_28_x86_64.whl` |
| 2.4 (CUDA 12.1 + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.4 (CUDA 12.1 + Python 3.11) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.4.0-cp311-cp311-manylinux_2_28_x86_64.whl` |
| 2.3 (CUDA 12.1 + Python 3.8) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp38-cp38-manylinux_2_28_x86_64.whl` |
| 2.3 (CUDA 12.1 + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.3 (CUDA 12.1 + Python 3.11) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.3.0-cp311-cp311-manylinux_2_28_x86_64.whl` |
| 2.2 (CUDA 12.1 + Python 3.8) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp38-cp38-manylinux_2_28_x86_64.whl` |
| 2.2 (CUDA 12.1 + Python 3.10) | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.1/torch_xla-2.2.0-cp310-cp310-manylinux_2_28_x86_64.whl` |
| 2.1 + CUDA 11.8 | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/11.8/torch_xla-2.1.0-cp38-cp38-manylinux_2_28_x86_64.whl` |
| nightly + CUDA 12.0 >= 2023/06/27 | `https://storage.googleapis.com/pytorch-xla-releases/wheels/cuda/12.0/torch_xla-nightly-cp38-cp38-linux_x86_64.whl` |

</details>

### Docker

NOTE: Since PyTorch/XLA 2.7, all builds use the C++11 ABI by default.

| Version | Cloud TPU VMs Docker |
| --- | ----------- |
| 2.7 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.7.0_3.10_tpuvm` |
| 2.6 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.6.0_3.10_tpuvm` |
| 2.6 (C++11 ABI) | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.6.0_3.10_tpuvm_cxx11` |
| 2.5 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_tpuvm` |
| 2.4 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_tpuvm` |
| 2.3 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_tpuvm` |
| 2.2 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_tpuvm` |
| 2.1 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_tpuvm` |
| nightly python | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm` |

To use the above docker images, pass `--privileged --net host --shm-size=16G` to `docker run`.
Here is an example:

```bash
docker run --privileged --net host --shm-size=16G -it us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm /bin/bash
```

<br/>

| Version | GPU CUDA 12.6 Docker |
| --- | ----------- |
| 2.7 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.7.0_3.10_cuda_12.6` |

<br/>

| Version | GPU CUDA 12.4 Docker |
| --- | ----------- |
| 2.5 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.4` |
| 2.4 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.4` |

<br/>

| Version | GPU CUDA 12.1 Docker |
| --- | ----------- |
| 2.5 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.5.0_3.10_cuda_12.1` |
| 2.4 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.4.0_3.10_cuda_12.1` |
| 2.3 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.3.0_3.10_cuda_12.1` |
| 2.2 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.2.0_3.10_cuda_12.1` |
| 2.1 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_12.1` |
| nightly | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1` |
| nightly at date | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.8_cuda_12.1_YYYYMMDD` |

<br/>

| Version | GPU CUDA 11.8 Docker |
| --- | ----------- |
| 2.1 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.1.0_3.10_cuda_11.8` |
| 2.0 | `us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:r2.0_3.8_cuda_11.8` |

<br/>

See the Google Cloud documentation to run these images on [compute instances with GPUs](https://cloud.google.com/compute/docs/gpus/create-vm-with-gpus).

## Troubleshooting

If PyTorch/XLA isn't performing as expected, see the [troubleshooting guide](docs/source/learn/troubleshoot.md), which has suggestions for debugging and optimizing your network(s).

## Providing Feedback

The PyTorch/XLA team is always happy to hear from users and OSS contributors! The best way to reach out is by filing an issue on this GitHub repository. Questions, bug reports, feature requests, build issues, etc. are all welcome!

## Contributing

See the [contribution guide](CONTRIBUTING.md).

## Disclaimer

This repository is jointly operated and maintained by Google, Meta, and a number of individual contributors listed in the [CONTRIBUTORS](https://github.com/pytorch/xla/graphs/contributors) file. For questions directed at Meta, please send an email to opensource@fb.com. For questions directed at Google, please send an email to pytorch-xla@googlegroups.com.
For all other questions, please [open an issue](https://github.com/pytorch/xla/issues) in this repository.

## Additional Reads

You can find additional useful reading materials in:

* [Performance debugging on Cloud TPU VM](https://cloud.google.com/blog/topics/developers-practitioners/pytorchxla-performance-debugging-tpu-vm-part-1)
* [Lazy tensor intro](https://pytorch.org/blog/understanding-lazytensor-system-performance-with-pytorch-xla-on-cloud-tpu/)
* [Scaling deep learning workloads with PyTorch/XLA and Cloud TPU VM](https://cloud.google.com/blog/topics/developers-practitioners/scaling-deep-learning-workloads-pytorch-xla-and-cloud-tpu-vm)
* [Scaling PyTorch models on Cloud TPUs with FSDP](https://pytorch.org/blog/scaling-pytorch-models-on-cloud-tpus-with-fsdp/)

## Related Projects

* [OpenXLA](https://github.com/openxla)
* [HuggingFace](https://huggingface.co/docs/accelerate/en/basic_tutorials/tpu)
* [JetStream](https://github.com/google/JetStream-pytorch)