{"id":17980814,"url":"https://github.com/nvidia/apex","last_synced_at":"2025-05-13T20:06:40.230Z","repository":{"id":37431685,"uuid":"130725814","full_name":"NVIDIA/apex","owner":"NVIDIA","description":"A PyTorch Extension:  Tools for easy mixed precision and distributed training in Pytorch","archived":false,"fork":false,"pushed_at":"2025-04-11T02:10:33.000Z","size":16113,"stargazers_count":8646,"open_issues_count":746,"forks_count":1445,"subscribers_count":98,"default_branch":"master","last_synced_at":"2025-05-06T01:59:36.709Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NVIDIA.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-23T16:28:52.000Z","updated_at":"2025-05-05T15:17:50.000Z","dependencies_parsed_at":"2022-07-14T05:50:31.238Z","dependency_job_id":"491b57ed-0630-4554-b28e-8be2e69393e6","html_url":"https://github.com/NVIDIA/apex","commit_stats":{"total_commits":1051,"total_committers":128,"mean_commits":8.2109375,"dds":0.7649857278782113,"last_synced_commit":"2863aa02d1f0b2b5af3c27fb30e232b90a748b71"},"previous_names":[],"tags_count":6,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fapex","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fapex/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NVIDIA%2Fapex/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/NVIDIA%2Fapex/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NVIDIA","download_url":"https://codeload.github.com/NVIDIA/apex/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252606934,"owners_count":21775414,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-29T18:06:28.305Z","updated_at":"2025-05-13T20:06:40.223Z","avatar_url":"https://github.com/NVIDIA.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\n\nThis repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pytorch.\nSome of the code here will be included in upstream Pytorch eventually.\nThe intent of Apex is to make up-to-date utilities available to users as quickly as possible.\n\n# Installation\nEach [`apex.contrib`](./apex/contrib) module requires one or more install options other than `--cpp_ext` and `--cuda_ext`.\nNote that contrib modules do not necessarily support stable PyTorch releases, some of them might only be compatible with nightlies.\n\n## Containers\nNVIDIA PyTorch Containers are available on NGC: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch.\nThe containers come with all the custom extensions available at the moment. 
\n\nSee [the NGC documentation](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/index.html) for details such as:\n- how to pull a container\n- how to run a pulled container\n- release notes\n\n## From Source\n\nTo install Apex from source, we recommend using the nightly PyTorch obtainable from https://github.com/pytorch/pytorch.\n\nThe latest stable release obtainable from https://pytorch.org should also work.\n\nWe recommend installing [`Ninja`](https://ninja-build.org/) to make compilation faster.\n\n### Linux\nFor performance and full functionality, we recommend installing Apex with\nCUDA and C++ extensions via\n```bash\ngit clone https://github.com/NVIDIA/apex\ncd apex\n# if pip \u003e= 23.1 (ref: https://pip.pypa.io/en/stable/news/#v23-1) which supports multiple `--config-settings` with the same key... \npip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" ./\n# otherwise\npip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option=\"--cpp_ext\" --global-option=\"--cuda_ext\" ./\n```\n\nTo reduce the build time of Apex, parallel building can be enabled via\n```bash\nNVCC_APPEND_FLAGS=\"--threads 4\" pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext --cuda_ext --parallel 8\" ./\n```\nWhen CPU cores or memory are limited, the `--parallel` option is generally preferred over `--threads`. 
See [pull#1882](https://github.com/NVIDIA/apex/pull/1882) for more details.\n\nApex also supports a Python-only build via\n```bash\npip install -v --disable-pip-version-check --no-build-isolation --no-cache-dir ./\n```\nA Python-only build omits:\n- Fused kernels required to use `apex.optimizers.FusedAdam`.\n- Fused kernels required to use `apex.normalization.FusedLayerNorm` and `apex.normalization.FusedRMSNorm`.\n- Fused kernels that improve the performance and numerical stability of `apex.parallel.SyncBatchNorm`.\n- Fused kernels that improve the performance of `apex.parallel.DistributedDataParallel` and `apex.amp`.\n\n`DistributedDataParallel`, `amp`, and `SyncBatchNorm` will still be usable, but they may be slower.\n\n\n### [Experimental] Windows\n`pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings \"--build-option=--cpp_ext\" --config-settings \"--build-option=--cuda_ext\" .` may work if you were able to build PyTorch from source\non your system. A Python-only build via `pip install -v --no-cache-dir .` is more likely to work.  
\nIf you installed Pytorch in a Conda environment, make sure to install Apex in that same environment.\n\n\n## Custom C++/CUDA Extensions and Install Options\n\nIf a requirement of a module is not met, then it will not be built.\n\n|  Module Name  |  Install Option  |  Misc  |\n|---------------|------------------|--------|\n|  `apex_C`     |  `--cpp_ext`     | |\n|  `amp_C`      |  `--cuda_ext`    | |\n|  `syncbn`     |  `--cuda_ext`    | |\n|  `fused_layer_norm_cuda`  |  `--cuda_ext`  | [`apex.normalization`](./apex/normalization) |\n|  `mlp_cuda`   |  `--cuda_ext`    | |\n|  `scaled_upper_triang_masked_softmax_cuda`  |  `--cuda_ext`  | |\n|  `generic_scaled_masked_softmax_cuda`  |  `--cuda_ext`  | |\n|  `scaled_masked_softmax_cuda`  |  `--cuda_ext`  | |\n|  `fused_weight_gradient_mlp_cuda`  |  `--cuda_ext`  | Requires CUDA\u003e=11 |\n|  `permutation_search_cuda`  |  `--permutation_search`  | [`apex.contrib.sparsity`](./apex/contrib/sparsity)  |\n|  `bnp`        |  `--bnp`         |  [`apex.contrib.groupbn`](./apex/contrib/groupbn) |\n|  `xentropy`   |  `--xentropy`    |  [`apex.contrib.xentropy`](./apex/contrib/xentropy)  |\n|  `focal_loss_cuda`  |  `--focal_loss`  |  [`apex.contrib.focal_loss`](./apex/contrib/focal_loss)  |\n|  `fused_index_mul_2d`  |  `--index_mul_2d`  |  [`apex.contrib.index_mul_2d`](./apex/contrib/index_mul_2d)  |\n|  `fused_adam_cuda`  |  `--deprecated_fused_adam`  |  [`apex.contrib.optimizers`](./apex/contrib/optimizers)  |\n|  `fused_lamb_cuda`  |  `--deprecated_fused_lamb`  |  [`apex.contrib.optimizers`](./apex/contrib/optimizers)  |\n|  `fast_layer_norm`  |  `--fast_layer_norm`  |  [`apex.contrib.layer_norm`](./apex/contrib/layer_norm). 
different from `fused_layer_norm` |\n|  `fmhalib`    |  `--fmha`        |  [`apex.contrib.fmha`](./apex/contrib/fmha)  |\n|  `fast_multihead_attn`  |  `--fast_multihead_attn`  |  [`apex.contrib.multihead_attn`](./apex/contrib/multihead_attn)  |\n|  `transducer_joint_cuda`  |  `--transducer`  |  [`apex.contrib.transducer`](./apex/contrib/transducer)  |\n|  `transducer_loss_cuda`   |  `--transducer`  |  [`apex.contrib.transducer`](./apex/contrib/transducer)  |\n|  `cudnn_gbn_lib`  |  `--cudnn_gbn`  | Requires cuDNN\u003e=8.5, [`apex.contrib.cudnn_gbn`](./apex/contrib/cudnn_gbn) |\n|  `peer_memory_cuda`  |  `--peer_memory`  |  [`apex.contrib.peer_memory`](./apex/contrib/peer_memory)  |\n|  `nccl_p2p_cuda`  |  `--nccl_p2p`  | Requires NCCL \u003e= 2.10, [`apex.contrib.nccl_p2p`](./apex/contrib/nccl_p2p)  |\n|  `fast_bottleneck`  |  `--fast_bottleneck`  |  Requires `peer_memory_cuda` and `nccl_p2p_cuda`, [`apex.contrib.bottleneck`](./apex/contrib/bottleneck) |\n|  `fused_conv_bias_relu`  |  `--fused_conv_bias_relu`  | Requires cuDNN\u003e=8.4, [`apex.contrib.conv_bias_relu`](./apex/contrib/conv_bias_relu) |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Fapex","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnvidia%2Fapex","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnvidia%2Fapex/lists"}