https://github.com/tybrucechen/tutorial-conda-cudnn-nccl-installation-for-pytorch

This is a tutorial for installing CUDA (v11.8) and cuDNN (8.6.9) to enable programming torch with GPU. It also mentions about implementation of NCCL for distributed GPU DNN model training.
https://github.com/tybrucechen/tutorial-conda-cudnn-nccl-installation-for-pytorch

cuda-installation nccl pytorch-installation ubuntu ubuntu-server windows-10

Last synced: about 2 months ago
JSON representation

This is a tutorial for installing CUDA (v11.8) and cuDNN (8.6.9) to enable programming torch with GPU. It also mentions about implementation of NCCL for distributed GPU DNN model training.

Host: GitHub
URL: https://github.com/tybrucechen/tutorial-conda-cudnn-nccl-installation-for-pytorch
Owner: TyBruceChen
Created: 2024-05-23T13:15:58.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2024-12-20T06:57:50.000Z (5 months ago)
Last Synced: 2025-02-14T08:27:32.638Z (3 months ago)
Topics: cuda-installation, nccl, pytorch-installation, ubuntu, ubuntu-server, windows-10
Language: Jupyter Notebook
Homepage:
Size: 1.79 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

# Conda-Installation-Tutorial-Windows10 (for Linux (Ubuntu18), click [here](https://github.com/TyBruceChen/Tutorial-Conda-cuDNN-NCCL-installation-for-Pytorch/blob/main/README_virtual_env_Ubuntu18_server.md)) (for Pytorch distributed GPU training with NCCL (as well as by [Accelerate class](https://huggingface.co/docs/accelerate/v0.12.0/en/basic_tutorials/launch)), click [here](https://github.com/TyBruceChen/Tutorial-Conda-cuDNN-NCCL-installation-for-Pytorch/blob/main/README_nccl_distributed_compute.md))
This is a tutorial for installing CUDA (v11.8) and cuDNN (8.6.9) to enable programming Pytorch with GPU. It also mentioned about the solution of unabling for Pytorch to detect the CUDA core.

**Claim:** This tutorial was done when I came back from abroad at NAU. I found my computer like a stranger so I devoted myself to re-install the whole system. Thus the CUDA environment needs to be re-configured, where I have met several obstacles while doing this although it's my third (or fourth time...? I do not remember) to do this. So I decided to do a full-scope tutorial to record the problem I met and its corresponding solution which may help me in the future and others.

**Suggestion: Install the CUDA first then install the corresponding CUDA-compatible Pytorch**

### 1. Check the compatibility:

1. Check whether your GPU is compatible with CUDA (and the supported CUDA version) at https://developer.nvidia.com/cuda-gpus, and update your driver.
2. Confirm the version of CUDA that you want and can install with (you can search for a specific version). In this tutorial, it's v11.8.
3. Find the compatible version of cuDNN (plug-in for optimizing AI training) at https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/windows-x86_64/. In this tutorial, it's 8.6.0.

### 2. Install CUDA:

If the high version fails to install, you can try the older version.
Here I check all the checkboxes.

### 3. Verify the installation of CUDA:

First go the settings -> edit system environment variable (path) -> ```Environment Variables...```

-> Under System variables, ensure the existence of these two paths as below:
![image](https://github.com/TyBruceChen/Conda-Installation-Tutorial-Windows-/assets/152252677/1c2685ad-58b3-4188-b908-00753c04accf)

-> Also under this column, find ```Path```, double click to open, and add these two paths if they did not exist:
![image](https://github.com/TyBruceChen/Conda-Installation-Tutorial-Windows-/assets/152252677/150d6a50-4352-46e4-a370-9d36d4968d53)

Second, to verify the system has already detected installed CUDA, type the command ```nvcc --version``` in the command prompt, the displayed version should match with the CUDA version you just installed.

### 4. Add Plug-in: cuDNN to CUDA:

Copy all the files (folders) of the downloaded cuDNN zip file that is compatible with your CUDA version, and paste them under the CUDA folder (in my case, it's ```C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\```) to finish those paste action at one time.

### 5. Installation of compatible Pytorch:

This step was where I got stuck and spent most of my time working it out.
Run the similar command to install cuda corresponding version of Pytorch: ```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118``` (here cu118 indicates the CUDA version is 11.8). The proper size of the main package of this installation should be near 2.4GB.

### 6. Verification:

```
import torch
torch.cuda.is_available()
```
should return ```True``` value

### 7. Some torch functions that may helps debugging if error occurs:

Click [here](https://github.com/TyBruceChen/Conda-Installation-Tutorial-Windows-/blob/main/debug.ipynb)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/tybrucechen/tutorial-conda-cudnn-nccl-installation-for-pytorch

Awesome Lists containing this project

README