An open API service indexing awesome lists of open source software.

https://github.com/charlesq34/diy-deep-learning-workstation

Build a deep learning workstation from scratch (HW & SW).
https://github.com/charlesq34/diy-deep-learning-workstation

cuda deep-learning gpu ubuntu workstations

Last synced: 1 day ago
JSON representation

Build a deep learning workstation from scratch (HW & SW).

Awesome Lists containing this project

README

        

# DIY-Deep-Learning-Workstation
Build a deep learning workstation from scratch. While this document is written for Ubuntu 14.04 with TensorFlow, most steps should also apply to other Ubuntu versions and deep leanring frameworks.

Have fun DIY-ing!

### Contents
1. [Build a GPU Workstation](#1-build-a-gpu-workstation)
2. [Install Ubuntu](#2-install-ubuntu)
3. [Install CUDA and cuDNN](#3-install-cuda-and-cudnn)
4. [Install TensorFlow](#4-install-tensorflow)
5. [Others](#5-others)

**Disclaimer:** This document records my own experience and lessons learnt in building a workstation for deep learning. However, there is no gurantee on saftey or success of the construction. It's your own responsibility to maintain safety during the process. Please refer to professional IT service when you have question or meet trouble.

## 1. Build a GPU Workstation
Feel free to skip this section if you already have a GPU machine or plan to buy a pre-assembled one.

### Pick and purchase parts
In general, you need to have motherboard, CPU (including CPU fan), memory cards, power supply unit (PSU), hard drive (+SSD), case and graphics cards (GPUs). A good place to pick and compare computer parts is PC Part Picker. The website will help you check compatibility of components and provides price comparisons from multiple providers, which is really helpful.

You can find my part list HERE. The parts I bought are not completely the same as the list because, for convenience, I just bought all parts from Frys and choose parts depending on their availability. But at least you could get a sense of how much it costs and what parts are necessary. For my case it costs around $2.2k (May 2017) in total for a machine with two GTX 1080 GPUs, 32GB memory, 500GB SSD + 4TB hard drive with a quad-core i5 CPU and 850W power. Note that this spec is optimized for cost with strong constraint to preserve two GPUs. CPU, SSD and memory are compromised.

### Assemble the machine
Read the instructions of motherboard, power and case *carefully* before installation. There are also plenty of videos online on how to assemble a machine, which might be helpful if you are doing it the first time.

A brief guideline on the assembling steps are as follows.
1. Install CPU on motherboard, install CPU fan, make sure the CPU fan stays solid, connect fan power cable to motherboard
2. Install memory card to motherboard
3. Connect cables to PSU, motherboard power (2 of them one for CPU, one for motherboard general, read the text on the cable, don’t misplace the PCI power cables!), harddrive power (one for HD, one for SSD), GPU power (2 cables), put PSU into the case
4. Install motherboard into the case, put the IO shield (the one for USB, ethernet etc.) before putting the motherboard
5. Connect power, reset, LED etc. jump cables to motherboard, connect case fan power cables to motherboard, connect motherboard power cables
6. Install HD and SSD, connect power cables
7. Install GPUs, connect power cables
8. Connect case cable to plug-in, start machine, enter the BIOS window to check if all HWs are detected and work normally :)

Tip: if you find it extremely difficult to connect some cable, you are probably doing it in the wrong way!

## 2. Install Ubuntu
I assume at this step, you already have a functional machine connected to display, keyboard and mouse.

### Create a bootable USB stick on Ubuntu
Assuming you or your friend already have a computer, then you can prepare a USB stick for OS installation following this guideline: Create A USB Stick on Ubuntu or equivalents for Windows and macOS.

The 14.04 ISO file (ubuntu-14.04.5-desktop-amd64.iso ) can be found here: http://releases.ubuntu.com/14.04/

### Install Ubuntu 14.04 LTS
Insert the disk to the machine's USB stick. Start the machine and it should automatically enter a window for Ubuntu installation. If your system has a preinstalled OS, you need to modify BIOS boot order to set USB stick as first priority. The installation process should be very fast (less than 10 minutes for my case) and simple.

## 3. Install CUDA and cuDNN
This and the following steps require Internet connection.

### Preparation

Install some useful packages in terminal:
``` bash
sudo apt-get update
sudo apt-get install \
aptitude \
freeglut3-dev \
g++-4.8 \
gcc-4.8 \
libglu1-mesa-dev \
libx11-dev \
libxi-dev \
libxmu-dev \
nvidia-modprobe \
python-dev \
python-pip \
 python-virtualenv \
vim
```

### Install CUDA

Download CUDA installation file: https://developer.nvidia.com/cuda-downloads

Choose Linux -> x86_64 -> Ubuntu -> 14.04 -> deb (local)  -> Download

Install CUDA in terminal (use the specific .deb file you've downloaded):
``` bash
cd ~/Downloads
sudo dpkg -i cuda-repo-ubuntu1404-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda
```

Restart the computer to activate CUDA driver. Now your screen resolution should be automatically changed to highest resolution for the display!

### Install cuDNN

The NVIDIA CUDA® Deep Neural Network library (cuDNN) is aGPU-accelerated library of primitives for deep neural networks with optimizations for convolutions etc.

Register an (free) acount on NVIDIA website and login to download the latest cuDNN library: https://developer.nvidia.com/cudnn

Choose the specific version of cuDNN (denpending on support of your prefered deep learning framework)

Choose `Download cuDNN v5.1 (Jan 20, 2017), for CUDA 8.0` -> `cuDNN v5.1 Library for Linux`

Install cuDNN (by copying files :) in terminal:
```bash
cd ~/Downloads
tar xvf cudnn-8.0-linux-x64-v5.1.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
```

### Update your .bashrc

Add the following lines to your `~/.bashrc` file (you can open it by `gedit ~/.bashrc` in terminal)
```bash
export PATH=/usr/local/cuda/bin:$PATH
export MANPATH=/usr/local/cuda/man:$MANPATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

```bash
source ~/.bashrc
```

To check the installation, print some GPU and driver information by:
```bash
nvidia-smi
nvcc --version
```

## 4. Install TensorFlow
Follow TensorFlow official page for installation: https://www.tensorflow.org/install/
Or install whatever deep learning frameworks that you prefer :)

For example to install TF1.1 with GPU and Python 2.7:
```bash
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.1.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
```

## 5. Others
### Mount hard drive

I used SSD as my boot dist and followed THIS document to mount the hard drive.

To check the disk status:
```bash
sudo fdisk -l /dev/sdb
```

To create a partition start GNU parted as follows (assuming the hard drive is /dev/sdb):
```bash
sudo parted /dev/sdb
```

Now inside command line interface of the parted tool (set unit to TB and set a partition from 0TB to 4TB, then use `print` to check partition and use `quit` to save & quit the parted tool):
```bash
(parted) unit TB
(parted) mkpart primary 0 4
(parted) print
(parted) quit
```

Then use mkfs.ext4 command to format the file system, enter:
``` bash
sudo mkfs.ext4 /dev/sdb1
```

Type the following commands to mount /dev/sdb1, enter:
``` bash
sudo mkdir /data
sudo mount /dev/sdb1 /data
```

You can use `df -H` to check current disk info.

To kee the mount after reboot, add the mouting setup to `/etc/fstab`:
```bash
sudo vim /etc/fstab
```

Add the following line at the end:

`/dev/sdb1       /data   ext4   defaults       0       1`

### Set up SSH connection

Server side, install SSH server:
```bash
sudo apt-get install openssh-server
```

Edit SSH configuration to whitelist users:
```bash
sudo vim /etc/ssh/sshd_config
```
Change root login permission line to: PermitRootLogin no

Add allow users: AllowUsers your_username

Then restart SSH server:
```bash
sudo /etc/init.d/ssh restart
sudo ufw allow 22
```

Client side, to connect with the workstation, you need to firstly know the server's IP (or hostname if it has one). Use `ifconfig -a` on the server to check IP address (look for that in `eth0`).

Client side (Mac OS), you need to whitelist the server IP in `/etc/hosts`:
```bash
sudo vim /etc/hosts
```

Add line
` `