An open API service indexing awesome lists of open source software.

https://github.com/igorcosta/deep-docker

Docker image for Deep Learning on AWS Cloud
https://github.com/igorcosta/deep-docker

cuda deep-learning docker docker-image tensorflow

Last synced: about 2 months ago
JSON representation

Docker image for Deep Learning on AWS Cloud

Awesome Lists containing this project

README

          

# deep-docker

> For running deeplearning experiments on AWS EC2 g2.2xlarge advised or bigger with
> Docker and Docker machine

This image includes
- Optimized Python 3.6 for Docker, find out more there [https://www.revsys.com/tidbits/optimized-python/]
- Nvidia driver 346.46
- CUDA 7.0
- Anaconda 3.18.8 (Python 2.7.11)
- Preconfigured .theanorc to use GPU and float32 by default

## Useful Commands

### Preparing the host machine

The host machine needs to run the **same version** of the NVidia driver as inside the container. So I built an AMI based on the Ubuntu 14.04 HBM SSD AMI (ami-5c207736) by the following script.

sudo su -
apt-get update
apt-get install -y build-essential
apt-get install -y linux-headers-$(uname -r) linux-image-$(uname -r) linux-image-extra-$(uname -r)
echo "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off" > /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
update-initramfs -u
reboot

sudo su -
cd /opt
wget http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_7.0.28_linux.run
chmod +x cuda_*_linux.run
./cuda_*_linux.run -extract=`pwd`/nvidia_installers
cd nvidia_installers
./NVIDIA-Linux-x86_64-*.run -s
./cuda-linux64-rel-*.run -noprompt
./cuda-samples-linux-7.0.28-19326674.run -noprompt -cudaprefix=/usr/local/cuda
cd /usr/local/cuda/samples/1_Utilities/deviceQuery
make
./deviceQuery
ls /dev | grep nvidia

rm /opt/cuda_7.0.28_linux.run
rm -r /opt/nvidia_installers

You should save the instance as an AMI so you can reuse it later.

To create a host using spot instance

docker-machine create --driver amazonec2 \
--amazonec2-ami ami-... \
--amazonec2-access-key $AWS_ACCESS_KEY_ID \
--amazonec2-secret-key $AWS_SECRET_ACCESS_KEY \
--amazonec2-vpc-id vpc-... \
--amazonec2-root-size 60 \
--amazonec2-instance-type g2.2xlarge \
--amazonec2-request-spot-instance \
--amazonec2-spot-price 0.15 \
aws01

To activate the newly created instance

eval "$(docker-machine env aws01)"

To view all created hosts

docker-machine ls

SSH into the instance and sanity check

docker-machine ssh aws01
nvidia-smi
# Should see information about the GPU
ls /dev | grep nvidia
# Should see nvidia0 nvidiactl nvidia-uvm

If nvidia-uvm is not found

docker-machine ssh aws01
/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
ls /dev | grep nvidia
exit

To terminate and remove the instance

docker-machine rm aws01

### Running the image

To build this image

docker build -t igorcosta/deeplearning .

Make sure the GPU is working inside the container

docker run -ti --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm igorcosta/deeplearning python -c "import theano"
# Should see "Using gpu device 0: GRID K520"

Debug inside the container

docker run -ti --device /dev/nvidia0:/dev/nvidia0 --device /dev/nvidiactl:/dev/nvidiactl --device /dev/nvidia-uvm:/dev/nvidia-uvm igorcosta/deeplearning /bin/bash

To publish the image

docker push igorcosta/deeplearning

To start over

docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
docker rmi $(docker images -q)