https://github.com/igorcosta/deep-docker
Docker image for Deep Learning on AWS Cloud
cuda deep-learning docker docker-image tensorflow
- Host: GitHub
- URL: https://github.com/igorcosta/deep-docker
- Owner: igorcosta
- Created: 2018-05-11T04:28:39.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2018-05-11T04:42:31.000Z (about 8 years ago)
- Last Synced: 2025-10-13T10:26:51.523Z (8 months ago)
- Topics: cuda, deep-learning, docker, docker-image, tensorflow
- Size: 2.93 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# deep-docker
> For running deep learning experiments on AWS EC2 (g2.2xlarge or larger recommended) with
> Docker and Docker Machine
This image includes:
- Optimized Python 3.6 for Docker ([details](https://www.revsys.com/tidbits/optimized-python/))
- NVIDIA driver 346.46
- CUDA 7.0
- Anaconda 3.18.8 (Python 2.7.11)
- Preconfigured `.theanorc` that uses the GPU and float32 by default
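As a rough illustration of the last item, a `.theanorc` that selects the GPU and float32 typically looks like the file written below. This is a sketch based on the `[global]` options of Theano's older CUDA backend, not a copy of the file shipped in this image; the helper name `write_theanorc` is hypothetical.

```shell
# Hypothetical helper: write a minimal Theano config to the given path
# (defaults to ~/.theanorc). The image's actual file may contain more options.
write_theanorc() {
    target="${1:-$HOME/.theanorc}"
    cat > "$target" <<'EOF'
[global]
device = gpu
floatX = float32
EOF
}
```

Calling `write_theanorc /tmp/theanorc-example` writes the file to a scratch location so it can be inspected without touching an existing configuration.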
## Useful Commands
### Preparing the host machine
The host machine must run the **same version** of the NVIDIA driver as the container. To ensure this, I built an AMI based on the Ubuntu 14.04 HVM SSD AMI (ami-5c207736) using the following script.

    # Become root and install the build tools and kernel headers the driver installer needs
    sudo su -
    apt-get update
    apt-get install -y build-essential
    apt-get install -y linux-headers-$(uname -r) linux-image-$(uname -r) linux-image-extra-$(uname -r)

    # Blacklist the open-source nouveau driver, which conflicts with the NVIDIA driver
    # (echo -e is needed so that \n is written as a newline)
    echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off" > /etc/modprobe.d/blacklist-nouveau.conf
    echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
    update-initramfs -u
    reboot

    # After the reboot: download and unpack the CUDA 7.0 installer
    sudo su -
    cd /opt
    wget http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_7.0.28_linux.run
    chmod +x cuda_*_linux.run
    ./cuda_*_linux.run -extract="$(pwd)/nvidia_installers"
    cd nvidia_installers

    # Install the driver, the CUDA toolkit, and the samples
    ./NVIDIA-Linux-x86_64-*.run -s
    ./cuda-linux64-rel-*.run -noprompt
    ./cuda-samples-linux-7.0.28-19326674.run -noprompt -cudaprefix=/usr/local/cuda

    # Build and run deviceQuery to verify the GPU, then check the device nodes
    cd /usr/local/cuda/samples/1_Utilities/deviceQuery
    make
    ./deviceQuery
    ls /dev | grep nvidia

    # Clean up the installers
    rm /opt/cuda_7.0.28_linux.run
    rm -r /opt/nvidia_installers
You should save the instance as an AMI so you can reuse it later.
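Because the host and the container must run the same driver version, it can be worth confirming which version the host actually loaded before saving the AMI. The sketch below reads the version from `/proc/driver/nvidia/version`, which the NVIDIA kernel module exposes on Linux. The function names are illustrative, and the expected value in the usage note is the driver version listed for this image.

```shell
# Extract the driver version from text in the format of /proc/driver/nvidia/version.
# Reads stdin so it can be tried on sample text without a GPU.
parse_nvidia_version() {
    grep '^NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Succeed only if the loaded host driver matches the expected version.
# Assumes the NVIDIA kernel module is loaded; fails if the proc file is absent.
host_driver_matches() {
    expected="$1"
    [ -r /proc/driver/nvidia/version ] || return 1
    actual=$(parse_nvidia_version < /proc/driver/nvidia/version)
    [ "$actual" = "$expected" ]
}
```

For example, `host_driver_matches 346.46 || echo "driver mismatch" >&2` prints a warning when the host driver differs from the one expected inside the container.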
To create a host using a spot instance:

    docker-machine create --driver amazonec2 \
      --amazonec2-ami ami-... \
      --amazonec2-access-key "$AWS_ACCESS_KEY_ID" \
      --amazonec2-secret-key "$AWS_SECRET_ACCESS_KEY" \
      --amazonec2-vpc-id vpc-... \
      --amazonec2-root-size 60 \
      --amazonec2-instance-type g2.2xlarge \
      --amazonec2-request-spot-instance \
      --amazonec2-spot-price 0.15 \
      aws01
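The create command reads the AWS credentials from environment variables, so an unset variable leads to a confusing failure. A small guard like the following sketch can catch that early; `require_vars` is a hypothetical helper, not part of Docker Machine.

```shell
# Hypothetical helper: fail if any named environment variable is unset or empty.
require_vars() {
    for name in "$@"; do
        # Look up the variable whose name is stored in $name (POSIX-compatible indirection)
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing required variable: $name" >&2
            return 1
        fi
    done
}
```

It could be chained in front of the create step, for example `require_vars AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY && docker-machine create ...`.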
To activate the newly created instance:

    eval "$(docker-machine env aws01)"
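`docker-machine env` prints `export` statements for variables such as `DOCKER_HOST` and `DOCKER_MACHINE_NAME`, which is why its output is passed to `eval`. A quick way to confirm which machine the current shell targets is sketched below; the helper name is made up for illustration.

```shell
# Hypothetical helper: succeed if the current shell is configured for the named machine.
# Relies on DOCKER_MACHINE_NAME, which `docker-machine env` exports.
active_machine_is() {
    [ "${DOCKER_MACHINE_NAME:-}" = "$1" ]
}
```

For example, `active_machine_is aws01 && docker info` runs `docker info` only when the shell points at `aws01`.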
To view all created hosts:

    docker-machine ls
To SSH into the instance and run a sanity check:

    docker-machine ssh aws01
    nvidia-smi
    # Should see information about the GPU
    ls /dev | grep nvidia
    # Should see nvidia0 nvidiactl nvidia-uvm
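The device check can also be scripted. The following sketch lists which of the three expected device nodes are absent; the function name and the optional directory argument (useful for trying it without a GPU) are assumptions for illustration.

```shell
# Print the name of each expected NVIDIA device node that is missing from
# the given directory (default: /dev). Prints nothing when all are present.
missing_nvidia_devices() {
    dir="${1:-/dev}"
    for dev in nvidia0 nvidiactl nvidia-uvm; do
        [ -e "$dir/$dev" ] || echo "$dev"
    done
}
```

Run on the host, empty output means all three device nodes are in place.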
If nvidia-uvm is not found, run the deviceQuery sample once on the host and list the devices again:

    docker-machine ssh aws01
    /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery
    ls /dev | grep nvidia
    exit
To terminate and remove the instance:

    docker-machine rm aws01
### Running the image
To build this image:

    docker build -t igorcosta/deeplearning .
To make sure the GPU is working inside the container:

    docker run -ti \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      igorcosta/deeplearning python -c "import theano"
    # Should see "Using gpu device 0: GRID K520"
To debug inside the container:

    docker run -ti \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      igorcosta/deeplearning /bin/bash
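Both `docker run` commands repeat the same three `--device` mappings. One way to avoid retyping them is to generate the flags from the device nodes that exist on the host, as in this sketch. `nvidia_device_flags` is a hypothetical helper, and it assumes device paths contain no whitespace.

```shell
# Print a `--device SRC:DST` pair for every NVIDIA device node in the given
# directory (default: /dev), one word per line. Each node is mapped to the
# same path inside the container.
nvidia_device_flags() {
    dir="${1:-/dev}"
    for dev in "$dir"/nvidia*; do
        # Skip the unexpanded pattern when no device nodes exist
        [ -e "$dev" ] || continue
        printf '%s\n%s\n' "--device" "$dev:$dev"
    done
}
```

The output is meant to be used unquoted so the shell splits it into separate arguments, for example `docker run -ti $(nvidia_device_flags) igorcosta/deeplearning /bin/bash`.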
To publish the image:

    docker push igorcosta/deeplearning
To start over (this stops and removes **all** containers and deletes **all** local images):

    docker stop $(docker ps -a -q)
    docker rm $(docker ps -a -q)
    docker rmi $(docker images -q)
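Note that `docker stop`, `docker rm`, and `docker rmi` each report an error when the command substitution expands to nothing, for example when no containers exist. A guard along these lines avoids that; `run_if_any` is a hypothetical helper that reads IDs from standard input.

```shell
# Hypothetical helper: read whitespace-separated IDs from stdin and pass them
# as extra arguments to the given command, or do nothing if there are none.
run_if_any() {
    ids=$(cat)
    if [ -n "$ids" ]; then
        # $ids is intentionally unquoted so each ID becomes its own argument
        "$@" $ids
    fi
}

# Usage (commented out because it is destructive):
# docker ps -a -q | run_if_any docker stop
# docker ps -a -q | run_if_any docker rm
# docker images -q | run_if_any docker rmi
```

The helper can be tried safely with a harmless command, for example `printf 'a\nb\n' | run_if_any echo rm`.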