{"id":13574189,"url":"https://github.com/cresset-template/cresset","last_synced_at":"2025-04-04T14:32:00.082Z","repository":{"id":37466718,"uuid":"407194526","full_name":"cresset-template/cresset","owner":"cresset-template","description":"Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.","archived":false,"fork":false,"pushed_at":"2024-04-03T09:38:25.000Z","size":1119,"stargazers_count":702,"open_issues_count":0,"forks_count":40,"subscribers_count":5,"default_branch":"main","last_synced_at":"2024-04-03T10:37:20.427Z","etag":null,"topics":["build","cuda","deep-learning","deep-learning-tutorial","docker","docker-compose","machine-learning","makefile","mlops","mlops-template","python","pytorch","source","source-python","template","template-repository","wheel"],"latest_commit_sha":null,"homepage":"","language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cresset-template.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2021-09-16T14:22:59.000Z","updated_at":"2024-04-15T08:33:35.736Z","dependencies_parsed_at":"2023-02-16T04:15:54.961Z","dependency_job_id":"b9779fe8-866e-4d42-9645-809aa4c028a8","html_url":"https://github.com/cresset-template/cresset","commit_stats":{"total_commits":441,"total_committers":4,"mean_commits":110.25,"dds":0.4058956916099773,"last_synced_commit":"37c7b5df7236d3b9d96c4908efe5af8bc90066e3"},"previous_names":[],"tags_count":26,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cresset-template%2Fcresset","tags
_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cresset-template%2Fcresset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cresset-template%2Fcresset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cresset-template%2Fcresset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cresset-template","download_url":"https://codeload.github.com/cresset-template/cresset/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247194189,"owners_count":20899440,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["build","cuda","deep-learning","deep-learning-tutorial","docker","docker-compose","machine-learning","makefile","mlops","mlops-template","python","pytorch","source","source-python","template","template-repository","wheel"],"created_at":"2024-08-01T15:00:47.822Z","updated_at":"2025-04-04T14:31:57.238Z","avatar_url":"https://github.com/cresset-template.png","language":"Dockerfile","funding_links":[],"categories":["Dockerfile"],"sub_categories":[],"readme":"# Cresset: The One Template to Train Them All\n\n[![GitHub stars](https://img.shields.io/github/stars/cresset-template/cresset?style=flat)](https://github.com/cresset-template/cresset/stargazers)\n[![GitHub issues](https://img.shields.io/github/issues/cresset-template/cresset?style=flat)](https://github.com/cresset-template/cresset/issues)\n[![GitHub 
forks](https://img.shields.io/github/forks/cresset-template/cresset?style=flat)](https://github.com/cresset-template/cresset/network)\n[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)\n[![GitHub license](https://img.shields.io/github/license/cresset-template/cresset?style=flat)](https://github.com/cresset-template/cresset/blob/main/LICENSE)\n[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7939089.svg)](https://doi.org/10.5281/zenodo.7939089)\n[![Twitter](https://img.shields.io/twitter/url?url=https%3A%2F%2Fgithub.com%2Fcresset-template%2Fcresset)](https://twitter.com/intent/tweet?text=Awesome_Project!!!:\u0026url=https%3A%2F%2Fgithub.com%2Fcresset-template%2Fcresset)\n\n![Cresset Logo](https://github.com/cresset-template/cresset/blob/main/assets/logo.png \"Logo\")\n\n---\n\n## TL;DR\n\n**_A new MLOps system for deep learning development using Docker Compose\nwith the aim of providing reproducible and easy-to-use interactive\ndevelopment environments for deep learning practitioners.\nHopefully, the methods presented here will become\nbest practice in both academia and industry._**\n\n## Introductory Video (In English)\n\n## [![Weights and Biases Presentation](https://res.cloudinary.com/marcomontalbano/image/upload/v1649474431/video_to_markdown/images/youtube--sW3VxlJl46o-c05b58ac6eb4c4700831b2b3070cd403.jpg)](https://youtu.be/sW3VxlJl46o?t=6865 \"Weights and Biases Presentation\")\n\n## Installation on a New Host\n\nIf this is your first time using this project, follow these steps:\n\n1. 
Install the NVIDIA CUDA [Driver](https://www.nvidia.com/download/index.aspx)\n   appropriate for the target host and NVIDIA GPU.\n   If the driver has already been installed,\n   check that the installed version is compatible with the target CUDA version.\n   CUDA driver version mismatch is the single most common issue for new users.\n   See the\n   [compatibility matrix](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions)\n   for compatible versions of the CUDA driver and CUDA Toolkit.\n\n2. Install [Docker](https://docs.docker.com/get-docker) (v23.0+ is recommended)\n   or update to a recent version compatible with Docker Compose V2.\n   Docker incompatibility with Docker Compose V2 is another common issue for new users.\n   Note that Windows users may use WSL (Windows Subsystem for Linux).\n   Cresset has been tested on Windows 11 WSL2 with the Windows CUDA driver\n   using Docker Desktop for Windows. There is no need to install a separate\n   WSL CUDA driver or Docker for Linux inside WSL.\n   Note that only Docker Desktop is under a commercial EULA and Docker Engine\n   (for Linux) and Lima Docker (for Mac) are still both open-source.\n   _N.B._ Windows Security real-time protection causes significant slowdown if enabled.\n   Disable any active antivirus programs on Windows for best performance.\n   _N.B._ Linux hosts may also install via this\n   [repo](https://github.com/docker/docker-install).\n\n3. Install the NVIDIA Container Toolkit as specified in this\n   [link](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).\n\n4. Run `make install-compose` to install Docker Compose V2 for Linux hosts.\n   Installation does _**not**_ require `root` permissions. Visit the\n   [documentation](https://docs.docker.com/compose/cli-command/#install-on-linux)\n   for the latest installation information. 
Note that Docker Compose V2\n   is available for WSL users with Docker Desktop by default.\n\n5. Run `make env SERVICE=(train|devel|ngc|simple)` on the terminal\n   at project root to create a basic `.env` file.\n   The `.env` file provides environment variables for `docker-compose.yaml`,\n   allowing different users and machines to set their own variables as required.\n   The Makefile has also been configured to read values from the `.env` file\n   if it exists, allowing non-default values to be specified only once.\n   Each host should have a separate `.env` file for host-specific configurations.\n\n6. Run `make over` to create a `docker-compose.override.yaml` file.\n   Add configurations that should not be shared via source control there.\n   For example, volume-mount pairs specific to each host machine.\n\n7. If Cresset is being placed within a pre-existing project's subdirectory,\n   change the `volume` pairing from `.:${PROJECT_ROOT}` to `..:${PROJECT_ROOT}`.\n   All commands in Cresset assume that they are being run at project root\n   but this can be changed easily.\n\n### Explanation of services\n\nDifferent Docker Compose services are organized to serve different needs.\n\n- `train`, the default service, should be used when compiled dependencies are\n  necessary or when PyTorch needs to be compiled from source due to\n  Compute Capability issues, etc.\n- `devel` is designed for PyTorch CUDA/C++ developers who need to recompile\n  frequently and have many complex dependencies.\n- `ngc` is derived from the official NVIDIA PyTorch NGC images with the option\n  to install additional packages. It is recommended for users who wish to base\n  their projects on the NGC images provided by NVIDIA. 
Note that the NGC images\n  change between different releases and that configurations for one\n  release may not work for another one.\n- `simple` is derived from the Official Ubuntu Linux image by default as some\n  corporations restrict the use of Docker images not officially verified by\n  Docker. It installs all packages via `conda` by default and can optionally\n  install highly reproducible environments via `conda-lock`. Note that\n  `pip` packages can also be installed via `conda`. Also, the base image can\n  be configured to use images other than the Official Linux Docker images\n  by specifying the `BASE_IMAGE` argument directly in the `.env` file.\n  PyTorch runtime performance may be superior in official NVIDIA CUDA images\n  under certain circumstances. Use the tests to benchmark runtime speeds.\n  **The `simple` service is recommended for users without compiled dependencies.**\n\nThe `Makefile` has been configured to take values specified in the `.env` file\nif the `.env` file exists. Therefore, all `make` commands will automatically\nuse the `${SERVICE}` specified by `make env SERVICE=${SERVICE}` after the\n`.env` file is created.\n\n### Notes for Rootless Users\n\nMany institutions forbid the use of Docker because it requires `root` permissions, compromising security.\nFor users without Docker `root` access, using rootless Docker\n[link](https://docs.docker.com/engine/security/rootless) is recommended.\n\nWhile installing rootless Docker requires root permissions on the host,\nroot permissions are not necessary after the initial installation.\n\nWhen using rootless Docker, it is most convenient to set `ADD_USER=exclude` in the `.env` file\nas the `root` user will be the host user in rootless Docker.\n\n## Project Configuration\n\n1. 
To build PyTorch from source, set `BUILD_MODE=include` and the\n   CUDA Compute Capability (CCC) of the target NVIDIA GPU in the `.env` file.\n   Visit the NVIDIA [website](https://developer.nvidia.com/cuda-gpus#compute)\n   to find compute capabilities of NVIDIA GPUs. Visit the\n   [documentation](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities)\n   for an explanation of compute capability and its relevance.\n   Note that the Docker cache will save previously built binaries\n   if the given configurations are identical.\n\n2. Read the `docker-compose.yaml` file to fill in extra variables in `.env`.\n   Also, feel free to edit `docker-compose.yaml` as necessary by changing\n   session names, hostnames, etc. for different projects and configurations.\n   The `docker-compose.yaml` file provides reasonable default values but these\n   can be overridden by values specified in the `.env` file.\n   An important configuration is `ipc: host`, which allows the container to\n   access the shared memory of the host. This is required for multiprocessing,\n   e.g., to use `num_workers` in the PyTorch `DataLoader` class.\n   Disable this configuration on WSL and specify `shm_size:` instead as WSL\n   cannot use host IPC as of the time of writing.\n\n3. Edit requirements in `reqs/apt-train.requirements.txt`\n   and `reqs/train-environment.yaml`.\n   These contain project package dependencies.\n   The `apt` requirements are designed to resemble an\n   ordinary Python `requirements.txt` file.\n\n4. Edit the `volumes` section of a service\n   to include external directories in the container environment.\n   Run `make over` to create a `docker-compose.override.yaml` file\n   to add custom volumes and configurations.\n   The `docker-compose.override.yaml` file is excluded from version control\n   to allow per-user and per-server settings.\n\n5. 
(Advanced) If an external file must be included in the Docker image build process,\n   edit the `.dockerignore` file to allow the Docker context to find the external file.\n   By default, all files except requirements\n   files are excluded from the Docker build context.\n\nExample `.env` file for user with username `USERNAME`,\ngroup name `GROUPNAME`, user id `1000`, group id `1000` on service `train`.\nUse the `simple` service if no dependencies need to be compiled and requirements\ncan either be downloaded or installed via `apt`, `conda`, or `pip`.\n\n```text\n# Generated automatically by `make env`.\n# When using the `root` user with `UID=0`/`USR=root`, set `ADD_USER=exclude`.\nGID=1000\nUID=1000\nGRP=GROUPNAME\nUSR=USERNAME\nHOST_ROOT=.\nSERVICE=train\n# Do not use the same `PROJECT` name for different projects on the same host!\nPROJECT=train-username             # `PROJECT` must be in lowercase.\nPROJECT_ROOT=/opt/project\nIMAGE_NAME=cresset:train-username  # `IMAGE_NAME` is also converted to lowercase.\nCOMMAND=/usr/bin/zsh --login       # Command to execute on starting the container.\nTZ=Asia/Seoul                      # Set the container timezone.\n\n# [[Optional]]: Fill in these configurations manually if the defaults do not suffice.\n\n# NVIDIA GPU Compute Capability (CCC) values may be found at https://developer.nvidia.com/cuda-gpus\nCCC=8.6              # Compute capability. CCC=8.6 for RTX3090.\n# CCC='8.6+PTX'      # The '+PTX' enables forward compatibility. Multiple CCCs can also be specified.\n# CCC='7.5 8.6+PTX'  # Visit https://pytorch.org/docs/stable/cpp_extension.html for details.\n\n# Used only if building PyTorch from source (`BUILD_MODE=include`).\n# The `*_TAG` variables are used only if `BUILD_MODE=include`. 
No effect otherwise.\nBUILD_MODE=exclude               # Whether to build PyTorch from source.\nPYTORCH_VERSION_TAG=v2.0.0       # Any `git` tag can be used (but not just any commit hash).\nTORCHVISION_VERSION_TAG=v0.15.1\n\n# General environment configurations.\nLINUX_DISTRO=ubuntu   # Visit the NVIDIA Docker Hub repo for available base images.\nDISTRO_VERSION=22.04  # https://hub.docker.com/r/nvidia/cuda/tags\nCUDA_VERSION=11.8.0   # Must be compatible with hardware and CUDA driver.\nCUDNN_VERSION=8       # Only major version specifications are available.\nPYTHON_VERSION=3.10   # Specify the Python version.\nMKL_MODE=include      # Enable MKL for Intel CPUs.\n\n# Advanced Usage.\nTARGET_STAGE=train    # Target Dockerfile stage. The `*.whl` files are available in `train-builds`.\nADD_USER=include      # Whether to create a new user (include) or use `root` user (exclude).\n```\n\n## General Usage After Initial Installation and Configuration\n\n1. Run `make build` to build the image from the Dockerfile and start the service.\n   The `make` commands are defined in the\n   `Makefile` and target the `train` service by default.\n   Run `make up` if the image has already been built and\n   rebuilding the image from the Dockerfile is not necessary.\n2. Run `make exec` to enter the interactive container environment.\n   Using `tmux` inside the container is recommended.\n3. There is no step 3. Just start coding.\n   Check out the documentation or create an issue if anything goes wrong.\n\n## Makefile Instructions\n\nThe Makefile contains shortcuts for common docker compose commands.\nPlease read the Makefile to see the exact commands.\n\n1. `make build` builds the Docker image from the Dockerfile\n   regardless of whether the image already exists.\n   This will reinstall packages to the updated requirements files,\n   and then recreate the container.\n2. 
`make up` creates a fresh container from the image,\n   undoing any changes to the container made by the user.\n   Allows changing container settings such as network ports,\n   mounted volumes, shared memory configurations, etc.\n   Recommended method for using this project.\n3. `make exec` enters the interactive terminal of the container\n   created by `make build` or `make up`.\n4. `make down` stops Compose containers and deletes networks.\n   Necessary for service teardown.\n5. `make start` restarts a stopped container without recreating it.\n   Similar to `make up` but does not delete the current container.\n   Not recommended unless data saved in the container are absolutely necessary.\n6. `make ls` shows all Docker Compose services, both active and inactive.\n7. `make run` is used for debugging. Containers are removed on exit.\n   If a service fails to start, use this to find the error.\n8. `make build-only` builds the Docker image from the Dockerfile\n   without starting the service.\n   It exists to help publish images to container registries.\n\n### Tips\n\n- The `PROJECT`, `SERVICE`, and `COMMAND` variables in the Makefile\n  use variables specified in the `.env` file if available.\n- If something does not work, first try `make down` to remove the current container and\n  then `make up` to create a new container from the image.\n  Explicitly tearing the container down is often necessary when something happens to the host.\n- If the service startup stalls during `make up`,\n  check `docker system df` to see if there is space left on the host machine.\n- `make up` is akin to rebooting a computer.\n  The current container is removed and a new container is created from the current image.\n- `make build` is akin to resetting/formatting a computer.\n  The current image, if present, is removed and a new image is built from the Dockerfile,\n  after which a container is created from the resulting image.\n  In contrast, `make up`\n  only creates an image from source if the 
specified image is not present.\n- `make exec` is akin to logging into a computer.\n  It is the most important command\n  and allows the user to access the container's terminal interactively.\n- Configurations such as connected volumes and network ports cannot\n  be changed in a running container, requiring a new container to be created.\n- Docker automatically caches all builds up to `defaultKeepStorage`.\n  Builds use caches from previous builds by default,\n  greatly speeding up later builds by only building modified layers.\n- If the build fails during `git clone`,\n  try `make build` again with a stable internet connection.\n- If the build fails during `pip install`,\n  check the PyPI mirror URLs and package requirements.\n- If any networking issues arise, check `docker network ls` and check for conflicts.\n  Most networking and SSH problems can be solved by running `docker network prune`.\n\n## Project Overview\n\nThe main components of the project are as follows. The other files are utilities.\n\n1. Dockerfile\n2. docker-compose.yaml\n3. docker-compose.override.yaml\n4. reqs/(`*requirements.txt`|`*environment.yaml`)\n5. 
.env\n\nWhen the user inputs `make up` or another `make` command,\ncommands specified in the `Makefile` are executed.\nThe `Makefile` is used to specify shorthand commands and variables.\n\nWhen a command related to Docker Compose (e.g., `make build`) is executed,\nthe `docker-compose.yaml` file and the `.env` file are read by Docker Compose.\nThe `docker-compose.yaml` file specifies reasonable default values\nbut users may wish to change them as per their needs.\nThe values specified in the `.env` file take precedence over\nthe defaults specified in the `docker-compose.yaml` file.\nEnvironment variables specified in the shell\ntake precedence over those in the `.env` file.\nThe `.env` file is deliberately excluded from source control\nto allow different users and machines to use different configurations.\n\nThe `docker-compose.yaml` file manages configurations,\nbuilds, runs, etc. using the `Dockerfile`.\nVisit the Docker Compose [Specification](https://github.com/compose-spec/compose-spec/blob/master/spec.md)\nand [Reference](https://docs.docker.com/compose/compose-file/compose-file-v3/) for details.\n\nThe `docker-compose.override.yaml` file is read by Docker Compose together with\nthe `docker-compose.yaml` file during the setup phase. Add configurations specific to each host that should not be\nshared via source control such as volume mounts for host-specific paths.\n\nThe `Dockerfile` is configured to read only requirements files in the `reqs` directory.\nEdit `reqs/pip-train.requirements.txt` to specify Python package requirements.\nEdit `reqs/apt-train.requirements.txt` to specify Ubuntu package requirements.\nUsers must edit the `.dockerignore` file to `COPY` other files into the Docker build,\nfor example, when building from private code during the Docker build.\n\nThe `Dockerfile` uses Docker BuildKit and a multi-stage build where\ncontrol flow is specified via stage names and build-time environment variables\ngiven via `docker-compose.yaml`. 
See the Docker BuildKit\n[Syntax](https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/syntax.md)\nfor more information on Docker BuildKit.\nThe `train` service specified in the `docker-compose.yaml` file uses\nthe `train` stage specified in the `Dockerfile`, which assumes an Ubuntu image.\n\n## _Raison d'Être_\n\nThe purpose of this section is to introduce a new paradigm for deep learning development.\nThe hope is that Cresset, or at least the ideas behind it, will eventually become\nbest practice for small to medium-scale deep learning research and development.\n\nDeveloping in local environments with `conda` or `pip`\nis commonplace in the deep learning community.\nHowever, this risks rendering the development environment,\nand the code meant to run on it, unreproducible.\nThis state of affairs is a serious detriment to scientific progress\nthat many readers of this article will have experienced at first-hand.\n\nDocker containers are the standard method for providing reproducible programs\nacross different computing environments.\nThey create isolated environments where programs\ncan run without interference from the host or from one another.\nFor details, see the\n[documentation](https://www.docker.com/resources/what-container).\n\nBut in practice, Docker containers are often misused.\nContainers are meant to be transient and best practice dictates\nthat a new container be created for each run.\nHowever, this is very inconvenient for development,\nespecially for deep learning applications,\nwhere new libraries must constantly be installed and\nbugs are often only evident at runtime.\nThis leads many researchers to develop inside interactive containers.\nDocker users often have `run.sh` files with commands such as\n`docker run -v my_data:/mnt/data -p 8080:22 -t my_container my_image:latest /bin/bash`\n(look familiar, anyone?) 
and use SSH to connect to running containers.\nVSCode even provides a remote development mode to code inside containers.\n\nThe problem with this approach is that these interactive containers\nbecome just as unreproducible as local development environments.\nA running container cannot connect to a new port or attach a new\n[volume](https://docs.docker.com/storage/volumes).\nBut if the computing environment within the container was created over\nseveral months of installs and builds, the only way to keep it is to\nsave the container as an image and create a new container from the saved image.\nAfter a few iterations of this process, the resulting images become bloated and\nno less scrambled than the local environments that they were meant to replace.\n\nProblems become even more evident when preparing for deployment.\nMLOps, defined as a set of practices that aims to deploy and maintain\nmachine learning models reliably and efficiently, has gained enormous popularity\nof late as many practitioners have come to realize the importance of\ncontinuously maintaining ML systems long after the initial development phase ends.\n\nHowever, bad practices such as those mentioned above mean that much coffee has\nbeen spilled turning research code into anything resembling a production-ready product.\nOften, even the original developers cannot recreate the same model after a few months.\nMany firms thus have entire teams dedicated to model translation, a huge expenditure.\n\nTo alleviate these problems, Docker Compose is proposed as a simple MLOps solution.\nUsing Docker and Docker Compose, the entire training environment can be reproduced.\nCompose has not yet caught on in the deep learning community,\npossibly because it is usually advertised as a multi-container solution.\nThis is a misunderstanding\nas it can be used for single-container development just as well.\n\nA `docker-compose.yaml` file is provided for easy management of containers.\n**Using the provided 
`docker-compose.yaml` file will create an interactive environment,\nproviding a programming experience very similar to using a terminal on a remote server.\nIntegrations with popular IDEs (PyCharm, VSCode) are also available.**\n\nMoreover, it allows the user to specify settings for both build and run,\nremoving the need to manage the environment with custom shell scripts.\nConnecting a new volume or port is as simple as removing the current container,\nadding a line in the `docker-compose.yaml` file, then running `make up`\nto create a new container from the same image.\n\nBuild caches allow new images to be built very quickly,\nremoving another barrier to Docker adoption, the long initial build time.\nFor more information on Compose, visit the\n[documentation](https://docs.docker.com/compose).\n\nDocker [Compose](https://www.compose-spec.io) can also be used for deployment,\nwhich is useful for small to medium-sized deployments.\nIf and when large-scale deployments using container orchestration such as\nKubernetes become necessary, using reproducible Docker environments from\nthe very beginning will accelerate the development process\nand smooth the path to MLOps adoption.\nAccelerating time-to-market by streamlining the development process\nis a competitive edge for any firm, whether lean startup or tech titan.\n\nWith luck, the techniques proposed here will enable\nthe deep learning community to \"_write once, train anywhere_\".\nBut even if most users are not persuaded of the merits of this method,\nmany a hapless grad student may be spared from the\nSisyphean labor of setting up their `conda` environment,\nonly to have it crash and burn right before their paper submission is due.\n\n## Compose as Best Practice\n\nDocker Compose is superior to using custom shell scripts for each environment.\nNot only does it gather all variables and commands\nfor both build and run into a single file,\nbut its native integration with Docker means that it makes 
complicated\nDocker build/run setups simple to implement and use.\n\nUsing Docker Compose this way is a general-purpose technique\nthat does not depend on anything about this project.\nThe other services available in the project emphasize this point.\n\n### Using Compose with PyCharm and VSCode\n\nThe Docker Compose container environment can be used with popular Python IDEs,\nnot just in the terminal.\nPyCharm and Visual Studio Code, both very popular in the deep learning community,\nare compatible with Docker Compose.\n\n#### PyCharm (Professional only)\n\nBoth Docker and Docker Compose are natively available as Python interpreters.\nSee tutorials for [Docker](https://www.jetbrains.com/help/pycharm/docker.html) and\n[Compose](https://www.jetbrains.com/help/pycharm/using-docker-compose-as-a-remote-interpreter.html#summary)\nfor details. JetBrains [Gateway](https://www.jetbrains.com/remote-development/gateway)\ncan also be used to connect to running containers.\n\nWhen using the `ngc` service, add `/usr/local/lib/python3/dist-packages` and\n`/opt/conda/lib/python3/site-packages` to the interpreter search paths via\nthe GUI to enable code assistance on the packages installed with `conda`.\n\n_N.B._ PyCharm Professional and other JetBrains IDEs are available\nfree of charge to anyone with a valid university e-mail address.\n\n#### VSCode\n\nInstall the Remote Development extension pack. 
See\n[tutorial](https://code.visualstudio.com/docs/remote/containers-tutorial)\nfor details.\n\n##### VSCode Tips\n\nVSCode may fail to start up when accessing remote containers created by\nCresset because of the `${HOME}/.vscode-server` volume mounted in the\n`docker-compose.yaml` file, which is used to preserve the `.vscode-server`\ndirectory between separate containers.\n\nThe reason for VSCode connection failure is that if any host directory\nspecified as a volume does not exist, Docker will automatically create\nthe specified host directory with the directory owner set to `root`.\nDirectories that already exist retain their directory ownership.\nWhen the `.vscode-server` directory is created by Docker this way,\nVSCode is unable to install any files in the `.vscode-server` directory.\n\nThis has been fixed in the Makefile but problems related to\nthe `.vscode-server` directory occur frequently.\nTo solve this problem, simply change the directory ownership to the\nuser with `sudo chown -R $(id -u):$(id -g) ${HOME}/.vscode-server`.\nThis command can be run either on the host or inside the container,\nwhich is useful if `sudo` permissions are unavailable on the host.\n\nAlso, when one user switches between multiple Cresset-based containers\non a single machine, VSCode may not be able to find the container workspace.\nThis is because the `docker-compose.yaml` file mounts the host's\n`~/.vscode-server` directory to the `/home/${USR}/.vscode-server` directory\nof all containers to preserve VSCode extensions between containers.\nTo fix this issue, create a new directory on the host\nto mount the containers' `.vscode-server` directories.\nFor example, one can set volume pairs as\n`${HOME}/.vscode-project1:/home/${USR}/.vscode-server` for project1 and\n`${HOME}/.vscode-project2:/home/${USR}/.vscode-server` for project2.\nDo not forget to create `${HOME}/.vscode-project1` and\n`${HOME}/.vscode-project2` on the host first.\nOtherwise, the directory will be owned by 
`root`,\nwhich will cause VSCode to stall indefinitely due to permission issues.\n\nFor other VSCode problems, try deleting `~/.vscode-server` on the host.\n\n# Known Issues\n\n1. Connecting to a running container by `ssh` will remove all variables\n   set by `ENV`. This is because `sshd` starts a new environment,\n   deleting all previous variables. Using `docker`/`docker compose`\n   to enter containers is strongly recommended.\n\n2. `pip install package[option]` will fail on the terminal because of\n   Z-shell globbing. Characters such as `[`,`]`,`*`, etc. will be\n   interpreted by Z-shell as special commands. Use string literals,\n   e.g., `pip install 'package[option]'`, for cross-shell consistency.\n\n3. If the build fails during `git clone`, simply try `make build` again.\n   Most of the build will be cached. Failure is probably due to\n   networking issues during installation. Updating git submodules is\n   [not fail-safe](https://stackoverflow.com/a/8573310/9289275).\n\n4. `torch.cuda.is_available()` will return a\n   `... UserWarning: CUDA initialization:...`\n   error or the image will simply not start if the host CUDA driver is\n   incompatible with the CUDA version on the Docker image.\n   Either upgrade the host CUDA driver or downgrade the CUDA version of the image.\n   Check the\n   [compatibility matrix](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cuda-major-component-versions__table-cuda-toolkit-driver-versions)\n   to see if the host CUDA driver is compatible with the desired version of CUDA.\n   Also, check if the CUDA driver has been configured correctly on the host.\n   The CUDA driver version can be found using the `nvidia-smi` command.\n\n5. Docker Compose V2 will silently fail if the installed Docker engine\n   version is too low on Linux hosts. Update Docker to the latest\n   version (23.0+) to use Docker Compose V2.\n\n6. 
If the user is set to `root` in the `.env` file, i.e., `UID=0, USR=root`,\n   then set `ADD_USER=exclude` to prevent the creation of a new user, which is\n   expected to be non-root.\n\n# Desiderata\n\n1. **MORE STARS**. _**No Contribution Without Appreciation!**_\n\n2. Bug reports are welcome. Only the latest versions have been tested rigorously.\n   Please raise an issue if there are any versions that do not build properly.\n   However, please check that your host Docker, Docker Compose,\n   and especially NVIDIA Driver are up-to-date before doing so.\n\n3. Translations into other languages and updates to existing translations are welcome.\n   Please create a separate `LANG.README.md` file and make a pull request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcresset-template%2Fcresset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcresset-template%2Fcresset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcresset-template%2Fcresset/lists"}