{"id":13584043,"url":"https://github.com/docker-science/cookiecutter-docker-science","last_synced_at":"2025-04-06T22:31:48.562Z","repository":{"id":50630756,"uuid":"114621971","full_name":"docker-science/cookiecutter-docker-science","owner":"docker-science","description":"Cookiecutter template for data scientists working with Docker containers","archived":false,"fork":false,"pushed_at":"2021-10-14T10:12:05.000Z","size":126,"stargazers_count":348,"open_issues_count":14,"forks_count":81,"subscribers_count":10,"default_branch":"master","last_synced_at":"2024-11-06T00:39:49.442Z","etag":null,"topics":["cookiecutter-template","jupyter-notebook","machine-learning"],"latest_commit_sha":null,"homepage":"https://docker-science.github.io/","language":"Makefile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/docker-science.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE-HEADER.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-12-18T09:32:59.000Z","updated_at":"2024-11-05T08:07:41.000Z","dependencies_parsed_at":"2022-09-13T21:11:55.351Z","dependency_job_id":null,"html_url":"https://github.com/docker-science/cookiecutter-docker-science","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docker-science%2Fcookiecutter-docker-science","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docker-science%2Fcookiecutter-docker-science/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/docker-science%2Fcookiecutter-docker-science/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/docker-science%2Fcookiecutter-docker-science/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/docker-science","download_url":"https://codeload.github.com/docker-science/cookiecutter-docker-science/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247563898,"owners_count":20958971,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cookiecutter-template","jupyter-notebook","machine-learning"],"created_at":"2024-08-01T15:03:58.489Z","updated_at":"2025-04-06T22:31:48.282Z","avatar_url":"https://github.com/docker-science.png","language":"Makefile","funding_links":[],"categories":["Makefile"],"sub_categories":[],"readme":".. |travis| image:: https://travis-ci.org/docker-science/cookiecutter-docker-science.svg?branch=master\n    :target: https://travis-ci.org/docker-science/cookiecutter-docker-science\n\n|travis|\n\nTable of Contents\n------------------\n\n.. 
contents:: This article consists of the following sections.\n    :depth: 1\n\nFeatures\n--------\n\n`Cookiecutter Docker Science \u003chttps://docker-science.github.io/\u003e`_ provides the following features.\n\n* **Improve reproducibility** of the results in machine learning projects with **Docker**\n* Generate a directory layout and file templates suited to machine learning projects\n* `Edit code with your favorite editor (Atom, Vim, Emacs, etc.) \u003chttps://docker-science.github.io/#edit-codes-with-preferred-editors\u003e`_\n* Provide `make` targets useful for data analysis (Jupyter Notebook, test, lint, Docker, etc.)\n\nIntroduction\n------------\n\n**NOTE**: Please visit the `home page \u003chttps://docker-science.github.io/\u003e`_ before you get started.\n\nMany researchers and engineers run machine learning or data mining experiments.\nThese tasks rely on various tools and system libraries that are constantly\nupdated, and installing and updating them causes problems in local environments. Even when we work in hosted\nenvironments such as EC2, we are not free from this problem. Some experiments succeed on one\ninstance but fail on another, since library versions can differ between EC2 instances.\n\nBy contrast, with a single command we can create an identical Docker container in which the needed tools are already installed with the correct versions, without\nchanging system libraries on the host machine. This aspect of Docker is important for the reproducibility of experiments\nand for running projects in continuous integration systems.\n\nUnfortunately, running experiments in a Docker container is troublesome. Adding a new library to ``requirements.txt``\nor ``Dockerfile`` does not install it the way it would on a local machine. 
We need to rebuild the Docker image and recreate the container each time.\nWe also need to forward ports to reach servers running in the container, such as the Jupyter Notebook UI, from our local PC.\nCookiecutter Docker Science provides utilities that make working in a Docker container simple.\n\nThis project is a tiny template for machine learning projects developed in Docker environments.\nMachine learning projects grow in their own ways to fit their target tasks, but in the initial state\nmost of the directory structure and the targets in `Makefile` are common.\nCookiecutter Docker Science generates an initial directory structure that fits simple machine learning tasks.\n\nRequirements\n------------\n\n* Python 3.5 or later\n* `Cookiecutter 1.6 or later \u003chttps://cookiecutter.readthedocs.io/en/latest/installation.html\u003e`_\n* `Docker version 17 or later \u003chttps://docs.docker.com/install/#support\u003e`_\n\nQuick start\n-----------\n\nTo generate a project from the cookiecutter-docker-science template, run the following command.\n\n``$ cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git``\n\nThe cookiecutter command then asks several questions about the project to generate, as follows.\n\n::\n\n    $ cookiecutter git@github.com:docker-science/cookiecutter-docker-science.git\n    project_name [project_name]: food-image-classification\n    project_slug [food_image_classification]:\n    jupyter_host_port [8888]:\n    description [Please Input a short description]: Classify food images into several categories\n    Select data_source_type:\n    1 - s3\n    2 - nfs\n    3 - url\n    data_source [Please Input data source]: s3://research-data/food-images\n\nThis produces the generated project directory, ``food-image-classification``.\n\nInitial directories and files\n-----------------------------\n\nThe following is the initial directory structure generated in the previous section.\n\n::\n\n    ├── Makefile                          \u003c- Makefile contains many targets such as create 
docker container or\n    │                                        get input files.\n    ├── config                            \u003c- This directory contains configuration files used in scripts\n    │   │                                    or Jupyter Notebook.\n    │   └── jupyter_config.py\n    ├── data                              \u003c- The data directory contains the input resources.\n    ├── docker                            \u003c- The docker directory contains the Dockerfiles.\n    │   ├── Dockerfile                    \u003c- The base Dockerfile contains the basic settings.\n    │   ├── Dockerfile.dev                \u003c- Dockerfile for experiments. This Docker image is derived from the base Docker image.\n    │   │                                    It does not copy files and directories; instead it mounts the project's top\n    │   │                                    directory from the host environment.\n    │   └── Dockerfile.release            \u003c- Dockerfile for production. This Docker image is derived from the base Docker image.\n    │                                        It copies the files and directories under the project top directory.\n    ├── model                             \u003c- The model directory stores the model files created in the experiments.\n    ├── my_data_science_project           \u003c- cookiecutter-docker-science creates a directory with the same name\n    │   │                                    as the project. Users put the Python files used in scripts\n    │   │                                    or Jupyter Notebook in this directory.\n    │   └── __init__.py\n    ├── notebook                          \u003c- This directory stores the ipynb files saved in Jupyter Notebook.\n    ├── requirements.txt                  \u003c- Libraries needed in the project. 
The libraries listed in this file\n    │                                        are installed in the Docker images for both development and production.\n    ├── requirements_dev.txt              \u003c- Libraries needed to run experiments. The libraries listed in this file\n    │                                        are installed in the Docker image for development.\n    └── scripts                           \u003c- Users add the script files to generate model files or run evaluation.\n\n\nMakefile targets\n----------------\n\nCookiecutter Docker Science provides many Makefile targets to support experiments in a Docker container. Users can run a target with the `make [TARGET]` command.\n\ninit\n~~~~~\n\nAfter cookiecutter-docker-science generates the directories and files, users run this target first. `init` sets up the resources for experiments.\nSpecifically, `init` runs the `init-docker` and `sync-from-source` targets.\n\n- init-docker\n\n  `init-docker` creates the Docker images based on `docker/Dockerfile`.\n\n- sync-from-source\n\n  `sync-from-source` downloads the input files specified during project generation.  
If you want to change the input files, modify this target to download from the new data source.\n\ncreate-container\n~~~~~~~~~~~~~~~~~\n\nThe `create-container` target creates a Docker container based on the created image and logs in to the container.\n\nstart-container\n~~~~~~~~~~~~~~~~\n\nUsers can start and log in to the Docker container created by `create-container` with `start-container`.\n\njupyter\n~~~~~~~\n\nThe `jupyter` target launches the Jupyter Notebook server.\n\nprofile\n~~~~~~~\n\nThe `profile` target shows miscellaneous information about the project, such as the port number or container name.\n\n\nclean\n~~~~~\n\nThe `clean` target removes artifacts such as models and ``*.pyc`` files.\n\n- clean-model\n\n  `clean-model` removes the model files in the `model` directory.\n\n- clean-pyc\n\n  `clean-pyc` removes ``*.pyc`` and ``*.pyo`` files and ``__pycache__`` directories.\n\n- clean-docker\n\n  `clean-docker` removes the Docker images and container generated with `make init-docker` and `make create-container`.\n  When we update Python libraries in `requirements.txt` or system tools in `Dockerfile`, we need to remove the Docker image and container with this target and then create the updated image and container with `make init-docker` and `make create-container`.\n\ndistclean\n~~~~~~~~~\n\nThe `distclean` target removes all reproducible objects. 
Specifically, this target runs the `clean` target and removes all files in the `data` directory.\n\n- clean-data\n\n  `clean-data` removes all datasets in the `data` directory.\n\nlint\n~~~~~\n\nThe `lint` target checks whether the code meets the coding standard.\n\ntest\n~~~~~\n\nThe `test` target executes the tests.\n\n\nsync-to-source\n~~~~~~~~~~~~~~\n\nThe `sync-to-source` target uploads the local files stored in `data` to the specified data source, such as an S3 bucket or an NFS directory.\n\nWorking with Docker container\n------------------------------\n\nWith Cookiecutter Docker Science, data scientists and software engineers do their development in the host environment.\nThey open Jupyter Notebook in a browser on the host machine, connecting to the Jupyter server launched in the Docker container.\nThey also write ML scripts and library classes on the host machine. Code modifications in the host environment are\nreflected in the container environment. In the container, they just launch the Jupyter server or start ML scripts\nwith the make command.\n\nFiles and directories\n~~~~~~~~~~~~~~~~~~~~~\n\nWhen you log in to a Docker container with the ``make create-container`` or ``make start-container`` command, the login directory is ``/work``.\nThis directory contains the top-level directories of the project on the host computer, such as ``data`` or ``model``. The Docker container mounts\nthe project directory at ``/work``, so you can edit the files in the host environment with your favorite editor,\nsuch as Vim, Emacs, Atom or PyCharm. Changes in the host environment are reflected in the container environment.\n\nJupyter Notebook\n~~~~~~~~~~~~~~~~~\n\nWe can run Jupyter Notebook in the Docker container. Jupyter Notebook uses the default port ``8888`` in the **Docker container (NOT the host machine)**, and\nthis port is forwarded to the host port you specify with ``JUPYTER_HOST_PORT`` in the cookiecutter command. You can see the Jupyter Notebook UI by accessing\n\"http://localhost:JUPYTER_HOST_PORT\". 
When you save notebooks, the files are stored in the ``notebook`` directory.\n\nTips\n-----\n\nGenerate Docker Image for production\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe `make init-docker` command creates a Docker image based on `docker/Dockerfile.dev`, which contains\nlibraries for development. These libraries are not needed in production.\n\nTo create a Docker image for production that does not contain development\nlibraries such as Jupyter, we run `make init-docker` with the variable `MODE` set to `release`: `make init-docker MODE=release`.\n\nOverride port number for Jupyter Notebook\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nWhen a project is generated with cookiecutter, the default host port for Jupyter Notebook is ``8888``. This number is common and could\ncollide with other server processes.\n\nIf we already have a container, we first need to remove it with ``make clean-container``. Then\nwe create a new Docker container with a different port number by running ``make create-container`` with the Jupyter port parameter (``JUPYTER_HOST_PORT``).\nFor example, the following command creates a Docker container that forwards the Jupyter default port ``8888`` to port ``9900`` on the host.\n\n::\n\n    make create-container JUPYTER_HOST_PORT=9900\n\nAfter you launch Jupyter Notebook in the Docker container, you can open it at http://localhost:9900\n\nSpecify suitable Dockerfile in stages\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nSome projects have multiple Dockerfiles. For example, ``Dockerfile.gpu`` contains the settings for GPU machines, and ``Dockerfile.cpu`` contains settings that can be used in production on non-GPU machines.\n\nTo use one of these specific Dockerfiles, override the settings by adding parameters to the make command. 
For example, to create a container from ``docker/Dockerfile.cpu``, we run ``make create-container DOCKERFILE=docker/Dockerfile.cpu``.\n\n\nShow target specific help\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe `help` target shows the details of the specified target. For example, the following command shows the details of the `clean` target.\n\n::\n\n    $ make help TARGET=clean\n    target: clean\n    dependencies: clean-model clean-pyc clean-docker\n    description: remove all artifacts\n\nAs we can see, the dependencies and description of the specified target (`clean`) are shown.\n\nLicense\n-------\n\nApache License, Version 2.0\n\nContribution\n-------------\n\nSee `CONTRIBUTING.md \u003cCONTRIBUTING.md\u003e`_.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocker-science%2Fcookiecutter-docker-science","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdocker-science%2Fcookiecutter-docker-science","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdocker-science%2Fcookiecutter-docker-science/lists"}