{"id":13934849,"url":"https://github.com/google-aai/sc17","last_synced_at":"2025-07-19T19:32:29.273Z","repository":{"id":67612064,"uuid":"118012709","full_name":"google-aai/sc17","owner":"google-aai","description":"SuperComputing 2017 Deep Learning Tutorial","archived":true,"fork":false,"pushed_at":"2018-04-04T19:53:47.000Z","size":142,"stargazers_count":213,"open_issues_count":0,"forks_count":44,"subscribers_count":22,"default_branch":"master","last_synced_at":"2024-08-08T23:19:05.479Z","etag":null,"topics":["data-science","deep-learning","google-cloud-platform","machine-learning","tutorial"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-aai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-01-18T17:06:52.000Z","updated_at":"2024-06-26T03:54:44.000Z","dependencies_parsed_at":null,"dependency_job_id":"88d1e52b-bffe-46b4-a962-1ac3aebca2df","html_url":"https://github.com/google-aai/sc17","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-aai%2Fsc17","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-aai%2Fsc17/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-aai%2Fsc17/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-aai%2Fsc17/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-aai","download_url":"https://codeload.github.com/google-aai/s
c17/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":226666437,"owners_count":17665030,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","deep-learning","google-cloud-platform","machine-learning","tutorial"],"created_at":"2024-08-07T23:01:16.490Z","updated_at":"2025-07-19T19:32:29.265Z","avatar_url":"https://github.com/google-aai.png","language":"Jupyter Notebook","readme":"# Step-by-Step Deep Learning Tutorial\n\nAuthors: Cassie Kozyrkov (@kozyrkov) and Brian Foo (@bkungfoo)\n\nTeam: Google Applied AI\n\n## Follow along with us!\n\nTake a look at these **walkthrough slides** with screenshots to guide you along:\n\n[github.com/kozyrkov/deep-learning-walkthrough](https://github.com/kozyrkov/deep-learning-walkthrough)\n\nBonus: slides contain ML hints, summaries, and pitfall alerts.\n\n## Create and Setup Cloud Project\n\n**This tutorial is meant to run fully on the Google Cloud Platform.**\n\nStarting with your web browser, do the following:\n\n* Open a browser and sign up for a [google account](https://accounts.google.com).\n* Sign into your account.\n* [Create a new GCP project](https://cloud.google.com/resource-manager/docs/creating-managing-projects)\nand [enable billing](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project).\n\n### Editing Resource Quota\n\n**Why edit resource quota?**  To complete this tutorial, you will need more computing resources than your\nGoogle Cloud Platform account has access to by default.  
One reason accounts start out with limits on\nresources is that this protects users from being billed unexpectedly for the more expensive options.\nFor more information, see the [quotas documentation page](https://cloud.google.com/compute/quotas).\n\nFor this project, we will require several types of resources:\n\n* **Compute Engine VM for Data Science:** We will create a VM that you will log\ninto to do most of your work and run notebooks. You will have the option of\ncreating a VM with or without a GPU. A GPU is strongly recommended for\ndeep network training because it dramatically cuts down the time to completion.\nHowever, the cost is also higher. Please refer to the\n[resource guide](RESOURCE_GUIDE.md) for a brief discussion and comparison of\nperformance and cost.\n\n* **Cloud resources:** We will be using Dataflow to run distributed\npreprocessing jobs, so we need to extend quotas on Cloud resources such as\nCPUs, IP addresses, and total disk space.\n\nWe will be setting quotas for these two types of resources. Note that a quota\nonly determines the maximum amount of a resource that your project is allowed to\nrequest! Your jobs will not necessarily use this amount; you are simply\npermitted to use up to it. The general\nguideline is to set quotas high enough that there is no need to readjust them\nfor compute-intensive tasks.\n\n\u003cspan style=\"color:darkgreen\"\u003e**Quota Setup Instructions:**\u003c/span\u003e\n\nTo set up the data science VM, we will need to extend the quota for GPUs.\n* Select your project from the list on the\n[resource quota page](https://console.cloud.google.com/iam-admin/quotas).\n  * (If this is the first time creating the project, Compute Engine may still\n  need to boot up. 
If the quota page does not have GPU options, click\n  \"Compute Engine\" in the dropdown menu at the top left, and click quota there.\n  Wait for it to load, and return to the quota page above.)\n* If you would like to try out a GPU machine (recommended),\n[find a region that has GPU support](https://cloud.google.com/compute/docs/gpus/).\n At the time this tutorial was written, valid regions include\n us-east1, us-west1, europe-west1, and asia-east1.\n* Select your chosen region from the Region dropdown menu.\n Then select the following:\n  * NVIDIA K80 GPUs\n  * NVIDIA P100 GPUs\n* Click \"+ edit quotas\" at the top of the page. Change the selected fields to\nthe following values:\n  * NVIDIA K80 GPUs: 1\n  * NVIDIA P100 GPUs: 1\n* Follow the process to submit a request.\n\nTo set up cloud resources for preprocessing jobs, follow a similar process\nto edit quotas:\n* Find a region with\n [Dataflow support](https://cloud.google.com/dataflow/docs/concepts/regional-endpoints#supported_regional_endpoints).\n At the time this tutorial was written, valid regions include us-central1 and\n europe-west1.\n* Select this region in the dropdown menu on the\n[resource quota page](https://console.cloud.google.com/iam-admin/quotas).\n* Change the following quotas:\n  * CPUs: 400\n  * In-use IP addresses: 100\n  * Persistent Disk Standard (GB): 65536\n* Select region \"Global\" in the dropdown menu:\n  * Change the following quotas:\n    * CPUs (all regions): 400\n\nAfter you have completed these steps, you will need to wait until you receive\nan email approving the quota increases. Please note that you may be asked to\nprovide credit card details to confirm these increases.\n\n## Setup Cloud Project Components and API\n\n\u003cspan style=\"color:red\"\u003e*Expected setup time: 5 minutes*\u003c/span\u003e\n\nClick on the \"\u003e_\" icon at the top right of your web console to open a cloud\nshell. 
Inside the cloud shell, execute the following commands:\n\n```\ngit clone https://github.com/google-aai/sc17.git\ncd sc17\n```\n\nIf you happen to have the project files locally, you can also upload them by\nclicking the 3 vertically arranged dots at the top right of the shell window\nand then clicking \"upload file\".\n\nAfter you have the proper scripts uploaded, set permissions on the following\nscript:\n\n```\nchmod 777 setup_step_1_cloud_project.sh\n```\n\nThen run the script to create storage, dataset,\nand compute VMs for your project. (Note: invoking it with the \"sh\" command\nwill fail, since the script uses syntax that \"sh\" does not support in the\ncloud shell environment.)\n```\n./setup_step_1_cloud_project.sh project-name [gpu-type] [compute-region] [dataflow-region]\n```\n\nwhere:\n* project-name is the ID of the project you created (check the Cloud Dashboard for the ID extension if needed)\n* [gpu-type] (optional) is either None, K80, or P100\n(default: None)\n* [compute-region] (optional) is the region where you will create your data science VM\n(default: us-east1)\n* [dataflow-region] (optional) is the region where you will run Dataflow preprocessing jobs\n(default: us-central1)\n\nIf this is your first time setting up the project, the script will prompt you\nat several points, for example to select the number corresponding to\nyour project. 
Enter what is needed to allow the script to continue running.\n\nIf the script stops with an error message \"ERROR [timestamp]: message\" (e.g.\nquota limits are too low), use the relevant parts of the error message to fix\nyour project settings as needed, and rerun the script.\n\n## Setting up your VM environment\n\n### Library installations\n\n\u003cspan style=\"color:red\"\u003e*Expected setup time: 15 minutes*\u003c/span\u003e\n\nFrom the\n[VM instances page](https://console.cloud.google.com/compute/instances),\nclick the \"SSH\" text under \"Connect\" to connect to your compute VM instance.\nYou may have to click twice if your browser auto-blocks pop-ups.\n\nIn the new window, run git clone to download the project onto your VM, and cd\ninto it:\n\n```\ngit clone https://github.com/google-aai/sc17.git\ncd sc17\n```\n\nIf you happen to have the project files locally, you can also upload them by\nopening your storage bucket from the GCP Console menu and dragging your local\nfiles over.  Then in your VM window, download them from your storage bucket by\nrunning:\n\n```\ngsutil cp gs://[bucket-name]/* .\n```\n\nNote that tab-complete will work after ```gs://``` if you don't know your bucket name.\n\nAfter you have the script files downloaded to your VM, run the following script:\n\n```\nsh setup_step_2_install_software.sh\n```\n\nThe script sets up OpenCV dependencies, Python, virtualenv, and\nJupyter. It will also automatically detect the presence of\nan NVIDIA GPU and install/link CUDA libraries and GPU-enabled TensorFlow if\nnecessary. The script will also prompt you to **provide a password** at some\npoint. This password is for connecting to Jupyter from your web browser.  Please\ntake note of it, since you'll be prompted to enter it when you start working in Jupyter.\n\nTo complete and test the setup, reload bashrc to load the newly created virtual\nenvironment:\n\n```\n. 
~/.bashrc\n```\n\n## Use Unix Screen to Start a Notebook\n\nScreen takes a little getting used to, but it will make working on cloud VMs much\nmore pleasant, especially with a project that needs to run many tasks!\n\nFor those not familiar with the Unix screen command, Screen is a\n\"terminal multiplexer\", which allows you to run multiple terminal (shell)\ninstances at the same time in your ssh session. Screen sessions are\nNOT tied to your ssh session, which means that if you accidentally log out or\ndisconnect from your ssh session in the middle of a long-running process\non your VM, you will not lose your work!\n\nFurthermore, you might want multiple processes running simultaneously and an\neasy way to switch back and forth. A simple example: you want to leave\nyour Jupyter notebook open while running a Cloud Dataflow job (which you do not\nwant abruptly canceled!). Running these in separate terminals is ideal.\n\nTo start screen for the first time, run:\n\n```\nscreen\n```\n\nand press return. This opens a Screen terminal (which defaults to terminal 0).\n\nLet's create one more Screen terminal (terminal 1) by pressing `Ctrl-a` and then\n`c` (we will write this shorthand as `Ctrl-a c`).\n\nYou can now jump between the two terminals by using `Ctrl-a n`, or access them\ndirectly using `Ctrl-a 0` or `Ctrl-a 1`.\n\nGo to terminal 0 by typing `Ctrl-a 0`, and then type:\n\n```\njupyter notebook\n```\n\nto start Jupyter.\n\nFinally, detach from both Screen terminals by typing `Ctrl-a d`. If you want to\nresume the Screen terminals, simply type:\n\n```\nscreen -R\n```\n\nFantastic! Now let's do another cool trick: make sure you are detached from\nyour Screen terminals (type `Ctrl-a d` if necessary), and then exit the machine\nby typing:\n\n```\nexit\n```\n\nat the command line. 
You just exited the machine, but the Screen terminals are\nstill running, including the Jupyter server you started in Screen!\n\n## Connecting to Jupyter\n\nJupyter is now running inside a Screen terminal even though your ssh session has\nended. Let's try it out through an ssh tunnel. (For security reasons, we will\nnot simply open up a firewall port and show your notebook to the entire world!)\n\nOn your local computer, make sure you have the [gcloud sdk installed](https://cloud.google.com/sdk/downloads).\nThen run:\n\n```\ngcloud init\n```\n\nFollow the instructions to choose your project, and then choose the region\ncorresponding to where your VM was created. After this has been set up, run:\n\n```\ngcloud compute config-ssh\n```\n\nAfter this runs successfully, you will get this back in your shell:\n\n```\nYou should now be able to use ssh/scp with your instances.\nFor example, try running:\n\n  $ ssh [instance-name].[zone-name].[project-name]\n```\n\nRun the suggested command to check that ssh works when connecting to your cloud\nVM. Then exit the ssh shell by typing `exit`.\n\nNow we are ready to connect to Jupyter! Run the same ssh command again, but this\ntime add some flags and ports:\n\n```\nssh -N -f -L localhost:8888:localhost:5000 [instance-name].[zone-name].[project-name]\n```\n\nThis command configures port forwarding, making port 5000 on your\ncloud VM available on your own computer's port 8888. Now go to your web browser and type:\n\n```\nlocalhost:8888\n```\n\nIf you see a password page for Jupyter, enter your password as prompted.\nOnce you are in, you will see the notebook view of the directory you\nstarted Jupyter in.\n\nBefore proceeding, please read the [resource guide](RESOURCE_GUIDE.md) to be\naware of common pitfalls (such as forgetting to stop your VM when not using it!)\nand other ways to save on cost.\n\n**Hooray! 
[Let's go detect some cats!](cats/README.md)**\n","funding_links":[],"categories":["Jupyter Notebook"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-aai%2Fsc17","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-aai%2Fsc17","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-aai%2Fsc17/lists"}