{"id":15166951,"url":"https://github.com/datarobot/datarobot-sagemaker-examples","last_synced_at":"2025-09-30T23:30:32.276Z","repository":{"id":104905767,"uuid":"130381860","full_name":"datarobot/datarobot-sagemaker-examples","owner":"datarobot","description":"This repository contains some sample notebooks illustrating the use of DataRobot and SageMaker","archived":true,"fork":false,"pushed_at":"2018-04-20T15:41:47.000Z","size":624,"stargazers_count":9,"open_issues_count":0,"forks_count":9,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-01-20T12:39:18.169Z","etag":null,"topics":["amazon","aws","ipython-notebook","sagemaker","tech-alliances"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/datarobot.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-04-20T15:38:21.000Z","updated_at":"2023-01-28T18:33:06.000Z","dependencies_parsed_at":"2023-04-30T20:32:09.427Z","dependency_job_id":null,"html_url":"https://github.com/datarobot/datarobot-sagemaker-examples","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/datarobot/datarobot-sagemaker-examples","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fdatarobot-sagemaker-examples","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fdatarobot-sagemaker-examples/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fdatarobot-sagemake
r-examples/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fdatarobot-sagemaker-examples/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/datarobot","download_url":"https://codeload.github.com/datarobot/datarobot-sagemaker-examples/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/datarobot%2Fdatarobot-sagemaker-examples/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":277771905,"owners_count":25874466,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-30T02:00:09.208Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["amazon","aws","ipython-notebook","sagemaker","tech-alliances"],"created_at":"2024-09-27T05:04:21.417Z","updated_at":"2025-09-30T23:30:31.955Z","avatar_url":"https://github.com/datarobot.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# What is DataRobot\n\nThe DataRobot automated machine learning platform helps data scientists and\nbusiness analysts discover the best predictive models for every situation,\nand then deploy them so they can consistently make smarter and faster\nbusiness decisions that impact their company's bottom line.\n\n## Why use DataRobot with SageMaker\n\nDataRobot brings the power of auto-modeling to SageMaker 
users, allowing them\nto quickly determine and use the best machine learning model for their\nproblem. Within minutes, DataRobot can iterate on thousands of combinations of\nmodels, data preparation steps, and parameters, work that would take days or weeks\nto do manually.\n\n## Before you start: Pre-requisites\n\nTo experience the power of DataRobot+SageMaker you’ll need a DataRobot\naccount. If your company has already deployed DataRobot, please get an account\nfrom your administrator. Otherwise, please [contact us] here:\n[https://www.datarobot.com/contact-us/][contact us]\n\n[contact us]: https://www.datarobot.com/contact-us/\n\n## Getting your DataRobot API Endpoint\n\n1. While logged in to the DataRobot interface, click the _profile_ icon in the top right corner of the screen.\n\n    ![profile icon](img/user_avatar.png)\n\n2. Select `Profile` from the drop-down menu:\n\n    ![profile link](img/user_dropdown.png)\n\n3. Your API Token is in the top section of your profile. Copy it to insert into your notebooks.\n\n    ![profile page](img/profile_page.png)\n\n---\n\n## Overview of Modeling Example\n\nStatistics on whether a flight was delayed and for how long are available\nfrom government databases for all the major carriers. It would be useful to\nbe able to predict, before scheduling a flight, whether it is likely to\nbe delayed. In the example notebooks below, we will use DataRobot to try to\nmodel whether a flight will be delayed, based on information such as the\nscheduled departure time and whether it rained on the day of the flight.\n\n## List of notebooks\n\n\u003e**Before beginning with these notebooks**, make sure you have read through the\n[Installing Dependencies](#installing-dependencies) section of this document.\n\n1. [Basic Introduction][1] walks the user through the basics of using DataRobot\n   from a SageMaker notebook instance. 
This covers topics such as\n   data preparation, uploading the dataset to DataRobot, kicking off\n   auto-modeling, and finally getting predictions from the top-ranking model.\n\n2. [Diving Deeper into Modeling][2] shows the user how to explore the models\n   created by the auto-modeling process in more detail. For example, we will see\n   how the models are performing against the training data.\n\n3. [Exploring Reasons for Prediction Results][3] examines enhanced\n   functionality supported by DataRobot to provide more insights into prediction\n   results. For certain project types, DataRobot supports producing explanations\n   about its prediction output on a per-row basis.\n\n[1]: Basic%20Introduction%20to%20DataRobot%20via%20API.ipynb\n[2]: Diving%20Deeper%20into%20DataRobot%20Models.ipynb\n[3]: Working%20with%20DataRobot%20Reason%20Codes.ipynb\n\n## Installing Dependencies\n\nThese notebooks require extra dependencies to be installed into the\n_Notebook Instance_. While it is possible to install packages directly inside\nthe running Jupyter instance, this is not ideal because every time the instance\nis restarted, all modifications will be lost. To better support customization\nof the Notebook Instance environment, Amazon provides [Lifecycle Configurations],\nwhich are shell scripts that can be configured to run each time a notebook\ninstance starts. 
To learn more, see Amazon's [documentation][Lifecycle Configurations]\nor their [blog][lifecycle blog] post on the subject.\n\n[Lifecycle Configurations]: https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html\n[lifecycle blog]: https://aws.amazon.com/blogs/machine-learning/customize-your-amazon-sagemaker-notebook-instances-with-lifecycle-configurations-and-the-option-to-disable-internet-access/\n\n### Create Notebook Instance\n\nBelow we provide some simple steps on how to create a new notebook instance with\na lifecycle configuration that will prepare the instance to work with DataRobot. Unfortunately,\nSageMaker **does not** allow you to attach a lifecycle configuration to an existing\nnotebook instance, so we will launch a new one:\n\n1. Click the `Create notebook instance` button\n\n    ![create notebook first](img/create-notebook1.png)\n\n2. Fill in all the appropriate fields until you get to the `Lifecycle\n   configuration` drop-down, then select `Create a new lifecycle configuration`.\n   This opens a new modal panel\n\n    ![create configuration drop down](img/create-configuration-dropdown.png)\n\n3. Give the configuration a descriptive name (e.g. _DataRobot-Standard_) and\n   then click on the `Start notebook` tab under the _Scripts_ section:\n\n    ![start script](img/start_notebook.png)\n\n4. Paste the script below into the script editor:\n\n```bash\n#!/bin/bash\n\nset -e\nexport PATH=/home/ec2-user/anaconda3/bin/:$PATH\n\n# Install DataRobot client package for Python 2\nconda install -n python2 -c conda-forge datarobot -y -q\n\n# Install DataRobot client package for Python 3\nconda install -n python3 -c conda-forge datarobot -y -q\n```\n\n\u003c!-- markdownlint-disable MD029 --\u003e\n5. Click the `Create configuration` button at the bottom of the page to save\n   a new Lifecycle Configuration.\n\n    ![create configuration](img/create-configuration2.png)\n\n6. 
Finish filling in the rest of the options for your instance and click the\n   `Create notebook instance` button at the bottom of the page to begin\n   launching your new instance.\n\n    ![create notebook final](img/create-notebook2.png)\n\n---\nIn the future, if you need to launch new notebook instances, you can reuse\nthe Lifecycle Configuration created in the steps above rather than creating a\nnew configuration. Note, scripts cannot run for longer than 5 minutes. If a\nscript runs for longer than 5 minutes, it fails and the notebook instance is\nnot created or started.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatarobot%2Fdatarobot-sagemaker-examples","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdatarobot%2Fdatarobot-sagemaker-examples","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdatarobot%2Fdatarobot-sagemaker-examples/lists"}