Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yousafkhamza/sagemaker-notebook-instance-autoshutdown
A Terraform configuration, bundled with a Python Lambda function, that shuts down SageMaker notebook instances selected by tag. Terraform also creates a CloudWatch trigger for the Lambda, scheduled at whatever time you choose.
aws lambda-functions python sagemaker-notebook-container script shutdown terraform
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/yousafkhamza/sagemaker-notebook-instance-autoshutdown
- Owner: yousafkhamza
- Created: 2022-07-30T03:33:39.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-07-30T09:14:18.000Z (over 2 years ago)
- Last Synced: 2024-11-08T13:12:09.594Z (3 months ago)
- Topics: aws, lambda-functions, python, sagemaker-notebook-container, script, shutdown, terraform
- Language: HCL
- Homepage:
- Size: 511 KB
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# SageMaker Instance AutoShutDown with Python and Terraform
---
## Description
This is a Terraform configuration, bundled with a Python script, that uses a Lambda function to shut down SageMaker notebook instances selected by tag. Terraform also creates a CloudWatch trigger for the Lambda, scheduled at whatever time you choose. The Lambda only stops notebook instances tagged `AutoStop: True`, so add this tag yourself to each existing SageMaker notebook instance you want it to manage.
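If you would rather tag existing instances from a script than from the console, here is a minimal sketch using boto3's SageMaker `describe_notebook_instance` and `add_tags` calls. The helper name `tag_for_autostop` is hypothetical, and the client is passed in so you can create it for your own region.

```python
"""Tag existing SageMaker notebook instances with AutoStop=True (sketch)."""


def tag_for_autostop(client, notebook_names):
    """Add the AutoStop=True tag to each named notebook instance.

    `client` is a boto3 SageMaker client. Returns the ARNs that were tagged.
    """
    tagged = []
    for name in notebook_names:
        # The tagging API works on ARNs, so look up the ARN from the name first.
        arn = client.describe_notebook_instance(NotebookInstanceName=name)["NotebookInstanceArn"]
        # Same key and value that lambda_function.py checks for.
        client.add_tags(ResourceArn=arn, Tags=[{"Key": "AutoStop", "Value": "True"}])
        tagged.append(arn)
    return tagged
```

For example, `tag_for_autostop(boto3.client("sagemaker", region_name="ap-south-1"), ["my-notebook"])`, where `my-notebook` is a placeholder for your own instance name.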
----
## Features
- Saves money (SageMaker notebook instances are expensive to leave running)
- Automated with a CloudWatch Events rule
- Tag-based selection of which SageMaker instances to shut down

----
## Architecture
![Architecture diagram](https://i.ibb.co/tXS81cJ/flow.jpg)

----
## Services Created
- IAM Role (custom inline policy for the Lambda function)
- Lambda Function (Python code for the SageMaker shutdown)
- CloudWatch Event Rule Trigger

----
## Prerequisites
- Terraform, with AWS credentials configured on your machine that have access to the services listed above
- Git
- The tag `AutoStop: True` (Key: `AutoStop`, Value: `True`) on every notebook instance you want shut down

![AutoStop tag on a SageMaker notebook instance](https://i.ibb.co/fvHY48f/Screenshot-7.png)

#### Terraform Installation
[Official Terraform installation guide](https://www.terraform.io/downloads)

_TFSwitch Installation:_
```
curl -L https://raw.githubusercontent.com/warrensbox/terraform-switcher/release/install.sh | bash
```

#### Prerequisites (RedHat-based Linux)
```
yum install -y git
```

#### Prerequisites (Debian-based Linux)
```
apt install -y git
```

#### Prerequisites (Termux)
```
pkg upgrade
pkg install git
```

---
## How to Get
```
git clone https://github.com/yousafkhamza/SageMaker-Notebook-Instance-AutoShutDown.git
cd SageMaker-Notebook-Instance-AutoShutDown
```

----
## How to execute
```
terraform init
terraform plan
terraform apply
```

----
## Sample Output
```
$ terraform apply --auto-approve

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following
symbols:
+ create

Terraform will perform the following actions:
# aws_cloudwatch_event_rule.trigger_to_stop_sagemaker_instance will be created
+ resource "aws_cloudwatch_event_rule" "trigger_to_stop_sagemaker_instance" {
+ arn = (known after apply)
+ description = "Trigger that moving data lambda"
+ event_bus_name = "default"
+ id = (known after apply)
+ is_enabled = true
+ name = "Trigger-stop-sagemaker-instance-lambda"
+ name_prefix = (known after apply)
+ schedule_expression = "cron(0 16 * * ? *)"
+ tags = {
+ "Name" = "sagemaker stop cloudwatch trigger"
}
+ tags_all = {
+ "Name" = "sagemaker stop cloudwatch trigger"
}
}

# aws_cloudwatch_event_target.send_to_stop_lambda_target will be created
+ resource "aws_cloudwatch_event_target" "send_to_stop_lambda_target" {
+ arn = (known after apply)
+ event_bus_name = "default"
+ id = (known after apply)
+ rule = "Trigger-stop-sagemaker-instance-lambda"
+ target_id = "SendToLambda"
}

# aws_iam_role.lambda_iam_role_terraform will be created
+ resource "aws_iam_role" "lambda_iam_role_terraform" {
+ arn = (known after apply)
+ assume_role_policy = jsonencode(
{
+ Statement = [
+ {
+ Action = "sts:AssumeRole"
+ Effect = "Allow"
+ Principal = {
+ Service = "lambda.amazonaws.com"
}
+ Sid = ""
},
]
+ Version = "2012-10-17"
}
)
+ create_date = (known after apply)
+ description = "IAM role for lambda to stop that SageMaker instance which we used tags AutoStop"
+ force_detach_policies = false
+ id = (known after apply)
+ managed_policy_arns = (known after apply)
+ max_session_duration = 3600
+ name = "Lambda-IAM-Role-For-SageMaker-Auto-ShutDown"
+ name_prefix = (known after apply)
+ path = "/lambda/"
+ tags_all = (known after apply)
+ unique_id = (known after apply)

+ inline_policy {
+ name = "SageMaker-Auto-ShutDown-Inline-Policy"
+ policy = jsonencode(
{
+ Statement = [
+ {
+ Action = [
+ "sagemaker:StopNotebookInstance",
+ "sagemaker:ListTags",
+ "sagemaker:DescribeNotebookInstance",
+ "sagemaker:AddTags",
]
+ Effect = "Allow"
+ Resource = "arn:aws:sagemaker:ap-south-1:514645378310:notebook-instance/*"
+ Sid = ""
},
+ {
+ Action = "sagemaker:ListNotebookInstances"
+ Effect = "Allow"
+ Resource = "*"
+ Sid = ""
},
]
+ Version = "2012-10-17"
}
)
}
}

# aws_lambda_function.sagemaker_stop_lambda_function will be created
+ resource "aws_lambda_function" "sagemaker_stop_lambda_function" {
+ architectures = (known after apply)
+ arn = (known after apply)
+ description = "This lambda using to stop the sagemaker instance which we mentioned"
+ filename = "tmp/stop-lambda-function.zip"
+ function_name = "sagemaker-stop-lambda-function"
+ handler = "lambda_function.lambda_handler"
+ id = (known after apply)
+ invoke_arn = (known after apply)
+ last_modified = (known after apply)
+ memory_size = 128
+ package_type = "Zip"
+ publish = false
+ qualified_arn = (known after apply)
+ reserved_concurrent_executions = -1
+ role = (known after apply)
+ runtime = "python3.9"
+ signing_job_arn = (known after apply)
+ signing_profile_version_arn = (known after apply)
+ source_code_hash = (known after apply)
+ source_code_size = (known after apply)
+ tags = {
+ "Name" = "sagemaker stop lambda function"
}
+ tags_all = {
+ "Name" = "sagemaker stop lambda function"
}
+ timeout = 60
+ version = (known after apply)

+ environment {
+ variables = {
+ "REGION" = "ap-south-1"
}
}

+ ephemeral_storage {
+ size = (known after apply)
}

+ tracing_config {
+ mode = (known after apply)
}
}

# aws_lambda_permission.allow_cloudwatch_to_call_stop_lambda will be created
+ resource "aws_lambda_permission" "allow_cloudwatch_to_call_stop_lambda" {
+ action = "lambda:InvokeFunction"
+ function_name = "sagemaker-stop-lambda-function"
+ id = (known after apply)
+ principal = "events.amazonaws.com"
+ source_arn = (known after apply)
+ statement_id = "AllowExecutionFromCloudWatch"
+ statement_id_prefix = (known after apply)
}

Plan: 5 to add, 0 to change, 0 to destroy.
aws_iam_role.lambda_iam_role_terraform: Creating...
aws_iam_role.lambda_iam_role_terraform: Creation complete after 4s [id=Lambda-IAM-Role-For-SageMaker-Auto-ShutDown]
aws_lambda_function.sagemaker_stop_lambda_function: Creating...
aws_lambda_function.sagemaker_stop_lambda_function: Still creating... [10s elapsed]
aws_lambda_function.sagemaker_stop_lambda_function: Still creating... [20s elapsed]
aws_lambda_function.sagemaker_stop_lambda_function: Creation complete after 22s [id=sagemaker-stop-lambda-function]
aws_cloudwatch_event_rule.trigger_to_stop_sagemaker_instance: Creating...
aws_cloudwatch_event_rule.trigger_to_stop_sagemaker_instance: Creation complete after 1s [id=Trigger-stop-sagemaker-instance-lambda]
aws_lambda_permission.allow_cloudwatch_to_call_stop_lambda: Creating...
aws_cloudwatch_event_target.send_to_stop_lambda_target: Creating...
aws_cloudwatch_event_target.send_to_stop_lambda_target: Creation complete after 0s [id=Trigger-stop-sagemaker-instance-lambda-SendToLambda]
aws_lambda_permission.allow_cloudwatch_to_call_stop_lambda: Creation complete after 0s [id=AllowExecutionFromCloudWatch]

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.
```
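After `terraform apply` finishes, you do not have to wait for the schedule to check that the function works. The minimal sketch below invokes it once through boto3's Lambda `invoke` call; the default function name comes from the output above, and the helper name `invoke_stop_lambda` is hypothetical. Be aware that a real invocation stops every `InService` notebook instance tagged `AutoStop: True`.

```python
"""Invoke the deployed stop Lambda once, on demand (sketch)."""
import json


def invoke_stop_lambda(lambda_client, function_name="sagemaker-stop-lambda-function"):
    """Synchronously invoke the function and return (status_code, payload_text).

    `lambda_client` is a boto3 Lambda client. The handler ignores its event,
    so an empty JSON object is sent.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps({}).encode("utf-8"),
    )
    # The payload is a streaming body: read it once and decode it.
    payload = response["Payload"].read().decode("utf-8")
    # FunctionError is only present when the handler raised an exception.
    if "FunctionError" in response:
        raise RuntimeError(f"Lambda reported an error: {payload}")
    return response["StatusCode"], payload
```

For example, `invoke_stop_lambda(boto3.client("lambda", region_name="ap-south-1"))`. Since `lambda_handler` returns nothing, the payload of a successful run is `null`.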
### _Output Screenshots (at AWS)_
![Deployed resources in the AWS console](https://i.ibb.co/vPp6Csq/Screenshot-8.png)

----
## Behind the code
### Terraform code
`Fetch.tf`:
```
# Fetch Account ID for IAM.
data "aws_caller_identity" "current" {
}
```
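The account ID from `aws_caller_identity` is what lets the IAM policy name only this account's notebook instances. For comparison, here is a minimal Python sketch of the same lookup using boto3's STS `get_caller_identity`, plus the ARN pattern the policy is built from; both helper names are hypothetical.

```python
"""Python counterpart of the aws_caller_identity lookup (sketch)."""


def current_account_id(sts_client) -> str:
    # sts_client is a boto3 STS client; "Account" holds the 12-digit account ID.
    return sts_client.get_caller_identity()["Account"]


def notebook_instance_arn_pattern(region: str, account_id: str) -> str:
    # Same shape as the resource string in the aws_iam_policy_document of IAM.tf.
    return f"arn:aws:sagemaker:{region}:{account_id}:notebook-instance/*"
```

For example, `notebook_instance_arn_pattern("ap-south-1", current_account_id(boto3.client("sts")))` gives the resource ARN that appears in the plan output above for your own account.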
> `Fetch.tf` looks up the current AWS account ID. `IAM.tf` uses it to limit the Lambda role's SageMaker permissions to notebook instances in this account.

`IAM.tf`:
```hcl
# ----------------------------------
# IAM Role for Lambda Function
# ----------------------------------

# Trust policy: allow the Lambda service to assume this role
data "aws_iam_policy_document" "lambda_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["lambda.amazonaws.com"]
    }
  }
}

# Inline policy that lets the Lambda find, inspect, tag, and stop notebook instances
data "aws_iam_policy_document" "auto_shutdown_iam_inline" {
  statement {
    actions = [
      "sagemaker:ListTags",
      "sagemaker:StopNotebookInstance",
      "sagemaker:DescribeNotebookInstance",
      "sagemaker:AddTags"
    ]
    resources = [
      "arn:aws:sagemaker:${var.aws_region}:${data.aws_caller_identity.current.account_id}:notebook-instance/*",
    ]
  }

  statement {
    actions   = ["sagemaker:ListNotebookInstances"]
    resources = ["*"]
  }
}

# Role for the Lambda, combining the trust policy and the inline policy
resource "aws_iam_role" "lambda_iam_role_terraform" {
  name               = "Lambda-IAM-Role-For-SageMaker-Auto-ShutDown"
  path               = "/lambda/"
  assume_role_policy = data.aws_iam_policy_document.lambda_assume_role.json
  description        = "IAM role for lambda to stop that SageMaker instance which we used tags AutoStop"

  inline_policy {
    name   = "SageMaker-Auto-ShutDown-Inline-Policy"
    policy = data.aws_iam_policy_document.auto_shutdown_iam_inline.json
  }
}
```
> `IAM.tf` creates the IAM role the Lambda function runs as, with permission to list, describe, tag, and stop SageMaker notebook instances.

`Lambda_Stop.tf`:
```hcl
# ----------------------------------
# Lambda_Function
# ----------------------------------
# Zip the Python source directory for upload
data "archive_file" "stop_lambda_zip" {
  type        = "zip"
  source_dir  = "./lambda_code/stop/"
  output_path = "tmp/${local.stop_lambda_function}.zip"
}

resource "aws_lambda_function" "sagemaker_stop_lambda_function" {
  filename      = data.archive_file.stop_lambda_zip.output_path
  # Redeploy the function whenever the zipped code changes
  source_code_hash = data.archive_file.stop_lambda_zip.output_base64sha256
  function_name = "sagemaker-stop-lambda-function"
  role          = aws_iam_role.lambda_iam_role_terraform.arn
  description   = "This lambda using to stop the sagemaker instance which we used AutoStop tag"
  handler       = "lambda_function.lambda_handler"
  runtime       = local.runtime_lambda_function
  timeout       = 60
  memory_size   = 128

  environment {
    variables = {
      REGION = var.aws_region
    }
  }

  tags = tomap({ "Name" = "sagemaker stop lambda function" })
}

#-----------------------
# CloudWatch Trigger to stop sagemaker instance
#-----------------------
resource "aws_cloudwatch_event_rule" "trigger_to_stop_sagemaker_instance" {
  name                = "Trigger-stop-sagemaker-instance-lambda"
  description         = "Trigger that moving data lambda"
  schedule_expression = var.stop_cron
  tags                = tomap({ "Name" = "sagemaker stop cloudwatch trigger" })

  depends_on = [aws_lambda_function.sagemaker_stop_lambda_function]
}

resource "aws_cloudwatch_event_target" "send_to_stop_lambda_target" {
  rule      = aws_cloudwatch_event_rule.trigger_to_stop_sagemaker_instance.name
  target_id = "SendToLambda"
  arn       = aws_lambda_function.sagemaker_stop_lambda_function.arn

  depends_on = [aws_lambda_function.sagemaker_stop_lambda_function]
}

# Allow the CloudWatch Events rule to invoke the Lambda
resource "aws_lambda_permission" "allow_cloudwatch_to_call_stop_lambda" {
  statement_id  = "AllowExecutionFromCloudWatch"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.sagemaker_stop_lambda_function.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.trigger_to_stop_sagemaker_instance.arn

  depends_on = [
    aws_lambda_function.sagemaker_stop_lambda_function,
    aws_cloudwatch_event_rule.trigger_to_stop_sagemaker_instance,
  ]
}
```
> `Lambda_Stop.tf` zips the Python code and creates the Lambda function from it. It also creates a CloudWatch Events rule that fires on your schedule, points that rule at the Lambda, and grants the rule permission to invoke the function.

`provider.tf`:
```
provider "aws" {
region = var.aws_region
}
```
> `provider.tf` configures the AWS provider to use the region set in `aws_region`.

`variables.tf`:
```hcl
variable "aws_region" {
  type        = string
  description = "AWS region to deploy the resources into"
  default     = ""
}

variable "stop_cron" {
  type        = string
  description = "Cron expression (UTC) for the daily stop of tagged notebook instances"
  default     = "cron(0 17 * * ? *)"
}

locals {
  stop_lambda_function    = "stop-lambda-function"
  runtime_lambda_function = "python3.9"
}
```
> `variables.tf` declares the input variables and locals; the variable values are set in `terraform.tfvars`.

`terraform.tfvars`:
```
aws_region = "ap-south-1"         # Region to deploy into
stop_cron  = "cron(0 16 * * ? *)" # Daily time to stop tagged SageMaker instances (UTC)
```
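EventBridge evaluates the cron expression of a scheduled rule in UTC, so `stop_cron` is a UTC time. The sketch below converts the minute and hour of a simple daily `cron(M H * * ? *)` expression into the local time at a fixed UTC offset. The function name is hypothetical, and it deliberately ignores daylight-saving changes and any cron fields other than plain minute and hour numbers.

```python
"""Translate a daily EventBridge cron(...) time from UTC to a fixed local offset (sketch)."""
from datetime import datetime, timedelta, timezone


def cron_daily_time_in_offset(cron_expr: str, utc_offset: timedelta) -> str:
    """Return the local HH:MM at which a daily cron(M H * * ? *) rule fires.

    Only plain integer minute and hour fields are handled.
    """
    if not (cron_expr.startswith("cron(") and cron_expr.endswith(")")):
        raise ValueError(f"not a cron() expression: {cron_expr!r}")
    fields = cron_expr[len("cron("):-1].split()
    if len(fields) != 6:
        raise ValueError("EventBridge cron expressions have six fields")
    minute, hour = int(fields[0]), int(fields[1])
    # Any date works for a daily schedule; a fixed offset means no DST handling.
    fire_utc = datetime(2000, 1, 1, hour, minute, tzinfo=timezone.utc)
    return fire_utc.astimezone(timezone(utc_offset)).strftime("%H:%M")
```

For example, `cron_daily_time_in_offset("cron(0 16 * * ? *)", timedelta(hours=5, minutes=30))` returns `"21:30"`, so the value above stops instances at 21:30 IST. To stop at 20:00 IST instead, subtract 5:30 and set `stop_cron = "cron(30 14 * * ? *)"`.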
> In `terraform.tfvars`, set your region and the cron expression for the CloudWatch event trigger. Scheduled rules are evaluated in UTC.

`versions.tf`:
```hcl
terraform {
  required_version = ">= 0.13"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.36"
    }
  }
}
```
> `versions.tf` sets the minimum supported Terraform and AWS provider versions.

`lambda_function.py`:
```python
import os

import boto3
from botocore.exceptions import ClientError

REGION = os.environ['REGION']


def lambda_handler(event, context):
    client = boto3.client('sagemaker', region_name=REGION)

    # Page through every notebook instance; a single list call returns at most 100.
    paginator = client.get_paginator('list_notebook_instances')
    for page in paginator.paginate():
        for notebook in page['NotebookInstances']:
            instance_name = notebook['NotebookInstanceName']
            # Only instances that are currently running can be stopped.
            if notebook['NotebookInstanceStatus'] != 'InService':
                continue
            tags = client.list_tags(ResourceArn=notebook['NotebookInstanceArn'])['Tags']
            # Accept the AutoStop=True tag in any position, not only as the first tag.
            if not any(tag['Key'] == 'AutoStop' and tag['Value'] == 'True' for tag in tags):
                continue
            try:
                print("Stopping SageMaker Instance : {}".format(instance_name))
                client.stop_notebook_instance(NotebookInstanceName=instance_name)
            except ClientError as e:
                # Log the error and carry on with the remaining instances.
                print(e.response['Error']['Message'])
```
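Before relying on the schedule, you may want a dry run that only reports which notebooks would be stopped. The sketch below mirrors the selection rule of the Lambda (running instances tagged `AutoStop: True`) but stops nothing. The function names are hypothetical; it uses boto3's `list_notebook_instances` paginator with `StatusEquals="InService"` and accepts the tag in any position of the tag list.

```python
"""Dry run: list notebook instances the stop Lambda would act on (sketch)."""


def has_autostop_tag(tags):
    # tags is the "Tags" list returned by SageMaker list_tags.
    return any(t.get("Key") == "AutoStop" and t.get("Value") == "True" for t in tags)


def instances_to_stop(client):
    """Return names of InService notebook instances tagged AutoStop=True.

    `client` is a boto3 SageMaker client; nothing is stopped.
    """
    names = []
    paginator = client.get_paginator("list_notebook_instances")
    # Let the API filter to running instances instead of checking each one.
    for page in paginator.paginate(StatusEquals="InService"):
        for nb in page["NotebookInstances"]:
            tags = client.list_tags(ResourceArn=nb["NotebookInstanceArn"])["Tags"]
            if has_autostop_tag(tags):
                names.append(nb["NotebookInstanceName"])
    return names
```

For example, `print(instances_to_stop(boto3.client("sagemaker", region_name="ap-south-1")))` lists the instances that the next scheduled run would stop.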
> `lambda_function.py` stops every `InService` SageMaker notebook instance tagged `AutoStop: True`. To use a different tag, change the key and value checked inside the code.

----
## Conclusion
SageMaker notebook instances are expensive to leave running. If you tend to forget to shut instances down and end up with higher bills, deploy this Terraform configuration to your AWS account and add the `AutoStop: True` tag to your existing instances; the Lambda will then stop them at the same time every day. Note that the script only stops instances: you still need to start them again yourself.