https://github.com/massdriver-cloud/webinar-aws-sagemaker-inferencing
- Host: GitHub
- URL: https://github.com/massdriver-cloud/webinar-aws-sagemaker-inferencing
- Owner: massdriver-cloud
- License: mit
- Created: 2023-11-20T20:13:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-12-26T16:34:42.000Z (over 1 year ago)
- Last Synced: 2025-01-23T03:13:53.222Z (5 months ago)
- Language: Python
- Size: 265 KB
- Stars: 0
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# SageMaker Demo
## Bundles Used:
- AWS VPC
- API Gateway
- S3 Bucket
- SageMaker Inference Endpoint (x2)
- Lambda
- SageMaker Domain

## Demo Values
### SDXL
For [SDXL 1.0](https://stablediffusionxl.com) we're using SageMaker's JumpStart model data along with [AWS's Deep Learning Container (DLC)](https://github.com/aws/deep-learning-containers/blob/master/available_images.md).
- Initial Instance Count: 1
- SageMaker Instance Type: ml.g5.4xlarge ($2.03/hr)
- ECR URL: 763104351884.dkr.ecr.us-east-1.amazonaws.com/stabilityai-pytorch-inference:2.0.1-sgm0.1.0-gpu-py310-cu118-ubuntu20.04-sagemaker
- Model Data S3 URL: s3://jumpstart-cache-prod-us-east-1/stabilityai-infer/prepack/v1.0.1/infer-prepack-model-imagegeneration-stabilityai-stable-diffusion-xl-base-1-0.tar.gz

### Mistral
For [Mistral 7B](https://mistral.ai/news/announcing-mistral-7b/) we downloaded the weights and trained the model with https://github.com/philschmid/llm-sagemaker-sample.
- Initial Instance Count: 1
- SageMaker Instance Type: ml.g5.4xlarge ($2.03/hr)
- ECR URL: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04
- Model Data: s3://neuro-dev-awss3applica-xmb1/model.tar.gz (trained via SageMaker Studio Notebook in Domain)

| ENV VAR | Value |
| ----------------------------- | ----------- |
| HF_MODEL_ID | /opt/ml/model |
| MAX_INPUT_LENGTH | 1024 |
| MAX_TOTAL_TOKENS | 2048 |
| SAGEMAKER_CONTAINER_LOG_LEVEL | 20 |
| SAGEMAKER_REGION | us-east-1 |
| SM_NUM_GPUS | 1 |

### Lambda
To add a logic layer, we create a Lambda that calls the SageMaker Endpoints through the SageMaker SDK.
- Path: generate-image
- HTTP Method: POST
- Path: prompt-mistral
- HTTP Method: POST
- ECR URI: 083014189801.dkr.ecr.us-east-1.amazonaws.com/massdriver/sagemaker_demo
- ECR image tag: latest
- Runtime Memory Limit (MB): 1024
- Execution Timeout: 3 Minutes

### Set Lambda ENV VARS
| Key | Value |
|-------------------|----------------------------------|
|LLM_ENDPOINT_NAME | prefix-env-name-hash-endpoint |
|S3_BUCKET | prefix-env-name-hash |
|SDXL_ENDPOINT_NAME | prefix-env-name-hash-endpoint |
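
As a rough illustration of how a Lambda could tie the two routes to the two endpoints using the ENV VARS above, here is a minimal sketch. It is not the code shipped in the demo image. It assumes boto3's `sagemaker-runtime` client rather than the SageMaker SDK, an API Gateway proxy-style event with `rawPath` or `path` and a JSON `body`, and a hypothetical request field named `prompt`. The Mistral payload uses the `inputs`/`parameters` shape accepted by the Hugging Face TGI container. The SDXL request body is forwarded unchanged because its schema depends on the container, and `S3_BUCKET` is left unused here.

```python
import json
import os

# Route name -> environment variable that holds the SageMaker endpoint name.
ROUTES = {
    "generate-image": "SDXL_ENDPOINT_NAME",
    "prompt-mistral": "LLM_ENDPOINT_NAME",
}


def invoke_json(runtime, endpoint_name, payload):
    """Send a JSON payload to a SageMaker endpoint and decode the JSON reply."""
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Body"].read())


def handler(event, context, runtime=None):
    """Lambda entry point; `runtime` is injectable so the sketch can be tested offline."""
    # Assumption: API Gateway passes the request path as `rawPath` or `path`.
    path = event.get("rawPath") or event.get("path") or ""
    route = path.strip("/").split("/")[-1]
    if route not in ROUTES:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown route"})}

    if runtime is None:
        import boto3  # imported lazily so the module loads without AWS access

        runtime = boto3.client("sagemaker-runtime")

    body = json.loads(event.get("body") or "{}")
    if route == "prompt-mistral":
        # TGI request shape; `prompt` and `max_new_tokens` are hypothetical field names.
        payload = {
            "inputs": body.get("prompt", ""),
            "parameters": {"max_new_tokens": int(body.get("max_new_tokens", 256))},
        }
    else:
        # The SDXL payload schema depends on the container, so pass it through as-is.
        payload = body

    result = invoke_json(runtime, os.environ[ROUTES[route]], payload)
    return {"statusCode": 200, "body": json.dumps(result)}
```

With `MAX_INPUT_LENGTH` at 1024 and `MAX_TOTAL_TOKENS` at 2048, a prompt that fills the input limit leaves 1024 tokens for generation, so the default of 256 new tokens stays within the limits configured for the Mistral endpoint.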