{"id":22428495,"url":"https://github.com/kwame-mintah/aws-lambda-model-training","last_synced_at":"2025-07-09T09:12:15.340Z","repository":{"id":223746780,"uuid":"761376351","full_name":"kwame-mintah/aws-lambda-model-training","owner":"kwame-mintah","description":"A lambda function to split preprocessed data into training and validation datasets used for starting a training job within AWS SageMaker.","archived":false,"fork":false,"pushed_at":"2024-08-17T11:05:35.000Z","size":126,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-01T11:43:46.705Z","etag":null,"topics":["aws","aws-lambda","model-training","python","python311","sagemaker"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kwame-mintah.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-02-21T18:52:10.000Z","updated_at":"2024-08-17T11:05:38.000Z","dependencies_parsed_at":"2024-05-08T21:25:18.244Z","dependency_job_id":"005a7611-86f6-48ba-add9-980ae100d312","html_url":"https://github.com/kwame-mintah/aws-lambda-model-training","commit_stats":null,"previous_names":["kwame-mintah/aws-lambda-model-training"],"tags_count":16,"template":false,"template_full_name":"kwame-mintah/aws-lambda-function-template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kwame-mintah%2Faws-lambda-model-training","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kwame-mintah%2Faws-lambda-model-training/tags","releases_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kwame-mintah%2Faws-lambda-model-training/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kwame-mintah%2Faws-lambda-model-training/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kwame-mintah","download_url":"https://codeload.github.com/kwame-mintah/aws-lambda-model-training/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245798538,"owners_count":20673901,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","aws-lambda","model-training","python","python311","sagemaker"],"created_at":"2024-12-05T20:15:01.883Z","updated_at":"2025-03-27T06:43:56.573Z","avatar_url":"https://github.com/kwame-mintah.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AWS Lambda Model Training\n\n[![Python 3.12](https://img.shields.io/badge/python-3.12-blue.svg)](https://www.python.org/downloads/release/python-3121/)\n[![🚧 Bump version](https://github.com/kwame-mintah/aws-lambda-model-training/actions/workflows/bump-repository-version.yml/badge.svg)](https://github.com/kwame-mintah/aws-lambda-model-training/actions/workflows/bump-repository-version.yml)\n[![🚀 Push Docker image to AWS ECR](https://github.com/kwame-mintah/aws-lambda-model-training/actions/workflows/push-docker-image-to-aws-ecr.yml/badge.svg)](https://github.com/kwame-mintah/aws-lambda-model-training/actions/workflows/push-docker-image-to-aws-ecr.yml)\n[![🧹 Run 
linter](https://github.com/kwame-mintah/aws-lambda-model-training/actions/workflows/run-python-linter.yml/badge.svg)](https://github.com/kwame-mintah/aws-lambda-model-training/actions/workflows/run-python-linter.yml)\n\nA lambda to split pre-processed data into training, validation and test datasets, which are uploaded to an S3 bucket. The training and validation\ndatasets are used when starting the AWS SageMaker training job, and the test dataset is used during [model evaluation](https://github.com/kwame-mintah/aws-lambda-model-evaluation).\n\nThis repository does not create the AWS resources; they are created via the Terraform found here: [terraform-aws-machine-learning-pipeline](https://github.com/kwame-mintah/terraform-aws-machine-learning-pipeline).\nFor more details on the entire flow and how this lambda is deployed, see [aws-automlops-serverless-deployment](https://github.com/kwame-mintah/aws-automlops-serverless-deployment).\n\n# Flowchart\n\nThe [diagram below](https://mermaid.js.org/syntax/flowchart.html#flowcharts-basic-syntax) shows what happens when the lambda is triggered by a new `.csv` object being uploaded to the S3 bucket.\n\n```mermaid\ngraph LR\n  S0(Start)\n  T1(Dataset pulled from S3 Bucket)\n  T2(Random split and sort using NumPy)\n  T3[[\"`70% training data\n    20% validation data\n    10% test data`\"]]\n  T4(\"Upload split data into S3 Bucket as `.csv`\")\n  T5(\"Start training job with training and validation data\")\n  E0(End)\n\n  S0--\u003eT1\n  T1--\u003eT2\n  T2--\u003eT3\n  T3--\u003eT4\n  T4--\u003eT5\n  T5--\u003eE0\n```\n\n# Notice\n\nThe code provided here should serve as an example of a lambda function that starts an AWS SageMaker training job.\nThe training algorithm used within this project is [XGBoost](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html), so the\nhyperparameters may not be suitable for other algorithms.\n\nAdditionally, the hyperparameter values are currently hardcoded. One approach could be to read them from a 
`manifest.json`\nfile detailing which parameters and values to use for each dataset received. This would allow more flexibility, as a new Docker image\nwould not need to be deployed whenever the hyperparameters change.\n\n## Development\n\n### Dependencies\n\n- [Python](https://www.python.org/downloads/release/python-3121/)\n- [Docker Desktop](https://www.docker.com/products/docker-desktop/)\n- [Amazon Web Services](https://aws.amazon.com/?nc2=h_lg)\n\n## Usage\n\n1. Build the docker image locally:\n\n   ```shell\n   docker build --no-cache -t model_training:local .\n   ```\n\n2. Run the built docker image:\n\n   ```shell\n   docker run --platform linux/amd64 -p 9000:8080 model_training:local\n   ```\n\n3. Send an event to the lambda via curl:\n\n   ```shell\n   curl \"http://localhost:9000/2015-03-31/functions/function/invocations\" -d '\u003cREPLACE_WITH_JSON_BELOW\u003e'\n   ```\n\n   \u003cdetails\u003e\n   \u003csummary\u003eExample AWS S3 event received\u003c/summary\u003e\n\n   ```json\n   {\n     \"Records\": [\n       {\n         \"eventVersion\": \"2.0\",\n         \"eventSource\": \"aws:s3\",\n         \"awsRegion\": \"us-east-1\",\n         \"eventTime\": \"1970-01-01T00:00:00.000Z\",\n         \"eventName\": \"ObjectCreated:Put\",\n         \"userIdentity\": { \"principalId\": \"EXAMPLE\" },\n         \"requestParameters\": { \"sourceIPAddress\": \"127.0.0.1\" },\n         \"responseElements\": {\n           \"x-amz-request-id\": \"EXAMPLE123456789\",\n           \"x-amz-id-2\": \"EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH\"\n         },\n         \"s3\": {\n           \"s3SchemaVersion\": \"1.0\",\n           \"configurationId\": \"testConfigRule\",\n           \"bucket\": {\n             \"name\": \"example-bucket\",\n             \"ownerIdentity\": { \"principalId\": \"EXAMPLE\" },\n             \"arn\": \"arn:aws:s3:::example-bucket\"\n           },\n           \"object\": {\n             \"key\": 
\"automl/example-bank-file.csv\",\n             \"size\": 515246,\n             \"eTag\": \"0e29c0d99c654bbe83c42097c97743ed\",\n             \"sequencer\": \"00656A54CA3D69362D\"\n           }\n         }\n       }\n     ]\n   }\n   ```\n   \u003c/details\u003e\n\n## GitHub Action (CI/CD)\n\nThe GitHub Action \"🚀 Push Docker image to AWS ECR\" checks out the repository and pushes a docker image to the chosen AWS ECR repository using the\n[configure-aws-credentials](https://github.com/aws-actions/configure-aws-credentials/tree/v4.0.1/) action. The following repository secrets need to be set:\n\n| Secret             | Description                  |\n|--------------------|------------------------------|\n| AWS_REGION         | The AWS Region.              |\n| AWS_ACCOUNT_ID     | The AWS account ID.          |\n| AWS_ECR_REPOSITORY | The AWS ECR repository name. |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkwame-mintah%2Faws-lambda-model-training","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkwame-mintah%2Faws-lambda-model-training","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkwame-mintah%2Faws-lambda-model-training/lists"}