{"id":27646061,"url":"https://github.com/stablecaps/ice-cat-wrangler","last_synced_at":"2025-04-24T01:16:25.269Z","repository":{"id":285790604,"uuid":"959355093","full_name":"stablecaps/ice-cat-wrangler","owner":"stablecaps","description":"A serverless REST API that identifies whether an uploaded image contains a cat","archived":false,"fork":false,"pushed_at":"2025-04-23T23:21:54.000Z","size":33815,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-04-24T01:16:04.849Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stablecaps.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-04-02T16:52:57.000Z","updated_at":"2025-04-15T23:22:03.000Z","dependencies_parsed_at":"2025-04-02T18:21:31.164Z","dependency_job_id":"04daedf0-3bb6-4c51-a16c-3b72b4050c61","html_url":"https://github.com/stablecaps/ice-cat-wrangler","commit_stats":null,"previous_names":["stablecaps/ice-cat-wrangler"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stablecaps%2Fice-cat-wrangler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stablecaps%2Fice-cat-wrangler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stablecaps%2Fice-cat-wrangler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stablecaps%2Fice-cat-wr
angler/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stablecaps","download_url":"https://codeload.github.com/stablecaps/ice-cat-wrangler/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250540935,"owners_count":21447428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-24T01:16:24.595Z","updated_at":"2025-04-24T01:16:25.258Z","avatar_url":"https://github.com/stablecaps.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ice-cat-wrangler - 2025 (A Timed Interview Test ~11 days)\n\n## Context\n\nIn this task, we ask you to prepare a simple service that, for a given image file (JPEG/PNG), answers the question: **Does the image file contain the image of a cat?**\n\n**Problem: Does the image file provided contain a cat?**\n\n**Input:** Picture file (JPG/PNG)\n\n**Output:** Yes or No\n\n---\n\n## Task Requirements\n\nWe need the following behaviors:\n\n1. User can upload an image file for scanning.\n2. User can only upload JPEG and PNG files.\n3. The image is kept in persistent storage.\n4. Scanning is a non-blocking operation.\n5. User can check the result of image scanning.\n6. Results are kept in persistent storage.\n7. Debug data is available for a power user.\n8. There is any kind of interface available to interact with the application.\n\n---\n\n## Extra Information\n\n1. Only consider **AWS cloud** and AWS/Amazon services.\n2. Build a **serverless application**.\n3. Use any high-level programming language.\n4. 
You are free to choose the build tools, libraries, etc.\n5. Ensure that the code in the submission is fully functional and include instructions for building and running it.\n6. You don't need to dedicate days to this; simply demonstrate your ability to craft excellent software.\n7. Use this exercise as a guide for design decisions, considering it as the initial prototype of a Minimum Viable Product (MVP) that will evolve into a production-ready deliverable.\n8. Be prepared for further discussions regarding how to transition and scale the prototype for future deployment.\n\n---\n\n# Solution Overview:\n\nI've tried to write code that sets up avenues whereby we can minimise cost whilst maximising performance. Obviously, this is an iterative process that will materialise results as the requirements crystallise.\n\nThe repository is divided into 4 major components:\n1. **`infra-terra`:** Contains terraform code to create various AWS resources.\n2. **`serverless`:** Contains a lambda function that handles image processing.\n3. **`api_client`:** Contains an api_client that can upload images and get the categorisation results from uploaded files. Uses `rich` print instead of logs for a better user experience.\n4. **`shared_helpers`:** Contains functions and classes that are shared between `api_client` and `serverless`. I've tried to write the code to be easily abstracted to other use cases. That way it could potentially be split out into a separate repo so that other applications can also use this code.\n\nThe solution relies upon using both terraform and serverless as deployment agents because each tool has its strengths \u0026 weaknesses. Note the following reasons:\n\n1. 
**Serverless**:\n    - Ideal for deploying Lambdas and API Gateways.\n    - `sls deploy` leverages easily definable configurations in the `serverless.yml` file.\n    - Requires less code to deploy things such as an API Gateway compared to Terraform.\n    - Resources are tightly coupled to application code changes which is helpful to developers.\n    - Allows developers to deploy without requiring specialised DevOps knowledge.\n\n#### Resources are shared between terraform and serverless using SSM variables. The api_client also leverages the same SSM variables to auto-configure env vars.\n\n2. **Terraform**:\nIt is good to deploy more fundamental resources such as S3 buckets, iam policies, etc using terraform because:\n    - It involves less code in some cases. (have you seen an IAM policy in `serverless.yml`?)\n    - The terraform deployment role in most companies usually ends up getting more \u0026 more permissions added to it till it converges to administrator perms. Allowing many devs to have admin access is a security issue. Thus, things like IAM, route53, etc should be restricted.\n    - As we would like to promote the app between identical dev --\u003e uat --\u003e prod stages, it would make sense to let devs create the app in a dev AWS account with admin perms. These perms would not be available in uat and prod AWS accounts.\n\n\nFor more details on the co-location method used here please see: [Terraform \u0026 Serverless Co-location Demo](https://github.com/stablecaps/terraform_and_serverless_demo)\n\nThe article also discusses other aspects such as how to best organise terraform repos, etc.\n\n## Infrastructure Overview\n\n[![Infrastructure Diagram](docs/infrastructure.drawio.png)](docs/infrastructure.drawio.png)\n\nThe `ice-cat-wrangler` application consists of the following AWS infrastructure components:\n\n### 1. 
**S3 Buckets**\n- **Source Bucket**: Stores images uploaded by the user.\n- **Destination Bucket**: Stores successfully processed images.\n- **Failure Bucket**: Stores images that failed processing.\n\n### 2. **Lambda Function**\n- **s3_bulkimganalyse.run**: Processes images uploaded to the source bucket.\n  - Reads the image from the source bucket.\n  - Submits the image to **Amazon Rekognition** for analysis.\n  - Writes results to **DynamoDB**.\n  - Moves the image to the destination or failure bucket based on the Rekognition response.\n\n### 3. **Amazon Rekognition**\n- Analyzes the image to determine if it contains a cat.\n- Returns a response to the Lambda function.\n\n### 4. **DynamoDB**\n- Stores metadata and processing results for each image.\n- Tracks the status of image processing (`pending`, `success`, `fail`).\n- Stores Rekognition responses and debug logs.\n\n### 5. **SSM Parameter Store**\n- Stores configuration values such as bucket names, IAM roles, and DynamoDB table names.\n- Used by both Terraform and Serverless for resource sharing.\n\n### 6. 
**IAM Roles**\n- Provides permissions for the Lambda function to access S3, Rekognition, and DynamoDB.\n* Another role provides permissions to Github actions so that an automated pipeline can deploy into AWS.\n\n\n### System Process Overview\nThe lambda function interacts with S3 and DynamoDB in the following manner after the user uploads an image to the S3 source bucket:\n```mermaid\nsequenceDiagram\n    participant User as User\n    participant S3 as S3 Bucket (Source)\n    participant Lambda as Lambda (s3_bulkimganalyse.run)\n    participant Rekognition as Rekognition\n    participant DynamoDB as DynamoDB\n    participant S3Dest as S3 Bucket (Destination)\n    participant S3Fail as S3 Bucket (Failure)\n\n    User-\u003e\u003eS3: Uploads image\n    S3--\u003e\u003eLambda: Triggers s3_bulkimganalyse.run\n    Lambda-\u003e\u003eDynamoDB: Write initial item (Step 2)\n    Lambda-\u003e\u003eS3: Retrieve file bytes (Step 3)\n    Lambda-\u003e\u003eRekognition: Submit image for categorization (Step 4)\n    Rekognition--\u003e\u003eLambda: Returns categorization response\n    Lambda-\u003e\u003eDynamoDB: Update item with Rekognition response (Step 5)\n    alt Success\n        Lambda-\u003e\u003eS3Dest: Success: Move image to destination bucket (Step 6)\n    else Failure\n        Lambda-\u003e\u003eS3Fail: Fail: Move image to failure bucket (Step 6)\n    end\n    Lambda-\u003e\u003eDynamoDB: Update item with final status (Step 7)\n    Lambda-\u003e\u003eDynamoDB: Write logs (if debug mode enabled)\n```\n\n\n---\n\n## Setup\n\n**Setup assumes you have AWS admin credentials and can export them into your terminal environment**\n\n### Installation order\n1. Terraform\n2. Serverless\n3. api_client\n\n---\n\n### A. setup repo\n```shell\ngit clone https://github.com/stablecaps/ice-cat-wrangler.git\ncd ice-cat-wrangler/\n```\n\n---\n\n### B. Terraform\n1. export your **aws admin keys**\n2. prepare terraform env vars\n3. 
Note that you need to use the terraform v1.11.3 binary.\n\n```shell\n# Remove encrypted secrets file. This is for the repo pipeline (used by `secrets_decryptor.sh`)\n# - you won't need this unless you want to run the pipeline.\nrm -f infra-terra/envs/dev/dev.backend.hcl.enc\n\ncp infra-terra/envs/dev/dev.template.backend.hcl infra-terra/envs/dev/dev.backend.hcl\ncp infra-terra/envs/dev/dev.template.tfvars infra-terra/envs/dev/dev.tfvars\n\n# Now edit dev.tfvars \u0026 dev.backend.hcl with your preferred vars. Note you should change the\n# number at the end of `ice1` to something random because S3 buckets need to be globally unique.\n# Also change the unique string to something random.\n```\n\n3. Run terraform code using `infra-terra/xxx_tfhelperv3.sh`\n\nThis script runs terraform with your chosen TF binary (make sure it is in your PATH) and supplies various options such as backend \u0026 var files, autoapprove, and the TF action to take. Entrypoints are numbered to show the install order.\n\n```shell\ncd infra-terra/\n\n# Get Available entrypoints \u0026 help text by running script without args\n$ ./xxx_tfhelperv3.sh --help\n\nAvailable entrypoints:\n00_setup_terraform_remote_s3_backend_dev\n01b_github_actions_oidc\n01_sls_deployment_bucket\n02_cat_wrangler_s3_buckets\n03_cat_wrangler_backend\n04_create_lambda_permissions\n\nUsage: ./xxx_tfhelperv3.sh\n          terraform_exec=[path_to_terraform]\n          inipath=[path]\n          autoapprove=[yes|no]\n          env=[dev|prod]\n          action=[init|validate|plan|apply|full|destroy]\n\nParameters:\n  terraform_exec   Path to the Terraform executable.\n  inipath          Path from which Terraform is invoked.\n  autoapprove      Whether to auto-approve actions (yes or no).\n  env              Environment (dev or prod).\n  action           Terraform action to perform (init, plan, apply, full, destroy).\n\n\n# Example commands to run for a full TF deployment\n./xxx_tfhelperv3.sh terraform_v1.11.3 02_cat_wrangler_s3_buckets yes 
dev init\n./xxx_tfhelperv3.sh terraform_v1.11.3 02_cat_wrangler_s3_buckets yes dev plan\n./xxx_tfhelperv3.sh terraform_v1.11.3 02_cat_wrangler_s3_buckets yes dev validate\n./xxx_tfhelperv3.sh terraform_v1.11.3 02_cat_wrangler_s3_buckets yes dev apply\n\n# To run the whole thing in one go\n./xxx_tfhelperv3.sh terraform_v1.11.3 02_cat_wrangler_s3_buckets yes dev full\n\n# To destroy\n./xxx_tfhelperv3.sh terraform_v1.11.3 02_cat_wrangler_s3_buckets yes dev destroy\n```\n\n4. Entrypoint descriptions:\n* 00_setup_terraform_remote_s3_backend_dev: Sets up TF remotestate backend with DynamoDB \u0026 S3.\n* 01b_github_actions_oidc: Sets up a [Github OIDC Role](https://docs.github.com/en/actions/security-for-github-actions/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services) so that pipelines can deploy into AWS via Github actions. This is optional as it is only used if running the github actions pipeline.\n* 01_sls_deployment_bucket: Creates a serverless deployment bucket in the root of S3. Ensures the root of S3 does not get cluttered with various serverless deploys.\n* 02_cat_wrangler_s3_buckets: Creates S3 buckets for uploaded images - source, success (dest) \u0026 fail buckets.\n* 03_cat_wrangler_backend: Creates the DynamoDB table.\n* 04_create_lambda_permissions: Creates IAM lambda role and permissions for cat-wrangler.\n\nNote that SSM variables are exported at various stages so that `api_client` and `serverless` can grab variables such as ARNs, env-vars, etc. created during TF deploys.\n\n5. Setting up TF remote backend\n\nOn first run, comment out the S3 backend section. This will generate a local .tfstate file. 
Do this by editing `entrypoints/00_setup_terraform_remote_s3_backend_dev/provider.tf`:\n\n```shell\n# backend \"s3\" {\n#   key     = \"terraform-remotestate-stablecaps-dev/terraform.tfstate\"\n#   encrypt = \"true\"\n# }\n```\n\nThen run:\n\n```shell\n./xxx_tfhelperv3.sh terraform_v1.11.3 00_setup_terraform_remote_s3_backend_dev yes dev full\n\n# Then copy local tfstate file to remote backend.\ncd entrypoints/00_setup_terraform_remote_s3_backend_dev\n\n# Uncomment the s3 backend block in entrypoints/00_setup_terraform_remote_s3_backend_dev/provider.tf so that it reads:\n#   backend \"s3\" {\n#     key     = \"terraform-remotestate-stablecaps-dev/terraform.tfstate\"\n#     encrypt = \"true\"\n#   }\n\n# Then upload tfstate to remote backend and remove it.\nterraform_exec init -backend-config ../../envs/dev/dev.backend.hcl -migrate-state\nrm terraform.tfstate*\n```\n\n*Note: When destroying the backend, you should download the remote tfstate file to the local directory. Then comment out the s3 backend block again. Then run the destroy using the local tfstate.*\n\n5. After this, install the rest of the TF infrastructure using folder numbers as an order guide.\n\n*Note: The permissions in 04_create_lambda_permissions are somewhat broad as this is a dev environment. These permissions would be tightened up via granular permissions in UAT before being deployed to PROD. I would utilise cloudtrail to create [restrictive policies](https://skildops.com/blog/generate-restricted-aws-iam-policy-via-cloudtrail).*\n\n6. `infra-terra` has a `Makefile`. It contains a convenience function to create terraform docs. Run using:\n\n```shell\nmake docs\n```\n\n---\n\n### C. Serverless\n1. Export your **aws admin keys**.\n2. Install serverless using nvm.\n```shell\n# Install nvm.\ncurl https://raw.githubusercontent.com/creationix/nvm/master/install.sh | bash\n\n# follow instructions to add env vars to .profile. Then\nsource ~/.profile\n\n# Find node version to use, and install using nvm. 
I used v18.20.8.\nnvm ls-remote\nnvm install v18.20.8\nnvm use v18.20.8\nnvm alias default v18.20.8\n\n# Install serverless.\nnpm i serverless -g\nserverless update\n```\n\n3. Prepare python3.12 development environment with `shared_helpers`.\n\nThere is a Makefile in the `serverless` directory that will help us install packages. We also need to install the `shared_helpers` module, which is shared with `api_client`, in editable mode (so that code changes are reflected immediately without having to redo a pip install).\n\n```shell\nmake develop\n```\n\nThe make command basically:\n- Creates a virtual env called `venv`.\n- Installs pip-requirements-dev.\n- Installs the `shared_helpers` module (which contains the Boto3 dependency) in editable mode via `pip install -e ../shared_helpers`.\n\n4. Prepare serverless env vars.\n\n```shell\ncd serverless\ncp serverless/config/dev.template.yml serverless/config/dev.yml\n\n# Then edit serverless/config/dev.yml to change env vars. We will get the iam_role_arn\n# from SSM. So you can leave this as some string.\n```\n\nThe `serverless.yml` file is set up to automatically download the following variables from SSM:\n- Deployment_bucket\n- IAM_role_arn\n- Image upload bucket\n- Image success bucket\n- Image fail bucket\n- DynamoDB table name\n\nIf you want to read from the config file instead of SSM, uncomment the code under the string:\n`\"# re-enable to get the deployment bucket from the config file instead of the SSM parameter\"` in `serverless.yml`.\n\n5. Install serverless plugins\n\nThis installs plugins specified in `serverless.yml`.\n```shell\nmake slsplugins\n```\n\n6. Deploying serverless with `shared_helpers`.\n\nWhen deploying our serverless package we also need to create a layer to hold the `shared_helpers` packages. 
There are several Makefile helpers that will assist us in this.\n\n```shell\n# Build layer from ../shared_helpers.\nmake slslayer\n\n# Deploy just serverless without rebuilding layer (use if edits only occur\n# in serverless src files).\nmake slsdeploy\n\n# The previous 2 commands combined (use if shared_helpers has been edited).\nmake slsdeployfull\n\n# run pytest:\nmake pytest\n\n# Generate pytest local coverage files:\nmake pytestcov\n```\n\n---\n\n### D. api_client\n\nThe api_client uses Boto3 to:\n- Upload images to S3.\n- Fetch results from DynamoDB.\n\n1. Export your **aws admin keys**.\n2. Prepare python3.12 development environment with `shared_helpers`.\n\nIt also has a Makefile so you can run:\n\n```shell\nmake develop\n\n# run pytest:\nmake pytest\n\n# Generate pytest local coverage files:\nmake pytestcov\n\n```\n3. Prepare api_client env vars.\n\n```shell\ncd api_client\ncp config/dev_conf_secrets.template config/dev_conf_secrets\n\n# Then edit api_client/config/dev_conf_secrets to change env vars.\n```\nNotes:\n- Instead of passing a secrets file for the program to read its config from, you can use SSM.\n- If using SSM, please ensure the AWS region you wish to use is exported:\n\n```shell\nexport AWS_REGION=eu-west-1\n```\n\n4. Running the api_client\n\nThe client uses the dispatch pattern to read CLI args and has several modes.\n\n```shell\n# show help\n$ ./client_launcher.py --help\n\nusage: .e.g: ./client_launcher.py {--secretsfile [ssm|dev_conf_secrets]} [--debug] {bulkanalyse|result|bulkresults} [\u003cargs\u003e]\n./client_launcher.py --secretsfile ssm --debug bulkanalyse --folder_path bulk_uploads/\n./client_launcher.py --secretsfile ssm result --imgfprint f54c84046c5ad9... 
--batchid 1744370618\n./client_launcher.py --secretsfile ssm bulkresults --batchfile logs/stablecaps900_batch-1744377772.json\n\nICE Cat API Client\n\npositional arguments:\n  {bulkanalyse,result,bulkresults}\n    bulkanalyse         Bulk upload images from local directory to AWS S3 bucket\n    result              Get results from AWS Lambda results function\n    bulkresults         Upload local image to AWS Lambda analyse function\n\noptions:\n  -h, --help            show this help message and exit\n  --secretsfile SECRETSFILE, -s SECRETSFILE\n                        Secrets file name located in config folder_path to load environment variables from, or 'ssm' to fetch from AWS SSM\n                        Parameter Store.\n  --debug, -d           Debug mode. Set to True to enable debug output.\n```\n\n#### bulkanalyse subcommand help\n```shell\n$ ./client_launcher.py bulkanalyse --help\n\nusage: .e.g: ./client_launcher.py {--secretsfile [ssm|dev_conf_secrets]} [--debug] {bulkanalyse|result|bulkresults} [\u003cargs\u003e]\n./client_launcher.py --secretsfile ssm --debug bulkanalyse --folder_path bulk_uploads/\n./client_launcher.py --secretsfile ssm result --imgfprint f54c84046c5ad9... --batchid 1744370618\n./client_launcher.py --secretsfile ssm bulkresults --batchfile logs/stablecaps900_batch-1744377772.json bulkanalyse\n       [-h] --folder FOLDER_PATH\n\noptions:\n  -h, --help            show this help message and exit\n  --folder FOLDER_PATH, -f FOLDER_PATH\n                        Path to the local folder containing images to upload.\n```\n\n#### result subcommand help\n```shell\n$ ./client_launcher.py result --help\n\nusage: .e.g: ./client_launcher.py {--secretsfile [ssm|dev_conf_secrets]} [--debug] {bulkanalyse|result|bulkresults} [\u003cargs\u003e]\n./client_launcher.py --secretsfile ssm --debug bulkanalyse --folder_path bulk_uploads/\n./client_launcher.py --secretsfile ssm result --imgfprint f54c84046c5ad9... 
--batchid 1744370618\n./client_launcher.py --secretsfile ssm bulkresults --batchfile logs/stablecaps900_batch-1744377772.json result\n       [-h] --batchid BATCH_ID --imgfprint IMG_FPRINT\n\noptions:\n  -h, --help            show this help message and exit\n  --batchid BATCH_ID, -b BATCH_ID\n                        Batch ID to get results for. e.g. 1234567890\n  --imgfprint IMG_FPRINT, -p IMG_FPRINT\n                        Image fingerprint hash to get results for. e.g. a91c54f1f00...\n```\n\n#### bulkresults subcommand help\n```shell\n$ ./client_launcher.py result --help\n\nusage: .e.g: ./client_launcher.py {--secretsfile [ssm|dev_conf_secrets]} [--debug] {bulkanalyse|result|bulkresults} [\u003cargs\u003e]\n./client_launcher.py --secretsfile ssm --debug bulkanalyse --folder_path bulk_uploads/\n./client_launcher.py --secretsfile ssm result --imgfprint f54c84046c5ad9... --batchid 1744370618\n./client_launcher.py --secretsfile ssm bulkresults --batchfile logs/stablecaps900_batch-1744377772.json result\n       [-h] --batchid BATCH_ID --imgfprint IMG_FPRINT\n\noptions:\n  -h, --help            show this help message and exit\n  --batchid BATCH_ID, -b BATCH_ID\n                        Batch ID to get results for. e.g. 1234567890\n  --imgfprint IMG_FPRINT, -p IMG_FPRINT\n                        Image fingerprint hash to get results for. e.g. 
a91c54f1f00...\n```\n\n#### Command examples\n```shell\n# Bulk upload files to S3 source/incoming bucket.\n./client_launcher.py --secretsfile ssm --debug bulkanalyse --folder bulk_uploads/\n\n\n# Get the results for a single uploaded image (you need to know imgfprint \u0026 batchid).\n./client_launcher.py --secretsfile ssm result --imgfprint 0eaf1da24040970c6396ca59488ad7fa739ef7ab4ee1f757f180dade9adc43cf --batchid 1744481929\n\n# Get the results for multiple uploaded images from a previous run of bulkanalyse (you need the bulkanalyse batch file).\n./client_launcher.py --secretsfile ssm bulkresults --batchfile logs/stablecaps900_batch-1744377772.json\n```\n\n5. How the api_client integrates with DynamoDB:\n\nThe DB uses `batch_id` as the partition key and `img_fprint` as the sort key.\n\n- batch_id: Every time the client is run a new random `batch_id` gets created.\n- img_fprint: This is the file hash of the image (sha256).\n- client_id: Each instance of the client has a `client_id`. This id can be manually set at `api_client/config/client_id`. If this file is not present, the program will automatically generate one using the format `stablecaps_$random_3digits` on first run.\n- debug: When a power user adds the debug flag, the S3 key has a `-debug` string added to the end. This allows the cat-wrangler-lambda to know that logs should be saved in the DB for that particular file.\n\n**logs folder:** When uploading images, the client stores details of the uploads as a list of dicts in json format. This can be used to keep track of jobs via `batch_id`. Effectively, the DB partition key \u0026 sort key are logged per image upload. If desired, the PK \u0026 SK can then be retrieved for specific files to query DynamoDB using the `result` subcommand. 
Additionally, the entire log can be used as input into the `bulkresults` subcommand to get multiple results in one go.\n\n**debug logs:** When the `--debug` flag is supplied, debug logs from serverless are collected and written to the logs folder. e.g.: `api_client/logs/stablecaps900_1744561671-debug-logs.json`\n\nBatch file example:\n\n```json\n[\n    {\n        \"client_id\": \"stablecaps900\",\n        \"batch_id\": \"batch-1744377772\",\n        \"s3bucket_source\": \"cat-wrangler-source-ice1-dev\",\n        \"s3_key\": \"0eaf1da24040970c6396ca59488ad7fa739ef7ab4ee1f757f180dade9adc43cf/stablecaps900/batch-1744377772/2025-04-11-13/1744377772-debug.png\",\n        \"original_file_name\": \"siberian-cats-for-sale-siberian-kitten-malechampion-bloodline-hampstead-garden-suburb-london-image-2.webp.png\",\n        \"upload_time\": \"2025-04-11-13\",\n        \"img_fprint\": \"0eaf1da24040970c6396ca59488ad7fa739ef7ab4ee1f757f180dade9adc43cf\",\n        \"epoch_timestamp\": 1744377772\n    }\n]\n```\n\n---\n\n### E. Tests\nTests are run using pytest in the `serverless`, `api_client` \u0026 `shared_helpers` directories using the following make commands:\n\n```shell\n# Run tests\nmake pytest\n\n# generate html and xml coverage reports\nmake pytestcov\n\n# view coverage by running from serverless api_client \u0026 shared_helpers directories:\ngoogle-chrome htmlcov/index.html\n```\n\n---\n\n### F. Github Actions Pipeline\nThe deployment pipeline file is located at `.github/workflows/deploy_ice_cat_wrangler.yml`.\n- Deploys all terraform \u0026 serverless components.\n- Deals with secrets by leveraging `secrets.txt`, `secrets_encryptor.sh` \u0026 `secrets_decryptor.sh` located in the root of the repo.\n\n**Secrets:**\n\nWe need to run CI, but do not want to store unencrypted secrets or config values in the repository. So:\n\n```shell\n# 1. Populate secrets.txt with config file paths.\n\n# 2. 
Run the following locally to encrypt secrets into a file with the .enc extension.\n./secrets_encryptor.sh path/to/ice-cat-secrets-pass.txt\n\n# 3. When the ci/cd system runs, it decrypts the secrets back into the correct config file using:\n./secrets_decryptor.sh path/to/ice-cat-secrets-pass.txt\n```\n\n---\n\n### G. Pre-commit hooks\nNote that the `api_client`, `shared_helpers`, and `serverless` virtual envs have pre-commit installed to perform various checks to make sure code follows best practices.\n- The root of the repo contains the config for this in `.pre-commit-config.yaml`.\n- When you run `make develop` for any of these environments, the build script automatically runs `pre-commit install`.\n\n\n### H. Axioms for S3 Design\nI assume that if performance is important, the program should implement or add the following features:\n1. [Click for S3 performance](https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance.html):\n    - Multiple prefixes.\n    - Increase throughput via multiple client connections to upload multiple images asynchronously.\n    - Client should handle 503 slowdown messages.\n    - Check that requests are being spread over a wide pool of Amazon S3 IP addresses.\n    - If using CloudFront, see if S3 transfer acceleration is beneficial.\n2. [Click for S3 naming](https://aws.amazon.com/blogs/big-data/building-and-maintaining-an-amazon-S3-metadata-index-without-servers/):\n    - Name with high cardinality for performance.\n    - Save the hash of the file in the event that we need an option to avoid reprocessing.\n    - Uniquely rename the file to prevent filename clashes in S3 (use the hash).\n\n\n### **[Click for Detailed Design S3 keys](docs/design_s3_key_naming.md)**\n\n---\n\n### I. Axioms for DB Design\n\n1. The image is kept in persistent storage.\n2. 
Scanning is a non-blocking operation:\n   - Results cannot therefore be returned in the same POST call.\n   - A separate results call is needed (should handle single and batch image submission results).\n3. User can check the result of image scanning:\n   - Assume this check can be performed at any time.\n   - The client stores a record of:\n     - `batch_id`\n     - `original_file_name`\n     - `upload_time=\"YYYY-MM-DD-HH\"`\n     - `img_fprint`\n     - `epoch_timestamp`\n   - For the database:\n     - Multiple clients/customers should have unique IDs to assist with searching.\n\n### **[Click for Detailed Design DynamoDB](docs/design_dynamodb.md)**\n\n---\n\n## Things to do\n- bump app version automatically.\n- ~~Pre-commit setup.~~\n- S3 lifecycle: Delete old objects (14 days).\n- DynamoDB (DDB):\n  - ~~TTL to delete old entries (14 days).~~\n  - ~~Autoscaling to handle load. (To be done on provisioned setup)~~\n  - Handle deletion of items from the bucket: Delete corresponding entries from DynamoDB.\n  - Implement other api_client dynamodb query methods.\n  - Still need to store original filename attribute (before client rename)\n- Reduce logging to save costs.\n- Finish tests.\n- ~~atexit not behaving as expected in lambda env - investigate~~\n- Re-raise final exception to allow lambda to handle retries. need additional infra like SQS DLQ\n- Fix todo's.\n- Implement lambda alarms in cloudwatch for errors, latency, timeouts, out of memory, etc.\n- Tighten up AWS perms.\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstablecaps%2Fice-cat-wrangler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstablecaps%2Fice-cat-wrangler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstablecaps%2Fice-cat-wrangler/lists"}