{"id":27871003,"url":"https://github.com/ucl-arc/terraform-aws-batch-processing","last_synced_at":"2026-04-29T00:32:06.171Z","repository":{"id":183442285,"uuid":"610155855","full_name":"UCL-ARC/terraform-aws-batch-processing","owner":"UCL-ARC","description":null,"archived":false,"fork":false,"pushed_at":"2023-09-01T15:18:59.000Z","size":259,"stargazers_count":1,"open_issues_count":4,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-04T23:34:22.871Z","etag":null,"topics":["arc","aws","cloud","terraform","ucl","ucl-arc"],"latest_commit_sha":null,"homepage":"","language":"HCL","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UCL-ARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2023-03-06T07:53:25.000Z","updated_at":"2023-09-07T13:17:32.000Z","dependencies_parsed_at":"2025-05-04T23:42:56.954Z","dependency_job_id":null,"html_url":"https://github.com/UCL-ARC/terraform-aws-batch-processing","commit_stats":null,"previous_names":["ucl-arc/terraform-aws-batch-processing"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/UCL-ARC/terraform-aws-batch-processing","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCL-ARC%2Fterraform-aws-batch-processing","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCL-ARC%2Fterraform-aws-batch-processing/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCL-ARC%2Fterraform-aws-batch-processing/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCL-ARC%2Fterraform-aws-batch-processing/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UCL-ARC","download_url":"https://codeload.github.com/UCL-ARC/terraform-aws-batch-processing/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UCL-ARC%2Fterraform-aws-batch-processing/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32405901,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-28T19:38:08.556Z","status":"ssl_error","status_checked_at":"2026-04-28T19:37:55.688Z","response_time":56,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arc","aws","cloud","terraform","ucl","ucl-arc"],"created_at":"2025-05-04T23:29:56.500Z","updated_at":"2026-04-29T00:32:06.156Z","avatar_url":"https://github.com/UCL-ARC.png","language":"HCL","funding_links":[],"categories":[],"sub_categories":[],"readme":"# terraform-aws-batch-processing\n## Purpose\n A collection of terraform modules which creates a generic data processing pipeline. 
This is a recurring infrastructure pattern in both academic and commercial settings.\n The codebase is designed to be modular, but care is needed when only a subset of the modules is used.\n\n## Description\n\nThe containerised workflow consists of three high-level steps: upload a file (image or text-based), trigger an automatic workflow which processes the uploaded data, and then return the resultant output. To capture this workflow in a modern serverless design, the following architecture was developed. \n![Schematic of the infrastructure](TF_schematic.png)\n\n- An S3 bucket with intelligent storage options and a Lambda trigger is set up to receive the uploaded data. When data (file, image, etc.) arrives in the S3 bucket, a Lambda function is triggered which starts the AWS Step Functions state machine.\n- The AWS Step Functions state machine consists of three states: \n    1. A DataSync task for copying the contents of the upload S3 bucket to EFS.\n    2. An AWS Batch job which launches an AWS Fargate containerized task.\n    3. A DataSync task to copy the resultant output from the Batch job on EFS to a reports S3 bucket.\n- An S3 bucket, denoted as Reports S3 in the schematic figure, provides storage for the processed output from the container.\n- AppStream 2.0 gives a user a virtual desktop within the VPC. This service is included to enable quality control to be carried out on the containerized workflow.\n\n## Installation Steps\n\n### Prerequisites \n- An [AWS account](https://console.aws.amazon.com.). \n- The [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) installed. \n- The [Terraform CLI](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/install-cli) (1.3.9+, as listed under Requirements) installed. \n\nEnsure that your IAM credentials can be used to authenticate the Terraform AWS provider. Details can be found in the Terraform [documentation](https://developer.hashicorp.com/terraform/tutorials/aws-get-started/aws-build). 
\n\n### Steps to set up the infrastructure \n1. To enable AppStream 2.0, first create the appropriate stacks and fleets. This can be done through the management [console](https://eu-west-2.console.aws.amazon.com/appstream2). Make a note of the AppStream 2.0 image name. \n2. Clone this GitHub repo. \n3. Navigate to the cloned repo in the terminal and run the following commands: \n- `terraform init` \n- `terraform validate`\n- `terraform apply`\n \n4. When prompted for `as2_image_name`, pass the value from Step 1. \n5. A complete list of the AWS resources which will be deployed appears. Check this list before agreeing to the deployment (**Please be advised this will incur a cost**). \n\nIf this is your first deployment, please note that it will take some time.  \n\n## Details on Step Functions \nAs stated, the Step Functions state machine consists of three main tasks: \n1. A DataSync task between the S3 uploads bucket and EFS.\n2. An AWS Batch job which launches a Fargate task.\n3. A DataSync task to copy data from EFS to the S3 reports bucket. \n\nHowever, to ensure that each DataSync task finishes before the state machine moves to the next state, a polling and wait pattern has been implemented.\n\n## Test the deployment \nThe best way to test the deployment is to navigate to the uploads S3 bucket in the management console. From here you can upload an example file to the bucket. This action should automatically trigger the Step Functions state machine. To see this, open Step Functions in the management console, where a running execution should have started. Once the workflow has completed, the processed file appears in the reports S3 bucket. \n\n## Cleaning up \n\nTo avoid incurring future charges, delete the resources (for example, by running `terraform destroy`). \n\n\n## Warning\n Please be advised that to use AppStream 2.0 you will need to follow the console setup at [the AppStream 2.0 console](https://eu-west-2.console.aws.amazon.com/appstream2) before deploying the Terraform configuration. 
\n\u003c!-- BEGIN_TF_DOCS --\u003e\n## Requirements\n\n| Name | Version |\n|------|---------|\n| \u003ca name=\"requirement_terraform\"\u003e\u003c/a\u003e [terraform](#requirement\\_terraform) | \u003e= 1.3.9 |\n| \u003ca name=\"requirement_aws\"\u003e\u003c/a\u003e [aws](#requirement\\_aws) | \u003e= 4.9.0 |\n\n## Providers\n\nNo providers.\n\n## Modules\n\n| Name | Source | Version |\n|------|--------|---------|\n| \u003ca name=\"module_appstream\"\u003e\u003c/a\u003e [appstream](#module\\_appstream) | ./modules/appstream | n/a |\n| \u003ca name=\"module_batch\"\u003e\u003c/a\u003e [batch](#module\\_batch) | ./modules/batch | n/a |\n| \u003ca name=\"module_datasync\"\u003e\u003c/a\u003e [datasync](#module\\_datasync) | ./modules/datasync | n/a |\n| \u003ca name=\"module_efs\"\u003e\u003c/a\u003e [efs](#module\\_efs) | ./modules/efs | n/a |\n| \u003ca name=\"module_s3_reports\"\u003e\u003c/a\u003e [s3\\_reports](#module\\_s3\\_reports) | ./modules/s3_reports | n/a |\n| \u003ca name=\"module_s3_upload\"\u003e\u003c/a\u003e [s3\\_upload](#module\\_s3\\_upload) | ./modules/s3_upload | n/a |\n| \u003ca name=\"module_step_function\"\u003e\u003c/a\u003e [step\\_function](#module\\_step\\_function) | ./modules/step_function | n/a |\n| \u003ca name=\"module_vpc\"\u003e\u003c/a\u003e [vpc](#module\\_vpc) | ./modules/vpc | n/a |\n\n## Resources\n\nNo resources.\n\n## Inputs\n\n| Name | Description | Type | Default | Required |\n|------|-------------|------|---------|:--------:|\n| \u003ca name=\"input_as2_desired_instance_num\"\u003e\u003c/a\u003e [as2\\_desired\\_instance\\_num](#input\\_as2\\_desired\\_instance\\_num) | Desired number of AS2 instances | `number` | `1` | no |\n| \u003ca name=\"input_as2_fleet_description\"\u003e\u003c/a\u003e [as2\\_fleet\\_description](#input\\_as2\\_fleet\\_description) | Fleet description | `string` | `\"ARC batch process fleet\"` | no |\n| \u003ca name=\"input_as2_fleet_display_name\"\u003e\u003c/a\u003e 
[as2\\_fleet\\_display\\_name](#input\\_as2\\_fleet\\_display\\_name) | Fleet display name | `string` | `\"ARC batch process fleet\"` | no |\n| \u003ca name=\"input_as2_fleet_name\"\u003e\u003c/a\u003e [as2\\_fleet\\_name](#input\\_as2\\_fleet\\_name) | Fleet name | `string` | `\"ARC-batch-fleet\"` | no |\n| \u003ca name=\"input_as2_image_name\"\u003e\u003c/a\u003e [as2\\_image\\_name](#input\\_as2\\_image\\_name) | AS2 image to deploy | `string` | n/a | yes |\n| \u003ca name=\"input_as2_instance_type\"\u003e\u003c/a\u003e [as2\\_instance\\_type](#input\\_as2\\_instance\\_type) | AS2 instance type | `string` | `\"stream.standard.medium\"` | no |\n| \u003ca name=\"input_as2_stack_description\"\u003e\u003c/a\u003e [as2\\_stack\\_description](#input\\_as2\\_stack\\_description) | Stack description | `string` | `\"ARC batch process stack\"` | no |\n| \u003ca name=\"input_as2_stack_display_name\"\u003e\u003c/a\u003e [as2\\_stack\\_display\\_name](#input\\_as2\\_stack\\_display\\_name) | Stack display name | `string` | `\"ARC batch process stack\"` | no |\n| \u003ca name=\"input_as2_stack_name\"\u003e\u003c/a\u003e [as2\\_stack\\_name](#input\\_as2\\_stack\\_name) | Stack name | `string` | `\"ARC-batch-stack\"` | no |\n| \u003ca name=\"input_compute_environments\"\u003e\u003c/a\u003e [compute\\_environments](#input\\_compute\\_environments) | Compute environments | `string` | `\"fargate\"` | no |\n| \u003ca name=\"input_compute_resources_max_vcpus\"\u003e\u003c/a\u003e [compute\\_resources\\_max\\_vcpus](#input\\_compute\\_resources\\_max\\_vcpus) | Maximum vCPU resources | `number` | `1` | no |\n| \u003ca name=\"input_container_image_url\"\u003e\u003c/a\u003e [container\\_image\\_url](#input\\_container\\_image\\_url) | Container image URL | `string` | `\"public.ecr.aws/docker/library/busybox:latest\"` | no |\n| \u003ca name=\"input_container_memory\"\u003e\u003c/a\u003e [container\\_memory](#input\\_container\\_memory) | Container memory resources | `number` | `2048` | no 
|\n| \u003ca name=\"input_container_vcpu\"\u003e\u003c/a\u003e [container\\_vcpu](#input\\_container\\_vcpu) | Container vCPU resources | `number` | `1` | no |\n| \u003ca name=\"input_efs_throughput_in_mibps\"\u003e\u003c/a\u003e [efs\\_throughput\\_in\\_mibps](#input\\_efs\\_throughput\\_in\\_mibps) | EFS provisioned throughput in MiB/s | `number` | `1` | no |\n| \u003ca name=\"input_efs_transition_to_ia_period\"\u003e\u003c/a\u003e [efs\\_transition\\_to\\_ia\\_period](#input\\_efs\\_transition\\_to\\_ia\\_period) | Lifecycle policy transition period to IA | `string` | `\"AFTER_7_DAYS\"` | no |\n| \u003ca name=\"input_region\"\u003e\u003c/a\u003e [region](#input\\_region) | The region to deploy into. | `string` | `\"eu-west-2\"` | no |\n| \u003ca name=\"input_solution_name\"\u003e\u003c/a\u003e [solution\\_name](#input\\_solution\\_name) | Overall name for the solution | `string` | `\"arc-batch\"` | no |\n| \u003ca name=\"input_vpc_cidr_block\"\u003e\u003c/a\u003e [vpc\\_cidr\\_block](#input\\_vpc\\_cidr\\_block) | The CIDR block for the VPC | `string` | `\"10.0.0.0/25\"` | no |\n\n## Outputs\n\nNo outputs.\n\u003c!-- END_TF_DOCS --\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucl-arc%2Fterraform-aws-batch-processing","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fucl-arc%2Fterraform-aws-batch-processing","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fucl-arc%2Fterraform-aws-batch-processing/lists"}