https://github.com/manics/nextflow-aws-batch-tutorial
A basic example of using Nextflow with AWS Batch
- Host: GitHub
- URL: https://github.com/manics/nextflow-aws-batch-tutorial
- Owner: manics
- Created: 2021-04-20T17:46:07.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2021-06-16T16:13:46.000Z (about 5 years ago)
- Last Synced: 2025-01-22T00:46:46.796Z (over 1 year ago)
- Language: Nextflow
- Size: 3.91 KB
- Stars: 0
- Watchers: 3
- Forks: 1
- Open Issues: 0
# Running Nextflow jobs on AWS Batch
Notes on getting a Nextflow pipeline to run on AWS Batch.
See also the Nextflow AWS documentation: https://www.nextflow.io/docs/latest/awscloud.html
Note that, at the time of writing, Nextflow [did not support Fargate](https://groups.google.com/g/nextflow/c/JFneg8d3x2w?pli=1), so you must use the `EC2` or `SPOT` compute resource types.
## Setting up a batch queue
Create a compute environment: https://docs.aws.amazon.com/cli/latest/reference/batch/create-compute-environment.html
Get (or create) subnets:
aws ec2 describe-subnets --query 'Subnets[].SubnetId'
Get the default security group (or alternatively create a new one):
aws ec2 describe-security-groups --group-names default
Get the AWS Batch service role ARN. This role can be created automatically through the AWS console by creating and then deleting a Batch compute environment, or you can [create it manually](https://docs.aws.amazon.com/batch/latest/userguide/service_IAM_role.html):
aws iam get-role --role-name AWSServiceRoleForBatch
Check that you have the AWS ECS instance and spot fleet roles.
These can be created automatically through the AWS console by creating and then deleting an ECS spot cluster, or you can create them manually: [ECS instance role](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/instance_IAM_role.html), [ECS spot fleet role](https://docs.aws.amazon.com/batch/latest/userguide/spot_fleet_IAM_role.html).
aws iam get-role --role-name ecsInstanceRole
aws iam get-role --role-name ecsSpotFleetRole
However, the `ecsInstanceRole` does not contain the S3 permissions required by Nextflow, so you either need to augment that role or, preferably, create a new role and instance profile named `nextflowEcsInstanceRole`:
aws iam create-role --role-name nextflowEcsInstanceRole --assume-role-policy-document file://nextflowEcsInstanceRole-assume-role-policy.json
aws iam attach-role-policy --role-name nextflowEcsInstanceRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
aws iam attach-role-policy --role-name nextflowEcsInstanceRole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
aws iam create-instance-profile --instance-profile-name nextflowEcsInstanceRole
aws iam add-role-to-instance-profile --instance-profile-name nextflowEcsInstanceRole --role-name nextflowEcsInstanceRole
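The `create-role` command above reads its trust policy from `nextflowEcsInstanceRole-assume-role-policy.json`, whose contents are not shown in these notes. As a minimal sketch, assuming it is the standard trust policy that lets EC2 instances assume the role (the file shipped with the tutorial may differ), it could be generated like this:

```shell
# Write a trust policy that allows EC2 instances to assume the role.
# Assumption: this mirrors the usual ECS instance role trust policy.
# Note: running this overwrites any existing file of the same name.
cat > nextflowEcsInstanceRole-assume-role-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
```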
The above role has full S3 access. In production you may want to limit access to a single bucket.
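One way to restrict the role to a single bucket is an inline policy in place of `AmazonS3FullAccess`. The following is a sketch only: `write_bucket_policy` is a hypothetical helper, and the list of S3 actions is an illustrative guess at what a pipeline needs rather than something taken from the tutorial or from Nextflow's documentation.

```shell
# Write an IAM policy document limited to one bucket.
# Usage: write_bucket_policy BUCKET_NAME OUTPUT_FILE
# Assumption: listing the bucket plus reading, writing and deleting
# objects is sufficient; your pipeline may need further actions.
write_bucket_policy() {
  bucket="$1"
  output="$2"
  cat > "$output" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

For example, `write_bucket_policy BUCKET_NAME nextflow-s3-policy.json` followed by `aws iam put-role-policy --role-name nextflowEcsInstanceRole --policy-name nextflowBucketAccess --policy-document file://nextflow-s3-policy.json` would attach it; the file and policy names here are placeholders.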
Edit [`batch-compute-environment.json`](./batch-compute-environment.json) and replace the following placeholders with the values found above:
- `SUBNET_IDS`
- `SECURITY_GROUP_IDS`
- `AWS_BATCH_SERVICE_ROLE_ARN`
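The replacements listed above can also be scripted. This is a minimal sketch, assuming the placeholders appear in the JSON file as the literal strings `SUBNET_IDS`, `SECURITY_GROUP_IDS` and `AWS_BATCH_SERVICE_ROLE_ARN`, and that one subnet and one security group are enough; `fill_placeholders` is a hypothetical helper, not part of the tutorial.

```shell
# Substitute the three placeholders in a template and write the result
# to a new file, leaving the template untouched.
# Usage: fill_placeholders TEMPLATE OUTPUT SUBNET_ID SECURITY_GROUP_ID ROLE_ARN
# '|' is used as the sed delimiter because ARNs contain '/' and ':'.
fill_placeholders() {
  template="$1"
  output="$2"
  subnet="$3"
  security_group="$4"
  role_arn="$5"
  sed -e "s|SUBNET_IDS|${subnet}|g" \
      -e "s|SECURITY_GROUP_IDS|${security_group}|g" \
      -e "s|AWS_BATCH_SERVICE_ROLE_ARN|${role_arn}|g" \
      "$template" > "$output"
}
```

Writing to a separate output file keeps the template reusable and makes it easy to diff the result before passing it to `aws batch create-compute-environment`.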
Now create the compute environment:
aws batch create-compute-environment --cli-input-json file://batch-compute-environment-spot.json
Create the job queue:
aws batch create-job-queue --job-queue-name TEST-nextflow-batch-queue --state ENABLED --priority 1 --compute-environment-order order=1,computeEnvironment=TEST-nextflow-batch-compute-m4
## Storage bucket
Nextflow with AWS Batch requires an S3 location to store its outputs.
If you don't already have a location, create a new bucket:
aws s3 mb s3://BUCKET_NAME
## Nextflow task container
Nextflow on AWS Batch requires the AWS CLI to be present either in the Docker image used for executing tasks or in the Docker host AMI. The latter is recommended because it lets you use unmodified task images, but for now build and push an image that includes the AWS CLI, replacing `DOCKER_USER` with your Docker registry username:
docker build -t DOCKER_USER/nextflow-test:latest ./docker-image
docker push DOCKER_USER/nextflow-test:latest
Edit the `container` lines in [`tutorial.nf`](./tutorial.nf) to `DOCKER_USER/nextflow-test:latest`.
## Running
nextflow run tutorial.nf -bucket-dir s3://BUCKET_NAME/some/path
Note that if you are using temporary AWS session credentials, [setting them with environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) does not work](https://github.com/nextflow-io/nextflow/issues/1724). Instead, add the temporary credentials to a profile in your `~/.aws/credentials` file and set `AWS_PROFILE` to the name of that profile.
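Copying session credentials into a profile can be scripted. The sketch below assumes the three credential variables are currently set in your shell; `write_temp_profile` and the profile name used afterwards are hypothetical. It appends to the file named by `AWS_SHARED_CREDENTIALS_FILE` if that is set, otherwise to `~/.aws/credentials`, and does not check whether a profile of the same name already exists.

```shell
# Append a named profile holding the current session credentials.
# Usage: write_temp_profile PROFILE_NAME
# Reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN
# from the environment. Does not remove or update an existing profile.
write_temp_profile() {
  profile="$1"
  creds_file="${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}"
  mkdir -p "$(dirname "$creds_file")"
  cat >> "$creds_file" <<EOF

[${profile}]
aws_access_key_id = ${AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${AWS_SECRET_ACCESS_KEY}
aws_session_token = ${AWS_SESSION_TOKEN}
EOF
  # The file contains secrets, so restrict it to the current user.
  chmod 600 "$creds_file"
}
```

After `write_temp_profile nextflow-tmp`, the pipeline could then be started with `AWS_PROFILE=nextflow-tmp nextflow run tutorial.nf -bucket-dir s3://BUCKET_NAME/some/path`.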
You can optionally [enable tracing](https://www.nextflow.io/docs/latest/tracing.html) by adding flags `-with-report out.html` and/or `-with-trace`.
## Fetching results
List all files in the S3 bucket recursively:
aws s3 ls --recursive s3://BUCKET_NAME/some/path
Copy all files to a local directory `dest`:
aws s3 cp --recursive s3://BUCKET_NAME/some/path dest
## Clean up
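Disable and delete the job queue, then disable and delete the compute environment. State changes are not instant, so a delete may fail if it is run immediately after the corresponding update: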
aws batch update-job-queue --job-queue TEST-nextflow-batch-queue --state DISABLED
aws batch delete-job-queue --job-queue TEST-nextflow-batch-queue
aws batch update-compute-environment --compute-environment TEST-nextflow-batch-compute-m4 --state DISABLED
aws batch delete-compute-environment --compute-environment TEST-nextflow-batch-compute-m4
## Additional options
- Specify a [launch template](https://docs.aws.amazon.com/batch/latest/userguide/launch-templates.html) in the compute environment to customise an AMI at launch time without rebuilding it.
- You can set a container or AWS Batch job definition in `nextflow.config` instead of in the Nextflow pipeline script.
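As a sketch of the `nextflow.config` option, the helper below writes a config that sets the executor, queue, container and region for every process. `write_nextflow_config` is a hypothetical helper, and it assumes these settings are not already defined in `tutorial.nf` or an existing config; the image and region you pass in are your own values.

```shell
# Write a Nextflow config that sends all processes to AWS Batch.
# Usage: write_nextflow_config OUTPUT_FILE QUEUE_NAME CONTAINER_IMAGE AWS_REGION
# Note: overwrites OUTPUT_FILE if it already exists.
write_nextflow_config() {
  output="$1"
  queue="$2"
  container="$3"
  region="$4"
  cat > "$output" <<EOF
// Settings applied to every process in the pipeline
process {
    executor = 'awsbatch'
    queue = '${queue}'
    container = '${container}'
}

aws {
    region = '${region}'
}
EOF
}
```

For example, `write_nextflow_config nextflow.config TEST-nextflow-batch-queue IMAGE REGION`, with `IMAGE` and `REGION` replaced by your task image and AWS region. To use an existing AWS Batch job definition instead of an image, Nextflow accepts a container value of the form `job-definition://NAME`.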