{"id":23768568,"url":"https://github.com/nci-gdc/gpas-aws-workflow-runner","last_synced_at":"2026-03-27T19:30:18.660Z","repository":{"id":42155603,"uuid":"259136739","full_name":"NCI-GDC/gpas-aws-workflow-runner","owner":"NCI-GDC","description":"Repository contains steps and scripts to execute GPAS workflows on EC2 instances. ","archived":false,"fork":false,"pushed_at":"2023-04-12T05:20:55.000Z","size":290,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":14,"default_branch":"develop","last_synced_at":"2025-01-01T01:37:28.073Z","etag":null,"topics":["devops","gpas"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/NCI-GDC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-26T21:27:23.000Z","updated_at":"2020-12-13T17:31:38.000Z","dependencies_parsed_at":"2023-01-20T04:04:43.398Z","dependency_job_id":null,"html_url":"https://github.com/NCI-GDC/gpas-aws-workflow-runner","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgpas-aws-workflow-runner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgpas-aws-workflow-runner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgpas-aws-workflow-runner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/NCI-GDC%2Fgpas-aws-workflow-runner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/NCI-GDC","download_url":"https://codeload
.github.com/NCI-GDC/gpas-aws-workflow-runner/tar.gz/refs/heads/develop","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239946915,"owners_count":19723018,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["devops","gpas"],"created_at":"2025-01-01T01:37:35.761Z","updated_at":"2026-03-27T19:30:18.601Z","avatar_url":"https://github.com/NCI-GDC.png","language":"Ruby","funding_links":[],"categories":[],"sub_categories":[],"readme":"# GDC Workflow Runner\n\n## Overview\n\n- GDC workflows are written in Common Workflow Language (CWL) and can be found in the [NCI-GDC GitHub organisation](https://github.com/NCI-GDC/).\n\n- GDC workflows are used in production with the GDC Pipeline Automation System (GPAS). For the 4 workflows that need to be tested, we created external user entrypoints that can be used independently of GPAS. 
Check the README in each repo for more details.\n  - [DNA alignment](https://github.com/NCI-GDC/gdc-dnaseq-cwl/tree/feat/BINF-309)\n    - Converts user-submitted DNA-Seq (WGS, WXS) BAM files into a GDC re-aligned BAM file.\n    - Other files, such as a BAI file and alignment metrics, are also generated.\n  - [WGS variant calling](https://github.com/NCI-GDC/gdc-sanger-somatic-cwl)\n    - Accepts a pair of tumor and normal WGS BAM files and derives somatic mutations in VCF/TSV/BEDPE, and other outputs.\n  - [WXS variant calling](https://github.com/NCI-GDC/gdc-somatic-variant-calling-workflow)\n    - Accepts a pair of tumor and normal WXS BAM files and derives somatic mutations in VCF, and other outputs.\n  - [RNA alignment](https://github.com/NCI-GDC/gdc-rnaseq-cwl/tree/feat/etl)\n    - Accepts BAM or FASTQ inputs and derives 3 different BAMs, a quantification TSV, a spliceJunction TSV, and other outputs.\n\n- GDC workflows pull Docker images. All external images are public, and internal images are hosted on quay.io. We have created a quay group to share the required images with the APS team for testing purposes. (We will need the quay IDs of AWP team members to add them to this group.)\n\n- GDC workflows require input molecular files, which are stored in the `uchig-genomics-pipeline-us-east-1` S3 bucket.\n\n- GDC workflows require other reference files (such as the human genome sequence), which are also stored in the `uchig-genomics-pipeline-us-east-1` bucket.\n\n_Figure 1: Overview of GDC workflow_\n![Figure 1](assets/gdc_workflow_figure.png)\n\nThe first workflow that we will run is a DNA-Seq alignment workflow on a 2.5Gb WGS BAM file.\n\n## Prereqs\n\n- **EC2** instance resources depend on the type of workflow being run and the size of the input file. 
For this run (we used c5d.4xlarge):\n  - cpus \u003e 4\n  - ram \u003e 12 GB\n  - disk space \u003e 50 GB\n- Access to the gdc-dnaseq-cwl workflow on GitHub\n- Access to the **uchig-genomics-pipeline-us-east-1** bucket.\n- Requirements on the instance:\n  - awscli\n  - docker\n  - Access to quay (for Docker images)\n  - python\n  - cwltool\n  - nodejs\n\nWe have checked in a Chef cookbook (gpas-worker) that can be used to build an AMI with all the requirements baked in. You can find the instructions [here](packer/README.md).\n\n\n## Running the workflow\n\n### Download requirements\n\nPull the required repositories.\n\n- The DNA-Seq alignment workflow\n\n```\ngit clone -b feat/BINF-309 git@github.com:NCI-GDC/gdc-dnaseq-cwl.git\n```\n\n- Scripts to run the workflow\n```\ngit clone git@github.com:NCI-GDC/gpas-aws-workflow-runner.git\n```\n\n```\ncd gpas-aws-workflow-runner/workflows/\n./download-input-files.sh\n```\n\n- Pack the CWL workflow into a JSON file. We use this internally to pass the workflow as a payload.\n```\n./pack-workflow.sh /path/to/gdc-dnaseq-cwl/workflows/main/gdc_dnaseq_main_workflow.cwl\n```\n\n- Download the input BAM file and its index file.\n```\naws s3 cp s3://uchig-genomics-pipeline-us-east-1/bioinformatics_scratch/shenglai/binf389/COLO-829.bam .\n```\n\n- Edit [WGS-hello-world.input.json](workflows/example_input_json/WGS-hello-world/wgs.hello-world.input.json) to replace the placeholders for the input and reference files.\n\n### Run workflow\n\n- Change to the directory where you want to store the output files.\n```\n$ df -h /mnt\n/dev/nvme0n1    366G   57G  310G  16% /mnt\n\ncd /mnt/SCRATCH\n```\n\n- Run the script\n```\n/home/ubuntu/gpas-aws-workflow-runner/workflows/run-workflow.sh\n```\n\n\n### Tasks\n\n[DNA-Seq WGS hello world](workflows/tasks/WGS-hello-world/README.md)\n\n[DNA-Seq WGS](workflows/tasks/WGS/README.md)\n\n[DNA-Seq WXS](workflows/tasks/WXS/README.md)\n\n[RNA-Seq](workflows/tasks/RNA/README.md)\n\n[DNA-Seq WGS Sanger variant 
calling](workflows/tasks/WGS-Sanger/README.md)\n\n[DNA-Seq WXS somatic variant calling](workflows/tasks/WXS/README.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnci-gdc%2Fgpas-aws-workflow-runner","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnci-gdc%2Fgpas-aws-workflow-runner","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnci-gdc%2Fgpas-aws-workflow-runner/lists"}