https://github.com/populationgenomics/cpg-flow-gatk-sv
CPG-Flow implementation of the GATK-SV pipeline, deployed at the Centre for Population Genomics
https://github.com/populationgenomics/cpg-flow-gatk-sv
bioinformatics cpg cpg-flow-pipeline pipeline workflow
Last synced: 4 months ago
JSON representation
CPG-Flow implementation of the GATK-SV pipeline, deployed at the Centre for Population Genomics
- Host: GitHub
- URL: https://github.com/populationgenomics/cpg-flow-gatk-sv
- Owner: populationgenomics
- License: mit
- Created: 2025-04-17T05:05:20.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2026-01-27T23:26:34.000Z (5 months ago)
- Last Synced: 2026-01-28T12:31:29.260Z (5 months ago)
- Topics: bioinformatics, cpg, cpg-flow-pipeline, pipeline, workflow
- Language: Python
- Homepage:
- Size: 193 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 4
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# GATK-SV, in CPG-Flow
Current Version: 0.1.26
This repository contains the GATK-SV workflow, which is a wrapper around the [GATK-SV Cromwell workflow](https://github.com/broadinstitute/gatk-sv). It has been migrated from [Production-Pipelines](https://github.com/populationgenomics/production-pipelines/tree/main/cpg_workflows/stages/gatk_sv). This has been generated as part of the migration from Production-Pipelines's CPG_workflows framework, to the separate CPG-Flow.
## Workflows
This repository contains the following workflows:
- `singlesample_workflow` - previously called `gatk_sv_singlesample`. This is the initial per-sample variant calling, generation of QC metrics following variant calling, through to batching of samples based on those metrics. The exact stages are:
1. [GatherSampleEvidence](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/GatherSampleEvidence.wdl)
2. [EvidenceQC](https://github.com/broadinstitute/gatk-sv/blob/main/wdl/EvidenceQC.wdl)
3. `CreateSampleBatches` - a somewhat bespoke sample batching script
- `multisample_workflow` - previously called `gatk_sv_multisample`. This is the Cohort based batch processing of clustered samples, through to a final joint-called dataset.
## Structure
Following a best-practices CPG-Flow structure, the repository is structured as follows:
```commandline
├── Dockerfile
├── LICENSE
├── README.md
├── pull_request_template.md
├── pyproject.toml
└── src
├── cpg_flow_gatk_sv
│ ├── __init__.py
│ ├── config_template.toml
│ ├── singlesample_workflow.py
│ ├── multisample_workflow.py
│ ├── jobs
│ │ ├── CreateSampleBatches.py
│ │ ├── EvidenceQC.py
│ │ ├── GatherSampleEvidence.py
│ │ └── ...
│ ├── scripts
│ │ ├── __init__.py
│ │ └── sample_batching.py
│ │ └── ...
│ └── utils.py
```
This structure contains the following important files:
- `pyproject.toml` - the build instructions and linter settings for the repository
- `singlesample_workflow` - the Stages and workflow trigger for the first workflow
- `multisample_workflow` - the Stages and workflow trigger for the second workflow
- `jobs` - a directory containing the logic for each of the Stages in the workflow. Each file contains a single Stage, and is
named after the Stage it contains. This implements all the logic for a stage, including assembling the input dictionary to be passed to Cromwell.
- `scripts` - a directory containing any scripts which are not part of the GATK-SV Cromwell workflow, but are used in the workflow. This includes any helper scripts, or CPG-custom logic.
## Example invocation
This is designed to be run using analysis-runner, using a fully containerised installation of this codebase.
```bash
analysis-runner \
--skip-repo-checkout \
--image australia-southeast1-docker.pkg.dev/cpg-common/images/cpg-flow-gatk-sv:0.1.26 \
--dataset DATASET \
--description 'GATK-SV, CPG-flow' \
-o gatk-sv_cpg-flow \
--access-level full \
--config CONFIG \
singlesample_workflow
```
## Development
Semantic Versioning should be implemented with `bump-my-version`
```commandline
bump-my-version bump patch/minor/major
```
This is using the configuration block inside `pyproject.toml`