https://github.com/umccr/dracarys
:dragon_face: DRAGEN workflow tidying :fire:
cancer-genomics dragen multiqc qc r-package r6
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/umccr/dracarys
- Owner: umccr
- License: other
- Created: 2021-12-09T10:10:44.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-12-02T12:56:27.000Z (6 months ago)
- Last Synced: 2025-12-05T09:53:04.395Z (6 months ago)
- Topics: cancer-genomics, dragen, multiqc, qc, r-package, r6
- Language: R
- Homepage: https://umccr.github.io/dracarys/
- Size: 14.6 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 18
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
README
- [🔥 dracarys - UMCCR Workflow Tidying](#fire-dracarys---umccr-workflow-tidying)
  - [🏆 Aim](#trophy-aim)
  - [🍕 Installation](#pizza-installation)
  - [✨ Supported Workflows](#sparkles-supported-workflows)
  - [🌀 CLI](#cyclone-cli)
  - [🚕 Running](#taxi-running)

# 🔥 dracarys - UMCCR Workflow Tidying

- Docs: <https://umccr.github.io/dracarys/>
- Conda package: <https://anaconda.org/umccr/r-dracarys>
## 🏆 Aim
Given a directory with results from a DRAGEN/UMCCR workflow, {dracarys}
will grab files of interest and transform them into “tidier” structures
for output into TSV/Parquet/RDS format for downstream ingestion into a
database/data lake. See supported [workflows](#supported-workflows),
[running](#running) examples, and [CLI](#cli) options in the sections
below.
## 🍕 Installation
R
``` r
remotes::install_github("umccr/dracarys@vX.X.X") # for vX.X.X Release/Tag
```
Conda
- Linux & MacOS (non-M1)
``` bash
mamba create \
-n dracarys_env \
-c umccr -c bioconda -c conda-forge \
r-dracarys==X.X.X
conda activate dracarys_env
```
- MacOS M1
``` bash
CONDA_SUBDIR=osx-64 \
mamba create \
-n dracarys_env \
-c umccr -c bioconda -c conda-forge \
r-dracarys==X.X.X
conda activate dracarys_env
```
Docker
``` bash
docker pull --platform linux/amd64 ghcr.io/umccr/dracarys:X.X.X
```
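The `X.X.X` placeholder in the commands above stands for a release version. As a minimal sketch, the version can be kept in one shell variable so that the conda spec and the Docker tag stay in sync. `DRACARYS_VERSION` is an assumed helper name that dracarys itself does not read, and `0.16.0` is used only because it is the version shown in the CLI output in this README.

```bash
# Hypothetical helper variable (not read by dracarys); override as needed.
DRACARYS_VERSION="${DRACARYS_VERSION:-0.16.0}"

# Derive both package references from the same version string.
conda_spec="r-dracarys==${DRACARYS_VERSION}"
docker_image="ghcr.io/umccr/dracarys:${DRACARYS_VERSION}"

echo "conda spec:   ${conda_spec}"
echo "docker image: ${docker_image}"

# These can then be passed to the install commands shown above, e.g.:
#   mamba create -n dracarys_env -c umccr -c bioconda -c conda-forge "${conda_spec}"
#   docker pull --platform linux/amd64 "${docker_image}"
```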
## ✨ Supported Workflows
{dracarys} supports most outputs from the following DRAGEN/UMCCR
workflows:
| Workflow | Description |
|----|----|
| bcl_convert | [BCLConvert](https://emea.support.illumina.com/sequencing/sequencing_software/bcl-convert.html) workflow |
| tso_ctdna_tumor_only | [ctDNA TSO500](https://support-docs.illumina.com/SW/DRAGEN_TSO500_ctDNA_v2.1/Content/SW/TSO500/WorkflowDiagram_appT500ctDNAlocal.htm) workflow |
| wgs_alignment_qc | [DRAGEN DNA](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/DRAGEN/GPipelineIntro_fDG.htm) (alignment) workflow |
| wts_alignment_qc | [DRAGEN RNA](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/DRAGEN/GPipelineIntro_fDG.htm) (alignment) workflow |
| wts_tumor_only | [DRAGEN RNA](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/DRAGEN/GPipelineIntro_fDG.htm) workflow |
| wgs_tumor_normal | [DRAGEN Tumor/Normal](https://support-docs.illumina.com/SW/DRAGEN_v40/Content/SW/DRAGEN/GPipelineIntro_fDG.htm) workflow |
| umccrise | [umccrise](https://github.com/umccr/umccrise) workflow |
| rnasum | [RNAsum](https://github.com/umccr/RNAsum) workflow |
| sash | [sash](https://github.com/scwatts/sash) workflow |
| oncoanalyser | [oncoanalyser](https://github.com/nf-core/oncoanalyser) workflow |
See which output files from these workflows are supported in [Supported
Files](https://umccr.github.io/dracarys/articles/files.html).
## 🌀 CLI
A `dracarys.R` command line interface is available for convenience.
- If you're using the conda package, the `dracarys.R` command will
  already be available inside the activated conda environment.
- If you're *not* using the conda package, you need to add the
  `dracarys/inst/cli/` directory to your `PATH` in order to use
  `dracarys.R`:
``` bash
dracarys_cli=$(Rscript -e 'x = system.file("cli", package = "dracarys"); cat(x, "\n")' | xargs)
export PATH="${dracarys_cli}:${PATH}"
```
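To confirm that the `PATH` export above took effect, a quick check along these lines may help. It is only a sketch: `have_dracarys_cli` is a hypothetical helper name, and the check relies on nothing beyond the shell's standard `command -v` lookup.

```bash
# Hypothetical helper: succeeds if dracarys.R resolves on PATH.
have_dracarys_cli() {
  command -v dracarys.R >/dev/null 2>&1
}

if have_dracarys_cli; then
  echo "dracarys.R found at: $(command -v dracarys.R)"
else
  echo "dracarys.R not on PATH; export the package's cli directory first" >&2
fi
```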
``` text
dracarys.R --version
dracarys.R 0.16.0

#-----------------------------------#
dracarys.R --help
usage: dracarys.R [-h] [-v] {tidy} ...

DRAGEN Output Post-Processing

positional arguments:
  {tidy}         sub-command help
    tidy         Tidy UMCCR Workflow Outputs

options:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

#-----------------------------------#
#------- Tidy ----------------------#
dracarys.R tidy --help
usage: dracarys.R tidy [-h] -i IN_DIR -o OUT_DIR -p PREFIX [-t TOKEN]
                       [-l LOCAL_DIR] [-f FORMAT] [-n] [-q]

options:
  -h, --help            show this help message and exit
  -i IN_DIR, --in_dir IN_DIR
                        Directory with untidy UMCCR workflow results. Can
                        be GDS, S3 or local.
  -o OUT_DIR, --out_dir OUT_DIR
                        Directory to output tidy results.
  -p PREFIX, --prefix PREFIX
                        Prefix string used for all results.
  -t TOKEN, --token TOKEN
                        ICA access token. Default: ICA_ACCESS_TOKEN env var.
  -l LOCAL_DIR, --local_dir LOCAL_DIR
                        If input is a GDS/S3 directory, download the
                        recognisable files to this directory. Default:
                        '/dracarys__sync'.
  -f FORMAT, --format FORMAT
                        Format of output. Default: tsv.
  -n, --dryrun          Dry run - just show files to be tidied.
  -q, --quiet           Shush all the logs.
```
## 🚕 Running
{dracarys} takes as input (`--in_dir`) a directory with results from one
of the UMCCR [workflows](#supported-workflows). It recursively scans
that directory for [supported
files](https://umccr.github.io/dracarys/articles/files.html), downloads
them into a local directory (`--local_dir`) if the input is on GDS or S3,
and then parses, transforms and writes the tidied versions into the
specified output directory (`--out_dir`). A prefix (`--prefix`) is
prepended to each of the tidied files. The output file format
(`--format`) can be tsv, parquet, or both. To list only the supported
files found in the specified input directory, use the `-n` (`--dryrun`)
option.
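The dry-run and format options described above can be combined into a preview-then-run pattern. The following is a minimal sketch, not a prescribed workflow: `preview_then_tidy` is a hypothetical wrapper, the paths and prefix are placeholders, and only flags listed in `dracarys.R tidy --help` are used.

```bash
# Placeholder inputs; replace with real values.
in_dir="gds://path/to/subjectX_multiqc_data/"
out_dir="local_output_dir"
prefix="subjectX_prefix"

# Hypothetical wrapper: preview first, then tidy.
preview_then_tidy() {
  # 1. Dry run (-n): just show the files that would be tidied.
  dracarys.R tidy -i "$1" -o "$2" -p "$3" -n || return 1
  # 2. Real run, writing Parquet instead of the default TSV.
  dracarys.R tidy -i "$1" -o "$2" -p "$3" -f parquet
}

# Only attempt the run when the CLI is actually available.
if command -v dracarys.R >/dev/null 2>&1; then
  preview_then_tidy "$in_dir" "$out_dir" "$prefix"
else
  echo "dracarys.R not on PATH; skipping" >&2
fi
```

Stopping after a failed dry run avoids starting a real run against an input directory that could not be scanned.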
R
``` r
library(dracarys) # makes umccr_tidy() available
# help(umccr_tidy)
in_dir <- "gds://path/to/subjectX_multiqc_data/"
out_dir <- tempdir()
prefix <- "subjectX"
umccr_tidy(in_dir = in_dir, out_dir = out_dir, prefix = prefix)
```
Mac/Linux
From within an activated conda environment or a shell with the
`dracarys.R` CLI available:
``` bash
dracarys.R tidy \
-i gds://path/to/subjectX_multiqc_data/ \
-o local_output_dir \
-p subjectX_prefix
```
Docker
``` bash
docker container run \
-v "$(pwd)":/mount1 \
--platform=linux/amd64 \
--env "ICA_ACCESS_TOKEN" \
--rm -it \
ghcr.io/umccr/dracarys:X.X.X \
dracarys.R tidy \
-i gds://path/to/subjectX_multiqc_data/ \
-o /mount1/output_dir \
-p subjectX_prefix
```
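In the Docker command above, `--env "ICA_ACCESS_TOKEN"` without a value forwards the host's variable into the container, so an unset variable on the host means no token inside the container. A small pre-flight check, sketched here with the hypothetical helper `require_ica_token`, can catch that before the container starts. It assumes the token is only needed when the input is on GDS.

```bash
# Hypothetical helper: fail with a clear message when the token is
# missing or empty in the current shell.
require_ica_token() {
  if [ -z "${ICA_ACCESS_TOKEN:-}" ]; then
    echo "ICA_ACCESS_TOKEN is not set; export it or pass -t/--token" >&2
    return 1
  fi
}

if require_ica_token; then
  echo "ICA_ACCESS_TOKEN is set; ready to start the container"
fi
```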