{"id":49700587,"url":"https://github.com/fulcrumgenomics/twistcgp","last_synced_at":"2026-05-08T07:12:27.953Z","repository":{"id":341622475,"uuid":"978176087","full_name":"fulcrumgenomics/twistcgp","owner":"fulcrumgenomics","description":"Nextflow pipeline for Twist Comprehensive Genomic Profiling (CGP) panel analysis","archived":false,"fork":false,"pushed_at":"2026-05-06T23:53:34.000Z","size":47725,"stargazers_count":4,"open_issues_count":27,"forks_count":1,"subscribers_count":4,"default_branch":"main","last_synced_at":"2026-05-07T01:30:47.979Z","etag":null,"topics":["bioinformatics","genomics","nextflow","nf-core","oncology","variant-calling"],"latest_commit_sha":null,"homepage":"","language":"Nextflow","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fulcrumgenomics.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-05T15:28:23.000Z","updated_at":"2026-04-20T22:30:21.000Z","dependencies_parsed_at":"2026-03-24T20:05:33.574Z","dependency_job_id":null,"html_url":"https://github.com/fulcrumgenomics/twistcgp","commit_stats":null,"previous_names":["fulcrumgenomics/twistcgp"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/fulcrumgenomics/twistcgp","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ftwistcgp","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ftwistcgp/tags","releases_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/repositories/fulcrumgenomics%2Ftwistcgp/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ftwistcgp/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fulcrumgenomics","download_url":"https://codeload.github.com/fulcrumgenomics/twistcgp/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fulcrumgenomics%2Ftwistcgp/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32770620,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-08T02:36:36.067Z","status":"ssl_error","status_checked_at":"2026-05-08T02:36:07.210Z","response_time":54,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","genomics","nextflow","nf-core","oncology","variant-calling"],"created_at":"2026-05-08T07:12:27.056Z","updated_at":"2026-05-08T07:12:27.934Z","avatar_url":"https://github.com/fulcrumgenomics.png","language":"Nextflow","funding_links":[],"categories":[],"sub_categories":[],"readme":"# twistcgp\n\nA bioinformatics pipeline for processing data from [Twist Bioscience's](https://www.twistbioscience.com/) TwistCGP product for targeted enrichment of cancer-associated genes.\n\n\u003cp\u003e\n\u003ca href=\"https://fulcrumgenomics.com\"\u003e\u003cimg src=\".github/logos/fulcrumgenomics.svg\" alt=\"Fulcrum Genomics\" 
height=\"100\"/\u003e\u003c/a\u003e\n\u003c/p\u003e\n\n[Visit us at Fulcrum Genomics](https://www.fulcrumgenomics.com) to learn more about how we can power your Bioinformatics with twistcgp and beyond.\n\n\u003ca href=\"mailto:contact@fulcrumgenomics.com?subject=[GitHub inquiry]\"\u003e\u003cimg src=\"https://img.shields.io/badge/Email_us-brightgreen.svg?\u0026style=for-the-badge\u0026logo=gmail\u0026logoColor=white\"/\u003e\u003c/a\u003e\n\u003ca href=\"https://www.fulcrumgenomics.com\"\u003e\u003cimg src=\"https://img.shields.io/badge/Visit_Us-blue.svg?\u0026style=for-the-badge\u0026logo=wordpress\u0026logoColor=white\"/\u003e\u003c/a\u003e\n\n\u003c!-- TODO nf-core: Include a figure that guides the user through the major workflow steps. Many nf-core\n     workflows use the \"tube map\" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples.   --\u003e\n\n### Pipeline Steps\n\n1. Index Genome ([`bwa-mem2`](https://github.com/bwa-mem2/bwa-mem2), [`samtools`](https://www.htslib.org/))\n1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/))\n1. Trim Adapters ([`fastp`](https://github.com/OpenGene/fastp))\n1. Fastq to BAM ([`fgbio FastqToBam`](http://fulcrumgenomics.github.io/fgbio/tools/latest/FastqToBam.html))\n1. Align ([`bwa-mem2`](https://github.com/bwa-mem2/bwa-mem2))\n1. Mark Duplicates ([`picard MarkDuplicates`](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates))\n1. Variant Calling via local Assembly of Haplotypes ([`gatk4/mutect2`](https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2))\n1. Annotate Variants ([`SnpEff`](https://pcingola.github.io/SnpEff/), [`Ensembl VEP`](https://useast.ensembl.org/info/docs/tools/vep/index.html), [`CIViCpy`](https://github.com/griffithlab/civicpy))\n1. Calculate Tumor Mutational Burden ([`pyTMB`](https://github.com/bioinfo-pf-curie/TMB))\n1. 
Call CNVs ([`CNVkit`](https://cnvkit.readthedocs.io/en/stable/index.html))\n1. Identify MSI ([`MSIsensor2`](https://github.com/niu-lab/msisensor2) or [`MSIsensor-pro`](https://github.com/xjtu-omics/msisensor-pro))\n1. Collect Metrics ([`picard CollectHsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics), [`picard CollectMultipleMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics), [`perbase`](https://github.com/sstadick/perbase))\n1. Present QC ([`MultiQC`](http://multiqc.info/))\n\n## Usage\n\n\u003e [!NOTE]\n\u003e If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set up Nextflow.\n\u003e Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `nextflow run twistcgp/main.nf -profile \"test,[docker|singularity|conda]\" --outdir ./results` before running the workflow on actual data.\n\nFor a full list of available options run `nextflow run twistcgp/main.nf --help --show_hidden`.\n\n### Prepare a Samplesheet\n\nFirst, prepare a samplesheet with your input data that looks as follows:\n\n```text\nsample,fastq_1,fastq_2\nILLUMINA_PAIRED_END,assets/test-data/fastq/Illumina_TestReads_R1_001.fastq.gz,assets/test-data/fastq/Illumina_TestReads_R2_001.fastq.gz\nMGI_SINGLE_END,assets/test-data/fastq/MGI_TestReads_1.fq.gz\n```\n\nEach row represents a fastq file (single-end) or a pair of fastq files (paired-end).\nThe sample column provides a unique identifier for the given sample.\n\n### Obtain a Genome\n\nThe TwistCGP panel was designed using the hg38 Genome in a Bottle (GIAB) reference genome FASTA file which can be obtained from [GIAB](https://ftp-trace.ncbi.nlm.nih.gov/ReferenceSamples/giab/release/references/GRCh38/GRCh38_GIABv3_no_alt_analysis_set_maskedGRC_decoys_MAP2K3_KMT2C_KCNJ18.fasta.gz).\n\n### Obtain list of Baits \u0026 Targets\n\nYou will need a BED or 
Interval List file for (1) the panel baits and (2) the panel targets.\nThe TwistCGP baits and targets files are available from Twist after purchasing the TwistCGP product.\nBED files should follow the [UCSC BED format specifications](https://genome.ucsc.edu/FAQ/FAQformat.html#format1); interval list files should adhere to [GATK interval list conventions](https://gatk.broadinstitute.org/hc/en-us/articles/360035531852-Intervals-and-interval-lists).\n\nTargets will be padded prior to variant calling; the padding size can be adjusted using the `--target_padding` parameter (default: 100, which adds 100 bp on each side of the interval).\n\n\u003e [!NOTE]\n\u003e If you lack the baits file, you can provide the panel targets for both arguments.\n\u003e Providing the targets as the baits will invalidate the bait-specific metrics in the picard `HsMetrics`.\n\u003e Additionally, CNV calls from CNVkit may be noisier due to inaccurate modeling of bait locations.\n\n### (Optionally) Provide Adapter Sequences\n\nIf sequencing data is likely to include adapter sequences, providing these sequences in FASTA format will allow `fastp` to trim those sequences prior to alignment.\nThe adapter sequences can be supplied to the pipeline using the `--adapters_fasta` parameter.\n\n### Optional Time and Resource Saving Setup\n\n\u003cdetails\u003e \u003csummary\u003ePre-Generate a Genome Index\u003c/summary\u003e\n\nBecause this pipeline uses bwa-mem2 for alignment, 87 GB of memory is required to generate the human genome index.\nAlternatively, this index can be built without the pipeline and the directory supplied using the `--bwa` parameter.\nSee [docs/bwamem2_index.md](/docs/bwamem2_index.md) for details.\n\nAdditionally, the genome index can be saved to the output directory for future use by supplying the `--save_reference` parameter.\nSubsequently, you may pass the index using `--bwa results/reference/bwamem2`.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e \u003csummary\u003ePre-Generate 
an MSI Genome Scan List\u003c/summary\u003e\n\nGeneration of the MSIsensor2 or MSIsensor-pro microsatellite scan list requires a space-intensive, uncompressed reference genome.\nTo save time and space you can supply the scan list generated by MSIsensor2 (or MSIsensor-pro) using the `--msisensor_scan` parameter.\nSee [docs/msisensor_scan.md](/docs/msisensor_scan.md) for details.\n\nAdditionally, the MSI scan list can be saved to the output directory for future use by supplying the `--save_reference` parameter.\nSubsequently, you may pass the MSIsensor scan file using `--msisensor_scan results/reference/reference.msisensor_scan.list`.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e \u003csummary\u003ePre-Generate Variant Annotation Caches\u003c/summary\u003e\n\nSnpEff and Ensembl VEP each require a set of large files, known as a cache, to annotate variants. To use pre-downloaded caches for variant annotation, supply the parameters `--snpeff_cache` and/or `--ensemblvep_cache` with the path to the root of the annotation cache folder. If a cache is not provided, the pipeline will automatically download it (which will add computation time). 
For details on how to generate each cache see [docs/variant_annotation.md](/docs/variant_annotation.md).\n\nAdditionally, the caches can be saved to the output directory for future use by supplying the `--save_reference` parameter.\nSubsequently, you may pass the caches using `--snpeff_cache results/reference/snpeff_cache/GRCh38.105` and `--ensemblvep_cache results/reference/ensemblvep_cache/vep_cache`.\n\n\u003c/details\u003e\n\n### Optional Variant Calling Resource Files\n\n\u003cdetails\u003e \u003csummary\u003ePopulation Germline Resource VCF\u003c/summary\u003e\n\nThis pipeline uses [Mutect2](https://gatk.broadinstitute.org/hc/en-us/articles/360037593851-Mutect2) to perform somatic variant calling on local haplotypes.\nWhile Mutect2 does not require a germline resource or a panel of normals (PoN) to run, both are recommended.\nThe germline resource VCF encapsulates population allele frequencies of known germline variants (typically from healthy individuals).\nThese frequencies are used by Mutect2 to model the likelihood that a specific variant is somatic or inherited.\nThe provided VCF file must contain allele frequencies.\n\nThe germline resource VCF can be supplied to the pipeline using the `--population_germline_vcf` parameter.\nThe corresponding TBI file can be supplied using the `--population_germline_tbi` parameter.\n\nSee [docs/germline_resource_vcf.md](/docs/germline_resource_vcf.md) for more details on how to generate this input.\n\n   \u003cdetails\u003e\u003csummary\u003eExample Germline Resource VCF Records\u003c/summary\u003e\n\n```\n#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO\n*      1       10067   .       T       TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC      30.35   PASS    AC=3;AF=7.384E-5\n*      1       10108   .       CAACCCT C       46514.32        PASS    AC=6;AF=1.525E-4\n*      1       10109   .       AACCCTAACCCT    AAACCCT,*       89837.27        PASS    AC=48,5;AF=0.001223,1.273E-4\n*      1       10114   .       
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA  *,CAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAACCCTAACCCTAACCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCCTA,T      36728.97        PASS    AC=55,9,1;AF=0.001373,2.246E-4,2.496E-5\n*      1       10119   .       CT      C,*     251.23  PASS    AC=5,1;AF=1.249E-4,2.498E-5\n*      1       10120   .       TA      CA,*    14928.74        PASS    AC=10,6;AF=2.5E-4,1.5E-4\n*      1       10128   .       ACCCTAACCCTAACCCTAAC    A,*     285.71  PASS    AC=3,1;AF=7.58E-5,2.527E-5\n*      1       10131   .       CT      C,*     378.93  PASS    AC=7,5;AF=1.765E-4,1.261E-4\n*      1       10132   .       TAACCC  *,T     18025.11        PASS    AC=12,2;AF=3.03E-4,5.049E-5\n```\n\n   \u003c/details\u003e\n\u003c/details\u003e\n\n\u003cdetails\u003e \u003csummary\u003ePanel of Normals VCF\u003c/summary\u003e\n\nWhile a panel of normals (PoN) VCF is not required for Mutect2 to run, it is recommended.\nA PoN is a VCF that contains sites found across multiple \"normal\" samples (e.g., derived from healthy tissue that is believed to not have somatic alterations), ideally from the same sequencing preparation, pipeline, platform, etc. 
as the tumor samples.\nWhile the germline resource helps model population variants, the PoN VCF filters out technical artifacts to improve the quality of the variant calling analyses.\n\nThe panel of normals VCF can be supplied to the pipeline using the `--pon_vcf` parameter.\nIts corresponding TBI file can be supplied using the `--pon_tbi` parameter.\n\nSee [docs/panel_of_normals_vcf.md](/docs/panel_of_normals_vcf.md) for more details on how to generate a panel of normals VCF.\n\n\u003c/details\u003e\n\n\u003cdetails\u003e \u003csummary\u003ePanel of Normals Reference for CNV Calling\u003c/summary\u003e\n\nYou may supply a panel of normals (PoN) reference `.cnn` file for use with [CNVkit](https://cnvkit.readthedocs.io/en/stable/index.html) using the `--pon_cnn` parameter. If you do not supply a PoN reference, a \"flat\" reference will be used which assumes equal coverage across the panel regions.\n\nFor details on how to generate this file see [docs/cnvkit_pon.md](/docs/cnvkit_pon.md).\n\n\u003c/details\u003e\n\n\u003cdetails\u003e \u003csummary\u003eGenerate a gnomAD VCF for TMB Calculation\u003c/summary\u003e\n\nTumor mutational burden (TMB) is a measure of the total number of somatic mutations present within the cancer genome.\nIt is crucial to exclude germline variants for the calculation of TMB.\nThis pipeline expects a VCF derived from [gnomAD](https://gnomad.broadinstitute.org/).\n\nThe gnomAD VCF can be supplied to the pipeline using the `--gnomad_vcf` parameter.\nIts corresponding TBI file can be supplied using the `--gnomad_tbi` parameter.\n\nSee [docs/gnomad_vcf.md](/docs/gnomad_vcf.md) for details on how to generate a gnomAD VCF.\n\n\u003c/details\u003e\n\n### Run the Pipeline\n\nNow, you can run the pipeline (including optional inputs) using a command like:\n\n```console\nnextflow run twistcgp/main.nf \\\n   -profile \u003cdocker/singularity/conda\u003e \\\n   --fasta hg38_giab.fa \\\n   --input samplesheet.csv \\\n   --baits baits.bed \\\n   --targets 
targets.bed \\\n   --outdir results \\\n   --bwa resources/hg38_giab/bwamem2 \\\n   --msisensor_scan resources/hg38_giab.msisensor_scan.list \\\n   --ensemblvep_cache resources/ensemblvep_cache/vep_cache \\\n   --snpeff_cache resources/snpeff_cache/GRCh38.105 \\\n   --population_germline_vcf resources/af-only-gnomad.hg38.vcf.gz \\\n   --population_germline_tbi resources/af-only-gnomad.hg38.vcf.gz.tbi \\\n   --pon_vcf resources/1000g_pon.hg38.vcf.gz \\\n   --pon_tbi resources/1000g_pon.hg38.vcf.gz.tbi \\\n   --pon_cnn resources/pon.cnn \\\n   --gnomad_vcf resources/all_chromosomes.intersect.vcf.bgz \\\n   --gnomad_tbi resources/all_chromosomes.intersect.vcf.bgz.tbi\n```\n\n\u003e [!WARNING]\n\u003e Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files).\n\n## Credits\n\ntwistcgp was originally written by Erin McAuley and Zach Norgaard of [Fulcrum Genomics](https://fulcrumgenomics.com/).\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n- Nils Homer\n- Tim Dunn\n\n## Sponsors\n\nSponsors provide support for `twistcgp` through direct funding or employing contributors.\nPublic sponsors include:\n\n\u003cp\u003e\n\u003ca href=\"https://fulcrumgenomics.com\"\u003e\u003cimg src=\".github/logos/fulcrumgenomics.svg\" alt=\"Fulcrum Genomics\" height=\"35\"/\u003e\u003c/a\u003e\n\u0026nbsp;\n\u003ca href=\"https://www.twistbioscience.com\"\u003e\u003cpicture\u003e\u003csource media=\"(prefers-color-scheme: dark)\" srcset=\".github/logos/Twist-logo-dark-mode.png\"\u003e\u003csource media=\"(prefers-color-scheme: light)\" srcset=\".github/logos/Twist-logo-light-mode.png\"\u003e\u003cimg alt=\"Twist Bioscience\" src=\".github/logos/Twist-logo-light-mode.png\" 
height=\"35\"\u003e\u003c/picture\u003e\u003c/a\u003e\n\u0026nbsp;\n\u003c/p\u003e\n\n## Citations\n\n\u003c!-- TODO nf-core: Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. --\u003e\n\u003c!-- If you use fulcrumgenomics/twistcgp for your analysis, please cite it using the following doi: [10.5281/zenodo.XXXXXX](https://doi.org/10.5281/zenodo.XXXXXX) --\u003e\n\nThis pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/main/LICENSE).\n\n\u003e **The nf-core framework for community-curated bioinformatics pipelines.**\n\u003e\n\u003e Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso \u0026 Sven Nahnsen.\n\u003e\n\u003e _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffulcrumgenomics%2Ftwistcgp","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffulcrumgenomics%2Ftwistcgp","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffulcrumgenomics%2Ftwistcgp/lists"}