https://github.com/datngu/nf-rasqual



# nf-rasqual Pipeline

## Overview

This pipeline processes and analyzes genomic, transcriptomic, and ATAC-seq data for QTL (Quantitative Trait Loci) analysis. It uses genotype and phenotype data to identify associations, focusing on eQTLs (expression QTLs) and ATAC-QTLs. The pipeline is designed for flexibility, allowing users to configure various parameters, including the use of external linkage disequilibrium (LD) data for multiple testing correction with EigenMT.

## Inputs

The following input files are required:

- **Genome FASTA file**: The reference genome sequence.
- **Annotation GTF file**: Gene annotations for the reference genome (we use the Ensembl annotation in our analysis).
- **ATAC-seq BAM files**: ATAC-seq alignment data.
- **ATAC-seq feature counts**: Consensus peaks with feature counts, example given at `data/atac_consensus_peak_featureCounts_filtered.txt`
- **RNA-seq BAM files**: RNA-seq alignment data.
- **RNA-seq gene-level counts**: Gene-level counts from Salmon quantification, example given at `data/rna_gene_level_count_salmon.txt`
- **Genotype VCF file**: Genotype data in compressed VCF format, example given at `data/genotype.vcf.gz`. Prepare this file carefully, following the instructions in the original RASQUAL work.
- **LD Genotype VCF file**: (Optional) LD genotype data in VCF format (compressed).
- **Meta Information CSV file**: Metadata associated with the samples, example given at `data/meta/*.csv`

## Parameters

- **Chromosome Range (`chrom`)**: Chromosomes to be processed. Default: `1..29`.
- **Phenotype PCs (`phenotype_PCs`)**: Number of principal components for phenotype correction. Default: `2`.
- **Expression Proportion (`exp_prop`)**: Minimum proportion of samples that must pass the expression cutoff for a gene to be retained. Default: `0.5`, meaning at least 50% of samples must pass the FPKM cutoff.
- **FPKM Cutoff (`fpkm_cutoff`)**: Minimum FPKM value for filtering genes. Default: `0.5`.
- **ATAC Window (`atac_window`)**: Size of the window around ATAC-seq peaks for QTL analysis. Default: `10,000`.
- **eQTL Window (`eqtl_window`)**: Size of the window around gene features for eQTL analysis. Default: `500,000`.
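To make the interaction between `fpkm_cutoff`, `exp_prop`, and the window parameters concrete, here is a minimal Python sketch. It is not taken from the pipeline code: the function names `filter_genes` and `cis_window` are hypothetical, and it assumes that a sample "passes" when its FPKM is greater than or equal to the cutoff and that the window is applied on each side of the feature. The pipeline itself may differ on both points.

```python
from typing import Dict, List, Tuple


def filter_genes(fpkm: Dict[str, List[float]],
                 fpkm_cutoff: float = 0.5,
                 exp_prop: float = 0.5) -> List[str]:
    """Keep genes whose FPKM reaches the cutoff in at least exp_prop of samples.

    Assumption: the comparison is inclusive (>=) on both thresholds.
    """
    kept = []
    for gene, values in fpkm.items():
        if not values:
            continue  # no samples, nothing to evaluate
        passing = sum(1 for v in values if v >= fpkm_cutoff)
        if passing / len(values) >= exp_prop:
            kept.append(gene)
    return kept


def cis_window(start: int, end: int, window: int) -> Tuple[int, int]:
    """Extend a feature interval by `window` on each side (assumed), clamped at 1."""
    return max(1, start - window), end + window


# Toy FPKM values for four samples (made up for illustration).
fpkm = {
    "geneA": [0.0, 0.7, 1.2, 0.9],  # 3 of 4 samples pass the cutoff
    "geneB": [0.1, 0.2, 0.6, 0.0],  # 1 of 4 samples pass the cutoff
}

print(filter_genes(fpkm))                    # genes retained with the defaults
print(cis_window(120_000, 125_000, 10_000))  # search interval with the atac_window default
```

With the default values, a gene expressed above the cutoff in only one of four samples is dropped, while one expressed in three of four is retained.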

## Pipeline Options

- **ATAC-QTL Analysis (`atac_qtl`)**: Perform ATAC-QTL analysis. Default: `true`.
- **eQTL Analysis (`eqtl_qtl`)**: Perform eQTL analysis. Default: `true`.
- **External LD Data (`external_ld`)**: Use external LD genotype data. Default: `false`.

## Output

The pipeline generates the following outputs:

- **Results Directory**: All output files are stored in `results/`.
- **Trace Directory**: Execution trace and logs are stored in `trace_dir/`.

## Running the Pipeline

To run the pipeline, configure the input files and parameters in the pipeline script, then execute the following scripts:

- `run_officical_brain.sh`
- `run_officical_liver.sh`
- `run_officical_gonad.sh`
- `run_officical_muscle.sh`
- `run_officical_gill.sh`