https://github.com/datngu/nf-rasqual
- Host: GitHub
- URL: https://github.com/datngu/nf-rasqual
- Owner: datngu
- Created: 2024-09-21T04:37:53.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-09-21T04:51:01.000Z (8 months ago)
- Last Synced: 2025-01-21T20:48:47.400Z (4 months ago)
- Language: Nextflow
- Size: 84.8 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# nf-rasqual Pipeline
## Overview
This pipeline processes and analyzes genomic, transcriptomic, and ATAC-seq data for QTL (Quantitative Trait Loci) analysis. It uses genotype and phenotype data to identify associations, focusing on eQTLs (expression QTLs) and ATAC-QTLs. The pipeline is designed for flexibility, allowing users to configure various parameters, including the use of external linkage disequilibrium (LD) data for multiple testing correction with EigenMT.
## Inputs
The following input files are required:
- **Genome FASTA file**: The reference genome sequence.
- **Annotation GTF file**: Gene annotations for the reference genome (we use the Ensembl annotation in our analysis).
- **ATAC-seq BAM files**: ATAC-seq alignment data.
- **ATAC-seq feature counts**: Consensus peaks with feature counts, example given at `data/atac_consensus_peak_featureCounts_filtered.txt`
- **RNA-seq BAM files**: RNA-seq alignment data.
- **RNA-seq gene-level counts**: Gene-level counts from Salmon quantification, example given at `data/rna_gene_level_count_salmon.txt`
- **Genotype VCF file**: Genotype data in compressed VCF format, example given at `data/genotype.vcf.gz`. Prepare this file carefully, following the original RASQUAL work.
- **LD Genotype VCF file**: (Optional) LD genotype data in VCF format (compressed).
- **Meta Information CSV file**: Metadata associated with the samples, example given at `data/meta/*.csv`
## Parameters
- **Chromosome Range (`chrom`)**: Chromosomes to be processed. Default: `1..29`.
- **Phenotype PCs (`phenotype_PCs`)**: Number of principal components for phenotype correction. Default: `2`.
- **Expression Proportion (`exp_prop`)**: Proportion of samples that must pass the expression cutoff for a gene to be retained. Default: `0.5`, meaning at least 50% of samples must pass the FPKM cutoff.
- **FPKM Cutoff (`fpkm_cutoff`)**: Minimum FPKM value for filtering genes. Default: `0.5`.
- **ATAC Window (`atac_window`)**: Size of the window around ATAC-seq peaks for QTL analysis. Default: `10,000`.
- **eQTL Window (`eqtl_window`)**: Size of the window around gene features for eQTL analysis. Default: `500,000`.
## Pipeline Options
- **ATAC-QTL Analysis (`atac_qtl`)**: Perform ATAC-QTL analysis. Default: `true`.
- **eQTL Analysis (`eqtl_qtl`)**: Perform eQTL analysis. Default: `true`.
- **External LD Data (`external_ld`)**: Use external LD genotype data. Default: `false`.
## Output
The pipeline generates the following outputs:
- **Results Directory**: All output files are stored in `results/`.
- **Trace Directory**: Execution trace and logs are stored in `trace_dir/`.
## Running the Pipeline
To run the pipeline, configure the input files and parameters in the pipeline script, then execute the following scripts:
- `run_officical_brain.sh`
- `run_officical_liver.sh`
- `run_officical_gonad.sh`
- `run_officical_muscle.sh`
- `run_officical_gill.sh`
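
When choosing values for `exp_prop`, `fpkm_cutoff`, `atac_window`, and `eqtl_window` before a run, it can help to see how they act on a small example. The Python sketch below is illustrative only and is not taken from the pipeline code. The function names are hypothetical. It assumes a sample "passes" when its FPKM is greater than or equal to the cutoff, and that the window is applied on each side of the feature. The README does not state either detail, so check the pipeline scripts before relying on them.

```python
def passes_expression_filter(fpkm_values, fpkm_cutoff=0.5, exp_prop=0.5):
    """Return True if at least `exp_prop` of the samples reach `fpkm_cutoff`.

    Assumption: the comparison is inclusive (>=) for both thresholds.
    """
    if not fpkm_values:
        return False
    n_pass = sum(1 for value in fpkm_values if value >= fpkm_cutoff)
    return n_pass / len(fpkm_values) >= exp_prop


def cis_window(start, end, window):
    """Extend a 1-based feature interval by `window` bp on each side.

    Assumption: the window is a per-side flank; the start is clamped at 1.
    """
    return max(1, start - window), end + window


# Toy FPKM values for two hypothetical genes across four samples.
gene_fpkm = {
    "geneA": [0.0, 0.7, 1.2, 0.4],  # 2 of 4 samples reach 0.5, so it is kept
    "geneB": [0.1, 0.2, 0.9, 0.3],  # 1 of 4 samples reaches 0.5, so it is dropped
}
kept = [gene for gene, values in gene_fpkm.items()
        if passes_expression_filter(values)]
print(kept)

# A hypothetical ATAC-seq peak with the default atac_window of 10,000.
print(cis_window(120_000, 125_000, 10_000))
```

With the defaults, only `geneA` survives the filter, and the example peak is tested against variants in a region extended by 10,000 bp on each side under the per-side assumption.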