https://github.com/meeranhussain/rnaseq_deseq2_snakemake
Differential gene expression analysis for samples with replicates using STAR-DeSeq2 pipeline
- Host: GitHub
- URL: https://github.com/meeranhussain/rnaseq_deseq2_snakemake
- Owner: meeranhussain
- Created: 2024-01-24T03:46:42.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-04-06T21:41:25.000Z (8 months ago)
- Last Synced: 2024-04-06T22:29:19.941Z (8 months ago)
- Topics: deseq2-analysis, differential-gene-expression, rna, rna-seq-analysis, rna-seq-pipeline, snakemake-workflow, star-aligner
- Language: R
- Homepage:
- Size: 44.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# RNA_seq-analysis
This workflow performs differential gene expression analysis on RNA-seq samples with replicates, using STAR for alignment and DESeq2 for the differential expression step.
## FOR STANDALONE RUN (INDIVIDUAL COMMANDS)
**1. Quality check of FASTQ files using FastQC/MultiQC**

FastQC:
```bash
fastqc *.fastq -o {output_dir}
```
- `-o`: directory in which to save the output files (the directory must already exist)
- `*.fastq`: selects all files with the `.fastq` extension in the working directory

MultiQC:
```bash
multiqc {fastqc_output_dir}/
```

**2. Read trimming and quality control using Trim Galore**
```bash
trim_galore -q 20 --paired --fastqc --cores {n_cores} -o {output_dir} {read1.fq.gz} {read2.fq.gz}
```

**3. Indexing the reference genome using STAR**
```bash
STAR --runMode genomeGenerate --genomeDir {index_dir_name} --genomeFastaFiles {path to ".fasta" file} --sjdbGTFfile {path to ".gtf" file} --sjdbOverhang 100 --runThreadN 10
```

**4. Alignment using STAR**
```bash
STAR --genomeDir {index_dir_name} --runThreadN {threads} --outSAMtype BAM SortedByCoordinate --readFilesCommand zcat --readFilesIn {read1.fq.gz} {read2.fq.gz} --outFileNamePrefix {output_prefix}
```

**5. Read quantification using featureCounts**
```bash
featureCounts -p -T {threads} --verbose -t exon -g gene_id -a {path to ".gtf" file} -o {counts.txt} {sample1.bam} {sample2.bam} ...
```

## FOR SNAKEMAKE RUN
### Step 1: Make a Project Folder with Project_ID
Create a project folder and give it a meaningful Project_ID.

### Step 2: Copy Files into Project Folder
Copy the following files into the project folder:
- `Snakefile`
- `Deseq2_final.R`
- `create_combinations.R`
- `config.yaml`
- `Master_file.txt`

### Step 3: Create a Sub-folder "1_Data"
Inside the project folder, create a sub-folder named `1_Data`.

### Step 4: Copy Sample Files to 1_Data
Copy the sample files into the `1_Data` folder. If sample names need a separator, use underscores (`_`) instead of hyphens (`-`) in the file names, e.g. `Tumor-1_R1.fq.gz` → `Tumor_1_R1.fq.gz`.

### Step 5: Create "Master_file.txt"
Create a file named `Master_file.txt` in the project folder. This file specifies the sample combinations and their replicates; refer to the provided example file for the expected layout.

### Step 6: Use Config File to Add Additional Information
Use the `config.yaml` file to supply the additional information the workflow needs (organism, KEGG code, threads, combinations, and the path to the reference index).

#### Config.yaml Content for RNA_SEQ Snakemake Workflow (Example file)
```yaml
#### Enter organism name (scientific name)
org: "Homo sapiens"

#### Enter KEGG organism code
org_code: "hsa"

#### Specify number of threads
threads: "40"

#### Specify combinations using "+" between combinations
combinations: "control_Tumor + Tumor_control"

#### Path to the STAR-indexed reference folder (see the genome indexing command below)
reference: ""
```
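The `combinations` value is a single string in which each `+`-separated entry names one combination. The sketch below only illustrates how such a string breaks into individual entries with surrounding spaces trimmed; the function name `split_combinations` is hypothetical and is not taken from the workflow's `Snakefile` or `create_combinations.R`, which may parse the value differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the workflow): print each "+"-separated
# entry of a combinations string on its own line, whitespace trimmed.
split_combinations() {
  local input="$1" part
  local -a parts=()
  [ -n "$input" ] || return 0               # nothing to split
  # Split on "+"; spaces around each entry are still present here.
  IFS='+' read -r -a parts <<< "$input"
  for part in "${parts[@]}"; do
    part="${part#"${part%%[![:space:]]*}"}" # trim leading whitespace
    part="${part%"${part##*[![:space:]]}"}" # trim trailing whitespace
    if [ -n "$part" ]; then
      printf '%s\n' "$part"
    fi
  done
}

# Value from the example config.yaml; prints:
# control_Tumor
# Tumor_control
split_combinations "control_Tumor + Tumor_control"
```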
##### Genome indexing using STAR
```bash
STAR --runMode genomeGenerate --genomeDir {index_dir_name} --genomeFastaFiles {path to ".fasta" file} --sjdbGTFfile {path to ".gtf" file} --sjdbOverhang 100 --runThreadN 10
```
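Because `reference` in `config.yaml` must point to this index folder, it can help to confirm that the folder looks complete before starting the workflow. A minimal sketch, assuming a finished STAR index contains at least the core files `Genome`, `SA`, `SAindex`, and `genomeParameters.txt` (STAR writes further files, and the exact list depends on the STAR version and options); the function name `check_star_index` is hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: return 0 if the directory contains the (assumed) core
# files written by "STAR --runMode genomeGenerate"; otherwise report the
# missing or empty files on stderr and return 1.
check_star_index() {
  local index_dir="$1" f missing=0
  if [ ! -d "$index_dir" ]; then
    echo "Index directory not found: $index_dir" >&2
    return 1
  fi
  # Assumed core index files; STAR produces other files as well.
  for f in Genome SA SAindex genomeParameters.txt; do
    if [ ! -s "$index_dir/$f" ]; then
      echo "Missing or empty: $index_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_star_index {index_dir_name} && echo "STAR index looks complete"` prints the message only when all four files are present and non-empty.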
### Step 7: Open Terminal in Project Folder
Navigate to the project folder in your terminal/command prompt.

### Step 8: Run Snakemake
Type the following command in the terminal:
```bash
snakemake --configfile=config.yaml --cores 5
```
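Before launching the run, a quick pre-flight check can catch the setup problems described in Steps 2 to 5. The sketch below is a minimal, hypothetical helper (`check_project` is not part of the workflow): it confirms that the Step 2 files and the `1_Data` folder exist, flags sample file names containing hyphens (Step 4), and, assuming paired files follow the `{sample}_R1.fq.gz` / `{sample}_R2.fq.gz` naming seen in the Step 4 example, reports any R1 file without a matching R2 file.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check for the Snakemake project folder.
# Prints one line per problem found and returns 1 if any were found.
check_project() {
  local dir="${1:-.}" f name r1 problems=0

  # Files that Step 2 asks you to copy into the project folder
  for f in Snakefile Deseq2_final.R create_combinations.R config.yaml Master_file.txt; do
    if [ ! -f "$dir/$f" ]; then
      echo "Missing file: $f"
      problems=1
    fi
  done

  # Step 3: the sample folder must exist
  if [ ! -d "$dir/1_Data" ]; then
    echo "Missing folder: 1_Data"
    return 1
  fi

  # Step 4: sample file names should use "_" rather than "-"
  for f in "$dir"/1_Data/*; do
    [ -e "$f" ] || continue   # empty folder: the glob did not match
    name="$(basename "$f")"
    if [[ "$name" == *-* ]]; then
      echo "Hyphen in sample file name: $name"
      problems=1
    fi
  done

  # Assumed naming {sample}_R1.fq.gz / {sample}_R2.fq.gz
  for r1 in "$dir"/1_Data/*_R1.fq.gz; do
    [ -e "$r1" ] || continue
    if [ ! -e "${r1%_R1.fq.gz}_R2.fq.gz" ]; then
      echo "No R2 mate for: $(basename "$r1")"
      problems=1
    fi
  done

  return "$problems"
}
```

After sourcing the script, `check_project . && snakemake --configfile=config.yaml --cores 5` starts the run only when no problems are reported. Snakemake's own dry-run mode (`snakemake -n --configfile=config.yaml --cores 5`) additionally lists the jobs it would run without executing them.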