https://github.com/meeranhussain/microbiome_analysis
Comparative study of soil microbiome using QIIME2
- Host: GitHub
- URL: https://github.com/meeranhussain/microbiome_analysis
- Owner: meeranhussain
- Created: 2024-01-31T07:16:00.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-06-22T10:31:23.000Z (5 months ago)
- Last Synced: 2024-06-22T20:26:26.678Z (5 months ago)
- Topics: 16s-rrna, metagenomic-analysis, microbiome, qiime2
- Homepage:
- Size: 16.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# Qiime2 Pipeline for 16S rRNA and ITS Sequence Analysis
**This pipeline was used in the following research work:** [Regenerative agriculture augments bacterial community structure for a healthier soil and agriculture](https://doi.org/10.3389/fagro.2023.1134514)

The pipeline below provides the code for processing 16S rRNA data with Qiime2, from initial sequence trimming to taxonomy classification and visualization. The same steps can also be followed to process ITS data.
## 1. Load Sample List & Perform Quality Trimming
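First, load the sample IDs from a text file, then quality-trim each pair of reads with Trim Galore:
```bash
#!/bin/bash

### Load sample list
input="Sample_id.txt"

# Trim each sample's paired reads (Phred quality cutoff 20) and run FastQC on the result
while IFS= read -r i
do
    echo "$i"
    trim_galore -q 20 --paired --fastqc --cores 4 "${i}_R1.fq.gz" "${i}_R2.fq.gz" -o trimd_files
done < "$input"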
```
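Before starting the loop, it can help to confirm that every sample ID has both read files, so that a missing mate does not surface only partway through the run. The following is a minimal sketch, assuming the `${i}_R1.fq.gz` / `${i}_R2.fq.gz` naming used above; `check_read_pairs` is a hypothetical helper and not part of the original pipeline.

```bash
#!/bin/bash
# Hypothetical helper: report samples whose paired FASTQ files are missing or empty.
# Arguments: <sample list file> [directory holding the reads, default "."]
check_read_pairs() {
    local list="$1" dir="${2:-.}" id mate missing=0
    while IFS= read -r id || [ -n "$id" ]; do
        [ -z "$id" ] && continue                 # skip blank lines
        for mate in R1 R2; do
            if [ ! -s "${dir}/${id}_${mate}.fq.gz" ]; then
                echo "missing: ${dir}/${id}_${mate}.fq.gz" >&2
                missing=$((missing + 1))
            fi
        done
    done < "$list"
    return "$((missing > 0))"                    # non-zero status if anything is missing
}
```

For example, `check_read_pairs Sample_id.txt . && echo "all read pairs present"` could be run before the trimming loop.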
## 2. Prepare Sample Metadata Sheet
Prepare a metadata sheet based on the study design and save it as `soil_analysis_metadata.tsv`.

`EXAMPLE SHEET`

| sample_id | Farming_Method | Location | Kind_of_Vegetation |
|-----------|----------------|------------|--------------------|
| Indira-1 | Regenerative | Ramanagara | Beans |
| Indira-2 | Regenerative | Ramanagara | Ragi |
| Indira-3 | Regenerative | Magadi | Tomato |
| Indira-4 | Conventional | Ramanagara | Beans |
| Indira-5 | Regenerative | Magadi | Beans |
| Indira-6 | Conventional | Ramanagara | Beans |
| Indira-7 | Regenerative | Ramanagara | Ragi |
| Indira-8 | Regenerative | Magadi | Ragi |
| Indira-9 | Regenerative | Hosur | Tomato/Beans |

### Visualize the metadata using Qiime2:
```bash
qiime metadata tabulate \
--m-input-file soil_analysis_metadata.tsv \
--o-visualization soil_analysis_metadata.qzv
```
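Inconsistent category labels (a place name spelled two ways, for example) silently split one group into two in every downstream comparison. As a quick check, the sketch below lists each distinct value per metadata column with its count. It assumes a tab-separated file whose first row is the header and whose first column is the sample ID; `column_values` is a hypothetical helper, not a Qiime2 command.

```bash
#!/bin/bash
# Hypothetical helper: print "<column> <value> <count>" (tab-separated)
# for every metadata column except the first (sample ID) column.
column_values() {
    awk -F'\t' '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }   # remember headers
        /^#/    { next }                                           # skip comment/directive rows
        { for (i = 2; i <= NF; i++) count[name[i] "\t" $i]++ }     # tally each value
        END { for (k in count) print k "\t" count[k] }
    ' "$1" | sort
}
```

Running `column_values soil_analysis_metadata.tsv` and scanning for values that occur only once is usually enough to spot a misspelled label.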
## 3. Prepare Sample Load Sheet
Create a load sheet for Qiime2 named `soil_samples.tsv`:

`EXAMPLE SHEET`

| SampleID | Forward_read_path | Reverse_read_path |
|----------|-------------------------------------|-------------------------------------|
| Indira-1 | trimd_files/Indira_1_R1.fq.gz | trimd_files/Indira_1_R2.fq.gz |
| Indira-2 | trimd_files/Indira_2_R1.fq.gz | trimd_files/Indira_2_R2.fq.gz |
| Indira-3 | trimd_files/Indira_3_R1.fq.gz | trimd_files/Indira_3_R2.fq.gz |
| Indira-4 | trimd_files/Indira_4_R1.fq.gz | trimd_files/Indira_4_R2.fq.gz |
| Indira-5 | trimd_files/Indira_5_R1.fq.gz | trimd_files/Indira_5_R2.fq.gz |
| Indira-6 | trimd_files/Indira_6_R1.fq.gz | trimd_files/Indira_6_R2.fq.gz |
| Indira-7 | trimd_files/Indira_7_R1.fq.gz | trimd_files/Indira_7_R2.fq.gz |
| Indira-8 | trimd_files/Indira_8_R1.fq.gz | trimd_files/Indira_8_R2.fq.gz |
| Indira-9 | trimd_files/Indira_9_R1.fq.gz | trimd_files/Indira_9_R2.fq.gz |

### Import the sequence data into Qiime2 to create a SampleData[PairedEndSequencesWithQuality] artifact:
```bash
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path soil_samples.tsv \
--output-path paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2
```
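The load sheet can also be generated from `Sample_id.txt` rather than typed by hand. The sketch below is one possible way to do this, under stated assumptions: `make_manifest` is a hypothetical helper; the trimmed files are assumed to be named `<id>_R1.fq.gz` and `<id>_R2.fq.gz` (Trim Galore's default paired output names end in `_val_1.fq.gz` and `_val_2.fq.gz`, so pass different suffixes if the files were not renamed); and the header uses the column names that the `PairedEndFastqManifestPhred33V2` format expects, with absolute file paths.

```bash
#!/bin/bash
# Hypothetical helper: write a paired-end manifest to stdout.
# Arguments: <sample list> <directory with trimmed reads> [R1 suffix] [R2 suffix]
make_manifest() {
    local list="$1" dir r1="${3:-_R1.fq.gz}" r2="${4:-_R2.fq.gz}" id
    dir=$(cd "$2" && pwd) || return 1            # resolve to an absolute path
    printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
    while IFS= read -r id || [ -n "$id" ]; do
        [ -z "$id" ] && continue                 # skip blank lines
        printf '%s\t%s/%s%s\t%s/%s%s\n' "$id" "$dir" "$id" "$r1" "$dir" "$id" "$r2"
    done < "$list"
}
```

A typical call would be `make_manifest Sample_id.txt trimd_files > soil_samples.tsv`.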
## 4. Visualize Demultiplexed Data
Visualize the paired-end demultiplexed sample data in Qiime2:
```bash
qiime demux summarize \
--i-data paired-end-demux.qza \
--o-visualization paired-end-demux.qzv
```
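The quality plots in `paired-end-demux.qzv` are typically what guides the truncation lengths passed to DADA2 in the next step. Whatever lengths are chosen, the truncated forward and reverse reads still have to overlap so that the pairs can be merged (DADA2's merging step requires a minimum overlap, 12 nt by default). The arithmetic can be sketched as below; `trunc_overlap` is a hypothetical helper, and the amplicon length in the usage note is an illustrative assumption, not a value from this study.

```bash
#!/bin/bash
# Hypothetical helper: bases of overlap left after truncation.
# overlap = forward truncation length + reverse truncation length - amplicon length
trunc_overlap() {
    local trunc_f="$1" trunc_r="$2" amplicon="$3"
    echo $(( trunc_f + trunc_r - amplicon ))
}
```

For instance, `trunc_overlap 285 250 460` evaluates the truncation lengths used below against an assumed 460 bp amplicon; a result close to or below the minimum overlap suggests that the reads are being truncated too aggressively.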
## 5. Denoise Data with DADA2
Denoise the samples using DADA2 and generate a visualization file:
```bash
# Denoise, merge, and remove chimeras from the paired-end reads
qiime dada2 denoise-paired \
--i-demultiplexed-seqs paired-end-demux.qza \
--p-trunc-len-f 285 \
--p-trunc-len-r 250 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table pet-table.qza \
--o-denoising-stats stats-dada2.qza \
--p-n-threads 8

# Tabulate the denoising statistics
qiime metadata tabulate \
--m-input-file stats-dada2.qza \
--o-visualization stats-dada2.qzv

# Rename the outputs to the names used in the following steps
mv rep-seqs-dada2.qza rep-seqs.qza
mv pet-table.qza table.qza
```
## 6. Create Feature Table and Feature Data Summaries
Summarize the feature table and feature data:
```bash
qiime feature-table summarize \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file soil_analysis_metadata.tsv

qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
```
## 7. Taxonomy Classification
Classify the sequences using a pre-trained classifier and visualize the taxonomy:
```bash
# Assign taxonomy with the pre-trained SILVA 138 naive Bayes classifier
qiime feature-classifier classify-sklearn \
--i-classifier silva-138-99-nb-classifier.qza \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza

# Tabulate the taxonomy assignments
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv

# Interactive taxonomy bar plots
qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file soil_analysis_metadata.tsv \
--o-visualization taxa-bar-plots.qzv
```
## 8. Collapse Taxa and Export Data
Collapse the taxonomy at level 7 and export the data:
```bash
# Collapse the feature table at taxonomic level 7
qiime taxa collapse \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--p-level 7 \
--o-collapsed-table collapsed-l2.qza

# Export the collapsed table to a directory
qiime tools export \
--input-path collapsed-l2.qza \
--output-path collapsed-l2-dir
```
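The `--p-level` value counts ranks from the left of the taxonomy string, so the outputs above are named `collapsed-l2` even though the table is collapsed at level 7. As a reminder of what each level means, here is a small lookup sketch. It assumes the seven-rank SILVA-style annotation (`d__; p__; c__; o__; f__; g__; s__`); `rank_for_level` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: map a --p-level number to its rank name,
# assuming seven-rank SILVA-style taxonomy strings.
rank_for_level() {
    case "$1" in
        1) echo "domain" ;;
        2) echo "phylum" ;;
        3) echo "class" ;;
        4) echo "order" ;;
        5) echo "family" ;;
        6) echo "genus" ;;
        7) echo "species" ;;
        *) echo "unknown level: $1" >&2; return 1 ;;
    esac
}

# The exported directory should contain a BIOM file; if the biom-format CLI
# is installed, it can be converted to a tab-separated table, for example:
# biom convert -i collapsed-l2-dir/feature-table.biom -o collapsed-l2-dir/feature-table.tsv --to-tsv
```

Naming outputs after the rank, for example `collapsed-$(rank_for_level 7).qza`, is one way to keep file names and levels in sync.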
## 9. Custom taxonomy plotting using the CSV file downloaded from "taxa-bar-plots.qzv"
```R
# Install necessary packages (only needed once)
install.packages("dplyr")
install.packages("tibble")
install.packages("reshape2")
install.packages("ggplot2")

# Load libraries
library(dplyr)
library(tibble)
library(reshape2) # provides melt()
library(ggplot2)  # provides ggplot()

# Read the CSV file
tax_lvl6 <- read.csv("Taxa_level-6.csv", header = TRUE, sep = ",", row.names = NULL)

# Extract sample IDs
sam_ID <- select(tax_lvl6, c(1))

# Remove specified columns
tax_lvl6 <- select(tax_lvl6, -c(1, 1056:1075))

# Append a row holding the total count of each taxon (column sums)
tax_lvl6 <- rbind(tax_lvl6, mapply(sum, tax_lvl6[, c(1:1054)]))

# Order columns by the total row (row 15), most abundant first
col_order <- order(unlist(tax_lvl6[15, 1:1054]), decreasing = TRUE)
top_100 <- tax_lvl6[col_order]
top_100 <- top_100[-c(15), ] # drop the total row again

# Calculate per-sample percentages and keep the 100 most abundant taxa
tax_lvl6_pct <- t(apply(top_100, 1, function(x) { x / sum(x) * 100 }))[, 1:100]

# Create a data frame with top 100 species
top_100 <- data.frame(tax_lvl6_pct[, c(1:100)])

# Merge sample ID column
top_100 <- add_column(top_100, sam_ID, .before = 1)
colnames(top_100)[1] <- c("sample.id")

# Keep only the plots where ragi was cultivated
ragi_100 <- top_100[-c(1, 2, 8, 9, 10, 11, 14), ]

# Read metadata table
meta_dta <- read.table("soil_analysis_metadata.tsv", header = TRUE, sep = "\t")

# Remove specified columns
col_to_be_removed <- colnames(ragi_100)[c(4, 22, 34, 45, 49, 54, 63, 64, 81, 86, 88, 93, 98)]
ragi_sel <- ragi_100[, -c(4, 22, 34, 45, 49, 54, 63, 64, 81, 86, 88, 93, 98)]

# Merge data frames (joins on the shared "sample.id" column)
ragi_sel_meta <- merge(ragi_sel, meta_dta)

# Melt data frame into long format
melt_df_2 <- melt(ragi_sel_meta[, -c(1, 89, 90, 92:107)], id.vars = "Farming.Method")

# For 50 species
ragi_50_meta <- ragi_sel_meta[, -c(1, 2, 51:90, 92:107)]
melt_50 <- melt(ragi_50_meta, id.vars = "Farming.Method")

# Rename columns
colnames(melt_50) <- c("Farming.Method", "Species", "percentage")

# Plot using ggplot
ggplot(melt_50, aes(fill = Farming.Method, y = percentage, x = Species)) +
  geom_bar(position = "stack", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90))
```
## 10. Qiime2 Filtering & Alpha Diversity
**Filtering Samples:**
```bash
qiime feature-table filter-samples \
--i-table ../table.qza \
--m-metadata-file ../soil_analysis_metadata.tsv \
--p-where "[Crop_Type] IN ('Ragi', 'Barren')" \
--o-filtered-table ragi-filtered-table.qza
```
**Summarizing Filtered Table:**
```bash
qiime feature-table summarize \
--i-table ragi-filtered-table.qza \
--o-visualization ragi-filtered-table.qzv \
--m-sample-metadata-file ../soil_analysis_metadata.tsv
```
**Alpha Rarefaction Analysis:**
```bash
qiime diversity alpha-rarefaction \
--i-table ragi-filtered-table.qza \
--i-phylogeny ../rooted-tree.qza \
--p-max-depth 24174 \
--m-metadata-file ../soil_analysis_metadata.tsv \
--o-visualization ragi-alpha-rarefaction.qzv
```
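Both the rarefaction command above and the core-metrics command that follows read `../rooted-tree.qza`, which none of the earlier steps in this README produce. One common way to build such a tree from the representative sequences is Qiime2's `align-to-tree-mafft-fasttree` pipeline; whether the original study built its tree this way is not stated, so treat the sketch below as an assumption. `build_rooted_tree` is a hypothetical wrapper that prints the command when `DRY_RUN=1` is set.

```bash
#!/bin/bash
# Hypothetical wrapper: build rooted-tree.qza from the representative sequences.
# Argument: [representative sequences artifact, default rep-seqs.qza]
# Set DRY_RUN=1 to print the command instead of running it.
build_rooted_tree() {
    local seqs="${1:-rep-seqs.qza}"
    set -- qiime phylogeny align-to-tree-mafft-fasttree \
        --i-sequences "$seqs" \
        --o-alignment aligned-rep-seqs.qza \
        --o-masked-alignment masked-aligned-rep-seqs.qza \
        --o-tree unrooted-tree.qza \
        --o-rooted-tree rooted-tree.qza
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"        # show what would be run
    else
        "$@"             # run the Qiime2 pipeline
    fi
}
```

Running `build_rooted_tree rep-seqs.qza` in the directory that holds `rep-seqs.qza` would place `rooted-tree.qza` where the `../rooted-tree.qza` paths expect it.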
**Core Metrics Phylogenetic Analysis:**
```bash
qiime diversity core-metrics-phylogenetic \
--i-phylogeny ../rooted-tree.qza \
--i-table ragi-filtered-table.qza \
--p-sampling-depth 14222 \
--m-metadata-file ../soil_analysis_metadata.tsv \
--output-dir core-metrics-results
```
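Samples whose total frequency falls below `--p-sampling-depth` are dropped from the core-metrics results, so it is worth checking how many samples a candidate depth keeps. A minimal sketch, assuming a two-column CSV of sample ID and frequency (such as the per-sample frequency table that can be downloaded from `ragi-filtered-table.qzv`); `samples_retained` is a hypothetical helper.

```bash
#!/bin/bash
# Hypothetical helper: count the samples that survive rarefaction at a given depth.
# Arguments: <CSV with sample ID and frequency columns> <sampling depth>
samples_retained() {
    local counts="$1" depth="$2"
    awk -F',' -v depth="$depth" '
        $2 ~ /^[0-9.]+$/ {                 # ignore header or malformed rows
            total++
            if ($2 + 0 >= depth) kept++    # samples at or above the depth are kept
        }
        END { printf "%d of %d samples retained at depth %d\n", kept, total, depth }
    ' "$counts"
}
```

Comparing a few candidate depths this way makes the trade-off between reads per sample and samples kept explicit.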
**Alpha Group Significance - Faith's PD:**
```bash
qiime diversity alpha-group-significance \
--i-alpha-diversity core-metrics-results/faith_pd_vector.qza \
--m-metadata-file ../soil_analysis_metadata.tsv \
--o-visualization core-metrics-results/faith-pd-group-significance.qzv
```
**Alpha Group Significance - Evenness:**
```bash
qiime diversity alpha-group-significance \
--i-alpha-diversity core-metrics-results/evenness_vector.qza \
--m-metadata-file ../soil_analysis_metadata.tsv \
--o-visualization core-metrics-results/evenness-group-significance.qzv
```
## 11. Custom Plotting of Taxonomy, Alpha Rarefaction, and Alpha Diversity (needs cleanup)
```R
# Install packages (only needed once)
install.packages("reshape2")
install.packages("tidyverse")
install.packages("hrbrthemes")

# Load libraries
library(reshape2)   # melt()
library(dplyr)
library(tibble)
library(tidyverse)  # includes ggplot2
library(hrbrthemes) # theme_ipsum(), scale_x_comma()
library(viridis)

# Read the level-6 table downloaded from taxa-bar-plots.qzv
tax_lvl6 <- read.csv("level-6.csv", header = TRUE, sep = ",", row.names = NULL)
#tax_lvl6 <- tax_lvl6[-c(7),]
sam_ID <- select(tax_lvl6, c(1)) # extract sample IDs
tax_lvl6 <- select(tax_lvl6, -c(1, 750:759)) # drop column 1 and columns 750 to 759
#rownames(tax_lvl6) <- sam_ID[,1]
#tax_lvl6_pct <- t(apply(tax_lvl6[order(rowSums(tax_lvl6), decreasing = T),order(colSums(tax_lvl6), decreasing =T)], 1, function(x) { x / sum(x) * 100}))[,1:45]

# Append a row holding the total count of each taxon (column sums)
tax_lvl6 <- rbind(tax_lvl6, mapply(sum, tax_lvl6[, c(1:748)]))
# Column indices sorted by the total row (row 8), most abundant first
col_order <- order(unlist(tax_lvl6[8, 1:748]), decreasing = TRUE)
top_100 <- tax_lvl6[col_order] # reorder the columns by abundance
top_100 <- top_100[-c(8), ]    # drop the total row again
# Convert counts to per-sample percentages and keep the 45 most abundant taxa
tax_lvl6_pct <- t(apply(top_100, 1, function(x) { x / sum(x) * 100 }))[, 1:45]
top_100 <- data.frame(tax_lvl6_pct[, c(1:45)])
top_100 <- add_column(top_100, sam_ID, .before = 1) # prepend the sample ID column
colnames(top_100)[1] <- c("sample.id") # name the first column
# Selecting only ragi-cultivated plots
#ragi_100 <- top_100[-c(1,2,8,9,10,11,14), ] #selected only ragi plots
meta_dta <- read.table("soil_analysis_metadata.tsv", header = TRUE, sep = "\t") # read the metadata table
colnames(top_100) <- gsub(".*p__", "", colnames(top_100)) # strip everything up to and including "p__"
#col_to_be_removed <- colnames(ragi_100)[c(4,22,34,45,49,54,63,64,81,86,88,93,98)] #remove the col with unidentified genus
#ragi_sel <- ragi_100[ ,-c(4,22,34,45,49,54,63,64,81,86,88,93,98)]
#ragi_sel_meta <- merge(ragi_sel, meta_dta)
rag_sel_meta <- merge(top_100, meta_dta) # joins on the shared "sample.id" column
melt_df_2 <- melt(rag_sel_meta[, -c(1, 14:49, 51:56)], id.vars = "Farming.Type") # convert to long format
# For 50 species
#ragi_50_meta <- ragi_sel_meta[ ,-c(1,2,51:90,92:107)]
#melt_50 <- melt(ragi_50_meta, id.vars="Farming.Method") #uses reshape2, removed col other than species
colnames(melt_df_2) <- c("Farming.Type", "Phylum", "Percentage")

# Plot 1: phyla on the x-axis, filled by farming type
melt_df_2$Farming.Type <- factor(melt_df_2$Farming.Type, # set the display order manually
                                 levels = c("Barren", "Conv", "Reg (1y)", "Reg (>5y)"))
ggplot(melt_df_2, aes(fill = Farming.Type, y = Percentage, x = Phylum)) +
  geom_bar(position = "fill", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent)

# Plot 2: farming types on the x-axis, filled by phylum
ggplot(melt_df_2, aes(fill = Phylum, y = Percentage, x = Farming.Type)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent)

# Same layout, stacking the mean percentage per farming type
ggplot(melt_df_2, aes(fill = Phylum, y = Percentage, x = Farming.Type)) +
  geom_bar(position = "stack", stat = "summary", fun = "mean") +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent)

# Alpha diversity plots (read the data before plotting it)
a <- read.csv("Alpha_diversity_metadata.tsv", header = TRUE, sep = "\t", row.names = NULL)

# Plot 3: box plot of Faith's PD
ggplot(a, aes(x = as.factor(Farming.Type), y = faith_pd)) +
  geom_boxplot(fill = "slateblue", alpha = 0.2) +
  xlab("Farming.Type")

# Jitter plot
ggplot(a, aes(x = Farming.Type, y = faith_pd)) + geom_jitter(position = position_jitter(0.2))
# Pie chart
# Type 1
pie(sapply(split(a$faith_pd, a$Farming.Type), mean))
# Type 2
ggplot(a, aes(x = "", y = faith_pd, fill = Farming.Type)) +
  geom_bar(stat = "summary", fun = "mean", width = 1) +
  coord_polar("y", start = 0)
# Scatter plot
ggplot(a, aes(x = Farming.Type, y = faith_pd, color = Farming.Type)) +
  geom_point(size = 6) +
  theme_ipsum()

# Evenness
evn_data <- read.table("evenness_ragi.tsv", header = TRUE, sep = "\t", row.names = NULL)
# Jitter plot
ggplot(evn_data, aes(x = Farming.Type, y = faith_pd)) + geom_jitter(position = position_jitter(0.2))
# Scatter plot
# Assumption: column 3 holds the evenness values plotted below; adjust the index to match your file
colnames(evn_data)[3] <- "Bacterial Population Evenness"
evn_data$Farming.Type <- factor(evn_data$Farming.Type, # set the display order manually
                                levels = c("Barren", "Conv", "Reg (1y)", "Reg (>5y)"))
p2 <- ggplot(evn_data, aes(x = Farming.Type, y = `Bacterial Population Evenness`, color = Farming.Type)) +
  geom_point(size = 6) + theme(axis.title = element_text(size = 7))
ggsave("Rplot_evenness(p=0.14).TIFF", plot = p2, height = 8, width = 14, units = c("cm"), device = 'tiff', dpi = 300)
# Jitter plot with a mean +/- 1 SD crossbar (mean_sdl requires the Hmisc package)
ggplot(evn_data, aes(x = Farming.Type, y = faith_pd)) +
  geom_jitter(position = position_jitter(0.2)) +
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), geom = "crossbar", width = 0.5)
# Alpha rarefaction curve
data <- read.table("alpha_rac.tsv", header = TRUE, sep = "\t")
melt_df_alpha <- melt(data[, -c(1, 62:101, 103:110)], id.vars = "Samples") # convert to long format
melt_df_alpha <- melt_df_alpha[order(melt_df_alpha$value), ]
# Plot
color <- c("darkolivegreen4", "black", "blue", "darkmagenta", "chocolate1", "darkgoldenrod4", "brown1", "darkviolet", "deeppink")
colnames(data)[2:3] <- c("Number of Reads", "Soil Bacterial Diversity")
p5 <- ggplot(data, aes(x = `Number of Reads`, y = `Soil Bacterial Diversity`, color = Samples)) +
  geom_line(stat = "identity") +
  geom_point(stat = "identity", size = 1) +
  scale_color_manual(values = color) +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_x_comma()
ggsave("Alpha_racfac_ragi.tiff", plot = p5, height = 8, width = 14, units = c("cm"), device = 'tiff', dpi = 600)
# Selected genera for the taxonomy plot
colnames(top_100) <- gsub(".*g__", "", colnames(top_100)) # strip everything up to and including "g__"
top_rm_100 <- data.frame(top_100)
sel_genus <- data.frame(top_rm_100$sample.id, top_rm_100$Allorhizobium.Neorhizobium.Pararhizobium.Rhizobium, top_rm_100$Pseudomonas, top_rm_100$Bacillus, top_rm_100$Nocardioides, top_rm_100$Streptomyces, top_rm_100$Nocardia, top_rm_100$Mycobacterium, top_rm_100$Micromonospora)
colnames(sel_genus) <- c("sample.id", "Rhizobium", "Pseudomonas", "Bacillus", "Nocardioides", "Streptomyces", "Nocardia", "Mycobacterium", "Micromonospora")
library(tibble)
sel_gen_lng <- merge(sel_genus, meta_dta, by = "sample.id")
sel_gen_lng <- melt(sel_gen_lng[, -c(1, 10:12, 14:19)], id.vars = "Farming.Type")
colnames(sel_gen_lng) <- c("Farming.Type", "Genus", "Counts")
ggplot(sel_gen_lng, aes(fill = Genus, y = Counts, x = Farming.Type)) +
  geom_bar(position = "stack", stat = "summary", fun = "mean") +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(angle = 90)) +
  coord_flip()

# Selected taxa
sel_all <- data.frame(top_rm_100$sample.id, top_rm_100$Flavobacterium, top_rm_100$Bacillus, top_rm_100$Streptomyces, top_rm_100$Mesorhizobium, top_rm_100$Achromobacter, top_rm_100$Klebsiella, top_rm_100$Paenibacillus, top_rm_100$Burkholderia.Caballeronia.Paraburkholderia, top_rm_100$Pseudomonas)
colnames(sel_all) <- c("sample.id", "Flavobacterium", "Bacillus", "Streptomyces", "Mesorhizobium", "Achromobacter", "Klebsiella", "Paenibacillus", "Burkholderia", "Pseudomonas")
sel_all_lng <- merge(sel_all, meta_dta, by = "sample.id")
sel_all_lng <- melt(sel_all_lng[, -c(1, 11:13, 15:20)], id.vars = "Farming.Type")
colnames(sel_all_lng) <- c("Farming_Type", "Genus", "Percentage")
sel_all_lng$Farming_Type <- factor(sel_all_lng$Farming_Type, # set the display order manually
                                   levels = c("Barren", "Conv", "Reg (1y)", "Reg (>5y)"))
p3 <- ggplot(sel_all_lng, aes(fill = Genus, y = Percentage, x = Farming_Type)) +
  geom_bar(position = "fill", stat = "summary", fun = "mean") +
  scale_fill_brewer(palette = "Paired") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_y_continuous(labels = scales::percent) +
  coord_flip()
ggsave("sel_taxa_1.TIFF", plot = p3, height = 8, width = 14, units = c("cm"), device = 'tiff', dpi = 300)

### Richness
richness <- read.table("faith_pd.csv", header = TRUE, sep = ",", row.names = NULL)
# Scatter plot
colnames(a)[10] <- "Soil Bacterial Diversity"
a$Farming.Type <- factor(a$Farming.Type, # set the display order manually
                         levels = c("Barren", "Conv", "Reg (1y)", "Reg (>5y)"))
p2 <- ggplot(a, aes(x = Farming.Type, y = `Soil Bacterial Diversity`, color = Farming.Type)) +
  geom_point(size = 6) + theme(axis.title = element_text(size = 7))
ggsave("Rplot_richness(p=0.169).TIFF", plot = p2, height = 8, width = 14, units = c("cm"), device = 'tiff', dpi = 300)
```