https://github.com/jcaperella29/processing-rna_seq-data-with-r-
An R script that reads in RNA-Seq data, preprocesses it, then performs differential expression analysis and feature selection.
- Host: GitHub
- URL: https://github.com/jcaperella29/processing-rna_seq-data-with-r-
- Owner: jcaperella29
- Created: 2024-03-26T00:07:56.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-18T20:07:24.000Z (about 1 year ago)
- Last Synced: 2024-04-18T21:25:59.079Z (about 1 year ago)
- Language: R
- Size: 10.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Processing-RNA_Seq-data-with-R-
An R script that reads in RNA-Seq data, preprocesses it, then performs differential expression analysis and feature selection.
- First, the needed libraries are imported.
- Sample data from the "airway" package is read in.
- The counts and phenotype data are isolated.
- The data are then filtered based on group size and undergo a variance-stabilizing transformation (VST).
- The distances between samples are then measured and visualized with a heatmap.
- Next, principal component analysis (PCA) is performed.
- Then differential expression analysis is performed.
- The differential expression results are sorted by adjusted p-value, and only p-values less than 5e-8 are kept.
- After some rearranging, the genes that produced these low p-values are isolated from the counts matrix.
- A new data frame is prepared with columns for the counts and the phenotype.
- Using a Random Forest model, the hits are narrowed down via feature selection.
- Finally, the top 20 genes are written to a text file.

Note: when using real data, enter your counts matrix at line 62 of the script and other information (such as phenotypes) at line 64.
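
The workflow described above can be sketched on simulated counts. This is a rough illustration under stated assumptions, not the repository's script. The simulated matrix, all object names, the filter threshold of 10 counts, and the output file name are hypothetical. A log2 transform stands in for VST, a per-gene Welch t-test with Benjamini-Hochberg adjustment stands in for the differential expression step, and a 0.05 cutoff replaces 5e-8 because the toy data set is small. The Random Forest step uses the `randomForest` package if it is installed and otherwise falls back to the p-value ordering.

```r
set.seed(1)

# Simulated stand-in for real data: counts (genes x samples) and phenotypes.
n_genes <- 500
n_per_group <- 4
samples <- paste0("sample", seq_len(2 * n_per_group))
pheno <- data.frame(
  condition = factor(rep(c("untrt", "trt"), each = n_per_group),
                     levels = c("untrt", "trt")),
  row.names = samples
)
counts <- matrix(rnbinom(n_genes * length(samples), mu = 100, size = 20),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)), samples))
is_trt <- pheno$condition == "trt"
# Make the first 30 genes strongly higher in the treated group.
counts[1:30, is_trt] <- counts[1:30, is_trt] * 20

# Filter: keep genes with >= 10 counts in at least as many samples
# as the smallest group contains (threshold is an assumption).
smallest_group <- min(table(pheno$condition))
keep <- rowSums(counts >= 10) >= smallest_group
counts <- counts[keep, , drop = FALSE]

# log2 transform, used here as a simple stand-in for VST.
logc <- log2(counts + 1)

# Sample-to-sample distances as a heatmap, then PCA; plots go to a temp PDF.
pdf(file.path(tempdir(), "qc_plots.pdf"))
sample_dist <- dist(t(logc))
heatmap(as.matrix(sample_dist), symm = TRUE)
pca <- prcomp(t(logc))
plot(pca$x[, 1:2], col = as.integer(pheno$condition), pch = 19)
invisible(dev.off())

# Stand-in differential expression: Welch t-test per gene, BH adjustment.
pvals <- apply(logc, 1, function(g) t.test(g[is_trt], g[!is_trt])$p.value)
res <- data.frame(gene = rownames(logc),
                  pvalue = pvals,
                  padj = p.adjust(pvals, method = "BH"))
res <- res[order(res$padj), ]
hits <- res$gene[res$padj < 0.05]  # toy cutoff, not the 5e-8 used above

# Data frame with one column per hit gene plus the phenotype.
model_df <- data.frame(t(logc[hits, , drop = FALSE]),
                       condition = pheno$condition)

# Feature selection: rank hits by Random Forest importance if available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(x = model_df[, hits, drop = FALSE],
                                   y = model_df$condition, ntree = 500)
  imp <- randomForest::importance(rf, type = 2)  # mean decrease in Gini
  ranked <- rownames(imp)[order(imp[, 1], decreasing = TRUE)]
} else {
  ranked <- hits  # fallback: hits are already ordered by adjusted p-value
}

# Write up to 20 top-ranked genes to a text file, one gene per line.
top_genes <- head(ranked, 20)
out_file <- file.path(tempdir(), "top20_genes.txt")
writeLines(top_genes, out_file)
```

To adapt the sketch, replace the simulated `counts` and `pheno` objects with a real counts matrix and phenotype table, and tighten the adjusted p-value cutoff to suit the size of the data set.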