https://github.com/shuyib/mouse_gut_otu
Vectorization and unsupervised learning of mouse Operational Taxonomic Units (OTUs) to determine which species of bacteria form distinct groups in a dataset.
- Host: GitHub
- URL: https://github.com/shuyib/mouse_gut_otu
- Owner: Shuyib
- Created: 2018-04-08T10:19:52.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2018-04-22T04:53:05.000Z (about 7 years ago)
- Last Synced: 2025-01-27T09:41:24.366Z (4 months ago)
- Topics: 16s-rrna, anaconda, analysis, data-visualization, dataset, gut-microbiome, matplotlib-figures, mothur, numpy-arrays, pandas-dataframe, pca-analysis, python3, scikitlearn-machine-learning, sops, t-sne, unsupervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 91.8 KB
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 0
This analysis is an additional step to the [ebioKit tutorials](http://77.235.253.122/tutorials/courses/16s-metabarcoding-analysis/). It visualizes the results of 16S metabarcoding using unsupervised learning techniques such as t-distributed stochastic neighbor embedding (t-SNE) and principal component analysis (PCA). You can also use a bar plot or a frequency table to find which bacteria form distinct groups.
Read more about the MiSeq SOP that the Schloss Lab uses to process 16S rRNA gene sequences generated from paired-end reads on Illumina's MiSeq platform [here](https://www.mothur.org/wiki/MiSeq_SOP).
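As a rough illustration of the two techniques named above, here is a minimal sketch that projects a small numeric table into two dimensions with scikit-learn. The data is synthetic and the variable names (`features`, `pca_coords`, `tsne_coords`) are made up for this example; it is not the notebook's code, and the real OTU table would first need to be turned into numeric features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)

# Synthetic stand-in for a vectorized OTU table: 60 rows, 8 numeric features.
# Two groups with different means, so there is structure to recover.
group_a = rng.normal(loc=0.0, scale=1.0, size=(30, 8))
group_b = rng.normal(loc=4.0, scale=1.0, size=(30, 8))
features = np.vstack([group_a, group_b])

# Standardize so that no single feature dominates the projection.
scaled = StandardScaler().fit_transform(features)

# PCA: linear projection onto the two directions of greatest variance.
pca = PCA(n_components=2)
pca_coords = pca.fit_transform(scaled)

# t-SNE: nonlinear embedding; perplexity must be smaller than the row count.
tsne = TSNE(n_components=2, perplexity=10, init="pca", random_state=0)
tsne_coords = tsne.fit_transform(scaled)

print(pca_coords.shape, tsne_coords.shape)
print(pca.explained_variance_ratio_)
```

Each row of `pca_coords` and `tsne_coords` is an (x, y) position that can be passed to a matplotlib scatter plot. PCA axes are interpretable through `explained_variance_ratio_`, whereas t-SNE axes have no direct meaning and the layout changes with `perplexity` and `random_state`.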
---
Setting up your environment
* Download [Anaconda](https://www.anaconda.com/download/) with Python 3 for your operating system.
* Create a conda environment like mine: `conda env create -f environment.yml`
This creates an environment called py35. Activate it with this command in your terminal:
`source activate py35`
* In your terminal, in the directory where you cloned this repository, run this command:
`jupyter notebook otu_data_viz.ipynb`
---
A codebook is provided for the .csv file. I encourage you to go through the exercise [here](http://77.235.253.122/tutorials/courses/16s-metabarcoding-analysis/) to generate the **0.16.cons.taxonomy.csv** dataset, which was created by processing .fastq files obtained from *Mus musculus* with [Mothur](https://www.mothur.org/). Find the notebook [here](https://nbviewer.jupyter.org/github/Shuyib/mouse_gut_OTU/blob/master/otu_data_viz.ipynb).
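To give a feel for working with this kind of file, the sketch below builds a frequency table with pandas. It assumes the three-column layout that mothur writes for a `cons.taxonomy` file (`OTU`, `Size`, `Taxonomy`) and uses a tiny made-up sample; the actual columns of the CSV are described in the codebook and may differ.

```python
import io
import pandas as pd

# Hypothetical sample in mothur's cons.taxonomy layout (OTU, Size, Taxonomy).
# The real CSV may use different columns; check the codebook.
sample = io.StringIO(
    "OTU,Size,Taxonomy\n"
    "Otu001,1200,Bacteria(100);Bacteroidetes(100);Bacteroidia(100);\n"
    "Otu002,800,Bacteria(100);Firmicutes(100);Clostridia(98);\n"
    "Otu003,300,Bacteria(100);Firmicutes(100);Bacilli(95);\n"
)
otus = pd.read_csv(sample)

# Drop the confidence values such as "(100)" and the trailing ";".
clean = otus["Taxonomy"].str.replace(r"\(\d+\)", "", regex=True).str.rstrip(";")

# Split the lineage into one column per rank (three ranks in this toy sample).
ranks = clean.str.split(";", expand=True)
ranks.columns = ["kingdom", "phylum", "class"]
otus = pd.concat([otus[["OTU", "Size"]], ranks], axis=1)

# Frequency table: number of OTUs and total reads per phylum.
phylum_table = otus.groupby("phylum")["Size"].agg(["count", "sum"])
print(phylum_table)
```

Calling `phylum_table["sum"].plot.bar()` would turn the same table into a bar plot, provided matplotlib is installed.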