https://github.com/dev-ev/prot-rna-umap-clustering

Joint UMAP embedding and clustering of proteomic and transcriptomic data

Host: GitHub
URL: https://github.com/dev-ev/prot-rna-umap-clustering
Owner: dev-ev
License: mit
Created: 2021-02-07T10:35:30.000Z (almost 4 years ago)
Default Branch: main
Last Pushed: 2022-03-15T19:14:33.000Z (almost 3 years ago)
Last Synced: 2023-10-20T06:36:15.560Z (about 1 year ago)
Topics: clustering, jupyterlab, mass-spectrometry, proteomics, proteomics-data-integration, tmt-data-analysis, transcriptomics, umap
Language: HTML
Homepage:
Size: 4.42 MB
Stars: 3
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# prot-rna-umap-clustering
Joint UMAP embedding and clustering of proteomic and transcriptomic data.

The notebook was created in JupyterLab on Windows with Python 3.8. The workflow depends on third-party libraries that can be installed via pip:

`pip install scipy numpy pandas matplotlib seaborn scikit-learn umap-learn`

The data accompany the [article by Hultqvist *et al.*](https://www.nature.com/articles/s41559-018-0568-5). The mass spectrometry-based proteomic files are [deposited in the PRIDE archive](https://www.ebi.ac.uk/pride/archive/projects/PXD005236), and the relative protein abundance table from the proteomic analysis can be found in this GitHub repository. The RPKM table from the transcriptomic experiment can be found on the [Gene Expression Omnibus (GEO) project page](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92601).
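As a rough illustration of how a protein abundance table and an RPKM table could be brought onto a common gene index before joint analysis, here is a minimal pandas sketch. The gene names, column names and values are invented toy data, and the log2 transform with a pseudocount of 1 is an assumption rather than a step confirmed by the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two tables; all names and values here are hypothetical.
prot = pd.DataFrame(
    {"gene": ["aceA", "lacZ", "rpoB"],
     "prot_s1": [1.2, 0.8, 1.0],
     "prot_s2": [1.1, 0.9, 1.0]}
)
rna = pd.DataFrame(
    {"gene": ["lacZ", "rpoB", "thrA"],
     "rna_s1": [15.0, 120.0, 7.0],
     "rna_s2": [31.0, 127.0, 0.0]}
)

# RPKM values can span orders of magnitude, so log-transform them
# (assumed preprocessing; the pseudocount avoids log2(0)).
rna_cols = ["rna_s1", "rna_s2"]
rna_log = rna.copy()
rna_log[rna_cols] = np.log2(rna[rna_cols] + 1.0)

# An inner join keeps only the genes quantified in both experiments.
merged = prot.merge(
    rna_log, on="gene", how="inner", validate="one_to_one"
).set_index("gene")
print(merged)
```

Only `lacZ` and `rpoB` survive the join in this toy case, because the other two genes are missing from one of the tables.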

The project comprised 10 samples of *Escherichia coli* cultures grown under 5 different conditions. Mass spectrometry-based proteomic data and transcriptomic data were acquired for each sample. The aim of this data processing workflow is to cluster the genes in an unsupervised fashion, based on their profiles of change across both data sets.
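The general idea of a joint embedding can be sketched as follows: scale each gene's profile within each data set, concatenate the two profiles into one feature vector per gene, embed with UMAP, and cluster the embedding. This is a minimal sketch on synthetic data, not the notebook's actual code. The helper names (`zscore_rows`, `joint_profiles`, `embed_and_cluster`), the row-wise z-scoring, the use of k-means and the number of clusters are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def zscore_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Scale each gene (row) to zero mean and unit variance across samples."""
    centered = df.sub(df.mean(axis=1), axis=0)
    return centered.div(df.std(axis=1, ddof=0), axis=0)


def joint_profiles(prot: pd.DataFrame, rna: pd.DataFrame) -> pd.DataFrame:
    """Concatenate scaled protein and RNA profiles for genes found in both tables."""
    shared = prot.index.intersection(rna.index)
    joint = pd.concat(
        [zscore_rows(prot.loc[shared]).add_prefix("prot_"),
         zscore_rows(rna.loc[shared]).add_prefix("rna_")],
        axis=1,
    )
    # Genes with constant profiles produce NaN after scaling; drop them.
    return joint.dropna()


def embed_and_cluster(joint: pd.DataFrame, n_clusters: int = 4, random_state: int = 0):
    """Embed joint profiles in 2D with UMAP, then cluster the embedding (assumed: k-means)."""
    # Imported here so the helpers above work without umap-learn installed.
    import umap
    from sklearn.cluster import KMeans

    embedding = umap.UMAP(
        n_components=2, random_state=random_state
    ).fit_transform(joint.to_numpy())
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(embedding)
    return embedding, labels


# Synthetic stand-in data: 10 samples per data set, partially overlapping gene sets.
rng = np.random.default_rng(0)
samples = [f"s{j}" for j in range(10)]
prot = pd.DataFrame(rng.normal(size=(200, 10)),
                    index=[f"gene{i}" for i in range(200)], columns=samples)
rna = pd.DataFrame(rng.normal(size=(200, 10)),
                   index=[f"gene{i}" for i in range(20, 220)], columns=samples)

joint = joint_profiles(prot, rna)
print(joint.shape)
```

Calling `embedding, labels = embed_and_cluster(joint)` would then return one 2D coordinate and one cluster label per shared gene. Scaling each data set separately before concatenation keeps either measurement type from dominating the distances that UMAP works with. Other clustering methods could replace k-means here.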