https://github.com/dev-ev/prot-rna-umap-clustering
Joint UMAP embedding and clustering of proteomic and transcriptomic data
- Host: GitHub
- URL: https://github.com/dev-ev/prot-rna-umap-clustering
- Owner: dev-ev
- License: mit
- Created: 2021-02-07T10:35:30.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-03-15T19:14:33.000Z (over 2 years ago)
- Last Synced: 2023-10-20T06:36:15.560Z (about 1 year ago)
- Topics: clustering, jupyterlab, mass-spectrometry, proteomics, proteomics-data-integration, tmt-data-analysis, transcriptomics, umap
- Language: HTML
- Size: 4.42 MB
- Stars: 3
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# prot-rna-umap-clustering
Joint UMAP embedding and clustering of proteomic and transcriptomic data.

The notebook was created in JupyterLab on Windows running Python 3.8. The workflow depends on third-party libraries that can be installed via pip:

`pip install scipy numpy pandas matplotlib seaborn scikit-learn umap-learn`
The data accompany the [article by Hultqvist *et al.*](https://www.nature.com/articles/s41559-018-0568-5). The mass spectrometry-based proteomic files are [deposited at the PRIDE archive](https://www.ebi.ac.uk/pride/archive/projects/PXD005236), and the relative protein abundance table from the proteomic analysis can be found in this GitHub repository. The RPKM table from the transcriptomic experiment can be found on the [Gene Expression Omnibus (GEO) project page](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92601).

The project consisted of 10 samples of *Escherichia coli* cultures from 5 different conditions. Mass spectrometry-based proteomic and transcriptomic data were acquired for each sample. The aim of this data processing workflow is to cluster the genes in an unsupervised fashion based on their profiles of change across both data sets.
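The idea of clustering genes on their joint profiles can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code. It assumes each table is a pandas DataFrame indexed by gene identifier with one column per sample. The function names, the `prot_`/`rna_` column prefixes, the pseudocount, the UMAP parameters, and the use of k-means are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def relative_profiles(table: pd.DataFrame, pseudocount: float = 1.0) -> pd.DataFrame:
    """Log2-transform abundances and centre each gene on its own mean.

    The pseudocount is an illustrative choice to avoid log2(0).
    """
    logged = np.log2(table + pseudocount)
    return logged.sub(logged.mean(axis=1), axis=0)


def joint_profiles(protein: pd.DataFrame, rna: pd.DataFrame) -> pd.DataFrame:
    """Place protein and RNA profiles side by side, keeping shared genes only."""
    prot = relative_profiles(protein).add_prefix("prot_")
    trans = relative_profiles(rna).add_prefix("rna_")
    return prot.join(trans, how="inner")


def embed_and_cluster(profiles: pd.DataFrame, n_clusters: int = 5, seed: int = 0) -> pd.DataFrame:
    """Embed the joint profiles with UMAP, then cluster the 2-D embedding.

    k-means stands in for whichever clustering method is preferred;
    the parameter values are placeholders, not tuned settings.
    """
    # Imported here so the preprocessing helpers work without these packages.
    import umap
    from sklearn.cluster import KMeans

    reducer = umap.UMAP(n_neighbors=15, min_dist=0.0, n_components=2, random_state=seed)
    embedding = reducer.fit_transform(profiles.to_numpy())
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(embedding)
    return pd.DataFrame(
        {"umap_1": embedding[:, 0], "umap_2": embedding[:, 1], "cluster": labels},
        index=profiles.index,
    )
```

With two hypothetical DataFrames `protein_table` and `rpkm_table` loaded from the sources above, `embed_and_cluster(joint_profiles(protein_table, rpkm_table))` would return one row per shared gene with its embedding coordinates and a cluster label. Centring each gene within each data set puts the emphasis on the shape of change across samples, not on absolute abundance.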