https://github.com/sbl-sdsc/mmtf-genomics
Methods for mapping genomic data onto 3D protein structure.
- Host: GitHub
- URL: https://github.com/sbl-sdsc/mmtf-genomics
- Owner: sbl-sdsc
- License: apache-2.0
- Created: 2018-08-14T06:48:16.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-05-17T21:29:54.000Z (over 2 years ago)
- Last Synced: 2023-02-28T19:26:31.794Z (almost 2 years ago)
- Topics: binder, cyverse, dbsnp, genomics, mmtf-pyspark, mutations, protein-data-bank, protein-drug-interactions, protein-ligand-interactions, protein-protein-interaction, protein-structure, pyspark
- Language: Jupyter Notebook
- Homepage:
- Size: 6.17 MB
- Stars: 27
- Watchers: 4
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# mmtf-genomics
[![Twitter](https://img.shields.io/badge/Tweet--lightgrey.svg?logo=twitter&style=social)](https://twitter.com/peterwrose/status/1097614306542112769)

An experimental project for mapping genomic data onto 3D protein structures in Jupyter Notebooks.
## Run mmtf-genomics in your Web Browser
The Jupyter Notebooks in this repository can be run in your web browser using two freely available servers: Binder and CyVerse/VICE. Click on the buttons below to launch Jupyter Lab. **It may take several minutes for Jupyter Lab to launch.**

#### Binder
[Binder](https://mybinder.org/) is a platform for reproducible research developed by [Project Jupyter](https://jupyter.org/). Learn more about [Binder](https://blog.jupyter.org/mybinder-org-serves-two-million-launches-7543ae498a2a). There are specific links for each notebook below; however, once Jupyter Lab is launched, you can navigate to any of the other notebooks using the Jupyter Lab file panel.

Binder provides an easy-to-use demo environment. Due to limited resources, Binder is not suitable for compute- or memory-intensive production analyses and may occasionally fail to run the notebooks in this repository.
**NOTE:** Authentication is now required to launch Binder! Sign in to GitHub from your browser, then click on the `launch binder` badge below to launch Jupyter Lab.
[![Binder](https://aws-uswest2-binder.pangeo.io/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/sbl-sdsc/mmtf-genomics/master?urlpath=lab)
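The notebook-specific Binder links in this README all appear to follow one URL pattern: the host, `v2/gh/`, the repository, the branch, and a `urlpath` that points at Jupyter Lab and, optionally, at a URL-encoded notebook path. The following is a minimal sketch of that pattern, inferred only from the links shown here; the function name `binder_lab_url` and its parameters are illustrative and not part of this project.

```python
from urllib.parse import quote

# Values copied from the badge links in this README.
BINDER_HOST = "https://aws-uswest2-binder.pangeo.io"
REPO = "sbl-sdsc/mmtf-genomics"
BRANCH = "master"


def binder_lab_url(notebook_path=None):
    """Build a Binder launch URL that opens Jupyter Lab.

    notebook_path is a path relative to the repository root
    (hypothetical helper; the pattern is inferred from this README).
    """
    url = f"{BINDER_HOST}/v2/gh/{REPO}/{BRANCH}?urlpath=lab"
    if notebook_path:
        # The README links encode "/" in the notebook path as %2F.
        url += "/tree/" + quote(notebook_path, safe="")
    return url


print(binder_lab_url())
print(binder_lab_url("sars-cov-2/1-MapTo3DStructures.ipynb"))
```

Calling the helper without an argument reproduces the general launch link above; passing a notebook path reproduces the per-notebook links used in the tables below.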
#### CyVerse/VICE
The new [VICE (Visual Interactive Computing Environment)](https://cyverse-visual-interactive-computing-environment.readthedocs-hosted.com/en/latest/index.html) in the [CyVerse Discovery Environment](https://www.cyverse.org/discovery-environment) enables users to run Jupyter Lab in a production environment. To use VICE, sign up for a free [CyVerse account](https://www.cyverse.org/create-account).

The VICE environment supports large-scale analyses. Users can upload and download files, and save and share results of their analyses in their user accounts (up to 100 GB of data).
[![Vice](docs/vice_badge.png)](https://de.cyverse.org/de/?type=apps&app-id=00d83c10-9b9a-11e9-8421-008cfa5ae621&system-id=de)
[Follow these steps to run Jupyter Lab on VICE](docs/vice_instructions.md)
---
# Examples using mmtf-genomics
## NEW: Map SARS-CoV-2 Missense Mutations to 3D Structures
The notebooks in the [sars-cov-2 folder](sars-cov-2) map missense mutations aggregated by the [COVID-19-Net Knowledge Graph](https://github.com/covid-19-net/covid-19-community) to available 3D protein structures in the Protein Data Bank. Mutations are mapped onto protein-protein interaction sites, ligand binding sites, and drug binding sites.

| | |
|:-- |:-- |
| | Map SARS-CoV-2 mutations to 3D structures<br>Example: Two Regeneron Fab fragments bound to Spike glycoprotein RBD (REGN10933-RBD-REGN10987 complex (1)) with observed mutations highlighted<br>[![Binder](https://mybinder.org/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/sbl-sdsc/mmtf-genomics/master?urlpath=lab/tree/sars-cov-2%2F1-MapTo3DStructures.ipynb) |
| | Map SARS-CoV-2 mutations to protein-protein interactions<br>Example: Two Regeneron Fab fragments bound to Spike glycoprotein RBD with observed mutations at the binding interface<br>[![Binder](https://mybinder.org/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/sbl-sdsc/mmtf-genomics/master?urlpath=lab/tree/sars-cov-2%2F2-MapToPolymerInteractions.ipynb) |

Reference: (1) Hansen J, Baum A, Pascal KE, et al. Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail. Science. 2020;369(6506):1010-1014. [doi:10.1126/science.abd0827](https://doi.org/10.1126/science.abd0827), PDB id: 6XD6.
## Map Mutations from dbSNP to 3D Structures
The notebooks below visualize the positions of missense mutations mapped from [dbSNP](https://www.ncbi.nlm.nih.gov/projects/SNP/) to 3D protein structures in the Protein Data Bank. Variations can be filtered by the clinical significance level from [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/), by UniProt IDs, or by a list of specific variants given as rs identifiers or genomic locations.

| | |
|:-- |:-- |
| | Map missense mutations from dbSNP to 3D structures<br>[![Binder](https://mybinder.org/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/sbl-sdsc/mmtf-genomics/master?urlpath=lab/tree/dbsnp%2FdbSNPTo3DChain.ipynb) |
| | Map missense mutations from dbSNP to 3D structures that contain the associated amino acid change<br>[![Binder](https://mybinder.org/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/sbl-sdsc/mmtf-genomics/master?urlpath=lab/tree/dbsnp%2FMutationsInPdb.ipynb) |

## Map Mutations with High Allele Frequencies to 3D Structures
This notebook maps a dataset of 63,197 missense mutations with allele frequencies >=1% and <25% extracted from the [ExAC](http://exac.broadinstitute.org/) database to 3D structures in the Protein Data Bank. The dataset is described in:

Niroula A, Vihinen M (2019) How good are pathogenicity predictors in detecting benign variants? PLoS Comput Biol 15(2): e1006481. doi: [10.1371/journal.pcbi.1006481](https://doi.org/10.1371/journal.pcbi.1006481)
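To make the frequency window concrete, here is a minimal sketch of selecting variants with allele frequencies of at least 1% and below 25%. It is not the code used to build the published dataset or the notebook; the `Variant` class, its field names, and the example records are hypothetical and only illustrate the half-open interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str          # hypothetical identifier, made up for this sketch
    allele_frequency: float  # fraction between 0 and 1


def in_frequency_window(variant, low=0.01, high=0.25):
    """True if low <= allele frequency < high (>=1% and <25% by default)."""
    return low <= variant.allele_frequency < high


# Made-up records chosen to exercise both boundaries of the window.
variants = [
    Variant("example-1", 0.004),
    Variant("example-2", 0.01),
    Variant("example-3", 0.12),
    Variant("example-4", 0.25),
]

selected = [v.variant_id for v in variants if in_frequency_window(v)]
print(selected)  # ['example-2', 'example-3']
```

The lower bound is inclusive and the upper bound is exclusive, so a variant at exactly 1% is kept and one at exactly 25% is dropped.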
| | |
|:-- |:-- |
| | Map mutations with high allele frequencies to 3D structures<br>[![Binder](https://mybinder.org/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/sbl-sdsc/mmtf-genomics/master?urlpath=lab/tree/benign%2F1-BenignMutationsTo3DStructure.ipynb) |

## Custom 3D Structure Mapping Pipeline
This prototype pipeline demonstrates how to map genetic locations of SNVs to 3D structures. To run this demo, click on the "launch binder" link below. At the bottom of each notebook is a link to the next step. The pipeline has five steps, listed in the table below.

By replacing the demo input file with your own data and adjusting the notebook that reads the data, you can run your own custom analysis.
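When adapting the first step to your own data, the input usually has to be brought into one consistent form before it can be mapped. The sketch below shows one possible way to do that for a CSV file; it is not taken from the pipeline notebooks. The column names `chromosome` and `position`, the function `standardize_location`, and the sample rows are all assumptions made for illustration.

```python
import csv
import io


def standardize_location(chromosome, position):
    """Return (chromosome, position) with a bare, upper-case chromosome
    name (no "chr" prefix) and an integer position.

    Hypothetical helper; the real notebooks may standardize differently.
    """
    chrom = chromosome.strip()
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return chrom.upper(), int(position)


# Made-up input standing in for a user-supplied file.
demo_input = io.StringIO(
    "chromosome,position\n"
    "chr7,12345\n"
    "x,67890\n"
)

locations = [
    standardize_location(row["chromosome"], row["position"])
    for row in csv.DictReader(demo_input)
]
print(locations)  # [('7', 12345), ('X', 67890)]
```

Keeping the standardization in one small function means only the reading step has to change when the input format changes.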
| | |
|:-- |:-- |
| | Read and standardize genetic location data<br>[![Binder](https://mybinder.org/badge_logo.svg)](https://aws-uswest2-binder.pangeo.io/v2/gh/sbl-sdsc/mmtf-genomics/master?urlpath=lab/tree/pipeline1%2F1-ReadMutations.ipynb) |
| | Map genetic locations to 3D protein structures |
| | Map genetic locations to protein-protein and protein-nucleic acid interfaces |
| | Map genetic locations to ligand binding sites |
| | Map genetic locations to drug binding sites |

## Feature Requests and Collaborations
Please send [feedback or feature requests](https://github.com/sbl-sdsc/mmtf-genomics/issues/new).

Interested in a collaboration? Please send us use cases.
## Local Installation
[Mac and Linux](/docs/MacLinuxInstallation.md)
[Windows](/docs/WindowsInstallation.md)
## How to Cite this Work
Bhattacharya R, Rose PW, Burley SK, Prlić A (2017) Impact of genetic variation on three dimensional structure and function of proteins. PLoS ONE 12(3): e0171355. doi: [10.1371/journal.pone.0171355](https://doi.org/10.1371/journal.pone.0171355)
Bradley AR, Rose AS, Pavelka A, Valasatava Y, Duarte JM, Prlić A, Rose PW (2017) MMTF - an efficient file format for the transmission, visualization, and analysis of macromolecular structures. PLOS Computational Biology 13(6): e1005575. doi: [10.1371/journal.pcbi.1005575](https://doi.org/10.1371/journal.pcbi.1005575)
Glusman G, et al. (2017) Mapping genetic variations to three-dimensional protein structures to enhance variant interpretation: a proposed framework. Genome Medicine 9 (1), 113. doi: [10.1186/s13073-017-0509-y](https://doi.org/10.1186/s13073-017-0509-y)
Rose AS, Bradley AR, Valasatava Y, Duarte JM, Prlić A, Rose PW (2018) NGL viewer: web-based molecular graphics for large complexes, Bioinformatics, bty419. doi: [10.1093/bioinformatics/bty419](https://doi.org/10.1093/bioinformatics/bty419)
Valasatava Y, Bradley AR, Rose AS, Duarte JM, Prlić A, Rose PW (2017) Towards an efficient compression of 3D coordinates of macromolecular structures. PLOS ONE 12(3): e0174846. doi: [10.1371/journal.pone.0174846](https://doi.org/10.1371/journal.pone.0174846)
#### Binder
Project Jupyter, et al. (2018) Binder 2.0 - Reproducible, Interactive, Sharable Environments for Science at Scale. Proceedings of the 17th Python in Science Conference. 2018. doi: [10.25080/Majora-4af1f417-011](https://doi.org/10.25080/Majora-4af1f417-011)

#### CyVerse
Merchant N, Lyons E, Goff S, Vaughn M, Ware D, Micklos D, et al. (2016) The iPlant Collaborative: Cyberinfrastructure for Enabling Data to Discovery for the Life Sciences. PLoS Biol 14(1): e1002342. doi: [10.1371/journal.pbio.1002342](https://doi.org/10.1371/journal.pbio.1002342)

Goff SA, et al. (2011) The iPlant Collaborative: Cyberinfrastructure for Plant Biology. Frontiers in Plant Science 2. doi: [10.3389/fpls.2011.00034](https://doi.org/10.3389/fpls.2011.00034)
#### dbSNP Data
Sayers EW, et al. (2019) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 47, D23–D28. doi: [10.1093/nar/gky1069](https://doi.org/10.1093/nar/gky1069)

#### G2S Web Services
Wang J, Sheridan R, Onur Sumer S, Schultz N, Xu D, Gao JJ (2018) G2S: A web-service for annotating genomic variants on 3D protein structures, Bioinformatics, 34(11), 1949-1950. doi: [10.1093/bioinformatics/bty047](https://doi.org/10.1093/bioinformatics/bty047)

#### Py3Dmol
Rego N, Koes D (2015) 3Dmol.js: molecular visualization with WebGL, Bioinformatics 31, 1322–1324. doi: [10.1093/bioinformatics/btu829](https://doi.org/10.1093/bioinformatics/btu829)

## Funding
The MMTF project (Compressive Structural BioInformatics: High Efficiency 3D Structure Compression) is supported by the National Cancer Institute of the National Institutes of Health under Award Number U01CA198942. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

The CyVerse project is supported by the National Science Foundation under Award Numbers DBI-0735191, DBI-1265383, and DBI-1743442. URL: www.cyverse.org