https://github.com/cthoyt/biogrid-embeddings
Learned embeddings for proteins using physical interactions in BioGRID
https://github.com/cthoyt/biogrid-embeddings
Last synced: about 2 months ago
JSON representation
Learned embeddings for proteins using physical interactions in BioGRID
- Host: GitHub
- URL: https://github.com/cthoyt/biogrid-embeddings
- Owner: cthoyt
- License: mit
- Created: 2021-08-26T21:31:14.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2021-08-27T15:00:33.000Z (over 3 years ago)
- Last Synced: 2025-01-22T16:24:59.378Z (4 months ago)
- Language: Python
- Homepage:
- Size: 15.5 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# biogrid-embeddings
This repository generates node2vec embeddings for physical interactions between human proteins and
complexes in [BioGRID](https://thebiogrid.org/).## 🚀 Usage
Installation of the requirements and running of the build script are handled with `tox`. The current
version of BioGRID is looked up with [`bioversions`](https://github.com/cthoyt/bioversions) so the
provenance of the data can be properly traced. Run with:```shell
$ pip install tox
$ tox
```The embedding dataframe can be loaded from GitHub with:
```python
import pandas as pdurl = "https://github.com/cthoyt/biogrid-embeddings/raw/main/output/4.4.200/embeddings.tsv"
df = pd.read_csv(url, sep="\t", skiprows=1, index_col=0, header=None)
```The index uses [UniProt](https://bioregistry.io/uniprot) protein identifiers. It skips a line since
this TSV file uses the word2vec format, where the first line says the length and width of the file.Here's a 2D PCA scatterplot of the embeddings:

## ⚖️ License
Code in this repository is licensed under the MIT License.
## 🙏 Acknowledgements
BioGRID can be cited with:
```bibtex
@article{Oughtred2021,
author = {Oughtred, Rose and Rust, Jennifer and Chang, Christie and Breitkreutz, Bobby-Joe and Stark, Chris and Willems, Andrew and Boucher, Lorrie and Leung, Genie and Kolas, Nadine and Zhang, Frederick and Dolma, Sonam and Coulombe-Huntington, Jasmin and Chatr-Aryamontri, Andrew and Dolinski, Kara and Tyers, Mike},
doi = {10.1002/pro.3978},
journal = {Protein science : a publication of the Protein Society},
keywords = {COVID-19,CRISPR screen,biological network,chemical interaction,drug target,genetic interaction,phenotype,post-translational modification,protein interaction,ubiquitin-proteasome system},
number = {1},
pages = {187--200},
pmid = {33070389},
title = {{The BioGRID database: A comprehensive biomedical resource of curated protein, genetic, and chemical interactions.}},
volume = {30},
year = {2021}
}
```