https://github.com/tymor22/tm-vec
- Host: GitHub
- URL: https://github.com/tymor22/tm-vec
- Owner: tymor22
- License: bsd-3-clause
- Created: 2022-08-17T16:43:33.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2023-11-25T06:50:03.000Z (7 months ago)
- Last Synced: 2023-11-26T15:23:39.638Z (7 months ago)
- Language: Jupyter Notebook
- Size: 2.79 MB
- Stars: 41
- Watchers: 3
- Forks: 8
- Open Issues: 8
Metadata Files:
- Readme: README.md
- License: LICENSE
Lists
- awesome-protein-design-software - TM-vec
README
# Paper
TM-Vec: template modeling vectors for fast homology detection and alignment: https://www.biorxiv.org/content/10.1101/2022.07.25.501437v1

[Embed sequences with TM-vec](https://colab.research.google.com/github/tymor22/tm-vec/blob/master/google_colabs/Embed_sequences_using_TM_Vec.ipynb)
# Installation
First, create a conda environment with Python 3.9. If you are using a CPU, use
`conda create -n tmvec faiss-cpu python=3.9 -c pytorch`
If the installation fails, you may need to install mkl via `conda install mkl=2021 mkl_fft`.
If you are using a GPU, use
`conda create -n tmvec faiss-gpu python=3.9 -c pytorch`
Once your conda environment is created and activated (i.e. `conda activate tmvec`), install tm-vec via
`pip install tm-vec`. If you are using a GPU, you may need to reinstall the GPU version of PyTorch.
See the [PyTorch](https://pytorch.org/) webpage for more details.
# Models
It is recommended to first download the `Prot-T5-XL-UniRef50` model weights. This can be done as follows.
```
mkdir Rostlab && cd "$_"
wget https://zenodo.org/record/4644188/files/prot_t5_xl_uniref50.zip
unzip prot_t5_xl_uniref50.zip
cd ..
```
Download the model weights/config of the base TM-vec model trained on SwissModel pairs (trained on protein chains up to 300 residues long, works best on shorter sequences):
```
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_swiss_model.ckpt
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_swiss_model_params.json
```
Download the model weights/config of the large TM-vec model trained on SwissModel pairs (trained on protein chains up to 1000 residues long):
```
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_swiss_model_large.ckpt
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_swiss_model_large_params.json
```
Download the model weights/config of the large TM-vec model trained on CATH pairs (trained on CATH S100 domains sampled from ProtTucker training domains):
```
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_cath_model_large.ckpt
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_cath_model_large_params.json
```
Download the model weights/config of the base TM-vec model trained on CATH pairs (trained on CATH S40):
```
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_cath_model.ckpt
wget https://users.flatironinstitute.org/thamamsy/public_www/tm_vec_cath_model_params.json
```
# Databases
We have embedded several sequence databases that users can search against. We have included embeddings for all CATH domains and SWISS-PROT sequences here. See the search tutorials or the scripts folder for how to run searches against these databases. Metadata for these sequences is position indexed. The embeddings and metadata are stored as NumPy arrays (`.npy` format), which can be loaded as follows: `np.load(file_path, allow_pickle=True)`.
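As a minimal sketch of the loading step described above, the helper below reads an embedding file and its metadata file and checks that the two line up. The function name `load_database` is hypothetical (it is not part of the tm-vec package), and the length check assumes that "position indexed" means entry `i` of the metadata describes row `i` of the embeddings.

```python
import numpy as np


def load_database(embeddings_path, metadata_path):
    """Load an embedding array and its position-indexed metadata.

    Hypothetical helper for illustration; not part of the tm-vec package.
    """
    # allow_pickle=True lets NumPy unpickle object arrays. Unpickling can
    # execute arbitrary code, so only load files from a source you trust.
    embeddings = np.load(embeddings_path, allow_pickle=True)
    metadata = np.load(metadata_path, allow_pickle=True)

    # Assumption: metadata[i] describes embeddings[i], so both arrays
    # must contain the same number of entries.
    if len(embeddings) != len(metadata):
        raise ValueError(
            f"{len(embeddings)} embeddings but {len(metadata)} metadata entries"
        )
    return embeddings, metadata
```

For example, `load_database("cath_large.npy", "cath_large_metadata.npy")` would load the CATH files named in the download commands in this section, once they are in the working directory.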
Download the embeddings and metadata for CATH domains (the model that you should query with is `tm_vec_cath_model_large`):
```
wget https://users.flatironinstitute.org/thamamsy/public_www/cath_large.npy
wget https://users.flatironinstitute.org/thamamsy/public_www/cath_large_metadata.npy
```
Download the embeddings and metadata for SWISS-PROT chains (the model that you should query with here is `tm_vec_swiss_model_large`):
```
wget https://users.flatironinstitute.org/thamamsy/public_www/swiss_large.npy
wget https://users.flatironinstitute.org/thamamsy/public_www/swiss_large_metadata.npy
```
# Run TM-Vec + DeepBLAST from the command line
See the DeepBLAST wiki for how to [build TM-vec databases](https://github.com/flatironinstitute/deepblast/wiki/Building-the-TMvec-search-database) and search against [TM-vec databases](https://github.com/flatironinstitute/deepblast/wiki/Searching-proteins).
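For intuition about what a search against an embedding database computes, here is a NumPy-only sketch of a brute-force nearest-neighbor lookup. It is not the DeepBLAST command-line tool or the project's own search code, and the wiki pages linked above remain the reference for real searches. The function name `cosine_top_k` is hypothetical, and ranking by cosine similarity is an assumption based on the TM-Vec paper, which describes embeddings whose cosine similarity approximates the TM-score.

```python
import numpy as np


def cosine_top_k(query, embeddings, k=5):
    """Return (indices, scores) of the k database rows most similar to `query`.

    Hypothetical illustration; assumes `query` is a 1-D vector, `embeddings`
    is a 2-D array with one row per sequence, and no vector is all zeros.
    """
    query = np.asarray(query, dtype=np.float32)
    embeddings = np.asarray(embeddings, dtype=np.float32)

    # Scale every vector to unit length so a dot product is a cosine similarity.
    unit_query = query / np.linalg.norm(query)
    unit_db = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = unit_db @ unit_query

    # Sort by descending similarity and keep the first k positions.
    k = min(k, len(scores))
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```

Because the metadata is position indexed, the returned indices can be used directly to look up the matching entries, e.g. `metadata[indices]`.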