{"id":19669899,"url":"https://github.com/dohlee/abyssal-pytorch","last_synced_at":"2026-03-14T05:31:16.676Z","repository":{"id":65899007,"uuid":"600440133","full_name":"dohlee/abyssal-pytorch","owner":"dohlee","description":"Implementation of Abyssal, a deep neural network trained with a new \"mega\" dataset to predict the impact of an amino acid variant on protein stability.","archived":false,"fork":false,"pushed_at":"2023-03-04T16:11:04.000Z","size":158,"stargazers_count":6,"open_issues_count":1,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-21T12:56:13.425Z","etag":null,"topics":["bioinformatics","biology","computational-biology","deep-learning","protein","protein-language-model","protein-sequences","protein-stability","reproduction","reproduction-code"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dohlee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-11T13:59:31.000Z","updated_at":"2024-01-28T15:25:27.000Z","dependencies_parsed_at":null,"dependency_job_id":"604bb905-eb61-4776-af5f-b4a2816de401","html_url":"https://github.com/dohlee/abyssal-pytorch","commit_stats":{"total_commits":37,"total_committers":1,"mean_commits":37.0,"dds":0.0,"last_synced_commit":"09b4db889a0fd63b489354f87bccd0874e7bf0cd"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Fabyssal-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Fabyssal-pytorch/tags","releases_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Fabyssal-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Fabyssal-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dohlee","download_url":"https://codeload.github.com/dohlee/abyssal-pytorch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251410251,"owners_count":21584986,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","biology","computational-biology","deep-learning","protein","protein-language-model","protein-sequences","protein-stability","reproduction","reproduction-code"],"created_at":"2024-11-11T17:02:51.650Z","updated_at":"2026-03-14T05:31:16.645Z","avatar_url":"https://github.com/dohlee.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# abyssal-pytorch\n\n![model](img/banner.png)\n\nImplementation of [Abyssal](https://www.biorxiv.org/content/10.1101/2022.12.31.522396v1.full), a deep neural network trained with a new \"mega\" dataset to predict the impact of an amino acid variant on protein stability.\n\n## Installation\n\n```shell\n$ pip install abyssal-pytorch\n```\n\n## Usage\n```python\nimport torch\nfrom abyssal_pytorch import Abyssal\n\nmodel = Abyssal()\n\nwt_emb = torch.randn(1, 1280)    # ESM2 embedding for the wildtype amino acid.\nmut_emb = torch.randn(1, 1280)   # ESM2 embedding for the mutated amino acid.\n\nout = model(wt_emb, mut_emb)  # (1, 1) predicted 
ddG.\n```\n\n### Training from scratch\nPlease refer to the `note/data-preprocessing.ipynb` notebook for data preprocessing. Running this notebook generates `mega.train.csv`, `mega.val.csv` and `mega.test.csv`, which contain metadata for model training and evaluation.\n\nFor faster training, extract the ESM2 embeddings and save them to disk. This takes about 5 hours.\n```shell\n$ python prefetch_embeddings.py -i data/mega.train.csv -o data/embeddings\n$ python prefetch_embeddings.py -i data/mega.val.csv -o data/embeddings\n$ python prefetch_embeddings.py -i data/mega.test.csv -o data/embeddings\n```\n\nThis saves the embedding vectors in `.pt` format in the `data/embeddings` directory.\n\nYou can now train the model.\n\n```shell\n$ python -m abyssal_pytorch.train \\\n    --train data/mega.train.csv \\\n    --val data/mega.val.csv \\\n    --test data/mega.test.csv \\\n    --emb-dir data/embeddings\n```\n\n## Reproduction status\n\nUnfortunately, I could not exactly reproduce the results based on the model and training specification in the current version of the Abyssal preprint.\nBelow are the results from my best effort so far. 
Any ideas to reproduce the model performance would be appreciated!\n\n|Metric|Target|Reproduced|\n|------|:----:|:--------:|\n|Pearson's r|0.85±0.00|0.7661|\n|Spearman's r|0.81±0.01|0.7846|\n|MSE, kcal/mol|0.89|0.6003|\n|Accuracy (?)|0.79|?|\n|PCC(f-r)|-0.98|-0.9611|\n|delta|-0.01|-0.0330|\n\n## Citations\n\n```bibtex\n@article {Pak2022.12.31.522396,\n\tauthor = {Pak, Marina A and Dovidchenko, Nikita V and Sharma, Satyarth Mishra and Ivankov, Dmitry N},\n\ttitle = {New mega dataset combined with deep neural network makes a progress in predicting impact of mutation on protein stability},\n\telocation-id = {2022.12.31.522396},\n\tyear = {2023},\n\tdoi = {10.1101/2022.12.31.522396},\n\tpublisher = {Cold Spring Harbor Laboratory},\n\tURL = {https://www.biorxiv.org/content/early/2023/01/02/2022.12.31.522396},\n\teprint = {https://www.biorxiv.org/content/early/2023/01/02/2022.12.31.522396.full.pdf},\n\tjournal = {bioRxiv}\n}\n\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohlee%2Fabyssal-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdohlee%2Fabyssal-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohlee%2Fabyssal-pytorch/lists"}