{"id":19669909,"url":"https://github.com/dohlee/eve-pytorch","last_synced_at":"2025-10-15T04:32:13.440Z","repository":{"id":70339534,"uuid":"602452031","full_name":"dohlee/eve-pytorch","owner":"dohlee","description":"Implementation of evolutionary model of variant effect (EVE), a deep generative model of evolutionary data, in PyTorch.","archived":false,"fork":false,"pushed_at":"2023-03-04T16:13:54.000Z","size":137,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-09-20T15:36:49.146Z","etag":null,"topics":["bioinformatics","biology","computational-biology","deep-learning","evolution","reproduction","reproduction-code","variant-effect-prediction","variational-autoencoder"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dohlee.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-02-16T08:37:53.000Z","updated_at":"2025-05-27T09:36:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"37b2f984-2007-49d4-b899-667ebf404477","html_url":"https://github.com/dohlee/eve-pytorch","commit_stats":{"total_commits":21,"total_committers":1,"mean_commits":21.0,"dds":0.0,"last_synced_commit":"9d0858d24416945d591f879ad06b20999edee0a7"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/dohlee/eve-pytorch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Feve-pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Feve-pytorch/tags","releases_url":"https://repos.ecosyste.ms/api
/v1/hosts/GitHub/repositories/dohlee%2Feve-pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Feve-pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dohlee","download_url":"https://codeload.github.com/dohlee/eve-pytorch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dohlee%2Feve-pytorch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279049858,"owners_count":26093368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-15T02:00:07.814Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bioinformatics","biology","computational-biology","deep-learning","evolution","reproduction","reproduction-code","variant-effect-prediction","variational-autoencoder"],"created_at":"2024-11-11T17:03:03.153Z","updated_at":"2025-10-15T04:32:13.408Z","avatar_url":"https://github.com/dohlee.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# eve-pytorch\n\n![model](img/banner.png)\n\nImplementation of the evolutionary model of variant effect (EVE), a deep generative model of evolutionary data, in PyTorch. It's just a re-implementation of the official model, which is also implemented in PyTorch, for my own learning purposes. 
The official implementation can be found [here](https://github.com/OATML-Markslab/EVE).\n\n## Installation\n```bash\n$ pip install eve-pytorch\n```\n\n## Usage\n```python\nimport torch\nfrom eve_pytorch import EVE\n\nSEQ_LEN = 1000\nALPHABET_SIZE = 21\n\nmodel = EVE(seq_len=SEQ_LEN, alphabet_size=ALPHABET_SIZE)\n\n# ... training ...\n\n# Dummy input; the shape is assumed to be (batch, alphabet_size, seq_len).\nx = torch.randn(1, ALPHABET_SIZE, SEQ_LEN)\n\n# If you want to get the reconstructed sequence only,\nx_reconstructed = model(x, return_latent=False)\n\n# or, if you want to get the latent variables\nx_reconstructed, z_mu, z_log_var = model(x, return_latent=True)\n```\n\n## Training\n```bash\n# --msa: multiple sequence alignment (a2m). --use-wandb: optional, for logging.\n$ python -m eve_pytorch.train \\\n  --msa data/msa.filtered.a2m \\\n  --output ckpts/best_checkpoint.pt \\\n  --use-wandb\n```\n\n## Computing evolutionary index\nThe authors define the **evolutionary index** of a protein variant as the fitness of the mutated sequence $\\mathbf{s}$ relative to that of the wildtype sequence $\\mathbf{w}$. Since computing the exact log-likelihood is intractable, they approximate the log-likelihood ratio of the two sequences by the difference between their ELBOs:\n\n$$ELBO(\\mathbf{w}) - ELBO(\\mathbf{s})$$\n\nA larger index therefore means that the model considers the variant less likely than the wildtype. In this reproduction, the `EVE.compute_evolutionary_index()` method computes the evolutionary index of a protein variant. It takes the wildtype and mutated sequences as input and returns the evolutionary index of the variant. 
Optionally, you can tweak the `num_samples` parameter to control how many latent vectors are drawn in the Monte Carlo sampling.\n\n```python\nimport torch\nfrom eve_pytorch import EVE\n\nSEQ_LEN = 1000\nALPHABET_SIZE = 21\n\nmodel = EVE(seq_len=SEQ_LEN, alphabet_size=ALPHABET_SIZE)\nmodel.load_state_dict(torch.load('path/to/best/checkpoint.pt'))\n\nwt_seq = ...   # One-hot encoded wildtype amino acid sequence.\nmut_seq = ...  # One-hot encoded mutated amino acid sequence.\n\nmodel.eval()\nwith torch.no_grad():\n  evolutionary_index = model.compute_evolutionary_index(wt_seq, mut_seq, num_samples=20_000)\n```\n\n### Example\nThe example below shows how to compute the evolutionary indices of three variants in the TP53 gene.\nF109Q and V173Y are pathogenic variants, while M66V is a benign variant.\nNote the difference in the evolutionary indices between the pathogenic variants and the benign variant.\n```python\nimport torch\nfrom Bio import SeqIO\nfrom eve_pytorch import EVE\n\ndef get_sequence_length(a2m_fp):\n    \"\"\"Get the sequence length of the first sequence in the a2m file.\n    a2m_fp: Path to a2m file.\n    \"\"\"\n    for record in SeqIO.parse(a2m_fp, \"fasta\"):\n        return len(record.seq)\n\n# Map each amino acid (and the gap symbol '-') to an index.\na2i = {a:i for i, a in enumerate('ACDEFGHIKLMNPQRSTVWY-')}\ndef one_hot_encode_amino_acid(sequence):\n    # Returns a one-hot tensor of shape (alphabet_size, seq_len).\n    return torch.eye(len(a2i))[[a2i[a] for a in sequence]].T\n\nmsa = 'data/P53_HUMAN_b01.filtered.a2m'\nALPHABET_SIZE = 21\n\nprint('Loading pretrained model.')\nmodel = EVE(seq_len=get_sequence_length(msa), alphabet_size=ALPHABET_SIZE)\nmodel.load_state_dict(torch.load('ckpts/TP53.best.pt'))\nmodel.cuda()\n\nwt_seq = \"\"\"LSPDDIEQWFTEDPGDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQG\nSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQ\nSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEV\nGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTE\nEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELN\nEALELKDAQAGKEPGGSRAHSSHLKSKKG\"\"\".replace('\\n', '')\n\n# 
F109Q\nmut_seq_pathogenic1 = \"\"\"LSPDDIEQWFTEDPGDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQG\nSYGQRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQ\nSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEV\nGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTE\nEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELN\nEALELKDAQAGKEPGGSRAHSSHLKSKKG\"\"\".replace('\\n', '')\n    \n# V173Y\nmut_seq_pathogenic2 = \"\"\"LSPDDIEQWFTEDPGDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQG\nSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQ\nSQHMTEVYRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEV\nGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTE\nEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELN\nEALELKDAQAGKEPGGSRAHSSHLKSKKG\"\"\".replace('\\n', '')\n\n# M66V\nmut_seq_benign = \"\"\"LSPDDIEQWFTEDPGDEAPRVPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQG\nSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQ\nSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEV\nGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTE\nEENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELN\nEALELKDAQAGKEPGGSRAHSSHLKSKKG\"\"\".replace('\\n', '')\n\nwt_seq, mut_seq_benign, mut_seq_pathogenic1, mut_seq_pathogenic2 = map(\n    lambda x: one_hot_encode_amino_acid(x).cuda(),\n    [wt_seq, mut_seq_benign, mut_seq_pathogenic1, mut_seq_pathogenic2]\n)\n\nmodel.eval()\nwith torch.no_grad():\n    print(model.compute_evolutionary_index(wt_seq, mut_seq_pathogenic1))  # 9.0448 (may vary slightly)\n    print(model.compute_evolutionary_index(wt_seq, mut_seq_pathogenic2))  # 11.9176 (may vary slightly)\n    print(model.compute_evolutionary_index(wt_seq, mut_seq_benign))       # -0.4336 (may vary slightly)\n\n```\n\n## Citations\n\n```bibtex\n@article{frazer2021disease,\n  title={Disease variant prediction with deep generative models of evolutionary data},\n  author={Frazer, Jonathan and Notin, Pascal and Dias, Mafalda and Gomez, Aidan 
and Min, Joseph K and Brock, Kelly and Gal, Yarin and Marks, Debora S},\n  journal={Nature},\n  volume={599},\n  number={7883},\n  pages={91--95},\n  year={2021},\n  publisher={Nature Publishing Group UK London}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohlee%2Feve-pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdohlee%2Feve-pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdohlee%2Feve-pytorch/lists"}