{"id":15118810,"url":"https://github.com/Merck/Sapiens","last_synced_at":"2025-09-28T01:31:09.544Z","repository":{"id":42655963,"uuid":"454865559","full_name":"Merck/Sapiens","owner":"Merck","description":"Sapiens is a human antibody language model based on BERT.","archived":false,"fork":false,"pushed_at":"2023-04-19T13:14:46.000Z","size":9147,"stargazers_count":50,"open_issues_count":1,"forks_count":17,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-01-09T02:11:30.802Z","etag":null,"topics":["antibody","bert","embeddings","language-model","sapiens"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Merck.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-02T17:11:37.000Z","updated_at":"2024-11-16T16:27:52.000Z","dependencies_parsed_at":"2024-09-26T02:01:12.267Z","dependency_job_id":"ce426dae-1580-44d8-b967-1da0b43555b6","html_url":"https://github.com/Merck/Sapiens","commit_stats":{"total_commits":28,"total_committers":3,"mean_commits":9.333333333333334,"dds":0.0714285714285714,"last_synced_commit":"62e0c5607b3dd07f29bcdf42492e650c5e74d326"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2FSapiens","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2FSapiens/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Merck%2FSapiens/releases","manifests_url":"https://repos.ecosyste
.ms/api/v1/hosts/GitHub/repositories/Merck%2FSapiens/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Merck","download_url":"https://codeload.github.com/Merck/Sapiens/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":234300377,"owners_count":18810608,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["antibody","bert","embeddings","language-model","sapiens"],"created_at":"2024-09-26T01:53:38.604Z","updated_at":"2025-09-28T01:31:09.523Z","avatar_url":"https://github.com/Merck.png","language":"Jupyter Notebook","readme":"# Sapiens: Human antibody language model\n\n```\n    ____              _                \n   / ___|  __ _ _ __ (_) ___ _ __  ___ \n   \\___ \\ / _` | '_ \\| |/ _ \\ '_ \\/ __|\n    ___| | |_| | |_| | |  __/ | | \\__ \\\n   |____/ \\__,_|  __/|_|\\___|_| |_|___/\n               |_|                    \n```\n\n\u003cp\u003e\n\u003cimg src=\"https://github.com/Merck/Sapiens/actions/workflows/python-package-conda.yml/badge.svg\"\n    alt=\"Build \u0026 Test\"\u003e\u003c/a\u003e\n\u003ca href=\"https://pypi.org/project/sapiens/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/dm/sapiens\"\n        alt=\"Pip Install\"\u003e\u003c/a\u003e\n\u003ca href=\"https://github.com/Merck/Sapiens/releases\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/sapiens\"\n        alt=\"Latest release\"\u003e\u003c/a\u003e\n\u003ca href=\"https://huggingface.co/spaces/prihodad/biophi-sapiens1\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/badge/🤗%20Spaces-prihodad/biophi--sapiens1-blue\"\n        alt=\"Hugging Face Spaces\"\u003e\u003c/a\u003e\n\n\u003c/p\u003e\n\nSapiens is a human antibody language model based on BERT.\n\nLearn more about Sapiens, OASis and BioPhi in our publication:\n\n\u003e David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil \u0026 Danny A. Bitton (2022) \n\u003e BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning, mAbs, 14:1, DOI: https://doi.org/10.1080/19420862.2021.2020203\n\n\nFor more information about BioPhi, see the [BioPhi repository](https://github.com/Merck/BioPhi).\n\n## Features\n\n- Infilling missing residues in human antibody sequences\n- Suggesting mutations (in frameworks as well as CDRs)\n- Creating vector representations (embeddings) of residues or sequences\n\n![Sapiens Antibody t-SNE Example](notebooks/Embedding_t-SNE.png)\n\n## Usage\n\nTry out Sapiens in the [HuggingFace Space](https://huggingface.co/spaces/prihodad/biophi-sapiens1) or see the [Jupyter Notebooks](https://github.com/Merck/Sapiens?tab=readme-ov-file#notebooks).\n\nInstall Sapiens using pip:\n\n```bash\n# Recommended: Create dedicated conda environment\nconda create -n sapiens python=3.10\nconda activate sapiens\n\n# Install Sapiens\npip install sapiens\n```\n\n### Antibody sequence infilling\n\nPositions marked with * or X will be infilled with the most likely human residues, given the rest of the sequence.\n\n```python\nimport sapiens\n\n# Note that you can use masks (* or X) but you can also use \"single-pass\" prediction without any mask tokens\nbest = sapiens.predict_masked(\n    '**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',\n    'H'\n)\nprint(best)\n# 
QVQLVQSGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS\n```\n\n### Suggesting mutations\n\nReturn residue scores for a given sequence:\n\n```python\nimport sapiens\n\n# Note that you can use masks (* or X) but you can also use \"single-pass\" prediction without any mask tokens\nscores = sapiens.predict_scores(\n    '**QLV*SGVEVKKPGASVKVSCKASGYTFTNYYMYWVRQAPGQGLEWMGGINPSNGGTNFNEKFKNRVTLTTDSSTTTAYMELKSLQFDDTAVYYCARRDYRFDMGFDYWGQGTTVTVSS',\n    'H'\n)\nscores.head()\n#           A         C         D         E  ...\n# 0  0.003272  0.004147  0.004011  0.004590  ... \u003c- based on masked input\n# 1  0.012038  0.003854  0.006803  0.008174  ... \u003c- based on masked input\n# 2  0.003384  0.003895  0.003726  0.004068  ... \u003c- based on Q input\n# 3  0.004612  0.005325  0.004443  0.004641  ... \u003c- based on L input\n# 4  0.005519  0.003664  0.003555  0.005269  ... \u003c- based on V input\n#\n# Scores are given both for residues that are masked and that are present. 
\n# When inputting a non-human antibody sequence, the output scores can be used for humanization.\n```\n\n### Antibody sequence embedding\n\nGet a vector representation of each position in a sequence\n\n```python\nimport sapiens\n\nresidue_embed = sapiens.predict_residue_embedding(\n    'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS', \n    'H', \n    layer=None\n)\nresidue_embed.shape\n# (layer, position in sequence, features)\n# (5, 119, 128)\n```\n\nGet a single vector for each sequence\n\n```python\nseq_embed = sapiens.predict_sequence_embedding(\n    'QVKLQESGAELARPGASVKLSCKASGYTFTNYWMQWVKQRPGQGLDWIGAIYPGDGNTRYTHKFKGKATLTADKSSSTAYMQLSSLASEDSGVYYCARGEGNYAWFAYWGQGTTVTVSS', \n    'H', \n    layer=None\n)\nseq_embed.shape\n# (layer, features)\n# (5, 128)\n```\n\n### Notebooks\n\nTry out Sapiens in your browser using these example notebooks:\n\n\u003ctable\u003e\n    \u003ctr\u003e\u003cth\u003eLinks\u003c/th\u003e\u003cth\u003eNotebook\u003c/th\u003e\u003cth\u003eDescription\u003c/th\u003e\u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e\n            \u003ca href=\"https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F01_sapiens_antibody_infilling.ipynb\"\u003e\u003cimg src=\"https://mybinder.org/badge_logo.svg\" /\u003e\u003c/a\u003e\n        \u003c/td\u003e\n        \u003ctd\u003e\u003ca href=\"notebooks/01_sapiens_antibody_infilling.ipynb\"\u003e01_sapiens_antibody_infilling\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003ePredict missing positions in an antibody sequence\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e\n            \u003ca href=\"https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F02_sapiens_antibody_embedding.ipynb\"\u003e\u003cimg src=\"https://mybinder.org/badge_logo.svg\" /\u003e\u003c/a\u003e\n        \u003c/td\u003e\n        \u003ctd\u003e\u003ca 
href=\"notebooks/02_sapiens_antibody_embedding.ipynb\"\u003e02_sapiens_antibody_embedding\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003eGet vector representations and visualize them using t-SNE\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n        \u003ctd\u003e\n            \u003ca href=\"https://mybinder.org/v2/gh/Merck/Sapiens/main?labpath=notebooks%2F03_sapiens_antibody_vh_mlm_finetuning.ipynb\"\u003e\u003cimg src=\"https://mybinder.org/badge_logo.svg\" /\u003e\u003c/a\u003e\n        \u003c/td\u003e\n        \u003ctd\u003e\u003ca href=\"notebooks/03_sapiens_antibody_vh_mlm_finetuning.ipynb\"\u003e03_sapiens_antibody_vh_mlm_finetuning.ipynb\u003c/a\u003e\u003c/td\u003e\n        \u003ctd\u003eFinetune on a custom pool of sequences and suggest mutations\u003c/td\u003e\n    \u003c/tr\u003e\n\u003c/table\u003e\n\n## Acknowledgements\n\nSapiens is based on antibody repertoires from the Observed Antibody Space:\n\n\u003e Kovaltsuk, A., Leem, J., Kelm, S., Snowden, J., Deane, C. M., \u0026 Krawczyk, K. (2018). Observed Antibody Space: A Resource for Data Mining Next-Generation Sequencing of Antibody Repertoires. The Journal of Immunology, 201(8), 2502–2509. https://doi.org/10.4049/jimmunol.1800708\n","funding_links":[],"categories":["Ranked by starred repositories"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMerck%2FSapiens","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMerck%2FSapiens","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMerck%2FSapiens/lists"}