https://github.com/astrazeneca/selfpad
The official implementation of "Improving Antibody Humanness Prediction using Patent Data".
antibody antibody-design antibody-sequence attention contrastive-learning humanness immunogenicity-prediction patent-data transformer
- Host: GitHub
- URL: https://github.com/astrazeneca/selfpad
- Owner: AstraZeneca
- License: apache-2.0
- Created: 2024-01-14T18:15:01.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-02-14T22:02:22.000Z (over 1 year ago)
- Last Synced: 2025-03-31T17:59:07.062Z (4 months ago)
- Topics: antibody, antibody-design, antibody-sequence, attention, contrastive-learning, humanness, immunogenicity-prediction, patent-data, transformer
- Language: Python
- Homepage:
- Size: 425 KB
- Stars: 9
- Watchers: 3
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# SelfPAD:
##### Author: Talip Ucar ([email protected])

The official implementation of [Improving Antibody Humanness Prediction using Patent Data](https://arxiv.org/pdf/2401.14442.pdf)
# Table of Contents:
1. [Model](#model)
2. [Environment](#environment)
3. [Configuration](#configuration)
4. [Training and Evaluation](#training-and-evaluation)
5. [Structure of the repo](#structure-of-the-repo)
6. [Results](#results)
7. [Experiment tracking](#experiment-tracking)
8. [Citing the paper](#citing-the-paper)
9. [Citing this repo](#citing-this-repo)

# Model
Pre-training | Fine-tuning
:-------------------------:|:-------------------------:
(figure not available) | (figure not available)

# Environment
We used Python 3.7 for our experiments. The environment can be set up in three steps:

```
pip install pipenv # To install pipenv if you don't have it already
pipenv install --skip-lock # To install required packages.
pipenv shell # To activate virtual env
```

If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. `pip install package_name`.
# Configuration
There are two types of configuration files:
```
1. pad.yaml # Defines parameters and options for pre-training
2. humanness.yaml # Defines parameters and options for fine-tuning
```

# Training and Evaluation
You can train and evaluate the model by using:

```
python selfpad_pretrain.py # For pre-training
python selfpad_finetune.py # For fine-tuning it for humanness
python selfpad_eval.py -ev test # To compute humanness scores for a custom dataset, in this case test.csv. The CSV file should have "VH", "VL" and/or "Label" columns
```

# Structure of the repo
- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py
- src
|-selfpad.py
|-selfpad_humanness.py
- config
|-pad.yaml
|-humanness.yaml
- utils_common
|-arguments.py
|-utils.py
|-tokenizer.py
...
- utils_pretrain
|-load_data.py
|-model_utils.py
|-loss_functions.py
...
- utils_finetune
|-load_data.py
|-model_utils.py
|-loss_functions.py
...
- data
|-test.csv
...
- results
|-pretraining
|-humanness
...
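
Before running `selfpad_eval.py` on a custom dataset such as `data/test.csv` in the tree above, it can help to check that the file has the expected columns. The following is a minimal sketch, not part of the SelfPAD code base: `check_eval_csv` is a hypothetical helper, the sequences are short placeholders rather than real antibody chains, the label values `1` and `0` are made up, and reading "VH", "VL" and/or "Label" as "at least one sequence column, with an optional label" is an assumption.

```python
import csv
import io

# Column names quoted in the evaluation command's comment.
SEQUENCE_COLUMNS = ("VH", "VL")
LABEL_COLUMN = "Label"


def check_eval_csv(handle):
    """Return (rows, has_label) for a CSV with "VH" and/or "VL" columns.

    Hypothetical helper for illustration; not part of the SelfPAD repo.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    present = [name for name in SEQUENCE_COLUMNS if name in header]
    if not present:
        raise ValueError('expected a "VH" and/or "VL" column, got %r' % (header,))
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        for name in present:
            # DictReader fills missing trailing fields with None.
            if not (row[name] or "").strip():
                raise ValueError("empty %s sequence on line %d" % (name, line_no))
    return rows, LABEL_COLUMN in header


# Placeholder data standing in for a real test.csv.
sample = io.StringIO(
    "VH,VL,Label\n"
    "EVQLVESGG,DIQMTQSPS,1\n"
    "QVQLQQSGA,EIVLTQSPA,0\n"
)
rows, has_label = check_eval_csv(sample)
print(len(rows), has_label)
```

To check a file on disk, pass an open file object instead of the `io.StringIO` sample, for example `open("data/test.csv", newline="")`.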
# Results
Results at the end of training are saved under the `./results` directory. The results directory is structured as follows:
- results
|-task e.g. humanness, or pretraining
|-evaluation
|-clusters (for plotting t-SNE and PCA plots of embeddings)
|-training
|-model
|-plots
|-loss

You can save the results of evaluations under the "evaluation" folder.
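
If you need these locations in your own scripts, the layout can be expressed as paths. This is a rough sketch under stated assumptions: `results_layout` is a hypothetical helper, and the nesting (`clusters` inside `evaluation`; `model`, `plots` and `loss` inside `training`) is inferred from the listing above rather than confirmed by the code.

```python
from pathlib import Path

# Subfolders named in the README listing; their nesting is an assumption.
TRAINING_SUBDIRS = ("model", "plots", "loss")


def results_layout(task, root="results"):
    """Map a short name to each folder under <root>/<task> (assumed nesting)."""
    base = Path(root) / task
    layout = {
        "evaluation": base / "evaluation",
        "clusters": base / "evaluation" / "clusters",
    }
    for name in TRAINING_SUBDIRS:
        layout[name] = base / "training" / name
    return layout


layout = results_layout("humanness")
print(layout["evaluation"].as_posix())  # results/humanness/evaluation
```

The helper only builds paths and does not create any folders, so it is safe to call before or after a training run.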
# Experiment tracking
You can turn on Weights & Biases (W&B) in the config file for logging.

# Citing the paper
```
@article{ucar2024SelfPAD,
title={Improving Antibody Humanness Prediction using Patent Data},
author={Ucar, Talip and
Ramon, Aubin and
Oglic, Dino and
Croasdale-Wood, Rebecca and
Diethe, Tom and
Sormanni, Pietro},
journal={arXiv preprint arXiv:2401.14442},
year={2024}
}
```

# Citing this repo
If you use the SelfPAD framework in your own work, please cite it using the following:

```
@Misc{talip_ucar_2024_SelfPAD,
author = {Talip Ucar},
title = {{Improving Antibody Humanness Prediction using Patent Data}},
howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
month = January,
year = {since 2024}
}
```