https://github.com/astrazeneca/selfpad

The official implementation of "Improving Antibody Humanness Prediction using Patent Data".
Host: GitHub
URL: https://github.com/astrazeneca/selfpad
Owner: AstraZeneca
License: apache-2.0
Created: 2024-01-14T18:15:01.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-02-14T22:02:22.000Z (over 1 year ago)
Last Synced: 2025-03-31T17:59:07.062Z (4 months ago)
Topics: antibody, antibody-design, antibody-sequence, attention, contrastive-learning, humanness, immunogenicity-prediction, patent-data, transformer
Language: Python
Homepage:
Size: 425 KB
Stars: 9
Watchers: 3
Forks: 4
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# SelfPAD

##### Author: Talip Ucar

The official implementation of [Improving Antibody Humanness Prediction using Patent Data](https://arxiv.org/pdf/2401.14442.pdf)

# Table of Contents:

1. [Model](#model)

2. [Environment](#environment)

3. [Configuration](#configuration)

4. [Training and Evaluation](#training-and-evaluation)

5. [Structure of the repo](#structure-of-the-repo)

6. [Results](#results)

7. [Experiment tracking](#experiment-tracking)

8. [Citing the paper](#citing-the-paper)

9. [Citing this repo](#citing-this-repo)

# Model

Pre-training             |  Fine-tuning
:-------------------------:|:-------------------------:
![SelfPAD](./assets/selfpad.png)  |  ![SelfPAD](./assets/selfpad_finetuning.png)

# Environment

We used Python 3.7 for our experiments. The environment can be set up by following three steps:

```
pip install pipenv             # To install pipenv if you don't have it already
pipenv install --skip-lock     # To install required packages
pipenv shell                   # To activate virtual env
```

If the second step fails, you can install the packages listed in the Pipfile individually with pip, i.e. "pip install package_name".

# Configuration

There are two configuration files:

```
1. pad.yaml         # Defines parameters and options for pre-training
2. humanness.yaml   # Defines parameters and options for fine-tuning
```

# Training and Evaluation

You can train and evaluate the model by using:

```
python selfpad_pretrain.py        # For pre-training
python selfpad_finetune.py        # For fine-tuning it for humanness
python selfpad_eval.py -ev test   # To compute humanness scores for a custom dataset, in this case test.csv. The CSV file should have "VH", "VL" and/or "Label" columns
```
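As a minimal sketch of preparing a custom dataset for the evaluation command, the snippet below writes a small CSV with the "VH", "VL" and "Label" columns and checks its header. It assumes one reading of the "and/or" wording above, namely that the sequence columns are needed and "Label" is optional. The helper names, the file name `my_antibodies.csv`, the sequence fragments and the label value are all hypothetical placeholders, not part of the repository.

```python
import csv
import tempfile
from pathlib import Path

# Column names are taken from the README; treating "Label" as optional is an assumption.
SEQUENCE_COLUMNS = ("VH", "VL")
LABEL_COLUMN = "Label"


def read_header(csv_path):
    """Return the list of column names in the first row of the CSV."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])


def missing_sequence_columns(csv_path):
    """Return the sequence columns that are absent from the CSV header."""
    header = read_header(csv_path)
    return [name for name in SEQUENCE_COLUMNS if name not in header]


def has_labels(csv_path):
    """Return True if the CSV header contains the label column."""
    return LABEL_COLUMN in read_header(csv_path)


# Demo: write a one-row file and validate it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "my_antibodies.csv"  # hypothetical file name
    with open(demo_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["VH", "VL", "Label"])
        writer.writeheader()
        # Placeholder fragments and label, for illustration only.
        writer.writerow({"VH": "EVQLVESGG", "VL": "DIQMTQSPS", "Label": 1})
    demo_missing = missing_sequence_columns(demo_path)
    demo_has_labels = has_labels(demo_path)

print("missing sequence columns:", demo_missing)
print("labels present:", demo_has_labels)
```

Since the example command refers to `test.csv` as `test`, a custom file would presumably be placed next to it under `data` and passed to `-ev` by name without the extension; this is an inference from the example rather than documented behaviour.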

# Structure of the repo


- selfpad_pretrain.py
- selfpad_finetune.py
- selfpad_eval.py
- src
    |-selfpad.py
    |-selfpad_humanness.py
- config
    |-pad.yaml
    |-humanness.yaml
- utils_common
    |-arguments.py
    |-utils.py
    |-tokenizer.py
    ...
- utils_pretrain
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
- utils_finetune
    |-load_data.py
    |-model_utils.py
    |-loss_functions.py
    ...
- data
    |-test.csv
    ...
- results
    |-pretraining
    |-humanness
    ...

# Results

Results at the end of training are saved under the ```./results``` directory. The results directory is structured as follows:


- results
    |-task e.g. humanness, or pretraining
            |-evaluation
                |-clusters (for plotting t-SNE and PCA plots of embeddings)
            |-training
                |-model
                |-plots
                |-loss


You can save evaluation results under the "evaluation" folder.
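The layout above can be sketched with `pathlib`. This is only an illustration of the directory structure under a given task name; the function names are hypothetical, and the training scripts may well create these folders themselves.

```python
import tempfile
from pathlib import Path


def results_subdirs(task):
    """Return the leaf directories of the results tree for one task."""
    base = Path("results") / task  # task is e.g. "humanness" or "pretraining"
    return [
        base / "evaluation" / "clusters",  # t-SNE and PCA plots of embeddings
        base / "training" / "model",
        base / "training" / "plots",
        base / "training" / "loss",
    ]


def make_results_tree(root, task):
    """Create the results tree for a task under root and return the created paths."""
    created = []
    for relative in results_subdirs(task):
        target = Path(root) / relative
        target.mkdir(parents=True, exist_ok=True)
        created.append(target)
    return created


# Demo in a temporary directory so nothing is written into a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    demo_dirs = make_results_tree(tmp, "humanness")
    demo_names = [d.relative_to(tmp).as_posix() for d in demo_dirs]
    demo_all_exist = all(d.is_dir() for d in demo_dirs)

for name in demo_names:
    print(name)
```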

# Experiment tracking

You can turn on Weights & Biases (W&B) logging in the config file.

# Citing the paper

```
@article{ucar2024SelfPAD,
  title={Improving Antibody Humanness Prediction using Patent Data},
  author={Ucar, Talip and 
          Ramon, Aubin and 
          Oglic, Dino and 
          Croasdale-Wood, Rebecca and 
          Diethe, Tom and 
          Sormanni, Pietro},
  journal={arXiv preprint arXiv:2401.14442},
  year={2024}
}
```

# Citing this repo

If you use the SelfPAD framework in your own work, please cite it as follows:

```
@Misc{talip_ucar_2024_SelfPAD,
  author =   {Talip Ucar},
  title =    {{Improving Antibody Humanness Prediction using Patent Data}},
  howpublished = {\url{https://github.com/AstraZeneca/SelfPAD}},
  month        = January,
  year = {since 2024}
}
```
