{"id":45952846,"url":"https://github.com/qizhipei/ssm-dta","last_synced_at":"2026-02-28T13:02:16.106Z","repository":{"id":37684045,"uuid":"465688081","full_name":"QizhiPei/SSM-DTA","owner":"QizhiPei","description":"SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction (Briefings in Bioinformatics 2023)","archived":false,"fork":false,"pushed_at":"2024-05-28T07:55:28.000Z","size":1183,"stargazers_count":52,"open_issues_count":0,"forks_count":8,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-06-06T12:49:15.009Z","etag":null,"topics":["attention-mechanism","deep-learning","drug-target-affinity","drug-target-interaction","drugdiscovery","fairseq","multitask-learning","protein","pytorch","semi-supervised-learning"],"latest_commit_sha":null,"homepage":"https://academic.oup.com/bib/article/24/6/bbad386/7333673","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/QizhiPei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-03-03T11:18:30.000Z","updated_at":"2025-01-20T16:12:17.000Z","dependencies_parsed_at":"2024-05-28T10:01:20.324Z","dependency_job_id":null,"html_url":"https://github.com/QizhiPei/SSM-DTA","commit_stats":null,"previous_names":["qizhipei/ssm-dta","qizhipei/smt-dta"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/QizhiPei/SSM-DTA","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FSSM-DTA","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FSSM-DTA/ta
gs","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FSSM-DTA/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FSSM-DTA/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/QizhiPei","download_url":"https://codeload.github.com/QizhiPei/SSM-DTA/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/QizhiPei%2FSSM-DTA/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29934959,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-28T13:00:17.143Z","status":"ssl_error","status_checked_at":"2026-02-28T12:59:13.669Z","response_time":90,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention-mechanism","deep-learning","drug-target-affinity","drug-target-interaction","drugdiscovery","fairseq","multitask-learning","protein","pytorch","semi-supervised-learning"],"created_at":"2026-02-28T13:02:15.320Z","updated_at":"2026-02-28T13:02:16.093Z","avatar_url":"https://github.com/QizhiPei.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\nSSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction 🔥\n\u003c/h1\u003e\n\n\u003cdiv 
align=\"center\"\u003e\n\n[![](https://img.shields.io/badge/paper-Briefings_in_Bioinformatics-pink?style=plastic\u0026logo=GitBook)](https://doi.org/10.1093/bib/bbad386)\n[![](https://img.shields.io/badge/paper-arxiv2206.09818-red?style=plastic\u0026logo=GitBook)](https://arxiv.org/abs/2206.09818)\n[![](https://img.shields.io/badge/github-green?style=plastic\u0026logo=github)](https://github.com/QizhiPei/SSM-DTA)\n[![](https://img.shields.io/badge/PyTorch-1.10+-ee4c2c?logo=pytorch\u0026logoColor=white)](https://pytorch.org/get-started/locally/)\n\n\u003c/div\u003e\n\n## Overview\n\nAuthors: Qizhi Pei, Lijun Wu, Jinhua Zhu, Yingce Xia, Shufang Xie, Tao Qin, Haiguang Liu, Tie-Yan Liu, Rui Yan\n\nThis repository contains the code and data link for *Briefings in Bioinformatics 2023* paper [SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction](https://academic.oup.com/bib/article/24/6/bbad386/7333673). Our model achieves significant results compared to traditional and recent baselines. We implement our method based on the codebase of [fairseq](https://github.com/pytorch/fairseq). If you have questions, don't hesitate to open an issue or ask me via \u003cqizhipei@ruc.edu.cn\u003e or Lijun Wu via \u003clijun_wu@outlook.com\u003e. We are happy to hear from you!\n\n## News\n**Jan 17 2024**: 🔥Update infer script and README for convenient usage~\n\n**Oct 6 2023**: Accepted by Briefings in Bioinformatics. Rename **SMT-DTA** to **SSM-DTA**.\n\n**Oct 22 2022**: Pre-trained data is released.\n\n**Oct 21 2022**: Pre-trained models are released. You can directly test our pre-trained model by our inference scripts.\n\n## SSM-DTA's Data\n\nThere are total 4 paired datasets and 2 unlabeled datasets. 
Please refer to our paper for more details\n\n### Preprocessed Paired Datasets\n\n| Dataset        | File Path in Shared Folder | Update Date  | Download Link                                                |\n| -------------- | -------------------------- | ------------ | ------------------------------------------------------------ |\n| BindingDB IC50 | BindingDB_IC50.tar.gz      | Oct 22, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/El98p8TwBh5LhCEoUKI6Yj0BZaWpv0b_sSIAYLLksUlnSA?e=zkvCpQ |\n| BindingDB Ki   | BindingDB_Ki.tar.gz        | Oct 22, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/El98p8TwBh5LhCEoUKI6Yj0BZaWpv0b_sSIAYLLksUlnSA?e=zkvCpQ |\n| KIBA           | KIBA.tar.gz                | Oct 22, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/El98p8TwBh5LhCEoUKI6Yj0BZaWpv0b_sSIAYLLksUlnSA?e=zkvCpQ |\n| DAVIS          | DAVIS.tar.gz               | Oct 22, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/El98p8TwBh5LhCEoUKI6Yj0BZaWpv0b_sSIAYLLksUlnSA?e=zkvCpQ |\n\n### Preprocessed Unlabeled Datasets\n\n| Dataset     | File Path in Shared Folder | Update Date  | Download Link                                                |\n| ----------- | -------------------------- | ------------ | ------------------------------------------------------------ |\n| PubChem 10M | molecule.tar.gz            | Oct 22, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/El98p8TwBh5LhCEoUKI6Yj0BZaWpv0b_sSIAYLLksUlnSA?e=zkvCpQ |\n| Pfam 10M    | protein.tar.gz             | Oct 22, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/El98p8TwBh5LhCEoUKI6Yj0BZaWpv0b_sSIAYLLksUlnSA?e=zkvCpQ |\n\n### Data Folder Format\n\nTake the BindingDB_IC50 data for example, the processed data folder should be organized in the following format:\n\n```\nDATA_BIN\n  |-BindingDB_IC50 # 
This folder name should be the same as the --dti-dataset argument\n    |-input0\n    |-input1\n    |-label\n  |-molecule \n  |-protein\n```\n\n## SSM-DTA Pre-trained Model Checkpoints\n\n| Model          | File Path in Shared Folder                 | Update Date  | Download Link                                                |\n| -------------- | ------------------------------------------ | ------------ | ------------------------------------------------------------ |\n| BindingDB IC50 | BindingDB_IC50/checkpoint_best_20221021.pt | Oct 21, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/EluW1t5l25RFluRkBkPS3jABueKqxPhxIesJJHc7IE3vdw?e=2e88A3 |\n| BindingDB Ki   | BindingDB_Ki/checkpoint_best_20221021.pt   | Oct 21, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/EluW1t5l25RFluRkBkPS3jABueKqxPhxIesJJHc7IE3vdw?e=2e88A3 |\n| KIBA           | KIBA/checkpoint_best_20221021.pt           | Oct 21, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/EluW1t5l25RFluRkBkPS3jABueKqxPhxIesJJHc7IE3vdw?e=2e88A3 |\n| DAVIS          | DAVIS/checkpoint_best_20221021.pt          | Oct 21, 2022 | https://mailustceducn-my.sharepoint.com/:f:/g/personal/peiqz_mail_ustc_edu_cn/EluW1t5l25RFluRkBkPS3jABueKqxPhxIesJJHc7IE3vdw?e=2e88A3 |\n\n\n\n## Model Architecture\n\n![](./img/arch.jpg)\n\n## Requirements and Installation\n* Python version == 3.7\n* PyTorch version == 1.10.2\n* Fairseq version == 0.10.2\n* RDKit version == 2020.09.5\n* numpy\n\nNote that the above requirements are not strict.\nWe set up the environment using conda. 
Clone the current repo and the official fairseq repo, then merge them:\n\n```shell\ngit clone https://github.com/QizhiPei/SSM-DTA.git\ncd SSM-DTA\npwd=$PWD\n\ngit clone git@github.com:pytorch/fairseq.git /tmp/fairseq\ncd /tmp/fairseq\ngit checkout v0.10.2\n\ncd $pwd\ncp -r -n /tmp/fairseq/* ./\n```\nCreate a new environment: \n\n```shell\nconda create -n py37-dta python=3.7\n```\n\nActivate the environment:\n\n```shell\nconda activate py37-dta\n```\n\nInstall required packages for evaluation:\n\n```shell\npip install future scipy scikit-learn lifelines requests rdkit==2020.09.5\n```\n\nInstall the code from source:\n\n```shell\npip install -e . \n```\n\n## Raw Dataset\n\n* We collected and randomly sampled the unlabeled molecule and protein data from the following public databases:\n  * Pfam: [pfamseq.gz file](http://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/pfamseq.gz)\n  * PubChem:\n  \n    * Extract the SMILES from the .sdf file from [PubChem ftp SDF](https://ftp.ncbi.nlm.nih.gov/pubchem/Compound/CURRENT-Full/SDF/)\n  \n    * Or [PubChem CID-SMILES file](https://ftp.ncbi.nlm.nih.gov/pubchem/Compound/Extras/CID-SMILES.gz)\n* BindingDB Dataset: https://github.com/Shen-Lab/DeepAffinity/tree/master/data/dataset\n* DAVIS and KIBA Datasets: https://github.com/kexinhuang12345/DeepPurpose\n\n## Getting Started\n\n### Data Preprocessing\n\n#### Unlabeled Molecule and Protein\n\n```shell\nDATADIR=/yourUnlabeledDataDir\nDATA_BIN=/yourDataBinDir\n\n# Canonicalize all SMILES\npython preprocess/canonicalize.py $DATADIR/train.mol --workers 40 \\\n  --output-fn $DATADIR/train.mol.can\n\n# Tokenize all SMILES\npython preprocess/tokenize_re.py $DATADIR/train.mol.can --workers 40 \\\n  --output-fn $DATADIR/train.mol.can.re \n\n# Tokenize all protein sequences\npython preprocess/add_space.py $DATADIR/train.pro --workers 40 \\\n  --output-fn $DATADIR/train.pro.addspace\n\n# You should also canonicalize and tokenize the valid set in the same way.\n\n# Binarize the 
data\nfairseq-preprocess \\\n    --only-source \\\n    --trainpref $DATADIR/train.mol.can.re \\\n    --validpref $DATADIR/valid.mol.can.re \\\n    --destdir $DATA_BIN/molecule \\\n    --workers 40 \\\n    --srcdict preprocess/dict.mol.txt\n    \nfairseq-preprocess \\\n    --only-source \\\n    --trainpref $DATADIR/train.pro.addspace \\\n    --validpref $DATADIR/valid.pro.addspace \\\n    --destdir $DATA_BIN/protein \\\n    --workers 40 \\\n    --srcdict preprocess/dict.pro.txt\n```\n#### Paired DTA data\n\nYou may need to firstly follow the README in `preprocess` folder to process the data from `BindingDB_All.tsv` and downloaded DAVIS and KIBA datasets from Deeppurpose.\n\n```shell\nDATADIR=/yourPairedDataDir\nDATA_BIN=/yourDataBinDir/bindingdb(davis or kiba)\n\n# Canonicalize all SMILES\npython preprocess/canonicalize.py $DATADIR/train.mol --workers 40 \\\n  --output-fn $DATADIR/train.mol.can\n\n# Tokenize all SMILES\npython preprocess/tokenize_re.py $DATADIR/train.mol.can --workers 40 \\\n  --output-fn $DATADIR/train.mol.can.re \n\n# Tokenize all protein sequence\npython preprocess/add_space.py $DATADIR/train.pro --workers 40 \\\n  --output-fn $DATADIR/train.pro.addspace\n\n# You should also process the valid set and test set in the same way.\n\n# Binarize the data\nfairseq-preprocess \\\n    --only-source \\\n    --trainpref $DATADIR/train.mol.can.re \\\n    --validpref $DATADIR/valid.mol.can.re \\\n    --testpref $DATADIR/test.mol.can.re \\\n    --destdir $DATA_BIN/input0 \\\n    --workers 40 \\\n    --srcdict preprocess/dict.mol.txt\n\nfairseq-preprocess \\\n    --only-source \\\n    --trainpref $DATADIR/train.pro.addspace \\\n    --validpref $DATADIR/valid.pro.addspace \\\n    --testpref $DATADIR/test.pro.addspace \\\n    --destdir $DATA_BIN/input1 \\\n    --workers 40 \\\n    --srcdict preprocess/dict.pro.txt\n\nmkdir -p $DATA_BIN/label\n\ncp $DATADIR/train.label $DATA_BIN/label/train.label\ncp $DATADIR/valid.label $DATA_BIN/label/valid.label\n```\n\n### 
Train Baseline\n\n```shell\nDATA_BIN=/yourDataBinDir/bindingdb(davis or kiba)  \nFAIRSEQ=$pwd # The path to \nSAVE_PATH=/yourCkptDir\nTOTAL_UPDATES=200000 # Total number of training steps\nWARMUP_UPDATES=10000 # Warmup the learning rate over this many updates\nPEAK_LR=0.00005       # Peak learning rate, adjust as needed\nBATCH_SIZE=4\t\t# Batch size\nUPDATE_FREQ=4       # Increase the batch size 4x\n\nmkdir -p $SAVE_PATH\n\npython $FAIRSEQ/train.py --task dti_separate $DATA_BIN \\\n    --num-classes 1 --init-token 0 \\\n    --max-positions-molecule 512 --max-positions-protein 1024 \\\n    --save-dir $SAVE_PATH \\\n    --encoder-layers 12 \\\n    --criterion dti_separate --regression-target \\\n    --batch-size $BATCH_SIZE --update-freq $UPDATE_FREQ --required-batch-size-multiple 1 \\\n    --optimizer adam --weight-decay 0.1 --adam-betas '(0.9,0.98)' --adam-eps 1e-06 \\\n    --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \\\n    --clip-norm 1.0  --max-update $TOTAL_UPDATES \\\n    --arch roberta_dti_cross_attn --dropout 0.1 --attention-dropout 0.1 \\\n    --skip-invalid-size-inputs-valid-test \\\n    --fp16 \\\n    --shorten-method truncate \\\n    --find-unused-parameters | tee -a ${SAVE_PATH}/training.log\n```\n\n* `DATA_BIN` is where you save the preprocessed data\n* `FAIRSEQ` is the path to fairseq code base\n* `SAVE_PATH` is where you save the checkpoint file and training log\n\n### Train SSM-DTA Model\n\n```shell\nDATA_BIN=/yourDataBinDir\nDTA_DATASET=BindingDB_IC50(or BindingDB_Ki or DAVIS or KIBA)\nFAIRSEQ=$pwd\nSAVE_PATH=/yourCkptDir\nTOTAL_UPDATES=200000 # Total number of training steps\nINTERVAL_UPDATES=1000 # Validate and save checkpoint every N updates\nWARMUP_UPDATES=10000 # Warmup the learning rate over this many updates\nPEAK_LR=0.0001       # Peak learning rate, adjust as needed\nBATCH_SIZE=4\t\t# Batch size\nUPDATE_FREQ=8       # Increase the batch size 8x\nMLMW=2 \t\t\t\t# MLM loss 
weight\n\n# The final real batch size is BATCH_SIZE x GPU_NUM x UPDATE_FREQ\n\nmkdir -p $SAVE_PATH\n\npython $FAIRSEQ/train.py --task dti_mlm_regress_pretrain $DATA_BIN \\\n\t--dti-dataset $DTA_DATASET \\\n    --num-classes 1 --init-token 0 \\\n    --max-positions-molecule 512 --max-positions-protein 1024 \\\n    --save-dir $SAVE_PATH \\\n    --encoder-layers 12 \\\n    --criterion dti_mlm_regress_pretrain --regression-target \\\n    --batch-size $BATCH_SIZE --update-freq $UPDATE_FREQ --required-batch-size-multiple 1 \\\n    --optimizer adam --weight-decay 0.01 --adam-betas '(0.9,0.98)' --adam-eps 1e-06 \\\n    --lr-scheduler polynomial_decay --lr $PEAK_LR --warmup-updates $WARMUP_UPDATES --total-num-update $TOTAL_UPDATES \\\n    --clip-norm 1.0  --max-update $TOTAL_UPDATES \\\n    --arch roberta_dti_cross_attn --dropout 0.1 --attention-dropout 0.1 \\\n    --skip-invalid-size-inputs-valid-test \\\n    --fp16 \\\n    --shorten-method truncate \\\n    --use-2-attention --find-unused-parameters --ddp-backend no_c10d \\\n    --validate-interval-updates $INTERVAL_UPDATES \\\n    --save-interval-updates $INTERVAL_UPDATES \\\n    --best-checkpoint-metric loss_regress_mse \\\n    --mlm-weight-0 $MLMW --mlm-weight-1 $MLMW --mlm-weight-paired-0 $MLMW --mlm-weight-paired-1 $MLMW | tee -a ${SAVE_PATH}/training.log\n```\n* `DATA_BIN` is where you save the preprocessed data\n* `DTA_DATASET` is the paired dataset you want to use for training\n* `FAIRSEQ` is the path to fairseq code base\n* `SAVE_PATH` is where you save the checkpoint file and training log\n\nYou can also use fairseq argument `--tensorboard-logdir TENSORBOARD_LOGDIR`  to save logs for tensorboard.\n\n## Evaluation/Inference\n### Evaluation on Provided Binary Data\nFor quicker evaluation with `.bin` data, you can use the following script:\n\n```shell\npython fairseq_cli/validate.py \\\n    --task dti_mlm_regress \\\n    --batch-size 32 \\\n    --valid-subset test \\\n    --criterion dti_mlm_regress_eval \\\n    
--path yourCheckpointFilePath \\\n    $DATA_BIN/BindingDB_IC50  # or BindingDB_Ki, DAVIS, or KIBA\n```\n### Inference on your own Drug-Target Pairs\nIf you want to use our model to predict the affinity value on your own DT pairs, follow the instructions below:\n1. Download the checkpoint file provided in [the above section](https://github.com/QizhiPei/SSM-DTA?tab=readme-ov-file#ssm-dta-pre-trained-model-checkpoints), and save it to `yourCheckpointFilePath`.\n2. Preprocess your input molecule (drug) and protein (target) files. You can increase the number of workers to speed up the process (especially for large input files) as needed.\n   * For molecule (drug), you need to use RDKit to canonicalize the SMILES, and then use a regular expression to tokenize it. \n      ```shell\n        python preprocess/canonicalize.py example.mol --output-fn example.mol.can --workers 1\n        python preprocess/tokenize_re.py example.mol.can --output-fn example.mol.can.re --workers 1\n      ```\n   * For protein (target), you only need to add a space between adjacent amino acids.\n      ```shell\n      python preprocess/add_space.py example.pro --output-fn example.pro.addspace --workers 1\n      ```\n3. Run the following command, and the model prediction will be saved in `example_pred.txt`. We provide the example input file for reference.\n    ```shell\n    python infer.py \\\n        --batch-size 8 \\\n        --mode pred \\\n        --checkpoint yourCheckpointFilePath \\\n        --data-bin dict \\\n        --input-mol-fn example.mol.can.re \\\n        --input-pro-fn example.pro.addspace \\\n        --output-fn example_pred.txt\n    ```\n4. 
If you want to calculate the metrics at the same time, change `--mode pred` to `--mode eval` and provide the label file via `--input-label-fn`:\n    ```shell\n    python infer.py \\\n        --batch-size 8 \\\n        --mode eval \\\n        --checkpoint yourCheckpointFilePath \\\n        --data-bin dict \\\n        --input-mol-fn example.mol.can.re \\\n        --input-pro-fn example.pro.addspace \\\n        --input-label-fn example.label \\\n        --output-fn example_pred.txt\n    ```\n\n## Feature-based Training/Finetune\n\nFor feature-based training (which uses only the outputs of the pretrained molecule and protein encoders without updating their parameters) or fine-tuning, you need to prepare your pretrained checkpoint files in the fairseq format. Use these checkpoint files to initialize your molecule and protein encoders, respectively. The code file `fairseq/models/dti_mlm_regress_sep_encoder_from_pretrained_roberta.py` is provided for reference.\n\nTo freeze the encoders, you can add the following code to your model class:\n\n```python\nfor param in encoder_molecule.parameters():\n\tparam.requires_grad = False\nfor param in encoder_protein.parameters():\n\tparam.requires_grad = False\n```\n\n## Citation\n\nIf you find our code helpful, please consider citing our paper:\n\n```\n@article{pei2023breaking,\n  title={Breaking the barriers of data scarcity in drug--target affinity prediction},\n  author={Pei, Qizhi and Wu, Lijun and Zhu, Jinhua and Xia, Yingce and Xie, Shufang and Qin, Tao and Liu, Haiguang and Liu, Tie-Yan and Yan, Rui},\n  journal={Briefings in Bioinformatics},\n  volume={24},\n  number={6},\n  pages={bbad386},\n  year={2023},\n  publisher={Oxford University Press}\n}\n\n@misc{pei2023ssmdta,\n      title={SSM-DTA: Breaking the Barriers of Data Scarcity in Drug-Target Affinity Prediction}, \n      author={Qizhi Pei and Lijun Wu and Jinhua Zhu and Yingce Xia and Shufang Xie and Tao Qin and Haiguang Liu and Tie-Yan Liu and Rui Yan},\n      
year={2023},\n      eprint={2206.09818},\n      archivePrefix={arXiv},\n      primaryClass={q-bio.BM}\n}\n```\n\n## License\n\nThis project is licensed under the terms of the MIT license. See [LICENSE](https://github.com/QizhiPei/SSM-DTA/blob/main/LICENSE) for additional details.\n\n## Contacts\n\nQizhi Pei: qizhipei@ruc.edu.cn\n\nLijun Wu: lijuwu@microsoft.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqizhipei%2Fssm-dta","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fqizhipei%2Fssm-dta","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fqizhipei%2Fssm-dta/lists"}