https://github.com/gersteinlab/entexbert
The DNABERT model for ENTEx
- Host: GitHub
- URL: https://github.com/gersteinlab/entexbert
- Owner: gersteinlab
- Created: 2022-10-07T04:04:23.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-07T19:13:01.000Z (over 3 years ago)
- Last Synced: 2025-05-15T05:34:56.017Z (about 1 year ago)
- Language: Python
- Size: 27.3 KB
- Stars: 5
- Watchers: 14
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Script for fine-tuning the DNABERT model to predict allele-specific (AS) effects. \
The model adds a 1-layer fully connected (FC) network on top of DNABERT's token/sequence embedding to predict whether the center SNP is sensitive to AS effects. \
It starts from pre-trained DNABERT weights.
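
The classification head described above can be sketched roughly as follows. This is a minimal illustration, not the code in `1.ft_bert.py`: the class name `CenterSnpHead`, the hidden size of 768 (the BERT-base width), the two output classes, and the choice between the center token and the first (`[CLS]`) token are all assumptions made for the example.

```python
import torch
from torch import nn


class CenterSnpHead(nn.Module):
    """Hypothetical 1-layer FC head over encoder hidden states."""

    def __init__(self, hidden_size: int = 768, num_labels: int = 2, dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        # The single fully connected layer that maps an embedding to class logits.
        self.fc = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states: torch.Tensor, use_center_token: bool = True) -> torch.Tensor:
        # hidden_states has shape (batch, seq_len, hidden_size).
        if use_center_token:
            # Token embedding: take the position in the middle of the window,
            # assumed here to be where the SNP sits.
            index = hidden_states.size(1) // 2
        else:
            # Sequence embedding: take the first position ([CLS] in BERT).
            index = 0
        pooled = hidden_states[:, index, :]
        return self.fc(self.dropout(pooled))


if __name__ == "__main__":
    head = CenterSnpHead()
    head.eval()  # disable dropout for a deterministic forward pass
    fake_hidden = torch.randn(4, 101, 768)  # stand-in for DNABERT output
    print(head(fake_hidden).shape)
```

In a real run the `fake_hidden` tensor would be replaced by the hidden states of whichever DNABERT layer is selected for prediction.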
## Requirements
DNABERT (https://github.com/jerryji1993/DNABERT) \
pytorch 1.7.1 \
cudatoolkit 11.0.221
## Example
KMER: k-mer length (3, 4, 5 or 6) \
MODEL_PATH: path to the pre-trained DNABERT model \
python3 1.ft_bert.py \
--model_type ${model} \
--tokenizer_name=dna$KMER \
--model_name_or_path $MODEL_PATH \
--task_name dnaprom \
--do_train \
--do_eval \
--do_predict \
--data_dir $DATA_PATH \
--predict_dir $DATA_PATH \
--max_seq_length ${seq_len} \
--per_gpu_eval_batch_size=${batch} \
--per_gpu_train_batch_size=${batch} \
--learning_rate ${lr} \
--num_train_epochs ${ep} \
--output_dir $OUTPUT_PATH \
--evaluate_during_training \
--logging_steps 5000 \
--save_steps 20000 \
--warmup_percent 0.1 \
--hidden_dropout_prob 0.1 \
--overwrite_output \
--weight_decay 0.01 \
--n_process 8 \
--pred_layer ${layer} \
--seed ${seed}
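
The `KMER` value selects the matching DNABERT tokenizer (`dna3` to `dna6`), and DNABERT expects each input sequence as space-separated overlapping k-mers. The sketch below shows one way to produce that representation; the helper name `seq_to_kmers` is hypothetical and is not part of this repository.

```python
def seq_to_kmers(sequence: str, k: int) -> str:
    """Split a DNA sequence into overlapping k-mers joined by spaces."""
    if k not in (3, 4, 5, 6):
        raise ValueError("k must be 3, 4, 5 or 6")
    sequence = sequence.upper()
    if len(sequence) < k:
        raise ValueError("sequence is shorter than k")
    # Slide a window of width k along the sequence, one base at a time.
    return " ".join(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    print(seq_to_kmers("ATGGCTAC", 3))  # ATG TGG GGC GCT CTA TAC
```

A sequence of length n yields n - k + 1 tokens, which is worth keeping in mind when choosing `--max_seq_length`.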