{"id":13753237,"url":"https://github.com/xuyige/BERT4doc-Classification","last_synced_at":"2025-05-09T20:35:06.681Z","repository":{"id":47041107,"uuid":"238428378","full_name":"xuyige/BERT4doc-Classification","owner":"xuyige","description":"Code and source for paper ``How to Fine-Tune BERT for Text Classification?``","archived":false,"fork":false,"pushed_at":"2021-10-19T06:11:47.000Z","size":814,"stargazers_count":627,"open_issues_count":13,"forks_count":101,"subscribers_count":9,"default_branch":"master","last_synced_at":"2025-04-04T21:11:07.733Z","etag":null,"topics":["bert","natural-language-processing","text-classification"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xuyige.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-02-05T10:54:03.000Z","updated_at":"2025-03-29T07:35:45.000Z","dependencies_parsed_at":"2022-08-23T08:00:12.885Z","dependency_job_id":null,"html_url":"https://github.com/xuyige/BERT4doc-Classification","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xuyige%2FBERT4doc-Classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xuyige%2FBERT4doc-Classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xuyige%2FBERT4doc-Classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xuyige%2FBERT4doc-Classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xuyige","down
load_url":"https://codeload.github.com/xuyige/BERT4doc-Classification/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253321841,"owners_count":21890476,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","natural-language-processing","text-classification"],"created_at":"2024-08-03T09:01:18.830Z","updated_at":"2025-05-09T20:35:01.627Z","avatar_url":"https://github.com/xuyige.png","language":"Python","funding_links":[],"categories":["文本分类"],"sub_categories":[],"readme":"# How to Fine-Tune BERT for Text Classification?\n\nThis is the code and source for the paper [How to Fine-Tune BERT for Text Classification?](https://arxiv.org/abs/1905.05583).\n\nIn this paper, we conduct exhaustive experiments to investigate different fine-tuning methods for BERT on the text classification task and provide a general solution for BERT fine-tuning.\n\n\n\\*********** **update at Mar 14, 2020** \\*************\n\nOur checkpoint can be loaded in BertEmbedding from the latest [fastNLP](https://github.com/fastnlp/fastNLP) package.\n\n[Link to](https://github.com/fastnlp/fastNLP/blob/master/fastNLP/embeddings/bert_embedding.py) fastNLP.embeddings.BertEmbedding\n\n## Requirements\n\nFor further pre-training, we borrow some code from Google BERT. Thus, we need:\n\n+ tensorflow==1.1x\n+ spacy\n+ pandas\n+ numpy\n\nNote that you need Python 3.7 or earlier for compatibility with tensorflow 1.1x.\n\nFor fine-tuning, we borrow some code from the pytorch-pretrained-bert package (now well known as transformers). 
Thus, we need:\n\n+ torch\u003e=0.4.1,\u003c=1.2.0\n\n\n\n## Run the code\n\n### 1) Prepare the data set:\n\n#### Sogou News\n\nWe determine the category of each news article from its URL; for example, “sports” corresponds\nto “http://sports.sohu.com”. We choose 6 categories:\n“sports”, “house”, “business”, “entertainment”,\n“women” and “technology”. For each class, we select 9,000 training samples\nand 1,000 test samples.\n\nThe data is available [here](https://drive.google.com/drive/folders/1Rbi0tnvsQrsHvT_353pMdIbRwDlLhfwM).\n\n#### The remaining data sets\n\nThe remaining data sets were built by [Zhang et al. (2015)](https://papers.nips.cc/paper/5782-character-level-convolutional-networks-for-text-classification.pdf).\nWe downloaded them from this [URL](https://drive.google.com/drive/u/0/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M), created by Xiang Zhang.\n\n\n### 2) Prepare Google BERT:\n\n[BERT-Base, Uncased](https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip)\n\n[BERT-Base, Chinese](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip)\n\n\n### 3) Further Pre-Training:\n\n#### Generate Further Pre-Training Corpus\n\nHere we use AG's News as an example:\n```shell\npython generate_corpus_agnews.py\n```\nThe file ``agnews_corpus_test.txt`` can be found in the directory ``./data``.\n\n#### Run Further Pre-Training\n\n```shell\npython create_pretraining_data.py \\\n  --input_file=./AGnews_corpus.txt \\\n  --output_file=tmp/tf_AGnews.tfrecord \\\n  --vocab_file=./uncased_L-12_H-768_A-12/vocab.txt \\\n  --do_lower_case=True \\\n  --max_seq_length=128 \\\n  --max_predictions_per_seq=20 \\\n  --masked_lm_prob=0.15 \\\n  --random_seed=12345 \\\n  --dupe_factor=5\n  \npython run_pretraining.py \\\n  --input_file=./tmp/tf_AGnews.tfrecord \\\n  --output_dir=./uncased_L-12_H-768_A-12_AGnews_pretrain \\\n  --do_train=True \\\n  --do_eval=True \\\n  
--bert_config_file=./uncased_L-12_H-768_A-12/bert_config.json \\\n  --init_checkpoint=./uncased_L-12_H-768_A-12/bert_model.ckpt \\\n  --train_batch_size=32 \\\n  --max_seq_length=128 \\\n  --max_predictions_per_seq=20 \\\n  --num_train_steps=100000 \\\n  --num_warmup_steps=10000 \\\n  --save_checkpoints_steps=10000 \\\n  --learning_rate=5e-5\n```\n\n\n### 4) Fine-Tuning\n\n#### Convert Tensorflow checkpoint to PyTorch checkpoint\n\n```shell\npython convert_tf_checkpoint_to_pytorch.py \\\n  --tf_checkpoint_path ./uncased_L-12_H-768_A-12_AGnews_pretrain/model.ckpt-100000 \\\n  --bert_config_file ./uncased_L-12_H-768_A-12_AGnews_pretrain/bert_config.json \\\n  --pytorch_dump_path ./uncased_L-12_H-768_A-12_AGnews_pretrain/pytorch_model.bin\n```\n\n#### Fine-Tuning on downstream tasks\n\nWhen fine-tuning on downstream tasks, we noticed that different GPUs (e.g., 1080Ti and Titan Xp) may produce\nslightly different experimental results even with a fixed initial random seed.\nHere we use 4 × 1080Ti as an example.\n\nTake Exp-I (see Section 5.3) as an example:\n\n```shell\nexport CUDA_VISIBLE_DEVICES=0,1,2,3\npython run_classifier_single_layer.py \\\n  --task_name imdb \\\n  --do_train \\\n  --do_eval \\\n  --do_lower_case \\\n  --data_dir ./IMDB_data/ \\\n  --vocab_file ./uncased_L-12_H-768_A-12_IMDB_pretrain/vocab.txt \\\n  --bert_config_file ./uncased_L-12_H-768_A-12_IMDB_pretrain/bert_config.json \\\n  --init_checkpoint ./uncased_L-12_H-768_A-12_IMDB_pretrain/pytorch_model.bin \\\n  --max_seq_length 512 \\\n  --train_batch_size 24 \\\n  --learning_rate 2e-5 \\\n  --num_train_epochs 3.0 \\\n  --output_dir ./imdb \\\n  --seed 42 \\\n  --layers 11 10 \\\n  --trunc_medium -1\n```\n\nwhere ``num_train_epochs`` can be 3.0, 4.0, or 6.0.\n\n``layers`` is the list of layers whose outputs are used as features for classification:\n-2 means use the pooled output, -1 means concatenate all layers, and the command above concatenates\nlayer-10 and layer-11 (the last two layers).\n\n``trunc_medium`` 
indicates how long texts are truncated: -2 means head-only, -1 means tail-only,\n0 means head-half + tail-half (e.g., head-256 + tail-256),\nand any other natural number k means head-k + tail-rest (e.g., head-k + tail-(512-k)).\n\nThere are also other arguments for fine-tuning:\n\n``pooling_type`` indicates which feature is used for classification: `mean` means\nmean-pooling over the hidden states of the whole sequence, `max` means max-pooling, and the default\ntakes the hidden state of the `[CLS]` token as the feature.\n\n``layer_learning_rate`` and ``layer_learning_rate_decay`` in ``run_classifier_discriminative.py``\nset the layer-wise decreasing layer rate (see Section 5.3.4).\n\n\n## Further Pre-Trained Checkpoints\n\nThe IMDb-based further pre-trained checkpoints are available\n[here](https://drive.google.com/drive/folders/1Rbi0tnvsQrsHvT_353pMdIbRwDlLhfwM).\n\nFor other checkpoints, please contact us by e-mail.\n\n## How to cite our paper\n\n```text\n@inproceedings{sun2019fine,\n  title={How to fine-tune {BERT} for text classification?},\n  author={Sun, Chi and Qiu, Xipeng and Xu, Yige and Huang, Xuanjing},\n  booktitle={China National Conference on Chinese Computational Linguistics},\n  pages={194--206},\n  year={2019},\n  organization={Springer}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxuyige%2FBERT4doc-Classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxuyige%2FBERT4doc-Classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxuyige%2FBERT4doc-Classification/lists"}