{"id":13752957,"url":"https://github.com/google-research/ALBERT","last_synced_at":"2025-05-09T20:34:37.590Z","repository":{"id":39484363,"uuid":"224296809","full_name":"google-research/albert","owner":"google-research","description":"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations","archived":true,"fork":false,"pushed_at":"2023-04-14T18:02:55.000Z","size":296,"stargazers_count":3255,"open_issues_count":102,"forks_count":570,"subscribers_count":73,"default_branch":"master","last_synced_at":"2025-01-18T21:35:51.470Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/google-research.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-11-26T22:23:13.000Z","updated_at":"2025-01-13T12:33:25.000Z","dependencies_parsed_at":"2023-01-21T23:01:00.413Z","dependency_job_id":"811bea06-5abe-4cfa-9954-d795eaed451a","html_url":"https://github.com/google-research/albert","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Falbert","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Falbert/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Falbert/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/google-research%2Falbert/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/google-research","download_url":"https://cod
eload.github.com/google-research/albert/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253321799,"owners_count":21890466,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:13.444Z","updated_at":"2025-05-09T20:34:33.345Z","avatar_url":"https://github.com/google-research.png","language":"Python","funding_links":[],"categories":["BERT优化"],"sub_categories":["大语言对话模型及数据"],"readme":"ALBERT\n======\n\n***************New March 28, 2020 ***************\n\nAdd a colab [tutorial](https://github.com/google-research/albert/blob/master/albert_glue_fine_tuning_tutorial.ipynb) to run fine-tuning for GLUE datasets.\n\n***************New January 7, 2020 ***************\n\nv2 TF-Hub models should be working now with TF 1.15, as we removed the\nnative Einsum op from the graph. See updated TF-Hub links below.\n\n***************New December 30, 2019 ***************\n\nChinese models are released. 
We would like to thank the [CLUE team](https://github.com/CLUEbenchmark/CLUE) for providing the training data.\n\n- [Base](https://storage.googleapis.com/albert_models/albert_base_zh.tar.gz)\n- [Large](https://storage.googleapis.com/albert_models/albert_large_zh.tar.gz)\n- [Xlarge](https://storage.googleapis.com/albert_models/albert_xlarge_zh.tar.gz)\n- [Xxlarge](https://storage.googleapis.com/albert_models/albert_xxlarge_zh.tar.gz)\n\nVersion 2 of the ALBERT models is released.\n\n- Base: [[Tar file](https://storage.googleapis.com/albert_models/albert_base_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_base/3)]\n- Large: [[Tar file](https://storage.googleapis.com/albert_models/albert_large_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_large/3)]\n- Xlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xlarge_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xlarge/3)]\n- Xxlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xxlarge_v2.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xxlarge/3)]\n\nIn this version, we apply 'no dropout', 'additional training data' and 'long training time' strategies to all models. 
We train ALBERT-base for 10M steps and other models for 3M steps.\n\nThe results compare with the v1 models as follows:\n\n|                | Average  | SQuAD1.1 | SQuAD2.0 | MNLI     | SST-2    | RACE     |\n|----------------|----------|----------|----------|----------|----------|----------|\n|V2              |\n|ALBERT-base     |82.3      |90.2/83.2 |82.1/79.3 |84.6      |92.9      |66.8      |\n|ALBERT-large    |85.7      |91.8/85.2 |84.9/81.8 |86.5      |94.9      |75.2      |\n|ALBERT-xlarge   |87.9      |92.9/86.4 |87.9/84.1 |87.9      |95.4      |80.7      |\n|ALBERT-xxlarge  |90.9      |94.6/89.1 |89.8/86.9 |90.6      |96.8      |86.8      |\n|V1              |\n|ALBERT-base     |80.1      |89.3/82.3 | 80.0/77.1|81.6      |90.3      | 64.0     |\n|ALBERT-large    |82.4      |90.6/83.9 | 82.3/79.4|83.5      |91.7      | 68.5     |\n|ALBERT-xlarge   |85.5      |92.5/86.1 | 86.1/83.1|86.4      |92.4      | 74.8     |\n|ALBERT-xxlarge  |91.0      |94.8/89.3 | 90.2/87.4|90.8      |96.9      | 86.5     |\n\nThe comparison shows that for ALBERT-base, ALBERT-large, and ALBERT-xlarge, v2 is much better than v1, indicating the importance of applying the above three strategies. On average, ALBERT-xxlarge v2 is slightly worse than v1, for two reasons: 1) Training for an additional 1.5M steps (the only difference between these two models is training for 1.5M steps versus 3M steps) did not lead to a significant performance improvement. 2) For v1, we did a small hyperparameter search among the parameter sets given by BERT, RoBERTa, and XLNet. For v2, we simply adopt the parameters from v1 except for RACE, where we use a learning rate of 1e-5 and 0 [ALBERT DR](https://arxiv.org/pdf/1909.11942.pdf) (dropout rate for ALBERT in fine-tuning). The original (v1) RACE hyperparameters cause model divergence for v2 models. 
Given that the downstream tasks are sensitive to the fine-tuning hyperparameters, we should be careful about so-called slight improvements.\n\nALBERT is \"A Lite\" version of BERT, a popular unsupervised language\nrepresentation learning algorithm. ALBERT uses parameter-reduction techniques\nthat allow for large-scale configurations, overcome previous memory limitations,\nand achieve better behavior with respect to model degradation.\n\nFor a technical description of the algorithm, see our paper:\n\n[ALBERT: A Lite BERT for Self-supervised Learning of Language Representations](https://arxiv.org/abs/1909.11942)\n\nZhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut\n\nRelease Notes\n=============\n\n- Initial release: 10/9/2019\n\nResults\n=======\n\nPerformance of ALBERT on the GLUE benchmark using a single-model setup on\ndev:\n\n| Models            | MNLI     | QNLI     | QQP      | RTE      | SST      | MRPC     | CoLA     | STS      |\n|-------------------|----------|----------|----------|----------|----------|----------|----------|----------|\n| BERT-large        | 86.6     | 92.3     | 91.3     | 70.4     | 93.2     | 88.0     | 60.6     | 90.0     |\n| XLNet-large       | 89.8     | 93.9     | 91.8     | 83.8     | 95.6     | 89.2     | 63.6     | 91.8     |\n| RoBERTa-large     | 90.2     | 94.7     | **92.2** | 86.6     | 96.4     | **90.9** | 68.0     | 92.4     |\n| ALBERT (1M)       | 90.4     | 95.2     | 92.0     | 88.1     | 96.8     | 90.2     | 68.7     | 92.7     |\n| ALBERT (1.5M)     | **90.8** | **95.3** | **92.2** | **89.2** | **96.9** | **90.9** | **71.4** | **93.0** |\n\nPerformance of ALBERT-xxl on SQuAD and RACE benchmarks using a single-model\nsetup:\n\n|Models                    | SQuAD1.1 dev  | SQuAD2.0 dev  | SQuAD2.0 test | RACE test (Middle/High) |\n|--------------------------|---------------|---------------|---------------|-------------------------|\n|BERT-large                | 90.9/84.1    
 | 81.8/79.0     | 89.1/86.3     | 72.0 (76.6/70.1)        |\n|XLNet                     | 94.5/89.0     | 88.8/86.1     | 89.1/86.3     | 81.8 (85.5/80.2)        |\n|RoBERTa                   | 94.6/88.9     | 89.4/86.5     | 89.8/86.8     | 83.2 (86.5/81.3)        |\n|UPM                       | -             | -             | 89.9/87.2     | -                       |\n|XLNet + SG-Net Verifier++ | -             | -             | 90.1/87.2     | -                       |\n|ALBERT (1M)               | 94.8/89.2     | 89.9/87.2     | -             | 86.0 (88.2/85.1)        |\n|ALBERT (1.5M)             | **94.8/89.3** | **90.2/87.4** | **90.9/88.1** | **86.5 (89.0/85.5)**    |\n\n\nPre-trained Models\n==================\nTF-Hub modules are available:\n\n- Base: [[Tar file](https://storage.googleapis.com/albert_models/albert_base_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_base/1)]\n- Large: [[Tar file](https://storage.googleapis.com/albert_models/albert_large_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_large/1)]\n- Xlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xlarge_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xlarge/1)]\n- Xxlarge: [[Tar file](https://storage.googleapis.com/albert_models/albert_xxlarge_v1.tar.gz)] [[TF-Hub](https://tfhub.dev/google/albert_xxlarge/1)]\n\nExample usage of the TF-Hub module in code:\n\n```\ntags = set()\nif is_training:\n  tags.add(\"train\")\nalbert_module = hub.Module(\"https://tfhub.dev/google/albert_base/1\", tags=tags,\n                           trainable=True)\nalbert_inputs = dict(\n    input_ids=input_ids,\n    input_mask=input_mask,\n    segment_ids=segment_ids)\nalbert_outputs = albert_module(\n    inputs=albert_inputs,\n    signature=\"tokens\",\n    as_dict=True)\n\n# If you want to use the token-level output, use\n# albert_outputs[\"sequence_output\"] instead.\noutput_layer = albert_outputs[\"pooled_output\"]\n```\n\nMost of the fine-tuning scripts in this 
repository support TF-hub modules\nvia the `--albert_hub_module_handle` flag.\n\nPre-training Instructions\n=========================\nTo pretrain ALBERT, use `run_pretraining.py`:\n\n```\npip install -r albert/requirements.txt\npython -m albert.run_pretraining \\\n    --input_file=... \\\n    --output_dir=... \\\n    --init_checkpoint=... \\\n    --albert_config_file=... \\\n    --do_train \\\n    --do_eval \\\n    --train_batch_size=4096 \\\n    --eval_batch_size=64 \\\n    --max_seq_length=512 \\\n    --max_predictions_per_seq=20 \\\n    --optimizer='lamb' \\\n    --learning_rate=.00176 \\\n    --num_train_steps=125000 \\\n    --num_warmup_steps=3125 \\\n    --save_checkpoints_steps=5000\n```\n\nFine-tuning on GLUE\n===================\nTo fine-tune and evaluate a pretrained ALBERT on GLUE, please see the\nconvenience script `run_glue.sh`.\n\nLower-level use cases may want to use the `run_classifier.py` script directly.\nThe `run_classifier.py` script is used both for fine-tuning and evaluation of\nALBERT on individual GLUE benchmark tasks, such as MNLI:\n\n```\npip install -r albert/requirements.txt\npython -m albert.run_classifier \\\n  --data_dir=... \\\n  --output_dir=... \\\n  --init_checkpoint=... \\\n  --albert_config_file=... \\\n  --spm_model_file=... \\\n  --do_train \\\n  --do_eval \\\n  --do_predict \\\n  --do_lower_case \\\n  --max_seq_length=128 \\\n  --optimizer=adamw \\\n  --task_name=MNLI \\\n  --warmup_step=1000 \\\n  --learning_rate=3e-5 \\\n  --train_step=10000 \\\n  --save_checkpoints_steps=100 \\\n  --train_batch_size=128\n```\n\nGood default flag values for each GLUE task can be found in `run_glue.sh`.\n\nYou can fine-tune the model starting from TF-Hub modules instead of raw\ncheckpoints by setting e.g.\n`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead\nof `--init_checkpoint`.\n\nYou can find the spm_model_file in the tar files or under the assets folder of\nthe tf-hub module. 
The name of the model file is \"30k-clean.model\".\n\nAfter evaluation, the script should report some output like this:\n\n```\n***** Eval results *****\n  global_step = ...\n  loss = ...\n  masked_lm_accuracy = ...\n  masked_lm_loss = ...\n  sentence_order_accuracy = ...\n  sentence_order_loss = ...\n```\n\nFine-tuning on SQuAD\n====================\nTo fine-tune and evaluate a pretrained model on SQuAD v1, use the\n`run_squad_v1.py` script:\n\n```\npip install -r albert/requirements.txt\npython -m albert.run_squad_v1 \\\n  --albert_config_file=... \\\n  --output_dir=... \\\n  --train_file=... \\\n  --predict_file=... \\\n  --train_feature_file=... \\\n  --predict_feature_file=... \\\n  --predict_feature_left_file=... \\\n  --init_checkpoint=... \\\n  --spm_model_file=... \\\n  --do_lower_case \\\n  --max_seq_length=384 \\\n  --doc_stride=128 \\\n  --max_query_length=64 \\\n  --do_train=true \\\n  --do_predict=true \\\n  --train_batch_size=48 \\\n  --predict_batch_size=8 \\\n  --learning_rate=5e-5 \\\n  --num_train_epochs=2.0 \\\n  --warmup_proportion=.1 \\\n  --save_checkpoints_steps=5000 \\\n  --n_best_size=20 \\\n  --max_answer_length=30\n```\n\nYou can fine-tune the model starting from TF-Hub modules instead of raw\ncheckpoints by setting e.g.\n`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead\nof `--init_checkpoint`.\n\nFor SQuAD v2, use the `run_squad_v2.py` script:\n\n```\npip install -r albert/requirements.txt\npython -m albert.run_squad_v2 \\\n  --albert_config_file=... \\\n  --output_dir=... \\\n  --train_file=... \\\n  --predict_file=... \\\n  --train_feature_file=... \\\n  --predict_feature_file=... \\\n  --predict_feature_left_file=... \\\n  --init_checkpoint=... \\\n  --spm_model_file=... 
\\\n  --do_lower_case \\\n  --max_seq_length=384 \\\n  --doc_stride=128 \\\n  --max_query_length=64 \\\n  --do_train \\\n  --do_predict \\\n  --train_batch_size=48 \\\n  --predict_batch_size=8 \\\n  --learning_rate=5e-5 \\\n  --num_train_epochs=2.0 \\\n  --warmup_proportion=.1 \\\n  --save_checkpoints_steps=5000 \\\n  --n_best_size=20 \\\n  --max_answer_length=30\n```\n\nYou can fine-tune the model starting from TF-Hub modules instead of raw\ncheckpoints by setting e.g.\n`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead\nof `--init_checkpoint`.\n\nFine-tuning on RACE\n===================\nFor RACE, use the `run_race.py` script:\n\n```\npip install -r albert/requirements.txt\npython -m albert.run_race \\\n  --albert_config_file=... \\\n  --output_dir=... \\\n  --train_file=... \\\n  --eval_file=... \\\n  --data_dir=... \\\n  --init_checkpoint=... \\\n  --spm_model_file=... \\\n  --max_seq_length=512 \\\n  --max_qa_length=128 \\\n  --do_train \\\n  --do_eval \\\n  --train_batch_size=32 \\\n  --eval_batch_size=8 \\\n  --learning_rate=1e-5 \\\n  --train_step=12000 \\\n  --warmup_step=1000 \\\n  --save_checkpoints_steps=100\n```\n\nYou can fine-tune the model starting from TF-Hub modules instead of raw\ncheckpoints by setting e.g.\n`--albert_hub_module_handle=https://tfhub.dev/google/albert_base/1` instead\nof `--init_checkpoint`.\n\nSentencePiece\n=============\nCommand for generating the SentencePiece vocabulary:\n\n```\nspm_train \\\n--input all.txt --model_prefix=30k-clean --vocab_size=30000 --logtostderr \\\n--pad_id=0 --unk_id=1 --eos_id=-1 --bos_id=-1 \\\n--control_symbols=[CLS],[SEP],[MASK] \\\n--user_defined_symbols=\"(,),\\\",-,.,–,£,€\" \\\n--shuffle_input_sentence=true --input_sentence_size=10000000 \\\n--character_coverage=0.99995 
--model_type=unigram\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2FALBERT","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogle-research%2FALBERT","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogle-research%2FALBERT/lists"}