{"id":13534962,"url":"https://github.com/brightmart/albert_zh","last_synced_at":"2025-05-14T21:06:57.841Z","repository":{"id":37725607,"uuid":"211137351","full_name":"brightmart/albert_zh","owner":"brightmart","description":"A LITE BERT FOR SELF-SUPERVISED LEARNING OF LANGUAGE REPRESENTATIONS, 海量中文预训练ALBERT模型","archived":false,"fork":false,"pushed_at":"2022-11-21T08:02:01.000Z","size":2095,"stargazers_count":3966,"open_issues_count":101,"forks_count":752,"subscribers_count":101,"default_branch":"master","last_synced_at":"2025-04-13T17:46:50.738Z","etag":null,"topics":["albert","bert","chinese-corpus","pre-trained","pre-trained-model","pytorch","roberta","tensorflow","xlnet"],"latest_commit_sha":null,"homepage":"https://arxiv.org/pdf/1909.11942.pdf","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/brightmart.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-26T16:45:30.000Z","updated_at":"2025-04-04T00:35:50.000Z","dependencies_parsed_at":"2022-07-09T00:30:18.788Z","dependency_job_id":null,"html_url":"https://github.com/brightmart/albert_zh","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Falbert_zh","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Falbert_zh/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Falbert_zh/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/brightmart%2Falbert_zh/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHu
b/owners/brightmart","download_url":"https://codeload.github.com/brightmart/albert_zh/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254227612,"owners_count":22035669,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["albert","bert","chinese-corpus","pre-trained","pre-trained-model","pytorch","roberta","tensorflow","xlnet"],"created_at":"2024-08-01T08:00:47.664Z","updated_at":"2025-05-14T21:06:52.822Z","avatar_url":"https://github.com/brightmart.png","language":"Python","funding_links":[],"categories":["Pretrained Language Model","improvement over BERT:","Python","BERT优化"],"sub_categories":["Repository","大语言对话模型及数据"],"readme":"# albert_zh\n\nAn implementation of \u003ca href=\"https://arxiv.org/pdf/1909.11942.pdf\"\u003eALBERT: A Lite BERT for Self-Supervised Learning of Language Representations\u003c/a\u003e in TensorFlow.\n\nALBERT builds on BERT with several improvements. It achieves state-of-the-art performance on the main benchmarks with 30% fewer parameters.\n\nalbert_base_zh has only 10% of the parameters of the original BERT model while retaining most of its accuracy.
\n\nPre-trained Chinese ALBERT models are now available in TensorFlow, PyTorch and Keras versions.\n\nALBERT pre-trained on a massive Chinese corpus: fewer parameters, better results. Small pre-trained models can also win 13 NLP tasks, and ALBERT's three main changes put it at the top of the GLUE benchmark.\n\n\u003ca href='https://www.cluebenchmarks.com/clueai.html'\u003eclueai toolkit: build a custom NLP API in three lines of code and three minutes (zero-shot learning)\u003c/a\u003e\n\n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_tiny_compare_s.jpg\"  width=\"90%\" height=\"70%\" /\u003e\n\nFor one-command runs of 10 datasets and 9 baseline models, with a detailed comparison of model performance across tasks, see \u003ca href=\"http://www.CLUEbenchmarks.com\"\u003eCLUE benchmark\u003c/a\u003e.\n\nRun the CLUE Chinese tasks with one command: 6 Chinese classification or sentence-pair tasks (new)\n---------------------------------------------------------------------\n    Usage:\n    1. Clone the project\n       git clone https://github.com/brightmart/albert_zh.git\n    2. Run the one-command script (GPU). It automatically downloads the model and all task data, then starts running.\n       bash run_classifier_clue.sh\n       The script downloads all task data, finds the best model for each task, and then runs the tests to produce the submission results.\n    \n\nDownload Pre-trained Chinese Models\n-----------------------------------------------\n1. \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_tiny.zip\"\u003ealbert_tiny_zh\u003c/a\u003e, \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_tiny_489k.zip\"\u003ealbert_tiny_zh (trained longer, on 2 billion samples in total)\u003c/a\u003e; file size 16M, 4M parameters\n\n    Training and inference are about 10 times faster with accuracy largely retained, and the model is 1/25 the size of BERT. It reaches 85.4% on the test set of the LCQMC semantic-similarity dataset, only 1.5 points below bert_base.\n\n    LCQMC training used the following parameters: --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4   --num_train_epochs=5 \n    \n    albert_tiny is trained on the same large-scale Chinese corpus, but has only 4 layers, and the hidden size and other vector dimensions are greatly reduced. For better results, try these learning rates: {2e-5, 6e-5, 1e-4} \n    \n    [Use cases] Relatively simple tasks or tasks with strict real-time requirements, such as sentence-pair tasks (e.g. semantic similarity) and classification. For harder tasks such as reading comprehension, use one of the larger models.\n\n     For example, you can deploy it on mobile devices with [TensorFlow Lite](https://www.tensorflow.org/lite); this is covered [later](#use_tflite) in this document, including how to convert the model to TensorFlow Lite format and benchmark its performance.\n     \n     Run albert_tiny_zh with one command (Linux, LCQMC task):\n     1) git clone https://github.com/brightmart/albert_zh\n     2) cd albert_zh\n     3) bash run_classifier_lcqmc.sh\n\n1.1. \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_tiny_zh_google.zip\"\u003ealbert_tiny_google_zh (1 billion samples in total, Google version)\u003c/a\u003e; model size 16M, same performance as albert_tiny_zh\n\n1.2. \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_small_zh_google.zip\"\u003ealbert_small_google_zh (1 billion samples in total, Google version)\u003c/a\u003e\n     \n     4 times faster than bert_base; only 0.9 points below BERT on the LCQMC test set; model size 18.5M after removing the Adam optimizer variables. For usage, see \"Fine-tuning on Downstream Task\" below.\n     \n2. \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_large_zh.zip\"\u003ealbert_large_zh\u003c/a\u003e, 24 layers, file size 64M\n   \n    Parameter count and model size are 1/6 of bert_base; 0.2 points above bert_base on the test set of LCQMC, a colloquial sentence-similarity dataset\n\n3. \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_base_zh_additional_36k_steps.zip\"\u003ealbert_base_zh (trained on an additional 150 million instances, i.e. 36k steps * batch_size 4096)\u003c/a\u003e; \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_base_zh.zip\"\u003e albert_base_zh (small trial version)\u003c/a\u003e, 12M parameters, 12 layers, size 40M\n\n    Parameter count and model size are both 1/10 of bert_base; about 0.6 to 1 point below bert_base on the LCQMC test set;\n    14 points higher than the same model without pre-training\n\n4. \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_xlarge_zh_177k.zip\"\u003ealbert_xlarge_zh_177k \u003c/a\u003e; \n\u003ca href=\"https://storage.googleapis.com/albert_zh/albert_xlarge_zh_183k.zip\"\u003ealbert_xlarge_zh_183k (try this one first)\u003c/a\u003e, 24 layers, file size 230M\n   \n    Parameter count and model size are 1/2 of bert_base; requires a large GPU; a full comparison will be added later; batch_size should not be too small, otherwise accuracy may suffer\n\n### Quick Loading\nWith [Huggingface-Transformers 2.2.2](https://github.com/huggingface/transformers), the models above can be loaded easily.\n```python\nfrom transformers import AutoModel, AutoTokenizer\n\n# Replace MODEL_NAME with a model id from the table below\ntokenizer = AutoTokenizer.from_pretrained(\"MODEL_NAME\")\nmodel = AutoModel.from_pretrained(\"MODEL_NAME\")\n```\n\n`MODEL_NAME` maps to the models as follows:\n\n| Model | MODEL_NAME |\n| - | - |\n| albert_tiny_google_zh | voidful/albert_chinese_tiny |\n| albert_small_google_zh | voidful/albert_chinese_small  |\n| albert_base_zh (from google) | voidful/albert_chinese_base   |\n| albert_large_zh (from google) | voidful/albert_chinese_large   |\n| 
albert_xlarge_zh (from google) | voidful/albert_chinese_xlarge   |\n| albert_xxlarge_zh (from google) | voidful/albert_chinese_xxlarge   |\n\nMore \u003ca href='https://huggingface.co/models?search=albert_chinese'\u003eexamples\u003c/a\u003e of using ALBERT through transformers\n\nPre-training\n-----------------------------------------------\n\n#### Generate tfrecords Files\n\nRun the following command. The project ships with a sample text file (data/news_zh_1.txt).\n   \n       bash create_pretrain_data.sh\n   \nIf you have many text files, you can generate multiple tfrecords files by passing arguments.\n\n###### Support English and Other Non-Chinese Languages: \n    If you are pre-training for English or another non-Chinese language, \n    set the hyperparameter non_chinese to True in create_pretraining_data.py; \n    otherwise, it does Chinese pre-training with Chinese whole word masking by default.\n\n#### Pre-training on GPU/TPU using the command\n    GPU (brightmart version, tiny model):\n    export BERT_BASE_DIR=./albert_tiny_zh\n    nohup python3 run_pretraining.py --input_file=./data/tf*.tfrecord  \\\n    --output_dir=./my_new_model_path --do_train=True --do_eval=True --bert_config_file=$BERT_BASE_DIR/albert_config_tiny.json \\\n    --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=51 \\\n    --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176    \\\n    --save_checkpoints_steps=2000  --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt \u0026\n    \n    GPU (Google version, small model):\n    export BERT_BASE_DIR=./albert_small_zh_google\n    nohup python3 run_pretraining_google.py --input_file=./data/tf*.tfrecord --eval_batch_size=64 \\\n    --output_dir=./my_new_model_path --do_train=True --do_eval=True --albert_config_file=$BERT_BASE_DIR/albert_config_small_google.json  --export_dir=./my_new_model_path_export \\\n    --train_batch_size=4096 --max_seq_length=512 --max_predictions_per_seq=20 \\\n    --num_train_steps=125000 --num_warmup_steps=12500 --learning_rate=0.00176   \\\n    --save_checkpoints_steps=2000 --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt\n    \n    TPU, add something like this:\n        --use_tpu=True  --tpu_name=grpc://10.240.1.66:8470 --tpu_zone=us-central1-a\n        \n    Note: if you train from scratch, you can omit init_checkpoint;\n    if you continue training from an existing model, set BERT_BASE_DIR and make sure the config file (bert_config_file or albert_config_file) and init_checkpoint point to the corresponding files;\n    for domain-specific pre-training, depending on the data size, you may not need to train very long.\n\nEnvironment\n-----------------------------------------------\nUse Python 3 + TensorFlow 1.x \n\ne.g. TensorFlow 1.4 or 1.5\n\n\nFine-tuning on Downstream Task\n-----------------------------------------------\n##### Using TensorFlow:\n\nAs an example, we fine-tune albert_base on LCQMC, a colloquial (spoken-style) Chinese corpus used to train and predict the semantic similarity of sentence pairs.\n\nDownload the \u003ca href=\"https://drive.google.com/open?id=1HXYMqsXjmA5uIfu_SFqP7r_vZZG-m_H0\"\u003eLCQMC\u003c/a\u003e dataset, which includes training, validation and test sets. The training set contains 240k colloquial Chinese sentence pairs labeled 1 or 0: 1 means the two sentences are semantically similar, 0 means they are not.\n\nFine-tune on LCQMC with the following commands:\n    \n    1. Clone this project:\n          \n          git clone https://github.com/brightmart/albert_zh.git\n          \n    2. Fine-tune by running the following commands.\n        brightmart version, tiny model:\n        export BERT_BASE_DIR=./albert_tiny_zh\n        export TEXT_DIR=./lcqmc\n        nohup python3 run_classifier.py   --task_name=lcqmc_pair   --do_train=true   --do_eval=true   --data_dir=$TEXT_DIR   --vocab_file=./albert_config/vocab.txt  \\\n        --bert_config_file=./albert_config/albert_config_tiny.json --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4  --num_train_epochs=5 \\\n        --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt \u0026\n        \n        Google version, small model:\n        export BERT_BASE_DIR=./albert_small_zh\n        export TEXT_DIR=./lcqmc\n        nohup python3 run_classifier_sp_google.py --task_name=lcqmc_pair   --do_train=true   --do_eval=true   --data_dir=$TEXT_DIR   --vocab_file=./albert_config/vocab.txt  \\\n        --albert_config_file=$BERT_BASE_DIR/albert_config_small_google.json --max_seq_length=128 --train_batch_size=64   --learning_rate=1e-4   --num_train_epochs=5 \\\n        --output_dir=./albert_lcqmc_checkpoints --init_checkpoint=$BERT_BASE_DIR/albert_model.ckpt \u0026\n\n    Notes:\n        1) You need to download a pre-trained Chinese ALBERT model and put it in the current project directory (assumed here to be named albert_tiny_zh). You also need to download the LCQMC dataset and put it in the current project (the dataset directory is assumed to be named lcqmc).\n\n        2) For fine-tuning, you can try adding a small amount of dropout (e.g. 0.1) by changing attention_probs_dropout_prob and hidden_dropout_prob in albert_config_xxx.json. By default, dropout is set to zero.
\n        \n        3) You can try different learning rates {2e-5, 6e-5, 1e-4} for better performance.\n\n\nUpdates\n-----------------------------------------------\n**\\*\\*\\*\\*\\* 2019-11-03: add Google versions of albert_small and albert_tiny \\*\\*\\*\\*\\***\n\nAdd a method to deploy albert_tiny to mobile devices with only 0.1 second inference time for sequence length 128 and 60M memory\n\n**\\*\\*\\*\\*\\* 2019-10-30: add a simple guide to converting the model to TensorFlow Lite for edge deployment \\*\\*\\*\\*\\***\n\n**\\*\\*\\*\\*\\* 2019-10-15: albert_tiny_zh, 10 times faster than bert_base for training and inference, with accuracy retained \\*\\*\\*\\*\\***\n\n**\\*\\*\\*\\*\\* 2019-10-07: more ALBERT models \\*\\*\\*\\*\\***\n\nAdd albert_xlarge_zh and albert_base_zh_additional_steps (trained on more instances)\n\n**\\*\\*\\*\\*\\* 2019-10-04: PyTorch and Keras versions of ALBERT are supported \\*\\*\\*\\*\\***\n\na. Convert to the PyTorch version and run your tasks with \u003ca href=\"https://github.com/lonePatient/albert_pytorch\"\u003ealbert_pytorch\u003c/a\u003e\n\nb. Load the pre-trained model in Keras with one line of code using \u003ca href=\"https://github.com/bojone/bert4keras\"\u003ebert4keras\u003c/a\u003e\n\nc. Use ALBERT with TensorFlow 2.0: use or load the pre-trained model in TF 2.0 with \u003ca href=\"https://github.com/kpe/bert-for-tf2\"\u003ebert-for-tf2\u003c/a\u003e\n\nReleasing albert_xlarge on 6th Oct\n\n**\\*\\*\\*\\*\\* 2019-10-02: albert_large_zh, albert_base_zh \\*\\*\\*\\*\\***\n\nReleased albert_base_zh with only 10% of the parameters of bert_base: a small model (40M) that trains very fast.
\n\nReleased albert_large_zh with only 16% of the parameters of bert_base (64M)\n\n**\\*\\*\\*\\*\\* 2019-09-28: code and test functions \\*\\*\\*\\*\\*** \n\nAdd code and test functions for the three main changes ALBERT makes to BERT\n\nIntroduction of ALBERT\n-----------------------------------------------\nALBERT is an improved version of BERT. Unlike other recent state-of-the-art models, it is a small pre-trained model that performs better with fewer parameters.\n\nThree main changes of ALBERT from BERT:\n\n1) Factorized embedding parameterization\n   \n     O(V * H) to O(V * E + E * H)\n     \n     Taking ALBERT-xxlarge as an example, V=30000, H=4096, E=128\n       \n     The original embedding has V * H = 30000 * 4096 = 122.88M (about 123 million) parameters; the factorized one has V * E + E * H = 30000*128 + 128*4096 = 3.84M + 0.52M = 4.36M.\n       \n     So the embedding-related parameters before factorization are about 28 times as many as after.\n\n\n2) Cross-Layer Parameter Sharing\n\n     Parameter sharing significantly reduces the number of parameters. Sharing can be applied to the feed-forward (fully connected) layers or to the attention layers; sharing the attention-layer parameters degrades performance less.\n\n3) Inter-sentence coherence loss.\n     \n     A sentence-order task is used. Positive examples are two consecutive text segments from the same document; negative examples are the same two consecutive segments with their order swapped.\n     \n     The original NSP task is dropped, because it implicitly includes topic prediction, which is too easy a task.\n\n      We maintain that inter-sentence modeling is an important aspect of language understanding, but we propose a loss \n      based primarily on coherence. That is, for ALBERT, we use a sentence-order prediction (SOP) loss, which avoids topic \n      prediction and instead focuses on modeling inter-sentence coherence. The SOP loss uses as positive examples the \n      same technique as BERT (two consecutive segments from the same document), and as negative examples the same two \n      consecutive segments but with their order swapped. This forces the model to learn finer-grained distinctions about\n      discourse-level coherence properties. \n\nOther changes:\n\n    1) Remove dropout to enlarge the capacity of the model.\n        (Dropout can be seen as randomly removing part of the network, which also makes the network effectively smaller.)\n        We also note that, even after training for 1M steps, our largest models still do not overfit to their training data.
\n        As a result, we decide to remove dropout to further increase our model capacity.\n        For the other model sizes, our implementation still keeps the original dropout rate to prevent overfitting the training data.\n        \n    2) Use LAMB as the optimizer, to speed up training with a large batch size\n      Training uses a large batch_size (4096). The LAMB optimizer makes it possible to train with very large batch sizes, up to 60,000.\n    \n    3) Use n-grams (uni-gram, bi-gram, tri-gram) for masked language modeling\n       n-grams are sampled with different probabilities: uni-grams are the most likely, bi-grams next, and tri-grams the least likely.\n       This project currently uses whole word masking for Chinese; a comparison with n-gram masking will be added later. n-gram masking comes from SpanBERT.\n\n\nTraining Data \u0026 Configuration\n-----------------------------------------------\n30 GB of Chinese text, more than 10 billion Chinese characters, drawn from several encyclopedias, news sources and interactive communities.\n\nThe pre-training sequence_length is 512 and the batch_size is 4096; data generation produced 350 million training instances. Each model is trained for 125k steps by default; albert_xxlarge will be trained longer.\n\nFor comparison, roberta_zh pre-training produced 250 million training instances with a sequence length of 256. Because albert_zh generates more pre-training data and uses longer sequences, we expect albert_zh to perform better than roberta_zh and to handle longer texts better.\n\nTraining uses a TPU v3 Pod; we use a v3-256, which contains 32 v3-8 units. Each v3-8 has 128 GB of memory.\n\n\nPerformance and Comparison (English)\n-----------------------------------------------    \n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/state_of_the_art.jpg\"  width=\"80%\" height=\"40%\" /\u003e\n  \n   \n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_performance.jpg\"  width=\"80%\" height=\"40%\" /\u003e\n\n\n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/add_data_removing_dropout.jpg\"  width=\"80%\" height=\"40%\" /\u003e\n\n\nPerformance on Chinese Datasets\n----------------------------------------------- \n\n### Question Matching: LCQMC (Sentence Pair Matching)\n\n| Model | Dev | Test |\n| :------- | :---------: | :---------: |\n| BERT | 89.4(88.4) | 86.9(86.4) | \n| ERNIE | 89.8 (89.6) | 87.2 (87.0) | \n| BERT-wwm |89.4 (89.2) | 87.0 (86.8) | \n| BERT-wwm-ext | - |-  |\n| RoBERTa-zh-base | 88.7 | 87.0  |\n| RoBERTa-zh-Large | ***89.9(89.6)*** | 87.2(86.7) |\n| 
RoBERTa-zh-Large (200k steps) | 89.7| 87.0 |\n| ALBERT-zh-tiny | -- | 85.4 |\n| ALBERT-zh-small | -- | 86.0 |\n| ALBERT-zh-small (PyTorch) | -- | 86.8 |\n| ALBERT-zh-base-additional-36k-steps | 87.8 | 86.3 |\n| ALBERT-zh-base | 87.2 | 86.3 |\n| ALBERT-large | 88.7 | 87.1 |\n| ALBERT-xlarge | 87.3 | ***87.7*** |\n\nNote: ALBERT-xlarge was run only once; its results may still improve.\n\n### Natural Language Inference: XNLI (Chinese Version)\n\n| Model | Dev | Test |\n| :------- | :---------: | :---------: |\n| BERT | 77.8 (77.4) | 77.8 (77.5) | \n| ERNIE | 79.7 (79.4) | 78.6 (78.2) | \n| BERT-wwm | 79.0 (78.4) | 78.2 (78.0) | \n| BERT-wwm-ext | 79.4 (78.6) | 78.7 (78.3) |\n| XLNet | 79.2  | 78.7 |\n| RoBERTa-zh-base | 79.8 |78.8  |\n| RoBERTa-zh-Large | 80.2 (80.0) | 79.9 (79.5) |\n| ALBERT-base | 77.0 | 77.1 |\n| ALBERT-large | 78.0 | 77.5 |\n| ALBERT-xlarge | ? | ? |\n\nNote: BERT-wwm-ext results are from \u003ca href=\"https://github.com/ymcui/Chinese-BERT-wwm\"\u003ehere\u003c/a\u003e; XLNet results are from \u003ca href=\"https://github.com/ymcui/Chinese-PreTrained-XLNet\"\u003ehere\u003c/a\u003e; RoBERTa-zh-base refers to the 12-layer Chinese RoBERTa model.\n   \n\n### Reading Comprehension: CMRC2018\n\n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/crmc2018_compare_s.jpg\"  width=\"90%\" height=\"70%\" /\u003e\n\n\n### Masked Language Model Accuracy, Sentence-Order Prediction Accuracy \u0026 Training Time\n\n| Model | MLM eval acc | SOP eval acc | Training (hours) | Eval loss |\n| :------- | :---------: | :---------: | :---------: |:---------: |\n| albert_zh_base | 79.1% | 99.0% | 6h | 1.01|\n| albert_zh_large | 80.9% | 98.6% | 22.5h | 0.93|\n| albert_zh_xlarge | ? | ? | 53h (estimated) | ? |\n| albert_zh_xxlarge | ? | ? | 106h (estimated) | ? |\n\nNote: the ? entries will be filled in soon.\n\nConfiguration of Models\n-----------------------------------------------\n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_configuration.jpg\"  width=\"80%\" height=\"40%\" /\u003e\n\nImplementation and Code Testing\n-----------------------------------------------\nRun the following command to test the main changes, including (but not limited to) factorized embedding parameterization, cross-layer parameter sharing and the inter-sentence coherence (sentence-order) task.\n\n    python test_changes.py\n\n##### \u003ca name=\"use_tflite\"\u003e\u003c/a\u003eDeploying on Mobile Devices with TensorFlow Lite (TFLite):\nThis section covers converting the model to the TFLite format and benchmarking it. For how to use the converted model on mobile devices, see the [complete Android/iOS example app tutorials](https://www.tensorflow.org/lite/examples) provided by TFLite. That page currently includes two Android examples: [text classification](https://github.com/tensorflow/examples/blob/master/lite/examples/text_classification/android) and [question answering](https://github.com/tensorflow/examples/blob/master/lite/examples/bert_qa/android).\n\nThe steps below use \u003ca href=\"https://storage.googleapis.com/albert_zh/albert_tiny.zip\"\u003ealbert_tiny_zh\u003c/a\u003e as an example:\n\n1. Freeze graph from the checkpoint\n\nMake sure TensorFlow 1.x (version 1.14 or later) is installed, since the freeze_graph tool was removed from the 2.x distribution.\n\n    pip install tensorflow==1.15\n\n    freeze_graph --input_checkpoint=./albert_model.ckpt \\\n      --output_graph=/tmp/albert_tiny_zh.pb \\\n      --output_node_names=cls/predictions/truediv \\\n      --checkpoint_version=1 --input_meta_graph=./albert_model.ckpt.meta --input_binary=true\n\n2. Convert to TFLite format\n\nWe use the new experimental TF-to-TFLite converter distributed with the TensorFlow nightly build.\n\n    pip install tf-nightly\n\n    tflite_convert --graph_def_file=/tmp/albert_tiny_zh.pb \\\n      --input_arrays='input_ids,input_mask,segment_ids,masked_lm_positions,masked_lm_ids,masked_lm_weights' \\\n      --output_arrays='cls/predictions/truediv' \\\n      --input_shapes=1,128:1,128:128:1,128:1,128:1,128 \\\n      --output_file=/tmp/albert_tiny_zh.tflite \\\n      --enable_v1_converter --experimental_new_converter\n\n3. Benchmark the performance of the TFLite model\n\nSee [here](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/tools/benchmark) for details about the TFLite performance benchmark tools. For example, after building the benchmark tool binary for an Android phone, run the following to get an idea of how the TFLite model performs on the phone:\n\n    adb push /tmp/albert_tiny_zh.tflite /data/local/tmp/\n    adb shell /data/local/tmp/benchmark_model_performance_options --graph=/data/local/tmp/albert_tiny_zh.tflite --perf_options_list=cpu\n\nOn an Android phone with a Qualcomm Snapdragon 845 SoC, using the benchmark tool above (as of 2019/11/01), the inference latency of this converted TFLite model is about 120 ms with 4 CPU threads, and the model uses about 60 MB of memory during inference. Note that performance will improve further with future TFLite implementation optimizations.\n\n##### Using the PyTorch version:\n\nDownload a pre-trained model and convert it to PyTorch using \u003ca href=\"https://github.com/lonePatient/albert_pytorch\"\u003ealbert_pytorch\u003c/a\u003e:\n\n    python convert_albert_tf_checkpoint_to_pytorch.py\n   \n##### Loading with Keras:\n\n\u003ca href=\"https://github.com/bojone/bert4keras\"\u003ebert4keras\u003c/a\u003e supports ALBERT and can load the albert_zh weights; just add albert=True to the load_pretrained_model function call.\n\n##### Loading with TF 2.0:\n\n\u003ca href=\"https://github.com/kpe/bert-for-tf2\"\u003ebert-for-tf2\u003c/a\u003e\n\n\nUse Case: Text Similarity Based on User Input\n-------------------------------------------------\n\nDescription: this example shows how to load a trained model and judge the similarity of short texts entered by the user. The code can easily be extended into a backend service or adapted to other tasks such as text classification.\n\nFiles involved: similarity.py, args.py\n\nSteps:\n\n1. Train a text-similarity model with this project and save the model files to the corresponding directory.\n\n2. Adjust the parameters in args.py to your setup. The parameters are described below:\n\n```python\nimport os\n\n# Project root; assumed here to be the directory containing args.py\nfile_path = os.path.dirname(os.path.abspath(__file__))\n\n# Model directory containing the ckpt files\nmodel_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')\n\n# Model config (JSON) file\nconfig_name = os.path.join(file_path, 'albert_config/albert_config_tiny.json')\n\n# ckpt file name\nckpt_name = os.path.join(model_dir, 'model.ckpt')\n\n# Output directory for the model during training\noutput_dir = os.path.join(file_path, 'albert_lcqmc_checkpoints/')\n\n# Vocabulary file\nvocab_file = os.path.join(file_path, 'albert_config/vocab.txt')\n\n# Data directory containing the training dataset\ndata_dir = os.path.join(file_path, 'data/')\n```\n\nThe file structure in this example is:\n\n    |__args.py\n    \n    |__similarity.py\n    \n    |__data\n    \n    |__albert_config\n    \n    |__albert_lcqmc_checkpoints\n    \n    |__lcqmc\n\n3. Change the user input sentences\n\nOpen similarity.py; at the very bottom is the following code:\n\n```python\nif __name__ == '__main__':\n    sim = BertSim()  # BertSim is defined earlier in similarity.py\n    sim.start_model()  # load the model\n    # Each tuple holds two Chinese sentences to compare:\n    # (\"I like the soup Mom makes\", \"I really like drinking the soup Mom makes\")\n    sim.predict_sentences([(\"我喜欢妈妈做的汤\", \"妈妈做的汤我很喜欢喝\")])\n```\n\nHere sim.start_model() loads the model, and sim.predict_sentences takes a list of tuples, each containing the two sentences whose similarity should be judged.\n\n4. Run the Python file: similarity.py\n\n\nTrade-off between Batch Size and Sequence Length (12 GB GPU Memory)\n-------------------------------------------------\n\nSystem       | Seq Length | Max Batch Size\n------------ | ---------- | --------------\n`albert-base`  | 64         | 64\n...          | 128        | 32\n...          | 256        | 16\n...          | 320        | 14\n...          | 384        | 12\n...          | 512        | 6\n`albert-large` | 64         | 12\n...          | 128        | 6\n...          | 256        | 2\n...          | 320        | 1\n...          | 384        | 0\n...          | 512        | 0\n`albert-xlarge` | -         | -\n\nTraining Loss of albert_zh xlarge\n-------------------------------------------------\n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/xlarge_loss.jpg\"  width=\"80%\" height=\"40%\" /\u003e\n\nParameters of albert_xlarge\n-------------------------------------------------\n\u003cimg src=\"https://github.com/brightmart/albert_zh/blob/master/resources/albert_large_zh_parameters.jpg\"  width=\"80%\" height=\"40%\" /\u003e\n\n\n#### Join us on QQ group 836811304 for technical discussion and questions\n\nIf you have any questions, you can open an issue or send me an email: brightmart@hotmail.com\n\nCurrently it is not yet clear how to use the PyTorch version of ALBERT; if you know how, just email us or open an issue.\n\nYou can also send a pull request to report your performance on your task, or to add methods for loading the models in PyTorch and so on.\n\nIf you have ideas for pre-training a better-performing Chinese model, please let me know.\n\n##### Research supported with Cloud TPUs from Google's TensorFlow Research Cloud (TFRC)\n\nCite Us\n-----------------------------------------------\nBright Liang Xu, albert_zh, (2019), GitHub repository, https://github.com/brightmart/albert_zh\n\nReference\n-----------------------------------------------\n1. \u003ca href=\"https://arxiv.org/pdf/1909.11942.pdf\"\u003eALBERT: A Lite BERT For Self-Supervised Learning Of Language Representations\u003c/a\u003e\n\n2. \u003ca href=\"https://arxiv.org/pdf/1810.04805.pdf\"\u003eBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding\u003c/a\u003e\n\n3. \u003ca href=\"https://arxiv.org/abs/1907.10529\"\u003eSpanBERT: Improving Pre-training by Representing and Predicting Spans\u003c/a\u003e\n\n4. \u003ca href=\"https://arxiv.org/pdf/1907.11692.pdf\"\u003eRoBERTa: A Robustly Optimized BERT Pretraining Approach\u003c/a\u003e\n\n5. \u003ca href=\"https://arxiv.org/pdf/1904.00962.pdf\"\u003eLarge Batch Optimization for Deep Learning: Training BERT in 76 minutes (LAMB)\u003c/a\u003e\n\n6. \u003ca href=\"https://github.com/ymcui/LAMB_Optimizer_TF\"\u003eLAMB Optimizer, TensorFlow version\u003c/a\u003e\n\n7. \u003ca href=\"http://baijiahao.baidu.com/s?id=1645712785366950083\u0026wfr=spider\u0026for=pc\"\u003eSmall pre-trained models can also win 13 NLP tasks: ALBERT's three main changes top the GLUE benchmark\u003c/a\u003e\n\n8. \u003ca href=\"https://github.com/lonePatient/albert_pytorch\"\u003ealbert_pytorch\u003c/a\u003e\n\n9. \u003ca href=\"https://github.com/bojone/bert4keras\"\u003eload albert with keras\u003c/a\u003e\n\n10. \u003ca href=\"https://github.com/kpe/bert-for-tf2\"\u003eload albert with tf2.0\u003c/a\u003e\n\n11. \u003ca href=\"https://github.com/google-research/google-research/tree/master/albert\"\u003erepo of albert from google\u003c/a\u003e\n\n12. \u003ca href=\"https://github.com/chineseGLUE/chineseGLUE\"\u003echineseGLUE: Chinese task benchmark with multiple public tasks, baseline models, broad evaluation and performance comparison\u003c/a\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrightmart%2Falbert_zh","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbrightmart%2Falbert_zh","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbrightmart%2Falbert_zh/lists"}