{"id":13488400,"url":"https://github.com/ymcui/Chinese-XLNet","last_synced_at":"2025-03-28T00:33:52.518Z","repository":{"id":37428582,"uuid":"196937919","full_name":"ymcui/Chinese-XLNet","owner":"ymcui","description":"Pre-Trained Chinese XLNet（中文XLNet预训练模型）","archived":false,"fork":false,"pushed_at":"2023-03-29T02:35:33.000Z","size":369,"stargazers_count":1648,"open_issues_count":2,"forks_count":281,"subscribers_count":31,"default_branch":"master","last_synced_at":"2025-03-20T04:45:21.433Z","etag":null,"topics":["natural-language-processing","nlp","pytorch","tensorflow","xlnet"],"latest_commit_sha":null,"homepage":"http://xlnet.hfl-rc.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ymcui.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-07-15T06:30:37.000Z","updated_at":"2025-03-14T08:24:35.000Z","dependencies_parsed_at":"2024-01-14T03:02:01.847Z","dependency_job_id":null,"html_url":"https://github.com/ymcui/Chinese-XLNet","commit_stats":null,"previous_names":["ymcui/chinese-pretrained-xlnet"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ymcui%2FChinese-XLNet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ymcui%2FChinese-XLNet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ymcui%2FChinese-XLNet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ymcui%2FChinese-XLNet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ymcui","download_url":"https://codeload.github.com/ymcui/Chinese-XLNet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245949289,"owners_count":20698913,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["natural-language-processing","nlp","pytorch","tensorflow","xlnet"],"created_at":"2024-07-31T18:01:15.093Z","updated_at":"2025-03-28T00:33:47.492Z","avatar_url":"https://github.com/ymcui.png","language":"Python","readme":"[**中文说明**](./README.md) | [**English**](./README_EN.md)\n\n\u003cp align=\"center\"\u003e\n    \u003cbr\u003e\n    \u003cimg src=\"./pics/banner.png\" width=\"500\"/\u003e\n    \u003cbr\u003e\n\u003c/p\u003e\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/ymcui/Chinese-PreTrained-XLNet/blob/master/LICENSE\"\u003e\n        \u003cimg alt=\"GitHub\" src=\"https://img.shields.io/github/license/ymcui/Chinese-PreTrained-XLNet.svg?color=blue\u0026style=flat-square\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nThis repository provides XLNet models pre-trained for Chinese. It aims to enrich resources for Chinese natural language processing and to widen the choice of Chinese pre-trained models.\nWe welcome experts and researchers to download and use the models and to help advance the development of Chinese language resources.\n\nThis project is based on the official XLNet from CMU/Google: https://github.com/zihangdai/xlnet\n\n----\n\n[Chinese LERT](https://github.com/ymcui/LERT) | [Chinese-English PERT](https://github.com/ymcui/PERT) | [Chinese MacBERT](https://github.com/ymcui/MacBERT) | [Chinese ELECTRA](https://github.com/ymcui/Chinese-ELECTRA) | [Chinese XLNet](https://github.com/ymcui/Chinese-XLNet) | [Chinese BERT](https://github.com/ymcui/Chinese-BERT-wwm) | [Knowledge distillation toolkit TextBrewer](https://github.com/airaria/TextBrewer) | [Model pruning toolkit TextPruner](https://github.com/airaria/TextPruner)\n\nMore resources released by the Joint Laboratory of HIT and iFLYTEK Research (HFL): https://github.com/ymcui/HFL-Anthology\n\n## News\n**2023/3/28 We open-sourced the Chinese LLaMA \u0026 Alpaca large language models, which can be quickly deployed and tried out on a PC. See: https://github.com/ymcui/Chinese-LLaMA-Alpaca**\n\n2022/10/29 We proposed LERT, a pre-trained model that incorporates linguistic information. See: https://github.com/ymcui/LERT\n\n2022/3/30 We open-sourced PERT, a new pre-trained model. See: https://github.com/ymcui/PERT\n\n2021/12/17 HFL released TextPruner, a model pruning toolkit. See: https://github.com/airaria/TextPruner\n\n2021/10/24 HFL released CINO, a pre-trained model for Chinese minority languages. See: https://github.com/ymcui/Chinese-Minority-PLM\n\n2021/7/21 The book [《自然语言处理：基于预训练模型的方法》](https://item.jd.com/13344628.html) (*Natural Language Processing: Methods Based on Pre-trained Models*), written by several researchers from HIT-SCIR, has been published and is now available for purchase.\n\n2021/1/27 All models now support TensorFlow 2. Please load or download them through the transformers library: https://huggingface.co/hfl\n\n\u003cdetails\u003e\n\u003csummary\u003ePast news\u003c/summary\u003e\n2020/9/15 Our paper [\"Revisiting Pre-Trained Models for Chinese Natural Language Processing\"](https://arxiv.org/abs/2004.13922) was accepted as a long paper to [Findings of EMNLP](https://2020.emnlp.org).\n\n2020/8/27 HFL topped the GLUE general natural language understanding benchmark. See the [GLUE leaderboard](https://gluebenchmark.com/leaderboard) and the [news coverage](http://dwz.date/ckrD).\n\n2020/3/11 To better understand your needs, we invite you to fill out this [questionnaire](https://wj.qq.com/s2/5637766/6281) so that we can provide better resources.\n\n2020/2/26 HFL released the knowledge distillation toolkit [TextBrewer](https://github.com/airaria/TextBrewer)\n\n2019/12/19 The models in this repository are now available in [Huggingface-Transformers](https://github.com/huggingface/transformers). See [Quick Load](#quick-load)\n\n2019/9/5 `XLNet-base` is now available for download. See [Model Download](#model-download)\n\n2019/8/19 Released the Chinese `XLNet-mid` model, trained on a large-scale general-domain corpus (5.4B words). See [Model Download](#model-download)\n\u003c/details\u003e\n\n## Contents\n| Section | Description |\n|-|-|\n| [Model Download](#model-download) | Download links for the Chinese pre-trained XLNet models |\n| [Baseline Results](#baseline-results) | Results of several baseline systems |\n| [Pre-training Details](#pre-training-details) | Details of the pre-training procedure |\n| [Fine-tuning Details](#fine-tuning-details) | Details of fine-tuning on downstream tasks |\n| [FAQ](#faq) | Frequently asked questions |\n| [Citation](#citation) | Technical report for this repository |\n\n## Model Download\n* **`XLNet-mid`**: 24-layer, 768-hidden, 12-heads, 209M parameters\n* **`XLNet-base`**: 12-layer, 768-hidden, 12-heads, 117M parameters  \n\n| Model | Corpus | Google Drive | Baidu Netdisk |\n| :------- | :--------- | :---------: | :---------: |\n| **`XLNet-mid, Chinese`** | **Chinese Wikipedia +\u003cbr/\u003egeneral data\u003csup\u003e[1]\u003c/sup\u003e** | **[TensorFlow](https://drive.google.com/open?id=1342uBc7ZmQwV6Hm6eUIN_OnBSz1LcvfA)** \u003cbr/\u003e**[PyTorch](https://drive.google.com/open?id=1u-UmsJGy5wkXgbNK4w9uRnC0RxHLXhxy)** | **[TensorFlow (password: 2jv2)](https://pan.baidu.com/s/1bWEhc5gJ-ZMH6SO4m4GVyw?pwd=2jv2)** |\n| **`XLNet-base, Chinese`** | **Chinese Wikipedia +\u003cbr/\u003egeneral data\u003csup\u003e[1]\u003c/sup\u003e** | **[TensorFlow](https://drive.google.com/open?id=1m9t-a4gKimbkP5rqGXXsEAEPhJSZ8tvx)** \u003cbr/\u003e**[PyTorch](https://drive.google.com/open?id=1mPDgcMfpqAf2wk9Nl8OaMj654pYrWXaR)** | **[TensorFlow (password: ge7w)](https://pan.baidu.com/s/14KNb5KMvixKACEzgdd4Ntg?pwd=ge7w)** |\n\n\u003e [1] The general data include encyclopedia, news, and question-answering data, totaling 5.4B words. This is the same training corpus as our [BERT-wwm-ext](https://github.com/ymcui/Chinese-BERT-wwm).\n\n### PyTorch Version\n\nIf you need the PyTorch version, you can either:\n\n1) convert the weights yourself with the conversion scripts provided by [🤗Transformers](https://github.com/huggingface/transformers), or\n\n2) download the PyTorch weights directly from the Hugging Face website: https://huggingface.co/hfl\n\nTo download: click the model you need → scroll to the bottom and click \"List all files in model\" → download the bin and json files from the pop-up box.\n\n### Usage Notes\n\nUsers in mainland China are advised to use the Baidu Netdisk links, and users elsewhere the Google Drive links. The `XLNet-mid` model file is about **800MB**. Taking the TensorFlow version of `XLNet-mid, Chinese` as an example, unzipping the downloaded file yields:\n\n```\nchinese_xlnet_mid_L-24_H-768_A-12.zip\n    |- xlnet_model.ckpt      # model weights\n    |- xlnet_model.meta      # model meta information\n    |- xlnet_model.index     # model index information\n    |- xlnet_config.json     # model configuration\n    |- spiece.model          # vocabulary\n```\n\n### Quick Load\nWith [Huggingface-Transformers 2.2.2](https://github.com/huggingface/transformers), the models above can be loaded easily:\n```python\nfrom transformers import AutoTokenizer, AutoModel\n\ntokenizer = AutoTokenizer.from_pretrained(\"MODEL_NAME\")\nmodel = AutoModel.from_pretrained(\"MODEL_NAME\")\n```\nwhere `MODEL_NAME` is one of the following:  \n\n| Model | MODEL_NAME |\n| - | - |\n| XLNet-mid | hfl/chinese-xlnet-mid |\n| XLNet-base | hfl/chinese-xlnet-base |\n\n\n## Baseline Results\nTo compare baseline performance, we evaluated Chinese BERT, BERT-wwm, BERT-wwm-ext, XLNet-base, and XLNet-mid on the following Chinese datasets.\nThe results for Chinese BERT, BERT-wwm, and BERT-wwm-ext are taken from the [Chinese BERT-wwm project](https://github.com/ymcui/Chinese-BERT-wwm).\nBecause our time and resources were limited, we could not cover more types of tasks. You are welcome to try them yourself.\n\n**Note: To ensure reliable results, we ran each model 10 times (with different random seeds) and report both the maximum and the average performance. Your own results should most likely fall within this range.**\n\n**In the tables below, values in parentheses are averages and values outside parentheses are maxima.**\n\n### Simplified Chinese Reading Comprehension: CMRC 2018\nThe **[CMRC 2018 dataset](https://github.com/ymcui/cmrc2018)** is a Chinese machine reading comprehension dataset released by HFL.\nGiven a question, the system must extract a span from the passage as the answer, in the same format as SQuAD.\nEvaluation metrics: EM / F1\n\n| Model | Dev | Test | Challenge |\n| :------- | :---------: | :---------: | :---------: |\n| BERT | 65.5 (64.4) / 84.5 (84.0) | 70.0 (68.7) / 87.0 (86.3) | 18.6 (17.0) / 43.3 (41.3) |\n| BERT-wwm | 66.3 (65.0) / 85.6 (84.7) | 70.5 (69.1) / 87.4 (86.7) | 21.0 (19.3) / 47.0 (43.9) |\n| BERT-wwm-ext | **67.1** (65.6) / 85.7 (85.0) | **71.4 (70.0)** / 87.7 (87.0) | 24.0 (20.0) / 47.3 (44.6) |\n| **XLNet-base** | 65.2 (63.0) / 86.9 (85.9) | 67.0 (65.8) / 87.2 (86.8) | 25.0 (22.7) / 51.3 (49.5) |\n| **XLNet-mid** | 66.8 **(66.3) / 88.4 (88.1)** | 69.3 (68.5) / **89.2 (88.8)** | **29.1 (27.1) / 55.8 (54.9)** |\n\n\n### Traditional Chinese Reading Comprehension: DRCD\nThe **[DRCD dataset](https://github.com/DRCKnowledgeTeam/DRCD)** was released by Delta Research Center (Taiwan, China). It is an extractive reading comprehension dataset in Traditional Chinese, in the same format as SQuAD.\nEvaluation metrics: EM / F1\n\n| Model | Dev | Test |\n| :------- | :---------: | :---------: |\n| BERT | 83.1 (82.7) / 89.9 (89.6) | 82.2 (81.6) / 89.2 (88.8) |\n| BERT-wwm | 84.3 (83.4) / 90.5 (90.2) | 82.8 (81.8) / 89.7 (89.0) |\n| BERT-wwm-ext | 85.0 (84.5) / 91.2 (90.9) | 83.6 (83.0) / 90.4 (89.9) |\n| **XLNet-base** | 83.8 (83.2) / 92.3 (92.0) | 83.5 (82.8) / 92.2 (91.8) |\n| **XLNet-mid** | **85.3 (84.9) / 93.5 (93.3)** | **85.5 (84.8) / 93.6 (93.2)** |\n\n### Sentiment Classification: ChnSentiCorp\nFor sentiment classification, we use the ChnSentiCorp dataset. The model must classify each text into one of two classes: `positive` or `negative`.\nEvaluation metric: Accuracy\n\n| Model | Dev | Test |\n| :------- | :---------: | :---------: |\n| BERT | 94.7 (94.3) | 95.0 (94.7) |\n| BERT-wwm | 95.1 (94.5) | **95.4 (95.0)** |\n| **XLNet-base** | | |\n| **XLNet-mid** | **95.8 (95.2)** | **95.4** (94.9) |\n\n## Pre-training Details\nThe pre-training details are described below, using the `XLNet-mid` model as an example.\n\n### Generating the Vocabulary\nFollowing the official XLNet tutorial, we first generate the vocabulary with [SentencePiece](https://github.com/google/sentencepiece).\nIn this project we use a vocabulary size of 32000. All other parameters follow the defaults in the official example.\n\n```\nspm_train \\\n\t--input=wiki.zh.txt \\\n\t--model_prefix=sp10m.cased.v3 \\\n\t--vocab_size=32000 \\\n\t--character_coverage=0.99995 \\\n\t--model_type=unigram \\\n\t--control_symbols=\\\u003ccls\\\u003e,\\\u003csep\\\u003e,\\\u003cpad\\\u003e,\\\u003cmask\\\u003e,\\\u003ceod\\\u003e \\\n\t--user_defined_symbols=\\\u003ceop\\\u003e,.,\\(,\\),\\\",-,–,£,€ \\\n\t--shuffle_input_sentence \\\n\t--input_sentence_size=10000000\n```\n\n### Generating tf_records\nAfter generating the vocabulary, we use the raw text corpus to generate the tf_records files for training.\nThe raw text is formatted the same way as in the original tutorial:\n- Each line is one sentence\n- An empty line marks the end of a document\n\nThe command used to generate the data is shown below (set `num_task` and `task` according to the actual number of data shards):\n```\nSAVE_DIR=./output_b32\nINPUT=./data/*.proc.txt\n\npython data_utils.py \\\n\t--bsz_per_host=32 \\\n\t--num_core_per_host=8 \\\n\t--seq_len=512 \\\n\t--reuse_len=256 \\\n\t--input_glob=${INPUT} \\\n\t--save_dir=${SAVE_DIR} \\\n\t--num_passes=20 \\\n\t--bi_data=True \\\n\t--sp_path=spiece.model \\\n\t--mask_alpha=6 \\\n\t--mask_beta=1 \\\n\t--num_predict=85 \\\n\t--uncased=False \\\n\t--num_task=10 \\\n\t--task=1\n```\n\n### Pre-training\nOnce the data above is ready, we start pre-training XLNet.\nThe model is called `XLNet-mid` because, compared with `XLNet-base`, only the number of layers was increased (from 12 to 24). All other hyperparameters were left unchanged, mainly because of limited computing resources.\nThe command used is:\n```\nDATA=YOUR_GS_BUCKET_PATH_TO_TFRECORDS\nMODEL_DIR=YOUR_OUTPUT_MODEL_PATH\nTPU_NAME=v3-xlnet\nTPU_ZONE=us-central1-b\n\npython train.py \\\n\t--record_info_dir=$DATA \\\n\t--model_dir=$MODEL_DIR \\\n\t--train_batch_size=32 \\\n\t--seq_len=512 \\\n\t--reuse_len=256 \\\n\t--mem_len=384 \\\n\t--perm_size=256 \\\n\t--n_layer=24 \\\n\t--d_model=768 \\\n\t--d_embed=768 \\\n\t--n_head=12 \\\n\t--d_head=64 \\\n\t--d_inner=3072 \\\n\t--untie_r=True \\\n\t--mask_alpha=6 \\\n\t--mask_beta=1 \\\n\t--num_predict=85 \\\n\t--uncased=False \\\n\t--train_steps=2000000 \\\n\t--save_steps=20000 \\\n\t--warmup_steps=20000 \\\n\t--max_save=20 \\\n\t--weight_decay=0.01 \\\n\t--adam_epsilon=1e-6 \\\n\t--learning_rate=1e-4 \\\n\t--dropout=0.1 \\\n\t--dropatt=0.1 \\\n\t--tpu=$TPU_NAME \\\n\t--tpu_zone=$TPU_ZONE \\\n\t--use_tpu=True\n```\n\n## Fine-tuning Details\nDownstream fine-tuning was performed on a Google Cloud TPU v2 (64G HBM). The configuration for each task is briefly described below.\nIf you fine-tune on GPUs, adjust the parameters accordingly, especially `batch_size` and `learning_rate`.\n**See the `src` directory for the relevant code.**\n\n### CMRC 2018\nFor reading comprehension tasks, the tf_records data must be generated first.\nPlease refer to the [SQuAD 2.0 procedure](https://github.com/zihangdai/xlnet#squad20) in the official XLNet tutorial. It is not repeated here.\nThe script parameters used for the CMRC 2018 Chinese machine reading comprehension task are:\n```\nXLNET_DIR=YOUR_GS_BUCKET_PATH_TO_XLNET\nMODEL_DIR=YOUR_OUTPUT_MODEL_PATH\nDATA_DIR=YOUR_DATA_DIR_TO_TFRECORDS\nRAW_DIR=YOUR_RAW_DATA_DIR\nTPU_NAME=v2-xlnet\nTPU_ZONE=us-central1-b\n\npython -u run_cmrc_drcd.py \\\n\t--spiece_model_file=./spiece.model \\\n\t--model_config_path=${XLNET_DIR}/xlnet_config.json \\\n\t--init_checkpoint=${XLNET_DIR}/xlnet_model.ckpt \\\n\t--tpu_zone=${TPU_ZONE} \\\n\t--use_tpu=True \\\n\t--tpu=${TPU_NAME} \\\n\t--num_hosts=1 \\\n\t--num_core_per_host=8 \\\n\t--output_dir=${DATA_DIR} \\\n\t--model_dir=${MODEL_DIR} \\\n\t--predict_dir=${MODEL_DIR}/eval \\\n\t--train_file=${DATA_DIR}/cmrc2018_train.json \\\n\t--predict_file=${DATA_DIR}/cmrc2018_dev.json \\\n\t--uncased=False \\\n\t--max_answer_length=40 \\\n\t--max_seq_length=512 \\\n\t--do_train=True \\\n\t--train_batch_size=16 \\\n\t--do_predict=True \\\n\t--predict_batch_size=16 \\\n\t--learning_rate=3e-5 \\\n\t--adam_epsilon=1e-6 \\\n\t--iterations=1000 \\\n\t--save_steps=2000 \\\n\t--train_steps=2400 \\\n\t--warmup_steps=240\n```\n\n### DRCD\nThe script parameters used for the DRCD Traditional Chinese machine reading comprehension task are:\n```\nXLNET_DIR=YOUR_GS_BUCKET_PATH_TO_XLNET\nMODEL_DIR=YOUR_OUTPUT_MODEL_PATH\nDATA_DIR=YOUR_DATA_DIR_TO_TFRECORDS\nRAW_DIR=YOUR_RAW_DATA_DIR\nTPU_NAME=v2-xlnet\nTPU_ZONE=us-central1-b\n\npython -u run_cmrc_drcd.py \\\n\t--spiece_model_file=./spiece.model \\\n\t--model_config_path=${XLNET_DIR}/xlnet_config.json \\\n\t--init_checkpoint=${XLNET_DIR}/xlnet_model.ckpt \\\n\t--tpu_zone=${TPU_ZONE} \\\n\t--use_tpu=True \\\n\t--tpu=${TPU_NAME} \\\n\t--num_hosts=1 \\\n\t--num_core_per_host=8 \\\n\t--output_dir=${DATA_DIR} \\\n\t--model_dir=${MODEL_DIR} \\\n\t--predict_dir=${MODEL_DIR}/eval \\\n\t--train_file=${DATA_DIR}/DRCD_training.json \\\n\t--predict_file=${DATA_DIR}/DRCD_dev.json \\\n\t--uncased=False \\\n\t--max_answer_length=30 \\\n\t--max_seq_length=512 \\\n\t--do_train=True \\\n\t--train_batch_size=16 \\\n\t--do_predict=True \\\n\t--predict_batch_size=16 \\\n\t--learning_rate=3e-5 \\\n\t--adam_epsilon=1e-6 \\\n\t--iterations=1000 \\\n\t--save_steps=2000 \\\n\t--train_steps=3600 \\\n\t--warmup_steps=360\n```\n\n### ChnSentiCorp\nUnlike reading comprehension tasks, classification tasks do not require generating tf_records in advance.\nThe script parameters used for the ChnSentiCorp sentiment classification task are:\n```\nXLNET_DIR=YOUR_GS_BUCKET_PATH_TO_XLNET\nMODEL_DIR=YOUR_OUTPUT_MODEL_PATH\nDATA_DIR=YOUR_DATA_DIR_TO_TFRECORDS\nRAW_DIR=YOUR_RAW_DATA_DIR\nTPU_NAME=v2-xlnet\nTPU_ZONE=us-central1-b\n\npython -u run_classifier.py \\\n\t--spiece_model_file=./spiece.model \\\n\t--model_config_path=${XLNET_DIR}/xlnet_config.json \\\n\t--init_checkpoint=${XLNET_DIR}/xlnet_model.ckpt \\\n\t--task_name=csc \\\n\t--do_train=True \\\n\t--do_eval=True \\\n\t--eval_all_ckpt=False \\\n\t--uncased=False \\\n\t--data_dir=${RAW_DIR} \\\n\t--output_dir=${DATA_DIR} \\\n\t--model_dir=${MODEL_DIR} \\\n\t--train_batch_size=48 \\\n\t--eval_batch_size=48 \\\n\t--num_hosts=1 \\\n\t--num_core_per_host=8 \\\n\t--num_train_epochs=3 \\\n\t--max_seq_length=256 \\\n\t--learning_rate=2e-5 \\\n\t--save_steps=5000 \\\n\t--use_tpu=True \\\n\t--tpu=${TPU_NAME} \\\n\t--tpu_zone=${TPU_ZONE}\n```\n\n## FAQ\n**Q: Will you release larger models?**  \nA: Not necessarily, and we cannot promise. If we obtain a significant performance improvement, we will consider releasing them.\n\n**Q: The model performs poorly on some datasets. What can I do?**  \nA: Try other models, or continue pre-training from this checkpoint on your own data.\n\n**Q: Will the pre-training data be released?**  \nA: Sorry, it cannot be released due to copyright issues.\n\n**Q: How long did it take to train XLNet?**  \nA: `XLNet-mid` was trained for 2M steps (batch size 32) on a Cloud TPU v3 (128G HBM), which took about 3 weeks. `XLNet-base` was trained for 4M steps.\n\n**Q: Why haven't the official XLNet authors released a multilingual or Chinese XLNet?**  \nA: (Personal opinion) We don't know. Many people have commented that they would like one; see [XLNet-issue-#3](https://github.com/zihangdai/xlnet/issues/3).\nGiven the expertise and computing power of the official XLNet team, training such a model would not be difficult. (A multilingual version may be more complicated, because the balance between languages has to be considered; see also the description in [multilingual-bert](https://github.com/google-research/bert/blob/master/multilingual.md).) \n**On the other hand, the authors are under no obligation to do so.** \nAs researchers, their technical contribution is already substantial, and they should not be blamed for not releasing such a model. We ask everyone to treat other people's work reasonably.\n\n**Q: Is XLNet better than BERT in most cases?**  \nA: So far, it performs well on at least the tasks above. It was trained on the same data as our [BERT-wwm-ext](https://github.com/ymcui/Chinese-BERT-wwm).\n\n\n## Citation\nIf the contents of this repository are helpful for your research, please cite the following technical report:\nhttps://arxiv.org/abs/2004.13922\n```\n@inproceedings{cui-etal-2020-revisiting,\n    title = \"Revisiting Pre-Trained Models for {C}hinese Natural Language Processing\",\n    author = \"Cui, Yiming  and\n      Che, Wanxiang  and\n      Liu, Ting  and\n      Qin, Bing  and\n      Wang, Shijin  and\n      Hu, Guoping\",\n    booktitle = \"Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Findings\",\n    month = nov,\n    year = \"2020\",\n    address = \"Online\",\n    publisher = \"Association for Computational Linguistics\",\n    url = \"https://www.aclweb.org/anthology/2020.findings-emnlp.58\",\n    pages = \"657--668\",\n}\n```\n\n\n## Acknowledgements\nProject authors: Yiming Cui (Joint Laboratory of HIT and iFLYTEK Research), Wanxiang Che (HIT), Ting Liu (HIT), Shijin Wang (iFLYTEK), Guoping Hu (iFLYTEK)  \n\nThis project is supported by Google's [TensorFlow Research Cloud (TFRC)](https://www.tensorflow.org/tfrc) program.\n\nWe referred to the following repositories while building this project and thank their authors:\n- XLNet: https://github.com/zihangdai/xlnet\n- Malaya: https://github.com/huseinzol05/Malaya/tree/master/xlnet\n- Korean XLNet (described in Korean, no translation available): https://github.com/yeontaek/XLNET-Korean-Model\n\n\n## Disclaimer\nThis project is not a Chinese XLNet model released by the [official XLNet authors](https://github.com/zihangdai/xlnet).\nThis project is also not an official product of HIT or iFLYTEK.\nThe contents of this project are for technical research reference only and should not be taken as the basis for any conclusions.\nUsers may use the models freely within the scope of the license, but we are not responsible for any direct or indirect losses caused by using the contents of this project.\n\n\n## Follow Us\nYou are welcome to follow the official WeChat account of HFL.\n\n![qrcode.png](https://github.com/ymcui/cmrc2019/raw/master/qrcode.jpg)\n\n\n## Issues \u0026 Contributions\nIf you have any questions, please submit them via GitHub Issues.  \nWe have no dedicated support staff, so we encourage users to help each other solve problems.  \nIf you find an implementation issue or would like to help build this project, please submit a Pull Request.  \n\n","funding_links":[],"categories":["Pretrained Language Model","Python","预训练模型"],"sub_categories":["Repository"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fymcui%2FChinese-XLNet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fymcui%2FChinese-XLNet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fymcui%2FChinese-XLNet/lists"}