{"id":19119437,"url":"https://github.com/cluebenchmark/pyclue","last_synced_at":"2025-07-13T18:39:50.418Z","repository":{"id":39739288,"uuid":"225878517","full_name":"CLUEbenchmark/PyCLUE","owner":"CLUEbenchmark","description":"Python toolkit for Chinese Language Understanding(CLUE) Evaluation benchmark","archived":false,"fork":false,"pushed_at":"2023-05-22T23:20:26.000Z","size":144,"stargazers_count":129,"open_issues_count":7,"forks_count":15,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-15T17:44:43.598Z","etag":null,"topics":["albert","bert","chinese-language","chineseglue","corpus","evaluation-benchmark","language-model","roberta-wwm-ext","tiny","xlnet"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CLUEbenchmark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-12-04T13:51:02.000Z","updated_at":"2025-02-25T01:12:43.000Z","dependencies_parsed_at":"2023-01-23T04:00:54.936Z","dependency_job_id":"6a6e17f3-61e2-4158-b5c8-85af89370dba","html_url":"https://github.com/CLUEbenchmark/PyCLUE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CLUEbenchmark/PyCLUE","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FPyCLUE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FPyCLUE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FPyCLUE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark
%2FPyCLUE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CLUEbenchmark","download_url":"https://codeload.github.com/CLUEbenchmark/PyCLUE/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FPyCLUE/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265188919,"owners_count":23725175,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["albert","bert","chinese-language","chineseglue","corpus","evaluation-benchmark","language-model","roberta-wwm-ext","tiny","xlnet"],"created_at":"2024-11-09T05:09:40.707Z","updated_at":"2025-07-13T18:39:50.403Z","avatar_url":"https://github.com/CLUEbenchmark.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyCLUE\n\nPython toolkit for the Chinese Language Understanding Evaluation (CLUE) benchmark. Use it to quickly evaluate representative datasets and baseline (pre-trained) models, and to choose a suitable baseline (pre-trained) model and apply it quickly to your own data.\n\n## About CLUE\n\nThe [Chinese Language Understanding Evaluation benchmark](https://www.cluebenchmarks.com/) provides representative datasets, baseline (pre-trained) models, a corpus and a leaderboard.\n\nThe benchmark datasets are drawn from a series of representative tasks and cover a range of task types, data sizes and difficulty levels.\n\n## Installing PyCLUE\n\nInstall PyCLUE with pip:\n\n```bash\npip install --upgrade PyCLUE\n```\n\nor install it directly from the GitHub repository:\n\n```bash\npip install git+https://www.github.com/CLUEBenchmark/PyCLUE.git\n```\n\n## Baseline (Pre-trained) Models\n\n**Supported pre-trained language models**\n\n1. [BERT-zh](https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip)\n2. 
[BERT-wwm-ext](https://storage.googleapis.com/chineseglue/pretrain_models/chinese_wwm_ext_L-12_H-768_A-12.zip)\n3. [albert_xlarge_zh_brightmart](https://storage.googleapis.com/albert_zh/albert_xlarge_zh_177k.zip)\n4. [albert_large_zh_brightmart](https://storage.googleapis.com/albert_zh/albert_large_zh.zip)\n5. [albert_base_zh_brightmart](https://storage.googleapis.com/albert_zh/albert_base_zh.zip)\n6. [albert_base_ext_zh_brightmart](https://storage.googleapis.com/albert_zh/albert_base_zh_additional_36k_steps.zip)\n7. [albert_small_zh_brightmart](https://storage.googleapis.com/albert_zh/albert_small_zh_google.zip)\n8. [albert_tiny_zh_brightmart](https://storage.googleapis.com/albert_zh/albert_tiny_zh_google.zip)\n9. [roberta_zh_brightmart](https://storage.googleapis.com/chineseglue/pretrain_models/roeberta_zh_L-24_H-1024_A-16.zip)\n10. [roberta_wwm_ext_zh_brightmart](https://storage.googleapis.com/chineseglue/pretrain_models/chinese_roberta_wwm_ext_L-12_H-768_A-12.zip)\n11. [roberta_wwm_ext_large_zh_brightmart](https://storage.googleapis.com/chineseglue/pretrain_models/chinese_roberta_wwm_large_ext_L-24_H-1024_A-16.zip)\n\n**Planned**\n\n1. [XLNet_mid](https://github.com/ymcui/Chinese-PreTrained-XLNet)\n2. [ERNIE_base](https://github.com/PaddlePaddle/ERNIE)\n\n## Quick Evaluation on CLUE Datasets\n\n### Dataset Descriptions and Downloads\n\n**Note: the datasets are identical to those provided by [CLUEBenchmark](https://github.com/CLUEbenchmark/CLUE); only their format has been adapted for PyCLUE.**\n\n#### 1. 
AFQMC Ant Financial Semantic Similarity\n\n##### Data Description\n\n```\nData size: train (34,334), dev (4,316), test (3,861)\nExample:\n{\"sentence1\": \"双十一花呗提额在哪\", \"sentence2\": \"里可以提花呗额度\", \"label\": \"0\"}\nEach record has three fields, in order: sentence 1, sentence 2 and the similarity label. A label of 1 means sentence1 and sentence2 have similar meanings; 0 means their meanings differ.\n```\n\nLink: https://pan.baidu.com/s/1It1SiMJbsrNl1dEOBoOGXg \nExtraction code: ksd1\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/clue/sentence_pair/afqmc/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/afqmc/train.ipynb\n\nSubmission-file script: PyCLUE/clue/sentence_pair/afqmc/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/afqmc/predict.ipynb\n\n#### 2. TNEWS' Toutiao Chinese News (Short Text) Classification\n\n##### Data Description\n\nThe data comes from the news section of Toutiao and covers 15 news categories, including travel, education, finance and military.\n\n```\nData size: train (266,000), dev (57,000), test (57,000)\nExample:\n{\"label\": \"102\", \"label_des\": \"news_entertainment\", \"sentence\": \"江疏影甜甜圈自拍，迷之角度竟这么好看，美吸引一切事物\"}\nEach record has three fields, in order: category ID, category name and the news text (title only).\n```\n\nLink: https://pan.baidu.com/s/1Rs9oXoloKgwI-RgNS_GTQQ \nExtraction code: s9go\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/clue/classification/tnews/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/tnews/train.ipynb\n\nSubmission-file script: PyCLUE/clue/classification/tnews/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/tnews/predict.ipynb\n\n#### 3. 
IFLYTEK' Long Text Classification\n\n##### Data Description\n\nThe dataset contains more than 17,000 labeled long-text app descriptions covering app topics related to daily life, in 119 categories labeled 0-118: \"打车\" (ride hailing): 0, \"地图导航\" (map navigation): 1, \"免费WIFI\" (free WiFi): 2, \"租车\" (car rental): 3, …, \"女性\" (women): 115, \"经营\" (business operations): 116, \"收款\" (payment collection): 117, \"其他\" (other): 118.\n\n```\nData size: train (12,133), dev (2,599), test (2,600)\nExample:\n{\"label\": \"110\", \"label_des\": \"社区超市\", \"sentence\": \"朴朴快送超市创立于2016年，专注于打造移动端30分钟即时配送一站式购物平台，商品品类包含水果、蔬菜、肉禽蛋奶、海鲜水产、粮油调味、酒水饮料、休闲食品、日用品、外卖等。朴朴公司希望能以全新的商业模式，更高效快捷的仓储配送模式，致力于成为更快、更好、更多、更省的在线零售平台，带给消费者更好的消费体验，同时推动中国食品安全进程，成为一家让社会尊敬的互联网公司。,朴朴一下，又好又快,1.配送时间提示更加清晰友好2.保障用户隐私的一些优化3.其他提高使用体验的调整4.修复了一些已知bug\"}\nEach record has three fields, in order: category ID, category name and text content.\n```\n\nLink: https://pan.baidu.com/s/1EKtHXmgt1t038QTO9VKr3A \nExtraction code: u00v\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/clue/classification/iflytek/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/iflytek/train.ipynb\n\nSubmission-file script: PyCLUE/clue/classification/iflytek/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/classification/iflytek/predict.ipynb\n\n#### 4. CMNLI Chinese Multi-Genre NLI\n\n##### Data Description\n\nCMNLI consists of two parts, XNLI and MNLI, with data from genres such as fiction, telephone, travel, government and slate. The original MNLI and XNLI data were translated from English into Chinese. The original training set is kept; the XNLI dev set and the MNLI matched set are merged into the CMNLI dev set, and the XNLI test set and the MNLI mismatched set are merged into the CMNLI test set, with the order shuffled. The task is to decide whether the relation between two given sentences is entailment, neutral or contradiction.\n\n```\nData size: train (391,782), matched (12,426), mismatched (13,880)\nExample:\n{\"sentence1\": \"新的权利已经足够好了\", \"sentence2\": \"每个人都很喜欢最新的福利\", \"label\": \"neutral\"}\nEach record has three fields, in order: sentence 1, sentence 2 and the entailment label, which is one of neutral, entailment or contradiction.\n```\n\nLink: https://pan.baidu.com/s/1mFT31cBs2G6e69As6H65dQ \nExtraction code: kigh\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/clue/sentence_pair/cmnli/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/cmnli/train.ipynb\n\nSubmission-file script: PyCLUE/clue/sentence_pair/cmnli/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/cmnli/predict.ipynb\n\n#### 5. 
Diagnostics Set CLUE_diagnostics test_set\n\n##### Data Description\n\nA diagnostic set for evaluating how different models perform on 9 Chinese linguistic phenomena summarized by linguists.\n\nUse a model trained on CMNLI to predict directly on this diagnostic set. The submission format is the same as for CMNLI, and the results are shown on the leaderboard detail page. (Note: this dataset package also includes the CMNLI training and test sets.)\n\nLink: https://pan.baidu.com/s/1DYDUGO6xN_4xAT0Y4aNsiw \nExtraction code: u194\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/clue/sentence_pair/diagnostics/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/diagnostics/train.ipynb\n\nSubmission-file script: PyCLUE/clue/sentence_pair/diagnostics/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/clue/sentence_pair/diagnostics/predict.ipynb\n\n#### 6. Other CLUE Datasets\n\nComing soon.\n\n## Applying to Custom Tasks\n\n#### 1. Multi-Class Classification\n\n##### Task Description\n\nMulti-class classification tasks such as text classification and sentiment classification. Both single-sentence and sentence-pair inputs are supported.\n\n##### Data Requirements\n\nThe data directory must contain at least train.txt, dev.txt and labels.txt; test.txt is optional.\n\nFile format templates:\n\nSingle-sentence input (`task_type = 'single'` in the evaluation script): PyCLUE/examples/classification/single_data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/single_data_templates\n\nSentence-pair input (`task_type = 'pairs'` in the evaluation script): PyCLUE/examples/classification/pairs_data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/pairs_data_templates\n\n**Note: fields must be separated by tabs (\\t).**\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/examples/classification/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/train.ipynb\n\nPrediction script: PyCLUE/examples/classification/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/classification/predict.ipynb\n\n#### 2. 
Sentence Pair (Siamese)\n\n##### Task Description\n\nSentence-pair tasks using a Siamese network, such as similar-sentence-pair detection. **Difference from the sentence-pair input of the multi-class classification task: there, the two sentences are concatenated into a single BERT-style input, whereas this task feeds them through a Siamese network.**\n\n##### Data Requirements\n\nThe data directory must contain at least train.txt, dev.txt and labels.txt; test.txt is optional.\n\nFile format template:\n\nInput: PyCLUE/examples/sentence_pair/data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/sentence_pair/data_templates\n\n**Note: fields must be separated by tabs (\\t).**\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/examples/sentence_pair/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/sentence_pair/train.ipynb\n\nPrediction script: PyCLUE/examples/sentence_pair/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/sentence_pair/predict.ipynb\n\n#### 3. Text Matching (Siamese)\n\n##### Task Description\n\nText-matching tasks using a Siamese network, such as FAQ retrieval and QQ matching retrieval. The Siamese network produces embeddings for the input sentences, and [hnswlib](https://github.com/nmslib/hnswlib) retrieves the most similar sentences.\n\n##### Data Requirements\n\nThe data directory must contain at least cache.txt, train.txt, dev.txt and labels.txt; test.txt is optional.\n\nFile format template:\n\nInput: PyCLUE/examples/text_matching/data_templates/, https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/text_matching/data_templates\n\n**Note: fields must be separated by tabs (\\t).**\n\n##### Evaluation Scripts\n\nTraining script: PyCLUE/examples/text_matching/train.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/text_matching/train.ipynb\n\nPrediction script: PyCLUE/examples/text_matching/predict.ipynb\n\nSee: https://github.com/CLUEBenchmark/PyCLUE/blob/master/examples/text_matching/predict.ipynb\n\n## Files Generated by Training\n\n#### 1. Model Files\n\nThe model files include the 10 most recent checkpoints and a pb model file corresponding to whichever of these 10 checkpoints performs best on the dev set (dev.txt).\n\n![Files generated by training](https://i.loli.net/2020/05/10/7bZIvJakD8x1tGl.png)\n\n#### 2. Training Metrics\n\nTraining produces a metrics plot (train_metrics.png) showing accuracy, total_loss, batch_loss, precision, recall and f1.\n\n\u003cimg src=\"https://i.loli.net/2020/05/10/gkS2GPyClDNrjuK.png\" alt=\"train_metrics\" style=\"zoom:200%;\" /\u003e\n\n#### 3. 
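Evaluation Metrics\n\nIf an evaluation file test.txt exists and each of its lines starts with the true label, the metrics of the best model on that file are printed.\n\n![Evaluation metrics](https://i.loli.net/2020/05/10/bpzCFT2t8GBOunk.png)\n\n## API Documentation\n\nIn progress.\n\n## Other Notes\n\nOfficial repository: https://github.com/CLUEBenchmark/PyCLUE\n\nDevelopment repository: https://github.com/liushaoweihua/PyCLUE\n\n## Timeline\n\n### Changelog\n\n* 2019.12.05\n  * [Initial version of PyCLUE](https://github.com/chineseGLUE/PyCLUE) for quick evaluation on CLUE datasets (text classification and sentence-pair tasks);\n* 2020.05.10\n  * Refactored the code and merged redundant code (tested with TensorFlow 1.15.2); to simplify the API, TPU support for downstream tasks has been temporarily removed;\n  * Supports several versions of BERT, ALBERT and RoBERTa models, which are downloaded and loaded automatically from the specified pre-trained model name;\n  * Supports text classification, sentence-pair and text-matching tasks;\n  * Quick evaluation on CLUE datasets (AFQMC/TNEWS/IFLYTEK/CMNLI), generating submission files accepted by [CLUEBenchmark](https://www.cluebenchmarks.com/);\n  * Custom tasks: quickly generates checkpoints and pb model files deployable with TensorFlow Serving, and can load pb model files for prediction; supports file-based quality checks, saving misrecognized results to a specified directory.\n\n### Roadmap\n\n* 2020.05 ~ 2020.08\n  * Support for other text classification, sentence-pair and text-matching tasks;\n  * Support for sequence labeling tasks;\n  * Support for XLNet, ERNIE, ELECTRA, etc.;\n  * Support for pre-trained word embedding models (Word2Vec, etc.) and multiple types of downstream networks;\n* 2020.08 ~ 2020.10\n  * Support for reading comprehension tasks;\n  * Support for TF 2.0;\n* 2020.10 ~ 2020.12\n  * Integration with the [NLPCC 2020 LightLM high-performance small-model evaluation](https://github.com/CLUEbenchmark/LightLM), supporting multiple small models;\n  * Integration of the [PyTorch models already supported by CLUE](https://github.com/CLUEbenchmark/CLUE/tree/master/baselines/models_pytorch).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcluebenchmark%2Fpyclue","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcluebenchmark%2Fpyclue","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcluebenchmark%2Fpyclue/lists"}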