{"id":18533699,"url":"https://github.com/lonepatient/clue_pytorch","last_synced_at":"2025-10-12T07:32:55.789Z","repository":{"id":106807167,"uuid":"215320624","full_name":"lonePatient/CLUE_pytorch","owner":"lonePatient","description":"CLUE baseline pytorch (PyTorch baseline for CLUE)","archived":false,"fork":false,"pushed_at":"2020-04-03T14:11:51.000Z","size":348,"stargazers_count":74,"open_issues_count":0,"forks_count":17,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-04-10T09:39:56.896Z","etag":null,"topics":["albert","bert","chinese","classification","clue","ernie","glue","pytorch","roberta","xlnet"],"latest_commit_sha":null,"homepage":"https://github.com/CLUEbenchmark/CLUE","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lonePatient.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-10-15T14:32:34.000Z","updated_at":"2025-01-18T08:25:21.000Z","dependencies_parsed_at":null,"dependency_job_id":"8220f331-9bf0-4f01-9a22-5a7ca141cf3a","html_url":"https://github.com/lonePatient/CLUE_pytorch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lonePatient/CLUE_pytorch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonePatient%2FCLUE_pytorch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonePatient%2FCLUE_pytorch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonePatient%2FCLUE_pytorch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonePatient%2FCLUE_pytorch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lonePatient","download_url":"https://codeload.github.com/lonePatient/CLUE_pytorch/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lonePatient%2FCLUE_pytorch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279010662,"owners_count":26084784,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["albert","bert","chinese","classification","clue","ernie","glue","pytorch","roberta","xlnet"],"created_at":"2024-11-06T19:12:41.378Z","updated_at":"2025-10-12T07:32:55.784Z","avatar_url":"https://github.com/lonePatient.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## CLUE_pytorch\n\nLanguage Understanding Evaluation benchmark for Chinese (CLUE)\n\n**Note**: this is a personal development version (it currently supports all classification tasks). For the official version, see https://github.com/CLUEbenchmark/CLUE\n\n## Updates\n\n* **2020-03-08**: models are now loaded with [Huggingface-Transformers](https://github.com/huggingface/transformers)\n\n## Model list\n\n|      model type       |                      model_name_or_path                      |\n| :-------------------: | :----------------------------------------------------------: |\n|        albert         | 
[`voidful/albert_chinese_base`](https://huggingface.co/voidful/albert_chinese_base) |\n|        albert         | [`voidful/albert_chinese_large`](https://huggingface.co/voidful/albert_chinese_large) |\n|        albert         | [`voidful/albert_chinese_small`](https://huggingface.co/voidful/albert_chinese_small) |\n|        albert         | [`voidful/albert_chinese_tiny`](https://huggingface.co/voidful/albert_chinese_tiny) |\n|        albert         | [`voidful/albert_chinese_xlarge`](https://huggingface.co/voidful/albert_chinese_xlarge) |\n|        albert         | [`voidful/albert_chinese_xxlarge`](https://huggingface.co/voidful/albert_chinese_xxlarge) |\n|         bert          | [`bert-base-chinese`](https://huggingface.co/bert-base-chinese) |\n|     bert-wwm-ext      | [`hfl/chinese-bert-wwm-ext`](https://huggingface.co/hfl/chinese-bert-wwm-ext) |\n|       bert-wwm        | [`hfl/chinese-bert-wwm`](https://huggingface.co/hfl/chinese-bert-wwm) |\n| roberta-wwm-ext-large | [`hfl/chinese-roberta-wwm-ext-large`](https://huggingface.co/hfl/chinese-roberta-wwm-ext-large) |\n|    roberta-wwm-ext    | [`hfl/chinese-roberta-wwm-ext`](https://huggingface.co/hfl/chinese-roberta-wwm-ext) |\n|      xlnet-base       | [`hfl/chinese-xlnet-base`](https://huggingface.co/hfl/chinese-xlnet-base) |\n|       xlnet-mid       | [`hfl/chinese-xlnet-mid`](https://huggingface.co/hfl/chinese-xlnet-mid) |\n|         rbt3          |        [`hfl/rbt3`](https://huggingface.co/hfl/rbt3)         |\n|         rbtl3         |       [`hfl/rbtl3`](https://huggingface.co/hfl/rbtl3)        |\n| RoBERTa-tiny-clue      | [`clue/roberta_chinese_clue_tiny`](https://huggingface.co/clue/roberta_chinese_clue_tiny) |\n| RoBERTa-tiny-pair      | [`clue/roberta_chinese_pair_tiny`](https://huggingface.co/clue/roberta_chinese_pair_tiny) |\n| RoBERTa-tiny3L768-clue | [`clue/roberta_chinese_3L768_clue_tiny`](https://huggingface.co/clue/roberta_chinese_3L768_clue_tiny) |\n| RoBERTa-tiny3L312-clue | 
[`clue/roberta_chinese_3L312_clue_tiny`](https://huggingface.co/clue/roberta_chinese_3L312_clue_tiny) |\n| RoBERTa-large-clue    | [`clue/roberta_chinese_clue_large`](https://huggingface.co/clue/roberta_chinese_clue_large) |\n| RoBERTa-large-pair     | [`clue/roberta_chinese_pair_large`](https://huggingface.co/clue/roberta_chinese_pair_large) |\n\n## Code structure\n\n```text\n├── CLUEdatasets            # datasets\n|  └── tnews\n|  └── wsc\n|  └── ...\n├── metrics                 # metric computation\n|  └── clue_compute_metrics.py\n├── outputs                 # saved model outputs\n|  └── tnews_output\n|  └── wsc_output\n|  └── ...\n├── prev_trained_model      # pretrained models\n|  └── albert_base\n|  └── bert-wwm\n|  └── ...\n├── processors              # data processing\n|  └── clue.py\n|  └── ...\n├── tools                   # common utilities\n|  └── progressbar.py\n|  └── ...\n├── run_classifier.py       # main program\n├── run_classifier_tnews.sh # task run script\n```\n### Dependencies\n\n- pytorch=1.1.0\n- boto3=1.9\n- regex\n- sacremoses\n- sentencepiece\n- python3.6+\n- transformers=2.5.1\n\n### Usage\n**1. Install Transformers** (version 2.5.1, matching the dependency list above)\n```shell\npip install transformers==2.5.1\n```\n\n**2. Download the CLUE datasets by running:**\n```shell\npython download_clue_data.py --data_dir=./CLUEdatasets --tasks=all\n```\nBy default, this command downloads every CLUE dataset; set `--tasks` to download only the data for a specific task. Each task's data is saved under `./CLUEdatasets/{task}` by default.\n\n**Note**: if you use model weights that are already downloaded locally, the model folder must also contain `config.json` and `vocab.txt`, for example:\n```text\n├── prev_trained_model  # pretrained models\n|  └── bert-base\n|  | └── vocab.txt\n|  | └── config.json\n|  | └── pytorch_model.bin\n```\nTo use such local weights, set `--model_name_or_path=your_local_model_weight_path` to point at that folder.\n\n**3. 
Run the shell script for the task, for example:**\n\n```shell\nsh run_classifier_tnews.sh\n```\nA script of this kind looks like the following (this example runs the `iflytek` task):\n```shell\nCURRENT_DIR=`pwd`\nexport CLUE_DIR=$CURRENT_DIR/CLUEdatasets\nexport OUTPUT_DIR=$CURRENT_DIR/outputs\nTASK_NAME=\"iflytek\"\n\npython run_classifier.py \\\n  --model_type=albert \\\n  --model_name_or_path=voidful/albert_chinese_tiny \\\n  --task_name=$TASK_NAME \\\n  --do_train \\\n  --do_lower_case \\\n  --evaluate_during_training \\\n  --data_dir=$CLUE_DIR/${TASK_NAME}/ \\\n  --max_seq_length=128 \\\n  --per_gpu_train_batch_size=16 \\\n  --per_gpu_eval_batch_size=16 \\\n  --learning_rate=2e-4 \\\n  --num_train_epochs=6.0 \\\n  --logging_steps=759 \\\n  --save_steps=759 \\\n  --output_dir=$OUTPUT_DIR/${TASK_NAME}_output/ \\\n  --overwrite_output_dir \\\n  --seed=42\n```\n**Note**:\n\n\u003e With `--model_name_or_path=voidful/albert_chinese_tiny`, the albert_chinese_tiny weights are downloaded automatically.\n\u003e\n\u003e Currently only the Google release of the Chinese ALBERT models is supported.\n\n**4. Evaluation**\n\nBy default, the last checkpoint is used for evaluation. To evaluate a specific checkpoint instead, set `--predict_checkpoints`, for example:\n```shell\nCURRENT_DIR=`pwd`\nexport CLUE_DIR=$CURRENT_DIR/CLUEdatasets\nexport OUTPUT_DIR=$CURRENT_DIR/outputs\nTASK_NAME=\"copa\"\n\npython run_classifier.py \\\n  --model_type=albert \\\n  --model_name_or_path=voidful/albert_chinese_tiny \\\n  --task_name=$TASK_NAME \\\n  --do_predict \\\n  --predict_checkpoints=100 \\\n  --do_lower_case \\\n  --data_dir=$CLUE_DIR/${TASK_NAME}/ \\\n  --max_seq_length=128 \\\n  --per_gpu_train_batch_size=16 \\\n  --per_gpu_eval_batch_size=16 \\\n  --learning_rate=1e-5 \\\n  --num_train_epochs=2.0 \\\n  --logging_steps=50 \\\n  --save_steps=50 \\\n  --output_dir=$OUTPUT_DIR/${TASK_NAME}_output/ \\\n  --overwrite_output_dir \\\n  --seed=42\n```\n\n### Model classes\n```python\n    \"bert\": (BertConfig, BertForSequenceClassification, BertTokenizer),\n    \"ernie\": (BertConfig, BertForSequenceClassification, BertTokenizer),\n    \"xlnet\": (XLNetConfig, XLNetForSequenceClassification, XLNetTokenizer),\n    \"roberta\": (BertConfig, BertForSequenceClassification, 
BertTokenizer),\n    \"albert\": (AlbertConfig, AlbertForSequenceClassification, BertTokenizer),\n```\n\n### Results\n\nCLUEWSC2020: the Chinese version of the Winograd Schema Challenge (WSC); the new version was released on 2020-03-25. [Download the CLUEWSC2020 dataset](https://storage.googleapis.com/cluebenchmark/tasks/cluewsc2020_public.zip)\n\n| Model | Dev set |\n| :------- | :---------: |\n| bert_base | 79.94 |\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flonepatient%2Fclue_pytorch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flonepatient%2Fclue_pytorch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flonepatient%2Fclue_pytorch/lists"}