{"id":19119456,"url":"https://github.com/cluebenchmark/kgcluebench","last_synced_at":"2025-10-26T18:12:49.765Z","repository":{"id":110258684,"uuid":"433450639","full_name":"CLUEbenchmark/KgCLUEbench","owner":"CLUEbenchmark","description":"benchmark of KgCLUE, with different models and methods","archived":false,"fork":false,"pushed_at":"2021-12-13T14:02:07.000Z","size":276,"stargazers_count":27,"open_issues_count":3,"forks_count":4,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-05-05T14:52:00.506Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CLUEbenchmark.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-11-30T13:56:18.000Z","updated_at":"2024-12-02T08:01:03.000Z","dependencies_parsed_at":"2023-03-13T13:56:11.781Z","dependency_job_id":null,"html_url":"https://github.com/CLUEbenchmark/KgCLUEbench","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/CLUEbenchmark/KgCLUEbench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FKgCLUEbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FKgCLUEbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FKgCLUEbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FKgCLUEbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1
/hosts/GitHub/owners/CLUEbenchmark","download_url":"https://codeload.github.com/CLUEbenchmark/KgCLUEbench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CLUEbenchmark%2FKgCLUEbench/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260684162,"owners_count":23046102,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T05:09:44.352Z","updated_at":"2025-10-26T18:12:44.729Z","avatar_url":"https://github.com/CLUEbenchmark.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[toc]\n\n# KgClue_Bench\n\nThe code is decoupled as far as possible to give NLP beginners a platform for learning BERT.\n\n## Directory structure\n\n├─algorithm # algorithms\u003cbr\u003e\n│ └─kg_qa # example algorithm implementation\u003cbr\u003e\n│ │ config.py\u003cbr\u003e\n│ ├─KG # each module is its own package\u003cbr\u003e\n│ │ │ es.py # script that imports the knowledge base into es\u003cbr\u003e\n│ │ │ KgAnswer.py # question-answering class\u003cbr\u003e\n│ │ │ KgEval.py # evaluates question-answering accuracy\u003cbr\u003e\n│ │ │ KgPredict.py # generates predictions for test.json; compress the output manually to submit it to the official site for evaluation\u003cbr\u003e\n│ ├─NER\u003cbr\u003e\n│ │ │ DataMaking.py # script that builds the NER training set\u003cbr\u003e\n│ │ │ EntityExtract.py # converts sequence-labeling tags into entities\u003cbr\u003e\n│ │ │ Eval.py # evaluation code (outputs F1)\u003cbr\u003e\n│ │ │ Predict.py # prediction class\u003cbr\u003e\n│ │ │ TrainAndValid.py # training code\u003cbr\u003e\n├─bert # Google's official BERT code\u003cbr\u003e\n│ │ .gitignore\u003cbr\u003e\n├─pretraining_model # pretrained BERT models\u003cbr\u003e\n│ ├─chinese_rbt3_L-3_H-768_A-12 # example\u003cbr\u003e\n├─raw_data # recommended place to add datasets; unpack them directly here\u003cbr\u003e\n│ ├─kgClue # dataset used by the kg_qa project\u003cbr\u003e\n│ │ │ xxx.json\u003cbr\u003e\n│ │ └─knowledge # 
knowledge base\u003cbr\u003e\n│ │ Knowledge.txt\u003cbr\u003e\n└─utils\u003cbr\u003e\n\n\n## Algorithm leaderboard\n\n### **kg_qa task**: trained on kgClue, answers questions against the knowledge base\n\n\u003e #### Comparison of model architectures (pretrained model: chinese_rbtl3_L-3_H-1024_A-16)\n\u003e Models are scored on question-answering accuracy.\n\u003e \n\u003e| Model | F1 | EM |\n\u003e| :----: | :----: | :----: |\n\u003e| bert-crf | 70.7 | 70.7 |\n\u003e| bert-lstm-crf | 63.9 | 63.6 |\n\n#### Comparison of pretrained models (not necessarily each model's best performance)\nNER (bert+crf) seq_lan=32 epoch=5\n\n| pretraining_model | batch | micro-f1 | macro-f1 | f1(##WordPiece) | f1(B-NP/I-NP) |\n| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |\n| chinese_rbt3_L-3_H-768_A-12 | 40 | 93.1 | 88.0 | 61.0 | 79.0 |\n| chinese_rbt4_L-4_H-768_A-12 | 40 | 92.0 | 87.0 | 62.0 | 75.0 |\n| chinese_rbt6_L-6_H-768_A-12 | 40 | 93.0 | 88.0 | 61.0 | 77.0 |\n| chinese_rbtl3_L-3_H-1024_A-16 | 40 | 93.0 | 89.0 | 66.0 | 77.0 |\n| chinese_wwm_ext_L-12_H-768_A-12 | 40 | 93.0 | 88.0 | 63.0 | 77.0 |\n\nSIM (bert) seq_lan=64 epoch=5\n\n| pretraining_model | batch | accuracy | precision | recall | macro-f1 |\n| ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |\n| chinese_rbt3_L-3_H-768_A-12 | 40 | 86.0 | 44.3 | 2.0 | 49.0 |\n| chinese_rbt4_L-4_H-768_A-12 | 40 | 93.5 | 78.3 | 73.1 | 85.9 |\n| chinese_rbt6_L-6_H-768_A-12 | 40 | 93.8 | 79.2 | 74.9 | 86.7 |\n| chinese_rbtl3_L-3_H-1024_A-16 | 40 | 96.5 | 86.4 | 89.1 | 92.9 |\n| chinese_wwm_ext_L-12_H-768_A-12 | 40 | 95.5 | 82.1 | 86.6 | 90.9 |\n\n## Usage\n\n### Example: the **kg_qa** algorithm\n\n\u003e This project contains three folders: KG, NER, and SIM\n\n#### NER\n\n1. python DataMaking.py **Note**: check (1) the file paths and (2) the script's working directory; run it with the whole KgCLUEBench directory as the project root\n2. python TrainAndValid.py **Note**: set up the config in the kg_qa directory before training; the notes above also apply\n3. python Predict.py: verify that the model runs correctly\n4. python Eval.py: produce the model's evaluation results; it can also be run during training to check progress\n5. python EntityExtract.py: convert the sequence-labeling output (the Predict results) into the entities in the sentence\n\n#### SIM (same procedure)\n\n1. 
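python DataMaking.py **Note**: check (1) the file paths and (2) the script's working directory; run it with the whole KgCLUEBench directory as the project root\n2. python TrainAndValid.py **Note**: set up the config in the kg_qa directory before training; the notes above also apply\n3. python Predict.py: verify that the model runs correctly\n4. python Eval.py: produce the model's evaluation results; it can also be run during training to check progress\n\n#### KG\n1. es.py is the script that imports the knowledge base (here, Knowledge.txt) into es; it only needs to be run once\n2. KgAnswer.py is the question-answering class; give it a single sentence and it returns the result\n3. KgEval is the code that evaluates question-answering ability; adjust the file paths and it is ready to use\n4. KgPredict is the code that answers test.json; when it finishes it produces kgclue_predict.txt, which can be compressed into a zip file and submitted directly to the CLUE website.\n\n## Contributing to algorithm\n\n\u003e Create a new Python package directly under this directory, containing init and config files\n\u003e An algorithm may have several stages; use a separate Python package for each stage and share one config across the stages\n\n## UPDATE\n\n******* 2021-12-3: project started\n******* 2021-12-12: full pipeline tested successfully\n\n## For questions, contact 1194370384@qq.com\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcluebenchmark%2Fkgcluebench","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcluebenchmark%2Fkgcluebench","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcluebenchmark%2Fkgcluebench/lists"}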