{"id":13754354,"url":"https://github.com/xhw205/GPLinker_torch","last_synced_at":"2025-05-09T22:32:04.910Z","repository":{"id":37613395,"uuid":"478483859","full_name":"xhw205/GPLinker_torch","owner":"xhw205","description":"CMeIE/CBLUE/CHIP/实体关系抽取/SPO抽取","archived":false,"fork":false,"pushed_at":"2022-06-15T03:14:12.000Z","size":747,"stargazers_count":214,"open_issues_count":6,"forks_count":14,"subscribers_count":4,"default_branch":"main","last_synced_at":"2024-11-16T07:33:27.378Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xhw205.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-04-06T09:08:12.000Z","updated_at":"2024-11-08T07:48:08.000Z","dependencies_parsed_at":"2022-07-14T00:50:32.610Z","dependency_job_id":null,"html_url":"https://github.com/xhw205/GPLinker_torch","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xhw205%2FGPLinker_torch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xhw205%2FGPLinker_torch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xhw205%2FGPLinker_torch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xhw205%2FGPLinker_torch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xhw205","download_url":"https://codeload.github.com/xhw205/GPLinker_torch/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253335861,"own
ers_count":21892749,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-03T09:01:56.501Z","updated_at":"2025-05-09T22:31:59.892Z","avatar_url":"https://github.com/xhw205.png","language":"Python","funding_links":[],"categories":["关系抽取、信息抽取"],"sub_categories":["其他_文本生成、文本对话"],"readme":"## Entity and Relation Extraction: GPLinker\r\n\r\n### Introduction\r\n\r\nEntity and relation extraction is a core step in structuring text and building domain-specific knowledge graphs.\r\n\r\nThis is a PyTorch reimplementation of [GPLinker](https://kexue.fm/archives/8888), kept simple and readable with no unnecessary frills. The core of the method is:\r\n\r\n+ The encoded vectors of the input text S={w1,w2,...,wn} are modeled with token-pair tagging as an n×n token matrix, which is then used for entity recognition and entity relation extraction.\r\n+ Related work includes [TP-Linker](https://arxiv.org/abs/2010.13415), [multi-head selection](https://arxiv.org/abs/1804.07847), [Word-pair](https://arxiv.org/pdf/2112.10070.pdf), and others. Compared with traditional BIO sequence labeling and span pointer-network tagging, token-pair modeling is now the standard schema for state-of-the-art entity relation extraction.\r\n\r\n### Dataset\r\n\r\nThe CMeIE dataset from CBLUE, the Chinese medical information processing benchmark. It is also the medical entity relation extraction dataset used in CHIP2020/2021.\r\n\r\n### Environment\r\n\r\n+ python 3.8.1\r\n+ pytorch==1.8.1\r\n+ transformers==4.9.2\r\n+ configparser\r\n\r\n### Pretrained model\r\n\r\nDownload [RoBerta-zh-large](https://drive.google.com/file/d/1yK_P8VhWZtdgzaG0gJ3zUGOKWODitKXZ/view)\r\n\r\n### Usage\r\n\r\nReplace the [paths] entries in config.ini with your own.\r\n\r\n#### train\r\n\r\n```\r\npython main.py\r\n```\r\n\r\n#### predict\r\n\r\n```\r\npython predict.py\r\n```\r\n\r\n### Results\r\n![1649379794(1).jpg](https://s2.loli.net/2022/04/08/wQGYfycRd7irbXW.png)\r\n+ On the medical entity relation extraction dataset, the online test F1 score on [Alibaba Tianchi](https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414#4) is 59.82%. The submitted test results are in the ./result folder.\r\n   \r\n  (A ready-to-submit test result file is no longer provided.)\r\n+ The CasRel method I reproduced earlier, based on the [DeepIE repository](https://github.com/loujie0822/DeepIE), reaches an online F1 score of 60.556%. It will be cleaned up and open-sourced later.\r\n \r\n
    \u003e Note: the top-1 entry (66.044%) is ERNIE from the Baidu knowledge graph team, which far outperforms the other methods. For students and small teams, however, an F1 score above 62% is already a very good result.\r\n\r\n+ Note: for the latest CBLUE leaderboard, change the extension of the generated CMeIE_test.json to jsonl, then compress the file before submitting.\r\n\r\n### TODO\r\n+ Training does not save the best model according to the validation F1 score; it simply uses the weights from the last epoch. Implement this yourself if you need it.\r\n\r\n+ Replace GlobalPointer with Efficient-GlobalPointer. I have already released the torch source code, so you can implement this yourself.\r\n\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxhw205%2FGPLinker_torch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxhw205%2FGPLinker_torch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxhw205%2FGPLinker_torch/lists"}