{"id":18256266,"url":"https://github.com/taishan1994/onerel_chinese","last_synced_at":"2026-02-19T05:31:30.396Z","repository":{"id":49398723,"uuid":"473029681","full_name":"taishan1994/OneRel_chinese","owner":"taishan1994","description":"OneRel在中文关系抽取中的使用","archived":false,"fork":false,"pushed_at":"2023-10-07T01:29:21.000Z","size":4089,"stargazers_count":119,"open_issues_count":24,"forks_count":11,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-08T22:31:42.460Z","etag":null,"topics":["chinese","duie","onerel","pytorch-implementation","relation-extraction"],"latest_commit_sha":null,"homepage":"","language":"Roff","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/taishan1994.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2022-03-23T03:57:38.000Z","updated_at":"2025-04-04T08:40:30.000Z","dependencies_parsed_at":"2022-07-26T14:45:05.152Z","dependency_job_id":null,"html_url":"https://github.com/taishan1994/OneRel_chinese","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/taishan1994/OneRel_chinese","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taishan1994%2FOneRel_chinese","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taishan1994%2FOneRel_chinese/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taishan1994%2FOneRel_chinese/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taishan1994%2FOneRel_chinese/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/taishan1994","download_url":"h
ttps://codeload.github.com/taishan1994/OneRel_chinese/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/taishan1994%2FOneRel_chinese/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29604552,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T05:11:50.834Z","status":"ssl_error","status_checked_at":"2026-02-19T05:11:38.921Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chinese","duie","onerel","pytorch-implementation","relation-extraction"],"created_at":"2024-11-05T10:20:53.193Z","updated_at":"2026-02-19T05:31:30.369Z","avatar_url":"https://github.com/taishan1994.png","language":"Roff","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OneRel_chinese\n2023-06-18\n\n- Updated data_loader.py so that keras_bert, keras, and tensorflow are no longer required.\n- Changed the validation strategy used during training.\n\n- Added general steps:\n\n```python\ngit clone https://github.com/taishan1994/OneRel_chinese.git\n\nCreate a chinese-bert-wwm-ext folder under pre_trained_bert, then download config.json, pytorch_model.bin, and vocab.txt from Hugging Face into it.\n\nFor your own dataset, create a dataset folder under data (for example DUIE) containing:\ntrain_triples.json\ndev_triples.json\ntest_triples.json\nrel2id.json\nAll xxx_triples.json files share the same format:\n[{\"text\": \"摩尔多瓦共和国（摩尔多瓦语：Republica Moldova，英语：Republic of Moldova），简称摩尔多瓦，是位于东南欧的内陆国，与罗马尼亚和乌克兰接壤，首都基希讷乌\", \"triple_list\": 
[[\"摩尔多瓦\", \"首都\", \"基希讷乌\"]]}, ......]\nrel2id.json holds the relation labels (an id-to-relation map followed by a relation-to-id map):\n[{\"0\": \"注册资本\", \"1\": \"嘉宾\", \"2\": \"国籍\", \"3\": \"制片人\", \"4\": \"配音\", \"5\": \"作者\", \"6\": \"总部地点\", \"7\": \"导演\", \"8\": \"简称\", \"9\": \"票房\", \"10\": \"主题曲\", \"11\": \"号\", \"12\": \"主角\", \"13\": \"母亲\", \"14\": \"编剧\", \"15\": \"邮政编码\", \"16\": \"主演\", \"17\": \"作曲\", \"18\": \"获奖\", \"19\": \"主持人\", \"20\": \"董事长\", \"21\": \"上映时间\", \"22\": \"丈夫\", \"23\": \"代言人\", \"24\": \"人口数量\", \"25\": \"气候\", \"26\": \"作词\", \"27\": \"占地面积\", \"28\": \"官方语言\", \"29\": \"祖籍\", \"30\": \"毕业院校\", \"31\": \"所在城市\", \"32\": \"海拔\", \"33\": \"专业代码\", \"34\": \"首都\", \"35\": \"朝代\", \"36\": \"妻子\", \"37\": \"修业年限\", \"38\": \"所属专辑\", \"39\": \"成立日期\", \"40\": \"创始人\", \"41\": \"饰演\", \"42\": \"校长\", \"43\": \"改编自\", \"44\": \"歌手\", \"45\": \"出品公司\", \"46\": \"面积\", \"47\": \"父亲\"}, {\"注册资本\": 0, \"嘉宾\": 1, \"国籍\": 2, \"制片人\": 3, \"配音\": 4, \"作者\": 5, \"总部地点\": 6, \"导演\": 7, \"简称\": 8, \"票房\": 9, \"主题曲\": 10, \"号\": 11, \"主角\": 12, \"母亲\": 13, \"编剧\": 14, \"邮政编码\": 15, \"主演\": 16, \"作曲\": 17, \"获奖\": 18, \"主持人\": 19, \"董事长\": 20, \"上映时间\": 21, \"丈夫\": 22, \"代言人\": 23, \"人口数量\": 24, \"气候\": 25, \"作词\": 26, \"占地面积\": 27, \"官方语言\": 28, \"祖籍\": 29, \"毕业院校\": 30, \"所在城市\": 31, \"海拔\": 32, \"专业代码\": 33, \"首都\": 34, \"朝代\": 35, \"妻子\": 36, \"修业年限\": 37, \"所属专辑\": 38, \"成立日期\": 39, \"创始人\": 40, \"饰演\": 41, \"校长\": 42, \"改编自\": 43, \"歌手\": 44, \"出品公司\": 45, \"面积\": 46, \"父亲\": 47}]\n\nIn the re_collate_fn function in data_loader.py, the 48 in batch_triple_matrix = torch.LongTensor(cur_batch_len, 48, max_text_len, max_text_len).zero_() must be changed to the number of relations in your dataset. DUIE has 48 relations.\n\nTraining: python train.py --dataset=DUIE --batch_size=4 --rel_num=48\nTraining runs for 1 epoch by default; use --max_epoch to change this. The default maximum length is 100; use --max_len to change it, and set --bert_max_len to twice max_len.\nNote: training may need to run for a long time; see train.log for details.\n\nTesting: python test.py --dataset=DUIE --batch_size=4 
--rel_num=48\n```\n\n****\n\nUsing OneRel for Chinese relation extraction. The dataset is DUIE, Baidu's relation extraction dataset. The Chinese pre-trained model is bert-base-chinese. Download the data and the trained model:\u003cbr\u003e\nLink: https://pan.baidu.com/s/1vyDIqCspTIaGOj5tSGlCJg\u003cbr\u003e\nExtraction code: uccp\n\n# Dependencies\n```python\n# keras_bert==0.88.0\nmatplotlib==3.3.2\nnumpy==1.19.2\nscikit-learn==1.0.1\ntorch==1.8.1+cu111\ntransformers==4.5.1\n# tensorflow==2.2.0\n# keras==2.4.3\n```\n\n# Notes\nThis repository modifies the code of the original paper \"OneRel: Joint Entity and Relation Extraction with One Module in One Step\": ~~https://github.com/ssnvxia/OneRel~~ https://github.com/China-ChallengeHub/OneRel. The main changes are:\n- utils/tokenizer.py: tokenization is adapted for Chinese.\n- framework/framework.py: when decoding at test time, the spaces added during tokenization are removed. In the re_collate_fn function, the 48 in ```batch_triple_matrix = torch.LongTensor(cur_batch_len, 48, max_text_len, max_text_len).zero_()``` must be changed to the number of relations.\n- process.py: a new data processing script that converts the DUIE data into the format OneRel expects.\n\nNote in particular that OneRel appends a space after every token, so an original text of length 100 becomes a model input of length 200; keep GPU memory in mind.\n\n# Training and testing\n```python\npython train.py --dataset=DUIE --batch_size=4 --rel_num=48\n```\nBecause the dataset is large, only one epoch was run here. The settings can be changed in config/config.py and framework/framework.py. Results:\n```\n......\nepoch:   0, step: 35500, speed: 257.69ms/b, train loss: 0.001\nepoch:   0, step: 35600, speed: 252.92ms/b, train loss: 0.001\nepoch   0, eval time: 979.91s, f1: 0.672, precision: 0.622, recall: 0.730\nsaving the model, epoch:   0, precision: 0.622, recall: 0.730, best f1: 0.672\nfinish training\nbest epoch:   0, precision: 0.622, recall: 0.73, best f1: 0.672, total time: 10157.71s\n```\nThe results are saved to result/DUIE/RESULT_OneRel_DATASET_DUIE_LR_1e-05_BS_4Max_len100Bert_ML200DP_0.2EDP_0.1.json. Some of the results are shown below:\n```\n{       \n    \"text\": \"摩 尔 多 瓦 共 和 国 （ 摩 尔 多 瓦 语 ： republica moldova ， 英 语 ： republic of moldova ） ， 简 称 摩 尔 多 瓦 ， 是 位 于 东 南 欧 的 内 陆 国 ， 与 罗 马 尼 亚 和 乌 克 兰 接 壤 ， 首 都 基 希 讷 乌\",\n    \"triple_list_gold\": [\n        {   \n            \"subject\": \"摩尔多瓦\",\n            \"relation\": \"首都\",\n            \"object\": \"基希讷乌\"\n        }\n    ],      \n    \"triple_list_pred\": [\n        
{   \n            \"subject\": \"摩尔多瓦\",\n            \"relation\": \"首都\",\n            \"object\": \"基希讷乌\"\n        }   \n    ],      \n    \"new\": [],\n    \"lack\": []\n}\n{\n    \"text\": \"这 件 婚 事 原 本 与 陈 国 峻 无 关 ， 但 陈 国 峻 却 [UNK] 欲 求 配 而 无 由 ， 夜 间 乃 潜 入 天 城 公 主 所 居 通 之\",\n    \"triple_list_gold\": [\n        {\n            \"subject\": \"国峻\",\n            \"relation\": \"妻子\",\n            \"object\": \"天城公主\"\n        },\n        {\n            \"subject\": \"天城公主\",\n            \"relation\": \"丈夫\",\n            \"object\": \"国峻\"\n        }\n    ],\n    \"triple_list_pred\": [\n        {\n            \"subject\": \"天城公主\",\n            \"relation\": \"丈夫\",\n            \"object\": \"陈国峻\"\n        },\n        {\n            \"subject\": \"陈国峻\",\n            \"relation\": \"妻子\",\n            \"object\": \"天城公主\"\n        }\n    ],\n    \"new\": [\n        {\n            \"subject\": \"天城公主\",\n            \"relation\": \"丈夫\",\n            \"object\": \"陈国峻\"\n        },\n        {\n            \"subject\": \"陈国峻\",\n            \"relation\": \"妻子\",\n            \"object\": \"天城公主\"\n        }\n    ],\n    \"lack\": [\n        {\n            \"subject\": \"国峻\",\n            \"relation\": \"妻子\",\n            \"object\": \"天城公主\"\n        },\n        {\n            \"subject\": \"天城公主\",\n            \"relation\": \"丈夫\",\n            \"object\": \"国峻\"\n        }\n    ]\n}\n```\nTesting:\n```python\npython test.py --dataset=DUIE --rel_num=48\n```\n\n# Acknowledgements\n\u003e https://github.com/ssnvxia/OneRel\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaishan1994%2Fonerel_chinese","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftaishan1994%2Fonerel_chinese","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftaishan1994%2Fonerel_chinese/lists"}