{"id":13595351,"url":"https://github.com/loujie0822/DeepIE","last_synced_at":"2025-04-09T10:33:28.659Z","repository":{"id":37699549,"uuid":"232294092","full_name":"loujie0822/DeepIE","owner":"loujie0822","description":"DeepIE: Deep Learning for Information Extraction","archived":false,"fork":false,"pushed_at":"2022-12-09T00:09:12.000Z","size":1548,"stargazers_count":1950,"open_issues_count":21,"forks_count":356,"subscribers_count":47,"default_branch":"master","last_synced_at":"2025-04-08T08:11:57.665Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/loujie0822/DeepIE","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/loujie0822.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-01-07T09:51:24.000Z","updated_at":"2025-04-07T07:22:04.000Z","dependencies_parsed_at":"2023-01-25T10:30:22.957Z","dependency_job_id":null,"html_url":"https://github.com/loujie0822/DeepIE","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loujie0822%2FDeepIE","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loujie0822%2FDeepIE/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loujie0822%2FDeepIE/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/loujie0822%2FDeepIE/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/loujie0822","download_url":"https://codeload.github.com/loujie0822/DeepIE/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":
"github","repositories_count":248020593,"owners_count":21034459,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T16:01:48.381Z","updated_at":"2025-04-09T10:33:28.618Z","avatar_url":"https://github.com/loujie0822.png","language":"Python","funding_links":[],"categories":["Python","关系抽取、信息抽取"],"sub_categories":["其他_文本生成、文本对话"],"readme":"# DeepIE: Deep Learning for Information Extraction\n\n**DeepIE**: Information extraction techniques based on deep learning (expected to be fully updated before August 31, 2020)\n\n## TOP\n\n- **Zhihu column article**: [A summary of entity-relation extraction methods in NLP](https://github.com/loujie0822/DeepIE/blob/jielou/docs/实体关系抽取算法总结.md)\n- **Zhihu column article**: [How to effectively improve Chinese NER performance? A summary of lexicon-enhancement methods](https://zhuanlan.zhihu.com/p/142615620)\n- **Zhihu column article**: [How to solve the problem of Transformers performing poorly on NER tasks?](https://zhuanlan.zhihu.com/p/137315695)\n\n## Papers\n\n- [ACL 2020 information extraction papers](https://github.com/loujie0822/DeepIE/blob/master/docs/ACL2020信息抽取相关论文汇总.md)\n- [IJCAI 2020 information extraction papers](https://github.com/loujie0822/DeepIE/blob/master/docs/IJCAI2020_%E4%BF%A1%E6%81%AF%E6%8A%BD%E5%8F%96%E7%9B%B8%E5%85%B3%E8%AE%BA%E6%96%87%E5%90%88%E9%9B%86%20.md)\n- [Relation extraction papers from the top 2019 conferences](https://github.com/loujie0822/DeepIE/blob/master/docs/2019各顶会中的关系抽取论文]汇总.md)\n- [Event extraction papers](https://github.com/loujie0822/DeepIE/blob/master/docs/事件抽取论文汇总.md)\n- [NER papers over the years](https://github.com/loujie0822/DeepIE/blob/master/docs/历年来NER论文汇总.md)\n\n## Codes\n\n#### 1. 
Entity Extraction\n\n- **Performance of mainstream methods on major Chinese NER datasets (F1)**  [Details](https://github.com/loujie0822/DeepIE/blob/master/docs/各主流方法在中文NER上的表现情况.md)\n\n|                | **Lexicon** | **OntoNotes** | **MSRA**  | **Resume** | **Weibo** |\n| -------------- | ----------- | ------------- | --------- | ---------- | --------- |\n| biLSTM         | ----        | 71.81         | 91.87     | 94.41      | 56.75     |\n| Lattice LSTM   | Lexicon 1   | 73.88         | 93.18     | 94.46      | 58.79     |\n| WC-LSTM        | Lexicon 1   | 74.43         | 93.36     | 94.96      | 49.86     |\n| LR-CNN         | Lexicon 1   | 74.45         | 93.71     | 95.11      | 59.92     |\n| CGN            | Lexicon 2   | 74.79         | 93.47     | 94.12      | 63.09     |\n| LGN            | Lexicon 1   | 74.85         | 93.63     | 95.41      | 60.15     |\n| Simple-Lexicon | Lexicon 1   | 75.54         | 93.50     | **95.59**  | 61.24     |\n| FLAT           | Lexicon 1   | **76.45**     | 94.12     | 95.45      | 60.32     |\n| FLAT           | Lexicon 2   | 75.70         | **94.35** | 94.93      | **63.42** |\n| BERT           | ----        | 80.14         | 94.95     | 95.53      | 68.20     |\n| BERT+FLAT      | Lexicon 1   | **81.82**     | **96.09** | **95.86**  | **68.55** |\n\n- **MSRA-NER**\n\n| Method                                        | F1         | P          | R          |\n| --------------------------------------------- | ---------- | ---------- | ---------- |\n| char + lstm-crf                               | 86.18%     | 88.43%     | 83.10%     |\n| char-bigram + lstm-crf                        | 91.80%     | 92.60%     | 90.34%     |\n| char-bigram + adTransformer-crf               | 92.98%     | 93.25%     | 92.72%     |\n| char-bigram + lexicon-augment + lstm-crf      | 93.33%     | 94.26%     | 92.43%     |\n| char-bigram-BERT + lstm-crf                   | 94.71%     | 95.14%     | 94.27%     |\n| char-bigram-BERT + lexicon-augment + lstm-crf | **95.26%** | **95.90%** | **94.63%** |\n\n- **CCKS2019 Medical Entity Extraction**\n\n| Method  
                                      | F1         | P          | R          |\n| --------------------------------------------- | ---------- | ---------- | ---------- |\n| char-bigram + lstm-crf                        | 81.76%     | 82.91%     | 80.6%      |\n| + domain transfer (from CCKS2018 to CCKS2019) | 82.54%     | 83.43%     | 81.81%     |\n| char-bigram + adTransformer-crf               | 82.83%     | 82.19%     | 83.49%     |\n| char-bigram + lexicon-augment + lstm-crf      | 82.76%     | 82.79%     | 82.72%     |\n| BERT-finetune+crf                             | 83.49%     | 84.11%     | 82.89%     |\n| RoBERTa-finetune+crf                          | 83.66%     | 83.67%     | 83.66%     |\n| char-bigram-BERT + lstm-crf                   | 83.37%     | 83.51%     | 83.22%     |\n| char-bigram-BERT + lexicon-augment + lstm-crf | **84.15%** | **84.29%** | **84.01%** |\n\n- **CCKS2020 Medical Entity Extraction**\n\n(Note: the test set is the same as for CCKS2019; samples in the CCKS2020 training set that already appear in the 2019 test set were removed. The metrics below were obtained without rule-based post-processing or model ensembling.)\n\n| Method                                        | F1     | P      | R      |\n| --------------------------------------------- | ------ | ------ | ------ |\n| char-bigram + lstm-crf                        | 82.68% | 83.14% | 82.22% |\n| char-bigram + lexicon-augment + lstm-crf      | 83.12% | 83.10% | 83.14% |\n| char-bigram-BERT + lstm-crf                   | 83.12% | 83.04% | 83.21% |\n| char-bigram-BERT-RoBERTa_wwm + lstm-crf       | 83.66% | 83.76% | 83.56% |\n| char-bigram-BERT-XLNet + lstm-crf             | 84.12% | 83.88% | 84.36% |\n| char-bigram-BERT + lexicon-augment + lstm-crf | 84.50% | 84.32% | 84.67% |\n\n- **CCKS2020 Named Entity Recognition for Test and Evaluation**: TODO\n\n#### 2. 
Joint Entity and Relation Extraction\n\n[Usage instructions](https://github.com/loujie0822/DeepIE/blob/master/docs/关系抽取run说明.md)\n\n- 2019 Language and Intelligence Technology Competition: Relation Extraction Task\n\n| Method                                     | F1 (dev)   | P (dev)    | R (dev)    |\n| ------------------------------------------ | ---------- | ---------- | ---------- |\n| multi head selection                       | 76.36%     | 79.24%     | 73.69%     |\n| ETL-BIES                                   | 77.07%     | 77.13%     | 77.06%     |\n| ETL-Span                                   | 78.94%     | 80.11%     | 77.8%      |\n| ETL-Span + word2vec                        | 79.99%     | 80.62%     | 79.38%     |\n| ETL-Span + word2vec + adversarial training | 80.38%     | 79.95%     | 80.82%     |\n| ETL-Span + BERT                            | **81.88%** | **82.35%** | **81.42%** |\n\n- 2020 Language and Intelligence Technology Competition: Relation Extraction Task\n\n| Method          | F1 (dev) | P (dev) | R (dev) |\n| --------------- | -------- | ------- | ------- |\n| ETL-Span + BERT | 74.58    | 74.44   | 74.71   |\n\n#### 3. Attribute Extraction\n\n- **Domain dataset: Ruijin Hospital diabetes information extraction data**\n\n```\n# Drug attributes: frequency, duration, dosage, administration method, adverse reaction\n['药品-用药频率','药品-持续时间','药品-用药剂量','药品-用药方法','药品-不良反应']\n# Disease attributes: examination method, clinical manifestation, non-drug treatment, drug name, body site\n['疾病-检查方法','疾病-临床表现','疾病-非药治疗','疾病-药品名称','疾病-部位']\n```\n\n| Subject | Method                             | F1    | P     | R     |\n| ------- | ---------------------------------- | ----- | ----- | ----- |\n| Disease | lstm + multi-label pointer network | 76.55 | 74.36 | 78.86 |\n| Disease | BERT + multi-label pointer network | 77.59 | 77.45 | 77.74 |\n| Drug    | lstm + multi-label pointer network | 81.12 | 79.15 | 83.19 |\n\n#### 4. 
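Entity Linking / Normalization\n\n#### 5. Event Extraction\n\n- **CCKS2020 Medical Event Extraction**\n\n- **CCKS2020: Document-Level Event Subject Extraction in the Financial Domain**\n\n- **CCKS2020: Document-Level Event Element Extraction in the Financial Domain**\n\n#### 6. Low-Resource Solutions for Information Extraction\n\n## TODO-list\n\n- [ ] Collection of data resources for information extraction:\n  - Medical\n  - Finance\n  - E-commerce\n  - Legal\n- [ ] Collection of information extraction competitions:\n  - Baidu 2020 Language and Intelligence Technology Competition: Relation Extraction Task\n  - Baidu 2020 Language and Intelligence Technology Competition: Event Extraction Task\n  - Baidu 2019 Language and Intelligence Technology Competition: Information Extraction\n  - CCKS 2019 Medical Named Entity Recognition\n  - CHIP 2019 Clinical Terminology Normalization Task\n  - CCKS 2019 Person Relation Extraction\n  - CCKS 2019 Public Company Announcement Information Extraction\n  - CCKS 2019 Event Subject Extraction in the Financial Domain\n\n- Extractive summarization\n\n- Applications of cutting-edge techniques in information extraction\n\n## Reference\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Floujie0822%2FDeepIE","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Floujie0822%2FDeepIE","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Floujie0822%2FDeepIE/lists"}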