https://github.com/taishan1994/w2ner_predict

[Unofficial] Predict code for AAAI 2022 paper: Unified Named Entity Recognition as Word-Word Relation Classification
https://github.com/taishan1994/w2ner_predict

Last synced: 3 months ago
JSON representation

[Unofficial] Predict code for AAAI 2022 paper: Unified Named Entity Recognition as Word-Word Relation Classification

Host: GitHub
URL: https://github.com/taishan1994/w2ner_predict
Owner: taishan1994
Created: 2022-03-21T03:43:23.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-09-05T06:05:52.000Z (about 3 years ago)
Last Synced: 2023-03-04T13:53:24.570Z (over 2 years ago)
Language: Python
Homepage:
Size: 282 KB
Stars: 31
Watchers: 1
Forks: 6
Open Issues: 7
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

README

          # W2NER_predict

\[Unofficial\] Predict code for AAAI 2022 paper: Unified Named Entity Recognition as Word-Word Relation Classification

# 说明

该项目是对上述论文的一个复现，并添加了预测代码。以及部分注释。


源论文：https://arxiv.org/pdf/2112.10070.pdf


源代码：https://github.com/ljynlp/w2ner

# 步骤

## 1、安装以下依赖

```

- numpy (1.21.4)

- torch (1.10.0)

- gensim (4.1.2)

- transformers (4.13.0)

- pandas (1.3.4)

- scikit-learn (1.0.1)

- prettytable (2.4.0)

```

## 2、下载数据

这里以resume-zh为例，数据下载地址：link。需要翻墙。


下载好resume-zh将其放置在data文件夹下。

## 3、训练、验证及测试

执行训练语句：

```python

python main.py --config ./config/resume-zh.json

```

训练好后会在主目录下生成model.pt文件。结果：

```

......

2022-03-21 04:06:44 - INFO: Epoch: 9

2022-03-21 04:08:37 - INFO: 

+---------+--------+--------+-----------+--------+

| Train 9 |  Loss  |   F1   | Precision | Recall |

+---------+--------+--------+-----------+--------+

|  Label  | 0.0005 | 0.9762 |   0.9791  | 0.9735 |

+---------+--------+--------+-----------+--------+

2022-03-21 04:08:41 - INFO: EVAL Label F1 [0.99977967 0.98848258 0.99099099 1.         1.         0.95695839

 0.97695853 0.94339623 0.92307692 1.        ]

2022-03-21 04:08:41 - INFO: 

+--------+--------+-----------+--------+

| EVAL 9 |   F1   | Precision | Recall |

+--------+--------+-----------+--------+

| Label  | 0.9780 |   0.9661  | 0.9911 |

| Entity | 0.9577 |   0.9542  | 0.9613 |

+--------+--------+-----------+--------+

2022-03-21 04:08:46 - INFO: TEST Label F1 [0.99978522 0.98747049 0.99115044 1.         1.         0.95752896

 0.98666667 0.94429708 0.90410959 1.        ]

2022-03-21 04:08:46 - INFO: 

+--------+--------+-----------+--------+

| TEST 9 |   F1   | Precision | Recall |

+--------+--------+-----------+--------+

| Label  | 0.9771 |   0.9655  | 0.9905 |

| Entity | 0.9617 |   0.9602  | 0.9632 |

+--------+--------+-----------+--------+

2022-03-21 04:08:46 - INFO: Best DEV F1: 0.9597

2022-03-21 04:08:46 - INFO: Best TEST F1: 0.9621

2022-03-21 04:08:51 - INFO: TEST Label F1 [0.99978775 0.98756339 0.97391304 1.         1.         0.95696853

 0.98666667 0.95169946 0.89189189 1.        ]

2022-03-21 04:08:51 - INFO: 

+------------+--------+-----------+--------+

| TEST Final |   F1   | Precision | Recall |

+------------+--------+-----------+--------+

|   Label    | 0.9748 |   0.9614  | 0.9906 |

|   Entity   | 0.9621 |   0.9591  | 0.9650 |

+------------+--------+-----------+--------+

```

## 4、预测

预测的数据处理代码在data_loader.py里面，主运行代码在predict.py里面。执行指令：

```python

python predict.py --config ./config/resume-zh.json

```

预测结果：

```

texts = [

  "高勇，男，中国国籍，无境外居留权。",

  "常见量，男。"

]

[

  ('高勇，男，中国国籍，无境外居留权。', ('中国国籍', 'cont', 5, 8), ('高勇', 'name', 0, 1)), 

  ('常见量，男。', ('常见量', 'name', 0, 2))

]

```

最后需要注意的是源代码默认是使用gpu的，如果要切换的cpu，那么相关的地方还要进行修改。

# Acknowledgement

> https://github.com/ljynlp/w2ner

# 补充

[信息抽取三剑客：实体抽取、关系抽取、事件抽取](https://github.com/taishan1994/chinese_information_extraction)


[基于机器阅读理解的命名实体识别](https://github.com/taishan1994/BERT_MRC_NER_chinese)


[pytorch_bert_bilstm_crf命名实体识别](https://github.com/taishan1994/pytorch_bert_bilstm_crf_ner)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/taishan1994/w2ner_predict

Awesome Lists containing this project

README