# HVPNet
Code for the NAACL2022 (Findings) paper "[Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction](https://arxiv.org/pdf/2205.03521.pdf)".
Model Architecture
==========
The overall architecture of our hierarchical modality fusion network.
Requirements
==========
To run the codes, you need to install the requirements:
```bash
pip install -r requirements.txt
```
Data Preprocess
==========
To extract visual object images, we first use the NLTK parser to extract noun phrases from the text and then apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. The detailed steps are as follows:
1. Use the NLTK parser (or spaCy, TextBlob) to extract noun phrases from the text.
2. Apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. Taking the Twitter2015 dataset as an example, the extracted objects are stored in `twitter2015_aux_images`. The object images follow the naming format `imgname_pred_yolo_crop_num.png`, where `imgname` is the name of the raw image the object was cropped from and `num` is the index of the object predicted by the toolkit. (Note that in `train/valid/test.txt` each text corresponds to exactly one raw image, so `imgname` can be used as a unique identifier for the raw image.)
3. Establish the correspondence between the raw images and the objects. We construct a dictionary that records this correspondence. Taking `twitter2015/twitter2015_train_dict.pth` as an example, the dictionary has the format `{imgname: ['imgname_pred_yolo_crop_num0.png', 'imgname_pred_yolo_crop_num1.png', ...]}`, where each key is the name of a raw image and the value is the list of its object images (see the sketch after this section).
The detected objects and the dictionary mapping raw images to their objects are available via our data links.
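As a reference for steps 1 and 3, here is a minimal sketch (not part of this repository): the chunk grammar, directory paths, and output file name are assumptions for illustration, and the dictionaries shipped with our data links remain the ground truth.
```python
import os
from collections import defaultdict

import nltk
import torch

# Step 1: noun-phrase extraction with NLTK (hypothetical chunk grammar).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

def noun_phrases(text):
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    tree = chunker.parse(tagged)
    return [" ".join(tok for tok, _ in st.leaves())
            for st in tree.subtrees(filter=lambda t: t.label() == "NP")]

print(noun_phrases("a man holding a red umbrella on the street"))

# Step 3: map each raw image to its cropped object images.
# Crops are named "<imgname>_pred_yolo_crop_<num>.png", so the raw image
# name can be recovered by splitting off the suffix.
aux_dir = "data/NER_data/twitter2015_aux_images"  # assumed location
mapping = defaultdict(list)
for fname in sorted(os.listdir(aux_dir)):
    imgname = fname.split("_pred_yolo_crop_")[0]
    mapping[imgname].append(fname)

# Same {imgname: [object images]} format as the provided *_dict.pth files.
torch.save(dict(mapping), "twitter2015_train_dict.pth")
```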
Data Download
==========
+ Twitter2015 & Twitter2017
The text data follows the CoNLL format. You can download the Twitter2015 data via this [link](https://drive.google.com/file/d/1qAWrV9IaiBadICFb7mAreXy3llao_teZ/view?usp=sharing) and the Twitter2017 data via this [link](https://drive.google.com/file/d/1ogfbn-XEYtk9GpUECq1-IwzINnhKGJqy/view?usp=sharing). Please place them in `data/NER_data`.
You can also put them anywhere else and modify the path configuration in `run.py`.
+ MNRE
The MNRE dataset comes from [MEGA](https://github.com/thecharm/MNRE), many thanks.
You can download the MRE dataset with detected visual objects from [Google Drive](https://drive.google.com/file/d/1q5_5vnHJ8Hik1iLA9f5-6nstcvvntLrS/view?usp=sharing) or use the following commands:
```bash
cd data
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
mv data RE_data
```
The expected structure of files is:
```
HMNeT
|-- data
| |-- NER_data
| | |-- twitter2015 # text data
| | | |-- train.txt
| | | |-- valid.txt
| | | |-- test.txt
| | | |-- twitter2015_train_dict.pth # {imgname: [object-image]}
| | | |-- ...
| | |-- twitter2015_images # raw image data
| | |-- twitter2015_aux_images # object image data
| | |-- twitter2017
| | |-- twitter2017_images
| | |-- twitter2017_aux_images
| |-- RE_data
| | |-- img_org # raw image data
| | |-- img_vg # object image data
| | |-- txt # text data
| | |-- ours_rel2id.json # relation data
|-- models # models
| |-- bert_model.py
| |-- modeling_bert.py
|-- modules
| |-- metrics.py # metric
| |-- train.py # trainer
|-- processor
| |-- dataset.py # processor, dataset
|-- logs # code logs
|-- run.py # main
|-- run_ner_task.sh
|-- run_re_task.sh
```
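After downloading, you can sanity-check the layout by loading one of the correspondence dictionaries and resolving its object crops. A minimal sketch, assuming the directory structure above (paths are illustrative):
```python
import os
import torch

ner_root = "data/NER_data"
d = torch.load(os.path.join(ner_root, "twitter2015", "twitter2015_train_dict.pth"))
print(f"{len(d)} raw images with detected objects")

# Inspect one entry: raw image name -> list of object-crop file names.
imgname, crops = next(iter(d.items()))
print(imgname, "->", crops)

# Each crop name should resolve to a file under twitter2015_aux_images.
for crop in crops:
    path = os.path.join(ner_root, "twitter2015_aux_images", crop)
    assert os.path.exists(path), f"missing object image: {path}"
```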
Train
==========
## NER Task
The data path and GPU-related configuration are in `run.py`. To train the NER models, run the following scripts:
```shell
bash run_twitter15.sh
bash run_twitter17.sh
```
## RE Task
To train the RE model, run the following script:
```shell
bash run_re_task.sh
```
Test
==========
## NER Task
To test the NER model, set `load_path` to the path of a trained checkpoint and run the following script:
```shell
python -u run.py \
--dataset_name="twitter15/twitter17" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_ner_ckpt_path'
```
## RE Task
To test the RE model, set `load_path` to the path of a trained checkpoint and run the following script:
```shell
python -u run.py \
--dataset_name="MRE" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_re_ckpt_path'
```
Acknowledgement
==========
The Twitter2015 and Twitter2017 data are acquired with code from [UMT](https://github.com/jefferyYu/UMT/), many thanks.
The MNRE data for the multimodal relation extraction task are acquired with code from [MEGA](https://github.com/thecharm/Mega), many thanks.
Papers for the Project & How to Cite
==========
If you use or extend our work, please cite the paper as follows:
```bibtex
@inproceedings{DBLP:conf/naacl/ChenZLYDTHSC22,
author = {Xiang Chen and
Ningyu Zhang and
Lei Li and
Yunzhi Yao and
Shumin Deng and
Chuanqi Tan and
Fei Huang and
Luo Si and
Huajun Chen},
editor = {Marine Carpuat and
Marie{-}Catherine de Marneffe and
Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z},
title = {Good Visual Guidance Make {A} Better Extractor: Hierarchical Visual
Prefix for Multimodal Entity and Relation Extraction},
booktitle = {Findings of the Association for Computational Linguistics: {NAACL}
2022, Seattle, WA, United States, July 10-15, 2022},
pages = {1607--1618},
publisher = {Association for Computational Linguistics},
year = {2022},
url = {https://doi.org/10.18653/v1/2022.findings-naacl.121},
doi = {10.18653/v1/2022.findings-naacl.121},
timestamp = {Tue, 23 Aug 2022 08:36:33 +0200},
biburl = {https://dblp.org/rec/conf/naacl/ChenZLYDTHSC22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```