# HVPNet
Code for the NAACL2022 (Findings) paper "[Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction](https://arxiv.org/pdf/2205.03521.pdf)".
Model Architecture
==========
The overall architecture of our hierarchical modality fusion network.
Requirements
==========
To run the codes, you need to install the requirements:
```bash
pip install -r requirements.txt
```
Data Preprocess
==========
To extract visual object images, we first use the NLTK parser to extract noun phrases from the text and then apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. The detailed steps are as follows:
1. Use the NLTK parser (or spaCy, TextBlob) to extract noun phrases from the text.
2. Apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. Taking the Twitter2015 dataset as an example, the extracted objects are stored in `twitter2015_aux_images`. The object images follow the naming format `imgname_pred_yolo_crop_num.png`, where `imgname` is the name of the raw image the object was cropped from and `num` is the index of the object predicted by the toolkit. (Note that in `train/valid/test.txt` each text corresponds to exactly one raw image, so `imgname` can be used as a unique identifier for the raw image.)
3. Establish the correspondence between the raw images and the objects. We construct a dictionary that records this correspondence. Taking `twitter2015/twitter2015_train_dict.pth` as an example, the dictionary has the format `{imgname: ['imgname_pred_yolo_crop_num0.png', 'imgname_pred_yolo_crop_num1.png', ...]}`, where each key is the name of a raw image and the value is the list of its object images (see the sketch after this section).
The detected objects and the dictionary mapping raw images to their objects are available via our data links.
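As a reference for steps 1 and 3, here is a minimal sketch (not part of this repository): the chunk grammar, directory paths, and output file name are assumptions for illustration, and the dictionaries shipped with our data links remain the ground truth.
```python
import os
from collections import defaultdict

import nltk
import torch

# Step 1: noun-phrase extraction with NLTK (hypothetical chunk grammar).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")

def noun_phrases(text):
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    tree = chunker.parse(tagged)
    return [" ".join(tok for tok, _ in st.leaves())
            for st in tree.subtrees(filter=lambda t: t.label() == "NP")]

print(noun_phrases("a man holding a red umbrella on the street"))

# Step 3: map each raw image to its cropped object images.
# Crops are named "<imgname>_pred_yolo_crop_<num>.png", so the raw image
# name can be recovered by splitting off the suffix.
aux_dir = "data/NER_data/twitter2015_aux_images"  # assumed location
mapping = defaultdict(list)
for fname in sorted(os.listdir(aux_dir)):
    imgname = fname.split("_pred_yolo_crop_")[0]
    mapping[imgname].append(fname)

# Same {imgname: [object images]} format as the provided *_dict.pth files.
torch.save(dict(mapping), "twitter2015_train_dict.pth")
```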
Data Download
==========
+ Twitter2015 & Twitter2017
The text data follows the CoNLL format. You can download the Twitter2015 data via this [link](https://drive.google.com/file/d/1qAWrV9IaiBadICFb7mAreXy3llao_teZ/view?usp=sharing) and the Twitter2017 data via this [link](https://drive.google.com/file/d/1ogfbn-XEYtk9GpUECq1-IwzINnhKGJqy/view?usp=sharing). Please place them in `data/NER_data`.
You can also put them anywhere else and modify the path configuration in `run.py`.
+ MNRE
The MNRE dataset comes from [MEGA](https://github.com/thecharm/MNRE), many thanks.
You can download the MRE dataset with detected visual objects from [Google Drive](https://drive.google.com/file/d/1q5_5vnHJ8Hik1iLA9f5-6nstcvvntLrS/view?usp=sharing) or use the following commands:
```bash
cd data
wget 120.27.214.45/Data/re/multimodal/data.tar.gz
tar -xzvf data.tar.gz
mv data RE_data
```
The expected structure of files is:
```
HMNeT
|-- data
| |-- NER_data
| | |-- twitter2015 # text data
| | | |-- train.txt
| | | |-- valid.txt
| | | |-- test.txt
| | | |-- twitter2015_train_dict.pth # {imgname: [object-image]}
| | | |-- ...
| | |-- twitter2015_images # raw image data
| | |-- twitter2015_aux_images # object image data
| | |-- twitter2017
| | |-- twitter2017_images
| | |-- twitter2017_aux_images
| |-- RE_data
| | |-- img_org # raw image data
| | |-- img_vg # object image data
| | |-- txt # text data
| | |-- ours_rel2id.json # relation data
|-- models # models
| |-- bert_model.py
| |-- modeling_bert.py
|-- modules
| |-- metrics.py # metric
| |-- train.py # trainer
|-- processor
| |-- dataset.py # processor, dataset
|-- logs # code logs
|-- run.py # main
|-- run_ner_task.sh
|-- run_re_task.sh
```
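After downloading, you can sanity-check the layout by loading one of the correspondence dictionaries and resolving its object crops. A minimal sketch, assuming the directory structure above (paths are illustrative):
```python
import os
import torch

ner_root = "data/NER_data"
d = torch.load(os.path.join(ner_root, "twitter2015", "twitter2015_train_dict.pth"))
print(f"{len(d)} raw images with detected objects")

# Inspect one entry: raw image name -> list of object-crop file names.
imgname, crops = next(iter(d.items()))
print(imgname, "->", crops)

# Each crop name should resolve to a file under twitter2015_aux_images.
for crop in crops:
    path = os.path.join(ner_root, "twitter2015_aux_images", crop)
    assert os.path.exists(path), f"missing object image: {path}"
```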
Train
==========
## NER Task
The data path and GPU-related configuration are in `run.py`. To train the NER models, run the following scripts:
```shell
bash run_twitter15.sh
bash run_twitter17.sh
```
## RE Task
To train the RE model, run the following script:
```shell
bash run_re_task.sh
```
Test
==========
## NER Task
To test the NER model, set `load_path` to the path of a trained checkpoint and run the following script:
```shell
python -u run.py \
--dataset_name="twitter15/twitter17" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_ner_ckpt_path'
```
## RE Task
To test the RE model, set `load_path` to the path of a trained checkpoint and run the following script:
```shell
python -u run.py \
--dataset_name="MRE" \
--bert_name="bert-base-uncased" \
--seed=1234 \
--only_test \
--max_seq=80 \
--use_prompt \
--prompt_len=4 \
--sample_ratio=1.0 \
--load_path='your_re_ckpt_path'
```
Acknowledgement
==========
The Twitter2015 and Twitter2017 data are acquired with code from [UMT](https://github.com/jefferyYu/UMT/), many thanks.
The MNRE data for the multimodal relation extraction task are acquired with code from [MEGA](https://github.com/thecharm/Mega), many thanks.
Papers for the Project & How to Cite
==========
If you use or extend our work, please cite the paper as follows:
```bibtex
@inproceedings{DBLP:conf/naacl/ChenZLYDTHSC22,
author = {Xiang Chen and
Ningyu Zhang and
Lei Li and
Yunzhi Yao and
Shumin Deng and
Chuanqi Tan and
Fei Huang and
Luo Si and
Huajun Chen},
editor = {Marine Carpuat and
Marie{-}Catherine de Marneffe and
Iv{\'{a}}n Vladimir Meza Ru{\'{\i}}z},
title = {Good Visual Guidance Make {A} Better Extractor: Hierarchical Visual
Prefix for Multimodal Entity and Relation Extraction},
booktitle = {Findings of the Association for Computational Linguistics: {NAACL}
2022, Seattle, WA, United States, July 10-15, 2022},
pages = {1607--1618},
publisher = {Association for Computational Linguistics},
year = {2022},
url = {https://doi.org/10.18653/v1/2022.findings-naacl.121},
doi = {10.18653/v1/2022.findings-naacl.121},
timestamp = {Tue, 23 Aug 2022 08:36:33 +0200},
biburl = {https://dblp.org/rec/conf/naacl/ChenZLYDTHSC22.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
```