{"id":28676520,"url":"https://github.com/zjunlp/hvpnet","last_synced_at":"2025-09-12T19:35:48.554Z","repository":{"id":41278037,"uuid":"439526108","full_name":"zjunlp/HVPNeT","owner":"zjunlp","description":"[NAACL 2022 Findings] Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction","archived":false,"fork":false,"pushed_at":"2025-03-13T12:28:32.000Z","size":1976,"stargazers_count":110,"open_issues_count":0,"forks_count":11,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-06-13T23:04:58.685Z","etag":null,"topics":["bert","dataset","entity-extraction","hvpnet","information-extraction","kg","multimodal","multimodal-knowledge-graph","multimodal-learning","naacl","ner","prefix","pytorch","re","relation-extraction"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zjunlp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-12-18T04:19:56.000Z","updated_at":"2025-04-15T10:17:05.000Z","dependencies_parsed_at":"2025-03-13T13:26:30.157Z","dependency_job_id":"4609d40a-7e4e-416b-84b5-620ccdcf818a","html_url":"https://github.com/zjunlp/HVPNeT","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/zjunlp/HVPNeT","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FHVPNeT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FHVPNeT/tags","releases_url":"https://repos.ecosyste.ms/api/v
1/hosts/GitHub/repositories/zjunlp%2FHVPNeT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FHVPNeT/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zjunlp","download_url":"https://codeload.github.com/zjunlp/HVPNeT/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zjunlp%2FHVPNeT/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":274864182,"owners_count":25364230,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-12T02:00:09.324Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","dataset","entity-extraction","hvpnet","information-extraction","kg","multimodal","multimodal-knowledge-graph","multimodal-learning","naacl","ner","prefix","pytorch","re","relation-extraction"],"created_at":"2025-06-13T23:04:58.738Z","updated_at":"2025-09-12T19:35:48.499Z","avatar_url":"https://github.com/zjunlp.png","language":"Python","readme":"# HVPNet\n\nCode for the NAACL 2022 (Findings) paper \"[Good Visual Guidance Makes A Better Extractor: Hierarchical Visual Prefix for Multimodal Entity and Relation Extraction](https://arxiv.org/pdf/2205.03521.pdf)\".\n\nModel Architecture\n==========\n\u003cdiv align=center\u003e\n\u003cimg src=\"resource/model.png\" width=\"80%\" height=\"80%\" /\u003e\n\u003c/div\u003e\nThe 
overall architecture of our hierarchical modality fusion network.\n\n\nRequirements\n==========\nTo run the code, install the requirements:\n```\npip install -r requirements.txt\n```\n\nData Preprocessing\n==========\nTo extract visual object images, we first use the NLTK parser to extract noun phrases from the text and then apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. The detailed steps are as follows:\n\n1. Use the NLTK parser (or spaCy, TextBlob) to extract noun phrases from the text.\n2. Apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. Taking the twitter2015 dataset as an example, the extracted objects are stored in `twitter2015_aux_images`. Object images follow the naming format `imgname_pred_yolo_crop_num.png`, where `imgname` is the name of the raw image the object comes from and `num` is the index of the object predicted by the toolkit. (Note that in `train/val/test.txt`, text and raw images have a one-to-one relationship, so `imgname` can be used as a unique identifier for a raw image.)\n3. Establish the correspondence between the raw images and the objects by constructing a dictionary. Taking `twitter2015/twitter2015_train_dict.pth` as an example, the dictionary has the format `{imgname: ['imgname_pred_yolo_crop_num0.png', 'imgname_pred_yolo_crop_num1.png', ...]}`, where the key is the name of a raw image and the value is a list of its object images.\n\nThe detected objects and the dictionaries mapping raw images to objects are available via our data links.\n\nData Download\n==========\n\n+ Twitter2015 \u0026 Twitter2017\n\n    The text data follows the CoNLL format. 
You can download the Twitter2015 data via this [link](https://drive.google.com/file/d/1qAWrV9IaiBadICFb7mAreXy3llao_teZ/view?usp=sharing) and the Twitter2017 data via this [link](https://drive.google.com/file/d/1ogfbn-XEYtk9GpUECq1-IwzINnhKGJqy/view?usp=sharing). Please place them in `data/NER_data`.\n\n    You can also put them anywhere else and modify the path configuration in `run.py`.\n\n+ MNRE\n    \n    The MNRE dataset comes from [MEGA](https://github.com/thecharm/MNRE); many thanks.\n\n    You can download the MRE dataset with detected visual objects from [Google Drive](https://drive.google.com/file/d/1q5_5vnHJ8Hik1iLA9f5-6nstcvvntLrS/view?usp=sharing) or use the following commands:\n    ```bash\n    cd data\n    wget 120.27.214.45/Data/re/multimodal/data.tar.gz\n    tar -xzvf data.tar.gz\n    mv data RE_data\n    ```\n\nThe expected file structure is:\n\n```\nHMNeT\n |-- data\n |    |-- NER_data\n |    |    |-- twitter2015  # text data\n |    |    |    |-- train.txt\n |    |    |    |-- valid.txt\n |    |    |    |-- test.txt\n |    |    |    |-- twitter2015_train_dict.pth  # {imgname: [object-image]}\n |    |    |    |-- ...\n |    |    |-- twitter2015_images       # raw image data\n |    |    |-- twitter2015_aux_images   # object image data\n |    |    |-- twitter2017\n |    |    |-- twitter2017_images\n |    |    |-- twitter2017_aux_images\n |    |-- RE_data\n |    |    |-- img_org          # raw image data\n |    |    |-- img_vg           # object image data\n |    |    |-- txt              # text data\n |    |    |-- ours_rel2id.json # relation data\n |-- models\t# models\n |    |-- bert_model.py\n |    |-- modeling_bert.py\n |-- modules\n |    |-- metrics.py    # metric\n |    |-- train.py  # trainer\n |-- processor\n |    |-- dataset.py    # processor, dataset\n |-- logs     # code logs\n |-- run.py   # main \n |-- run_ner_task.sh\n |-- run_re_task.sh\n```\n\nTrain\n==========\n\n## NER Task\n\nThe data path and GPU-related configuration are 
in `run.py`. To train the NER model, run the corresponding script:\n\n```shell\nbash run_twitter15.sh\nbash run_twitter17.sh\n```\n\n## RE Task\n\nTo train the RE model, run this script:\n\n```shell\nbash run_re_task.sh\n```\n\nTest\n==========\n## NER Task\n\nTo test the NER model, set `load_path` to the path of a trained checkpoint and run the following script:\n\n```shell\npython -u run.py \\\n      --dataset_name=\"twitter15/twitter17\" \\\n      --bert_name=\"bert-base-uncased\" \\\n      --seed=1234 \\\n      --only_test \\\n      --max_seq=80 \\\n      --use_prompt \\\n      --prompt_len=4 \\\n      --sample_ratio=1.0 \\\n      --load_path='your_ner_ckpt_path'\n```\n\n## RE Task\n\nTo test the RE model, set `load_path` to the path of a trained checkpoint and run the following script:\n\n```shell\npython -u run.py \\\n      --dataset_name=\"MRE\" \\\n      --bert_name=\"bert-base-uncased\" \\\n      --seed=1234 \\\n      --only_test \\\n      --max_seq=80 \\\n      --use_prompt \\\n      --prompt_len=4 \\\n      --sample_ratio=1.0 \\\n      --load_path='your_re_ckpt_path'\n```\n\nAcknowledgement\n==========\n\nThe acquisition of the Twitter15 and Twitter17 data refers to the code from [UMT](https://github.com/jefferyYu/UMT/); many thanks.\n\nThe acquisition of the MNRE data for the multimodal relation extraction task refers to the code from [MEGA](https://github.com/thecharm/Mega); many thanks.\n\nPapers for the Project \u0026 How to Cite\n==========\n\nIf you use or extend our work, please cite the paper as follows:\n\n```bibtex\n@inproceedings{DBLP:conf/naacl/ChenZLYDTHSC22,\n  author    = {Xiang Chen and\n               Ningyu Zhang and\n               Lei Li and\n               Yunzhi Yao and\n               Shumin Deng and\n               Chuanqi Tan and\n               Fei Huang and\n               Luo Si and\n               Huajun Chen},\n  editor    = {Marine Carpuat and\n               Marie{-}Catherine de Marneffe and\n               Iv{\\'{a}}n Vladimir 
Meza Ru{\\'{\\i}}z},\n  title     = {Good Visual Guidance Make {A} Better Extractor: Hierarchical Visual\n               Prefix for Multimodal Entity and Relation Extraction},\n  booktitle = {Findings of the Association for Computational Linguistics: {NAACL}\n               2022, Seattle, WA, United States, July 10-15, 2022},\n  pages     = {1607--1618},\n  publisher = {Association for Computational Linguistics},\n  year      = {2022},\n  url       = {https://doi.org/10.18653/v1/2022.findings-naacl.121},\n  doi       = {10.18653/v1/2022.findings-naacl.121},\n  timestamp = {Tue, 23 Aug 2022 08:36:33 +0200},\n  biburl    = {https://dblp.org/rec/conf/naacl/ChenZLYDTHSC22.bib},\n  bibsource = {dblp computer science bibliography, https://dblp.org}\n}\n```\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fhvpnet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzjunlp%2Fhvpnet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzjunlp%2Fhvpnet/lists"}