# MKGFormer

Code for the SIGIR 2022 paper "[Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion](https://arxiv.org/pdf/2205.02357.pdf)"

- ❗NOTE: We provide some KGE baselines at [OpenBG-IMG](https://github.com/OpenBGBenchmark/OpenBG-IMG).
- ❗NOTE: We have released a new MKG task, "[Multimodal Analogical Reasoning over Knowledge Graphs](https://arxiv.org/abs/2210.00312) (ICLR'2023)", at [MKG_Analogy](https://zjunlp.github.io/project/MKG_Analogy/).

# Model Architecture

<div align=center>
<img src="resource/model.png" width="75%" height="75%" />
</div>

Illustration of MKGformer for (a) Unified Multimodal KGC Framework
and (b) Detailed M-Encoder.

# Requirements

To run the code (**Python 3.8**), install the requirements:
```
pip install -r requirements.txt
```

Data Preprocess
==========
To extract visual object images for the MNER and MRE tasks, we first use the NLTK parser to extract noun phrases from the text and then apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. The detailed steps are as follows:

1. Use the NLTK parser (or spaCy, TextBlob) to extract noun phrases from the text.
2. Apply the [visual grounding toolkit](https://github.com/zyang-ur/onestage_grounding) to detect objects. Taking the twitter2017 dataset as an example, the extracted objects are stored in `twitter2017_aux_images`. The object images follow the naming format `id_pred_yolo_crop_num.png`, where `id` is the index of the raw image the object belongs to and `num` is the index of the object predicted by the toolkit. (The value of `id` itself does not matter.)
3. Establish the correspondence between the raw images and the objects. We construct a dictionary to record this correspondence. Taking `twitter2017/twitter2017_train_dict.pth` as an example, the dictionary has the format `{imgname: ['id_pred_yolo_crop_num0.png', 'id_pred_yolo_crop_num1.png', ...]}`, where the key is the name of a raw image and the value is a list of its objects. (Note that in `train/val/test.txt`, text and raw images have a one-to-one relationship, so `imgname` can serve as a unique identifier for each raw image.)

The detected objects and the dictionary of correspondences between raw images and objects are available from our data links.

# Data Download

The datasets used in our experiments are as follows:


+ Twitter2017

    You can download the twitter2017 dataset from [Google Drive](https://drive.google.com/file/d/1ogfbn-XEYtk9GpUECq1-IwzINnhKGJqy/view?usp=sharing).

    For more information about the dataset, please refer to the [UMT](https://github.com/jefferyYu/UMT/) repository.

+ MRE

    The MRE dataset comes from [MEGA](https://github.com/thecharm/Mega); many thanks.

    You can download the **MRE dataset with detected visual objects** from [Google Drive](https://drive.google.com/file/d/1q5_5vnHJ8Hik1iLA9f5-6nstcvvntLrS/view?usp=sharing) or with the following commands:

    ```bash
    cd MRE
    wget 121.41.117.246/Data/re/multimodal/data.tar.gz
    tar -xzvf data.tar.gz
    ```

+ MKG

    + FB15K-237-IMG

        You can download the image data of FB15k-237 from [mmkb](https://github.com/mniepert/mmkb), which provides a list of image URLs; entity descriptions are available from the [kg-bert](https://github.com/yao8839836/kg-bert) repository.

        - **❗NOTE: We found a severe bug in the data preprocessing code for FB15k-237-IMG that led to an unfair performance comparison. We have updated the results on [arXiv](https://arxiv.org/pdf/2205.02357.pdf) and released the [checkpoints](https://drive.google.com/drive/folders/1NsLA7mXaVnhlYNvzRDWIcBxq2CpKF_6m) (models trained with and without the bug).**

    + WN18-IMG

        Entity images in WN18 can be obtained from ImageNet; for the specific steps, refer to the [RSME](https://github.com/wangmengsd/RSME) repository.
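If you detect objects for your own images instead of using the released files, the correspondence dictionary from step 3 of Data Preprocess can be rebuilt roughly as follows. This is a minimal sketch, not the repository's preprocessing code. It assumes that `id` in `id_pred_yolo_crop_num.png` is the 0-based position of the raw image in an ordered list of raw image names and that both `id` and `num` are integers; the function name `build_object_dict` and the sample file names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern for object crops named "<id>_pred_yolo_crop_<num>.png",
# following the naming format described in step 2 of Data Preprocess.
OBJ_NAME = re.compile(r"^(\d+)_pred_yolo_crop_(\d+)\.png$")


def build_object_dict(raw_image_names, object_files):
    """Map each raw image name to the list of its object crop file names.

    Assumption (not confirmed by the repository): `id` in the object file
    name is the 0-based index of the raw image in `raw_image_names`.
    """
    grouped = defaultdict(list)
    for fname in object_files:
        match = OBJ_NAME.match(fname)
        if match is None:
            continue  # skip files that do not follow the naming format
        img_idx, obj_num = int(match.group(1)), int(match.group(2))
        if img_idx < len(raw_image_names):
            grouped[raw_image_names[img_idx]].append((obj_num, fname))

    # Order each image's objects numerically by the predicted object index `num`.
    return {
        imgname: [fname for _, fname in sorted(objs)]
        for imgname, objs in grouped.items()
    }


if __name__ == "__main__":
    raw = ["img_a.jpg", "img_b.jpg"]  # hypothetical raw image names
    objs = [
        "0_pred_yolo_crop_1.png",
        "0_pred_yolo_crop_0.png",
        "1_pred_yolo_crop_0.png",
        "notes.txt",
    ]
    print(build_object_dict(raw, objs))
```

Sorting on the parsed integer `num` rather than on the file name keeps `..._crop_10.png` after `..._crop_2.png`. To obtain a `.pth` file like `twitter2017_train_dict.pth`, the resulting dictionary could be saved with `torch.save(mapping, path)`; whether the released files were produced this way is an assumption.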
We also provide additional network disk links for the **multimodal KG data (images) at [Google Drive](https://drive.google.com/file/d/197c4fCLVC6F7sCBqZIDwm5tjWeGJnZ5-/view?usp=share_link) or [Baidu Pan](https://pan.baidu.com/s/1TVArQSLmPjr2FsC8NkSiOA) (extraction code: ilbd)**.

The expected file structure is:


```
MKGFormer
 |-- MKG    # Multimodal Knowledge Graph
 |    |-- dataset       # task data
 |    |-- data          # data process file
 |    |-- lit_models    # lightning model
 |    |-- models        # mkg model
 |    |-- scripts       # running script
 |    |-- main.py
 |-- MNER   # Multimodal Named Entity Recognition
 |    |-- data          # task data
 |    |    |-- twitter2017
 |    |    |    |-- twitter17_detect            # rcnn detected objects
 |    |    |    |-- twitter2017_aux_images      # visual grounding objects
 |    |    |    |-- twitter2017_images          # raw images
 |    |    |    |-- train.txt                   # text data
 |    |    |    |-- ...
 |    |    |    |-- twitter2017_train_dict.pth  # {imgname: [object-image]}
 |    |    |    |-- ...
 |    |-- models        # mner model
 |    |-- modules       # running script
 |    |-- processor     # data process file
 |    |-- utils
 |    |-- run_mner.sh
 |    |-- run.py
 |-- MRE    # Multimodal Relation Extraction
 |    |-- data          # task data
 |    |    |-- img_detect   # rcnn detected objects
 |    |    |-- img_org      # raw images
 |    |    |-- img_vg       # visual grounding objects
 |    |    |-- txt          # text data
 |    |    |    |-- ours_train.txt
 |    |    |    |-- ours_val.txt
 |    |    |    |-- ours_test.txt
 |    |    |    |-- mre_train_dict.pth  # {imgid: [object-image]}
 |    |    |    |-- ...
 |    |    |-- vg_data      # [(id, imgname, noun_phrase)], not useful
 |    |    |-- ours_rel2id.json         # relation data
 |    |-- models        # mre model
 |    |-- modules       # running script
 |    |-- processor     # data process file
 |    |-- run_mre.sh
 |    |-- run.py
```

# How to run


+ ## MKG Task

    - First, run Image-text Incorporated Entity Modeling to train the entity embeddings.

    ```shell
    cd MKG
    bash scripts/pretrain_fb15k-237-image.sh
    ```

    - Then run Missing Entity Prediction.


    ```shell
    bash scripts/fb15k-237-image.sh
    ```

+ ## MNER Task

    To run the MNER task, run this script:

    ```shell
    cd MNER
    bash run_mner.sh
    ```

+ ## MRE Task

    To run the MRE task, run this script:

    ```shell
    cd MRE
    bash run_mre.sh
    ```

# Acknowledgement

The acquisition of image data for the multimodal link prediction task follows the code from [https://github.com/wangmengsd/RSME](https://github.com/wangmengsd/RSME); many thanks.

# Papers for the Project & How to Cite
If you use or extend our work, please cite the paper as follows:

```bibtex
@inproceedings{DBLP:conf/sigir/ChenZLDTXHSC22,
  author    = {Xiang Chen and
               Ningyu Zhang and
               Lei Li and
               Shumin Deng and
               Chuanqi Tan and
               Changliang Xu and
               Fei Huang and
               Luo Si and
               Huajun Chen},
  editor    = {Enrique Amig{\'{o}} and
               Pablo Castells and
               Julio Gonzalo and
               Ben Carterette and
               J. Shane Culpepper and
               Gabriella Kazai},
  title     = {Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge
               Graph Completion},
  booktitle = {{SIGIR} '22: The 45th International {ACM} {SIGIR} Conference on Research
               and Development in Information Retrieval, Madrid, Spain, July 11 -
               15, 2022},
  pages     = {904--915},
  publisher = {{ACM}},
  year      = {2022},
  url       = {https://doi.org/10.1145/3477495.3531992},
  doi       = {10.1145/3477495.3531992},
  timestamp = {Mon, 11 Jul 2022 12:19:20 +0200},
  biburl    = {https://dblp.org/rec/conf/sigir/ChenZLDTXHSC22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```