{"id":18532678,"url":"https://github.com/luogen1996/mcn","last_synced_at":"2025-07-14T10:06:27.304Z","repository":{"id":46590650,"uuid":"248506387","full_name":"luogen1996/MCN","owner":"luogen1996","description":"[CVPR2020] Multi-task Collaborative  Network for Joint  Referring Expression Comprehension and Segmentation, CVPR2020 (oral)","archived":false,"fork":false,"pushed_at":"2022-08-04T14:30:00.000Z","size":491,"stargazers_count":138,"open_issues_count":7,"forks_count":25,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-10T06:55:08.230Z","etag":null,"topics":["cvpr2020","multi-task-learning","referring-expression-comprehension","referring-expression-segmentation"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2003.08813","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/luogen1996.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-19T13:10:35.000Z","updated_at":"2025-03-01T14:44:51.000Z","dependencies_parsed_at":"2022-09-15T07:20:24.565Z","dependency_job_id":null,"html_url":"https://github.com/luogen1996/MCN","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/luogen1996/MCN","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luogen1996%2FMCN","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luogen1996%2FMCN/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luogen1996%2FMCN/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luogen1996%2FMCN/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/luogen1996","download_url":"https://codeload.github.com/luogen1996/MCN/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/luogen1996%2FMCN/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265277169,"owners_count":23739334,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cvpr2020","multi-task-learning","referring-expression-comprehension","referring-expression-segmentation"],"created_at":"2024-11-06T19:07:27.617Z","updated_at":"2025-07-14T10:06:27.279Z","avatar_url":"https://github.com/luogen1996.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation\n\n[![LICENSE](https://img.shields.io/badge/license-MIT-green)](https://github.com/luogen1996/MCN/blob/master/LICENSE)\n[![Python](https://img.shields.io/badge/python-3.6-blue.svg)](https://www.python.org/)\n![Keras](https://img.shields.io/badge/keras-%237732a8)\n\n[《Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation》](https://arxiv.org/abs/2003.08813)\n\nby Gen Luo, Yiyi Zhou, Xiaoshuai Sun, Liujuan Cao, Chenglin Wu, Cheng Deng and Rongrong Ji.\n\nIEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, Oral\n\n## Updates\n- (2022/4/20) A PyTorch implementation of MCN is available in the [SimREC project](https://github.com/luogen1996/SimREC).\n\n## Introduction\n\nThis repository is the Keras implementation of MCN. 
MCN is a multimodal and multitask collaborative learning framework. In MCN, RES can help REC to achieve better language-vision alignment, while REC can help RES to better locate the referent. In addition, we address a key challenge in this multi-task setup, i.e., the prediction conflict, with two innovative designs, namely Consistency Energy Maximization (CEM) and Adaptive Soft Non-Located Suppression (ASNLS). The network structure is illustrated as follows:\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/luogen1996/MCN/blob/master/fig1.png\" width=\"90%\"/\u003e\n\u003c/p\u003e\n\n## Citation\n\n    @InProceedings{Luo_2020_CVPR,\n    author = {Luo, Gen and Zhou, Yiyi and Sun, Xiaoshuai and Cao, Liujuan and Wu, Chenglin and Deng, Cheng and Ji, Rongrong},\n    title = {Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month = {June},\n    year = {2020}\n    }\n\n## Prerequisites\n\n- Python 3.6\n\n- tensorflow-1.9.0 for cuda 9 or tensorflow-1.14.0 for cuda 10\n\n- keras-2.2.4\n\n- spacy (you should download the GloVe embeddings by running `spacy download en_vectors_web_lg`)\n\n- Others (progressbar2, opencv, etc.; see [requirement.txt](https://github.com/luogen1996/MCN/blob/master/requirement.txt))\n\n## Data preparation\n\n- Follow the instructions in [DATA_PRE_README.md](https://github.com/luogen1996/MCN/blob/master/data/README.md) to generate the training and testing data for RefCOCO, RefCOCO+ and RefCOCOg.\n\n- Download the pretrained backbone weights (vgg and darknet). We provide pretrained weights in Keras format for this repo, and a darknet version to facilitate research based on PyTorch or other frameworks. 
All pretrained backbones are trained on the COCO 2014 *train+val* set, excluding the images that appear in the *val+test* sets of RefCOCO, RefCOCO+ and RefCOCOg (nearly 6500 images). Please follow the instructions in [DATA_PRE_README.md](https://github.com/luogen1996/MCN/blob/master/data/README.md) to download them.\n\n## Training \n\n1. Prepare your settings. To train a model, modify ``./config/config.json`` to adjust the settings you want. The default settings are for RefCOCO and readily reach about 80.0 and 62.0 accuracy for REC and RES respectively on the *val* set. We also provide example configs for reproducing our results on [RefCOCO+](https://github.com/luogen1996/MCN/blob/master/config/config.Example_Refcoco%2B.json) and [RefCOCOg](https://github.com/luogen1996/MCN/blob/master/config/config.Example_Refcocog.json).\n2. Train the model. Run `train.py` from the main folder to start training:\n```\npython train.py\n```\n3. Test the model. In the settings JSON, set the model path ``evaluate_model`` and the dataset ``evaluate_set`` to use for evaluation. Then run `test.py`:\n```\npython test.py\n```\n\tAfter the evaluation finishes, a result file will be generated in the ``./result`` folder.\n\n4. Training log. Logs are stored in the ``./log`` directory and record the detailed training curve and the accuracy per epoch. If you want to log the visualizations, set ``log_images`` to ``1`` in ``config.json``. With tensorboard you can view the training details as shown below:\n  \u003cp align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/luogen1996/MCN/blob/master/fig2.png\" width=\"90%\"/\u003e\n  \u003c/p\u003e\n  \n**Notably, running this code can achieve better performance than the results in our paper (nearly 1~4\\% improvement on each dataset). 
This is because we have recently made many optimizations, such as carefully adjusting some training hyperparameters, optimizing the training code and selecting a better checkpoint of the pre-trained backbone. In addition, it is fine if the losses do not decline when you use vgg16 as the backbone. It may be a display problem and does not influence the performance.**\n\n## Pre-trained Models and Logs\n\nBy following the steps in Data preparation and Training, you can reproduce, and even improve on, the results in our paper. We provide the pre-trained models and training logs for RefCOCO, RefCOCO+, RefCOCOg and Referit. \n\n1) RefCOCO: [Darknet (312M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiRL_WITB7kfqX0St?e=JlWTBX), [vgg16(214M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiScJ7zFZNZOXE6VI?e=b1iCYa).\n\u003ctable\u003e\n\u003ctr\u003e\u003cth\u003e Detection/Segmentation (Darknet) \u003c/th\u003e\u003cth\u003e Detection/Segmentation (vgg16)\u003c/th\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\n\n| val               | test A            | test B            |\n| ----------------- | ----------------- | ----------------- |\n| 80.61\\%/63.12\\% | 83.38\\%/65.05\\% | 75.51\\%/60.99\\% |\n\u003c/td\u003e\u003ctd\u003e\n\n| val  | test A | test B |\n| ---- | ------ | ------ |\n| 79.68\\%/61.51\\% | 81.49\\%/63.25\\% | 75.30\\%/60.46\\% |\n\u003c/td\u003e\u003c/tr\u003e \u003c/table\u003e\n\n2) RefCOCO+: [Darknet (312M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiROAVl3RuIllJLAC?e=qPMity), [vgg16(214M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiShEnn5tmeI0bM_q?e=x7BOcs).\n\u003ctable\u003e\n\u003ctr\u003e\u003cth\u003e Detection/Segmentation (Darknet) \u003c/th\u003e\u003cth\u003e Detection/Segmentation (vgg16)\u003c/th\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\n\n| val               | test A            | test B            |\n| ----------------- | ----------------- | ----------------- |\n| 69.10\\%/53.00\\% | 74.17\\%/57.00\\% | 59.75\\%/46.96\\% |\n\u003c/td\u003e\u003ctd\u003e\n\n| val  | test A | test 
B |\n| ---- | ------ | ------ |\n| 64.67\\%/49.04\\% | 69.25\\%/51.94\\% | 57.01\\%/44.31\\% |\n\u003c/td\u003e\u003c/tr\u003e \u003c/table\u003e\n\n3) RefCOCOg: [Darknet (312M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiRRafGe8qzDDuLci?e=qUBLBT), [vgg16(214M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiSkOuTZ0g1LIRBVl?e=YitGvV).\n\u003ctable\u003e\n\u003ctr\u003e\u003cth\u003e Detection/Segmentation (Darknet) \u003c/th\u003e\u003cth\u003e Detection/Segmentation (vgg16)\u003c/th\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\n\n| val               | test              |\n| ----------------- | ----------------- |\n| 68.95\\% / 50.65\\% | 67.88\\% / 50.62\\% |\n\u003c/td\u003e\u003ctd\u003e\n\n| val  | test |\n| ---- | ---- |\n| 63.50\\% / 47.81\\% | 63.32\\% / 47.94\\% |\n\u003c/td\u003e\u003c/tr\u003e \u003c/table\u003e\n\n4) Referit: [Darknet (312M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiRUQCWpBx1D5cm8_?e=MDjO2I), [vgg16(214M)](https://1drv.ms/u/s!AmrFUyZ_lDVGiSouSmtCBg5zhlB_?e=1ONzTK).\n\u003ctable\u003e\n\u003ctr\u003e\u003cth\u003e Detection/Segmentation (Darknet) \u003c/th\u003e\u003cth\u003e Detection/Segmentation (vgg16)\u003c/th\u003e\u003c/tr\u003e\n\u003ctr\u003e\u003ctd\u003e\n\n| val               | test              |\n| ----------------- | ----------------- |\n| 69.29\\% / 57.00\\% | 67.65\\% / 55.42\\% |\n\n\u003c/td\u003e\u003ctd\u003e\n\n| val               | test              |\n| ----------------- | ----------------- |\n| 68.28\\% / 56.19\\% | 65.49\\% / 53.68\\% |\n\n\u003c/td\u003e\u003c/tr\u003e \u003c/table\u003e\n\n## Acknowledgement\n\nThanks to [keras-yolo3](https://github.com/qqwweee/keras-yolo3) and [keras-retinanet](https://github.com/fizyr/keras-retinanet), from which much of this code is borrowed, and to the [darknet](https://github.com/AlexeyAB/darknet) framework used for backbone 
pretraining.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluogen1996%2Fmcn","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fluogen1996%2Fmcn","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fluogen1996%2Fmcn/lists"}