{"id":23913036,"url":"https://github.com/henghuiding/mevis","last_synced_at":"2025-04-05T01:07:55.759Z","repository":{"id":188506941,"uuid":"673377940","full_name":"henghuiding/MeViS","owner":"henghuiding","description":"[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions","archived":false,"fork":false,"pushed_at":"2024-06-24T11:00:25.000Z","size":54787,"stargazers_count":521,"open_issues_count":12,"forks_count":22,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-29T00:09:29.329Z","etag":null,"topics":["mevis-dataset","mose-dataset","multimodal-learning","referring-expression-comprehension","referring-expression-segmentation","referring-video-object-segmentation","video-understanding"],"latest_commit_sha":null,"homepage":"https://henghuiding.github.io/MeViS/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/henghuiding.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-08-01T13:44:24.000Z","updated_at":"2025-03-14T11:34:27.000Z","dependencies_parsed_at":"2024-06-24T12:24:28.465Z","dependency_job_id":"65d0d8fe-fbb4-4155-a4a2-455b98899881","html_url":"https://github.com/henghuiding/MeViS","commit_stats":null,"previous_names":["henghuiding/mevis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henghuiding%2FMeViS","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henghuiding%2FMeViS/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/henghuiding%2FMeViS/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/henghuiding%2FMeViS/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/henghuiding","download_url":"https://codeload.github.com/henghuiding/MeViS/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271530,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mevis-dataset","mose-dataset","multimodal-learning","referring-expression-comprehension","referring-expression-segmentation","referring-video-object-segmentation","video-understanding"],"created_at":"2025-01-05T09:20:37.292Z","updated_at":"2025-04-05T01:07:55.723Z","avatar_url":"https://github.com/henghuiding.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions\n[![PyTorch](https://img.shields.io/badge/PyTorch-1.11.0-%23EE4C2C.svg?style=\u0026logo=PyTorch\u0026logoColor=white)](https://pytorch.org/)\n[![Python](https://img.shields.io/badge/Python-3.7%20|%203.8%20|%203.9-blue.svg?style=\u0026logo=python\u0026logoColor=ffdd54)](https://www.python.org/downloads/)\n[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/mevis-a-large-scale-benchmark-for-video/referring-video-object-segmentation-on-mevis)](https://paperswithcode.com/sota/referring-video-object-segmentation-on-mevis?p=mevis-a-large-scale-benchmark-for-video)\n\n**[🏠[Project 
page]](https://henghuiding.github.io/MeViS/)** \u0026emsp; **[📄[arXiv]](https://arxiv.org/abs/2308.08544)**  \u0026emsp; **[📄[PDF]](https://drive.google.com/file/d/1WRanGRaYPpaNfrwq4xRq0sfmiJLSr9-b/view?usp=sharing)** \u0026emsp; **[🔥[Dataset Download]](https://codalab.lisn.upsaclay.fr/competitions/15094)** \u0026emsp; **[🔥[Evaluation Server]](https://codalab.lisn.upsaclay.fr/competitions/15094)**\n\nThis repository contains code for **ICCV2023** paper:\n\u003e [MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions](https://arxiv.org/abs/2308.08544)  \n\u003e Henghui Ding,  Chang Liu,  Shuting He,  Xudong Jiang,  Chen Change Loy  \n\u003e ICCV 2023\n\n\u003ctable border=1 frame=void\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cimg src=\"https://github.com/henghuiding/MeViS/blob/page/GIF/bird.gif\" width=\"245\"\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"https://github.com/henghuiding/MeViS/blob/page/GIF/Cat.gif\" width=\"245\"\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src=\"https://github.com/henghuiding/MeViS/blob/page/GIF/coin.gif\" width=\"245\"\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n### Abstract\n\nThis work strives for motion expressions guided video segmentation, which focuses on segmenting objects in video content based on a sentence describing the motion of the objects. Existing referring video object segmentation datasets downplay the importance of motion in video content for language-guided video object segmentation. To investigate the feasibility of using motion expressions to ground and segment objects in videos, we propose a large-scale dataset called MeViS, which contains numerous motion expressions to indicate target objects in complex environments. 
The goal of MeViS benchmark is to provide a platform that enables the development of effective language-guided video segmentation algorithms that leverage motion expressions as a primary cue for object segmentation in complex video scenes.\n\n\u003cdiv align=\"center\"\u003e\n  \u003cimg src=\"https://github.com/henghuiding/MeViS/blob/page/static/DemoImages/teaser.png?raw=true\" width=\"100%\" height=\"100%\"/\u003e\n\u003c/div\u003e\n\u003cp style=\"text-align:justify; text-justify:inter-ideograph;width:100%\"\u003eFigure 1. Examples of video clips from \u003cb\u003eM\u003c/b\u003eotion \u003cb\u003ee\u003c/b\u003expressions \u003cb\u003eVi\u003c/b\u003edeo \u003cb\u003eS\u003c/b\u003eegmentation (\u003cb\u003eMeViS\u003c/b\u003e) are provided to illustrate the dataset's nature and complexity. \u003cfont color=\"#FF6403\"\u003eThe expressions in MeViS primarily focus on motion attributes and the referred target objects that cannot be identified by examining a single frame solely\u003c/font\u003e. For instance, the first example features three parrots with similar appearances, and the target object is identified as \u003ci\u003e\"The bird flying away\"\u003c/i\u003e. This object can only be recognized by capturing its motion throughout the video.\u003c/p\u003e\n\n\n\n\n\u003ctable border=\"0.6\"\u003e\n\u003cdiv align=\"center\"\u003e\n\u003ccaption\u003e\u003cb\u003eTABLE 1. 
Scale comparison between MeViS and existing language-guided video segmentation datasets.\n\u003c/div\u003e\n\u003ctbody\u003e\n    \u003ctr\u003e\n        \u003cth align=\"right\" bgcolor=\"BBBBBB\"\u003eDataset\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003ePub.\u0026Year\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003eVideos\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003eObject\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003eExpression\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003eMask\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003eObj/Video\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003eObj/Expn\u003c/th\u003e\n        \u003cth align=\"center\" bgcolor=\"BBBBBB\"\u003eTarget\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"right\"\u003e\u003ca href=\"https://kgavrilyuk.github.io/publication/actor_action/\" target=\"_blank\"\u003eA2D\u0026nbsp;Sentence\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eCVPR\u0026nbsp;2018\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e3,782\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e4,825\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e6,656\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e58k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1.28\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eActor\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"right\" bgcolor=\"ECECEC\"\u003e\u003ca href=\"https://www.mpi-inf.mpg.de/departments/computer-vision-and-machine-learning/research/video-segmentation/video-object-segmentation-with-language-referring-expressions\" target=\"_blank\"\u003eDAVIS17-RVOS\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" 
bgcolor=\"ECECEC\"\u003eACCV\u0026nbsp;2018\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"ECECEC\"\u003e90\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"ECECEC\"\u003e205\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"ECECEC\"\u003e205\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"ECECEC\"\u003e13.5k\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"ECECEC\"\u003e2.27\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"ECECEC\"\u003e1\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"ECECEC\"\u003eObject\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"right\"\u003e\u003ca href=\"https://youtube-vos.org/dataset/rvos/\" target=\"_blank\"\u003eReferYoutubeVOS\u003c/a\u003e\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eECCV\u0026nbsp;2020\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e3,978\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e7,451\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e15,009\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e131k\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1.86\u003c/td\u003e\n      \u003ctd align=\"center\"\u003e1\u003c/td\u003e\n      \u003ctd align=\"center\"\u003eObject\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"right\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003eMeViS (ours)\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003eICCV\u0026nbsp;2023\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003e2,006\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003e8,171\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003e28,570\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003e443k\u003c/b\u003e\u003c/td\u003e\n      \u003ctd 
align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003e4.28\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003e1.59\u003c/b\u003e\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e\u003cb\u003eObject(s)\u003c/b\u003e\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n  \u003ccolgroup\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n  \u003c/colgroup\u003e\n\u003c/table\u003e\n\n## MeViS Dataset Download\n\n⬇️ [Download the dataset from ️here☁️](https://codalab.lisn.upsaclay.fr/competitions/15094). \n\n\n**Dataset Split**\n* 2,006 videos \u0026 28,570 sentences in total;\n* **Train set:** 1662 videos \u0026 23,051 sentences, used for training;\n* **Val\u003csup\u003eu\u003c/sup\u003e set:** 50 videos \u0026 793 sentences, used for offline evaluation (e.g., ablation study) by users during training;\n* **Val set:** 140 videos \u0026 2,236 sentences, used for [**CodaLab online evaluation**](https://codalab.lisn.upsaclay.fr/competitions/15094);\n* **Test set:** 154 videos \u0026 2,490 sentences (not released yet), used for evaluation during the competition periods;\nIt is suggested to report the results on **Val\u003csup\u003eu\u003c/sup\u003e set** and **Val set**.\n\n\n\n## Online Evaluation\n\nPlease submit your results of **Val set** on \n - 💯 [**CodaLab**](https://codalab.lisn.upsaclay.fr/competitions/15094).\n\nIt is strongly suggested to first evaluate your model locally using the **Val\u003csup\u003eu\u003c/sup\u003e** set before submitting your results of the **Val** to the online evaluation system.\n\n## File Structure\n\nThe dataset follows a similar structure as [Refer-YouTube-VOS](https://youtube-vos.org/dataset/rvos/). 
Each split of the dataset consists of three parts: `JPEGImages`, which holds the frame images, `meta_expressions.json`, which provides referring expressions and metadata of videos, and `mask_dict.json`, which contains the ground-truth masks of objects. Ground-truth segmentation masks are saved in COCO RLE format, and expressions are organized similarly to Refer-YouTube-VOS.\n\nPlease note that while annotations for all frames in the **Train** set and the **Val\u003csup\u003eu\u003c/sup\u003e** set are provided, the **Val** set provides only frame images and referring expressions for inference. \n\n```\nmevis\n├── train                       // Split Train\n│   ├── JPEGImages\n│   │   ├── \u003cvideo #1  \u003e\n│   │   ├── \u003cvideo #2  \u003e\n│   │   └── \u003cvideo #...\u003e\n│   │\n│   ├── mask_dict.json\n│   └── meta_expressions.json\n│\n├── valid_u                     // Split Val^u\n│   ├── JPEGImages\n│   │   └── \u003cvideo ...\u003e\n│   │\n│   ├── mask_dict.json\n│   └── meta_expressions.json\n│\n└── valid                       // Split Val\n    ├── JPEGImages\n    │   └── \u003cvideo ...\u003e\n    │\n    └── meta_expressions.json\n\n```\n\n## Method Code Installation:\n\nPlease see [INSTALL.md](https://github.com/henghuiding/MeViS/blob/main/INSTALL.md)\n\n## Inference\n\n###  1. Val\u003csup\u003eu\u003c/sup\u003e set\nObtain the output masks of Val\u003csup\u003eu\u003c/sup\u003e set:\n```\npython train_net_lmpm.py \\\n    --config-file configs/lmpm_SWIN_bs8.yaml \\\n    --num-gpus 8 --dist-url auto --eval-only \\\n    MODEL.WEIGHTS [path_to_weights] \\\n    OUTPUT_DIR [output_dir]\n```\nObtain the J\u0026F results on Val\u003csup\u003eu\u003c/sup\u003e set:\n```\npython tools/eval_mevis.py\n```\n###  2. 
Val set\nObtain the output masks of Val set for [CodaLab](https://codalab.lisn.upsaclay.fr/competitions/15094) online evaluation:\n```\npython train_net_lmpm.py \\\n    --config-file configs/lmpm_SWIN_bs8.yaml \\\n    --num-gpus 8 --dist-url auto --eval-only \\\n    MODEL.WEIGHTS [path_to_weights] \\\n    OUTPUT_DIR [output_dir] DATASETS.TEST '(\"mevis_test\",)'\n```\n### CodaLab Evaluation Submission Guideline\n\nThe submission format should be a **.zip** file containing the predicted .PNG results of the **Val set** (for current competition stage).\n\nYou can use following command to prepare .zip submission file\n```\ncd [output_dir]\nzip -r ../xxx.zip *\n```\nA submission example named *sample_submission_valid.zip* can be found from the [CodaLab](https://codalab.lisn.upsaclay.fr/competitions/15094).\n```\nsample_submission_valid.zip       // .zip file, which directly packages 140 val video folders\n├── 0ab4afe7fb46                  // video folder name\n│   ├── 0                         // expression_id folder name\n│   │   ├── 00000.png             // .png files\n│   │   ├── 00001.png\n│   │   └── ....\n│   │\n│   ├── 1\n│   │   └── 00000.png\n│   │   └── ....\n│   │\n│   └── ....\n│ \n├── 0fea0cb75a25\n│   ├── 0                              \n│   │   ├── 00000.png\n│   │   └── ....\n│   │\n│   └── ....\n│\n└── ....                      
 \n```\n\n## Training\n\nFirst, download the backbone weights (`model_final_86143f.pkl`) and convert them using the script:\n\n```\nwget https://dl.fbaipublicfiles.com/maskformer/mask2former/coco/instance/maskformer2_swin_tiny_bs16_50ep/model_final_86143f.pkl\npython tools/process_ckpt.py\n```\n\nThen start training:\n```\npython train_net_lmpm.py \\\n    --config-file configs/lmpm_SWIN_bs8.yaml \\\n    --num-gpus 8 --dist-url auto \\\n    MODEL.WEIGHTS [path_to_weights] \\\n    OUTPUT_DIR [path_to_weights]\n```\n\nNote: We also support training ReferFormer by providing [`ReferFormer_dataset.py`](https://github.com/henghuiding/MeViS/blob/main/ReferFormer_dataset.py)\n\n## Models\n\nOur results on the Val\u003csup\u003eu\u003c/sup\u003e set and Val set of the MeViS dataset.\n* Val\u003csup\u003eu\u003c/sup\u003e set is used for offline evaluation by users themselves, e.g., for ablation studies\n* Val set is used for CodaLab online evaluation by the MeViS dataset organizers\n\u003ctable border=\"0.6\"\u003e\n\u003ctbody\u003e\n    \u003ctr\u003e\n        \u003cth  rowspan=\"2\" align=\"center\" bgcolor=\"BBBBBB\"\u003eBackbone\u003c/th\u003e\n        \u003cth colspan=\"3\" align=\"center\" bgcolor=\"BBBBBB\"\u003eVal\u003csup\u003eu\u003c/sup\u003e\u003c/th\u003e\n        \u003cth colspan=\"3\" align=\"center\" bgcolor=\"BBBBBB\"\u003eVal\u003c/th\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003eJ\u0026F\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003eJ\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003eF\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003eJ\u0026F\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003eJ\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003eF\u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003eSwin-Tiny \u0026 RoBERTa\u003c/td\u003e\n      
\u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e40.23\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e36.51\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e43.90\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e37.21\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e34.25\u003c/td\u003e\n      \u003ctd align=\"center\" bgcolor=\"E5E5E5\"\u003e40.17\u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/tbody\u003e\n  \u003ccolgroup\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n    \u003ccol\u003e\n  \u003c/colgroup\u003e\n\u003c/table\u003e\n\n\n☁️ [Google Drive](https://drive.google.com/file/d/1djNwwNAyAIEJMZIQQHV_NYnlc8TeA4wU/view?usp=drive_link)\n\n## Acknowledgement\n\nThis project is based on [VITA](https://github.com/sukjunhwang/VITA), [GRES](https://github.com/henghuiding/ReLA), [Mask2Former](https://github.com/facebookresearch/Mask2Former), and [VLT](https://github.com/henghuiding/Vision-Language-Transformer). 
Many thanks to the authors for their great work!\n\n## BibTeX\nPlease consider citing MeViS if it helps your research.\n\n```latex\n@inproceedings{MeViS,\n  title={{MeViS}: A Large-scale Benchmark for Video Segmentation with Motion Expressions},\n  author={Ding, Henghui and Liu, Chang and He, Shuting and Jiang, Xudong and Loy, Chen Change},\n  booktitle={ICCV},\n  year={2023}\n}\n```\n\n```latex\n@inproceedings{GRES,\n  title={{GRES}: Generalized Referring Expression Segmentation},\n  author={Liu, Chang and Ding, Henghui and Jiang, Xudong},\n  booktitle={CVPR},\n  year={2023}\n}\n```\n\n```latex\n@article{VLT,\n  title={{VLT}: Vision-language transformer and query generation for referring segmentation},\n  author={Ding, Henghui and Liu, Chang and Wang, Suchen and Jiang, Xudong},\n  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},\n  year={2023},\n  publisher={IEEE}\n}\n```\n    \nThe majority of the videos in MeViS are from [MOSE: Complex Video Object Segmentation Dataset](https://henghuiding.github.io/MOSE/).\n```latex\n@inproceedings{MOSE,\n  title={{MOSE}: A New Dataset for Video Object Segmentation in Complex Scenes},\n  author={Ding, Henghui and Liu, Chang and He, Shuting and Jiang, Xudong and Torr, Philip HS and Bai, Song},\n  booktitle={ICCV},\n  year={2023}\n}\n```\n    \nMeViS is licensed under a CC BY-NC-SA 4.0 License. The data of MeViS is released for non-commercial research purposes only.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenghuiding%2Fmevis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhenghuiding%2Fmevis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhenghuiding%2Fmevis/lists"}