{"id":22068710,"url":"https://github.com/FoundationVision/UniRef","last_synced_at":"2025-07-24T06:32:10.541Z","repository":{"id":214283263,"uuid":"734738410","full_name":"FoundationVision/UniRef","owner":"FoundationVision","description":"[ICCV2023] Segment Every Reference Object in Spatial and Temporal Spaces ","archived":false,"fork":false,"pushed_at":"2024-01-10T04:34:32.000Z","size":15632,"stargazers_count":235,"open_issues_count":4,"forks_count":15,"subscribers_count":11,"default_branch":"main","last_synced_at":"2024-11-22T02:25:20.903Z","etag":null,"topics":["object-segmentation","unified-model"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FoundationVision.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-12-22T13:31:33.000Z","updated_at":"2024-09-19T05:23:18.000Z","dependencies_parsed_at":"2024-01-10T06:01:46.216Z","dependency_job_id":null,"html_url":"https://github.com/FoundationVision/UniRef","commit_stats":null,"previous_names":["foundationvision/uniref"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FUniRef","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FUniRef/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FUniRef/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FoundationVision%2FUniRef/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Fo
undationVision","download_url":"https://codeload.github.com/FoundationVision/UniRef/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":227421320,"owners_count":17775009,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["object-segmentation","unified-model"],"created_at":"2024-11-30T20:04:17.668Z","updated_at":"2025-07-24T06:32:10.528Z","avatar_url":"https://github.com/FoundationVision.png","language":"Python","funding_links":[],"categories":["Paper List"],"sub_categories":["Follow-up Papers"],"readme":"# UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces\n\nOfficial implementation of [UniRef++](), an extended version of ICCV2023 [UniRef](https://openaccess.thecvf.com/content/ICCV2023/papers/Wu_Segment_Every_Reference_Object_in_Spatial_and_Temporal_Spaces_ICCV_2023_paper.pdf).\n\n![UniRef](assets/network.png)\n\n## Highlights\n\n- UniRef/UniRef++ is a unified model for four object segmentation tasks, namely referring image segmentation (RIS), few-shot segmentation (FSS), referring video object segmentation (RVOS) and video object segmentation (VOS).\n- At the core of UniRef++ is the UniFusion module for injecting various reference information into the network. 
We implement it efficiently using flash attention.\n- UniFusion can serve as a plug-in component for foundation models like [SAM](https://github.com/facebookresearch/segment-anything).\n\n\n## Schedule\n\n- [x] Add Training Guide\n- [x] Add Evaluation Guide\n- [x] Add Data Preparation\n- [x] Release Model Checkpoints\n- [x] Release Code\n\n## Results\n\n\nhttps://github.com/FoundationVision/UniRef/assets/21001460/63d875ed-9f5b-47c9-998f-e83faffedbba\n\n\n### Referring Image Segmentation\n![RIS](assets/RIS.png)\n\n### Referring Video Object Segmentation\n![RVOS](assets/Ref-vos.png)\n\n### Video Object Segmentation\n![VOS](assets/VOS.png)\n\n### Zero-shot Video Segmentation \u0026 Few-shot Image Segmentation\n![zero-few-shot](assets/zero-few-shot.png)\n\n## Model Zoo\n\n#### Objects365 Pretraining\n\n\n| Model             | Checkpoint |\n| ------------------| :--------: |\n| R50 | [model](https://drive.google.com/file/d/1cz7xWfk0xBRNMTM7P8Vtb7jT6cBv5fKN/view?usp=sharing) |\n| Swin-L | [model](https://drive.google.com/file/d/1C9tjfR6puq6HUcLSwII74GRQBl-YCuD8/view?usp=sharing) |\n\n#### Image-joint Training\n\n| Model             | RefCOCO | FSS-1000 | Checkpoint |\n| ------------------| :----:  |  :----:  | :--------: |\n| R50 | 76.3 | 85.2 | [model](https://drive.google.com/file/d/1RNerEk7nrbFBI9dY5HIK7ErmqKLN40_g/view?usp=sharing) |\n| Swin-L | 79.9 | 87.7 | [model](https://drive.google.com/file/d/1dhCRuSDkw7IjxoUZo1EHDPU_608QHcx_/view?usp=sharing) |\n\n\n#### Video-joint Training\n\nThe results are reported on the validation set.\n\n  | Model             | RefCOCO | FSS-1000 | Ref-Youtube-VOS | Ref-DAVIS17 | Youtube-VOS18 | DAVIS17 | LVOS | Checkpoint |\n  | ------------------| :----:  | :---: | :-----: | :---: | :--: | :--: | :-------: | :--: |\n  | UniRef++-R50      |  75.6   | 79.1  |  61.5   | 63.5  | 81.9 | 81.5 |   60.1    | [model](https://drive.google.com/file/d/190SV9GU6Pd9FMZQnRrCbgw8lqDYF9_-I/view?usp=sharing) |\n  | UniRef++-Swin-L   |  
79.1   | 85.4  |  66.9   | 67.2  | 83.2 | 83.9 |   67.2    | [model](https://drive.google.com/file/d/1ggkoEo1n2b-3sZDVVw3qFg1kyJQPc1jT/view?usp=sharing) |\n\n\n## Installation\n\nSee [INSTALL.md](./INSTALL.md)\n\n## Getting Started\n\nPlease see [DATA.md](assets/DATA.md) for data preparation.\n\nPlease see [EVALUATION.md](assets/EVALUATION.md) for evaluation.\n\nPlease see [TRAIN.md](assets/TRAIN.md) for training.\n\n\n## Citation\n\nIf you find this project useful in your research, please consider citing:\n\n```BibTeX\n@article{wu2023uniref++,\n  title={UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces},\n  author={Wu, Jiannan and Jiang, Yi and Yan, Bin and Lu, Huchuan and Yuan, Zehuan and Luo, Ping},\n  journal={arXiv preprint arXiv:2312.15715},\n  year={2023}\n}\n```\n\n```BibTeX\n@inproceedings{wu2023uniref,\n  title={Segment Every Reference Object in Spatial and Temporal Spaces},\n  author={Wu, Jiannan and Jiang, Yi and Yan, Bin and Lu, Huchuan and Yuan, Zehuan and Luo, Ping},\n  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},\n  pages={2538--2550},\n  year={2023}\n}\n```\n\n## Acknowledgement\n\nThis project is based on the [UNINEXT](https://github.com/MasterBin-IIAU/UNINEXT) codebase. We also refer to the repositories [Detectron2](https://github.com/facebookresearch/detectron2), [Deformable DETR](https://github.com/fundamentalvision/Deformable-DETR), [STCN](https://github.com/hkchengrex/STCN), and [SAM](https://github.com/facebookresearch/segment-anything). Thanks for their awesome work!\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFoundationVision%2FUniRef","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFoundationVision%2FUniRef","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFoundationVision%2FUniRef/lists"}