{"id":18812913,"url":"https://github.com/z-x-yang/cfbi","last_synced_at":"2025-08-25T08:15:31.941Z","repository":{"id":40721539,"uuid":"247396638","full_name":"z-x-yang/CFBI","owner":"z-x-yang","description":"The official implementation of CFBI(+): Collaborative Video Object Segmentation by (Multi-scale) Foreground-Background Integration.","archived":false,"fork":false,"pushed_at":"2023-01-18T12:58:50.000Z","size":3961,"stargazers_count":322,"open_issues_count":0,"forks_count":42,"subscribers_count":20,"default_branch":"master","last_synced_at":"2025-03-31T16:14:38.116Z","etag":null,"topics":["computer-vision","pytorch-implementation","video-object-segmentation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/z-x-yang.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-15T03:47:17.000Z","updated_at":"2024-09-01T15:05:52.000Z","dependencies_parsed_at":"2023-02-10T15:01:49.694Z","dependency_job_id":null,"html_url":"https://github.com/z-x-yang/CFBI","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/z-x-yang%2FCFBI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/z-x-yang%2FCFBI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/z-x-yang%2FCFBI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/z-x-yang%2FCFBI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/z-x-yang","download_url":"https://codeload.github.com/z-x-yan
g/CFBI/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247704571,"owners_count":20982298,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","pytorch-implementation","video-object-segmentation"],"created_at":"2024-11-07T23:35:34.622Z","updated_at":"2025-04-07T18:14:59.649Z","avatar_url":"https://github.com/z-x-yang.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CFBI(+): Collaborative Video Object Segmentation by (Multi-scale) Foreground-Background Integration\nThe official implementation of Collaborative Video Object Segmentation by (Multi-scale) Foreground-Background Integration. [[paper](http://arxiv.org/abs/2010.06349)] (TPAMI) [[paper](https://arxiv.org/abs/2003.08333)] (ECCV 2020, Spotlight)  [[demo (YouTube)](https://www.youtube.com/watch?v=xdHi68UFt50)][[demo (Bilibili)](https://www.bilibili.com/video/av456456973/)]\n\nFor VOS-related applications or research, we also recommend our latest VOS framework, [AOT (NeurIPS 2021)](https://github.com/z-x-yang/AOT), whose performance and efficiency are much better than CFBI(+).\n\n**Based on some necessary considerations, we only release the inference code here. To obtain the training code of CFBI, please contact us by email: \u003czongxinyang1996@gmail.com\u003e. Please state your institution and your purpose for using CFBI in the email. 
Thank you for your understanding!**\n\nFramework:\n\u003cdiv align=center\u003e\u003cimg src=\"https://github.com/z-x-yang/CFBI/raw/master/utils/overview.png\" width=\"80%\"/\u003e\u003c/div\u003e\n\nSome video segmentation results:\n\u003cdiv align=center\u003e\u003cimg src=\"https://github.com/z-x-yang/CFBI/raw/master/utils/quality.png\" width=\"80%\"/\u003e\u003c/div\u003e\n\n## Requirements\n    1. Python3\n    2. pytorch \u003e= 1.3.0 and torchvision\n    3. opencv-python and Pillow\n## Getting Started\n1. Prepare datasets:\n    * Download the [validation split](https://drive.google.com/file/d/1-QrceIl5sUNTKz7Iq0UsWC6NLZq7girr/view?usp=sharing) of YouTube-VOS 2018, and decompress the file to `datasets/YTB/valid`. If you want to evaluate CFBI on YouTube-VOS 2019, please download this [split](https://drive.google.com/file/d/1o586Wjya-f2ohxYf9C1RlRH-gkrzGS8t/view?usp=sharing) instead.\n    * Download 480p [TrainVal](https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip) split of DAVIS 2017, and decompress the file to `datasets/DAVIS`.\n2. Evaluating:\n    * **YouTube-VOS**: Download pretrained [CFBI](https://drive.google.com/file/d/1YKQL8ruEfcDvKPO5Rs3d6HfVXAv08hX6/view?usp=share_link) or [CFBI+](https://drive.google.com/file/d/1tEg9sy6LYEXQP8QLsagmXpxYc9D3X5Vc/view?usp=share_link), to `pretrain_models`, and then run `bash ytb_eval.sh` for CFBI or `bash ytb_eval_cfbip.sh` for CFBI+. After the evaluation, the result will be packed into a Zip file, which you need to send to [official evaluation server](https://competitions.codalab.org/competitions/19544) to calculate a score. For 2019 version, use this [server](https://competitions.codalab.org/competitions/20127) instead. 
The pretrained CFBI was trained on YouTube-VOS with a larger batch size (16) and learning rate (0.02), which raises the performance (J\u0026F) to `81.8%` on the validation split of YouTube-VOS 2018.\n    * **DAVIS**: Download the pretrained [CFBI](https://drive.google.com/file/d/1G6tVaaU4RYKXKKFxqxdE_p8Fwju_0nVE/view?usp=share_link) or [CFBI+](https://drive.google.com/file/d/1KkU-Dxew1wI1affTjN1LmTld0kgpUUQH/view?usp=share_link) model to `pretrain_models`, and then run `bash davis_eval.sh` for CFBI or `bash davis_eval_cfbip.sh` for CFBI+. After the evaluation, please use the [official code](https://github.com/davisvideochallenge/davis2017-evaluation) to calculate a score, which should be `81.9%` for CFBI or `83.0%` for CFBI+ (J\u0026F).\n    * **Fast CFBI**: To reduce memory usage, we also provide a fast setting in `ytb_eval_fast.sh`. The fast setting uses `float16` in the matching process of CFBI. In addition, we apply an `atrous strategy` in the global matching of CFBI for further efficiency (a discussion of atrous matching will be added to our arXiv paper soon). The fast setting saves a large amount of memory and significantly improves the inference speed of CFBI, at the cost of only a small loss in performance.\n    * Another way to save memory is to increase the number of `--global_chunks`. This does not affect accuracy but makes inference slightly slower.\n\n## Model Zoo\n**We measured the inference speed of CFBI on one NVIDIA Tesla V100 GPU. We report multi-object speed rather than single-object speed, because a large portion of the sequences in VOS datasets contain multiple objects, and CFBI is good at processing multiple objects simultaneously.**\n\n`F16` denotes using `float16` in the matching process. `Fast` means using both `float16` and the `atrous strategy` in the inference stage. `MS` denotes using a multi-scale and flip strategy during inference (additional evaluation args `--ms 1. 
1.15 1.3 1.5 --flip`).\n\n**YouTube-VOS** (Eval on Val 2018):\n\nIn the inference stage, we restricted the long edge of each frame to be no more than 1040 (1.3 * 800) pixels, which is consistent with the biggest random-scale size (1.3 * 480p) in the training stage and is smaller than the original size of YouTube-VOS (720p).\n\n**Name** | **Backbone**  | **J Seen** | **F Seen** | **J Unseen** | **F Unseen** | **Mean** | **Multi-Obj** \u003cbr\u003e **FPS** | **Link** \n---------| :-----------: | :--------: | :--------: | :----------: | :----------: | :------: | :------------------------: | :------:\nResNet101-CFBI+ | ResNet101-DeepLabV3+ | 81.8 | **86.6** | **77.1** | **85.6** | **82.8** | 4.03 | [Click](https://drive.google.com/file/d/1tEg9sy6LYEXQP8QLsagmXpxYc9D3X5Vc/view?usp=share_link)\nResNet101-MS-CFBI+ | ResNet101-DeepLabV3+ | **82.8** | **87.5** | **77.3** | **85.7** | **83.3** | 0.48 (88.1%↓) | The same as above\nResNet101-F16-CFBI+ | ResNet101-DeepLabV3+ | **81.9** | **86.6** | **77.1** | **85.6** | **82.8** | 4.93 (22.3%↑) | The same as above\nResNet101-Fast-CFBI+ | ResNet101-DeepLabV3+ | **81.9** | **86.6** | **77.1** | 85.5 | **82.8** | 5.06 (25.6%↑) | The same as above\nResNet101-CFBI | ResNet101-DeepLabV3+ | **81.9** | 86.3 | 75.6 | 83.4 | 81.8 | 3.48 | [Click](https://drive.google.com/file/d/1YKQL8ruEfcDvKPO5Rs3d6HfVXAv08hX6/view?usp=share_link) \nResNet101-F16-CFBI | ResNet101-DeepLabV3+ | **81.9** | 86.4 | 75.6 | 83.3 | 81.8 | 4.62 (32.8%↑) | The same as above\nResNet101-Fast-CFBI | ResNet101-DeepLabV3+ | **81.9** | 86.4 | 75.6 | 83.1 | 81.8 | **7.61 (118.7%↑)** | The same as above\nMobileNetV2-CFBI | MobileNetV2-DeepLabV3+ | 80.4 | 84.7 | 74.9 | 82.6 | 80.6 | 3.88 | [Click](https://drive.google.com/file/d/1h8P7LrcrOmKo2Yet_LLf6XaM97diCL42/view?usp=share_link)\nMobileNetV2-Fast-CFBI | MobileNetV2-DeepLabV3+ | 80.2 | 84.6 | 74.7 | 82.7 | 80.6 | **9.69 (150.0%↑)** | The same as above\n\n**DAVIS** (Eval on Val 2017):\n\nIn the inference stage, we ran 
using the default size of DAVIS (480p).\n\n**Name** | **Backbone**  | **J score** | **F score** | **Mean** | **Multi-Obj** \u003cbr\u003e **FPS** | **Link** \n---------| :-----------: | :---------: | :---------: | :------: | :------------------------: | :------:\nResNet101-CFBI+-DAVIS | ResNet101-DeepLabV3+ | **80.1** | **85.9** | **83.0** | **5.52** | [Click](https://drive.google.com/file/d/1KkU-Dxew1wI1affTjN1LmTld0kgpUUQH/view?usp=share_link)\nResNet101-CFBI-DAVIS | ResNet101-DeepLabV3+ | 79.3 | 84.5 | 81.9 | 5.88 | [Click](https://drive.google.com/file/d/1G6tVaaU4RYKXKKFxqxdE_p8Fwju_0nVE/view?usp=share_link) \nResNet101-F16-CFBI-DAVIS | ResNet101-DeepLabV3+ | 79.2 | 84.4 | 81.8 | 7.38 (25.5%↑) | The same as above\nResNet101-Fast-CFBI-DAVIS | ResNet101-DeepLabV3+ | 77.0 | 82.7 | 79.9 | **10.18 (73.1%↑)** | The same as above\nMobileNetV2-CFBI-DAVIS | MobileNetV2-DeepLabV3+ | 76.5 | 80.3 | 78.4 | 6.94 | [Click](https://drive.google.com/file/d/1CB_A-E6in27R-dVfl0hvny0MDW8WQ2Rd/view?usp=share_link)\nMobileNetV2-Fast-CFBI-DAVIS | MobileNetV2-DeepLabV3+ | 75.2 | 78.9 | 77.1 | **13.22 (90.5%↑)** | The same as above\n\n## Citing\n```\n@inproceedings{yang2020CFBI,\n  title={Collaborative video object segmentation by foreground-background integration},\n  author={Yang, Zongxin and Wei, Yunchao and Yang, Yi},\n  booktitle={European Conference on Computer Vision},\n  pages={332--348},\n  year={2020},\n  organization={Springer}\n}\n@article{yang2020CFBIP,\n  author={Yang, Zongxin and Wei, Yunchao and Yang, Yi},\n  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, \n  title={Collaborative Video Object Segmentation by Multi-Scale Foreground-Background Integration}, \n  year={2021},\n  volume={},\n  number={},\n  pages={1-1},\n  
doi={10.1109/TPAMI.2021.3081597}\n}\n```\n\n\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fz-x-yang%2Fcfbi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fz-x-yang%2Fcfbi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fz-x-yang%2Fcfbi/lists"}