{"id":19327302,"url":"https://github.com/hkchengrex/cutie","last_synced_at":"2025-05-15T04:03:42.237Z","repository":{"id":202159468,"uuid":"707345109","full_name":"hkchengrex/Cutie","owner":"hkchengrex","description":"[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation","archived":false,"fork":false,"pushed_at":"2024-11-08T21:38:38.000Z","size":2877,"stargazers_count":843,"open_issues_count":5,"forks_count":81,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-14T04:59:01.984Z","etag":null,"topics":["computer-vision","cvpr2024","deep-learning","pytorch","segmentation","video-editing","video-object-segmentation","video-segmentation"],"latest_commit_sha":null,"homepage":"https://hkchengrex.com/Cutie/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hkchengrex.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-19T17:49:24.000Z","updated_at":"2025-04-10T12:42:05.000Z","dependencies_parsed_at":"2023-11-11T05:22:09.961Z","dependency_job_id":"162d1785-013b-4dbf-a173-df3226a47866","html_url":"https://github.com/hkchengrex/Cutie","commit_stats":null,"previous_names":["hkchengrex/cutie"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hkchengrex%2FCutie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hkchengrex%2FCutie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hkchengrex%2FCutie/releases","manifests_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/repositories/hkchengrex%2FCutie/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hkchengrex","download_url":"https://codeload.github.com/hkchengrex/Cutie/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254270640,"owners_count":22042858,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","cvpr2024","deep-learning","pytorch","segmentation","video-editing","video-object-segmentation","video-segmentation"],"created_at":"2024-11-10T02:16:42.222Z","updated_at":"2025-05-15T04:03:42.174Z","avatar_url":"https://github.com/hkchengrex.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# [Putting the Object Back into Video Object Segmentation](https://hkchengrex.github.io/Cutie)\n\n[Ho Kei Cheng](https://hkchengrex.github.io/), [Seoung Wug Oh](https://sites.google.com/view/seoungwugoh/), [Brian Price](https://www.brianpricephd.com/), [Joon-Young Lee](https://joonyoung-cv.github.io/), [Alexander Schwing](https://www.alexander-schwing.de/)\n\nUniversity of Illinois Urbana-Champaign and Adobe\n\nCVPR 2024, Highlight\n\n[[arXiv]](https://arxiv.org/abs/2310.12982) [[PDF]](https://arxiv.org/pdf/2310.12982.pdf) [[Project Page]](https://hkchengrex.github.io/Cutie/) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1yo43XTbjxuWA7XgCUO9qxAi7wBI6HzvP?usp=sharing)\n\n## Highlight\n\nCutie is a video object segmentation framework -- a follow-up to 
[XMem](https://github.com/hkchengrex/XMem) with better consistency, robustness, and speed.\nThis repository contains code for standard video object segmentation and a GUI tool for interactive video segmentation.\nThe GUI tool additionally contains the \"permanent memory\" (from [XMem++](https://github.com/max810/XMem2)) option for better controllability.\n\n![overview](https://imgur.com/k84c965.jpg)\n\n## Demo Video\n\nhttps://github.com/hkchengrex/Cutie/assets/7107196/83a8abd5-369e-41a9-bb91-d9cc1289af70\n\nSource: https://raw.githubusercontent.com/hkchengrex/Cutie/main/docs/sources.txt\n\n## Installation\n\nTested on Ubuntu only.\n\n**Prerequisites:**\n\n- Python 3.8+\n- PyTorch 1.12+ and corresponding torchvision\n\n**Clone our repository:**\n\n```bash\ngit clone https://github.com/hkchengrex/Cutie.git\n```\n\n**Install with pip:**\n\n```bash\ncd Cutie\npip install -e .\n```\n\n(If you encounter the `File \"setup.py\" not found` error, upgrade pip with `pip install --upgrade pip`.)\n\n**Download the pretrained models:**\n\n```bash\npython cutie/utils/download_models.py\n```\n\n## Quick Start\n\n### Scripting Demo\n\nThis is probably the best starting point if you want to use Cutie in your project. Hopefully, the script is self-explanatory (additional comments in `scripting_demo.py`). If not, feel free to open an issue. 
For more advanced usage, like adding or removing objects, see `scripting_demo_add_del_objects.py`.\n\n```python\nimport os\n\nimport numpy as np\nimport torch\nfrom PIL import Image\nfrom torchvision.transforms.functional import to_tensor\n\nfrom cutie.inference.inference_core import InferenceCore\nfrom cutie.utils.get_default_model import get_default_model\n\n\n@torch.inference_mode()\n@torch.cuda.amp.autocast()\ndef main():\n\n    cutie = get_default_model()\n    processor = InferenceCore(cutie, cfg=cutie.cfg)\n    # the processor matches the shorter edge of the input to this size\n    # you might want to experiment with different sizes, -1 keeps the original size\n    processor.max_internal_size = 480\n\n    image_path = './examples/images/bike'\n    images = sorted(os.listdir(image_path))  # ordering is important\n    mask = Image.open('./examples/masks/bike/00000.png')\n    palette = mask.getpalette()\n    objects = np.unique(np.array(mask))\n    objects = objects[objects != 0].tolist()  # background \"0\" does not count as an object\n    mask = torch.from_numpy(np.array(mask)).cuda()\n\n    for ti, image_name in enumerate(images):\n        image = Image.open(os.path.join(image_path, image_name))\n        image = to_tensor(image).cuda().float()\n\n        if ti == 0:\n            # first frame: provide the annotated mask and the object IDs\n            output_prob = processor.step(image, mask, objects=objects)\n        else:\n            output_prob = processor.step(image)\n\n        # convert output probabilities to an object mask\n        mask = processor.output_prob_to_mask(output_prob)\n\n        # visualize prediction\n        mask = Image.fromarray(mask.cpu().numpy().astype(np.uint8))\n        mask.putpalette(palette)\n        mask.show()  # or use mask.save(...) to save it somewhere\n\n\nmain()\n```\n\n### Interactive Demo\n\nStart the interactive demo with:\n\n```bash\npython interactive_demo.py --video ./examples/example.mp4 --num_objects 1\n```\n\n[See more instructions here](docs/INTERACTIVE.md).\nIf you are running this on a remote server, X11 forwarding is possible. Start by using `ssh -X`. 
Additional configuration might be needed, but Google would be more helpful than me.\n\n![demo](https://i.imgur.com/nqlYqTq.jpg)\n\n(For single-video evaluation, see the unofficial script `scripts/process_video.py` from https://github.com/hkchengrex/Cutie/pull/16)\n\n## Training and Evaluation\n\n1. [Running Cutie on video object segmentation data.](docs/EVALUATION.md)\n2. [Training Cutie.](docs/TRAINING.md)\n\n## Citation\n\n```bibtex\n@inproceedings{cheng2023putting,\n  title={Putting the Object Back into Video Object Segmentation},\n  author={Cheng, Ho Kei and Oh, Seoung Wug and Price, Brian and Lee, Joon-Young and Schwing, Alexander},\n  booktitle={arXiv},\n  year={2023}\n}\n```\n\n## References\n\n- The GUI tool uses [RITM](https://github.com/SamsungLabs/ritm_interactive_segmentation) for interactive image segmentation. This repository also contains a redistribution of their code in `gui/ritm`. That part of the code follows RITM's license.\n\n- For automatic video segmentation/integration with external detectors, see [DEVA](https://github.com/hkchengrex/Tracking-Anything-with-DEVA).\n\n- The interactive demo builds upon [IVS](https://github.com/seoungwugoh/ivs-demo), [MiVOS](https://github.com/hkchengrex/MiVOS), and [XMem](https://github.com/hkchengrex/XMem).\n\n- We used [ProPainter](https://github.com/sczhou/ProPainter) in our video inpainting demo.\n\n- Thanks to [RITM](https://github.com/SamsungLabs/ritm_interactive_segmentation) and [XMem++](https://github.com/max810/XMem2) for making this possible.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkchengrex%2Fcutie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhkchengrex%2Fcutie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhkchengrex%2Fcutie/lists"}