{"id":13484681,"url":"https://github.com/PeterL1n/RobustVideoMatting","last_synced_at":"2025-03-27T16:31:09.242Z","repository":{"id":37234132,"uuid":"401484223","full_name":"PeterL1n/RobustVideoMatting","owner":"PeterL1n","description":"Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!","archived":false,"fork":false,"pushed_at":"2024-04-02T16:26:48.000Z","size":9193,"stargazers_count":8818,"open_issues_count":114,"forks_count":1155,"subscribers_count":135,"default_branch":"master","last_synced_at":"2025-03-27T07:05:02.576Z","etag":null,"topics":["ai","computer-vision","deep-learning","machine-learning","matting"],"latest_commit_sha":null,"homepage":"https://peterl1n.github.io/RobustVideoMatting/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/PeterL1n.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-08-30T20:57:44.000Z","updated_at":"2025-03-27T06:05:26.000Z","dependencies_parsed_at":"2023-02-09T14:32:45.316Z","dependency_job_id":"0bcaa96a-c39a-44f2-a617-ccde3ff39dd7","html_url":"https://github.com/PeterL1n/RobustVideoMatting","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PeterL1n%2FRobustVideoMatting","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PeterL1n%2FRobustVideoMatting/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/PeterL1n%2FRobustVideoMatting/releases","manifests_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/PeterL1n%2FRobustVideoMatting/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/PeterL1n","download_url":"https://codeload.github.com/PeterL1n/RobustVideoMatting/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245882337,"owners_count":20687868,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","computer-vision","deep-learning","machine-learning","matting"],"created_at":"2024-07-31T17:01:29.784Z","updated_at":"2025-03-27T16:31:09.215Z","avatar_url":"https://github.com/PeterL1n.png","language":"Python","funding_links":[],"categories":["Python","音视频类","人像\\姿势\\3D人脸","🌐 **Edge Deployment Frameworks**","Learning","Repos","AI \u0026 Machine Learning for CG"],"sub_categories":["**视频翻译与字幕**","网络服务_其他","🍎 **CoreML** - Apple","Image Matting","AI-Assisted CG Tools"],"readme":"# Robust Video Matting (RVM)\n\n![Teaser](/documentation/image/teaser.gif)\n\n\u003cp align=\"center\"\u003eEnglish | \u003ca href=\"README_zh_Hans.md\"\u003e中文\u003c/a\u003e\u003c/p\u003e\n\nOfficial repository for the paper [Robust High-Resolution Video Matting with Temporal Guidance](https://peterl1n.github.io/RobustVideoMatting/). RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM can perform matting in real-time on any videos without additional inputs. 
It achieves **4K 76FPS** and **HD 104FPS** on an Nvidia GTX 1080 Ti GPU. The project was developed at [ByteDance Inc.](https://www.bytedance.com/)\n\n\u003cbr\u003e\n\n## News\n\n* [Nov 03 2021] Fixed a bug in [train.py](https://github.com/PeterL1n/RobustVideoMatting/commit/48effc91576a9e0e7a8519f3da687c0d3522045f).\n* [Sep 16 2021] Code is re-released under the GPL-3.0 license.\n* [Aug 25 2021] Source code and pretrained models are published.\n* [Jul 27 2021] Paper is accepted by WACV 2022.\n\n\u003cbr\u003e\n\n## Showreel\nWatch the showreel video ([YouTube](https://youtu.be/Jvzltozpbpk), [Bilibili](https://www.bilibili.com/video/BV1Z3411B7g7/)) to see the model's performance. \n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://youtu.be/Jvzltozpbpk\"\u003e\n        \u003cimg src=\"documentation/image/showreel.gif\"\u003e\n    \u003c/a\u003e\n\u003c/p\u003e\n\nAll footage in the video is available on [Google Drive](https://drive.google.com/drive/folders/1VFnWwuu-YXDKG-N6vcjK_nL7YZMFapMU?usp=sharing).\n\n\u003cbr\u003e\n\n\n## Demo\n* [Webcam Demo](https://peterl1n.github.io/RobustVideoMatting/#/demo): Run the model live in your browser. Visualize recurrent states.\n* [Colab Demo](https://colab.research.google.com/drive/10z-pNKRnVNsp0Lq9tH1J_XPZ7CBC_uHm?usp=sharing): Test our model on your own videos with a free GPU. \n\n\u003cbr\u003e\n\n## Download\n\nWe recommend MobileNetv3 models for most use cases. ResNet50 models are the larger variant with small performance improvements. Our models are available for various inference frameworks. 
See [inference documentation](documentation/inference.md) for more instructions.\n\n\u003ctable\u003e\n    \u003cthead\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eFramework\u003c/td\u003e\n            \u003ctd\u003eDownload\u003c/td\u003e\n            \u003ctd\u003eNotes\u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/thead\u003e\n    \u003ctbody\u003e\n        \u003ctr\u003e\n            \u003ctd\u003ePyTorch\u003c/td\u003e\n            \u003ctd\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3.pth\"\u003ervm_mobilenetv3.pth\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50.pth\"\u003ervm_resnet50.pth\u003c/a\u003e\n            \u003c/td\u003e\n            \u003ctd\u003e\n                Official weights for PyTorch. \u003ca href=\"documentation/inference.md#pytorch\"\u003eDoc\u003c/a\u003e\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eTorchHub\u003c/td\u003e\n            \u003ctd\u003e\n                Nothing to Download.\n            \u003c/td\u003e\n            \u003ctd\u003e\n                Easiest way to use our model in your PyTorch project. 
\u003ca href=\"documentation/inference.md#torchhub\"\u003eDoc\u003c/a\u003e\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eTorchScript\u003c/td\u003e\n            \u003ctd\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp32.torchscript\"\u003ervm_mobilenetv3_fp32.torchscript\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp16.torchscript\"\u003ervm_mobilenetv3_fp16.torchscript\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp32.torchscript\"\u003ervm_resnet50_fp32.torchscript\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp16.torchscript\"\u003ervm_resnet50_fp16.torchscript\u003c/a\u003e\n            \u003c/td\u003e\n            \u003ctd\u003e\n                For inference on mobile, consider exporting int8 quantized models yourself. 
\u003ca href=\"documentation/inference.md#torchscript\"\u003eDoc\u003c/a\u003e\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eONNX\u003c/td\u003e\n            \u003ctd\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp32.onnx\"\u003ervm_mobilenetv3_fp32.onnx\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_fp16.onnx\"\u003ervm_mobilenetv3_fp16.onnx\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp32.onnx\"\u003ervm_resnet50_fp32.onnx\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_fp16.onnx\"\u003ervm_resnet50_fp16.onnx\u003c/a\u003e\n            \u003c/td\u003e\n            \u003ctd\u003e\n                Tested on ONNX Runtime with CPU and CUDA backends. Provided models use opset 12. \u003ca href=\"documentation/inference.md#onnx\"\u003eDoc\u003c/a\u003e, \u003ca href=\"https://github.com/PeterL1n/RobustVideoMatting/tree/onnx\"\u003eExporter\u003c/a\u003e.\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eTensorFlow\u003c/td\u003e\n            \u003ctd\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_tf.zip\"\u003ervm_mobilenetv3_tf.zip\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_resnet50_tf.zip\"\u003ervm_resnet50_tf.zip\u003c/a\u003e\n            \u003c/td\u003e\n            \u003ctd\u003e\n                TensorFlow 2 SavedModel. 
\u003ca href=\"documentation/inference.md#tensorflow\"\u003eDoc\u003c/a\u003e\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eTensorFlow.js\u003c/td\u003e\n            \u003ctd\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_tfjs_int8.zip\"\u003ervm_mobilenetv3_tfjs_int8.zip\u003c/a\u003e\u003cbr\u003e\n            \u003c/td\u003e\n            \u003ctd\u003e\n                Run the model on the web. \u003ca href=\"https://peterl1n.github.io/RobustVideoMatting/#/demo\"\u003eDemo\u003c/a\u003e, \u003ca href=\"https://github.com/PeterL1n/RobustVideoMatting/tree/tfjs\"\u003eStarter Code\u003c/a\u003e\n            \u003c/td\u003e\n        \u003c/tr\u003e\n        \u003ctr\u003e\n            \u003ctd\u003eCoreML\u003c/td\u003e\n            \u003ctd\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel\"\u003ervm_mobilenetv3_1280x720_s0.375_fp16.mlmodel\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1280x720_s0.375_int8.mlmodel\"\u003ervm_mobilenetv3_1280x720_s0.375_int8.mlmodel\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel\"\u003ervm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel\u003c/a\u003e\u003cbr\u003e\n                \u003ca  href=\"https://github.com/PeterL1n/RobustVideoMatting/releases/download/v1.0.0/rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel\"\u003ervm_mobilenetv3_1920x1080_s0.25_int8.mlmodel\u003c/a\u003e\u003cbr\u003e\n            \u003c/td\u003e\n            \u003ctd\u003e\n                CoreML does not support dynamic resolution. You can export other resolutions yourself. 
Models require iOS 13+. \u003ccode\u003es\u003c/code\u003e denotes \u003ccode\u003edownsample_ratio\u003c/code\u003e. \u003ca href=\"documentation/inference.md#coreml\"\u003eDoc\u003c/a\u003e, \u003ca href=\"https://github.com/PeterL1n/RobustVideoMatting/tree/coreml\"\u003eExporter\u003c/a\u003e\n            \u003c/td\u003e\n        \u003c/tr\u003e\n    \u003c/tbody\u003e\n\u003c/table\u003e\n\nAll models are available in [Google Drive](https://drive.google.com/drive/folders/1pBsG-SCTatv-95SnEuxmnvvlRx208VKj?usp=sharing) and [Baidu Pan](https://pan.baidu.com/s/1puPSxQqgBFOVpW4W7AolkA) (code: gym7).\n\n\u003cbr\u003e\n\n## PyTorch Example\n\n1. Install dependencies:\n```sh\npip install -r requirements_inference.txt\n```\n\n2. Load the model:\n\n```python\nimport torch\nfrom model import MattingNetwork\n\nmodel = MattingNetwork('mobilenetv3').eval().cuda()  # or \"resnet50\"\nmodel.load_state_dict(torch.load('rvm_mobilenetv3.pth'))\n```\n\n3. To convert videos, we provide a simple conversion API:\n\n```python\nfrom inference import convert_video\n\nconvert_video(\n    model,                           # The model, can be on any device (cpu or cuda).\n    input_source='input.mp4',        # A video file or an image sequence directory.\n    output_type='video',             # Choose \"video\" or \"png_sequence\"\n    output_composition='com.mp4',    # File path if video; directory path if png sequence.\n    output_alpha=\"pha.mp4\",          # [Optional] Output the raw alpha prediction.\n    output_foreground=\"fgr.mp4\",     # [Optional] Output the raw foreground prediction.\n    output_video_mbps=4,             # Output video mbps. Not needed for png sequence.\n    downsample_ratio=None,           # A hyperparameter to adjust or use None for auto.\n    seq_chunk=12,                    # Process n frames at once for better parallelism.\n)\n```\n\n4. 
Or write your own inference code:\n```python\nfrom torch.utils.data import DataLoader\nfrom torchvision.transforms import ToTensor\nfrom inference_utils import VideoReader, VideoWriter\n\nreader = VideoReader('input.mp4', transform=ToTensor())\nwriter = VideoWriter('output.mp4', frame_rate=30)\n\nbgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # Green background.\nrec = [None] * 4                                       # Initial recurrent states.\ndownsample_ratio = 0.25                                # Adjust based on your video.\n\nwith torch.no_grad():\n    for src in DataLoader(reader):                     # RGB tensor normalized to 0 ~ 1.\n        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)  # Cycle the recurrent states.\n        com = fgr * pha + bgr * (1 - pha)              # Composite to green background. \n        writer.write(com)                              # Write frame.\n```\n\n5. The models and converter API are also available through TorchHub.\n\n```python\n# Load the model.\nmodel = torch.hub.load(\"PeterL1n/RobustVideoMatting\", \"mobilenetv3\") # or \"resnet50\"\n\n# Converter API.\nconvert_video = torch.hub.load(\"PeterL1n/RobustVideoMatting\", \"converter\")\n```\n\nPlease see [inference documentation](documentation/inference.md) for details on `downsample_ratio` hyperparameter, more converter arguments, and more advanced usage.\n\n\u003cbr\u003e\n\n## Training and Evaluation\n\nPlease refer to the [training documentation](documentation/training.md) to train and evaluate your own model.\n\n\u003cbr\u003e\n\n## Speed\n\nSpeed is measured with `inference_speed_test.py` for reference.\n\n| GPU            | dType | HD (1920x1080) | 4K (3840x2160) |\n| -------------- | ----- | -------------- |----------------|\n| RTX 3090       | FP16  | 172 FPS        | 154 FPS        |\n| RTX 2060 Super | FP16  | 134 FPS        | 108 FPS        |\n| GTX 1080 Ti    | FP32  | 104 FPS        | 74 FPS         |\n\n* Note 1: HD uses 
`downsample_ratio=0.25`, 4K uses `downsample_ratio=0.125`. All tests use batch size 1 and frame chunk 1.\n* Note 2: GPUs before the Turing architecture do not support FP16 inference, so GTX 1080 Ti uses FP32.\n* Note 3: We only measure tensor throughput. The provided video conversion script in this repo is expected to be much slower, because it does not use hardware video encoding/decoding and does not transfer tensors on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, please refer to [PyNvCodec](https://github.com/NVIDIA/VideoProcessingFramework).\n\n\u003cbr\u003e  \n\n## Project Members\n* [Shanchuan Lin](https://www.linkedin.com/in/shanchuanlin/)\n* [Linjie Yang](https://sites.google.com/site/linjieyang89/)\n* [Imran Saleemi](https://www.linkedin.com/in/imran-saleemi/)\n* [Soumyadip Sengupta](https://homes.cs.washington.edu/~soumya91/)\n\n\u003cbr\u003e\n\n## Third-Party Projects\n\n* [NCNN C++ Android](https://github.com/FeiGeChuanShu/ncnn_Android_RobustVideoMatting) ([@FeiGeChuanShu](https://github.com/FeiGeChuanShu))\n* [lite.ai.toolkit](https://github.com/DefTruth/RobustVideoMatting.lite.ai.toolkit) ([@DefTruth](https://github.com/DefTruth))\n* [Gradio Web Demo](https://huggingface.co/spaces/akhaliq/Robust-Video-Matting) ([@AK391](https://github.com/AK391))\n* [Unity Engine demo with NatML](https://hub.natml.ai/@natsuite/robust-video-matting) ([@natsuite](https://github.com/natsuite))  \n* [MNN C++ Demo](https://github.com/DefTruth/lite.ai.toolkit/blob/main/lite/mnn/cv/mnn_rvm.cpp) ([@DefTruth](https://github.com/DefTruth))\n* [TNN C++ Demo](https://github.com/DefTruth/lite.ai.toolkit/blob/main/lite/tnn/cv/tnn_rvm.cpp) 
([@DefTruth](https://github.com/DefTruth))\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPeterL1n%2FRobustVideoMatting","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FPeterL1n%2FRobustVideoMatting","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FPeterL1n%2FRobustVideoMatting/lists"}