{"id":29009897,"url":"https://github.com/tencentarc/stereocrafter","last_synced_at":"2025-06-25T15:33:39.249Z","repository":{"id":270343255,"uuid":"908958874","full_name":"TencentARC/StereoCrafter","owner":"TencentARC","description":"A framework to convert any 2D videos to immersive stereoscopic 3D","archived":false,"fork":false,"pushed_at":"2025-01-07T02:47:58.000Z","size":49057,"stargazers_count":129,"open_issues_count":4,"forks_count":8,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-01-07T03:38:33.822Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TencentARC.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"License-Code.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-27T11:48:35.000Z","updated_at":"2025-01-07T02:48:01.000Z","dependencies_parsed_at":"2024-12-30T13:34:23.173Z","dependency_job_id":null,"html_url":"https://github.com/TencentARC/StereoCrafter","commit_stats":null,"previous_names":["tencentarc/stereocrafter"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TencentARC/StereoCrafter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FStereoCrafter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FStereoCrafter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FStereoCrafter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FStereoCrafter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TencentARC","download_url":"https://codeload.github.com/TencentARC/StereoCrafter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TencentARC%2FStereoCrafter/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261901407,"owners_count":23227593,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-25T15:33:36.679Z","updated_at":"2025-06-25T15:33:39.219Z","avatar_url":"https://github.com/TencentARC.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cdiv align=\"center\"\u003e\n\u003ch2\u003eStereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos\u003c/h2\u003e\n\nSijie Zhao*\u0026emsp;\nWenbo Hu*\u0026emsp;\nXiaodong Cun*\u0026emsp;\nYong Zhang\u0026dagger;\u0026emsp;\nXiaoyu Li\u0026dagger;\u0026emsp;\u003cbr\u003e\nZhe Kong\u0026emsp;\nXiangjun Gao\u0026emsp;\nMuyao Niu\u0026emsp;\nYing Shan\n\n\u0026emsp;* equal contribution \u0026emsp; \u0026dagger; corresponding author \n\n\u003ch3\u003eTencent AI Lab\u0026emsp;\u0026emsp;ARC Lab, Tencent PCG\u003c/h3\u003e\n\n\u003ca href='https://arxiv.org/abs/2409.07447'\u003e\u003cimg src='https://img.shields.io/badge/arXiv-PDF-a92225'\u003e\u003c/a\u003e \u0026emsp;\n\u003ca href='https://stereocrafter.github.io/'\u003e\u003cimg src='https://img.shields.io/badge/Project_Page-Page-64fefe' alt='Project Page'\u003e\u003c/a\u003e \u0026emsp;\n\u003ca 
href='https://huggingface.co/TencentARC/StereoCrafter'\u003e\u003cimg src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Weights-yellow'\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n## 💡 Abstract\n\nWe propose a novel framework to convert any 2D video into immersive stereoscopic 3D video that can be viewed on various display devices, such as 3D glasses, Apple Vision Pro, and 3D displays. It can be applied to a wide range of video sources, such as movies, vlogs, 3D cartoons, and AIGC videos.\n\n![teaser](assets/teaser.jpg)\n\n## 📣 News\n- `2024/12/27` We released our inference code and model weights.\n- `2024/09/11` We submitted our technical report on arXiv and released our project page.\n\n## 🎞️ Showcases\nBelow are some examples of input videos and their corresponding stereo outputs in anaglyph 3D format.\n\u003cdiv align=\"center\"\u003e\n    \u003cimg src=\"assets/demo.gif\"\u003e\n\u003c/div\u003e\n\n\n## 🛠️ Installation\n\n#### 1. Set up the environment\nOur code runs on Python 3.8 and CUDA 11.8.\nYou can use Anaconda or Docker to set up this base environment.\n\n#### 2. Clone the repo\n```bash\n# use --recursive to clone the dependent submodules\ngit clone --recursive https://github.com/TencentARC/StereoCrafter\ncd StereoCrafter\n```\n\n#### 3. Install the requirements\n```bash\npip install -r requirements.txt\n```\n\n\n#### 4. Install the customized 'Forward-Warp' package for forward splatting\n```bash\n# in StereoCrafter project root directory\ncd ./dependency/Forward-Warp\nchmod a+x install.sh\n./install.sh\n```\n\n\n## 📦 Model Weights\n\n#### 1. Download the [SVD img2vid model](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1) for the image encoder and VAE.\n\n```bash\n# in StereoCrafter project root directory\nmkdir weights\ncd ./weights\ngit lfs install\ngit clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1\n```\n\n#### 2. 
Download the [DepthCrafter model](https://huggingface.co/tencent/DepthCrafter) for video depth estimation.\n```bash\n# in the ./weights directory\ngit clone https://huggingface.co/tencent/DepthCrafter\n```\n\n#### 3. Download the [StereoCrafter model](https://huggingface.co/TencentARC/StereoCrafter) for stereo video generation.\n```bash\n# in the ./weights directory\ngit clone https://huggingface.co/TencentARC/StereoCrafter\n```\n\n\n## 🔄 Inference\n\nRun the full pipeline with the provided script:\n\n```bash\n# in StereoCrafter project root directory\nsh run_inference.sh\n```\n\nThe script generates a stereo video in two main steps.\n\n#### 1. Depth-Based Video Splatting Using the Video Depth from DepthCrafter\nExecute the following command:\n```bash\npython depth_splatting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \\\n                                    --input_video_path [PATH] --output_video_path [PATH]\n```\nArguments:\n- `--pre_trained_path`: Path to the SVD img2vid model weights (e.g., `./weights/stable-video-diffusion-img2vid-xt-1-1`).\n- `--unet_path`: Path to the DepthCrafter model weights (e.g., `./weights/DepthCrafter`).\n- `--input_video_path`: Path to the input video (e.g., `./source_video/camel.mp4`).\n- `--output_video_path`: Path to the output video (e.g., `./outputs/camel_splatting_results.mp4`).\n- `--max_disp`: Maximum disparity, in pixels, between the generated right video and the input left video. The default value is `20`.\n\nThe first step generates a video grid containing the input video, the visualized depth map, the occlusion mask, and the splatted right video, as shown below:\n\n\u003cimg src=\"assets/camel_splatting_results.jpg\" alt=\"camel_splatting_results\" width=\"800\"/\u003e\n\n#### 2. 
Stereo Video Inpainting of the Splatting Video\nExecute the following command:\n```bash\npython inpainting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \\\n                               --input_video_path [PATH] --save_dir [PATH]\n```\nArguments:\n- `--pre_trained_path`: Path to the SVD img2vid model weights (e.g., `./weights/stable-video-diffusion-img2vid-xt-1-1`).\n- `--unet_path`: Path to the StereoCrafter model weights (e.g., `./weights/StereoCrafter`).\n- `--input_video_path`: Path to the splatting video generated in the first step (e.g., `./outputs/camel_splatting_results.mp4`).\n- `--save_dir`: Directory for the output stereo video (e.g., `./outputs`).\n- `--tile_num`: Number of tiles along each of the width and height dimensions for tiled processing, which lets the model handle high-resolution input without requiring more GPU memory. The default value is `1` (1 $\\times$ 1 tile). For input videos at 2K resolution or higher, use more tiles to avoid running out of memory.\n\nThe stereo video inpainting step outputs the stereo video in both side-by-side and anaglyph 3D formats, as shown below:\n\n\u003cimg src=\"assets/camel_sbs.jpg\" alt=\"camel_sbs\" width=\"800\"/\u003e\n\n\u003cimg src=\"assets/camel_anaglyph.jpg\" alt=\"camel_anaglyph\" width=\"400\"/\u003e\n\n## 🤝 Acknowledgements\n\nWe would like to express our gratitude to the following open-source projects:\n- [Stable Video Diffusion](https://github.com/Stability-AI/generative-models): A latent diffusion model trained to generate video clips conditioned on an image or text.\n- [DepthCrafter](https://github.com/Tencent/DepthCrafter): A novel method for generating temporally consistent depth sequences from videos.\n\n\n## 📚 Citation\n\n```bibtex\n@article{zhao2024stereocrafter,\n  title={Stereocrafter: Diffusion-based generation of long and high-fidelity stereoscopic 3d from monocular videos},\n  author={Zhao, Sijie and Hu, Wenbo and Cun, Xiaodong and Zhang, Yong and Li, 
Xiaoyu and Kong, Zhe and Gao, Xiangjun and Niu, Muyao and Shan, Ying},\n  journal={arXiv preprint arXiv:2409.07447},\n  year={2024}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fstereocrafter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftencentarc%2Fstereocrafter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftencentarc%2Fstereocrafter/lists"}