https://github.com/tencentarc/stereocrafter
A framework to convert any 2D videos to immersive stereoscopic 3D
- Host: GitHub
- URL: https://github.com/tencentarc/stereocrafter
- Owner: TencentARC
- License: other
- Created: 2024-12-27T11:48:35.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-07T02:47:58.000Z (over 1 year ago)
- Last Synced: 2025-01-07T03:38:33.822Z (over 1 year ago)
- Language: Python
- Size: 46.8 MB
- Stars: 129
- Watchers: 7
- Forks: 8
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: License-Code.txt
StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos
Sijie Zhao*
Wenbo Hu*
Xiaodong Cun*
Yong Zhang†
Xiaoyu Li†
Zhe Kong
Xiangjun Gao
Muyao Niu
Ying Shan
* equal contribution † corresponding author
Tencent AI Lab ARC Lab, Tencent PCG
## 💡 Abstract
We propose a novel framework that converts any 2D video into immersive stereoscopic 3D video that can be viewed on different display devices, such as 3D glasses, Apple Vision Pro, and 3D displays. It can be applied to various video sources, including movies, vlogs, 3D cartoons, and AIGC videos.

## 📣 News
- `2024/12/27` We released our inference code and model weights.
- `2024/09/11` We submitted our technical report on arXiv and released our project page.
## 🎞️ Showcases
Here we show some examples of input videos and their corresponding stereo outputs in Anaglyph 3D format.
## 🛠️ Installation
#### 1. Set up the environment
We run our code with Python 3.8 and CUDA 11.8.
You can use Anaconda or Docker to build this basic environment.
#### 2. Clone the repo
```bash
# use --recursive to clone the dependent submodules
git clone --recursive https://github.com/TencentARC/StereoCrafter
cd StereoCrafter
```
#### 3. Install the requirements
```bash
pip install -r requirements.txt
```
#### 4. Install the customized 'Forward-Warp' package for forward splatting
```bash
cd ./dependency/Forward-Warp
chmod a+x install.sh
./install.sh
```
## 📦 Model Weights
#### 1. Download the [SVD img2vid model](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1) for the image encoder and VAE.
```bash
# in StereoCrafter project root directory
mkdir weights
cd ./weights
git lfs install
git clone https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt-1-1
```
#### 2. Download the [DepthCrafter model](https://huggingface.co/tencent/DepthCrafter) for the video depth estimation.
```bash
git clone https://huggingface.co/tencent/DepthCrafter
```
#### 3. Download the [StereoCrafter model](https://huggingface.co/TencentARC/StereoCrafter) for the stereo video generation.
```bash
git clone https://huggingface.co/TencentARC/StereoCrafter
```
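Before running inference, it can help to confirm that all three model directories ended up under `./weights`. The following is a minimal sketch: the directory names come from the clone commands above, while the helper `missing_weight_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names taken from the clone commands above.
EXPECTED_WEIGHT_DIRS = (
    "stable-video-diffusion-img2vid-xt-1-1",
    "DepthCrafter",
    "StereoCrafter",
)


def missing_weight_dirs(weights_root="./weights"):
    """Return the expected model directories that are absent under weights_root.

    Hypothetical helper for illustration; it only checks that the directories
    exist, not that the downloaded files inside them are complete.
    """
    root = Path(weights_root)
    return [name for name in EXPECTED_WEIGHT_DIRS if not (root / name).is_dir()]
```

Calling `missing_weight_dirs()` from the project root returns an empty list when every directory is in place. Otherwise it returns the names that still need to be cloned.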
## 🔄 Inference
Script:
```bash
# in StereoCrafter project root directory
sh run_inference.sh
```
There are two main steps in this script for generating stereo video.
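As a rough illustration of how the two steps chain together, the sketch below builds both command lines in Python. The script names and flags are the ones documented in the two subsections that follow. The helper names (`build_commands`, `run_pipeline`) and the output naming convention are assumptions made for this example, and the contents of `run_inference.sh` may differ.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(input_video, output_dir="./outputs", weights="./weights",
                   max_disp=20, tile_num=1):
    """Build argument lists for the two stages (hypothetical helper).

    The splatting result of stage 1 is passed as the input of stage 2.
    """
    stem = Path(input_video).stem
    splatting_video = f"{output_dir}/{stem}_splatting_results.mp4"
    svd_path = f"{weights}/stable-video-diffusion-img2vid-xt-1-1"

    # Stage 1: depth estimation with DepthCrafter and depth-based splatting.
    splat_cmd = [
        sys.executable, "depth_splatting_inference.py",
        "--pre_trained_path", svd_path,
        "--unet_path", f"{weights}/DepthCrafter",
        "--input_video_path", input_video,
        "--output_video_path", splatting_video,
        "--max_disp", str(max_disp),
    ]
    # Stage 2: inpainting of the splatted video with the StereoCrafter weights.
    inpaint_cmd = [
        sys.executable, "inpainting_inference.py",
        "--pre_trained_path", svd_path,
        "--unet_path", f"{weights}/StereoCrafter",
        "--input_video_path", splatting_video,
        "--save_dir", output_dir,
        "--tile_num", str(tile_num),
    ]
    return splat_cmd, inpaint_cmd


def run_pipeline(input_video, **kwargs):
    """Run both stages in order, stopping if either one fails."""
    for cmd in build_commands(input_video, **kwargs):
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline("./source_video/camel.mp4")` would be called from the project root after the weights have been downloaded.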
#### 1. Depth-Based Video Splatting Using the Video Depth from DepthCrafter
Execute the following command:
```bash
python depth_splatting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \
    --input_video_path [PATH] --output_video_path [PATH]
```
Arguments:
- `--pre_trained_path`: Path to the SVD img2vid model weights (e.g., `./weights/stable-video-diffusion-img2vid-xt-1-1`).
- `--unet_path`: Path to the DepthCrafter model weights (e.g., `./weights/DepthCrafter`).
- `--input_video_path`: Path to the input video (e.g., `./source_video/camel.mp4`).
- `--output_video_path`: Path to the output video (e.g., `./outputs/camel_splatting_results.mp4`).
- `--max_disp`: Parameter controlling the maximum disparity between the generated right video and the input left video. Default value is `20` pixels.
The first step generates a video grid containing the input video, the visualized depth map, the occlusion mask, and the splatted right-view video.
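To make the role of `--max_disp` and the occlusion mask concrete, here is a simplified NumPy sketch of depth-based forward warping for a single frame. It is not the repository's implementation: a nearest-pixel z-buffer stands in for the Forward-Warp package, and the sketch assumes the depth map is normalized to `[0, 1]` with larger values meaning closer to the camera.

```python
import numpy as np


def splat_right_view(left, depth, max_disp=20):
    """Forward-warp a left frame to an approximate right view.

    left:  (H, W, C) image array.
    depth: (H, W) array in [0, 1]; larger means closer (assumption of this sketch).
    Returns the warped frame and a boolean occlusion mask that is True where
    no source pixel landed, i.e. where inpainting would be needed.
    """
    h, w = depth.shape
    # Closer pixels get a larger horizontal shift, up to max_disp pixels.
    disparity = np.rint(depth * max_disp).astype(int)
    right = np.zeros_like(left)
    zbuf = np.full((h, w), -np.inf)  # depth of the pixel written at each target
    for y in range(h):
        for x in range(w):
            tx = x - disparity[y, x]  # content shifts left in the right view
            # Keep the closest source pixel when several land on one target.
            if 0 <= tx < w and depth[y, x] > zbuf[y, tx]:
                right[y, tx] = left[y, x]
                zbuf[y, tx] = depth[y, x]
    occlusion_mask = np.isinf(zbuf)
    return right, occlusion_mask
```

In this sketch a larger `max_disp` produces a stronger 3D effect, but it also leaves wider holes behind foreground objects for the second stage to fill.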
#### 2. Stereo Video Inpainting of the Splatting Video
Execute the following command:
```bash
python inpainting_inference.py --pre_trained_path [PATH] --unet_path [PATH] \
    --input_video_path [PATH] --save_dir [PATH]
```
Arguments:
- `--pre_trained_path`: Path to the SVD img2vid model weights (e.g., `./weights/stable-video-diffusion-img2vid-xt-1-1`).
- `--unet_path`: Path to the StereoCrafter model weights (e.g., `./weights/StereoCrafter`).
- `--input_video_path`: Path to the splatting video result generated by the first stage (e.g., `./outputs/camel_splatting_results.mp4`).
- `--save_dir`: Directory for the output stereo video (e.g., `./outputs`).
- `--tile_num`: The number of tiles along each of the width and height dimensions for tiled processing, which allows high-resolution input to be handled without requiring more GPU memory. The default value is `1` (1 $\times$ 1 tile). For input videos with a resolution of 2K or higher, use more tiles to avoid running out of memory.
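The effect of `--tile_num` can be pictured with a small sketch that splits a frame into a `tile_num` $\times$ `tile_num` grid. The function `tile_boxes` and its `overlap` parameter are hypothetical and shown only for illustration. The repository's own tiling, overlap, and blending logic may differ.

```python
def tile_boxes(height, width, tile_num=1, overlap=0):
    """Return (top, bottom, left, right) pixel boxes for a tile_num x tile_num grid.

    overlap is a hypothetical parameter: each tile is extended by that many
    pixels on every side (clamped to the frame) so neighbours can be blended.
    """
    boxes = []
    for row in range(tile_num):
        for col in range(tile_num):
            top = row * height // tile_num
            bottom = (row + 1) * height // tile_num
            left = col * width // tile_num
            right = (col + 1) * width // tile_num
            boxes.append((
                max(top - overlap, 0),
                min(bottom + overlap, height),
                max(left - overlap, 0),
                min(right + overlap, width),
            ))
    return boxes
```

With `tile_num=2`, a 1920 $\times$ 1080 frame is processed as four 960 $\times$ 540 tiles, so each pass covers roughly a quarter of the pixels. That is why raising the tile count lowers peak memory use at the cost of more passes.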
The stereo video inpainting step outputs the stereo video in both side-by-side format and anaglyph 3D format.
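For readers unfamiliar with the two output formats, the sketch below composes them from a pair of left and right frames with NumPy. It assumes RGB channel order and uses the simplest red-cyan scheme. The repository may compose its anaglyph output differently.

```python
import numpy as np


def side_by_side(left, right):
    """Place the left and right views next to each other along the width axis."""
    return np.concatenate([left, right], axis=1)


def red_cyan_anaglyph(left, right):
    """Take the red channel from the left view and green/blue from the right view.

    Assumes (H, W, 3) arrays in RGB order; this is the basic red-cyan scheme,
    not necessarily the one used by the repository.
    """
    anaglyph = right.copy()
    anaglyph[..., 0] = left[..., 0]
    return anaglyph
```

The side-by-side frame is twice as wide as the input and suits headsets and 3D displays, while the anaglyph frame keeps the input size and is viewed through red-cyan glasses.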

## 🤝 Acknowledgements
We would like to express our gratitude to the following open-source projects:
- [Stable Video Diffusion](https://github.com/Stability-AI/generative-models): A latent diffusion model trained to generate video clips from an image or text conditioning.
- [DepthCrafter](https://github.com/Tencent/DepthCrafter): A novel method to generate temporally consistent depth sequences from videos.
## 📚 Citation
```bibtex
@article{zhao2024stereocrafter,
title={Stereocrafter: Diffusion-based generation of long and high-fidelity stereoscopic 3d from monocular videos},
author={Zhao, Sijie and Hu, Wenbo and Cun, Xiaodong and Zhang, Yong and Li, Xiaoyu and Kong, Zhe and Gao, Xiangjun and Niu, Muyao and Shan, Ying},
journal={arXiv preprint arXiv:2409.07447},
year={2024}
}
```