Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/facebookresearch/omnimatterf
A matting method that combines dynamic 2D foreground layers and a 3D background model.
- Host: GitHub
- URL: https://github.com/facebookresearch/omnimatterf
- Owner: facebookresearch
- License: mit
- Created: 2023-08-23T22:36:19.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-09-15T18:54:30.000Z (about 1 year ago)
- Last Synced: 2024-03-04T17:40:57.599Z (8 months ago)
- Language: Python
- Homepage:
- Size: 495 KB
- Stars: 116
- Watchers: 8
- Forks: 14
- Open Issues: 2
- Metadata Files:
  - Readme: README.md
  - Contributing: CONTRIBUTING.md
  - License: LICENSE
  - Code of conduct: CODE_OF_CONDUCT.md
README
OmnimatteRF: Robust Omnimatte with 3D Background Modeling
---

### [Project Page](https://omnimatte-rf.github.io/) | [arXiv](https://arxiv.org/abs/2309.07749)
Video matting has broad applications, from adding interesting effects to casually captured movies to assisting video production professionals. Matting with associated effects like shadows and reflections has also attracted increasing research activity, and methods like Omnimatte have been proposed to separate foreground objects of interest into their own layers. However, prior works represent video backgrounds as 2D image layers, limiting their capacity to express more complicated scenes and thus hindering application to real-world videos. In this paper, we propose a novel video matting method, OmnimatteRF, that combines dynamic 2D foreground layers and a 3D background model. The 2D layers preserve the details of the subjects, while the 3D background robustly reconstructs scenes in real-world videos. Extensive experiments demonstrate that our method reconstructs scenes with better quality on various videos.
> **OmnimatteRF: Robust Omnimatte with 3D Background Modeling**
>
> [Geng Lin](https://scholar.google.com/citations?user=2Vh_sboAAAAJ&hl=en), [Chen Gao](http://chengao.vision/), [Jia-Bin Huang](https://jbhuang0604.github.io/), [Changil Kim](https://changilkim.com/), [Yipeng Wang](https://www.linkedin.com/in/yipeng-wang99/), [Matthias Zwicker](https://www.cs.umd.edu/~zwicker/), [Ayush Saraf](https://scholar.google.com/citations?user=bluhHm8AAAAJ&hl=en)
>
> in ICCV 2023

## Setup
### Docker
If you have a containerized environment, you can run our code with this image: `logchan/matting:20221229.01` on [docker hub](https://hub.docker.com/r/logchan/matting). It is recommended that you mount four paths inside the container:
- `/code` for this repository
- `/data` for video datasets (see [data format](#data))
- `/output` for experiment output
- `/home/user` for storing shell config and PyTorch cache; copy `.bashrc` to this folder to use fish by default

Check [here](docker/docker-compose.yaml) for an example `docker-compose.yaml`.
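If you prefer a one-off `docker run` over compose, the sketch below shows the same four mounts. The host-side paths are placeholders and `--gpus all` assumes the NVIDIA container toolkit is installed; adapt as needed, or use the linked `docker-compose.yaml` instead.

```
# hypothetical host paths; replace with your own checkout, data, and output locations
docker run --rm -it --gpus all \
  -v /path/to/OmnimatteRF:/code \
  -v /path/to/datasets:/data \
  -v /path/to/output:/output \
  -v /path/to/container-home:/home/user \
  -w /code \
  logchan/matting:20221229.01 \
  bash
```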
### Virtual Environment / Conda
You can set up a Python environment with these packages installed:
```
torch
torch-efficient-distloss
tinycudann
dataclasses-json
detectron2
hydra-core
kornia
lpips
scikit-image
tensorboard
tqdm

# for running RoDynRF
easydict
ConfigArgParse
```

Required software in PATH:
- `ffmpeg`
- `colmap` (for pose estimation only)
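A minimal environment setup sketch covering the package list above. The environment name, Python version, and package sources here are assumptions (in particular, `tinycudann` and `detectron2` are typically built from their upstream repositories), so adjust them to your CUDA setup:

```
# create and activate an environment (conda shown; a plain venv works too)
conda create -n omnimatterf python=3.10 -y
conda activate omnimatterf

# install a CUDA-enabled PyTorch build first (pick the right command from pytorch.org)
pip install torch

# remaining packages from PyPI
pip install torch-efficient-distloss dataclasses-json hydra-core kornia lpips \
  scikit-image tensorboard tqdm easydict ConfigArgParse

# tinycudann and detectron2 are usually installed from source
pip install "git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch"
pip install "git+https://github.com/facebookresearch/detectron2.git"
```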
## Data

Download our synthetic and captured datasets from [Google Drive](https://drive.google.com/drive/folders/1PSEcqUR1prfQ51jzlCJ7tWDHPXmZGbMo).
The following data are needed to run our method:
- `rgb_1x`, input video sequence as image files
- `poses_bounds.npy` or `transforms.json`, camera poses in the LLFF or NeRF Blender format
- `flow/flow` and `flow/flow_backward` are forward and backward optical flows written with RAFT `writeFlow`; `flow/confidence` contains confidence maps generated by omnimatte
- `masks/mask`, containing one or more subfolders, each providing a coarse mask sequence.
- **Note: our mask layer order is the reverse of omnimatte's**
- `depth`, monocular depth estimation (required only if using depth loss)

While all paths are configurable with command line arguments, the code by default recognizes the following structure:
```
/data/matting/wild/bouldering
├── colmap
│ └── poses_bounds.npy
├── depth
│ └── depth
│ └── 00000.npy
├── flow
│ ├── confidence
│ │ └── 0001.png
│ ├── flow
│ │ └── 00000.flo
│ └── flow_backward
│ └── 00000.flo
├── homography
│ └── homographies.npy
├── masks
│ └── mask
│ └── 00
│ └── 00000.png
└── rgb_1x
└── 00000.png
```

We also provide scripts for preparing all data required to run our pipeline, and for converting our data format to Omnimatte or Nerfies formats. See [using your video](docs/using-your-video.md) for details.
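As a quick sanity check before training, you can verify that a scene folder matches the default layout above. This is just a sketch; the scene path is a placeholder, and for Blender-format data `transforms.json` is used instead of `colmap/poses_bounds.npy`.

```
SCENE=/data/matting/wild/bouldering
for d in rgb_1x flow/flow flow/flow_backward flow/confidence masks/mask colmap; do
  [ -e "$SCENE/$d" ] && echo "found   $d" || echo "missing $d"
done
```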
## Running our code
We use [hydra](https://hydra.cc/) for configuring the pipeline, training parameters, and evaluation setups. The entrypoint files and predefined configurations are located in the [workflows](workflows) folder.
You can find the documented config structure in code files under [core/config](core/config).
### All-in-One CLI
To make it easy to prepare data and run experiments, we have created a simple command line interface, `ui/cli.py`. It requires some setup as it enforces the data organization shown above. See how to use it in [Using the CLI](docs/using-the-cli.md).
If you can't use the CLI, note that it is essentially a wrapper around the commands described below, which you can also invoke directly.
### Train
#### Basic configuration (without depth supervision)
```
# Using CLI
python ./ui/cli.py train_ours wild/walk

python ./ui/cli.py train_ours wild/bouldering -- \
data_sources.llff_camera.scene_scale=0.2

# Invoke workflow directly
python workflows/train.py \
--config-name train_both \
output=/output/train/wild/walk/matting/basic-exp \
dataset.path=/data/matting/wild/walk \
dataset.scale=0.25 \
contraction=ndc

python workflows/train.py \
--config-name train_both \
output=/output/train/wild/bouldering/matting/basic-exp \
dataset.path=/data/matting/wild/bouldering \
dataset.scale=0.25 \
data_sources=[flow,mask,colmap] \
contraction=ndc \
data_sources.llff_camera.scene_scale=0.2
```

In the above commands,
- `dataset.scale` sets the resolution scale of the images. The _bouldering_ video is 1080p and training at 0.5x scale would require ~40GB of VRAM.
- `data_sources` specifies which data folders (apart from images) should be loaded for training.
- The minimal requirement of our pipeline is `[flow,mask,{pose}]`, where `pose` should be one of `colmap`, `blender` (for synthetic data), or `rodynrf` (if pose is from RoDynRF). The default is `[flow,mask,colmap]`.
- The `rodynrf` config uses the same npy file format as `colmap`, but assumes that the file is stored under `rodynrf/poses_bounds.npy`. It also disables some pose preprocessing steps.
- `contraction` sets how rays should be contracted into a fixed volume for TensoRF. We use `ndc` for synthetic and COLMAP-reconstructed poses, and `mipnerf` for RoDynRF-predicted poses (see the sketch after this list).
- `data_sources.llff_camera.scene_scale` scales all camera origins to fit the scene in a smaller volume. In practice this prevents TensoRF from getting OOM errors for some videos.
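For example, a hypothetical invocation for a scene whose poses were estimated with RoDynRF (the scene name and output path are placeholders) combines the `rodynrf` data source with the `mipnerf` contraction:

```
python workflows/train.py \
--config-name train_both \
output=/output/train/wild/my-scene/matting/rodynrf-exp \
dataset.path=/data/matting/wild/my-scene \
dataset.scale=0.25 \
data_sources=[flow,mask,rodynrf] \
contraction=mipnerf
```

This expects the poses under `rodynrf/poses_bounds.npy` in the dataset folder, as noted above.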
#### With depth supervision

```
# Using CLI
python ./ui/cli.py \
train_ours \
wild/bouldering \
--use_depths \
-- \
fg_losses=[alpha_reg,brightness_reg,flow_recons,mask,recons,warped_alpha,bg_tv_reg,robust_depth_matching,bg_distortion] \
fg_losses.robust_depth_matching.config.alpha=0.1 \
fg_losses.bg_distortion.config.alpha=0.01 \
data_sources.llff_camera.scene_scale=0.2

# Invoke workflow directly
python workflows/train.py \
--config-name train_both \
output=/output/train/wild/bouldering/matting/exp-with-depths \
dataset.path=/data/matting/wild/bouldering \
dataset.scale=0.25 \
data_sources=[flow,mask,colmap,depths] \
contraction=ndc \
fg_losses=[alpha_reg,brightness_reg,flow_recons,mask,recons,warped_alpha,bg_tv_reg,robust_depth_matching,bg_distortion] \
fg_losses.robust_depth_matching.config.alpha=0.1 \
fg_losses.bg_distortion.config.alpha=0.01 \
data_sources.llff_camera.scene_scale=0.2
```

The configs `robust_depth_matching` and `bg_distortion` enable monocular depth supervision and the distortion loss, respectively.
### Evaluate
By default, the evaluation script loads pipeline and dataset configurations from training:
```
# Using CLI
python ./ui/cli.py eval_ours wild/bouldering/exp-with-depths --step 15000

# Invoke workflow directly
python workflows/eval.py \
output=/output/train/wild/bouldering/matting/exp-with-depths/eval/15000 \
checkpoint=/output/train/wild/bouldering/matting/exp-with-depths/checkpoints/checkpoint_15000.pth
```

### Clean background retraining
If you find some shadows captured in both foreground and background layers, it may be possible to obtain a clean background by training the TensoRF model from scratch, using the mask from the jointly-trained foreground.
The eval script generates `fg_alpha`, which is the combined alpha of the foreground layers. You can train the background RF using:
```
# Using CLI
python ui/cli.py \
train_ours \
--config train_bg \
--name retrain_bg \
--mask /output/train/wild/walk/matting/basic-exp/eval/15000/fg_alpha \
wild/walk

# Invoke workflow directly
python workflows/train.py \
--config-name train_bg \
output=/output/train/wild/walk/retrain-bg \
dataset.path=/data/matting/wild/walk \
dataset.scale=0.25 \
data_sources=[mask,colmap] \
data_sources.mask.subpath=/output/train/wild/walk/matting/basic-exp/eval/15000/fg_alpha \
contraction=ndc
```

## Contact
For any issues related to code and data, [file an issue](issues) or email [email protected].
## Citation
```
@InProceedings{Lin_2023_ICCV,
author = {Geng Lin and Chen Gao and Jia-Bin Huang and Changil Kim and Yipeng Wang and Matthias Zwicker and Ayush Saraf},
title = {OmnimatteRF: Robust Omnimatte with 3D Background Modeling},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2023}
}
```

## Acknowledgements
The code is available under the MIT license.
Our codebase contains code from [MiDaS](https://github.com/isl-org/MiDaS), [omnimatte](https://github.com/erikalu/omnimatte), [RAFT](https://github.com/princeton-vl/RAFT), [RoDynRF](https://github.com/facebookresearch/robust-dynrf), and [TensoRF](https://github.com/apchenstu/TensoRF). Their licenses can be found under the [licenses](licenses) folder.