https://github.com/dingmyu/VRDP

[NeurIPS 2021] Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language

# VRDP (NeurIPS 2021)

**[Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language](https://arxiv.org/abs/2110.15358)**


[Mingyu Ding](https://dingmyu.github.io/),
[Zhenfang Chen](https://zfchenunique.github.io/),
[Tao Du](https://people.csail.mit.edu/taodu/),
[Ping Luo](http://luoping.me/),
[Joshua B. Tenenbaum](https://web.mit.edu/cocosci/josh.html), and
[Chuang Gan](http://people.csail.mit.edu/ganchuang/)

![image](assets/vrdp.gif)

More details can be found at the [Project Page](http://vrdp.csail.mit.edu/).

If you find our work useful in your research, please consider citing our paper:

```bibtex
@inproceedings{ding2021dynamic,
  author    = {Ding, Mingyu and Chen, Zhenfang and Du, Tao and Luo, Ping and Tenenbaum, Joshua B and Gan, Chuang},
  title     = {Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language},
  booktitle = {Advances In Neural Information Processing Systems},
  year      = {2021}
}
```

## Prerequisites

- Python 3
- PyTorch 1.3 or higher
- All other dependencies can be installed with Miniconda
- Both CPUs and GPUs are supported
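A quick way to confirm the Python and PyTorch requirements is a small check like the one below. It is a minimal sketch, not part of the repository: `version_ge` and `check_prereqs` are hypothetical helper names, and the version comparison relies on `sort -V` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not shipped with VRDP.

# Succeeds if version $1 >= version $2, using `sort -V` (GNU coreutils) for ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reports the installed Python/PyTorch versions and whether a GPU is visible.
check_prereqs() {
  local py_ver torch_ver cuda
  py_ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])') \
    || { echo "python3 not found" >&2; return 1; }
  # Strip local build tags such as "+cu117" before comparing versions.
  torch_ver=$(python3 -c 'import torch; print(torch.__version__.split("+")[0])' 2>/dev/null) \
    || { echo "PyTorch is not installed" >&2; return 1; }
  if ! version_ge "$torch_ver" 1.3; then
    echo "PyTorch $torch_ver is older than 1.3" >&2
    return 1
  fi
  cuda=$(python3 -c 'import torch; print(torch.cuda.is_available())')
  echo "Python $py_ver, PyTorch $torch_ver, CUDA available: $cuda"
}

# Usage: check_prereqs
```

A `False` for CUDA is not an error here, since the code runs on both CPUs and GPUs.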

## Dataset preparation

- Download the videos, video annotations, questions and answers, and object proposals from the [official CLEVRER website](http://clevrer.csail.mit.edu/#).

- Extract the frames of each video as `.png` images with ffmpeg.

- Organize the data as shown below.

```
clevrer
├── annotation_00000-01000
│   ├── annotation_00000.json
│   ├── annotation_00001.json
│   └── ...
├── ...
├── image_00000-01000
│   │   ├── 1.png
│   │   ├── 2.png
│   │   └── ...
│   └── ...
├── ...
├── questions
│   ├── train.json
│   ├── validation.json
│   └── test.json
└── proposals
    ├── proposal_00000.json
    ├── proposal_00001.json
    └── ...
```

- We also provide data for physics learning and program execution on [OneDrive](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY).
  You can optionally download it and put it in the `./data/` folder.

- Download the processed executor data [executor_data.zip](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY) and unzip it into `./executor/data/`.
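The ffmpeg frame-extraction step above can be scripted roughly as follows. This is a minimal sketch under assumptions this README does not state: that the downloaded videos are named like `video_XXXXX.mp4`, and that each video's frames go into a subfolder of the same name under `image_00000-01000/`. Adjust both to the layout the data loaders actually expect. `extract_frames` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of VRDP.
# Assumes videos named video_XXXXX.mp4 and one frame folder per video.
extract_frames() {
  local video=$1 out_root=$2 name
  name=$(basename "$video" .mp4)
  mkdir -p "$out_root/$name"
  # The image2 muxer numbers output frames from 1 by default, giving 1.png, 2.png, ...
  # -y overwrites frames left over from an earlier run.
  ffmpeg -y -loglevel error -i "$video" "$out_root/$name/%d.png"
}

# Usage (paths are examples):
# for v in video_00000-01000/*.mp4; do
#   extract_frames "$v" clevrer/image_00000-01000
# done
```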

## Get Object Dictionaries (Concepts and Trajectories)

Download the [object proposals](http://clevrer.csail.mit.edu/#) generated by the region proposal network, then follow the `Step-by-step Training` section of [DCL](https://github.com/zfchenUnique/DCL-Release) to obtain the object concepts and trajectories.

The above process includes:

- trajectory extraction
- concept learning
- trajectory refinement

Alternatively, you can download our extracted object dictionaries, [object_dicts.zip](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY), directly from OneDrive.

## Learning

### 1. Differentiable Physics Learning

Once the object dictionaries are available, we learn the physical parameters from the object properties and trajectories.

```shell
cd dynamics/
python3 learn_dynamics.py 10000 15000
# argv[1] and argv[2] are the start and end indices of the range to process.
```
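Because each script takes an index range, a large range can be split across several processes. The sketch below shows one way to do that; `shard_ranges` and `run_sharded` are hypothetical helpers, and the sketch assumes the end index is exclusive. If the scripts treat it as inclusive, adjacent shards would each process one boundary index twice. The same pattern should apply to `physics_simulation.py` and `refine_prediction.py`, which take the same two arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of VRDP. Run from dynamics/.

# Prints "start end" pairs that split [START, END) into chunks of at most STEP.
shard_ranges() {
  local start=$1 end=$2 step=$3 s e
  for ((s = start; s < end; s += step)); do
    e=$((s + step))
    if ((e > end)); then e=$end; fi
    echo "$s $e"
  done
}

# Runs SCRIPT once per shard in the background, logging each shard to its own
# file, then waits for all shards to finish.
run_sharded() {
  local script=$1 start=$2 end=$3 step=$4 s e
  # Process substitution keeps the loop in the current shell, so `wait` sees the jobs.
  while read -r s e; do
    python3 "$script" "$s" "$e" > "log_${script%.py}_${s}_${e}.txt" 2>&1 &
  done < <(shard_ranges "$start" "$end" "$step")
  wait
}

# Usage: run_sharded learn_dynamics.py 10000 15000 1000
```

A bare `wait` always returns 0, so failed shards have to be spotted in their `log_*.txt` files.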

The resulting object physical parameters, [object_dicts_with_physics.zip](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY), can also be downloaded from OneDrive.

### 2. Physics Simulation (counterfactual)

Simulate object dynamics with the learned physical parameters.

```shell
cd dynamics/
python3 physics_simulation.py 10000 15000
# argv[1] and argv[2] are the start and end indices of the range to process.
```

The resulting simulated trajectories and events, [object_simulated.zip](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY), can also be downloaded from OneDrive.

### 3. Physics Simulation (predictive)

Correct long-range predictions according to the video observations.

```shell
cd dynamics/
python3 refine_prediction.py 10000 15000
# argv[1] and argv[2] are the start and end indices of the range to process.
```

The resulting refined trajectories and events, [object_updated_results.zip](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY), can also be downloaded from OneDrive.
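The three dynamics stages can be chained for a single index range with a wrapper like the one below. This is a hedged sketch: `run_dynamics_pipeline` is a hypothetical name, and it assumes the stages are meant to run in the numbered order above (this README does not say whether stage 3 consumes stage 2's output). The wrapper stops at the first stage that exits with an error.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of VRDP. Run from the repository root.
run_dynamics_pipeline() {
  local start=$1 end=$2
  # The subshell keeps the `cd` (and the loop variable) out of the caller's shell.
  (
    cd dynamics/ || exit 1
    for script in learn_dynamics.py physics_simulation.py refine_prediction.py; do
      echo ">> python3 $script $start $end"
      python3 "$script" "$start" "$end" || { echo "$script failed" >&2; exit 1; }
    done
  )
}

# Usage: run_dynamics_pipeline 10000 15000
```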

## Evaluation

Once the final trajectories and events are available, we perform neuro-symbolic program execution and evaluate the performance on the validation set.

```shell
cd executor/
python3 evaluation.py
```

The test-set JSON file for submission to [EvalAI](https://eval.ai/web/challenges/challenge-page/667/overview) can be generated with:

```shell
cd executor/
python3 get_results.py
```

## The Generalized CLEVRER Dataset (counterfactual_mass)

- Download [causal_mass.zip](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY) and [counterfactual_mass.zip](https://connecthkuhk-my.sharepoint.com/:f:/g/personal/u3007305_connect_hku_hk/Emlb-yHsV6ZLjDcVAxl7TOYBPkMA6pDcA505dtsIEQ1cqQ?e=0lQuoY) from OneDrive.
- Generate counterfactual data for collision events by running `python3 counterfactual_mass/generate_data.py`.

## Examples

- Predictive question
![image](assets/predictive.gif)
- Counterfactual question
![image](assets/counterfactual.gif)

## Acknowledgements

For questions regarding VRDP, feel free to open an issue in this repository or contact the author directly by email.