[![arXiv](https://img.shields.io/badge/arXiv-2210.02324-b31b1b.svg)](https://arxiv.org/abs/2210.02324)
![code visitors](https://visitor-badge.glitch.me/badge?page_id=vLAR-group/UnsupObjSeg)
[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/vLAR-group/UnsupObjSeg/blob/main/LICENSE)
[![Twitter Follow](https://img.shields.io/twitter/follow/vLAR_Group?style=social)](https://twitter.com/vLAR_Group)

## Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images (NeurIPS 2022)
[Yafei Yang](https://yangyafei1998.github.io/), [Bo Yang](https://yang7879.github.io/)

[**Project Page**](https://vlar-group.github.io/UnsupObjSeg.html) | [**Paper**](https://arxiv.org/abs/2210.02324)

![teaser.png](media/teaser.png)

## Overall Structure :earth_americas:
This repository contains:
* Complexity factor calculation for datasets under `Complexity_Factors/`.
* Generation / adaptation of six datasets under `Dataset_Generation/`, including:
  * dSprites;
  * Tetris;
  * CLEVR;
  * YCB;
  * ScanNet;
  * COCO.
* Re-implementation / adaptation of four representative methods, including:
  * AIR (["Attend, Infer, Repeat: Fast Scene Understanding with Generative Models"](https://arxiv.org/abs/1603.08575)) under `AIR/`;
  * MONet (["MONet: Unsupervised Scene Decomposition and Representation"](https://arxiv.org/abs/1901.11390)) under `MONet/`;
  * IODINE (["Multi-Object Representation Learning with Iterative Variational Inference"](https://arxiv.org/abs/1903.00450)) under `IODINE/`;
  * Slot Attention (["Object-Centric Learning with Slot Attention"](https://arxiv.org/abs/2006.15055)) under `Slot_Attention/`.
* Evaluation of object segmentation performance under `Segmentation_Evaluation/`, including:
  * AP score;
  * PQ score;
  * Precision and Recall.

The IJCV extension adds:
* Additional complexity factor calculation for background under `Complexity_Factors/`.
* MOVi dataset generation under `Dataset_Generation/MOVi`.
* Background complexity factor adaptation under `Dataset_Generation/Ablation Dataset`.
* An additional baseline, DINOSAUR (["Bridging the Gap to Real-World Object-Centric Learning"](https://arxiv.org/abs/2209.14860)).
* Additional evaluation metrics under `Segmentation_Evaluation/` (see the illustrative sketch below), including:
  * ARI;
  * ARP;
  * ARR;
  * Background Recall.
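
The minimal sketch below illustrates two of these metrics on toy integer-id masks: ARI computed over per-pixel labels, and object-level precision/recall obtained by greedy IoU matching at a 0.5 threshold. It is illustrative only; the exact matching rules, thresholds, and mask formats used by the scripts in `Segmentation_Evaluation/` may differ.

```python
# Illustrative only: ARI and object-level precision/recall on toy masks.
# The actual Segmentation_Evaluation/ scripts may use different conventions.
import numpy as np
from sklearn.metrics import adjusted_rand_score

def precision_recall(gt_mask, pred_mask, iou_thresh=0.5):
    """gt_mask / pred_mask: 2D integer maps, 0 = background, >0 = object id."""
    gt_ids = [i for i in np.unique(gt_mask) if i != 0]
    pred_ids = [i for i in np.unique(pred_mask) if i != 0]
    matched, used = 0, set()
    for g in gt_ids:
        g_bin = gt_mask == g
        for p in pred_ids:
            if p in used:
                continue
            p_bin = pred_mask == p
            iou = np.logical_and(g_bin, p_bin).sum() / np.logical_or(g_bin, p_bin).sum()
            if iou >= iou_thresh:          # greedily match this GT object
                matched += 1
                used.add(p)
                break
    return matched / max(len(pred_ids), 1), matched / max(len(gt_ids), 1)

gt = np.zeros((8, 8), dtype=int);   gt[1:4, 1:4] = 1;   gt[5:8, 5:8] = 2
pred = np.zeros((8, 8), dtype=int); pred[1:4, 1:4] = 1; pred[4:8, 4:8] = 2
print("ARI:", adjusted_rand_score(gt.ravel(), pred.ravel()))
print("precision, recall:", precision_recall(gt, pred))
```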

## Preparation :construction_worker:
### 1. Create conda environment
```
conda env create -f [env_name].yml
conda activate [env_name]
```
Note: Since this repo contains implementations of several different approaches, we use separate conda environments to manage them. Specifically, use `tf1_env.yml` to build the environment for **IODINE**, `tf2_env.yml` for **Slot Attention**, and `pytorch_env.yml` for **AIR** and **MONet**.

### 2. Prepare datasets
Datasets used in this paper can be downloaded [here](https://www.dropbox.com/sh/u1p1d6hysjxqauy/AACgEh0K5ANipuIeDnmaC5mQa?dl=0). We provide both TFRecord and PNG files for each dataset. Alternatively, you can generate the datasets by following the instructions below.
#### 2.1 dSprites Dataset
Download the raw dSprites shape data from https://github.com/deepmind/dsprites-dataset and put the downloaded `dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz` under `Dataset_Generation/dSprites`. \
Create our dSprites dataset from the given shape data with:
```
cd Dataset_Generation
python dSprites/create_dsprites_dataset.py --n_imgs [num_imgs] --root [dSprites_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[dSprites_location]/image` and `[dSprites_location]/mask`.
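
As a quick sanity check, generated samples can be inspected with a few lines of Python. The snippet below assumes each sample is saved as matching PNG files under `image/` and `mask/` with object ids encoded as distinct pixel values (0 for background); the exact file names and mask encoding produced by the generation scripts may differ.

```python
# Hypothetical sanity check for one generated sample; file naming and mask
# encoding are assumptions and may not match the generation scripts exactly.
import numpy as np
from PIL import Image

root = "dSprites_data"                                # [dSprites_location] above (assumed)
img = np.array(Image.open(f"{root}/image/00000.png"))
mask = np.array(Image.open(f"{root}/mask/00000.png"))

object_ids = [i for i in np.unique(mask) if i != 0]   # 0 assumed to be background
print("image shape:", img.shape)
print("objects in this sample:", len(object_ids))
```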

#### 2.2 Tetris Dataset
Download the Tetrominoes dataset from https://github.com/deepmind/multi_object_datasets and put the downloaded `tetrominoes_train.tfrecords` under `Dataset_Generation/Tetris`.\
Parse the TFRecord data into images with:
```
cd Dataset_Generation
python Tetris/read_tetris_tfrecords.py
```
This will create 10,000 images of resolution 35x35 from the Tetrominoes dataset under `Tetris/tetris_source`.\
Create our Tetris dataset from the parsed images with:
```
python Tetris/create_tetris_dataset.py --n_imgs [num_imgs] --root [Tetris_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[Tetris_location]/image` and `[Tetris_location]/mask`.

#### 2.3 CLEVR Dataset
Clone the repo https://github.com/facebookresearch/clevr-dataset-gen, follow its instructions, and render CLEVR images with:
```
cd image_generation
blender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6
```
If you have an NVIDIA GPU with CUDA installed, you can use the GPU to accelerate rendering:
```
blender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6 --use_gpu 1
```
Put rendered images and masks under `Dataset_Generation/CLEVR/clevr_source/images` and `Dataset_Generation/CLEVR/clevr_source/masks`. \
Create our CLEVR dataset using previously rendered images with:
```
python CLEVR/create_clevr_dataset.py --n_imgs [num_imgs] --root [CLEVR_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[CLEVR_location]/image` and `[CLEVR_location]/mask`.

#### 2.4 YCB Dataset
Download the YCB-Video dataset (~256 GB) from https://rse-lab.cs.washington.edu/projects/posecnn/ and put it under `Dataset_Generation/YCB/YCB_Video_Dataset`. \
Create our YCB dataset from the raw YCB-Video images with:
```
python YCB/create_YCB_dataset.py --n_imgs [num_imgs] --root [YCB_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[YCB_location]/image` and `[YCB_location]/mask`.

#### 2.5 ScanNet Dataset
Download the ScanNet data and put it under `Dataset_Generation/ScanNet/scannet_raw`. \
Process the ScanNet data into `Dataset_Generation/ScanNet/scans_processed` with:
```
python ScanNet/process_scannet_data.py
```
This will parse 2D images from the ScanNet sensor data, unzip the raw 2D instance labels (filtered version) in ScanNet, and parse the official train/val split downloaded from https://github.com/ScanNet/ScanNet/tree/master/Tasks/Benchmark.\
Create our ScanNet dataset from the processed ScanNet data with:
```
python ScanNet/create_ScanNet_dataset.py --n_imgs [num_imgs] --root [ScanNet_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[ScanNet_location]/image` and `[ScanNet_location]/mask`.

#### 2.6 COCO Dataset
Download the COCO data from http://images.cocodataset.org/zips/val2017.zip (validation), http://images.cocodataset.org/zips/train2017.zip (train) and http://images.cocodataset.org/annotations/annotations_trainval2017.zip (annotations). Put them under `Dataset_Generation/COCO/COCO_raw`.\
Parse segmentation masks from the annotation files with:
```
python COCO/process_coco_dataset.py
```
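
For reference, extracting an instance mask from a COCO annotation takes only a few `pycocotools` calls. The sketch below shows the general idea for one validation image; the actual `process_coco_dataset.py` may filter annotations and organize its outputs differently, and the annotation path is an assumption based on the download locations above.

```python
# Rough sketch of COCO mask extraction with pycocotools (illustrative only).
import numpy as np
from pycocotools.coco import COCO

ann_file = "COCO/COCO_raw/annotations/instances_val2017.json"  # assumed path
coco = COCO(ann_file)

img_id = coco.getImgIds()[0]                       # first image as an example
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id, iscrowd=False))

# Combine per-instance binary masks into a single integer id map.
info = coco.loadImgs(img_id)[0]
id_map = np.zeros((info["height"], info["width"]), dtype=np.uint8)
for idx, ann in enumerate(anns, start=1):
    id_map[coco.annToMask(ann) == 1] = idx
print("instances in image", img_id, ":", len(anns))
```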
Create our COCO dataset from the original COCO images and parsed masks with:
```
python COCO/create_COCO_dataset.py --n_imgs [num_imgs] --root [COCO_location] --min_object_count 2 --max_object_count 6
```
This will create `[num_imgs]` images and their corresponding masks under `[COCO_location]/image` and `[COCO_location]/mask`.

#### 2.7 MOVi Dataset
Details for the MOVi-C and MOVi-E datasets can be found at https://github.com/google-research/kubric/tree/main/challenges/movi. They can be loaded directly with:
```
ds = tfds.load("movi_c/128x128", data_dir="gs://kubric-public/tfds")
ds = tfds.load("movi_e/128x128", data_dir="gs://kubric-public/tfds")
```
Images and masks in PNG format can be parsed with:
```
python MOVi/movi_c_128.py
python MOVi/movi_e_128.py
```
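
The sketch below shows roughly what such an export looks like, assuming the standard MOVi TFDS feature keys `video` and `segmentations`; the exact frame selection and output layout used by `movi_c_128.py` / `movi_e_128.py` may differ.

```python
# Rough sketch of exporting MOVi frames and masks to PNG; frame selection and
# output file names are assumptions, not necessarily what movi_c_128.py does.
import numpy as np
import tensorflow_datasets as tfds
from PIL import Image

ds = tfds.load("movi_c/128x128", data_dir="gs://kubric-public/tfds", split="train")
for i, example in enumerate(tfds.as_numpy(ds.take(2))):
    video = example["video"]                 # (T, H, W, 3) uint8 frames
    segs = example["segmentations"]          # (T, H, W, 1) per-pixel instance ids
    t = 0                                    # export the first frame only
    Image.fromarray(video[t]).save(f"movi_c_{i:05d}_image.png")
    Image.fromarray(segs[t, ..., 0].astype(np.uint8)).save(f"movi_c_{i:05d}_mask.png")
```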

### 3. Create ablation datasets
* Use `Dataset_Generation/Ablation Dataset/object_level_ablation.py` to create datasets ablated on object-level factors.
* Use `Dataset_Generation/Ablation Dataset/scene_level_ablation.py` to create datasets ablated on scene-level factors.
* Use `Dataset_Generation/Ablation Dataset/joint_ablation.py` to create datasets ablated on both object-level and scene-level factors.
* Use `Dataset_Generation/Ablation Dataset/bg_ablation.py` to create datasets ablated on background factors.

Detailed examples and usage can be found in the corresponding scripts.

## Launch Training :rocket:
### 1. AIR
Training:
```
cd AIR/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --max_steps 6
```
Testing:
```
cd AIR/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --eval_mode --resume [ckpt]
```
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- `ckpt` is the checkpoint to resume from in the testing stage.
- In all experiments for AIR, we set `max_steps` to 6.

### 2. MONet
Training:
```
cd MONet/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7
```
Testing:
```
cd MONet/
python main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7 --eval_mode --resume [ckpt]
```
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- `ckpt` is the checkpoint to resume from in the testing stage.
- In all experiments for MONet, we set `K_steps` to 7.

### 3. IODINE
Training:
```
cd IODINE/
CUDA_VISIBLE_DEVICES=[gpu_id] python main.py -f with [dataset_name_train]
```
Testing:
```
cd IODINE/
CUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset_identifier [dataset_name_test]
```
where:
- `dataset_name_train` is the name of the training dataset, e.g. dSprites_train, YCB_train.
- `dataset_name_test` is the name of the testing dataset, e.g. dSprites_test, YCB_test.
- `gpu_id` is the target CUDA device id.

### 4. Slot Attention
Training:
```
cd Slot_Attention/
CUDA_VISIBLE_DEVICES=[gpu_id] python train.py --dataset [dataset_name] --num_slots 7
```
Testing:
```
cd Slot_Attention/
CUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset [dataset_name] --num_slots 7
```
where:
- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.
- `gpu_id` is the target CUDA device id.
- In all experiments for Slot Attention, we set `num_slots` to 7.

### 5. DINOSAUR
We use the official repo for all experiments on DINOSAUR; code and instructions can be found at https://github.com/amazon-science/object-centric-learning-framework. Examples are as follows:

Training:
```
CUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_train +experiment=projects/bridging/dinosaur/movi_c_feat_rec
```
Testing:
```
CUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_eval +evaluation=projects/bridging/metrics_coco +train_config_name=config +train_config_path=[config path]
```
where:
- `gpu_id` is the target CUDA device id.
- `config path` is the path to the DINOSAUR configuration.

## Complexity factors for datasets :bar_chart:
Calculate object-level and scene-level complexity factors with `Complexity_Factors/Complexity_Factor_Evaluator.py`. Examples are provided in that script.
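
As a purely illustrative example (not the factor definitions from the paper), the snippet below computes two simple mask-derived statistics, object count and average relative object area, in the same spirit as scene-level factors; the real definitions live in `Complexity_Factor_Evaluator.py`.

```python
# Illustrative only: simple mask statistics in the spirit of scene-level factors.
# The actual factor definitions are in Complexity_Factors/Complexity_Factor_Evaluator.py.
import numpy as np
from PIL import Image

def simple_scene_stats(mask_path):
    mask = np.array(Image.open(mask_path))
    object_ids = [i for i in np.unique(mask) if i != 0]    # 0 assumed background
    areas = [(mask == i).mean() for i in object_ids]        # fraction of image area
    return {"object_count": len(object_ids),
            "avg_relative_area": float(np.mean(areas)) if areas else 0.0}

print(simple_scene_stats("dSprites_data/mask/00000.png"))   # hypothetical path
```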

## Visualization :eyes:

![original_experiment.gif](media/original_experiment.gif)
![ablation_experiment.gif](media/ablation_experiment.gif)

## Citation
If you find our work useful in your research, please consider citing:

```
@article{yang2022,
  title={Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images},
  author={Yang, Yafei and Yang, Bo},
  journal={NeurIPS},
  year={2022}
}

@article{yang2024benchmarking,
  title={Benchmarking and Analysis of Unsupervised Object Segmentation from Real-World Single Images},
  author={Yang, Yafei and Yang, Bo},
  journal={International Journal of Computer Vision},
  volume={132},
  number={6},
  pages={2077--2113},
  year={2024},
  publisher={Springer}
}
```

## Updates
* 5/10/2022: Initial release!
* 18/10/2024: Content related to the IJCV extension has been added to this repo!

## Acknowledgement :bulb:
This project references the following repositories:
* https://pyro.ai/examples/air.html
* https://github.com/addtt/attend-infer-repeat-pytorch
* https://github.com/applied-ai-lab/genesis
* https://github.com/deepmind/deepmind-research/tree/master/iodine
* https://github.com/google-research/google-research/tree/master/slot_attention
* https://github.com/google-research/kubric/tree/main/challenges/movi
* https://github.com/amazon-science/object-centric-learning-framework