{"id":13442936,"url":"https://github.com/vLAR-group/UnsupObjSeg","last_synced_at":"2025-03-20T15:31:39.312Z","repository":{"id":61034177,"uuid":"542660213","full_name":"vLAR-group/UnsupObjSeg","owner":"vLAR-group","description":"🔥Benchmarking Unsupervised Obj Seg (NeurIPS 2022 \u0026 IJCV 2024)","archived":false,"fork":false,"pushed_at":"2024-10-17T13:31:11.000Z","size":13835,"stargazers_count":34,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-19T14:03:50.753Z","etag":null,"topics":["benchmarking","generative-model","instance-segmentation","neurips-2022","object-centric","object-detection","object-segmentation","real-world-images","unsupervised-learning","variational-autoencoder","variational-inference"],"latest_commit_sha":null,"homepage":"https://vlar-group.github.io/UnsupObjSeg.html","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/vLAR-group.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-09-28T15:27:48.000Z","updated_at":"2024-10-19T07:59:46.000Z","dependencies_parsed_at":"2024-10-28T04:01:28.450Z","dependency_job_id":"54fc96e0-319a-432b-9eeb-c4fed4928024","html_url":"https://github.com/vLAR-group/UnsupObjSeg","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vLAR-group%2FUnsupObjSeg","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vLAR-group%2FUnsupObjSeg/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/vLAR-group%2FUnsupObjSeg/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/vLAR-group%2FUnsupObjSeg/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/vLAR-group","download_url":"https://codeload.github.com/vLAR-group/UnsupObjSeg/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244640072,"owners_count":20485978,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmarking","generative-model","instance-segmentation","neurips-2022","object-centric","object-detection","object-segmentation","real-world-images","unsupervised-learning","variational-autoencoder","variational-inference"],"created_at":"2024-07-31T03:01:53.537Z","updated_at":"2025-03-20T15:31:39.306Z","avatar_url":"https://github.com/vLAR-group.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"[![arXiv](https://img.shields.io/badge/arXiv-2210.02324-b31b1b.svg)](https://arxiv.org/abs/2210.02324)\n![code visitors](https://visitor-badge.glitch.me/badge?page_id=vLAR-group/UnsupObjSeg)\n[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://github.com/vLAR-group/UnsupObjSeg/blob/main/LICENSE)\n[![Twitter Follow](https://img.shields.io/twitter/follow/vLAR_Group?style=social)](https://twitter.com/vLAR_Group)\n\n## Promising or Elusive? 
Unsupervised Object Segmentation from Real-world Single Images (NeurIPS 2022)\n[Yafei Yang](https://yangyafei1998.github.io/), [Bo Yang](https://yang7879.github.io/) \u003cbr/\u003e\n[**Project Page**](https://vlar-group.github.io/UnsupObjSeg.html) | [**Paper**](https://arxiv.org/abs/2210.02324)\n\n\n![teaser.png](media/teaser.png)\n\n## Overall Structure :earth_americas:\nThis repository contains:\n* Complexity Factors Calculation for Datasets under `Complexity_Factors/`.\n* Six Datasets Generation / Adaptation under `Dataset_Generation/`, including: \n    * dSprites;\n    * Tetris;\n    * CLEVR;\n    * YCB;\n    * ScanNet;\n    * COCO.\n* Four Representative Methods Re-implementation / Adaptation including:\n    * AIR ([\"Attend, Infer, Repeat: Fast Scene Understanding with Generative Models\"](https://arxiv.org/abs/1603.08575)) under `AIR/`;\n    * MONet ([\"MONet: Unsupervised Scene Decomposition and Representation\"](https://arxiv.org/abs/1901.11390)) under `MONet/`;\n    * IODINE ([\"Multi-Object Representation Learning with Iterative Variational Inference\"](https://arxiv.org/abs/1903.00450)) under `IODINE/`;\n    * Slot Attention ([\"Object-Centric Learning with Slot Attention\"](https://arxiv.org/abs/2006.15055)) under `Slot_Attention`.\n* Evaluation of Object Segmentation Performance under `Segmentation_Evaluation/`, including:\n    * AP score;\n    * PQ score;\n    * Precision and Recall.\n\nIJCV extension contains:\n* Additional Complexity Factors Calculation for Background under `Complexity_Factors/`.\n* MOVi Datasets Generation under `Dataset_Generation/MOVi`.\n* Background Complexity Factors Adaptation under `Dataset_Generation/Ablation Dataset`.\n* Additional Baseline DINOSAUR ([\"Bridging the Gap to Real-World Object-Centric Learning\"](https://arxiv.org/abs/2209.14860)).\n* Additional Evaluation Metrics under `Segmentation_Evaluation/`, including:\n    * ARI;\n    * ARP;\n    * ARR;\n    * Background Recall.\n\n## Preparation 
:construction_worker:\n### 1. Create conda environment\n```\nconda env create -f [env_name].yml\nconda activate [env_name]\n```\nNote: Since this repo contains implementations of different approaches, we use separate conda environments to manage them. Specifically, use `tf1_env.yml` to build the environment for **IODINE**, `tf2_env.yml` for **Slot Attention**, and `pytorch_env.yml` for **AIR** and **MONet**.\n\n### 2. Prepare datasets\nDatasets used in this paper can be downloaded [here](https://www.dropbox.com/sh/u1p1d6hysjxqauy/AACgEh0K5ANipuIeDnmaC5mQa?dl=0). We provide both TFRecord and PNG files for each dataset. Alternatively, you can generate the datasets by following the instructions below. \n#### 2.1 dSprites Dataset\nDownload the raw dSprites shape data from https://github.com/deepmind/dsprites-dataset. Put the downloaded `dsprites_ndarray_co1sh3sc6or40x32y32_64x64.npz` under `Dataset_Generation/dSprites`. \\\nCreate our dSprites dataset using the given shape data with: \n```\ncd Dataset_Generation\npython dSprites/create_dsprites_dataset.py --n_imgs [num_imgs] --root [dSprites_location] --min_object_count 2 --max_object_count 6\n```\nThis will create `[num_imgs]` images and their corresponding masks under `[dSprites_location]/image` and `[dSprites_location]/mask`.\n\n#### 2.2 Tetris Dataset\nDownload the Tetrominoes dataset from https://github.com/deepmind/multi_object_datasets. 
Put downloaded `tetrominoes_train.tfrecords` under `Dataset_Generation/Tetris`.\\\nParse tfrecord data into images with: \n```\ncd Dataset_Generation\npython Tetris/read_tetris_tfrecords.py\n```\nThis will create 10000 images from tetrominoes dataset of resolution 35x35 under `Tetris/tetris_source` .\\\nCreate our Tetris dataset using previously parsed images with:\n```\npython Tetris/create_tetris_dataset.py --n_imgs [num_imgs] --root [Tetris_location] --min_object_count 2 --max_object_count 6\n```\nThis will create `[num_imgs]` images and their corresponding masks under `[Tetris_location]/image` and `[Tetris_location]/mask`.\n\n#### 2.3 CLEVR Dataset\nClone and follow the instructions of repo https://github.com/facebookresearch/clevr-dataset-gen and render CLEVR images with:\n```\ncd image_generation\nblender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6\n```\nIf you have an NVIDIA GPU with CUDA installed then you can use the GPU to accelerate rendering:\n```\nblender --background --python render_images.py -- --num_images [num_imgs] --min_objects 2 --max_objects 6 --use_gpu 1\n```\nPut rendered images and masks under `Dataset_Generation/CLEVR/clevr_source/images` and `Dataset_Generation/CLEVR/clevr_source/masks`. \\\nCreate our CLEVR dataset using previously rendered images with: \n```\npython CLEVR/create_clevr_dataset.py --n_imgs [num_imgs] --root [CLEVR_location] --min_object_count 2 --max_object_count 6\n```\nThis will create `[num_imgs]` images and their corresponding masks under `[CLEVR_location]/image` and `[CLEVR_location]/mask`.\n \n#### 2.4 YCB Dataset\nDownload 256-G video-YCB dataset from https://rse-lab.cs.washington.edu/projects/posecnn/. 
Put them under `Dataset_Generation/YCB/YCB_Video_Dataset`.\\\nCreate our YCB dataset using raw video-YCB images with:\n```\npython YCB/create_YCB_dataset.py --n_imgs [num_imgs] --root [YCB_location] --min_object_count 2 --max_object_count 6\n```\nThis will create `[num_imgs]` images and their corresponding masks under `[YCB_location]/image` and `[YCB_location]/mask`.\n\n#### 2.5 ScanNet Dataset\nDownload ScanNet data and put it under `Dataset_Generation/ScanNet/scannet_raw`.\nProcess ScanNet data into `Dataset_Generation/ScanNet/scans_processed` with:\n```\npython ScanNet/process_scannet_data.py\n```\nThis will parse 2D images from the ScanNet sensor data, unzip the raw 2D instance labels (filtered version) in ScanNet, and parse the official train/val split downloaded from https://github.com/ScanNet/ScanNet/tree/master/Tasks/Benchmark.\\\nCreate our ScanNet dataset using the processed ScanNet data with:\n```\npython COCO/create_ScanNet_dataset.py --n_imgs [num_imgs] --root [ScanNet_location] --min_object_count 2 --max_object_count 6\n```\nThis will create `[num_imgs]` images and their corresponding masks under `[ScanNet_location]/image` and `[ScanNet_location]/mask`.\n\n#### 2.6 COCO Dataset\nDownload COCO data from http://images.cocodataset.org/zips/val2017.zip (validation), http://images.cocodataset.org/zips/train2017.zip (train) and http://images.cocodataset.org/annotations/annotations_trainval2017.zip (annotations). 
Put them under `Dataset_Generation/COCO/COCO_raw`.\\\nParse segmentation masks from the annotation file with:\n```\npython COCO/process_coco_dataset.py\n```\nCreate our COCO dataset using the original COCO images and parsed masks with: \n```\npython YCB/create_ScanNet_dataset.py --n_imgs [num_imgs] --root [COCO_location] --min_object_count 2 --max_object_count 6\n```\nThis will create `[num_imgs]` images and their corresponding masks under `[COCO_location]/image` and `[COCO_location]/mask`.\n\n#### 2.7 MOVi Dataset\nDetails for MOVi-C and MOVi-E datasets can be found at https://github.com/google-research/kubric/tree/main/challenges/movi. They can be directly loaded with:\n```\nds = tfds.load(\"movi_c/128x128\", data_dir=\"gs://kubric-public/tfds\") \nds = tfds.load(\"movi_e/128x128\", data_dir=\"gs://kubric-public/tfds\") \n```\nImages and masks in PNG format can be extracted with:\n```\npython MOVi/movi_c_128.py \npython MOVi/movi_e_128.py \n```\n\n### 3. Create ablation datasets\n* Use `Dataset_Generation/Ablation Dataset/object_level_ablation.py` to create datasets ablated on object-level factors.\n* Use `Dataset_Generation/Ablation Dataset/scene_level_ablation.py` to create datasets ablated on scene-level factors.\n* Use `Dataset_Generation/Ablation Dataset/joint_ablation.py` to create datasets ablated on both object-level and scene-level factors.\n* Use `Dataset_Generation/Ablation Dataset/bg_ablation.py` to create datasets ablated on background factors.\n\nDetailed examples and usage can be found in the corresponding scripts.\n\n\n## Launch Training :rocket:\n### 1. AIR\nTraining:\n```\ncd AIR/\npython main.py --dataset [dataset_name] --gpu_index [gpu_id] --max_steps 6 \n```\nTesting:\n```\ncd AIR/\npython main.py --dataset [dataset_name] --gpu_index [gpu_id] --eval_mode --resume [ckpt]\n```\nwhere:\n- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.\n- `gpu_id` is the target cuda device id. 
\n- `ckpt` is the checkpoint to resume from in the testing stage.\n- in all experiments for AIR, we set `max_steps` to 6.\n\n### 2. MONet\nTraining:\n```\ncd MONet/\npython main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7 \n```\nTesting: \n```\ncd MONet/\npython main.py --dataset [dataset_name] --gpu_index [gpu_id] --K_steps 7 --eval_mode --resume [ckpt]\n```\nwhere:\n- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.\n- `gpu_id` is the target cuda device id. \n- `ckpt` is the checkpoint to resume from in the testing stage.\n- in all experiments for MONet, we set `K_steps` to 7.\n\n### 3. IODINE\nTraining:\n```\ncd IODINE/\nCUDA_VISIBLE_DEVICES=[gpu_id] python main.py -f with [dataset_name_train]\n```\nTesting:\n```\ncd IODINE/\nCUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset_identifier [dataset_name_test]\n```\nwhere:\n- `dataset_name_train` is the name of the training dataset, e.g. dSprites_train, YCB_train.\n- `dataset_name_test` is the name of the testing dataset, e.g. dSprites_test, YCB_test.\n- `gpu_id` is the target cuda device id. \n\n### 4. Slot Attention\nTraining:\n```\ncd Slot_Attention/\nCUDA_VISIBLE_DEVICES=[gpu_id] python train.py --dataset [dataset_name] --num_slots 7 \n```\nTesting:\n```\ncd Slot_Attention/\nCUDA_VISIBLE_DEVICES=[gpu_id] python eval.py --dataset [dataset_name] --num_slots 7 \n```\nwhere:\n- `dataset_name` is the name of the dataset, e.g. dSprites, YCB.\n- `gpu_id` is the target cuda device id. \n- in all experiments for Slot Attention, we set `num_slots` to 7.\n\n### 5. DINOSAUR\nWe use the official repo for all experiments on DINOSAUR; code and instructions can be found at https://github.com/amazon-science/object-centric-learning-framework. 
Examples are as follows:\n\nTraining:\n```\nCUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_train +experiment=projects/bridging/dinosaur/movi_c_feat_rec \n```\nTesting:\n```\nCUDA_VISIBLE_DEVICES=[gpu_id] poetry run ocl_eval +evaluation=projects/bridging/metrics_coco +train_config_name=config +train_config_path=[config path]\n```\nwhere:\n- `gpu_id` is the target cuda device id. \n- `config path` is the path to the DINOSAUR configurations. \n\n## Complexity factors for datasets :bar_chart:\nCalculate object-level and scene-level complexity factors with `Complexity_Factors/Complexity_Factor_Evaluator.py`. Examples are provided in that script.\n\n## Visualization :eyes:\n\n![original_experiment.gif](media/original_experiment.gif)\n![ablation_experiment.gif](media/ablation_experiment.gif)\n\n## Citation\nIf you find our work useful in your research, please consider citing:\n\n    @article{yang2022,\n      title={Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images},\n      author={Yang, Yafei and Yang, Bo},\n      journal={NeurIPS},\n      year={2022}\n    }\n\n    @article{yang2024benchmarking,\n        title={Benchmarking and Analysis of Unsupervised Object Segmentation from Real-World Single Images},\n        author={Yang, Yafei and Yang, Bo},\n        journal={International Journal of Computer Vision},\n        volume={132},\n        number={6},\n        pages={2077--2113},\n        year={2024},\n        publisher={Springer}\n    }\n\n## Updates\n* 5/10/2022: Initial release!\n* 18/10/2024: Content related to the IJCV extension has been included in this repo!\n\n## Acknowledgement :bulb:\nThis project references the following repositories:\n* https://pyro.ai/examples/air.html\n* https://github.com/addtt/attend-infer-repeat-pytorch\n* https://github.com/applied-ai-lab/genesis\n* https://github.com/deepmind/deepmind-research/tree/master/iodine\n* https://github.com/google-research/google-research/tree/master/slot_attention\n* 
https://github.com/google-research/kubric/tree/main/challenges/movi\n* https://github.com/amazon-science/object-centric-learning-framework\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FvLAR-group%2FUnsupObjSeg","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FvLAR-group%2FUnsupObjSeg","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FvLAR-group%2FUnsupObjSeg/lists"}