{"id":22737523,"url":"https://github.com/snu-vgilab/instaorder","last_synced_at":"2025-04-14T04:44:35.707Z","repository":{"id":212110452,"uuid":"730734533","full_name":"SNU-VGILab/InstaOrder","owner":"SNU-VGILab","description":"Official repository for the paper \"Instance-Wise Holistic Order Prediction in Natural Scenes\".","archived":false,"fork":false,"pushed_at":"2024-01-11T08:56:07.000Z","size":16869,"stargazers_count":17,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-27T18:50:19.873Z","etag":null,"topics":["computer-vision","depth-order","occlusion-order","panoptic-segmentation","scene-understanding"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SNU-VGILab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2023-12-12T15:00:02.000Z","updated_at":"2024-10-02T02:19:17.000Z","dependencies_parsed_at":"2024-01-10T15:36:05.804Z","dependency_job_id":null,"html_url":"https://github.com/SNU-VGILab/InstaOrder","commit_stats":null,"previous_names":["snu-vgilab/instaorder"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SNU-VGILab%2FInstaOrder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SNU-VGILab%2FInstaOrder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SNU-VGILab%2FInstaOrder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SNU-VGILab%2FInstaOrder/manifests","o
wner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SNU-VGILab","download_url":"https://codeload.github.com/SNU-VGILab/InstaOrder/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248824665,"owners_count":21167343,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","depth-order","occlusion-order","panoptic-segmentation","scene-understanding"],"created_at":"2024-12-10T22:15:22.255Z","updated_at":"2025-04-14T04:44:35.687Z","avatar_url":"https://github.com/SNU-VGILab.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Instance-Wise Holistic Order Prediction in Natural Scenes\n\n[Pierre Musacchio](https://github.com/pmusacchio)\u003csup\u003e1\u003c/sup\u003e, [Hyunmin Lee](https://github.com/semonemo)\u003csup\u003e2\u003c/sup\u003e, [Jaesik Park](https://jaesik.info)\u003csup\u003e1\u003c/sup\u003e\n\n\u003csup\u003e1\u003c/sup\u003eSeoul National University, \u003csup\u003e2\u003c/sup\u003eLG AI Research\n\n\u003e This work is an extension of the paper \"[Instance-wise Occlusion and Depth Orders in Natural Scenes](https://arxiv.org/abs/2111.14562)\" by Hyunmin Lee and Jaesik Park (CVPR 2022).\n\n\u003cdiv align=\"center\"\u003e\n\n![InstaFormer\u003csup\u003eo,d\u003c/sup\u003e Qualitative Results](assets/instaformer.png \"InstaFormer Qualitative Results\")\n_Qualitative results obtained by InstaFormer\u003csup\u003eo,d\u003c/sup\u003e-L\u003csup\u003e†\u003c/sup\u003e\u003csub\u003e200\u003c/sub\u003e_\n\n\u003c/div\u003e\n\n## Overview\n\nThis 
repository provides downloads for:\n\n1. The [_InstaOrder_ dataset](#instaorder). :white_check_mark:\n2. The [_InstaOrder Panoptic_ dataset](#instaorder-panoptic). :white_check_mark:\n3. Weights for the [_InstaOrderNet_ model family](#the-instaordernet-model-family). :white_check_mark:\n4. Weights for the [_InstaFormer_ model family](#the-instaformer-model-family). :white_check_mark:\n\nWe also explain how to train, evaluate, and run inference with the _InstaFormer_ model family.\n\n## Datasets\n\n### InstaOrder\n\nThe InstaOrder dataset is an extension of the COCO dataset.\nCarefully annotated for occlusion and depth order prediction, it contains 2.9M annotations on 101K natural scenes.\n[\\[Click here for download\\]](https://drive.google.com/file/d/1n4NxDBkxhnRNSKuB8TDGGFcSD83Zknlj/view?usp=sharing)\n\n### InstaOrder Panoptic\n\nThe InstaOrder Panoptic dataset is an extension of the COCO panoptic dataset.\nIt provides occlusion and depth order annotations for _things_ categories. 
It contains 2.9M annotations on 101K natural scenes.\n[\\[Click here for download\\]](https://drive.google.com/drive/folders/1M-mE2g3RGbdyLNo9zeN5CcdyNCvlUeO-?usp=sharing)\n\n## The InstaOrderNet Model Family\n\n\u003e Note: we plan to support plain InstaOrderNets in this repository, but this is not yet implemented.\n\u003e To run those networks, please refer to our [former InstaOrder repository](https://github.com/POSTECH-CVLab/InstaOrder).\n\nThe InstaOrderNet family predicts pairwise occlusion and depth orders given an input image and two instance masks.\nThis family comes in three flavors: \"o\" (occlusion only), \"d\" (depth only), and \"o,d\" (joint occlusion and depth).\n\n\u003cdiv align=\"center\"\u003e\n\n| Model                       | Recall | Precision |  F1   | WHDR (distinct) | WHDR (overlap) | WHDR (all) |                                          Weights                                          |\n| :-------------------------- | :----: | :-------: | :---: | :-------------: | :------------: | :--------: | :---------------------------------------------------------------------------------------: |\n| InstaOrderNet\u003csup\u003eo\u003c/sup\u003e   | 89.39  |   79.83   | 80.65 |       --        |       --       |     --     | [model](https://drive.google.com/uc?id=1pTDnyYG3VoARCuwWul3evjXMUNVSiz-1\u0026export=download) |\n| InstaOrderNet\u003csup\u003ed\u003c/sup\u003e   |   --   |    --     |  --   |      12.95      |     25.96      |   17.51    | [model](https://drive.google.com/uc?id=1kgM0Yj8zK-Hd3Y7ZpXtBEg35uhUkcdR4\u0026export=download) |\n| InstaOrderNet\u003csup\u003eo,d\u003c/sup\u003e | 82.37  |   88.67   | 81.86 |      11.51      |     25.22      |   15.99    | [model](https://drive.google.com/uc?id=1QLikFxNOEW1Ld2oAZff8mL26FO4Mwwpv\u0026export=download) |\n\n\u003c/div\u003e\n\n## The InstaFormer Model Family\n\nThe InstaFormer family performs end-to-end _holistic_ 
occlusion and depth order prediction. This family comes in three flavors: \"o\" (occlusion only), \"d\" (depth only), and \"o,d\" (joint occlusion and depth). In all cases, the model also outputs the scene segmentation.\n\nFor brevity, we only report occlusion and depth order results here; please refer to the paper for the segmentation results. Occlusion order is measured with recall, precision, and F1 (higher is better), and depth order with WHDR (lower is better).\n\n### InstaFormer\u003csup\u003eo\u003c/sup\u003e\n\nThis model flavor exclusively predicts occlusion orders.\n\n\u003cdiv align=\"center\"\u003e\n\n| Backbone Config                  | Recall | Precision |  F1   |                                          Weights                                          |\n| :------------------------------- | :----: | :-------: | :---: | :---------------------------------------------------------------------------------------: |\n| SWIN-T\u003csub\u003e100\u003c/sub\u003e             | 89.06  |   75.69   | 79.63 | [model](https://drive.google.com/uc?id=1rmUSGSqalLGaz2px-mTiibp4i0pKfRw8\u0026export=download) |\n| SWIN-S\u003csub\u003e100\u003c/sub\u003e             | 88.91  |   77.31   | 80.53 | [model](https://drive.google.com/uc?id=112ulEuoXZ2uoZL7F-8qtovxQdIe_jo8C\u0026export=download) |\n| SWIN-B\u003csub\u003e100\u003c/sub\u003e             | 89.02  |   76.95   | 80.64 | [model](https://drive.google.com/uc?id=11PMOCIFBL9_7JBYE6d_mNvU4m2gY8ajq\u0026export=download) |\n| SWIN-B\u003csup\u003e†\u003c/sup\u003e\u003csub\u003e100\u003c/sub\u003e | 89.53  |   77.34   | 80.99 | [model](https://drive.google.com/uc?id=1SIr0RlZAi-ndQQfGLlVC2717aKVCBDbk\u0026export=download) |\n| SWIN-L\u003csup\u003e†\u003c/sup\u003e\u003csub\u003e200\u003c/sub\u003e | 89.82  |   78.10   | 81.89 | [model](https://drive.google.com/uc?id=1-q3vNmP0eiy5gFKGugpsw1gAxDTUD4Ng\u0026export=download) |\n\n\u003c/div\u003e\n\n### InstaFormer\u003csup\u003ed\u003c/sup\u003e\n\nThis model flavor exclusively predicts depth orders.\n\n\u003cdiv align=\"center\"\u003e\n\n| Backbone 
Config                  | WHDR (distinct) | WHDR (overlap) | WHDR (all) |                                          Weights                                          |\n| :------------------------------- | :-------------: | :------------: | :--------: | :---------------------------------------------------------------------------------------: |\n| SWIN-T\u003csub\u003e100\u003c/sub\u003e             |      8.10       |     25.43      |   13.75    | [model](https://drive.google.com/uc?id=137R9UTYY1GtQjFbMk5bXLKkq4RvJVNIG\u0026export=download) |\n| SWIN-S\u003csub\u003e100\u003c/sub\u003e             |      8.44       |     26.04      |   14.48    | [model](https://drive.google.com/uc?id=1Ox3y-tqJ2srMSPD75LQiX3k3xbsgB3zs\u0026export=download) |\n| SWIN-B\u003csub\u003e100\u003c/sub\u003e             |      8.28       |     25.05      |   13.88    | [model](https://drive.google.com/uc?id=18V0p7OLzbIv8jTYlmmGBOjmFt31IlBX8\u0026export=download) |\n| SWIN-B\u003csup\u003e†\u003c/sup\u003e\u003csub\u003e100\u003c/sub\u003e |      8.15       |     25.19      |   13.72    | [model](https://drive.google.com/uc?id=18ABPKCQK-LrbEg_uJO7zybnT4Jhf9Ici\u0026export=download) |\n| SWIN-L\u003csup\u003e†\u003c/sup\u003e\u003csub\u003e200\u003c/sub\u003e |      8.47       |     24.91      |   13.73    | [model](https://drive.google.com/uc?id=1UJqq7zi3hX1WKD_ty_5Zk9RMtd5cXSYx\u0026export=download) |\n\n\u003c/div\u003e\n\n### InstaFormer\u003csup\u003eo,d\u003c/sup\u003e\n\nThis model flavor jointly predicts occlusion and depth orders.\n\n\u003cdiv align=\"center\"\u003e\n\n| Backbone Config                  | Recall | Precision |  F1   | WHDR (distinct) | WHDR (overlap) | WHDR (all) |                                          Weights                                          |\n| :------------------------------- | :----: | :-------: | :---: | :-------------: | :------------: | :--------: | :---------------------------------------------------------------------------------------: |\n| 
SWIN-T\u003csub\u003e100\u003c/sub\u003e             | 88.64  |   75.56   | 79.74 |      8.43       |     25.36      |   14.03    | [model](https://drive.google.com/uc?id=1OzJkFRyboCSxdjJgmkplxS73pXxqlDAU\u0026export=download) |\n| SWIN-S\u003csub\u003e100\u003c/sub\u003e             | 88.20  |   75.98   | 79.57 |      8.54       |     25.42      |   13.96    | [model](https://drive.google.com/uc?id=1_L4JfpgmcUebWHse-4G442EV68EJOHwt\u0026export=download) |\n| SWIN-B\u003csub\u003e100\u003c/sub\u003e             | 88.47  |   75.96   | 79.72 |      8.84       |     25.77      |   14.39    | [model](https://drive.google.com/uc?id=1b8xE_dgyEIr7cJ2xH7EikEuhFODQ1hcY\u0026export=download) |\n| SWIN-B\u003csup\u003e†\u003c/sup\u003e\u003csub\u003e100\u003c/sub\u003e | 89.24  |   76.66   | 80.34 |      8.15       |     25.79      |   14.06    | [model](https://drive.google.com/uc?id=1NZaMiR-f16MikArzaHxiFGw1Q8xHxbfa\u0026export=download) |\n| SWIN-L\u003csup\u003e†\u003c/sup\u003e\u003csub\u003e200\u003c/sub\u003e | 89.57  |   78.07   | 81.37 |      7.90       |     24.68      |   13.30    | [model](https://drive.google.com/uc?id=1lH_cn1SqDl7jMv5BP8DhuRYtzME0PCZI\u0026export=download) |\n\n\u003c/div\u003e\n\n## Running InstaFormer\n\n### Environment setup\n\n\u003e This code was developed with NVCC 11.7, Python 3.8.18, PyTorch 2.1.0, torchvision 0.16.0, and detectron2 0.6 (built from source at commit 80307d2 due to import issues).\n\nWe strongly recommend building the code inside a Docker container, within a `conda` environment.\n\nFirst, install the `apt-get` dependencies:\n\n```bash\napt-get update \u0026\u0026 apt-get upgrade -y\n\n# ninja\napt-get install ninja-build -y\n# opencv dependencies\napt-get install libgl1-mesa-glx libglib2.0-0 -y\n```\n\nThen, create a `conda` environment and activate it:\n\n```bash\nconda create -n instaorder python=3.8 -y\nconda activate instaorder\n```\n\nFinally, source the `quick_install.sh` script:\n\n```bash\n. 
./quick_install.sh\n```\n\n### Dataset Preparation\n\nFirst, organize the COCO dataset files following the structure described in [this tutorial](datasets/README.md).\nRemember to set the `$DETECTRON2_DATASETS` environment variable to the directory containing your datasets.\n\nThen, place the InstaOrder Panoptic JSON file downloaded from the [InstaOrder Panoptic section](#instaorder-panoptic) in the `annotations` directory.\n\n### Training\n\nFirst, download a pre-trained Mask2Former **panoptic** model from the [Mask2Former Model Zoo](MODEL_ZOO.md), then run the following command:\n\n```bash\npython train_net.py \\\n--num-gpus \u003cgpus\u003e \\\n--config-file \u003cpath/to/instaformer/cfg.yaml\u003e \\\nMODEL.WEIGHTS \u003cpath/to/m2f/weights.pkl\u003e \\\nSOLVER.IMS_PER_BATCH \u003cbatch\u003e\n```\n\nWhere:\n\n- `\u003cgpus\u003e` is the number of GPUs for training,\n- `\u003cpath/to/instaformer/cfg.yaml\u003e` is the model's YAML config file (located in configs/instaorder/),\n- `\u003cpath/to/m2f/weights.pkl\u003e` is a `.pkl` file containing the weights of the Mask2Former model of your choice,\n- `\u003cbatch\u003e` is the total training batch size (across all GPUs).\n\n### Evaluation on pre-trained models\n\nTo evaluate a trained InstaFormer model, run the following command:\n\n```bash\npython train_net.py \\\n--eval-only \\\n--num-gpus \u003cgpus\u003e \\\n--config-file \u003cpath/to/instaformer/cfg.yaml\u003e \\\nMODEL.WEIGHTS \u003cpath/to/instaformer/weights.pth\u003e\n```\n\n### Inference on custom images\n\nTo run inference on custom images, use the following command:\n\n```bash\npython demo/demo.py \\\n--config-file \u003cpath/to/instaformer/cfg.yaml\u003e \\\n--input \u003cpath/to/image.jpg\u003e \\\n--output \u003cpath/to/out/dir\u003e \\\nMODEL.WEIGHTS \u003cpath/to/instaformer/weights.pth\u003e \\\nTEST.OCCLUSION_EVALUATION False \\\nTEST.DEPTH_EVALUATION False\n```\n\nWhere:\n\n- `\u003cpath/to/instaformer/cfg.yaml\u003e` is the model's YAML config file (located in 
configs/instaorder/),\n- `\u003cpath/to/image.jpg\u003e` is the input image file path,\n- `\u003cpath/to/out/dir\u003e` is the directory where the outputs will be saved,\n- `\u003cpath/to/instaformer/weights.pth\u003e` is a `.pth` file containing the weights of the trained InstaFormer model.\n\nBecause the configuration files are written for training and evaluation, `TEST.OCCLUSION_EVALUATION` and `TEST.DEPTH_EVALUATION` must be manually set to `False` for inference, as done in the command above.\n\n## Citation\n\nA citation for our most recent work is not yet available. If you find our work useful, please consider citing our previous work:\n\n```BibTeX\n@inproceedings{lee2022instaorder,\n  title={{Instance-wise Occlusion and Depth Orders in Natural Scenes}},\n  author={Hyunmin Lee and Jaesik Park},\n  booktitle={Proceedings of the {IEEE} Conference on Computer Vision and Pattern Recognition},\n  year={2022}\n}\n```\n\n## Acknowledgments\n\nOur code is based on [Mask2Former's official repository](https://github.com/facebookresearch/Mask2Former).\nWe thank the authors for open-sourcing their code with the community.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnu-vgilab%2Finstaorder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsnu-vgilab%2Finstaorder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsnu-vgilab%2Finstaorder/lists"}