{"id":18487963,"url":"https://github.com/srama2512/poni","last_synced_at":"2025-10-09T03:08:31.513Z","repository":{"id":38259852,"uuid":"504364821","full_name":"srama2512/PONI","owner":"srama2512","description":"PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning. CVPR 2022 (Oral).","archived":false,"fork":false,"pushed_at":"2022-12-29T20:27:28.000Z","size":10028,"stargazers_count":107,"open_issues_count":13,"forks_count":15,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-10-09T03:07:51.709Z","etag":null,"topics":["cvpr2022","objectnav","pytorch-implementation","scene-understanding","visual-navigation"],"latest_commit_sha":null,"homepage":"https://vision.cs.utexas.edu/projects/poni/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/srama2512.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-06-17T02:10:19.000Z","updated_at":"2025-09-18T12:50:49.000Z","dependencies_parsed_at":"2023-01-31T09:46:13.534Z","dependency_job_id":null,"html_url":"https://github.com/srama2512/PONI","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/srama2512/PONI","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srama2512%2FPONI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srama2512%2FPONI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srama2512%2FPONI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srama2512%2FPONI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/ow
ners/srama2512","download_url":"https://codeload.github.com/srama2512/PONI/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/srama2512%2FPONI/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279000774,"owners_count":26082911,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-09T02:00:07.460Z","response_time":59,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cvpr2022","objectnav","pytorch-implementation","scene-understanding","visual-navigation"],"created_at":"2024-11-06T12:51:03.995Z","updated_at":"2025-10-09T03:08:31.497Z","avatar_url":"https://github.com/srama2512.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PONI\n\nThis repository contains a Pytorch implementation of our CVPR 2022 paper:\n\n[PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning](https://arxiv.org/pdf/2201.10029.pdf)\u003cbr/\u003e\nSanthosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman\u003cbr/\u003e\nMeta AI, UT Austin, UC Berkeley \n\nProject website: [https://vision.cs.utexas.edu/projects/poni/](https://vision.cs.utexas.edu/projects/poni/)\n\n![demo](./docs/poni_cvpr_2022.gif)\n\n## Abstract\nState-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require 
significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with Interaction-free learning (PONI), a modular approach that disentangles the skills of 'where to look?' for an object and 'how to navigate to (x, y)?'. Our key insight is that 'where to look?' can be treated purely as a perception problem, and learned without environment interactions. To address this, we propose a network that predicts two complementary potential functions conditioned on a semantic map and uses them to decide where to look for an unseen object. We train the potential function network using supervised learning on a passive dataset of top-down semantic maps, and integrate it into a modular framework to perform ObjectGoal navigation. Experiments on Gibson and Matterport3D demonstrate that our method achieves the state-of-the-art for ObjectGoal navigation while incurring up to 1,600x less computational cost for training.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/poni_intro.jpg\"\u003e\n\u003c/p\u003e\n\n\n\n\n## Installation\n\nClone the current repo and required submodules:\n```\ngit clone git@github.com:srama2512/PONI.git\ncd PONI\ngit submodule init\ngit submodule update\nexport PONI_ROOT=\u003cPATH TO PONI/\u003e\n```\n Create a conda environment:\n```\nconda create --name poni python=3.8.5\nconda activate poni\n```\n\nInstall pytorch (assuming cuda 10.2):\n```\nconda install pytorch==1.9.1 torchvision==0.10.1 torchaudio==0.9.1 cudatoolkit=10.2 -c pytorch\n```\n\nInstall dependencies:\n```\ncd $PONI_ROOT/dependencies/habitat-lab\npip install -r requirements.txt\npython setup.py develop --all\n\ncd $PONI_ROOT/dependencies/habitat-sim\npip install -r requirements.txt\npython setup.py install --headless --with-cuda\n\npython -m pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.9/index.html\n\npip install torch-scatter -f https://data.pyg.org/whl/torch-1.9.0+cu102.html\n\ncd 
$PONI_ROOT/dependencies/astar_pycpp \u0026\u0026 make\n```\n\nInstall requirements for PONI:\n```\ncd $PONI_ROOT\npip install -r requirements.txt\n```\n\nAdd repository to Python path:\n```\nexport PYTHONPATH=$PYTHONPATH:$PONI_ROOT\n```\n\n\n## Creating semantic map datasets\n\n1. Download [Gibson](http://gibsonenv.stanford.edu/database/) and [Matterport3D](https://niessner.github.io/Matterport/) scenes following the instructions [here](DATASETS.md).\n\n2. Extract Gibson and MP3D semantic maps.\n    ```\n    cd $PONI_ROOT\n    ACTIVE_DATASET=\"gibson\" python scripts/create_semantic_maps.py\n    ACTIVE_DATASET=\"mp3d\" python scripts/create_semantic_maps.py\n    ```\n\n3. Create dataset for PONI training. \u003cbr/\u003e\n    a. First extract FMM distances for all objects in each map.\n    ```\n    cd $PONI_ROOT\n    ACTIVE_DATASET=\"gibson\" python scripts/precompute_fmm_dists.py\n    ACTIVE_DATASET=\"mp3d\" python scripts/precompute_fmm_dists.py\n    ```\n    b. Extract training and validation data for PONI.\n    ```\n    ACTIVE_DATASET=\"gibson\" python scripts/create_poni_dataset.py --split \"train\"\n    ACTIVE_DATASET=\"gibson\" python scripts/create_poni_dataset.py --split \"val\"\n    ACTIVE_DATASET=\"mp3d\" python scripts/create_poni_dataset.py --split \"train\"\n    ACTIVE_DATASET=\"mp3d\" python scripts/create_poni_dataset.py --split \"val\"\n    ```\n4. The extracted data can be visualized using [notebooks/visualize_pfs.ipynb](notebooks/visualize_pfs.ipynb).\n5. The `create_poni_dataset.py` script also supports parallelized dataset creation. The `--map-id` argument can be used to limit the data generation to one specific map. The `--map-id-range` argument can be used to limit the data generation to maps in range `i` to `j` as follows: `--map-id-range i j`. 
These arguments can be used to divide the data generation across multiple processes within a node or on a cluster with SLURM by passing the appropriate map ids to each job.\n\n\n## Training\n\nTo train models for PONI, predict-xy, predict-theta, and predict-action methods, copy over corresponding scripts from `$PONI_ROOT/experiment_scripts/\u003cDATASET_NAME\u003e/train_\u003cMETHOD_NAME\u003e.sh` to some experiment directory and execute it. For example, to train PONI on Gibson:\n```\nmkdir -p $PONI_ROOT/experiments/poni/\ncd $PONI_ROOT/experiments/poni\ncp $PONI_ROOT/experiment_scripts/gibson/train_poni.sh .\nchmod +x train_poni.sh\n./train_poni.sh\n```\n\n## Pre-trained models\n\nWe release pre-trained models from the experiments in our paper:\n\n|     Method     | Dataset |                          |     Checkpoints    |                    |\n|:--------------:|:-------:|:------------------------:|:------------------:|:------------------:|\n|      PONI      |  Gibson | [poni_123.ckpt](https://utexas.box.com/s/kjkjbegd58o5a2kf25sfac27zmc15cv9) | [poni_234.ckpt](https://utexas.box.com/s/5jsvjf3rg8yd6zcf7qjfchugstqp4rzf) | [poni_345.ckpt](https://utexas.box.com/s/mjgvp90ajyymfta700s9wsc7j73qu9mz) |\n|   Predict-XY   |  Gibson | [pred_xy_123.ckpt](https://utexas.box.com/s/0yyzjor3l82ewldm1q8hq7z5xi2fb8a9) | [pred_xy_234.ckpt](https://utexas.box.com/s/q84u0fnqh153n21ympt7njm269u0cjfw) | [pred_xy_345.ckpt](https://utexas.box.com/s/z3kw3kylq76xojgstsfia7hbpb89frs7) |\n|  Predict-theta |  Gibson | [pred_theta_123.ckpt](https://utexas.box.com/s/cqojj1gnq73brakwgzy16isfxjqdfgv1) | [pred_theta_234.ckpt](https://utexas.box.com/s/gmi28locbj2z2la2h11btbrcohuc6ae7) | [pred_theta_345.ckpt](https://utexas.box.com/s/z3zqg7865oc9h3a28v7843sdidfwjoq6) |\n| Predict-action |  Gibson | [pred_act_123.ckpt](https://utexas.box.com/s/is6bppu25jgbvrjaibzs5rnb18zjaqzx) | [pred_act_234.ckpt](https://utexas.box.com/s/ddrogbg912ryt71m7nzhhbcsokh8khwh) | 
[pred_act_345.ckpt](https://utexas.box.com/s/kzjqxty44o7xbln7i9vddz557j0fqh27) |\n|      PONI      |   MP3D  | [poni_123.ckpt](https://utexas.box.com/s/rakcdp0il6sbemqkv323svrzub2rvrdu) | [poni_234.ckpt](https://utexas.box.com/s/mei1wfecnungr1uyiwbroovwj7rux8gc) | [poni_345.ckpt](https://utexas.box.com/s/jv1tl4o0s8oob4g3ly2pwklei2cexeh4) |\n|   Predict-XY   |   MP3D  | [pred_xy_123.ckpt](https://utexas.box.com/s/f3st176sajo7st3vqgx9fctcqypjxhyz) | [pred_xy_234.ckpt](https://utexas.box.com/s/x4jnmbjqvz0yahjug07hr8uvn68jwofi) | [pred_xy_345.ckpt](https://utexas.box.com/s/2w4qpxlp0kx8x2wtyypksvdiv91c0gz2) |\n|  Predict-theta |   MP3D  | [pred_theta_123.ckpt](https://utexas.box.com/s/erqkindzb92lfvxhxa01zif53blak6ru) | [pred_theta_234.ckpt](https://utexas.box.com/s/exeegxodfooae824q5uwni4vwho93lvo) | [pred_theta_345.ckpt](https://utexas.box.com/s/wj9poyh8b4y7azkteduawwxiywzd0idm) |\n| Predict-action |   MP3D  |                            | [pred_act_123.ckpt](https://utexas.box.com/s/1bx2rw3jrojhh2xrmwm3x2w7ftkwi6nq) |                            |\n\n\nYou can also download all models from [here](https://utexas.box.com/s/0v59eqktjs7hicbd16p2etlz2cn3w6g9):\n```\nmkdir $PONI_ROOT/pretrained_models \u0026\u0026 cd $PONI_ROOT/pretrained_models\nwget -O pretrained_models.tar.gz https://utexas.box.com/shared/static/0v59eqktjs7hicbd16p2etlz2cn3w6g9.gz\ntar -xvzf pretrained_models.tar.gz \u0026\u0026 rm pretrained_models.tar.gz\n```\n\n## ObjectNav evaluation on Gibson\n\nWe use a modified version of the Gibson ObjectNav evaluation setup from [SemExp](https://github.com/devendrachaplot/Object-Goal-Navigation).\n\n1. 
Download the [Gibson ObjectNav dataset](https://utexas.box.com/s/tss7udt3ralioalb6eskj3z3spuvwz7v) to `$PONI_ROOT/data/datasets/objectnav/gibson`.\n    ```\n    cd $PONI_ROOT/data/datasets/objectnav\n    wget -O gibson_objectnav_episodes.tar.gz https://utexas.box.com/shared/static/tss7udt3ralioalb6eskj3z3spuvwz7v.gz\n    tar -xvzf gibson_objectnav_episodes.tar.gz \u0026\u0026 rm gibson_objectnav_episodes.tar.gz\n    ```\n2. Download the image segmentation model [[URL](https://utexas.box.com/s/sf4prmup4fsiu6taljnt5ht8unev5ikq)] to `$PONI_ROOT/pretrained_models`.\n3. Copy the evaluation script corresponding to the model of interest from `$PONI_ROOT/experiment_scripts/gibson/eval_\u003cMETHOD_NAME\u003e.sh` to the required experiment directory. \n4. Set the `MODEL_PATH` variable in the script to the saved checkpoint. By default, it points to the path of a pre-trained model (see previous section).\n5. To reproduce results from the paper, download the pre-trained models and evaluate them using the evaluation scripts.\n6. To visualize episodes with the semantic map and potential function predictions, add the arguments `--print_images 1 --num_pf_maps 3` in the evaluation script.\n\n\n## ObjectNav evaluation on MP3D\n\nWe use the ObjectNav evaluation setup from [Habitat-Lab](https://github.com/facebookresearch/habitat-lab) for the MP3D dataset. \n\n1. Download the MP3D ObjectNav dataset [[URL](https://utexas.box.com/s/40f0lfoucz4xr8ty4xkqlgop5jaz6kwp)] to `$PONI_ROOT/data/datasets/objectnav/mp3d/v1`.\n2. Download the image segmentation model [[URL](https://utexas.box.com/s/z6y09w6z279ew3rgaxjxlfb3y0x02gjs)] to `$PONI_ROOT/pretrained_models`.\n3. Copy the evaluation script corresponding to the model of interest from `$PONI_ROOT/experiment_scripts/mp3d/eval_\u003cMETHOD_NAME\u003e.sh` to the required experiment directory (say, `$EXPT_ROOT`). \n4. Set the `MODEL_PATH` variable in the script to the saved checkpoint. By default, it points to the path of a pre-trained model. 
Execute the eval script specifying the ids of 2 GPUs to evaluate on (0, 1 in this example). **Note:** In general, we found MP3D evaluation to be very slow on a single thread. The current MP3D evaluation code does not support multi-threaded evaluation. Instead, we split the MP3D val episode dataset into 11 parts (one for each scene), and run 11 single-threaded evaluations in parallel. By default, the first GPU evaluates on 6 parts (requiring ~20GB memory), and the second GPU evaluates on 5 parts (requiring ~16GB memory) simultaneously. If this exceeds the memory available on your GPU, please reduce the number of parts per GPU and increase the number of GPUs (i.e., modify `eval_\u003cMETHOD_NAME\u003e.sh`). \n    ```\n    ./eval_\u003cMETHOD_NAME\u003e.sh 0 1\n    ```\n5. Merge results from the 11 splits.\n    ```\n    python $PONI_ROOT/hlab/merge_results --path_format \"$EXPT_ROOT/mp3d_objectnav/tb_seed_100_val_part_*/stats.json\"\n    ```\n6. To reproduce results from the paper, download the pre-trained models and evaluate them using the evaluation scripts.\n\n\n## Acknowledgements\n\nIn our work, we used parts of [Semantic-MapNet](https://github.com/vincentcartillier/Semantic-MapNet), [Habitat-Lab](https://github.com/facebookresearch/habitat-lab), [Object-Goal-Navigation](https://github.com/devendrachaplot/Object-Goal-Navigation), and [astar_pycpp](https://github.com/srama2512/astar_pycpp) repos and extended them.\n\n## Citation\nIf you find this codebase useful, please cite us:\n```\n@inproceedings{ramakrishnan2022poni,\n    author       = {Ramakrishnan, Santhosh K. 
and Chaplot, Devendra Singh and Al-Halah, Ziad and Malik, Jitendra and Grauman, Kristen},\n    booktitle    = {Computer Vision and Pattern Recognition (CVPR), 2022 IEEE Conference on},\n    title        = {PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning},\n    year         = {2022},\n    organization = {IEEE},\n}\n```\n\n## License\nThis project is released under the MIT license, as found in the [LICENSE](LICENSE) file.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrama2512%2Fponi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsrama2512%2Fponi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsrama2512%2Fponi/lists"}