# Awesome Embodied Vision [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
> A curated list of embodied vision resources.

Inspired by the [awesome](https://github.com/sindresorhus/awesome) list thing and [awesome-vln](https://github.com/daqingliu/awesome-vln).

By [Changan Chen](https://changan.io) ([email protected]), Department of Computer Science at the University of Texas at Austin, with help from [Tushar Nagarajan](https://tushar-n.github.io/), [Santhosh Kumar Ramakrishnan](https://srama2512.github.io/) and [Yinfeng Yu](https://yyf17.github.io/). If you see papers missing from the list, please send me an email or open a pull request (see the format [below](#contributing)).

## Table of Contents

* [Papers](#papers)
  * [PointGoal Navigation](#pointgoal)
  * [Audio-Visual Navigation](#audiogoal)
  * [ObjectGoal Navigation](#objectgoal)
  * [ImageGoal Navigation](#imagegoal)
  * [Vision-Language Navigation](#vln)
  * [Embodied Question Answering](#eqa)
  * [Multi-Agent Tasks](#multiagent)
  * [Active Visual Tracking](#av-tracking)
  * [Visual Exploration](#visual_exploration)
  * [Visual Interactions](#visual_interaction)
  * [Rearrangement](#rearrangement)
  * [Sim2real Transfer](#sim2real_transfer)
* [Datasets](#datasets)
* [Simulators](#simulators)
* [MISC](#misc)

## Contributing
When sending PRs, please put the new paper at the correct chronological position and use the following format:

```
* **Paper Title**

*Author(s)*

Conference, Year. [[Paper]](link) [[Code]](link) [[Website]](link)
```
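
For instance, the DD-PPO entry under [PointGoal Navigation](#pointgoal) follows this format exactly; omit the [[Code]] and [[Website]] links when they are not available:

```
* **DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames**

*Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra*

ICLR, 2020. [[Paper]](https://arxiv.org/abs/1911.00357) [[Code]](https://github.com/facebookresearch/habitat-api/tree/master/habitat_baselines/rl/ddppo) [[Website]](https://wijmans.xyz/publication/ddppo-2019/)
```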

## Papers

### PointGoal Navigation
* **Cognitive Mapping and Planning for Visual Navigation**

*Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik*

CVPR, 2017. [[Paper]](https://arxiv.org/abs/1702.03920)

* **Habitat: A Platform for Embodied AI Research**

*Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra*

ICCV, 2019. [[Paper]](https://arxiv.org/abs/1904.01201) [[Code]](https://github.com/facebookresearch/habitat-api) [[Website]](https://aihabitat.org/)

* **SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation**

*Daniel Gordon, Abhishek Kadian, Devi Parikh, Judy Hoffman, Dhruv Batra*

ICCV, 2019. [[Paper]](https://arxiv.org/pdf/1905.07512.pdf) [[Code]](https://github.com/facebookresearch/splitnet)

* **A Behavioral Approach to Visual Navigation with Graph Localization Networks**

*Kevin Chen, Juan Pablo de Vicente, Gabriel Sepulveda, Fei Xia, Alvaro Soto, Marynel Vazquez, Silvio Savarese*

RSS, 2019. [[Paper]](https://arxiv.org/pdf/1903.00445.pdf) [[Code]](https://github.com/kchen92/graphnav) [[Website]](https://graphnav.stanford.edu/)

* **DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames**

*Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra*

ICLR, 2020. [[Paper]](https://arxiv.org/abs/1911.00357) [[Code]](https://github.com/facebookresearch/habitat-api/tree/master/habitat_baselines/rl/ddppo) [[Website]](https://wijmans.xyz/publication/ddppo-2019/)

* **Learning to Explore using Active Neural SLAM**

*Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov*

ICLR, 2020. [[Paper]](https://openreview.net/pdf?id=HklXn1BKDH) [[Code]](https://github.com/devendrachaplot/Neural-SLAM) [[Website]](https://devendrachaplot.github.io/projects/Neural-SLAM)

* **Auxiliary Tasks Speed Up Learning PointGoal Navigation**

*Joel Ye, Dhruv Batra, Erik Wijmans, Abhishek Das*

CoRL, 2020. [[Paper]](https://arxiv.org/abs/2007.04561) [[Code]](https://github.com/joel99/habitat-pointnav-aux)

* **Occupancy Anticipation for Efficient Exploration and Navigation**

*Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman*

ECCV, 2020. [[Paper]](http://vision.cs.utexas.edu/projects/occupancy_anticipation/main.pdf) [[Code]](https://github.com/facebookresearch/OccupancyAnticipation) [[Website]](http://vision.cs.utexas.edu/projects/occupancy_anticipation/)

* **Embodied Visual Navigation with Automatic Curriculum Learning in Real Environments**

*Steven D. Morad, Roberto Mecca, Rudra P.K. Poudel, Stephan Liwicki, Roberto Cipolla*

ICRA, 2021. [[Paper]](https://arxiv.org/pdf/2009.05429.pdf)

* **Differentiable SLAM-Net: Learning Particle SLAM for Visual Navigation**

*Peter Karkus, Shaojun Cai, David Hsu*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2105.07593.pdf)

* **The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation**

*Xiaoming Zhao, Harsh Agrawal, Dhruv Batra, Zsolt Kira*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2106.04531.pdf) [[Code]](https://github.com/Xiaoming-Zhao/PointNav-VO) [[Website]](https://xiaoming-zhao.github.io/projects/pointnav-vo)

* **RobustNav: Towards Benchmarking Robustness in Embodied Navigation**

*Prithvijit Chattopadhyay, Judy Hoffman, Roozbeh Mottaghi, Aniruddha Kembhavi*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2104.04112.pdf) [[Code]](https://github.com/allenai/robustnav) [[Website]](https://prior.allenai.org/projects/robustnav)

* **Comparison of Model-Free and Model-Based Learning-Informed Planning for PointGoal Navigation**

*Yimeng Li, Arnab Debnath, Gregory J. Stein, Jana Kosecka*

CoRL, 2022. [[Paper]](https://openreview.net/pdf?id=2s92OhjT4L) [[Code]](https://github.com/yimengli46/bellman_point_goal) [[Website]](https://yimengli46.github.io/Projects/CoRL2022LHPWorkshop/index.html)

### Audio-Visual Navigation
* **Audio-Visual Embodied Navigation**

*Changan Chen\*, Unnat Jain\*, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman*

ECCV, 2020. [[Paper]](https://arxiv.org/pdf/1912.11474.pdf) [[Website]](http://vision.cs.utexas.edu/projects/audio_visual_navigation/)

* **Look, Listen, and Act: Towards Audio-Visual Embodied Navigation**

*Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum*

ICRA, 2020. [[Paper]](https://arxiv.org/abs/1912.11684)

* **Learning to Set Waypoints for Audio-Visual Navigation**

*Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman*

ICLR, 2021. [[Paper]](https://arxiv.org/pdf/2008.09622.pdf) [[Website]](http://vision.cs.utexas.edu/projects/audio_visual_waypoints/)

* **Semantic Audio-Visual Navigation**

*Changan Chen, Ziad Al-Halah, Kristen Grauman*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2012.11583.pdf) [[Code]](https://github.com/facebookresearch/sound-spaces/tree/main/ss_baselines/savi) [[Website]](http://vision.cs.utexas.edu/projects/semantic_audio_visual_navigation/)

* **Move2Hear: Active Audio-Visual Source Separation**

*Sagnik Majumder, Ziad Al-Halah, and Kristen Grauman*

ICCV, 2021. [[Paper]](https://arxiv.org/abs/2105.07142) [[Website]](http://vision.cs.utexas.edu/projects/move2hear/)

* **Active Audio-Visual Separation of Dynamic Sound Sources**

*Sagnik Majumder, Ziad Al-Halah, and Kristen Grauman*

ECCV, 2022. [[Paper]](https://arxiv.org/abs/2202.00850) [[Website]](http://vision.cs.utexas.edu/projects/active-av-dynamic-separation/)

* **Sound Adversarial Audio-Visual Navigation**

*Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu*

ICLR, 2022. [[Paper]](https://openreview.net/pdf?id=NkZq4OEYN-) [[Code]](https://github.com/yyf17/SAAVN/tree/main) [[Website]](https://yyf17.github.io/SAAVN)

* **SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning**

*Changan Chen\*, Carl Schissler\*, Sanchit Garg\*, Philip Kobernik, Alexander Clegg, Paul Calamia, Dhruv Batra, Philip W Robinson, Kristen Grauman*

arXiv, 2022. [[Paper]](https://arxiv.org/pdf/2206.08312.pdf) [[Code]](https://github.com/facebookresearch/sound-spaces) [[Website]](https://vision.cs.utexas.edu/projects/soundspaces2)

* **Pay Self-Attention to Audio-Visual Navigation**

*Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang*

BMVC, 2022. [[Paper]](https://arxiv.org/pdf/2210.01353.pdf) [[Website]](https://yyf17.github.io/FSAAVN/index.html)

* **Learning Semantic-Agnostic and Spatial-Aware Representation for Generalizable Visual-Audio Navigation**

*Hongcheng Wang\*, Yuxuan Wang\*, Fangwei Zhong, Mingdong Wu, Jianwei Zhang, Yizhou Wang, Hao Dong*

IEEE RA-L, 2023. [[Paper]](https://arxiv.org/pdf/2304.10773.pdf)

### ObjectGoal Navigation

* **Cognitive Mapping and Planning for Visual Navigation**

*Saurabh Gupta, Varun Tolani, James Davidson, Sergey Levine, Rahul Sukthankar, Jitendra Malik*

CVPR, 2017. [[Paper]](https://arxiv.org/abs/1702.03920)

* **Visual Semantic Navigation using Scene Priors**

*Wei Yang, Xiaolong Wang, Ali Farhadi, Abhinav Gupta, Roozbeh Mottaghi*

ICLR, 2019. [[Paper]](https://arxiv.org/abs/1810.06543)

* **Visual Representations for Semantic Target Driven Navigation**

*Arsalan Mousavian, Alexander Toshev, Marek Fiser, Jana Kosecka, Ayzaan Wahid, James Davidson*

ICRA, 2019. [[Paper]](https://arxiv.org/pdf/1805.06066.pdf) [[Code]](https://github.com/arsalan-mousavian/Navigation)

* **Learning to Learn How to Learn: Self-Adaptive Visual Navigation Using Meta-Learning**

*Mitchell Wortsman, Kiana Ehsani, Mohammad Rastegari, Ali Farhadi, Roozbeh Mottaghi*

CVPR, 2019. [[Paper]](https://arxiv.org/abs/1812.00971) [[Code]](https://github.com/allenai/savn) [[Website]](https://prior.allenai.org/projects/savn)

* **Bayesian Relational Memory for Semantic Visual Navigation**

*Yi Wu, Yuxin Wu, Aviv Tamar, Stuart Russell, Georgia Gkioxari, Yuandong Tian*

ICCV, 2019. [[Paper]](https://arxiv.org/abs/1909.04306) [[Code]](https://github.com/jxwuyi/HouseNavAgent)

* **Situational Fusion of Visual Representation for Visual Navigation**

*William B. Shen, Danfei Xu, Yuke Zhu, Leonidas J. Guibas, Li Fei-Fei, Silvio Savarese*

ICCV, 2019. [[Paper]](https://arxiv.org/abs/1908.09073)

* **Simultaneous Mapping and Target Driven Navigation**

*Georgios Georgakis, Yimeng Li, Jana Kosecka*

arXiv, 2019. [[Paper]](https://arxiv.org/abs/1911.07980)

* **Object Goal Navigation using Goal-Oriented Semantic Exploration**

*Devendra Singh Chaplot, Dhiraj Gandhi, Abhinav Gupta\*, Ruslan Salakhutdinov\**

NeurIPS, 2020. [[Paper]](https://arxiv.org/pdf/2007.00643.pdf) [[Website]](https://devendrachaplot.github.io/projects/semantic-exploration)

* **Learning Object Relation Graph and Tentative Policy for Visual Navigation**

*Heming Du, Xin Yu, Liang Zheng*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/2007.11018)

* **Semantic Visual Navigation by Watching YouTube Videos**

*Matthew Chang, Arjun Gupta, Saurabh Gupta*

arXiv, 2020. [[Paper]](https://arxiv.org/pdf/2006.10034.pdf) [[Website]](https://matthewchang.github.io/value-learning-from-videos/)

* **ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects**

*Dhruv Batra, Aaron Gokaslan, Aniruddha Kembhavi, Oleksandr Maksymets, Roozbeh Mottaghi, Manolis Savva, Alexander Toshev, Erik Wijmans*

arXiv, 2020. [[Paper]](https://arxiv.org/abs/2006.13171)

* **MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation**

*Saim Wani\*, Shivansh Patel\*, Unnat Jain\*, Angel X. Chang, Manolis Savva*

NeurIPS, 2020. [[Paper]](https://arxiv.org/abs/2012.03912) [[Code]](https://github.com/saimwani/multiON) [[Website]](https://shivanshpatel35.github.io/multi-ON/)

* **Learning hierarchical relationships for object-goal navigation**

*Yiding Qiu, Anwesan Pal, Henrik I. Christensen*

CoRL, 2020. [[Paper]](https://arxiv.org/abs/2003.06749)

* **VTNet: Visual Transformer Network for Object Goal Navigation**

*Heming Du, Xin Yu, Liang Zheng*

ICLR, 2021. [[Paper]](https://arxiv.org/pdf/2009.07783.pdf)

* **Visual Navigation With Spatial Attention**

*Bar Mayo, Tamir Hazan, Ayellet Tal*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2104.09807.pdf)

* **Auxiliary Tasks and Exploration Enable ObjectGoal Navigation**

*Joel Ye, Dhruv Batra, Abhishek Das, Erik Wijmans*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2108.11550.pdf) [[Code]](https://github.com/joel99/objectnav) [[Website]](https://joel99.github.io/objectnav/)

* **Hierarchical Object-to-Zone Graph for Object Navigation**

*Sixian Zhang, Xinhang Song, Yubing Bai, Weijie Li, Yakui Chu, Shuqiang Jiang*

ICCV, 2021. [[Paper]](https://arxiv.org/abs/2109.02066) [[Code]](https://github.com/sx-zhang/HOZ.git) [[Video]](https://drive.google.com/file/d/1UtTcFRhFZLkqgalKom6_9GpQmsJfXAZC/view)

* **THDA: Treasure Hunt Data Augmentation for Semantic Navigation**

*Oleksandr Maksymets, Vincent Cartillier, Aaron Gokaslan, Erik Wijmans, Wojciech Galuba, Stefan Lee, Dhruv Batra*

ICCV, 2021. [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Maksymets_THDA_Treasure_Hunt_Data_Augmentation_for_Semantic_Navigation_ICCV_2021_paper.pdf)

* **🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation**

*Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi*

arXiv, 2022. [[Paper]](https://arxiv.org/pdf/2206.06994.pdf) [[Website]](https://procthor.allenai.org/)

* **Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation**

*Jenny Zhang, Samson Yu, Jiafei Duan, Cheston Tan*

arXiv, 2022. [[Paper]](https://arxiv.org/abs/2206.10606) [[Code]](https://github.com/jennyzzt/good_time_to_ask)

* **Self-Supervised Object Goal Navigation with In-Situ Finetuning**

*So Yeon Min, Yao-Hung Hubert Tsai, Wei Ding, Ali Farhadi, Ruslan Salakhutdinov, Yonatan Bisk, Jian Zhang*

IROS, 2023. [[Paper]](https://arxiv.org/abs/2212.05923) [[Video]](https://www.youtube.com/watch?v=LXsZst5ZUpU)

* **MOPA: Modular Object Navigation with PointGoal Agents**

*Sonia Raychaudhuri, Tommaso Campari, Unnat Jain, Manolis Savva, Angel X. Chang*

WACV, 2024. [[Paper]](https://openaccess.thecvf.com/content/WACV2024/html/Raychaudhuri_MOPA_Modular_Object_Navigation_With_PointGoal_Agents_WACV_2024_paper.html) [[Video]](https://youtu.be/Jcspov0UpsA)

### ImageGoal Navigation

* **Target-driven Visual Navigation in Indoor Scenes using Deep Reinforcement Learning**

*Yuke Zhu, Roozbeh Mottaghi, Eric Kolve, Joseph J. Lim, Abhinav Gupta, Li Fei-Fei, Ali Farhadi*

ICRA, 2017. [[Paper]](https://arxiv.org/abs/1609.05143) [[Website]](https://prior.allenai.org/projects/target-driven-visual-navigation)

* **Semi-Parametric Topological Memory for Navigation**

*Nikolay Savinov\*, Alexey Dosovitskiy\*, Vladlen Koltun*

ICLR, 2018. [[Paper]](https://arxiv.org/pdf/1803.00653.pdf) [[Code]](https://github.com/nsavinov/SPTM) [[Website]](https://sites.google.com/view/SPTM)

* **Neural Topological SLAM for Visual Navigation**

*Devendra Singh Chaplot, Ruslan Salakhutdinov, Abhinav Gupta, Saurabh Gupta*

CVPR, 2020. [[Paper]](https://arxiv.org/pdf/2005.12256.pdf) [[Website]](https://devendrachaplot.github.io/projects/Neural-Topological-SLAM)

* **Learning View and Target Invariant Visual Servoing for Navigation**

*Yimeng Li, Jana Kosecka*

ICRA, 2020. [[Paper]](https://arxiv.org/pdf/2003.02327.pdf) [[Code]](https://github.com/GMU-vision-robotics/View-Invariant-Visual-Servoing-for-Navigation) [[Website]](https://yimengli46.github.io/Projects/ICRA2020/index.html)

* **Visual Graph Memory with Unsupervised Representation for Visual Navigation**

*Obin Kwon, Nuri Kim, Yunho Choi, Hwiyeon Yoo, Jeongho Park, Songhwai Oh*

ICCV, 2021. [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Kwon_Visual_Graph_Memory_With_Unsupervised_Representation_for_Visual_Navigation_ICCV_2021_paper.pdf) [[Code]](https://github.com/rllab-snu/Visual-Graph-Memory) [[Website]](https://rllab-snu.github.io/projects/vgm/doc.html)

* **No RL, No Simulation: Learning to Navigate without Navigating**

*Meera Hahn, Devendra Chaplot, Shubham Tulsiani, Mustafa Mukadam, James M. Rehg, Abhinav Gupta*

NeurIPS, 2021. [[Paper]](https://arxiv.org/pdf/2110.09470.pdf)

* **Topological Semantic Graph Memory for Image-Goal Navigation**

*Nuri Kim, Obin Kwon, Hwiyeon Yoo, Yunho Choi, Jeongho Park, Songhwai Oh*

CoRL, 2022. [[Paper]](https://openreview.net/pdf?id=xjTUxBfIzE) [[Code]](https://github.com/rllab-snu/TopologicalSemanticGraphMemory) [[Website]](https://github.com/bareblackfoot/Topological-Semantic-Graph-Memory)

* **Last-Mile Embodied Visual Navigation**

*Justin Wasserman, Karmesh Yadav, Girish Chowdhary, Abhinav Gupta, Unnat Jain*

CoRL, 2022. [[Paper]](https://arxiv.org/abs/2211.11746) [[Code]](https://github.com/Jbwasse2/SLING) [[Website]](https://jbwasse2.github.io/portfolio/SLING/)

* **Renderable Neural Radiance Map for Visual Navigation**

*Obin Kwon, Jeongho Park, Songhwai Oh*

CVPR, 2023. [[Paper]](https://openaccess.thecvf.com/content/CVPR2023/html/Kwon_Renderable_Neural_Radiance_Map_for_Visual_Navigation_CVPR_2023_paper.html) [[Website]](https://rllab-snu.github.io/projects/RNR-Map/)

* **Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation**

*Xiaohan Lei, Min Wang, Wengang Zhou, Li Li, Houqiang Li*

CVPR, 2024. [[Paper]](https://xiaohanlei.github.io/projects/IEVE/IEVE.pdf) [[Website]](https://xiaohanlei.github.io/projects/IEVE/)

### Vision-Language Navigation

* **Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments**

*Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel*

CVPR, 2018. [[Paper]](https://arxiv.org/abs/1711.07280) [[Code]](https://github.com/peteanderson80/Matterport3DSimulator) [[Website]](https://bringmeaspoon.org)

* **Look Before You Leap: Bridging Model-Free and Model-Based Reinforcement Learning for Planned-Ahead Vision-and-Language Navigation**

*Xin Wang, Wenhan Xiong, Hongmin Wang, William Yang Wang*

ECCV, 2018. [[Paper]](https://arxiv.org/abs/1803.07729)

* **Mapping Instructions to Actions in 3D Environments with Visual Goal Prediction**

*Dipendra Misra, Andrew Bennett, Valts Blukis, Eyvind Niklasson, Max Shatkhin, Yoav Artzi*

EMNLP, 2018. [[Paper]](https://arxiv.org/abs/1809.00786)

* **Speaker-Follower Models for Vision-and-Language Navigation**

*Daniel Fried, Ronghang Hu, Volkan Cirik, Anna Rohrbach, Jacob Andreas, Louis-Philippe Morency, Taylor Berg-Kirkpatrick, Kate Saenko, Dan Klein, Trevor Darrell*

NeurIPS, 2018. [[Paper]](https://arxiv.org/abs/1806.02724) [[Code]](https://github.com/ronghanghu/speaker_follower) [[Website]](http://ronghanghu.com/speaker_follower/)

* **Reinforced Cross-Modal Matching and Self-Supervised Imitation Learning for Vision-Language Navigation**

*Xin Wang, Qiuyuan Huang, Asli Celikyilmaz, Jianfeng Gao, Dinghan Shen, Yuan-Fang Wang, William Yang Wang, Lei Zhang*

CVPR, 2019. [[Paper]](https://arxiv.org/abs/1811.10092)

* **Self-Monitoring Navigation Agent via Auxiliary Progress Estimation**

*Chih-Yao Ma, Jiasen Lu, Zuxuan Wu, Ghassan AlRegib, Zsolt Kira, Richard Socher, Caiming Xiong*

ICLR, 2019. [[Paper]](https://arxiv.org/abs/1901.03035) [[Code]](https://github.com/chihyaoma/selfmonitoring-agent) [[Website]](https://chihyaoma.github.io/project/2018/09/27/selfmonitoring.html)

* **The Regretful Agent: Heuristic-Aided Navigation through Progress Estimation**

*Chih-Yao Ma, Zuxuan Wu, Ghassan AlRegib, Caiming Xiong, Zsolt Kira*

CVPR, 2019. [[Paper]](https://arxiv.org/abs/1903.01602) [[Code]](https://github.com/chihyaoma/regretful-agent) [[Website]](https://chihyaoma.github.io/project/2019/02/25/regretful.html)

* **TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments**

*Howard Chen, Alane Suhr, Dipendra Misra, Noah Snavely, Yoav Artzi*

CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1811.12354.pdf) [[Code]](https://github.com/lil-lab/touchdown)

* **Tactical Rewind: Self-Correction via Backtracking in Vision-and-Language Navigation**

*Liyiming Ke, Xiujun Li, Yonatan Bisk, Ari Holtzman, Zhe Gan, Jingjing Liu, Jianfeng Gao, Yejin Choi, Siddhartha Srinivasa*

CVPR, 2019. [[Paper]](http://arxiv.org/abs/1903.02547) [[Code]](https://github.com/Kelym/FAST) [[Video]](https://www.youtube.com/watch?v=AD9TNohXoPA)

* **Vision-based Navigation with Language-based Assistance via Imitation Learning with Indirect Intervention**

*Khanh Nguyen, Debadeepta Dey, Chris Brockett, Bill Dolan*

CVPR, 2019. [[Paper]](https://arxiv.org/abs/1812.04155) [[Code]](https://github.com/debadeepta/vnla) [[Video]](https://youtu.be/18P94aaaLKg)

* **Help, Anna! Visual Navigation with Natural Multimodal Assistance via Retrospective Curiosity-Encouraging Imitation Learning**

*Khanh Nguyen, Hal Daumé III*

EMNLP, 2019. [[Paper]](https://arxiv.org/abs/1909.01871) [[Code]](https://github.com/khanhptnk/hanna) [[Video]](https://youtu.be/Vp6C29qTKQ0)

* **Chasing Ghosts: Instruction Following as Bayesian State Tracking**

*Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee*

NeurIPS, 2019. [[Paper]](https://arxiv.org/abs/1907.02022) [[Code]](https://github.com/batra-mlp-lab/vln-chasing-ghosts) [[Video]](https://www.youtube.com/watch?v=eoGbescCNP0)

* **Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters**

*Federico Landi, Lorenzo Baraldi, Massimiliano Corsini, Rita Cucchiara*

BMVC, 2019. [[Paper]](https://arxiv.org/abs/1907.02985) [[Code]](https://github.com/aimagelab/DynamicConv-agent)

* **Transferable Representation Learning in Vision-and-Language Navigation**

*Haoshuo Huang, Vihan Jain, Harsh Mehta, Alexander Ku, Gabriel Magalhaes, Jason Baldridge, Eugene Ie*

ICCV, 2019. [[Paper]](https://arxiv.org/abs/1908.03409)

* **Unsupervised Reinforcement Learning of Transferable Meta-Skills for Embodied Navigation**

*Juncheng Li, Xin Wang, Siliang Tang, Haizhou Shi, Fei Wu, Yueting Zhuang, William Yang Wang*

CVPR, 2020. [[Paper]](https://arxiv.org/abs/1911.07450)

* **Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks**

*Fengda Zhu, Yi Zhu, Xiaojun Chang, Xiaodan Liang*

CVPR, 2020. [[Paper]](https://arxiv.org/abs/1911.07883)

* **Perceive, Transform, and Act: Multi-Modal Attention Networks for Vision-and-Language Navigation**

*Federico Landi, Lorenzo Baraldi, Marcella Cornia, Massimiliano Corsini, Rita Cucchiara*

arXiv, 2019. [[Paper]](https://arxiv.org/abs/1911.12377) [[Code]](https://github.com/aimagelab/perceive-transform-and-act)

* **Just Ask: An Interactive Learning Framework for Vision and Language Navigation**

*Ta-Chung Chi, Mihail Eric, Seokhwan Kim, Minmin Shen, Dilek Hakkani-tur*

AAAI, 2020. [[Paper]](https://arxiv.org/abs/1912.00915)

* **Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training**

*Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao*

CVPR, 2020. [[Paper]](https://arxiv.org/abs/2002.10638) [[Code]](https://github.com/weituo12321/PREVALENT)

* **Environment-agnostic Multitask Learning for Natural Language Grounded Navigation**

*Xin Wang, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/2003.00443)

* **Counterfactual Vision-and-Language Navigation via Adversarial Path Sampling**

*Tsu-Jui Fu, Xin Wang, Matthew Peterson, Scott Grafton, Miguel Eckstein, William Yang Wang*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/1911.07308)

* **Multi-View Learning for Vision-and-Language Navigation**

*Qiaolin Xia, Xiujun Li, Chunyuan Li, Yonatan Bisk, Zhifang Sui, Jianfeng Gao, Yejin Choi, Noah A. Smith*

arXiv, 2020. [[Paper]](https://arxiv.org/abs/2003.00857)

* **Vision-Dialog Navigation by Exploring Cross-modal Memory**

*Yi Zhu, Fengda Zhu, Zhaohuan Zhan, Bingqian Lin, Jianbin Jiao, Xiaojun Chang, Xiaodan Liang*

CVPR, 2020. [[Paper]](https://arxiv.org/abs/2003.06745) [[Code]](https://github.com/yeezhu/CMN.pytorch)

* **Take the Scenic Route: Improving Generalization in Vision-and-Language Navigation**

*Felix Yu, Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky*

arXiv, 2020. [[Paper]](https://arxiv.org/abs/2003.14269)

* **Sub-Instruction Aware Vision-and-Language Navigation**

*Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould*

arXiv, 2020. [[Paper]](https://arxiv.org/abs/2004.02707)

* **Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments**

*Jacob Krantz, Erik Wijmans, Arjun Majumdar, Dhruv Batra, Stefan Lee*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/2004.02857) [[Code]](https://github.com/jacobkrantz/VLN-CE) [[Website]](https://jacobkrantz.github.io/vlnce)

* **Improving Vision-and-Language Navigation with Image-Text Pairs from the Web**

*Arjun Majumdar, Ayush Shrivastava, Stefan Lee, Peter Anderson, Devi Parikh, Dhruv Batra*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/2004.14973)

* **Soft Expert Reward Learning for Vision-and-Language Navigation**

*Hu Wang, Qi Wu, Chunhua Shen*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/2007.10835)

* **Active Visual Information Gathering for Vision-Language Navigation**

*Hanqing Wang, Wenguan Wang, Tianmin Shu, Wei Liang, Jianbing Shen*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/2007.08037) [[Code]](https://github.com/HanqingWangAI/Active_VLN)

* **Language and Visual Entity Relationship Graph for Agent Navigation**

*Yicong Hong, Cristian Rodriguez, Yuankai Qi, Qi Wu, Stephen Gould*

NeurIPS, 2020. [[Paper]](https://arxiv.org/abs/2010.09304) [[Code]](https://github.com/YicongHong/Entity-Graph-VLN)

* **Counterfactual Vision-and-Language Navigation: Unravelling the Unseen**

*Amin Parvaneh, Ehsan Abbasnejad, Damien Teney, Javen Qinfeng Shi, Anton van den Hengel*

NeurIPS, 2020. [[Paper]](https://proceedings.neurips.cc/paper/2020/file/39016cfe079db1bfb359ca72fcba3fd8-Paper.pdf)

* **Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation**

*Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky*

NeurIPS, 2020. [[Paper]](https://proceedings.neurips.cc/paper/2020/file/eddb904a6db773755d2857aacadb1cb0-Paper.pdf)

* **Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning**

*Weixia Zhang, Chao Ma, Qi Wu, Xiaokang Yang*

TCSVT, 2020. [[Paper]](https://arxiv.org/pdf/2011.10972.pdf)

* **Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule**

*Shuhei Kurita, Kyunghyun Cho*

ICLR, 2021. [[Paper]](https://arxiv.org/pdf/2009.07783.pdf)

* **Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation**

*Muhammad Zubair Irshad, Chih-Yao Ma, Zsolt Kira*

ICRA, 2021. [[Paper]](https://arxiv.org/abs/2104.10674) [[Code]](https://github.com/GT-RIPL/robo-vln) [[Website]](https://zubair-irshad.github.io/projects/robo-vln.html) [[Video]](https://www.youtube.com/watch?v=y16x9n_zP_4)

* **VLN BERT: A Recurrent Vision-and-Language BERT for Navigation**

*Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould*

CVPR, 2021. [[Paper]](https://arxiv.org/abs/2011.13922) [[Code]](https://github.com/YicongHong/Recurrent-VLN-BERT)

* **Structured Scene Memory for Vision-Language Navigation**

*Hanqing Wang, Wenguan Wang, Wei Liang, Caiming Xiong, Jianbing Shen*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2103.03454.pdf) [[Code]](https://github.com/HanqingWangAI/SSM-VLN)

* **Topological Planning With Transformers for Vision-and-Language Navigation**

*Kevin Chen, Junshen K. Chen, Jo Chuang, Marynel Vázquez, Silvio Savarese*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2012.05292.pdf)

* **SOON: Scenario Oriented Object Navigation With Graph-Based Exploration**

*Fengda Zhu, Xiwen Liang, Yi Zhu, Qizhi Yu, Xiaojun Chang, Xiaodan Liang*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2103.17138.pdf)

* **Room-and-Object Aware Knowledge Reasoning for Remote Embodied Referring Expression**

*Chen Gao, Jinyu Chen, Si Liu, Luting Wang, Qiong Zhang, Qi Wu*

CVPR, 2021. [[Paper]](https://openaccess.thecvf.com/content/CVPR2021/papers/Gao_Room-and-Object_Aware_Knowledge_Reasoning_for_Remote_Embodied_Referring_Expression_CVPR_2021_paper.pdf)

* **Scene-Intuitive Agent for Remote Embodied Visual Grounding**

*Xiangru Lin, Guanbin Li, Yizhou Yu*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2103.12944.pdf)

* **Neighbor-view Enhanced Model for Vision and Language Navigation**

*Dong An, Yuankai Qi, Yan Huang, Qi Wu, Liang Wang, Tieniu Tan*

ACM MM, 2021. [[Paper]](https://arxiv.org/pdf/2107.07201.pdf) [[Code]](https://github.com/MarSaKi/NvEM)

* **The Road To Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation**

*Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton van den Hengel, Qi Wu*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2104.04167.pdf) [[Code]](https://github.com/YuankaiQi/ORIST)

* **Pathdreamer: A World Model for Indoor Navigation**

*Jing Yu Koh, Honglak Lee, Yinfei Yang, Jason Baldridge, Peter Anderson*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2105.08756.pdf) [[Code]](https://github.com/google-research/pathdreamer) [[Website]](https://google-research.github.io/pathdreamer/)

* **Episodic Transformer for Vision-and-Language Navigation**

*Alexander Pashevich, Cordelia Schmid, Chen Sun*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2105.06453.pdf) [[Code]](https://github.com/alexpashevich/E.T.) [[Website]](https://sites.google.com/view/episodictransformer)

* **Self-Motivated Communication Agent for Real-World Vision-Dialog Navigation**

*Yi Zhu\*, Yue Weng\*, Fengda Zhu, Xiaodan Liang, Qixiang Ye, Yutong Lu, Jianbin Jiao*

ICCV, 2021. [[Paper]](https://openaccess.thecvf.com/content/ICCV2021/papers/Zhu_Self-Motivated_Communication_Agent_for_Real-World_Vision-Dialog_Navigation_ICCV_2021_paper.pdf)

* **Vision-Language Navigation with Random Environmental Mixup**

*Chong Liu\*, Fengda Zhu\*, Xiaojun Chang, Xiaodan Liang, Zongyuan Ge, Yi-Dong Shen*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2106.07876.pdf) [[Code]](https://github.com/LCFractal/VLNREM)

* **Waypoint Models for Instruction-guided Navigation in Continuous Environments**

*Jacob Krantz, Aaron Gokaslan, Dhruv Batra, Stefan Lee, Oleksandr Maksymets*

ICCV, 2021. [[Paper]](https://arxiv.org/abs/2110.02207) [[Code]](https://github.com/jacobkrantz/VLN-CE) [[Website]](https://jacobkrantz.github.io/waypoint-vlnce/)

* **Airbert: In-domain Pretraining for Vision-and-Language Navigation**

*Pierre-Louis Guhur, Makarand Tapaswi, Shizhe Chen, Ivan Laptev, Cordelia Schmid*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2108.09105.pdf) [[Code]](https://github.com/airbert-vln/airbert) [[Website]](https://airbert-vln.github.io/)

* **Curriculum Learning for Vision-and-Language Navigation**

*Jiwen Zhang, Zhongyu Wei, Jianqing Fan, Jiajie Peng*

NeurIPS, 2021. [[Paper]](https://arxiv.org/pdf/2111.07228.pdf) [[Code]](https://github.com/IMNearth/Curriculum-Learning-For-VLN)

* **SOAT: A Scene- and Object-Aware Transformer for Vision-and-Language Navigation**

*Abhinav Moudgil, Arjun Majumdar, Harsh Agrawal, Stefan Lee, Dhruv Batra*

NeurIPS, 2021. [[Paper]](https://arxiv.org/pdf/2110.14143.pdf)

* **History Aware Multimodal Transformer for Vision-and-Language Navigation**

*Shizhe Chen, Pierre-Louis Guhur, Cordelia Schmid, Ivan Laptev*

NeurIPS, 2021. [[Paper]](https://arxiv.org/pdf/2110.13309.pdf) [[Website]](https://cshizhe.github.io/projects/vln_hamt.html) [[Code]](https://github.com/cshizhe/VLN-HAMT)

* **Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments**

*Sonia Raychaudhuri, Saim Wani, Shivansh Patel, Unnat Jain, Angel Chang*

EMNLP, 2021. [[Paper]](https://aclanthology.org/2021.emnlp-main.328) [[Code]](https://github.com/3dlg-hcvc/LAW-VLNCE) [[Website]](https://3dlg-hcvc.github.io/LAW-VLNCE) [[Video]](https://www.youtube.com/watch?v=7dRymdCIAvo)

* **SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments**

*Muhammad Zubair Irshad, Niluthpol Chowdhury Mithun, Zachary Seymour, Han-Pang Chiu, Supun Samarasekera, Rakesh Kumar*

ICPR, 2022. [[Paper]](https://arxiv.org/abs/2108.11945) [[Website]](https://zubair-irshad.github.io/projects/SASRA.html) [[Video]](https://www.youtube.com/watch?v=DsziGtgaJC0)

### Multi-Agent Tasks
* **Two Body Problem: Collaborative Visual Task Completion**

*Unnat Jain\*, Luca Weihs\*, Eric Kolve, Mohammad Rastegari, Svetlana Lazebnik, Ali Farhadi, Alexander Schwing, Aniruddha Kembhavi*

CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1904.05879.pdf) [[Website]](https://prior.allenai.org/projects/two-body-problem)

* **A Cordial Sync: Going Beyond Marginal Policies For Multi-Agent Embodied Tasks**

*Unnat Jain\*, Luca Weihs\*, Eric Kolve, Ali Farhadi, Svetlana Lazebnik, Aniruddha Kembhavi, Alexander Schwing*

ECCV, 2020. [[Paper]](https://arxiv.org/abs/2007.04979) [[Code]](https://github.com/allenai/cordial-sync) [[Website]](https://unnat.github.io/cordial-sync/)

* **Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents**

*Shivansh Patel\*, Saim Wani\*, Unnat Jain\*, Alexander G. Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2110.05769.pdf)

* **GRIDTOPIX: Training Embodied Agents with Minimal Supervision**

*Shivansh Patel\*, Saim Wani\*, Unnat Jain\*, Alexander G. Schwing, Svetlana Lazebnik, Manolis Savva, Angel X. Chang*

ICCV, 2021. [[Paper]](https://arxiv.org/pdf/2105.00931.pdf) [[Website]](https://unnat.github.io/gridtopix/)

* **Sound Adversarial Audio-Visual Navigation**

*Yinfeng Yu, Wenbing Huang, Fuchun Sun, Changan Chen, Yikai Wang, Xiaohong Liu*

ICLR, 2022. [[Paper]](https://openreview.net/pdf?id=NkZq4OEYN-) [[Code]](https://github.com/yyf17/SAAVN/tree/main) [[Website]](https://yyf17.github.io/SAAVN)

* **Proactive Multi-Camera Collaboration for 3D Human Pose Estimation**

*Hai Ci, Mickel Liu, Xuehai Pan, Fangwei Zhong, Yizhou Wang*

ICLR, 2023. [[Paper]](https://openreview.net/pdf?id=CPIy9TWFYBG) [[Website]](https://sites.google.com/view/active3dpose)

### Active Visual Tracking
* **End-to-end Active Object Tracking via Reinforcement Learning**

*Wenhan Luo\*, Peng Sun\*, Fangwei Zhong, Wei Liu, Tong Zhang, Yizhou Wang*

ICML, 2018. [[Paper]](http://proceedings.mlr.press/v80/luo18a/luo18a.pdf) [[Website]](https://sites.google.com/site/whluoimperial/active_tracking_icml2018)

* **End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning**

*Wenhan Luo\*, Peng Sun\*, Fangwei Zhong\*, Wei Liu, Tong Zhang, Yizhou Wang*

IEEE TPAMI, 2019. [[Paper]](https://arxiv.org/pdf/1808.03405.pdf) [[Website]](https://sites.google.com/site/whluoimperial/active_tracking_icml2018)

* **AD-VAT: An Asymmetric Dueling mechanism for learning Visual Active Tracking**

*Fangwei Zhong, Wenhan Luo, Peng Sun, Tingyun Yan, Yizhou Wang*

ICLR, 2019. [[Paper]](https://openreview.net/pdf?id=HkgYmhR9KX) [[Code]](https://github.com/zfw1226/active_tracking_rl)

* **AD-VAT+: An Asymmetric Dueling Mechanism for Learning and Understanding Visual Active Tracking**

*Fangwei Zhong, Wenhan Luo, Peng Sun, Tingyun Yan, Yizhou Wang*

IEEE TPAMI, 2019. [[Paper]](https://ieeexplore.ieee.org/abstract/document/8896000) [[Code]](https://github.com/zfw1226/active_tracking_rl)

* **Pose-Assisted Multi-Camera Collaboration for Active Object Tracking**

*Jing Li\*, Jing Xu\*, Fangwei Zhong\*, Xiangyu Kong, Yu Qiao, Yizhou Wang*

AAAI, 2020. [[Paper]](https://arxiv.org/pdf/2001.05161.pdf) [[Code]](https://github.com/LilJing/pose-assisted-collaboration)

* **Towards Distraction-Robust Active Visual Tracking**

*Fangwei Zhong, Wenhan Luo, Peng Sun, Tingyun Yan, Yizhou Wang*

ICML, 2021. [[Paper]](https://arxiv.org/abs/2106.10110) [[Code]](https://github.com/zfw1226/active_tracking_rl/tree/distractor) [[Website]](https://sites.google.com/view/distraction-robust-avt)

* **Anti-Distractor Active Object Tracking in 3D Environments**

*Mao Xi, Yun Zhou, Zheng Chen, Wengang Zhou, Houqiang Li*

IEEE TCSVT, 2022. [[Paper]](https://ieeexplore.ieee.org/abstract/document/9521193)

* **Enhancing continuous control of mobile robots for end-to-end visual active tracking**

*A Devo, A Dionigi, G Costante*

RAS, 2021. [[Paper]](https://www.sciencedirect.com/science/article/abs/pii/S0921889021000841)

* **E-VAT: An Asymmetric End-to-End Approach to Visual Active Exploration and Tracking**

*Alberto Dionigi, Alessandro Devo, Leonardo Guiducci, Gabriele Costante*

IEEE RA-L, 2022. [[Paper]](https://ieeexplore.ieee.org/abstract/document/9712363)

* **RSPT: Reconstruct Surroundings and Predict Trajectories for Generalizable Active Object Tracking**

*Fangwei Zhong\*, Xiao Bi\*, Yudi Zhang, Wei Zhang, Yizhou Wang*

AAAI, 2023. [[Paper]](https://arxiv.org/pdf/2304.03623v1.pdf) [[Website]](https://sites.google.com/view/aot-rspt)

* **Learning Vision-based Pursuit-Evasion Robot Policies**

*Andrea Bajcsy\*, Antonio Loquercio\*, Ashish Kumar, Jitendra Malik*

arXiv, 2023. [[Paper]](https://arxiv.org/abs/2308.16185) [[Website]](https://abajcsy.github.io/vision-based-pursuit)

### Visual Exploration
* **Curiosity-driven Exploration by Self-supervised Prediction**

*Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell*

ICML, 2017. [[Paper]](https://arxiv.org/pdf/1705.05363.pdf) [[Code]](https://github.com/pathak22/noreward-rl) [[Website]](https://pathak22.github.io/noreward-rl/)

* **Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks**

*Dinesh Jayaraman, Kristen Grauman*

CVPR, 2018. [[Paper]](https://arxiv.org/abs/1709.00507)

* **Sidekick Policy Learning for Active Visual Exploration**

*Santhosh K. Ramakrishnan, Kristen Grauman*

ECCV, 2018. [[Paper]](https://arxiv.org/abs/1807.11010) [[Code]](https://github.com/srama2512/sidekicks) [[Website]](http://vision.cs.utexas.edu/projects/sidekicks/)

* **Learning Exploration Policies for Navigation**

*Tao Chen, Saurabh Gupta, Abhinav Gupta*

ICLR, 2019. [[Paper]](https://openreview.net/pdf?id=SyMWn05F7) [[Code]](https://github.com/taochenshh/exp4nav) [[Website]](https://sites.google.com/view/exploration-for-nav/)

* **Episodic Curiosity through Reachability**

*Nikolay Savinov, Anton Raichuk, Damien Vincent, Raphael Marinier, Marc Pollefeys, Timothy Lillicrap, Sylvain Gelly*

ICLR, 2019. [[Paper]](https://openreview.net/pdf?id=SkeK3s0qKQ) [[Code]](https://github.com/google-research/episodic-curiosity) [[Website]](https://sites.google.com/view/episodic-curiosity)

* **Emergence of Exploratory Look-Around Behaviors through Active Observation Completion**

*Santhosh K. Ramakrishnan\*, Dinesh Jayaraman\*, Kristen Grauman*

Science Robotics, 2019. [[Paper]](https://arxiv.org/pdf/1906.11407.pdf) [[Code]](https://github.com/srama2512/visual-exploration) [[Website]](http://vision.cs.utexas.edu/projects/visual-exploration/)

* **Scene Memory Transformer for Embodied Agents in Long-Horizon Tasks**

*Kuan Fang, Alexander Toshev, Li Fei-Fei, Silvio Savarese*

CVPR, 2019. [[Paper]](https://arxiv.org/pdf/1903.03878.pdf) [[Website]](https://sites.google.com/view/scene-memory-transformer)

* **Explore and Explain: Self-supervised Navigation and Recounting**

*Roberto Bigazzi, Federico Landi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara*

ICPR, 2020. [[Paper]](https://ieeexplore.ieee.org/document/9412628)

* **Learning to Explore using Active Neural SLAM**

*Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov*

ICLR, 2020. [[Paper]](https://openreview.net/pdf?id=HklXn1BKDH) [[Code]](https://github.com/devendrachaplot/Neural-SLAM) [[Website]](https://devendrachaplot.github.io/projects/Neural-SLAM)

* **Semantic Curiosity for Active Visual Learning**

*Devendra Singh Chaplot, Helen Jiang, Saurabh Gupta, Abhinav Gupta*

ECCV, 2020. [[Paper]](https://arxiv.org/pdf/2006.09367.pdf) [[Website]](https://devendrachaplot.github.io/projects/SemanticCuriosity)

* **See, Hear, Explore: Curiosity via Audio-Visual Association**

*Victoria Dean, Shubham Tulsiani, Abhinav Gupta*

NeurIPS, 2020. [[Paper]](https://vdean.github.io/resources/audio-curiosity2020.pdf) [[Website]](https://vdean.github.io/audio-curiosity.html)

* **Occupancy Anticipation for Efficient Exploration and Navigation**

*Santhosh K. Ramakrishnan, Ziad Al-Halah, Kristen Grauman*

ECCV, 2020. [[Paper]](http://vision.cs.utexas.edu/projects/occupancy_anticipation/main.pdf) [[Code]](https://github.com/facebookresearch/OccupancyAnticipation) [[Website]](http://vision.cs.utexas.edu/projects/occupancy_anticipation/)

* **Focus on Impact: Indoor Exploration With Intrinsic Motivation**

*Roberto Bigazzi, Federico Landi, Silvia Cascianelli, Lorenzo Baraldi, Marcella Cornia, Rita Cucchiara*

IEEE RA-L + ICRA, 2022. [[Paper]](https://ieeexplore.ieee.org/document/9691914) [[Code]](https://github.com/aimagelab/focus-on-impact)

* **Symmetry-aware Neural Architecture for Embodied Visual Exploration**

*Shuang Liu, Takayuki Okatani*

CVPR, 2022. [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/papers/Liu_Symmetry-Aware_Neural_Architecture_for_Embodied_Visual_Exploration_CVPR_2022_paper.pdf) [[Code]](https://github.com/vincent341/S-ANS) [[Website]](https://github.com/vincent341/S-ANS)

* **Embodied Agents for Efficient Exploration and Smart Scene Description**

*Roberto Bigazzi, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, Rita Cucchiara*

ICRA, 2023. [[Paper]](https://arxiv.org/abs/2301.07150)

### Embodied Question Answering
* **Embodied Question Answering**

*Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra*

CVPR, 2018. [[Paper]](https://arxiv.org/abs/1711.11543) [[Code]](https://github.com/facebookresearch/EmbodiedQA) [[Website]](https://embodiedqa.org/)

* **Multi-Target Embodied Question Answering**

*Licheng Yu, Xinlei Chen, Georgia Gkioxari, Mohit Bansal, Tamara L. Berg, Dhruv Batra*

CVPR, 2019. [[Paper]](https://arxiv.org/abs/1904.04686)

* **Embodied Question Answering in Photorealistic Environments with Point Cloud Perception**

*Erik Wijmans\*, Samyak Datta\*, Oleksandr Maksymets\*, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra*

CVPR, 2019. [[Paper]](https://arxiv.org/abs/1904.03461)

* **Episodic Memory Question Answering**

*Samyak Datta, Sameer Dharur, Vincent Cartillier, Ruta Desai, Mukul Khanna, Dhruv Batra, and Devi Parikh*

CVPR, 2022. [[Paper]](https://openaccess.thecvf.com/content/CVPR2022/html/Datta_Episodic_Memory_Question_Answering_CVPR_2022_paper.html) [[Website]](https://samyak-268.github.io/emqa/)

* **SQA3D: Situated Question Answering in 3D Scenes**

*Xiaojian Ma, Silong Yong, Zilong Zheng, Qing Li, Yitao Liang, Song-Chun Zhu, Siyuan Huang*

ICLR, 2023. [[Paper]](https://arxiv.org/pdf/2210.07474.pdf) [[Website]](https://sqa3d.github.io/)

### Visual Interactions
* **Visual Semantic Planning using Deep Successor Representations**

*Yuke Zhu, Daniel Gordon, Eric Kolve, Dieter Fox, Li Fei-Fei, Abhinav Gupta, Roozbeh Mottaghi, Ali Farhadi*

ICCV, 2017. [[Paper]](https://arxiv.org/abs/1705.08080)

* **IQA: Visual Question Answering in Interactive Environments**

*Daniel Gordon, Aniruddha Kembhavi, Mohammad Rastegari, Joseph Redmon, Dieter Fox, and Ali Farhadi*

CVPR, 2018. [[Paper]](https://arxiv.org/abs/1712.03316) [[Code]](https://github.com/danielgordon10/thor-iqa-cvpr-2018) [[Website]](https://prior.allenai.org/projects/iqa)

* **ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks**

*Mohit Shridhar, Jesse Thomason, Daniel Gordon, Yonatan Bisk, Winson Han, Roozbeh Mottaghi, Luke Zettlemoyer, Dieter Fox*

CVPR, 2020. [[Paper]](https://arxiv.org/abs/1912.01734) [[Code]](https://github.com/askforalfred/alfred) [[Website]](https://askforalfred.com/)

* **Learning About Objects by Learning to Interact with Them**

*Martin Lohmann, Jordi Salvador, Aniruddha Kembhavi, Roozbeh Mottaghi*

NeurIPS, 2020. [[Paper]](https://arxiv.org/abs/2006.09306)

* **Learning Affordance Landscapes for Interaction Exploration in 3D Environments**

*Tushar Nagarajan, Kristen Grauman*

NeurIPS, 2020. [[Paper]](https://proceedings.neurips.cc/paper/2020/file/15825aee15eb335cc13f9b559f166ee8-Paper.pdf)

* **ALFWorld: Aligning Text and Embodied Environments for Interactive Learning**

*Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht*

ICLR, 2021. [[Paper]](https://arxiv.org/pdf/2010.03768.pdf) [[Code]](https://github.com/alfworld/alfworld) [[Website]](https://alfworld.github.io/)

* **Learning Generalizable Visual Representations via Interactive Gameplay**

*Luca Weihs, Aniruddha Kembhavi, Kiana Ehsani, Sarah M Pratt, Winson Han, Alvaro Herrasti, Eric Kolve, Dustin Schwenk, Roozbeh Mottaghi, Ali Farhadi*

ICLR, 2021. [[Paper]](https://arxiv.org/pdf/1912.08195.pdf)

* **Pushing It Out of the Way: Interactive Visual Navigation**

*Kuo-Hao Zeng, Luca Weihs, Ali Farhadi, Roozbeh Mottaghi*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2104.14040.pdf) [[Code]](https://github.com/KuoHaoZeng/Interactive_Visual_Navigation) [[Website]](https://prior.allenai.org/projects/interactive-visual-navigation)

* **ManipulaTHOR: A Framework for Visual Object Manipulation**

*Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2104.11213.pdf) [[Code]](https://github.com/allenai/manipulathor) [[Website]](https://ai2thor.allenai.org/manipulathor/)

* **🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation**

*Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi*

arXiv, 2022. [[Paper]](https://arxiv.org/pdf/2206.06994.pdf) [[Website]](https://procthor.allenai.org/)

* **Transformer Memory for Interactive Visual Navigation in Cluttered Environments**

*Weiyuan Li, Ruoxin Hong, Jiwei Shen, Liang Yuan, Yue Lu*

IEEE RA-L, 2023. [[Paper]](https://www.hrl.uni-bonn.de/teaching/ss23/master-seminar/transformer-memory-for-interactive-visual-navigation-in-cluttered-environments.pdf)

### Rearrangement

* **Rearrangement: A Challenge for Embodied AI**

*Dhruv Batra, Angel X. Chang, Sonia Chernova, Andrew J. Davison, Jia Deng, Vladlen Koltun, Sergey Levine, Jitendra Malik, Igor Mordatch, Roozbeh Mottaghi, Manolis Savva, Hao Su*

arXiv, 2020. [[Paper]](https://arxiv.org/abs/2011.01975)

* **Visual Room Rearrangement**

*Luca Weihs, Matt Deitke, Aniruddha Kembhavi, and Roozbeh Mottaghi*

CVPR, 2021. [[Paper]](https://arxiv.org/abs/2103.16544) [[Code]](https://github.com/allenai/ai2thor-rearrangement) [[Website]](https://ai2thor.allenai.org/rearrangement)

* **Habitat 2.0: Training Home Assistants to Rearrange their Habitat**

*Andrew Szot, Alex Clegg, Eric Undersander, Erik Wijmans, Yili Zhao, John Turner, Noah Maestre, Mustafa Mukadam, Devendra Chaplot, Oleksandr Maksymets, Aaron Gokaslan, Vladimir Vondrus, Sameer Dharur, Franziska Meier, Wojciech Galuba, Angel Chang, Zsolt Kira, Vladlen Koltun, Jitendra Malik, Manolis Savva, Dhruv Batra*

NeurIPS, 2021. [[Paper]](https://arxiv.org/abs/2106.14405) [[Code]](https://github.com/facebookresearch/habitat-lab/)

* **🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation**

*Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi*

arXiv, 2022. [[Paper]](https://arxiv.org/pdf/2206.06994.pdf) [[Website]](https://procthor.allenai.org/)

* **TarGF: Learning Target Gradient Field to Rearrange Objects without Explicit Goal Specification**

*Mingdong Wu\*, Fangwei Zhong\*, Yulong Xia, Hao Dong*

NeurIPS, 2022. [[Paper]](https://proceedings.neurips.cc/paper_files/paper/2022/file/cf5a019ae9c11b4be88213ce3f85d85c-Paper-Conference.pdf) [[Website]](https://sites.google.com/view/targf)

* **Rearrange Indoor Scenes for Human-Robot Co-Activity**

*Weiqi Wang\*, Zihang Zhao\*, Ziyuan Jiao\*, Yixin Zhu, Song-Chun Zhu, Hangxin Liu*

ICRA, 2023. [[Paper]](https://arxiv.org/pdf/2303.05676.pdf) [[Website]](https://sites.google.com/view/coactivity)

### Sim-to-real Transfer
* **Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World**

*Josh Tobin, Rachel Fong, Alex Ray, Jonas Schneider, Wojciech Zaremba, Pieter Abbeel*

IROS, 2017. [[Paper]](https://arxiv.org/abs/1703.06907)

* **Sim-to-Real Transfer for Vision-and-Language Navigation**

*Peter Anderson, Ayush Shrivastava, Joanne Truong, Arjun Majumdar, Devi Parikh, Dhruv Batra, Stefan Lee*

CoRL, 2020. [[Paper]](https://arxiv.org/abs/2011.03807)

* **RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real**

*Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, Mohi Khansari*

CVPR, 2020. [[Paper]](https://arxiv.org/pdf/2006.09001.pdf)

* **Bi-directional Domain Adaptation for Sim2Real Transfer of Embodied Navigation Agents**

*Joanne Truong, Sonia Chernova, Dhruv Batra*

RA-L, 2021. [[Paper]](https://arxiv.org/pdf/2011.12421.pdf)

## Datasets
* **A Dataset for Developing and Benchmarking Active Vision**

*Phil Ammirato, Patrick Poirson, Eunbyung Park, Jana Kosecka, Alexander C. Berg*

ICRA, 2017. [[Paper]](https://www.cs.unc.edu/~ammirato/active_vision_dataset_website/icra-rohit-paper.pdf) [[Code]](https://github.com/ammirato/active_vision_dataset_processing) [[Website]](https://www.cs.unc.edu/~ammirato/active_vision_dataset_website/)

* **AI2-THOR: An Interactive 3D Environment for Visual AI**

*Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Daniel Gordon, Yuke Zhu, Abhinav Gupta, Ali Farhadi*

arXiv, 2017. [[Paper]](https://arxiv.org/abs/1712.05474) [[Code]](https://github.com/allenai/ai2thor) [[Website]](https://ai2thor.allenai.org/)

* **Matterport3D: Learning from RGB-D Data in Indoor Environments**

*Angel Chang, Angela Dai, Thomas Funkhouser, Maciej Halber, Matthias Nießner, Manolis Savva, Shuran Song, Andy Zeng, Yinda Zhang*

3DV, 2017. [[Paper]](https://arxiv.org/pdf/1709.06158.pdf) [[Code]](https://github.com/niessner/Matterport) [[Website]](https://niessner.github.io/Matterport/)

* **Gibson Env: Real-World Perception for Embodied Agents**

*Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese*

CVPR, 2018. [[Paper]](https://arxiv.org/abs/1808.10654) [[Code]](https://github.com/StanfordVL/GibsonEnv) [[Website]](http://gibsonenv.stanford.edu/)

* **The Replica Dataset: A Digital Replica of Indoor Spaces**

*Julian Straub, Thomas Whelan, Lingni Ma, Yufan Chen, Erik Wijmans, Simon Green, Jakob J. Engel, Raul Mur-Artal, Carl Ren, Shobhit Verma, Anton Clarkson, Mingfei Yan, Brian Budge, Yajie Yan, Xiaqing Pan, June Yon, Yuyang Zou, Kimberly Leon, Nigel Carter, Jesus Briales, Tyler Gillingham, Elias Mueggler, Luis Pesqueira, Manolis Savva, Dhruv Batra, Hauke M. Strasdat, Renzo De Nardi, Michael Goesele, Steven Lovegrove, Richard Newcombe*

arXiv, 2019. [[Paper]](https://arxiv.org/pdf/1906.05797.pdf) [[Code]](https://github.com/facebookresearch/Replica-Dataset)

* **Actionet: An Interactive End-to-End Platform for Task-Based Data Collection and Augmentation in 3D Environments**

*Jiafei Duan, Samson Yu, Hui Li Tan, Cheston Tan*

ICIP, 2020. [[Paper]](https://arxiv.org/pdf/2010.01357.pdf) [[Code]](https://github.com/SamsonYuBaiJian/actionet)

* **Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI**

*Santhosh K. Ramakrishnan, Aaron Gokaslan, Erik Wijmans, Oleksandr Maksymets, Alex Clegg, John Turner, Eric Undersander, Wojciech Galuba, Andrew Westbury, Angel X. Chang, Manolis Savva, Yili Zhao, Dhruv Batra*

NeurIPS, 2021. [[Paper]](https://arxiv.org/pdf/2109.08238.pdf) [[Website]](https://matterport.com/habitat-matterport-3d-research-dataset)

* **🏘️ ProcTHOR-10K: 10K Interactive Household Environments for Embodied AI**

*Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi*

arXiv, 2022. [[Paper]](https://arxiv.org/pdf/2206.06994.pdf) [[Website]](https://procthor.allenai.org/)

* **EgoTV 📺: Egocentric Task Verification from Natural Language Task Descriptions**

*Rishi Hazra, Brian Chen, Akshara Rai, Nitin Kamra, Ruta Desai*

ICCV, 2023. [[Paper]](https://openaccess.thecvf.com/content/ICCV2023/papers/Hazra_EgoTV_Egocentric_Task_Verification_from_Natural_Language_Task_Descriptions_ICCV_2023_paper.pdf) [[Code]](https://github.com/facebookresearch/EgoTV) [[Website]](https://rishihazra.github.io/EgoTV)

## Simulators
* **AI2-THOR: An Interactive 3D Environment for Visual AI**

*Eric Kolve, Roozbeh Mottaghi, Winson Han, Eli VanderBilt, Luca Weihs, Alvaro Herrasti, Daniel Gordon, Yuke Zhu, Abhinav Gupta, Ali Farhadi*

arXiv, 2017. [[Paper]](https://arxiv.org/abs/1712.05474) [[Code]](https://github.com/allenai/ai2thor) [[Website]](https://ai2thor.allenai.org/)

* **UnrealCV: Virtual Worlds for Computer Vision**

*Weichao Qiu, Fangwei Zhong, Yi Zhang, Siyuan Qiao, Zihao Xiao, Tae Soo Kim, Yizhou Wang, Alan Yuille*

ACM MM Open Source Software Competition, 2017. [[Paper]](https://dl.acm.org/doi/pdf/10.1145/3123266.3129396) [[Code]](https://github.com/unrealcv/unrealcv) [[Website]](https://unrealcv.org/)

* **Building Generalizable Agents with a Realistic and Rich 3D Environment (House3D)**

*Yi Wu, Yuxin Wu, Georgia Gkioxari, Yuandong Tian*

arXiv, 2018. [[Paper]](https://arxiv.org/pdf/1801.02209.pdf) [[Code]](https://github.com/facebookresearch/House3D)

* **CHALET: Cornell House Agent Learning Environment**

*Claudia Yan, Dipendra Misra, Andrew Bennett, Aaron Walsman, Yonatan Bisk and Yoav Artzi*

arXiv, 2018. [[Paper]](https://arxiv.org/pdf/1801.07357.pdf) [[Code]](https://github.com/lil-lab/chalet)

* **RoboTHOR: An Open Simulation-to-Real Embodied AI Platform**

*Matt Deitke, Winson Han, Alvaro Herrasti, Aniruddha Kembhavi, Eric Kolve, Roozbeh Mottaghi, Jordi Salvador, Dustin Schwenk, Eli VanderBilt, Matthew Wallingford, Luca Weihs, Mark Yatskar, Ali Farhadi*

CVPR, 2020. [[Paper]](https://arxiv.org/abs/2004.06799) [[Website]](https://ai2thor.allenai.org/robothor/)

* **Gibson Env: Real-World Perception for Embodied Agents**

*Fei Xia, Amir Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, Silvio Savarese*

CVPR, 2018. [[Paper]](https://arxiv.org/abs/1808.10654) [[Code]](https://github.com/StanfordVL/GibsonEnv) [[Website]](http://gibsonenv.stanford.edu/)

* **Habitat: A Platform for Embodied AI Research**

*Manolis Savva, Abhishek Kadian, Oleksandr Maksymets, Yili Zhao, Erik Wijmans, Bhavana Jain, Julian Straub, Jia Liu, Vladlen Koltun, Jitendra Malik, Devi Parikh, Dhruv Batra*

ICCV, 2019. [[Paper]](https://arxiv.org/abs/1904.01201) [[Code]](https://github.com/facebookresearch/habitat-api) [[Website]](https://aihabitat.org/)

* **VirtualHome: Simulating Household Activities via Programs**

*Xavier Puig\*, Kevin Ra\*, Marko Boben\*, Jiaman Li, Tingwu Wang, Sanja Fidler, Antonio Torralba*

CVPR, 2018. [[Paper]](http://virtual-home.org/paper/virtualhome.pdf) [[Code]](https://github.com/xavierpuigf/virtualhome) [[Website]](http://virtual-home.org/)

* **ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation**

*Chuang Gan, Jeremy Schwartz, Seth Alter, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Damian Mrowca, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, James J. DiCarlo, Josh McDermott, Joshua B. Tenenbaum, Daniel L.K. Yamins*

arXiv, 2020. [[Paper]](https://arxiv.org/pdf/2007.04954.pdf) [[Code]](https://github.com/threedworld-mit/tdw) [[Website]](http://www.threedworld.org/)

* **ALFWorld: Aligning Text and Embodied Environments for Interactive Learning**

*Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, Matthew Hausknecht*

ICLR, 2021. [[Paper]](https://arxiv.org/abs/2010.03768) [[Code]](https://github.com/alfworld/alfworld) [[Website]](https://alfworld.github.io/)

* **ManipulaTHOR: A Framework for Visual Object Manipulation**

*Kiana Ehsani, Winson Han, Alvaro Herrasti, Eli VanderBilt, Luca Weihs, Eric Kolve, Aniruddha Kembhavi, Roozbeh Mottaghi*

CVPR, 2021. [[Paper]](https://arxiv.org/pdf/2104.11213.pdf) [[Code]](https://github.com/allenai/manipulathor) [[Website]](https://ai2thor.allenai.org/manipulathor/)

* **🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation**

*Matt Deitke, Eli VanderBilt, Alvaro Herrasti, Luca Weihs, Jordi Salvador, Kiana Ehsani, Winson Han, Eric Kolve, Ali Farhadi, Aniruddha Kembhavi, Roozbeh Mottaghi*

arXiv, 2022. [[Paper]](https://arxiv.org/pdf/2206.06994.pdf) [[Website]](https://procthor.allenai.org/)

* **ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes**

*Ran Gong\*, Jiangyong Huang\*, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang*

ICCV, 2023. [[Paper]](https://arxiv.org/abs/2304.04321) [[Website]](https://arnold-benchmark.github.io/)

## MISC
* **Visual Learning and Embodied Agents in Simulation Environments Workshop**

ECCV, 2018. [[website]](https://eccv18-vlease.github.io/)

* **Embodied-AI Workshop**

CVPR, 2020/2021. [[website]](https://embodied-ai.org/#overview)

* **Gibson Sim2Real Challenge**

CVPR, 2020. [[website]](http://svl.stanford.edu/igibson/challenge.html)

* **Embodied Vision, Actions & Language Workshop**

ECCV, 2020. [[website]](https://askforalfred.com/EVAL/)

* **Closing the Reality Gap in Sim2Real Transfer for Robotics**

RSS, 2020. [[website]](https://sim2real.github.io/)

* **On Evaluation of Embodied Navigation Agents**

*Peter Anderson, Angel Chang, Devendra Singh Chaplot, Alexey Dosovitskiy, Saurabh Gupta, Vladlen Koltun, Jana Kosecka, Jitendra Malik, Roozbeh Mottaghi, Manolis Savva, Amir R. Zamir*

arXiv, 2018. [[Paper]](https://arxiv.org/abs/1807.06757)

* **PyRobot: An Open-source Robotics Framework for Research and Benchmarking**

*Adithya Murali\*, Tao Chen\*, Kalyan Vasudev Alwala\*, Dhiraj Gandhi\*, Lerrel Pinto, Saurabh Gupta, Abhinav Gupta*

arXiv, 2019. [[Paper]](https://arxiv.org/abs/1906.08236) [[Code]](https://github.com/facebookresearch/pyrobot) [[Website]](https://www.pyrobot.org/)

* **A Survey of Embodied AI: From Simulators to Research Tasks**

*Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan*

arXiv, 2021. [[Paper]](https://arxiv.org/abs/2103.04918)

* **AllenAct: A Framework for Embodied AI Research**

*Luca Weihs, Jordi Salvador, Klemen Kotar, Unnat Jain, Kuo-Hao Zeng, Roozbeh Mottaghi, Aniruddha Kembhavi*

arXiv, 2020. [[Paper]](https://arxiv.org/abs/2008.12760) [[Website]](https://allenact.org/)

* **CSAIL Embodied Intelligence Seminar**

[[website]](https://ei.csail.mit.edu/seminars.html)