Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zchoi/Awesome-Embodied-Agent-with-LLMs
This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates!
List: Awesome-Embodied-Agent-with-LLMs
agent awesome embodied-agent embodied-ai large-language-model manipulator-robotics navigation planning-algorithms scene-understanding
Last synced: 8 days ago
JSON representation
- Host: GitHub
- URL: https://github.com/zchoi/Awesome-Embodied-Agent-with-LLMs
- Owner: zchoi
- Created: 2023-07-19T06:55:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-03-13T08:20:16.000Z (9 months ago)
- Last Synced: 2024-03-13T09:35:05.470Z (9 months ago)
- Topics: agent, awesome, embodied-agent, embodied-ai, large-language-model, manipulator-robotics, navigation, planning-algorithms, scene-understanding
- Homepage:
- Size: 1.83 MB
- Stars: 441
- Watchers: 28
- Forks: 24
- Open Issues: 0
Metadata Files:
- Readme: README.md
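The metadata above is also available through the "JSON representation" link. As a minimal sketch, the Python snippet below fetches that record and prints a few fields; the endpoint URL and field names are assumptions for illustration (check the JSON representation link on the page for the actual route and schema).

```python
# Minimal sketch: fetch this project's JSON representation from ecosyste.ms.
# NOTE: the endpoint and field names below are assumptions for illustration;
# follow the "JSON representation" link above for the real URL and schema.
import requests

API_URL = (
    "https://awesome.ecosyste.ms/api/v1/projects/"
    "lookup?url=https://github.com/zchoi/Awesome-Embodied-Agent-with-LLMs"
)

resp = requests.get(API_URL, timeout=30)
resp.raise_for_status()
project = resp.json()

# Print a few of the fields mirrored in the metadata block above.
for key in ("name", "description", "stars", "forks", "last_synced_at"):
    print(f"{key}: {project.get(key)}")
```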
Awesome Lists containing this project
- awesome-awesome-artificial-intelligence - Awesome Embodied Agent with LLMs (Agent)
- ultimate-awesome - Awesome-Embodied-Agent-with-LLMs - This is a curated list of "Embodied AI or robot with Large Language Models" research. Watch this repository for the latest updates! (Other Lists / PowerShell Lists)
README
# 🤖 Awesome-Embodied-Agent-with-LLMs [![Awesome](https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)](https://github.com/sindresorhus/awesome)
> This is a curated list of "Embodied AI or agent with Large Language Models" research, maintained by [haonan](https://github.com/zchoi).
Watch this repository for the latest updates and **feel free to raise pull requests if you find some interesting papers**!
## News🔥
[2024/08/01] Created a new board about social agents and role-playing. 🧑🧑🧒🧒
[2024/06/28] Created a new board about self-evolving agent research. 🤖
[2024/06/07] Add **Mobile-Agent-v2**, a mobile device operation assistant with effective navigation via multi-agent collaboration. 🚀
[2024/05/13] Add "**Learning Interactive Real-World Simulators**", an Outstanding Paper Award winner at ICLR 2024 🥇.
[2024/04/24] Add "**A Survey on Self-Evolution of Large Language Models**", a systematic survey on self-evolution in LLMs! 💥
[2024/04/16] Add some CVPR 2024 papers.
[2024/04/15] Add **MetaGPT**, accepted for oral presentation (top 1.2%) at ICLR 2024, **ranking #1** in the LLM-based Agent category. 🚀
[2024/03/13] Add **CRADLE**, an interesting paper exploring an LLM-based agent in Red Dead Redemption II! 🎮

## Table of Contents 🍃
- [Survey](#survey)
- [Social Agent](#social-agent)
- [Self-Evolving Agents](#self-evolving-agents)
- [Advanced Agent Applications](#advanced-agent-applications)
- [LLMs with RL or World Model](#llms-with-rl-or-world-model)
- [Planning and Manipulation or Pretraining](#planning-and-manipulation-or-pretraining)
- [Multi-Agent Learning and Coordination](#multi-agent-learning-and-coordination)
- [Vision and Language Navigation](#vision-and-language-navigation)
- [Detection](#detection)
- [3D Grounding](#3d-grounding)
- [Interactive Embodied Learning](#interactive-embodied-learning)
- [Rearrangement](#rearrangement)
- [Benchmark](#benchmark)
- [Simulator](#simulator)
- [Others](#others)

## Trend and Imagination of LLM-based Embodied Agent
Figure 1. Trend of Embodied Agent with LLMs.[1]
Figure 2. An envisioned Agent society.[2]

## Methods
> ### Survey
* [**A Survey on Vision-Language-Action Models for Embodied AI**](https://arxiv.org/pdf/2405.14093) [**arXiv 2024.05**]
The Chinese University of Hong Kong, Huawei Noah’s Ark Lab
* [**Large Multimodal Agents: A Survey**](https://arxiv.org/pdf/2402.15116) [**arXiv 2024.02**] [[**Github**](https://github.com/jun0wanan/awesome-large-multimodal-agents)]
Junlin Xie♣♡ Zhihong Chen♣♡ Ruifei Zhang♣♡ Xiang Wan♣ Guanbin Li♠
♡The Chinese University of Hong Kong, Shenzhen, ♣Shenzhen Research Institute of Big Data, ♠Sun Yat-sen University
* [**A Survey on Self-Evolution of Large Language Models**](https://arxiv.org/pdf/2404.14387.pdf) [**arXiv 2024.04**]
Key Lab of HCST (PKU), MOE; School of Computer Science, Peking University, Alibaba Group, Nanyang Technological University
* [**Agent AI: Surveying the Horizons of Multimodal Interaction**](https://arxiv.org/pdf/2401.03568.pdf) [**arXiv 2024.01**]
Stanford University, Microsoft Research, Redmond, University of California, Los Angeles, University of Washington, Microsoft Gaming
* [**Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents**](https://arxiv.org/pdf/2311.11797.pdf) [**arXiv 2023.11**]
Shanghai Jiao Tong University, Amazon Web Services, Yale University
* [**The Rise and Potential of Large Language Model Based Agents: A Survey**](https://arxiv.org/pdf/2309.07864.pdf) [**arXiv 2023.09**]
Fudan NLP Group, miHoYo Inc
* [**A Survey on LLM-based Autonomous Agents**](https://arxiv.org/pdf/2308.11432.pdf) [**arXiv 2023.08**]
Gaoling School of Artificial Intelligence, Renmin University of China

> ### Social Agent
> ### Self-Evolving Agents
* [**AGENTGYM: Evolving Large Language Model-based Agents across Diverse Environments**](https://arxiv.org/pdf/2406.04151) [**arXiv 2024.06**] [[**Github**](https://github.com/WooooDyy/AgentGym)] [[**Project page**](https://agentgym.github.io/)]
Fudan NLP Lab & Fudan Vision and Learning Lab
* [**Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models**](https://arxiv.org/pdf/2406.11736) [**arXiv 2024.06**] [[**Github**](https://github.com/xufangzhi/ENVISIONS)]
Fangzhi Xu♢♡, Qiushi Sun2, ♡, Kanzhi Cheng1, Jun Liu♢, Yu Qiao♡, Zhiyong Wu♡
♢Xi’an Jiaotong University, ♡Shanghai Artificial Intelligence Laboratory, 1The University of Hong Kong, 2Nanjing University
* [**Symbolic Learning Enables Self-Evolving Agents**](https://arxiv.org/pdf/2406.18532) [**arXiv 2024.06**] [[**Github**](https://github.com/aiwaves-cn/agents)]
Wangchunshu Zhou, Yixin Ou, Shengwei Ding, Long Li, Jialong Wu, Tiannan Wang, Jiamin Chen, Shuai Wang, Xiaohua Xu, Ningyu Zhang, Huajun Chen, Yuchen Eleanor Jiang
AIWaves Inc.

> ### Advanced Agent Applications
* [**Embodied-agents**] [[**Github**](https://github.com/mbodiai/embodied-agents)]
Seamlessly integrate state-of-the-art transformer models into robotics stacks.
* [**Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration**](https://arxiv.org/pdf/2406.01014) [**arXiv 2024**] [[**Github**](https://github.com/X-PLUG/MobileAgent/tree/main/Mobile-Agent-v2)]
Junyang Wang1, Haiyang Xu2, Haitao Jia1, Xi Zhang2, Ming Yan2, Weizhou Shen2, Ji Zhang2, Fei Huang2, Jitao Sang1
1Beijing Jiaotong University 2Alibaba Group
* [**Mobile-Agent: The Powerful Mobile Device Operation Assistant Family**](https://arxiv.org/pdf/2406.01014) [**ICLR 2024 Workshop LLM Agents**] [[**Github**](https://github.com/X-PLUG/MobileAgent/tree/main/Mobile-Agent-v2)]
Junyang Wang1, Haiyang Xu2, Jiabo Ye2, Ming Yan2, Weizhou Shen2, Ji Zhang2, Fei Huang2, Jitao Sang1
1Beijing Jiaotong University 2Alibaba Group
* [**Machinascript-for-robots**] [[**Github**](https://github.com/babycommando/machinascript-for-robots)]
Build LLM-powered robots in your garage with MachinaScript For Robots!
* [**DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**](https://arxiv.org/pdf/2404.01342) [**CVPR 2024**] [[**Github**](https://github.com/OpenGVLab/DiffAgent)]
Lirui Zhao1,2 Yue Yang2,4 Kaipeng Zhang2 Wenqi Shao2, Yuxin Zhang1, Yu Qiao2, Ping Luo2,3 Rongrong Ji1
1Xiamen University, 2OpenGVLab, Shanghai AI Laboratory, 3The University of Hong Kong, 4Shanghai Jiao Tong University
* [**MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework**](https://openreview.net/forum?id=VtmBAGCN7o) [**ICLR 2024 (oral)**]
DeepWisdom, AI Initiative, King Abdullah University of Science and Technology, Xiamen University, The Chinese University of Hong Kong, Shenzhen, Nanjing University, University of Pennsylvania, University of California, Berkeley, The Swiss AI Lab IDSIA/USI/SUPSI
* [**AppAgent: Multimodal Agents as Smartphone Users**](https://arxiv.org/pdf/2312.13771.pdf) [[**Project page**](https://appagent-official.github.io/)] [[**Github**](https://github.com/mnotgod96/AppAgent)]
Chi Zhang∗ Zhao Yang∗ Jiaxuan Liu∗ Yucheng Han Xin Chen Zebiao Huang Bin Fu Gang Yu†
Tencent

> ### LLMs with RL or World Model
* [**KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts**](https://openreview.net/attachment?id=sFyTZEqmUY&name=pdf) [**NeurIPS 2024**] [[**Project Page**](https://kalmneurips2024.github.io)]
Jing-Cheng Pang, Si-Hang Yang, Kaiyuan Li, Jiaji Zhang, Xiong-Hui Chen, Nan Tang, Yang Yu
1Nanjing University, 2Polixir.ai
* [**Learning Interactive Real-World Simulators**](https://openreview.net/attachment?id=sFyTZEqmUY&name=pdf) [**ICLR 2024 (Outstanding Papers)**] [[**Project Page**](https://universal-simulator.github.io/unisim/)]
Sherry Yang1,2, Yilun Du3, Kamyar Ghasemipour2, Jonathan Tompson2, Leslie Kaelbling3, Dale Schuurmans2, Pieter Abbeel1
1UC Berkeley, 2Google DeepMind, 3MIT
* [**Robust agents learn causal world models**](https://openreview.net/attachment?id=pOoKI3ouv1&name=pdf) [**ICLR 2024**]
Jonathan Richens*, Tom Everitt
Google DeepMind
* [**Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld**](https://arxiv.org/pdf/2311.16714.pdf) [**CVPR 2024**] [[**Github**](https://github.com/stevenyangyj/Emma-Alfworld)]
Yijun Yang154, Tianyi Zhou2, Kanxue Li3, Dapeng Tao3, Lvsong Li4, Li Shen4, Xiaodong He4, Jing Jiang5, Yuhui Shi1
1Southern University of Science and Technology, 2University of Maryland, College Park, 3Yunnan University, 4JD Explore Academy, 5University of Technology Sydney
* [**Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning**](https://openreview.net/forum?id=zDbsSscmuj) [**NeurIPS 2023**] [[**Project Page**](https://guansuns.github.io/pages/llm-dm/)] [[**Github**](https://github.com/GuanSuns/LLMs-World-Models-for-Planning)]
Lin Guan1, Karthik Valmeekam1, Sarath Sreedharan2, Subbarao Kambhampati1
1School of Computing & AI, Arizona State University, Tempe, AZ, 2Department of Computer Science, Colorado State University, Fort Collins, CO
* [**Eureka: Human-Level Reward Design via Coding Large Language Models**](https://eureka-research.github.io/assets/eureka_paper.pdf) [**NeurIPS 2023 Workshop ALOE Spotlight**] [[**Project page**](https://eureka-research.github.io/)] [[**Github**](https://github.com/eureka-research/Eureka)]
Jason Ma1,2, William Liang2, Guanzhi Wang1,3, De-An Huang1,
Osbert Bastani2, Dinesh Jayaraman2, Yuke Zhu1,4, Linxi "Jim" Fan1, Anima Anandkumar1,3
1NVIDIA; 2UPenn; 3Caltech; 4UT Austin
* [**RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds**](https://openreview.net/pdf?id=3s4fZTr1ce) [**arXiv 2023**]
* [**Can Language Agents Be Alternatives to PPO? A Preliminary Empirical Study on OpenAI Gym**](https://openreview.net/pdf?id=F0q880yOgY) [**arXiv 2023**]
* [**RoboGPT: An intelligent agent of making embodied long-term decisions for daily instruction tasks**](https://openreview.net/pdf?id=x4fm4T2tjM) [**arXiv 2023**]
* [**Aligning Agents like Large Language Models**](https://openreview.net/pdf?id=kQqZVayz07) [**arXiv 2023**]
* [**AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents**](https://openreview.net/pdf?id=M6XWoEdmwf) [**ICLR 2024 spotlight**]
* [**STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models**](https://openreview.net/pdf?id=LXiG2WqKXR) [**arXiv 2023**]
* [**Text2Reward: Dense Reward Generation with Language Models for Reinforcement Learning**](https://openreview.net/pdf?id=tUM39YTRxH) [**ICLR 2024 spotlight**]
* [**Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning**](https://openreview.net/pdf?id=1PPjf4wife) [**arXiv 2023**]
* [**Online Continual Learning for Interactive Instruction Following Agents**](https://openreview.net/pdf?id=7M0EzjugaN) [**ICLR 2024**]
* [**ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning**](https://openreview.net/pdf?id=LVp217SAtb) [**arXiv 2023**]
* [**Language Reward Modulation for Pretraining Reinforcement Learning**](https://openreview.net/pdf?id=SWRFC2EupO) [**arXiv 2023**]
* [**Informing Reinforcement Learning Agents by Grounding Natural Language to Markov Decision Processes**](https://openreview.net/pdf?id=P4op21eju0) [**arXiv 2023**]
* [**Learning to Model the World with Language**](https://openreview.net/pdf?id=eWLOoaShEH) [**arXiv 2023**]
* [**MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning**](https://openreview.net/pdf?id=1RE0H6mU7M) [**ICLR 2024**]
* [**Language Reward Modulation for Pretraining Reinforcement Learning**](https://arxiv.org/pdf/2308.12270.pdf) [**arXiv 2023**] [[**Github**](https://github.com/ademiadeniji/lamp)]
Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel
1UC Berkeley
* [**Guiding Pretraining in Reinforcement Learning with Large Language Models**](https://openreview.net/attachment?id=63704LH4v5&name=pdf) [**ICML 2023**]
Yuqing Du1*, Olivia Watkins1*, Zihan Wang2, Cédric Colas3,4, Trevor Darrell1, Pieter Abbeel1, Abhishek Gupta2, Jacob Andreas3
1Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA 2University of Washington, Seattle 3Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory 4Inria, Flowers Laboratory.

> ### Planning and Manipulation or Pretraining
* [**Voyager: An Open-Ended Embodied Agent with Large Language Models**](https://openreview.net/attachment?id=pAMNKGwja6&name=pdf) [**NeurIPS 2023 Workshop ALOE Spotlight**] [[**Project page**](https://voyager.minedojo.org/)] [[**Github**]](https://github.com/MineDojo/Voyager)
Guanzhi Wang1,2, Yuqi Xie3, Yunfan Jiang4, Ajay Mandlekar1, Chaowei Xiao1,5, Yuke Zhu1,3, Linxi Fan1, Anima Anandkumar1,2
1NVIDIA, 2Caltech, 3UT Austin, 4Stanford, 5UW Madison
* [**Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization**](https://arxiv.org/abs/2402.17574) [**ACL 2024**] [[**Github**](https://github.com/zwq2018/Agent-Pro)]
Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu
* [**Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives**](https://arxiv.org/abs/2401.02009) [**ACL 2024**]
Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming Lu
* [**MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control**](https://arxiv.org/pdf/2403.12037.pdf) [**arXiv 2024**] [[**Project Page**](https://sites.google.com/view/minedreamer/main)]
Enshen Zhou1,2 Yiran Qin1,3 Zhenfei Yin1,4 Yuzhou Huang3 Ruimao Zhang3 Lu Sheng2 Yu Qiao1 Jing Shao1
1Shanghai Artificial Intelligence Laboratory, 2The Chinese University of Hong Kong, Shenzhen, 3Beihang University, 4The University of Sydney
* [**MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**](https://arxiv.org/pdf/2312.07472.pdf) [**CVPR 2024**] [[**Project Page**](https://iranqin.github.io/MP5.github.io/)]
Yiran Qin1,2 Enshen Zhou1,3 Qichang Liu1,4 Zhenfei Yin1,5 Lu Sheng3 Ruimao Zhang2 Yu Qiao1 Jing Shao1
1Shanghai Artificial Intelligence Laboratory, 2The Chinese University of Hong Kong, Shenzhen, 3Beihang University, 4Tsinghua University, 5The University of Sydney
* [**RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation**](https://peihaochen.github.io/files/publications/RILA.pdf) [**CVPR 2024**]
Zeyuan Yang1, Jiageng Liu, Peihao Chen2, Anoop Cherian3, Tim Marks, Jonathan Le Roux4, Chuang Gan5
1Tsinghua University, 2South China University of Technology, 3Mitsubishi Electric Research Labs (MERL), 4Mitsubishi Electric Research Labs, 5MIT-IBM Watson AI Lab
* [**Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study**](https://arxiv.org/pdf/2403.03186.pdf) [**arXiv 2024**] [[**Project Page**](https://baai-agents.github.io/Cradle/)] [[**Code**](https://baai-agents.github.io/Cradle/)]
Weihao Tan2, Ziluo Ding1, Wentao Zhang2, Boyu Li1, Bohan Zhou3, Junpeng Yue3, Haochong Xia2, Jiechuan Jiang3, Longtao Zheng2, Xinrun Xu1, Yifei Bi1, Pengjie Gu2,
1Beijing Academy of Artificial Intelligence (BAAI), China; 2Nanyang Technological University, Singapore; 3School of Computer Science, Peking University, China
* [**See and Think: Embodied Agent in Virtual Environment**](https://arxiv.org/pdf/2311.15209.pdf) [**arXiv 2023**]
Zhonghan Zhao1*, Wenhao Chai2*, Xuan Wang1*, Li Boyi1, Shengyu Hao1, Shidong Cao1, Tian Ye3, Jenq-Neng Hwang2, Gaoang Wang1
1Zhejiang University 2University of Washington 3Hong Kong University of Science and Technology (GZ)
* [**Agent Instructs Large Language Models to be General Zero-Shot Reasoners**](https://arxiv.org/pdf/2310.03710.pdf) [**arXiv 2023**]
Nicholas Crispino1, Kyle Montgomery1, Fankun Zeng1, Dawn Song2, Chenguang Wang1
1Washington University in St. Louis, 2UC Berkeley
* [**JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models**](https://arxiv.org/abs/2311.05997) [**NeurIPS 2023**] [[**Project Page**](https://craftjarvis-jarvis1.github.io/)]
Zihao Wang1,2 Shaofei Cai1,2 Anji Liu3 Yonggang Jin4 Jinbing Hou4 Bowei Zhang5 Haowei Lin1,2 Zhaofeng He4 Zilong Zheng6 Yaodong Yang1 Xiaojian Ma6† Yitao Liang1†
1Institute for Artificial Intelligence, Peking University, 2School of Intelligence Science and Technology, Peking University, 3Computer Science Department, University of California, Los Angeles, 4Beijing University of Posts and Telecommunications, 5School of Electronics Engineering and Computer Science, Peking University, 6Beijing Institute for General Artificial Intelligence (BIGAI)
* [**Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents**](https://arxiv.org/abs/2302.01560) [**NeurIPS 2023**]
Zihao Wang1,2 Shaofei Cai1,2 Guanzhou Chen3 Anji Liu4 Xiaojian Ma4 Yitao Liang1,5†
1Institute for Artificial Intelligence, Peking University, 2School of Intelligence Science and Technology, Peking University, 3School of Computer Science, Beijing University of Posts and Telecommunications, 4Computer Science Department, University of California, Los Angeles, 5Beijing Institute for General Artificial Intelligence (BIGAI)
* [**CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society**](https://arxiv.org/pdf/2303.17760.pdf) [**NeurIPS 2023**] [[**Github**](https://github.com/camel-ai/camel)] [[**Project page**](https://www.camel-ai.org/)]
Guohao Li, Hasan Abed Al Kader Hammoud, Hani Itani, Dmitrii Khizbullin, Bernard Ghanem
1King Abdullah University of Science and Technology (KAUST)
* [**Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents**](https://arxiv.org/pdf/2201.07207.pdf) [**arXiv 2022**] [[**Github**](https://github.com/huangwl18/language-planner)] [[**Project page**](https://wenlong.page/language-planner/)]
Wenlong Huang1, Pieter Abbeel1, Deepak Pathak2, Igor Mordatch3
1UC Berkeley, 2Carnegie Mellon University, 3Google
* [**FILM: Following Instructions in Language with Modular Methods**](https://openreview.net/pdf?id=qI4542Y2s1D) [**ICLR 2022**] [[**Github**](https://github.com/soyeonm/FILM)] [[**Project page**](https://gary3410.github.io/TaPA/)]
So Yeon Min1, Devendra Singh Chaplot2, Pradeep Ravikumar1, Yonatan Bisk1, Ruslan Salakhutdinov1
1Carnegie Mellon University 2Facebook AI Research
* [**Embodied Task Planning with Large Language Models**](https://arxiv.org/pdf/2307.01848.pdf) [**arXiv 2023**] [[**Github**](https://github.com/Gary3410/TaPA)] [[**Project page**](https://gary3410.github.io/TaPA/)] [[**Demo**](https://huggingface.co/spaces/xuxw98/TAPA)] [[**Huggingface Model**](https://huggingface.co/Gary3410/pretrain_lit_llama)]
Zhenyu Wu1, Ziwei Wang2,3, Xiuwei Xu2,3, Jiwen Lu2,3, Haibin Yan1*
1School of Automation, Beijing University of Posts and Telecommunications,
2Department of Automation, Tsinghua University,
3Beijing National Research Center for Information Science and Technology
* [**SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning**](https://arxiv.org/pdf/2305.15486.pdf) [**arXiv 2023**]
Yue Wu1,4*, Shrimai Prabhumoye2, So Yeon Min1, Yonatan Bisk1, Ruslan Salakhutdinov1, Amos Azaria3, Tom Mitchell1, Yuanzhi Li1,4
1Carnegie Mellon University, 2NVIDIA, 3Ariel University, 4Microsoft Research
* [**PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning**](https://openaccess.thecvf.com/content/CVPR2022/papers/Ramakrishnan_PONI_Potential_Functions_for_ObjectGoal_Navigation_With_Interaction-Free_Learning_CVPR_2022_paper.pdf) [**CVPR 2022 (Oral)**] [[**Project page**](https://vision.cs.utexas.edu/projects/poni/)] [[**Github**](https://github.com/srama2512/PONI)]
Santhosh Kumar Ramakrishnan1,2, Devendra Singh Chaplot1, Ziad Al-Halah2
Jitendra Malik1,3, Kristen Grauman1,2
1Facebook AI Research, 2UT Austin, 3UC Berkeley
* [**Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics**](https://openreview.net/pdf?id=vmjctNUSWI) [**ICLR 2023**] [[**Project page**](https://prior.allenai.org/projects/action-adaptive-policy)] [[**Github**](https://github.com/KuoHaoZeng/AAP)]
Kuo-Hao Zeng1, Luca Weihs2, Roozbeh Mottaghi1, Ali Farhadi1
1Paul G. Allen School of Computer Science & Engineering, University of Washington,
2PRIOR @ Allen Institute for AI
* [**Modeling Dynamic Environments with Scene Graph Memory**](https://openreview.net/attachment?id=NiUxS1cAI4&name=pdf) [**ICML 2023**]
Andrey Kurenkov1, Michael Lingelbach1, Tanmay Agarwal1, Emily Jin1, Chengshu Li1, Ruohan Zhang1, Li Fei-Fei1, Jiajun Wu1, Silvio Savarese2, Roberto Martín-Martín3
1Department of Computer Science, Stanford University
2Salesforce AI Research 3Department of Computer Science, University of Texas at Austin.
* [**Reasoning with Language Model is Planning with World Model**](https://arxiv.org/pdf/2305.14992.pdf) [**arXiv 2023**]
Shibo Hao∗♣, Yi Gu∗♣, Haodi Ma♢, Joshua Jiahua Hong♣, Zhen Wang♣ ♠,
Daisy Zhe Wang♢, Zhiting Hu♣
♣UC San Diego, ♢University of Florida,
♠Mohamed bin Zayed University of Artificial Intelligence
* [**Do As I Can, Not As I Say: Grounding Language in Robotic Affordances**](https://arxiv.org/pdf/2204.01691.pdf) [**arXiv 2022**]
Robotics at Google, Everyday Robots
* [**Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling**](https://openreview.net/attachment?id=Rm5Qi57C5I&name=pdf) [**ICML 2023**]
Kolby Nottingham1 Prithviraj Ammanabrolu2 Alane Suhr2
Yejin Choi3,2 Hannaneh Hajishirzi3,2 Sameer Singh1,2 Roy Fox1
1Department of Computer Science, University of California, Irvine, 2Allen Institute for Artificial Intelligence, 3Paul G. Allen School of Computer Science
* [**Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents**](https://arxiv.org/pdf/2308.07241v2.pdf) [**ICCV 2023**]
Byeonghwi Kim Jinyeon Kim Yuyeong Kim1,* Cheolhong Min Jonghyun Choi†
Yonsei University 1Gwangju Institute of Science and Technology
* [**Inner Monologue: Embodied Reasoning through Planning with Language Models**](https://openreview.net/pdf?id=3R3Pz5i0tye) [**CoRL 2022**] [[**Project page**](https://innermonologue.github.io/)]
Robotics at Google
* [**Language Models Meet World Models: Embodied Experiences Enhance Language Models**](https://arxiv.org/pdf/2305.10626.pdf) [**arXiv 2023**] [![](https://img.shields.io/github/stars/szxiangjn/world-model-for-language-model?style=social&label=Code+Stars)](https://github.com/szxiangjn/world-model-for-language-model) [[**Twitter**](https://twitter.com/szxiangjn/status/1659399771126370304)]
Jiannan Xiang∗♠, Tianhua Tao∗♠, Yi Gu♠, Tianmin Shu♢,
Zirui Wang♠, Zichao Yang♡, Zhiting Hu♠
♠UC San Diego, ♣UIUC, ♢MIT, ♡Carnegie Mellon University
* [**AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation**](https://arxiv.org/pdf/2305.18898.pdf) [**arXiv 2023**] [[**Video**](https://www.youtube.com/watch?v=ayAzID1_qQk)]
Chuhao Jin1*, Wenhui Tan1*, Jiange Yang2*, Bei Liu3†, Ruihua Song1, Limin Wang2, Jianlong Fu3†
1Renmin University of China, 2Nanjing University, 3Microsoft Research
* [**A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution**](https://openreview.net/pdf?id=NeGDZeyjcKa) [**CoRL 2021**] [![](https://img.shields.io/github/stars/valtsblukis/hlsm?style=social&label=Code+Stars)](https://github.com/valtsblukis/hlsm) [[**Project page**](https://hlsm-alfred.github.io/)] [[**Poster**](https://openreview.net/attachment?id=NeGDZeyjcKa&name=poster)]
Valts Blukis1,2, Chris Paxton1, Dieter Fox1,3, Animesh Garg1,4, Yoav Artzi2
1NVIDIA 2Cornell University 3University of Washington 4University of Toronto, Vector Institute
* [**LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models**](https://arxiv.org/pdf/2212.04088.pdf) [**ICCV 2023**] [[**Project page**](https://dki-lab.github.io/LLM-Planner/)] [[**Github**](https://github.com/OSU-NLP-Group/LLM-Planner)]
Chan Hee Song1, Jiaman Wu1, Clayton Washington1, Brian M. Sadler2, Wei-Lun Chao1, Yu Su1
1The Ohio State University, 2DEVCOM ARL
* [**Code as Policies: Language Model Programs for Embodied Control**](https://arxiv.org/pdf/2209.07753) [**arXiv 2023**] [[**Project page**](https://code-as-policies.github.io/)] [[**Github**](https://code-as-policies.github.io)] [[**Blog**](https://ai.googleblog.com/2022/11/robots-that-write-their-own-code.html)] [[**Colab**](https://colab.research.google.com/drive/124TE4TsGYyrvduzeDclufyvwcc2qbbrE)]
Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, Andy Zeng
Robotics at Google
* [**3D-LLM: Injecting the 3D World into Large Language Models**](https://arxiv.org/abs/2307.12981) [**arXiv 2023**] [![](https://img.shields.io/github/stars/UMass-Foundation-Model/3D-LLM?style=social&label=Code+Stars)](https://github.com/UMass-Foundation-Model/3D-LLM)
1Yining Hong, 2Haoyu Zhen, 3Peihao Chen, 4Shuhong Zheng, 5Yilun Du, 6Zhenfang Chen, 6,7Chuang Gan
1UCLA 2SJTU 3SCUT 4UIUC 5MIT 6MIT-IBM Watson AI Lab 7UMass Amherst
* [**VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models**](https://arxiv.org/abs/2307.05973) [**arXiv 2023**] [[**Project page**](https://voxposer.github.io/)] [[**Online Demo**](https://www.youtube.com/watch?v=Yvn4eR05A3M)]
Wenlong Huang1, Chen Wang1, Ruohan Zhang1, Yunzhu Li1,2, Jiajun Wu1, Li Fei-Fei1
1Stanford University 2University of Illinois Urbana-Champaign
* [**PaLM-E: An Embodied Multimodal Language Model**](https://arxiv.org/pdf/2303.03378.pdf) [**ICML 2023**] [[**Project page**](https://palm-e.github.io)]
1Robotics at Google 2TU Berlin 3Google Research
* [**Large Language Models as Commonsense Knowledge for Large-Scale Task Planning**](https://arxiv.org/pdf/2305.14078.pdf) [**arXiv 2023**]
Zirui Zhao Wee Sun Lee David Hsu
School of Computing, National University of Singapore
* [**An Embodied Generalist Agent in 3D World**](https://arxiv.org/abs/2311.12871) [**ICML 2024**]
Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang
Beijing Institute for General Artificial Intelligence (BIGAI)

> ### Multi-Agent Learning and Coordination
* [**Building Cooperative Embodied Agents Modularly with Large Language Models**](https://openreview.net/forum?id=EnXJfQqy0K) [**ICLR 2024**] [[**Project page**](https://vis-www.cs.umass.edu/Co-LLM-Agents/)] [[**Github**](https://github.com/UMass-Foundation-Model/Co-LLM-Agents/)]
Hongxin Zhang1*, Weihua Du2*, Jiaming Shan3, Qinhong Zhou1, Yilun Du4, Joshua B. Tenenbaum4, Tianmin Shu4, Chuang Gan1,5
1University of Massachusetts Amherst, 2Tsinghua University, 3Shanghai Jiao Tong University, 4MIT, 5MIT-IBM Watson AI Lab
* [**War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars**](https://arxiv.org/pdf/2311.17227.pdf) [**arXiv 2023**]
Wenyue Hua1*, Lizhou Fan2*, Lingyao Li2, Kai Mei1, Jianchao Ji1, Yingqiang Ge1, Libby Hemphill2, Yongfeng Zhang1
1Rutgers University, 2University of Michigan
* [**MindAgent: Emergent Gaming Interaction**](https://arxiv.org/abs/2309.09971) [**arXiv 2023**]
Ran Gong*1† Qiuyuan Huang*2‡ Xiaojian Ma*1 Hoi Vo3 Zane Durante†4 Yusuke Noda3 Zilong Zheng5 Song-Chun Zhu1,5,6,7,8 Demetri Terzopoulos1 Li Fei-Fei4 Jianfeng Gao2
1UCLA; 2Microsoft Research, Redmond; 3Xbox Team, Microsoft; 4Stanford; 5BIGAI; 6PKU; 7THU; 8UCLA
* [**Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum**](https://openreview.net/attachment?id=BMO1vLKq7D&name=pdf) [**ICML 2023**]
Jigang Kim*1,2 Daesol Cho*1,2 H. Jin Kim1,3
1Seoul National University, 2Artificial Intelligence Institute of Seoul National University (AIIS), 3Automation and Systems Research Institute (ASRI).
***Note: This paper mainly focuses on reinforcement learning for Embodied AI.***
* [**Adaptive Coordination in Social Embodied Rearrangement**](https://openreview.net/attachment?id=BYEsw113sz&name=pdf) [**ICML 2023**]
Andrew Szot1,2 Unnat Jain1 Dhruv Batra1,2 Zsolt Kira2 Ruta Desai1 Akshara Rai1
1Meta AI 2Georgia Institute of Technology.

> ### Vision and Language Navigation
* [**IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience**](http://arxiv.org/abs/2305.01098) [**arXiv 2023**]
Joanne Truong1,2, April Zitkovich1, Sonia Chernova2, Dhruv Batra2,3, Tingnan Zhang1, Jie Tan1, Wenhao Yu1
1Robotics at Google 2Georgia Institute of Technology 3Meta AI
* [**ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation**](https://openreview.net/attachment?id=GydFM0ZEXY&name=pdf) [**ICML 2023**]
Kaiwen Zhou1, Kaizhi Zheng1, Connor Pryor1, Yilin Shen2, Hongxia Jin2, Lise Getoor1, Xin Eric Wang1
1University of California, Santa Cruz 2Samsung Research America.
* [**NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models**](https://arxiv.org/pdf/2305.16986.pdf) [**arXiv 2023**]
Gengze Zhou1 Yicong Hong2 Qi Wu1
1The University of Adelaide 2The Australian National University
* [**Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model**](https://arxiv.org/pdf/2305.11176.pdf) [**arXiv 2023**] [[**Github**](https://github.com/OpenGVLab/Instruct2Act)]
Siyuan Huang1,2 Zhengkai Jiang4 Hao Dong3 Yu Qiao2 Peng Gao2 Hongsheng Li5
1Shanghai Jiao Tong University, 2Shanghai AI Laboratory, 3CFCS, School of CS, PKU,
4University of Chinese Academy of Sciences, 5The Chinese University of Hong Kong

> ### Detection
* [**DetGPT: Detect What You Need via Reasoning**](https://arxiv.org/pdf/2305.14167.pdf) [**arXiv 2023**]
Renjie Pi1∗ Jiahui Gao2* Shizhe Diao1∗ Rui Pan1 Hanze Dong1 Jipeng Zhang1 Lewei Yao1 Jianhua Han3 Hang Xu2
Lingpeng Kong2 Tong Zhang1
1The Hong Kong University of Science and Technology 2The University of Hong Kong 3Shanghai Jiao Tong University

> ### 3D Grounding
* [**LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent**](https://arxiv.org/pdf/2309.12311.pdf) [**arXiv 2023**]
Jianing Yang1,*, Xuweiyi Chen1,*, Shengyi Qian1, Nikhil Madaan, Madhavan Iyengar1, David F. Fouhey1,2, Joyce Chai1
1University of Michigan, 2New York University
* [**3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment**](https://arxiv.org/abs/2308.04352) [**ICCV 2023**]
Ziyu Zhu, Xiaojian Ma, Yixin Chen, Zhidong Deng, Siyuan Huang, Qing Li
Beijing Institute for General Artificial Intelligence (BIGAI)

> ### Interactive Embodied Learning
* [**Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning**](https://openreview.net/attachment?id=feXm8GbxWU&name=pdf) [**ICML 2023**]
Thomas Carta1*, Clément Romac1,2, Thomas Wolf2, Sylvain Lamprier3, Olivier Sigaud4, Pierre-Yves Oudeyer1
1Inria (Flowers), University of Bordeaux, 2Hugging Face, 3Univ Angers, LERIA, SFR MATHSTIC, F-49000, 4Sorbonne University, ISIR
* [**Learning Affordance Landscapes for Interaction Exploration in 3D Environments**](https://arxiv.org/pdf/2008.09241.pdf) [**NeurIPS 2020**] [![](https://img.shields.io/github/stars/facebookresearch/interaction-exploration?style=social&label=Code+Stars)](https://github.com/facebookresearch/interaction-exploration) [[Project page](https://vision.cs.utexas.edu/projects/interaction-exploration/)]
Tushar Nagarajan, Kristen Grauman
UT Austin and Facebook AI Research
* [**Embodied Question Answering in Photorealistic Environments with Point Cloud Perception**](https://arxiv.org/abs/1904.03461) [**CVPR 2019 (oral)**] [[**Slides**](https://embodiedqa.org/slides/eqa_matterport.slides.pdf)]
Erik Wijmans1†, Samyak Datta1, Oleksandr Maksymets2†, Abhishek Das1, Georgia Gkioxari2, Stefan Lee1, Irfan Essa1, Devi Parikh1,2, Dhruv Batra1,2
1Georgia Institute of Technology, 2Facebook AI Research
* [**Multi-Target Embodied Question Answering**](https://openaccess.thecvf.com/content_CVPR_2019/papers/Yu_Multi-Target_Embodied_Question_Answering_CVPR_2019_paper.pdf) [**CVPR 2019**]
Licheng Yu1, Xinlei Chen3, Georgia Gkioxari3, Mohit Bansal1, Tamara L. Berg1,3, Dhruv Batra2,3
1University of North Carolina at Chapel Hill 2Georgia Tech 3Facebook AI
* [**Neural Modular Control for Embodied Question Answering**](https://arxiv.org/abs/1810.11181) [**CoRL 2018 (Spotlight)**] [[**Project page**](https://embodiedqa.org/)] [[**Github**](https://github.com/facebookresearch/EmbodiedQA)]
Abhishek Das1, Georgia Gkioxari2, Stefan Lee1, Devi Parikh1,2, Dhruv Batra1,2
1Georgia Institute of Technology 2Facebook AI Research
* [**Embodied Question Answering**](https://embodiedqa.org/paper.pdf) [**CVPR 2018 (oral)**] [[**Project page**](https://embodiedqa.org/)] [[**Github**](https://github.com/facebookresearch/EmbodiedQA)]
Abhishek Das1, Samyak Datta1, Georgia Gkioxari2, Stefan Lee1, Devi Parikh2,1, Dhruv Batra2
1Georgia Institute of Technology, 2Facebook AI Research

> ### Rearrangement
* [**A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search**](https://openreview.net/pdf?id=fGG6vHp3W9W) [**ICLR 2023**]
1Brandon Trabucco, 2Gunnar A Sigurdsson, 2Robinson Piramuthu, 2,3Gaurav S. Sukhatme, 1Ruslan Salakhutdinov
1CMU, 2Amazon Alexa AI, 3University of Southern California

> ### Benchmark
* [**SmartPlay: A Benchmark for LLMs as Intelligent Agents**](https://openreview.net/pdf?id=0IOX0YcCdTn) [**ICLR 2024**] [[**Github**](https://github.com/microsoft/SmartPlay)]
Yue Wu1,2, Xuan Tang1, Tom Mitchell1, Yuanzhi Li1,2
1Carnegie Mellon University, 2Microsoft Research
* [**RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation**](https://arxiv.org/pdf/2311.01455.pdf) [**arXiv 2023**] [[**Project page**](https://robogen-ai.github.io/)] [[**Github**](https://github.com/Genesis-Embodied-AI/RoboGen)]
Yufei Wang1, Zhou Xian1, Feng Chen2, Tsun-Hsuan Wang3, Yian Wang4, Katerina Fragkiadaki1, Zackory Erickson1, David Held1, Chuang Gan4,5
1CMU, 2Tsinghua IIIS, 3MIT CSAIL, 4UMass Amherst, 5MIT-IBM AI Lab
* [**ALFWorld: Aligning Text and Embodied Environments for Interactive Learning**](https://openreview.net/pdf?id=0IOX0YcCdTn) [**ICLR 2021**] [[**Project page**](https://alfworld.github.io/)] [[**Github**](https://github.com/alfworld/alfworld)]
Mohit Shridhar† Xingdi Yuan♡ Marc-Alexandre Côté♡
Yonatan Bisk‡ Adam Trischler♡ Matthew Hausknecht♣
†University of Washington ♡Microsoft Research, Montréal
‡Carnegie Mellon University ♣Microsoft Research
* [**ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks**](https://arxiv.org/pdf/1912.01734.pdf) [**CVPR 2020**] [[**Project page**](https://askforalfred.com/)] [[**Github**](https://github.com/askforalfred/alfred)]
Mohit Shridhar1 Jesse Thomason1 Daniel Gordon1 Yonatan Bisk1,2,3 Winson Han3 Roozbeh Mottaghi1,3 Luke Zettlemoyer1 Dieter Fox1,4
1Paul G. Allen School of Computer Sci. & Eng., Univ. of Washington, 2Language Technologies Institute @ Carnegie Mellon University, 3Allen Institute for AI, 4NVIDIA
* [**VIMA: Robot Manipulation with Multimodal Prompts**](https://vimalabs.github.io/assets/vima_paper.pdf) [**ICML 2023**] [[**Project page**](https://vimalabs.github.io/)] [[**Github**](https://github.com/vimalabs/VIMA)] [[**VIMA-Bench**](https://github.com/vimalabs/VimaBench)]
Yunfan Jiang1 Agrim Gupta1† Zichen Zhang2† Guanzhi Wang3,4† Yongqiang Dou5 Yanjun Chen1
Li Fei-Fei1 Anima Anandkumar3,4 Yuke Zhu3,6‡ Linxi Fan3‡
* [**SQA3D: Situated Question Answering in 3D Scenes**](https://arxiv.org/pdf/2210.07474.pdf) [**ICLR 2023**] [[**Project page**](https://sqa3d.github.io/)] [[**Slides**](http://web.cs.ucla.edu/~xm/file/sqa3d_iclr23_slides.pdf)] [[**Github**](https://github.com/SilongYong/SQA3D)]
Xiaojian Ma2 Silong Yong1,3* Zilong Zheng1 Qing Li1 Yitao Liang1,4 Song-Chun Zhu1,2,3,4 Siyuan Huang1
1Beijing Institute for General Artificial Intelligence (BIGAI) 2UCLA 3Tsinghua University 4Peking University
* [**IQA: Visual Question Answering in Interactive Environments**](https://openaccess.thecvf.com/content_cvpr_2018/papers/Gordon_IQA_Visual_Question_CVPR_2018_paper.pdf) [**CVPR 2018**] [[**Github**](https://github.com/danielgordon10/thor-iqa-cvpr-2018)] [[**Demo video (YouTube)**](https://www.youtube.com/watch?v=pXd3C-1jr98&feature=youtu.be)]
Daniel Gordon1 Aniruddha Kembhavi2 Mohammad Rastegari2,4 Joseph Redmon1 Dieter Fox1,3 Ali Farhadi1,2
1Paul G. Allen School of Computer Science, University of Washington 2Allen Institute for Artificial Intelligence 3Nvidia 4Xnor.ai
* [**Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments**](https://openaccess.thecvf.com/content/ICCV2021/papers/Gao_Env-QA_A_Video_Question_Answering_Benchmark_for_Comprehensive_Understanding_of_ICCV_2021_paper.pdf) [**ICCV 2021**] [[**Project page**](https://envqa.github.io/#Overview)] [[**Github**](https://github.com/maybelu9/env-qa)]
Difei Gao1,2, Ruiping Wang1,2,3, Ziyi Bai1,2, Xilin Chen1,2
1Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, 2University of Chinese Academy of Sciences, 3Beijing Academy of Artificial Intelligence

> ### Simulator
* [**LEGENT: Open Platform for Embodied Agents**](https://arxiv.org/pdf/2404.18243) [**ACL 2024**] [[**Project page**](https://docs.legent.ai/)] [[**Github**](https://github.com/thunlp/LEGENT)]
Tsinghua University
* [**AI2-THOR: An Interactive 3D Environment for Visual AI**](https://arxiv.org/abs/1712.05474) [**arXiv 2022**] [[**Project page**](http://ai2thor.allenai.org/)] [[**Github**](https://github.com/allenai/ai2thor)]
Allen Institute for AI, University of Washington, Stanford University, Carnegie Mellon University
* [**iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes**](https://ieeexplore.ieee.org/document/9636667) [**IROS 2021**] [[**Project page**](https://svl.stanford.edu/igibson/)] [[**Github**](https://github.com/StanfordVL/iGibson/releases/tag/1.0.0)]
Bokui Shen*, Fei Xia* et al.
* [**Habitat: A Platform for Embodied AI Research**](https://openaccess.thecvf.com/content_ICCV_2019/papers/Savva_Habitat_A_Platform_for_Embodied_AI_Research_ICCV_2019_paper.pdf) [**ICCV 2019**] [[**Project page**](https://aihabitat.org/)] [[**Habitat-Sim**](https://github.com/facebookresearch/habitat-sim)] [[**Habitat-Lab**](https://github.com/facebookresearch/habitat-lab)] [[**Habitat Challenge**](https://github.com/facebookresearch/habitat-challenge)]
Facebook AI Research, Facebook Reality Labs, Georgia Institute of Technology, Simon Fraser University, Intel Labs, UC Berkeley
* [**Habitat 2.0: Training Home Assistants to Rearrange their Habitat**](https://arxiv.org/abs/2106.14405) [**NeurIPS 2021**] [[**Project page**](https://research.facebook.com/publications/habitat-2-0-training-home-assistants-to-rearrange-their-habitat/)]
Facebook AI Research, Georgia Tech, Intel Research, Simon Fraser University, UC Berkeley

> ### Others
* [**Least-to-Most Prompting Enables Complex Reasoning in Large Language Models**](https://arxiv.org/pdf/2205.10625) [**ICLR 2023**]
Google Research, Brain Team
* [**ReAct: Synergizing Reasoning and Acting in Language Models**](https://arxiv.org/pdf/2210.03629.pdf) [**ICLR 2023**] [![](https://img.shields.io/github/stars/ysymyth/ReAct?style=social&label=Code+Stars)](https://github.com/ysymyth/ReAct)
Shunyu Yao1∗, Jeffrey Zhao2, Dian Yu2, Nan Du2, Izhak Shafran2, Karthik Narasimhan1, Yuan Cao2
1Department of Computer Science, Princeton University, 2Google Research, Brain Team
* [**Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models**](https://arxiv.org/pdf/2308.10379.pdf) [**arXiv 2023**]
Virginia Tech, Microsoft
* [**Graph of Thoughts: Solving Elaborate Problems with Large Language Models**](https://arxiv.org/abs/2308.09687) [**arXiv 2023**]
ETH Zurich, Cledar, Warsaw University of Technology
* [**Tree of Thoughts: Deliberate Problem Solving with Large Language Models**](https://arxiv.org/pdf/2305.10601.pdf) [**arXiv 2023**]
Shunyu Yao1, Dian Yu2, Jeffrey Zhao2, Izhak Shafran2, Thomas L. Griffiths1, Yuan Cao2, Karthik Narasimhan1
1Princeton University, 2Google DeepMind
* [**Chain-of-Thought Prompting Elicits Reasoning in Large Language Models**](https://arxiv.org/pdf/2201.11903.pdf) [**NeurIPS 2022**]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma,
Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, Denny Zhou
Google Research, Brain Team
* [**MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge**](https://proceedings.neurips.cc/paper_files/paper/2022/file/74a67268c5cc5910f64938cac4526a90-Paper-Datasets_and_Benchmarks.pdf) [**NeurIPS 2022**] [[Github](https://github.com/MineDojo/MineDojo)] [![](https://img.shields.io/github/stars/MineDojo/MineDojo?style=social&label=Code+Stars)](https://github.com/MineDojo/MineDojo) [[Project page](https://minedojo.org/)] [[Knowledge Base](https://minedojo.org/knowledge_base.html)]
Linxi Fan1, Guanzhi Wang2∗, Yunfan Jiang3*, Ajay Mandlekar1, Yuncong Yang4, Haoyi Zhu5, Andrew Tang4, De-An Huang1, Yuke Zhu1,6†, Anima Anandkumar1,2†
1NVIDIA, 2Caltech, 3Stanford, 4Columbia, 5SJTU, 6UT Austin
* [**Distilling Internet-Scale Vision-Language Models into Embodied Agents**](https://openreview.net/pdf?id=6vVkGnEpP7) [**ICML 2023**]
Theodore Sumers1∗ Kenneth Marino2 Arun Ahuja2 Rob Fergus2 Ishita Dasgupta2
* [**LISA: Reasoning Segmentation via Large Language Model**](https://arxiv.org/pdf/2308.00692.pdf) [**arXiv 2023**] [[**Github**](https://github.com/dvlab-research/LISA)] [[**Huggingface Models**](https://huggingface.co/xinlai)] [[**Dataset**](https://drive.google.com/drive/folders/125mewyg5Ao6tZ3ZdJ-1-E3n04LGVELqy?usp=sharing)] [[**Online Demo**](http://103.170.5.190:7860/)]
Xin Lai1 Zhuotao Tian2 Yukang Chen1 Yanwei Li1 Yuhui Yuan3 Shu Liu2 Jiaya Jia1,2
1The Chinese University of Hong Kong 2SmartMore 3MSRA

> ### Acknowledgements
[1] Trend pic from this [repo](https://github.com/Paitesanshi/LLM-Agent-Survey/tree/main).
[2] Figure from this paper: [The Rise and Potential of Large Language Model Based Agents: A Survey](https://arxiv.org/pdf/2309.07864.pdf).