# Awesome-Embodied-Robotics-and-Agent

This is a curated list of research on embodied AI and robotics with Large Language Models. Watch this repository for the latest updates! 🔥
https://github.com/zchoi/Awesome-Embodied-Robotics-and-Agent
## Development of Embodied Robotics and Benchmarks

### Methods
- **FAST: Efficient Action Tokenization for Vision-Language-Action Models**
- **Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models**
- **Symbolic Learning Enables Self-Evolving Agents**
- **Embodied-agents**
- **Machinascript-for-robots**
- **DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model**
- **MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework**
- **AppAgent: Multimodal Agents as Smartphone Users** [[**Github**](https://github.com/mnotgod96/AppAgent)]
- **Agent AI: Surveying the Horizons of Multimodal Interaction**
- **Large Multimodal Agents: A Survey**
- **Igniting Language Intelligence: The Hitchhiker’s Guide From Chain-of-Thought Reasoning to Language Agents**
- **The Rise and Potential of Large Language Model Based Agents: A Survey**
- **Robust agents learn causal world models**
- **A Survey on Vision-Language-Action Models for Embodied AI**
- **Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld**
- **Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning** [[**Github**](https://github.com/GuanSuns/LLMs-World-Models-for-Planning)]
- **A Survey on Self-Evolution of Large Language Models**
- **A Survey on LLM-based Autonomous Agents**
- **AGENTGYM: Evolving Large Language Model-based Agents across Diverse Environments**
- **Learning Interactive Real-World Simulators**
- **Eureka: Human-Level Reward Design via Coding Large Language Models** [[**Github**](https://github.com/eureka-research/Eureka)]
- **RLAdapter: Bridging Large Language Models to Reinforcement Learning in Open Worlds**
- **Can Language Agents Be Alternatives to PPO? A Preliminary Empirical Study on OpenAI Gym**
- **RoboGPT: An intelligent agent of making embodied long-term decisions for daily instruction tasks**
- **Aligning Agents like Large Language Models**
- **AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents**
- **Learning to Model the World with Language**
- **MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning**
- **KALM: Knowledgeable Agents by Offline Reinforcement Learning from Large Language Model Rollouts**
- **Language Reward Modulation for Pretraining Reinforcement Learning**
- **Informing Reinforcement Learning Agents by Grounding Natural Language to Markov Decision Processes**
- **STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models**
- **Text2Reward: Dense Reward Generation with Language Models for Reinforcement Learning**
- **Leveraging Large Language Models for Optimised Coordination in Textual Multi-Agent Reinforcement Learning**
- **IndoorSim-to-OutdoorReal: Learning to Navigate Outdoors without any Outdoor Experience**
- **Adaptive Coordination in Social Embodied Rearrangement**
- **ESC: Exploration with Soft Commonsense Constraints for Zero-shot Object Navigation**
- **Online Continual Learning for Interactive Instruction Following Agents**
- **ADAPTER-RL: Adaptation of Any Agent using Reinforcement Learning**
- **Voyager: An Open-Ended Embodied Agent with Large Language Models**
- **Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives**
- **RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation**
- **Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study** [[**Project page**](https://baai-agents.github.io/Cradle/)]
- **See and Think: Embodied Agent in Virtual Environment**
- **Guiding Pretraining in Reinforcement Learning with Large Language Models**
- **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization**
- **MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control**
- **MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception**
- **Agent Instructs Large Language Models to be General Zero-Shot Reasoners**
- **JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models**
- **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents**
- **CAMEL: Communicative Agents for “Mind” Exploration of Large Scale Language Model Society** [[**Project page**](https://www.camel-ai.org/)]
- **Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents** [[**Project page**](https://wenlong.page/language-planner/)]
- **FILM: Following Instructions in Language with Modular Methods**
- **Embodied Task Planning with Large Language Models**
- **SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning**
- **Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics** [[**Github**](https://github.com/KuoHaoZeng/AAP)]
- **Reasoning with Language Model is Planning with World Model**
- **Do As I Can, Not As I Say: Grounding Language in Robotic Affordances**
- **Modeling Dynamic Environments with Scene Graph Memory**
- **AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation**
- **Inner Monologue: Embodied Reasoning through Planning with Language Models**
- **Language Models Meet World Models: Embodied Experiences Enhance Language Models** [[**Github**](https://github.com/szxiangjn/world-model-for-language-model)] [[**Twitter**](https://twitter.com/szxiangjn/status/1659399771126370304)]
- **A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution** [[**Poster**](https://openreview.net/attachment?id=NeGDZeyjcKa&name=poster)]
- **Code as Policies: Language Model Programs for Embodied Control** [[**Project page**](https://code-as-policies.github.io/)] [[**Blog**](https://ai.googleblog.com/2022/11/robots-that-write-their-own-code.html)] [[**Colab**](https://colab.research.google.com/drive/124TE4TsGYyrvduzeDclufyvwcc2qbbrE)]
- **Context-Aware Planning and Environment-Aware Memory for Instruction Following Embodied Agents**
- **Do Embodied Agents Dream of Pixelated Sheep?: Embodied Decision Making using Language Guided World Modelling**
- **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models** [[**Github**](https://github.com/OSU-NLP-Group/LLM-Planner)]
- **Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model**
- **VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models**
- **PaLM-E: An Embodied Multimodal Language Model**
- **Large Language Models as Commonsense Knowledge for Large-Scale Task Planning**
- **3D-LLM: Injecting the 3D World into Large Language Models** [[**Github**](https://github.com/UMass-Foundation-Model/3D-LLM)]
- **An Embodied Generalist Agent in 3D World**
- **Building Cooperative Embodied Agents Modularly with Large Language Models** [[**Github**](https://github.com/UMass-Foundation-Model/Co-LLM-Agents/)]
- **MindAgent: Emergent Gaming Interaction**
- **Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum**
- **War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars**
- **DetGPT: Detect What You Need via Reasoning**
- **LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent**
- **3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment**
- **Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning**
- **Learning Affordance Landscapes for Interaction Exploration in 3D Environments** [[**Github**](https://github.com/facebookresearch/interaction-exploration)] [[**Project page**](https://vision.cs.utexas.edu/projects/interaction-exploration/)]
- **Embodied Question Answering in Photorealistic Environments with Point Cloud Perception**
- **Embodied Question Answering**
- **A Simple Approach for Visual Room Rearrangement: 3D Mapping and Semantic Search**
- **Multi-Target Embodied Question Answering**
- **Neural Modular Control for Embodied Question Answering**
- **ALFWorld: Aligning Text and Embodied Environments for Interactive Learning**
- **RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation** [[**Github**](https://github.com/Genesis-Embodied-AI/RoboGen)]
- **ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks**
- **SQA3D: Situated Question Answering in 3D Scenes**
- **VIMA: Robot Manipulation with Multimodal Prompts** [[**VIMA-Bench**](https://github.com/vimalabs/VimaBench)]
- **IQA: Visual Question Answering in Interactive Environments** [[**Demo video (YouTube)**](https://www.youtube.com/watch?v=pXd3C-1jr98&feature=youtu.be)]
- **Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments**
- **LEGENT: Open Platform for Embodied Agents**
- **AI2-THOR: An Interactive 3D Environment for Visual AI**
- **Tree of Thoughts: Deliberate Problem Solving with Large Language Models**
- **Habitat 2.0: Training Home Assistants to Rearrange their Habitat**
- **iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes**
- **Habitat: A Platform for Embodied AI Research** [[**Habitat-Sim**](https://github.com/facebookresearch/habitat-sim)] [[**Habitat-Lab**](https://github.com/facebookresearch/habitat-lab)] [[**Habitat Challenge**](https://github.com/facebookresearch/habitat-challenge)]
- **Least-to-Most Prompting Enables Complex Reasoning in Large Language Models**
- **ReAct: Synergizing Reasoning and Acting in Language Models**
- **Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models**
- **Chain-of-Thought Prompting Elicits Reasoning in Large Language Models**
- **MINEDOJO: Building Open-Ended Embodied Agents with Internet-Scale Knowledge**
- **Distilling Internet-Scale Vision-Language Models into Embodied Agents**
- **LISA: Reasoning Segmentation via Large Language Model** [[**Huggingface Models**](https://huggingface.co/xinlai)] [[**Dataset**](https://drive.google.com/drive/folders/125mewyg5Ao6tZ3ZdJ-1-E3n04LGVELqy?usp=sharing)] [[**Online Demo**](http://103.170.5.190:7860/)]
- **Mobile-Agent: The Powerful Mobile Device Operation Assistant Family**
- **RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control**
- **π0: A Vision-Language-Action Flow Model for General Robot Control**
- **Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models**
- **Meta-Control: Automatic Model-based Control System Synthesis for Heterogeneous Robot Skills**
- **OpenVLA: An Open-Source Vision-Language-Action Model**
- **Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection**
- **π0.5: a VLA with Open-World Generalization**
- **Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks** [[**HuggingFace🤗**](https://huggingface.co/datasets/zwq2018/embodied_reasoner)]