https://github.com/AGI-Edgerunners/LLM-Agents-Papers


# LLM-Agents-Papers
## :writing_hand: Description
Last updated: 2024/7/1

A repository listing papers related to LLM-based agents. It covers:
* [Survey](#Survey)
* [Planning](#Planning), [Feedback&Reflection](#FeedbackReflection), [Memory Mechanism](#Memory-Mechanism)
* [Role Playing](#Role-Playing), [Game Playing](#Game-Playing), [Tool Usage&Human-Agent Interaction](#Tool-UsageHuman-Agent-Interaction)
* [Benchmark&Evaluation](#BenchmarkEvaluation), [Environment&Platform](#EnvironmentPlatform)
* [Agent Framework](#Agent-Framework), [Multi-Agent System](#Multi-Agent-System)
* [Agent Fine-tuning](#Agent-Fine-tuning)
## :yellow_heart: Recommendation
For more comprehensive reading, we also recommend other paper lists:
* [zjunlp/LLMAgentPapers](https://github.com/zjunlp/LLMAgentPapers): Must-read Papers on Large Language Model Agents.
* [teacherpeterpan/self-correction-llm-papers](https://github.com/teacherpeterpan/self-correction-llm-papers): This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback.
* [Paitesanshi/LLM-Agent-Survey](https://github.com/Paitesanshi/LLM-Agent-Survey): A Survey on LLM-based Autonomous Agents.
* [woooodyy/llm-agent-paper-list](https://github.com/woooodyy/llm-agent-paper-list): Must-read papers for LLM-based agents.
* [git-disl/awesome-LLM-game-agent-papers](https://github.com/git-disl/awesome-LLM-game-agent-papers): Must-read papers for LLM-based Game agents.
## :newspaper: Papers
### Survey
- [2024/06/09] **A Survey on LLM-Based Agents: Common Workflows and Reusable LLM-Profiled Components** | [[paper]](https://arxiv.org/abs/2406.05804) | [code]

- [2024/06/03] **Two Tales of Persona in LLMs: A Survey of Role-Playing and Personalization** | [[paper]](https://arxiv.org/abs/2406.01171) | [[code]](https://github.com/miulab/personallm-survey)

- [2024/06/01] **Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey** | [[paper]](https://arxiv.org/abs/2406.00252) | [code]

- [2024/05/16] **Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents** | [[paper]](https://arxiv.org/abs/2405.10467) | [code]

- [2024/04/17] **The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey** | [[paper]](https://arxiv.org/abs/2404.11584) | [code]

- [2024/04/17] **Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions** | [[paper]](https://arxiv.org/abs/2404.11023) | [code]

- [2024/04/03] **Empowering Biomedical Discovery with AI Agents** | [[paper]](https://arxiv.org/abs/2404.02831) | [code]

- [2024/04/02] **A Survey on Large Language Model-Based Game Agents** | [[paper]](https://arxiv.org/abs/2404.02039) | [[code]](https://github.com/git-disl/awesome-LLM-game-agent-papers)

- [2024/03/26] **Large Language Models for Human-Robot Interaction: Opportunities and Risks** | [[paper]](https://arxiv.org/abs/2405.00693) | [code]

- [2024/03/07] **Promising and worth-to-try future directions for advancing state-of-the-art surrogates methods of agent-based models in social and health computational sciences** | [[paper]](https://arxiv.org/abs/2403.04417) | [code]

- [2024/02/28] **Large Language Models and Games: A Survey and Roadmap** | [[paper]](https://arxiv.org/abs/2402.18659) | [code]

- [2024/02/28] **A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems** | [[paper]](https://arxiv.org/abs/2402.18013) | [code]

- [2024/02/07] **Can Large Language Model Agents Simulate Human Trust Behaviors?** | [[paper]](https://arxiv.org/abs/2402.04559) | [[code]](https://github.com/camel-ai/agent-trust)

- [2024/02/06] **Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science** | [[paper]](https://arxiv.org/abs/2402.04247) | [code]

- [2024/02/05] **Understanding the planning of LLM agents: A survey** | [[paper]](https://arxiv.org/abs/2402.02716) | [code]

- [2024/02/02] **Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions** | [[paper]](https://arxiv.org/abs/2402.01108) | [code]

- [2024/01/01] **If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents** | [[paper]](https://arxiv.org/abs/2401.00812) | [code]

- [2023/12/31] **A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots** | [[paper]](https://arxiv.org/abs/2401.00609) | [code]

- [2023/12/19] **Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives** | [[paper]](https://arxiv.org/abs/2312.11970) | [code]

- [2023/09/14] **The Rise and Potential of Large Language Model Based Agents: A Survey** | [[paper]](https://arxiv.org/abs/2309.07864) | [[code]](https://github.com/woooodyy/llm-agent-paper-list)

- [2023/08/22] **A Survey on Large Language Model based Autonomous Agents** | [[paper]](https://arxiv.org/abs/2308.11432) | [[code]](https://github.com/Paitesanshi/LLM-Agent-Survey)

- [2023/06/27] **Next Steps for Human-Centered Generative AI: A Technical Perspective** | [[paper]](https://arxiv.org/abs/2306.15774) | [code]

- [2023/04/06] **Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions** | [[paper]](https://arxiv.org/abs/2304.02868) | [code]

---
### Planning
- [2024/06/17] **RePrompt: Planning by Automatic Prompt Engineering for Large Language Models Agents** | [[paper]](https://arxiv.org/abs/2406.11132) | [code]

- [2024/06/06] **Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering** | [[paper]](https://arxiv.org/abs/2406.03807) | [[code]](https://github.com/OceannTwT/Tool-Planner)

- [2024/05/28] **A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models** | [[paper]](https://arxiv.org/abs/2405.18208) | [code]

- [2024/05/27] **LLM-Based Cooperative Agents using Information Relevance and Plan Validation** | [[paper]](https://arxiv.org/abs/2405.16751) | [code]

- [2024/05/24] **Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models** | [[paper]](https://arxiv.org/abs/2405.15143) | [[code]](https://github.com/conglu1997/intelligent-go-explore)

- [2024/04/28] **Logic Agent: Enhancing Validity with Logic Rule Invocation** | [[paper]](https://arxiv.org/abs/2404.18130) | [code]

- [2024/04/21] **Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following** | [[paper]](https://arxiv.org/abs/2404.15190) | [code]

- [2024/03/13] **AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2403.08978) | [code]

- [2024/03/12] **AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production** | [[paper]](https://arxiv.org/abs/2403.07952) | [code]

- [2024/03/11] **Strength Lies in Differences! Towards Effective Non-collaborative Dialogues via Tailored Strategy Planning** | [[paper]](https://arxiv.org/abs/2403.06769) | [code]

- [2024/03/10] **TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision** | [[paper]](https://arxiv.org/abs/2403.06221) | [code]

- [2024/03/05] **KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents** | [[paper]](https://arxiv.org/abs/2403.03101) | [code]

- [2024/03/05] **Language Guided Exploration for RL Agents in Text Environments** | [[paper]](https://arxiv.org/abs/2403.03141) | [code]

- [2024/02/29] **PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval** | [[paper]](https://arxiv.org/abs/2402.19273) | [code]

- [2024/02/28] **Data Interpreter: An LLM Agent For Data Science** | [[paper]](https://arxiv.org/abs/2402.18679) | [[code]](https://github.com/geekan/metagpt)

- [2024/02/18] **What's the Plan? Evaluating and Developing Planning-Aware Techniques for LLMs** | [[paper]](https://arxiv.org/abs/2402.11489) | [code]

- [2024/02/18] **PreAct: Predicting Future in ReAct Enhances Agent's Planning Ability** | [[paper]](https://arxiv.org/abs/2402.11534) | [code]

- [2024/02/16] **When is Tree Search Useful for LLM Planning? It Depends on the Discriminator** | [[paper]](https://arxiv.org/abs/2402.10890) | [[code]](https://github.com/osu-nlp-group/llm-planning-eval)

- [2024/02/09] **Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty** | [[paper]](https://arxiv.org/abs/2402.06529) | [code]

- [2024/02/06] **RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents** | [[paper]](https://arxiv.org/abs/2402.03610) | [code]

- [2024/02/02] **TravelPlanner: A Benchmark for Real-World Planning with Language Agents** | [[paper]](https://arxiv.org/abs/2402.01622) | [[code]](https://github.com/OSU-NLP-Group/TravelPlanner)

- [2024/01/10] **AUTOACT: Automatic Agent Learning from Scratch via Self-Planning** | [[paper]](https://arxiv.org/abs/2401.05268) | [code]

- [2023/11/19] **TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems** | [[paper]](https://arxiv.org/abs/2311.11315) | [code]

- [2023/10/09] **Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena** | [[paper]](https://arxiv.org/abs/2310.05746) | [[code]](https://github.com/jiangjiechen/auction-arena)

- [2023/08/07] **TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents** | [[paper]](https://arxiv.org/abs/2308.03427) | [code]

- [2023/05/26] **AdaPlanner: Adaptive Planning from Feedback with Language Models** | [[paper]](https://arxiv.org/abs/2305.16653) | [[code]](https://github.com/haotiansun14/adaplanner)

- [2023/05/24] **Reasoning with Language Model is Planning with World Model** | [[paper]](https://arxiv.org/abs/2305.14992) | [code]

- [2023/05/24] **Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning** | [[paper]](https://arxiv.org/abs/2305.14909) | [[code]](https://github.com/GuanSuns/LLMs-World-Models-for-Planning)

- [2023/03/29] **Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks** | [[paper]](https://arxiv.org/abs/2303.16563) | [[code]](https://sites.google.com/view/plan4mc)

- [2023/02/03] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents** | [[paper]](https://arxiv.org/abs/2302.01560) | [code]

- [2022/12/08] **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models** | [[paper]](https://arxiv.org/abs/2212.04088) | [[code]](https://dki-lab.github.io/LLM-Planner/)

---
### Feedback&Reflection
- [2024/06/05] **LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback** | [[paper]](https://arxiv.org/abs/2406.03363) | [code]

- [2024/03/18] **QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction** | [[paper]](https://arxiv.org/abs/2403.11886) | [code]

- [2024/03/17] **Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback** | [[paper]](https://arxiv.org/abs/2403.11330) | [code]

- [2024/03/08] **ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues** | [[paper]](https://arxiv.org/abs/2403.05326) | [code]

- [2024/03/04] **Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents** | [[paper]](https://arxiv.org/abs/2403.02502) | [code]

- [2024/02/27] **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization** | [[paper]](https://arxiv.org/abs/2402.17574) | [code]

- [2024/02/26] **SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection** | [[paper]](https://arxiv.org/abs/2402.16705) | [code]

- [2024/02/24] **Empowering Large Language Model Agents through Action Learning** | [[paper]](https://arxiv.org/abs/2402.15809) | [[code]](https://github.com/zhao-ht/learnact)

- [2024/02/22] **Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning** | [[paper]](https://arxiv.org/abs/2402.14963) | [code]

- [2024/02/19] **A Critical Evaluation of AI Feedback for Aligning Large Language Models** | [[paper]](https://arxiv.org/abs/2402.12366) | [code]

- [2024/02/06] **AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls** | [[paper]](https://arxiv.org/abs/2402.04253) | [[code]](https://github.com/dyabel/anytool)

- [2024/02/02] **StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback** | [[paper]](https://arxiv.org/abs/2402.01391) | [code]

- [2024/02/01] **Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model** | [[paper]](https://arxiv.org/abs/2402.01051) | [code]

- [2023/12/18] **CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update** | [[paper]](https://arxiv.org/abs/2312.10908) | [[code]](https://clova-tool.github.io/)

- [2023/11/14] **The ART of LLM Refinement: Ask, Refine, and Trust** | [[paper]](https://arxiv.org/abs/2311.07961) | [code]

- [2023/10/31] **Learning From Mistakes Makes LLM Better Reasoner** | [[paper]](https://arxiv.org/abs/2310.20689) | [[code]](https://github.com/microsoft/LEMA)

- [2023/08/01] **SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning** | [[paper]](https://arxiv.org/abs/2308.00436) | [[code]](https://github.com/ningmiao/selfcheck)

- [2023/07/27] **PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback** | [[paper]](https://arxiv.org/abs/2307.14936) | [code]

- [2023/05/26] **AdaPlanner: Adaptive Planning from Feedback with Language Models** | [[paper]](https://arxiv.org/abs/2305.16653) | [[code]](https://github.com/haotiansun14/adaplanner)

- [2023/05/22] **Making Language Models Better Tool Learners with Execution Feedback** | [[paper]](https://arxiv.org/abs/2305.13068) | [[code]](https://github.com/zjunlp/trice)

- [2023/04/11] **Teaching Large Language Models to Self-Debug** | [[paper]](https://arxiv.org/abs/2304.05128) | [code]

- [2023/03/30] **Self-Refine: Iterative Refinement with Self-Feedback** | [[paper]](https://arxiv.org/abs/2303.17651) | [[code]](https://github.com/madaan/self-refine)

- [2023/02/03] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents** | [[paper]](https://arxiv.org/abs/2302.01560) | [code]

---
### Memory Mechanism
- [2024/05/29] **Toward Conversational Agents with Context and Time Sensitive Long-term Memory** | [[paper]](https://arxiv.org/abs/2406.00057) | [[code]](https://github.com/Zyphra/TemporalMemoryDataset)

- [2024/04/15] **Memory Sharing for Large Language Model based Agents** | [[paper]](https://arxiv.org/abs/2404.09982) | [[code]](https://github.com/GHupppp/MemorySharingLLM)

- [2024/02/27] **Evaluating Very Long-Term Conversational Memory of LLM Agents** | [[paper]](https://arxiv.org/abs/2402.17753) | [code]

- [2024/02/19] **Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations** | [[paper]](https://arxiv.org/abs/2402.11975) | [code]

- [2024/02/07] **InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory** | [[paper]](https://arxiv.org/abs/2402.04617) | [code]

- [2024/02/06] **RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents** | [[paper]](https://arxiv.org/abs/2402.03610) | [code]

- [2023/12/22] **Empowering Working Memory for Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2312.17259) | [code]

- [2023/12/22] **Evolving Large Language Model Assistant with Long-Term Conditional Memory** | [[paper]](https://arxiv.org/abs/2312.17257) | [code]

- [2023/11/10] **JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models** | [[paper]](https://arxiv.org/abs/2311.05997) | [[code]](https://github.com/CraftJarvis/JARVIS-1)

- [2023/10/16] **CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization** | [[paper]](https://arxiv.org/abs/2310.10134) | [[code]](https://github.com/allenai/clin)

- [2023/06/06] **ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory** | [[paper]](https://arxiv.org/abs/2306.03901) | [code]

- [2023/05/31] **Monotonic Location Attention for Length Generalization** | [[paper]](https://arxiv.org/abs/2305.20019) | [code]

- [2023/05/26] **Randomized Positional Encodings Boost Length Generalization of Transformers** | [[paper]](https://arxiv.org/abs/2305.16843) | [code]

- [2023/05/25] **Landmark Attention: Random-Access Infinite Context Length for Transformers** | [[paper]](https://arxiv.org/abs/2305.16300) | [code]

- [2023/05/24] **Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration** | [[paper]](https://arxiv.org/abs/2305.15262) | [code]

- [2023/05/24] **Adapting Language Models to Compress Contexts** | [[paper]](https://arxiv.org/abs/2305.14788) | [code]

- [2023/05/23] **RET-LLM: Towards a General Read-Write Memory for Large Language Models** | [[paper]](https://arxiv.org/abs/2305.14322) | [code]

- [2023/05/22] **RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text** | [[paper]](https://arxiv.org/abs/2305.13304) | [code]

- [2023/05/19] **ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings** | [[paper]](https://arxiv.org/abs/2305.11554) | [code]

- [2023/05/17] **MemoryBank: Enhancing Large Language Models with Long-Term Memory** | [[paper]](https://arxiv.org/abs/2305.10250) | [code]

- [2023/05/15] **Small Models are Valuable Plug-ins for Large Language Models** | [[paper]](https://arxiv.org/abs/2305.08848) | [code]

- [2023/05/02] **Unlimiformer: Long-Range Transformers with Unlimited Length Input** | [[paper]](https://arxiv.org/abs/2305.01625) | [code]

- [2023/05/01] **Learning to Reason and Memorize with Self-Notes** | [[paper]](https://arxiv.org/abs/2305.00833) | [code]

- [2023/04/27] **ChatLog: Recording and Analyzing ChatGPT Across Time** | [[paper]](https://arxiv.org/abs/2304.14106) | [code]

- [2023/04/26] **Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System** | [[paper]](https://arxiv.org/abs/2304.13343) | [code]

- [2023/04/21] **Emergent and Predictable Memorization in Large Language Models** | [[paper]](https://arxiv.org/abs/2304.11158) | [code]

- [2023/03/17] **CoLT5: Faster Long-Range Transformers with Conditional Computation** | [[paper]](https://arxiv.org/abs/2303.09752) | [code]

---
### Role Playing
- [2024/06/26] **Mental Modeling of Reinforcement Learning Agents by Language Models** | [[paper]](https://arxiv.org/abs/2406.18505) | [code]

- [2024/06/19] **StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation** | [[paper]](https://arxiv.org/abs/2406.13840) | [code]

- [2024/06/17] **Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector** | [[paper]](https://arxiv.org/abs/2406.11277) | [code]

- [2024/06/17] **Input Conditioned Graph Generation for Language Agents** | [[paper]](https://arxiv.org/abs/2406.11555) | [[code]](https://github.com/lukasvierling/dynamicgptswarm)

- [2024/06/17] **HoLLMwood: Unleashing the Creativity of Large Language Models in Screenwriting via Role Playing** | [[paper]](https://arxiv.org/abs/2406.11683) | [code]

- [2024/06/11] **Agent-SiMT: Agent-assisted Simultaneous Machine Translation with Large Language Models** | [[paper]](https://arxiv.org/abs/2406.06910) | [code]

- [2024/06/10] **Can Language Models Serve as Text-Based World Simulators?** | [[paper]](https://arxiv.org/abs/2406.06485) | [code]

- [2024/06/09] **Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions** | [[paper]](https://arxiv.org/abs/2406.05688) | [[code]](https://github.com/chengtan9907/reviewmt)

- [2024/06/07] **SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals** | [[paper]](https://arxiv.org/abs/2406.04784) | [code]

- [2024/06/05] **BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents** | [[paper]](https://arxiv.org/abs/2406.03007) | [[code]](https://github.com/dpamk/badagent)

- [2024/05/28] **TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models** | [[paper]](https://arxiv.org/abs/2405.18027) | [code]

- [2024/05/25] **GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases** | [[paper]](https://arxiv.org/abs/2405.16205) | [code]

- [2024/05/25] **AutoManual: Generating Instruction Manuals by LLM Agents via Interactive Environmental Learning** | [[paper]](https://arxiv.org/abs/2405.16247) | [code]

- [2024/05/12] **Exploring the Potential of Conversational AI Support for Agent-Based Social Simulation Model Design** | [[paper]](https://arxiv.org/abs/2405.08032) | [code]

- [2024/05/10] **LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play** | [[paper]](https://arxiv.org/abs/2405.06373) | [code]

- [2024/05/06] **Large Language Models (LLMs) as Agents for Augmented Democracy** | [[paper]](https://arxiv.org/abs/2405.03452) | [code]

- [2024/05/06] **SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering** | [[paper]](https://arxiv.org/abs/2405.15793) | [code]

- [2024/05/02] **GAIA: A General AI Assistant for Intelligent Accelerator Operations** | [[paper]](https://arxiv.org/abs/2405.01359) | [code]

- [2024/05/01] **"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time** | [[paper]](https://arxiv.org/abs/2405.00801) | [code]

- [2024/04/30] **PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games** | [[paper]](https://arxiv.org/abs/2404.19721) | [code]

- [2024/04/30] **Large Language Model Agent for Fake News Detection** | [[paper]](https://arxiv.org/abs/2405.01593) | [code]

- [2024/04/27] **CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments** | [[paper]](https://arxiv.org/abs/2404.18021) | [code]

- [2024/04/26] **Large Language Model Agent as a Mechanical Designer** | [[paper]](https://arxiv.org/abs/2404.17525) | [code]

- [2024/04/25] **Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents** | [[paper]](https://arxiv.org/abs/2404.16698) | [code]

- [2024/04/22] **How Well Can LLMs Echo Us? Evaluating AI Chatbots' Role-Play Ability with ECHO** | [[paper]](https://arxiv.org/abs/2404.13957) | [code]

- [2024/04/19] **Cooperative Sentiment Agents for Multimodal Sentiment Analysis** | [[paper]](https://arxiv.org/abs/2404.12642) | [[code]](https://github.com/smwanghhh/co-sa)

- [2024/04/19] **Towards Human-centered Proactive Conversational Agents** | [[paper]](https://arxiv.org/abs/2404.12670) | [code]

- [2024/04/13] **LLMSat: A Large Language Model-Based Goal-Oriented Agent for Autonomous Space Exploration** | [[paper]](https://arxiv.org/abs/2405.01392) | [code]

- [2024/04/10] **Apollonion: Profile-centric Dialog Agent** | [[paper]](https://arxiv.org/abs/2404.08692) | [code]

- [2024/04/09] **SurveyAgent: A Conversational System for Personalized and Efficient Research Survey** | [[paper]](https://arxiv.org/abs/2404.06364) | [code]

- [2024/03/31] **DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model** | [[paper]](https://arxiv.org/abs/2404.01342) | [[code]](https://github.com/OpenGVLab/DiffAgent)

- [2024/03/29] **DataAgent: Evaluating Large Language Models' Ability to Answer Zero-Shot, Natural Language Queries** | [[paper]](https://arxiv.org/abs/2404.00188) | [code]

- [2024/03/23] **EduAgent: Generative Student Agents in Learning** | [[paper]](https://arxiv.org/abs/2404.07963) | [code]

- [2024/03/22] **CACA Agent: Capability Collaboration based AI Agent** | [[paper]](https://arxiv.org/abs/2403.15137) | [code]

- [2024/03/19] **Characteristic AI Agents via Large Language Models** | [[paper]](https://arxiv.org/abs/2403.12368) | [code]

- [2024/03/15] **VideoAgent: Long-form Video Understanding with Large Language Model as Agent** | [[paper]](https://arxiv.org/abs/2403.10517) | [code]

- [2024/03/05] **ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary** | [[paper]](https://arxiv.org/abs/2403.02574) | [code]

- [2024/03/05] **SimuCourt: Building Judicial Decision-Making Agents with Real-world Judgement Documents** | [[paper]](https://arxiv.org/abs/2403.02959) | [code]

- [2024/03/02] **SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code** | [[paper]](https://arxiv.org/abs/2403.01248) | [code]

- [2024/02/29] **On the Decision-Making Abilities in Role-Playing using Large Language Models** | [[paper]](https://arxiv.org/abs/2402.18807) | [code]

- [2024/02/28] **Prospect Personalized Recommendation on Large Language Model-based Agent Platform** | [[paper]](https://arxiv.org/abs/2402.18240) | [code]

- [2024/02/28] **Data Interpreter: An LLM Agent For Data Science** | [[paper]](https://arxiv.org/abs/2402.18679) | [[code]](https://github.com/geekan/metagpt)

- [2024/02/27] **BASES: Large-scale Web Search User Simulation with Large Language Model based Agents** | [[paper]](https://arxiv.org/abs/2402.17505) | [code]

- [2024/02/26] **Language Agents as Optimizable Graphs** | [[paper]](https://arxiv.org/abs/2402.16823) | [[code]](https://github.com/metauto-ai/gptswarm)

- [2024/02/26] **Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation** | [[paper]](https://arxiv.org/abs/2402.16333) | [code]

- [2024/02/25] **Understanding Public Perceptions of AI Conversational Agents: A Cross-Cultural Analysis** | [[paper]](https://arxiv.org/abs/2402.16039) | [code]

- [2024/02/25] **Bootstrapping Cognitive Agents with a Large Language Model** | [[paper]](https://arxiv.org/abs/2403.00810) | [code]

- [2024/02/23] **On the Multi-turn Instruction Following for Conversational Web Agents** | [[paper]](https://arxiv.org/abs/2402.15057) | [code]

- [2024/02/22] **Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation** | [[paper]](https://arxiv.org/abs/2402.14744) | [code]

- [2024/02/21] **Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent** | [[paper]](https://arxiv.org/abs/2402.13717) | [code]

- [2024/02/20] **Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues** | [[paper]](https://arxiv.org/abs/2402.12738) | [code]

- [2024/02/20] **Soft Self-Consistency Improves Language Model Agents** | [[paper]](https://arxiv.org/abs/2402.13212) | [code]

- [2024/02/20] **CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management** | [[paper]](https://arxiv.org/abs/2402.14850) | [code]

- [2024/02/19] **Polarization of Autonomous Generative AI Agents Under Echo Chambers** | [[paper]](https://arxiv.org/abs/2402.12212) | [code]

- [2024/02/19] **LLM Agents for Psychology: A Study on Gamified Assessments** | [[paper]](https://arxiv.org/abs/2402.12326) | [code]

- [2024/02/19] **Stick to your Role! Stability of Personal Values Expressed in Large Language Models** | [[paper]](https://arxiv.org/abs/2402.14846) | [code]

- [2024/02/19] **WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment** | [[paper]](https://arxiv.org/abs/2402.12275) | [code]

- [2024/02/18] **Modelling Political Coalition Negotiations Using LLM-based Agents** | [[paper]](https://arxiv.org/abs/2402.11712) | [code]

- [2024/02/17] **Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents** | [[paper]](https://arxiv.org/abs/2402.11208) | [code]

- [2024/02/15] **Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients** | [[paper]](https://arxiv.org/abs/2402.10153) | [code]

- [2024/02/13] **Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast** | [[paper]](https://arxiv.org/abs/2402.08567) | [[code]](https://github.com/sail-sg/agent-smith)

- [2024/02/06] **Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies** | [[paper]](https://arxiv.org/abs/2402.03628) | [code]

- [2024/02/06] **Can Generative Agents Predict Emotion?** | [[paper]](https://arxiv.org/abs/2402.04232) | [code]

- [2024/02/05] **LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models** | [[paper]](https://arxiv.org/abs/2402.02896) | [code]

- [2024/02/05] **GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models** | [[paper]](https://arxiv.org/abs/2402.03299) | [code]

- [2024/02/04] **NavHint: Vision and Language Navigation Agent with a Hint Generator** | [[paper]](https://arxiv.org/abs/2402.02559) | [code]

- [2024/02/02] **TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution** | [[paper]](https://arxiv.org/abs/2402.01586) | [code]

- [2024/02/01] **Executable Code Actions Elicit Better LLM Agents** | [[paper]](https://arxiv.org/abs/2402.01030) | [code]

- [2024/01/31] **LLMs Simulate Big Five Personality Traits: Further Evidence** | [[paper]](https://arxiv.org/abs/2402.01765) | [code]

- [2024/01/29] **Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues** | [[paper]](https://arxiv.org/abs/2402.01737) | [code]

- [2024/01/09] **Agent Alignment in Evolving Social Norms** | [[paper]](https://arxiv.org/abs/2401.04620) | [code]

- [2023/12/28] **Experiential Co-Learning of Software-Developing Agents** | [[paper]](https://arxiv.org/abs/2312.17025) | [code]

- [2023/12/27] **Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs** | [[paper]](https://arxiv.org/abs/2312.16378) | [code]

- [2023/12/21] **ChatGPT as a commenter to the news: can LLMs generate human-like opinions?** | [[paper]](https://arxiv.org/abs/2312.13961) | [code]

- [2023/12/19] **Can ChatGPT be Your Personal Medical Assistant?** | [[paper]](https://arxiv.org/abs/2312.12006) | [code]

- [2023/12/06] **LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem** | [[paper]](https://arxiv.org/abs/2312.03815) | [code]

- [2023/11/28] **War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars** | [[paper]](https://arxiv.org/abs/2311.17227) | [code]

- [2023/11/23] **Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach** | [[paper]](https://arxiv.org/abs/2311.13884) | [code]

- [2023/11/10] **Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations** | [[paper]](https://arxiv.org/abs/2311.06330) | [[code]](https://github.com/Roihn/SABM)

- [2023/10/01] **RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models** | [[paper]](https://arxiv.org/abs/2310.00746) | [[code]](https://github.com/InteractiveNLP-Team/RoleLLM-public)

- [2023/09/08] **Unleashing the Power of Graph Learning through LLM-based Autonomous Agents** | [[paper]](https://arxiv.org/abs/2309.04565) | [code]

- [2023/09/05] **Cognitive Architectures for Language Agents** | [[paper]](https://arxiv.org/abs/2309.02427) | [code]

- [2023/08/22] **Towards an On-device Agent for Text Rewriting** | [[paper]](https://arxiv.org/abs/2308.11807) | [code]

- [2023/08/14] **ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate** | [[paper]](https://arxiv.org/abs/2308.07201) | [[code]](https://github.com/thunlp/ChatEval)

- [2023/08/10] **LLM As DBA** | [[paper]](https://arxiv.org/abs/2308.05481) | [[code]](https://github.com/TsinghuaDatabaseGroup/DB-GPT)

- [2023/07/24] **To Infinity and Beyond: SHOW-1 and Showrunner Agents in Multi-Agent Simulations** | [[paper]](https://fablestudio.github.io/showrunner-agents/) | [code]

- [2023/06/28] **Inferring the Goals of Communicating Agents from Actions and Instructions** | [[paper]](https://arxiv.org/abs/2306.16207) | [code]

- [2023/05/30] **Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate** | [[paper]](https://arxiv.org/abs/2305.19118) | [[code]](https://github.com/Skytliang/Multi-Agents-Debate)

- [2023/05/27] **SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks** | [[paper]](https://arxiv.org/abs/2305.17390) | [[code]](https://github.com/yuchenlin/swiftsage/)

- [2023/05/26] **Training Socially Aligned Language Models in Simulated Human Society** | [[paper]](https://arxiv.org/abs/2305.16960) | [[code]](https://github.com/agi-templar/Stable-Alignment)

- [2023/05/25] **Role-Play with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.16367) | [code]

- [2023/05/17] **Tree of Thoughts: Deliberate Problem Solving with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.10601) | [[code]](https://github.com/ysymyth/tree-of-thought-llm)

- [2023/05/09] **TidyBot: Personalized Robot Assistance with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.05658) | [[code]](https://github.com/jimmyyhwu/tidybot)

- [2023/05/02] **The Role of Summarization in Generative Agents: A Preliminary Perspective** | [[paper]](https://arxiv.org/abs/2305.01253) | [code]

- [2023/04/26] **Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models** | [[paper]](https://arxiv.org/abs/2304.13835) | [[code]](https://github.com/facebookresearch/LIGHT)

- [2023/04/24] **ChatLLM Network: More brains, More intelligence** | [[paper]](https://arxiv.org/abs/2304.12998) | [code]

- [2023/04/21] **Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback** | [[paper]](https://arxiv.org/abs/2304.10750) | [code]

- [2023/04/19] **Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models** | [paper] | [code]

- [2023/04/15] **Self-collaboration Code Generation via ChatGPT** | [[paper]](https://arxiv.org/abs/2304.07590) | [code]

- [2023/04/07] **Generative Agents: Interactive Simulacra of Human Behavior** | [[paper]](https://arxiv.org/abs/2304.03442) | [[code]](https://github.com/joonspk-research/generative_agents)

- [2023/03/31] **CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society** | [[paper]](https://arxiv.org/abs/2303.17760) | [code]

- [2022/12/08] **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models** | [[paper]](https://arxiv.org/abs/2212.04088) | [[code]](https://dki-lab.github.io/LLM-Planner/)

---
### Game Playing
- [2024/06/05] **The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games** | [[paper]](https://arxiv.org/abs/2406.03299) | [code]

- [2024/05/23] **Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication** | [[paper]](https://arxiv.org/abs/2405.14173) | [code]

- [2024/05/08] **LLMs with Personalities in Multi-issue Negotiation Games** | [[paper]](https://arxiv.org/abs/2405.05248) | [code]

- [2024/04/03] **Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game** | [[paper]](https://arxiv.org/abs/2404.02532) | [code]

- [2024/03/26] **Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies** | [[paper]](https://arxiv.org/abs/2403.17497) | [[code]](https://github.com/clp-research/cost-sharing-reference-game)

- [2024/02/19] **LLM Agents for Psychology: A Study on Gamified Assessments** | [[paper]](https://arxiv.org/abs/2402.12326) | [code]

- [2024/02/13] **Large Language Models as Minecraft Agents** | [[paper]](https://arxiv.org/abs/2402.08392) | [code]

- [2024/02/12] **Large Language Models as Agents in Two-Player Games** | [[paper]](https://arxiv.org/abs/2402.08078) | [code]

- [2024/02/07] **Can Large Language Model Agents Simulate Human Trust Behaviors?** | [[paper]](https://arxiv.org/abs/2402.04559) | [[code]](https://github.com/camel-ai/agent-trust)

- [2024/02/04] **Enhance Reasoning for Large Language Models in the Game Werewolf** | [[paper]](https://arxiv.org/abs/2402.02330) | [code]

- [2024/02/02] **PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models** | [[paper]](https://arxiv.org/abs/2402.01118) | [code]

- [2023/12/29] **Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game** | [[paper]](https://arxiv.org/abs/2312.17515) | [code]

- [2023/11/10] **JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models** | [[paper]](https://arxiv.org/abs/2311.05997) | [[code]](https://github.com/CraftJarvis/JARVIS-1)

- [2023/10/31] **Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models** | [[paper]](https://arxiv.org/abs/2310.20499) | [code]

- [2023/09/29] **Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4** | [[paper]](https://arxiv.org/abs/2309.17277) | [[code]](https://github.com/CR-Gjx/Suspicion-Agent)

- [2023/09/18] **MindAgent: Emergent Gaming Interaction** | [[paper]](https://arxiv.org/abs/2309.09971) | [[code]](https://mindagent.github.io/)

- [2023/09/10] **An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents** | [[paper]](https://arxiv.org/abs/2309.05076) | [code]

- [2023/09/09] **Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf** | [[paper]](https://arxiv.org/abs/2309.04658) | [code]

- [2023/08/23] **Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis** | [[paper]](https://arxiv.org/abs/2308.12466) | [code]

- [2023/05/31] **Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models** | [[paper]](https://arxiv.org/abs/2305.19761) | [code]

- [2023/05/26] **Playing repeated games with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.16867) | [code]

- [2023/05/25] **Voyager: An Open-Ended Embodied Agent with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.16291) | [[code]](https://github.com/MineDojo/Voyager)

- [2023/05/25] **Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory** | [[paper]](https://arxiv.org/abs/2305.17144) | [[code]](https://github.com/OpenGVLab/GITM)

- [2023/05/19] **Examining the Inter-Consistency of Large Language Models: An In-depth Analysis via Debate** | [paper] | [code]

- [2023/05/17] **Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback** | [[paper]](https://arxiv.org/abs/2305.10142) | [[code]](https://github.com/FranxYao/GPT-Bargaining)

- [2023/05/08] **Knowledge-enhanced Agents for Interactive Text Games** | [[paper]](https://arxiv.org/abs/2305.05091) | [code]

- [2023/03/29] **Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks** | [[paper]](https://arxiv.org/abs/2303.16563) | [[code]](https://sites.google.com/view/plan4mc)

- [2023/02/03] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents** | [[paper]](https://arxiv.org/abs/2302.01560) | [code]

---
### Tool Usage&Human-Agent Interaction
- [2024/06/28] **Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task** | [[paper]](https://arxiv.org/abs/2406.19648) | [code]

- [2024/06/17] **GUICourse: From General Vision Language Models to Versatile GUI Agents** | [[paper]](https://arxiv.org/abs/2406.11317) | [[code]](https://github.com/yiye3/guicourse)

- [2024/06/11] **Towards Human-AI Collaboration in Healthcare: Guided Deferral Systems with Large Language Models** | [[paper]](https://arxiv.org/abs/2406.07212) | [code]

- [2024/06/06] **Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering** | [[paper]](https://arxiv.org/abs/2406.03807) | [[code]](https://github.com/OceannTwT/Tool-Planner)

- [2024/06/03] **Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration** | [[paper]](https://arxiv.org/abs/2406.01014) | [[code]](https://github.com/x-plug/mobileagent)

- [2024/06/02] **Towards a copilot in BIM authoring tool using a large language model-based agent for intelligent human-machine interaction** | [[paper]](https://arxiv.org/abs/2406.16903) | [code]

- [2024/05/30] **Large Language Models Can Self-Improve At Web Agent Tasks** | [[paper]](https://arxiv.org/abs/2405.20309) | [code]

- [2024/05/23] **Human-Agent Cooperation in Games under Incomplete Information through Natural Language Communication** | [[paper]](https://arxiv.org/abs/2405.14173) | [code]

- [2024/05/17] **Latent State Estimation Helps UI Agents to Reason** | [[paper]](https://arxiv.org/abs/2405.11120) | [code]

- [2024/05/02] **CACTUS: Chemistry Agent Connecting Tool-Usage to Science** | [[paper]](https://arxiv.org/abs/2405.00972) | [[code]](https://github.com/pnnl/cactus)

- [2024/05/01] **Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning** | [[paper]](https://arxiv.org/abs/2405.00516) | [code]

- [2024/05/01] **"Ask Me Anything": How Comcast Uses LLMs to Assist Agents in Real Time** | [[paper]](https://arxiv.org/abs/2405.00801) | [code]

- [2024/04/23] **Aligning LLM Agents by Learning Latent Preference from User Edits** | [[paper]](https://arxiv.org/abs/2404.15269) | [code]

- [2024/04/16] **Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning** | [[paper]](https://arxiv.org/abs/2404.10887) | [code]

- [2024/04/09] **SurveyAgent: A Conversational System for Personalized and Efficient Research Survey** | [[paper]](https://arxiv.org/abs/2404.06364) | [code]

- [2024/04/04] **AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent** | [[paper]](https://arxiv.org/abs/2404.03648) | [[code]](https://github.com/THUDM/AutoWebGLM)

- [2024/03/12] **AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production** | [[paper]](https://arxiv.org/abs/2403.07952) | [code]

- [2024/03/05] **InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2403.02691) | [code]

- [2024/03/05] **Android in the Zoo: Chain-of-Action-Thought for GUI Agents** | [[paper]](https://arxiv.org/abs/2403.02713) | [code]

- [2024/02/27] **BASES: Large-scale Web Search User Simulation with Large Language Model based Agents** | [[paper]](https://arxiv.org/abs/2402.17505) | [code]

- [2024/02/26] **Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models** | [[paper]](https://arxiv.org/abs/2402.16696) | [code]

- [2024/02/23] **On the Multi-turn Instruction Following for Conversational Web Agents** | [[paper]](https://arxiv.org/abs/2402.15057) | [code]

- [2024/02/20] **Large Language Model-based Human-Agent Collaboration for Complex Task Solving** | [[paper]](https://arxiv.org/abs/2402.12914) | [code]

- [2024/02/20] **AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning** | [[paper]](https://arxiv.org/abs/2402.13225) | [code]

- [2024/02/18] **SciAgent: Tool-augmented Language Models for Scientific Reasoning** | [[paper]](https://arxiv.org/abs/2402.11451) | [code]

- [2024/02/18] **Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models** | [[paper]](https://arxiv.org/abs/2402.11723) | [code]

- [2024/02/17] **Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima** | [[paper]](https://arxiv.org/abs/2402.11271) | [code]

- [2024/02/16] **ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages** | [[paper]](https://arxiv.org/abs/2402.10753) | [[code]](https://github.com/junjie-ye/toolsword)

- [2024/02/14] **Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications** | [[paper]](https://arxiv.org/abs/2402.09015) | [code]

- [2024/02/09] **CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models** | [[paper]](https://arxiv.org/abs/2402.06360) | [code]

- [2024/02/08] **UFO: A UI-Focused Agent for Windows OS Interaction** | [[paper]](https://arxiv.org/abs/2402.07939) | [code]

- [2024/02/06] **AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls** | [[paper]](https://arxiv.org/abs/2402.04253) | [[code]](https://github.com/dyabel/anytool)

- [2024/01/11] **EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction** | [[paper]](https://arxiv.org/abs/2401.06201) | [code]

- [2024/01/03] **GPT-4V(ision) is a Generalist Web Agent, if Grounded** | [[paper]](https://arxiv.org/abs/2401.01614) | [code]

- [2023/12/21] **Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System** | [[paper]](https://arxiv.org/abs/2312.13816) | [code]

- [2023/12/21] **AppAgent: Multimodal Agents as Smartphone Users** | [[paper]](https://arxiv.org/abs/2312.13771) | [[code]](https://github.com/mnotgod96/AppAgent)

- [2023/12/18] **CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update** | [[paper]](https://arxiv.org/abs/2312.10908) | [[code]](https://clova-tool.github.io/)

- [2023/12/14] **CogAgent: A Visual Language Model for GUI Agents** | [[paper]](https://arxiv.org/abs/2312.08914) | [code]

- [2023/11/19] **TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems** | [[paper]](https://arxiv.org/abs/2311.11315) | [code]

- [2023/10/18] **MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models** | [[paper]](https://arxiv.org/abs/2310.11954) | [code]

- [2023/10/13] **AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems** | [[paper]](https://arxiv.org/abs/2310.09233) | [code]

- [2023/10/12] **A Zero-Shot Language Agent for Computer Control with Structured Reflection** | [[paper]](https://arxiv.org/abs/2310.08740) | [code]

- [2023/09/02] **ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models** | [[paper]](https://arxiv.org/abs/2309.00986) | [[code]](https://github.com/modelscope/modelscope-agent)

- [2023/08/07] **TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents** | [[paper]](https://arxiv.org/abs/2308.03427) | [code]

- [2023/06/05] **When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm** | [[paper]](https://arxiv.org/abs/2306.02552) | [[code]](https://github.com/RUC-GSAI/YuLan-Rec)

---
### Benchmark&Evaluation
- [2024/06/28] **Designing and Evaluating Multi-Chatbot Interface for Human-AI Communication: Preliminary Findings from a Persuasion Task** | [[paper]](https://arxiv.org/abs/2406.19648) | [code]

- [2024/06/16] **GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents** | [[paper]](https://arxiv.org/abs/2406.10819) | [code]

- [2024/06/13] **ResearchArena: Benchmarking LLMs' Ability to Collect and Organize Information as Research Agents** | [[paper]](https://arxiv.org/abs/2406.10291) | [code]

- [2024/06/13] **StreamBench: Towards Benchmarking Continuous Improvement of Language Agents** | [[paper]](https://arxiv.org/abs/2406.08747) | [code]

- [2024/06/07] **WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild** | [[paper]](https://arxiv.org/abs/2406.04770) | [[code]](https://github.com/allenai/wildbench)

- [2024/06/07] **GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents** | [[paper]](https://arxiv.org/abs/2406.06613) | [code]

- [2024/05/23] **ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation** | [[paper]](https://arxiv.org/abs/2405.14125) | [code]

- [2024/05/23] **AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents** | [[paper]](https://arxiv.org/abs/2405.14573) | [code]

- [2024/05/16] **Speaker Verification in Agent-Generated Conversations** | [[paper]](https://arxiv.org/abs/2405.10150) | [code]

- [2024/05/13] **AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments** | [[paper]](https://arxiv.org/abs/2405.07960) | [code]

- [2024/05/01] **WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting** | [[paper]](https://arxiv.org/abs/2405.00823) | [[code]](https://github.com/olly-styles/workbench)

- [2024/04/23] **Evaluating Tool-Augmented Agents in Remote Sensing Platforms** | [[paper]](https://arxiv.org/abs/2405.00709) | [code]

- [2024/04/15] **MMInA: Benchmarking Multihop Multimodal Internet Agents** | [[paper]](https://arxiv.org/abs/2404.09992) | [[code]](https://github.com/shulin16/mmina)

- [2024/04/11] **OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments** | [[paper]](https://arxiv.org/abs/2404.07972) | [[code]](https://github.com/xlang-ai/OSWorld)

- [2024/04/09] **AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents** | [[paper]](https://arxiv.org/abs/2404.06411) | [[code]](https://github.com/nec-research/agentquest)

- [2024/03/20] **RoleInteract: Evaluating the Social Interaction of Role-Playing Agents** | [[paper]](https://arxiv.org/abs/2403.13679) | [code]

- [2024/03/18] **How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments** | [[paper]](https://arxiv.org/abs/2403.11807) | [code]

- [2024/03/18] **Tur[k]ingBench: A Challenge Benchmark for Web Agents** | [[paper]](https://arxiv.org/abs/2403.11905) | [code]

- [2024/03/13] **Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation** | [[paper]](https://arxiv.org/abs/2403.09738) | [code]

- [2024/03/05] **InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2403.02691) | [code]

- [2024/02/27] **Benchmarking Data Science Agents** | [[paper]](https://arxiv.org/abs/2402.17168) | [code]

- [2024/02/27] **OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web** | [[paper]](https://arxiv.org/abs/2402.17553) | [code]

- [2024/02/18] **Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation** | [[paper]](https://arxiv.org/abs/2402.11443) | [[code]](https://github.com/nanshineloong/self-evolving-benchmark)

- [2024/02/18] **MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization** | [[paper]](https://arxiv.org/abs/2402.11453) | [code]

- [2024/01/02] **CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation** | [[paper]](https://arxiv.org/abs/2401.01275) | [code]

- [2023/12/28] **How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation** | [[paper]](https://arxiv.org/abs/2312.17115) | [code]

- [2023/12/26] **RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models** | [[paper]](https://arxiv.org/abs/2312.16132) | [code]

- [2023/11/17] **Testing Language Model Agents Safely in the Wild** | [[paper]](https://arxiv.org/abs/2311.10538) | [code]

- [2023/11/16] **ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks** | [[paper]](https://arxiv.org/abs/2311.09835) | [[code]](https://ml-bench.github.io/)

- [2023/11/15] **ToolTalk: Evaluating Tool-Usage in a Conversational Setting** | [[paper]](https://arxiv.org/abs/2311.10775) | [[code]](https://github.com/microsoft/ToolTalk)

- [2023/10/24] **FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions** | [[paper]](https://arxiv.org/abs/2310.15421) | [code]

- [2023/10/09] **Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena** | [[paper]](https://arxiv.org/abs/2310.05746) | [[code]](https://github.com/jiangjiechen/auction-arena)

- [2023/10/02] **SmartPlay: A Benchmark for LLMs as Intelligent Agents** | [[paper]](https://arxiv.org/abs/2310.01557) | [[code]](https://github.com/microsoft/SmartPlay)

- [2023/09/18] **MindAgent: Emergent Gaming Interaction** | [[paper]](https://arxiv.org/abs/2309.09971) | [[code]](https://mindagent.github.io/)

- [2023/08/11] **BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents** | [[paper]](https://arxiv.org/abs/2308.05960) | [[code]](https://github.com/salesforce/BOLAA)

- [2023/08/07] **AgentBench: Evaluating LLMs as Agents** | [[paper]](https://arxiv.org/abs/2308.03688) | [[code]](https://github.com/THUDM/AgentBench)

- [2023/07/31] **HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution** | [[paper]](https://arxiv.org/abs/2307.16883) | [[code]](https://github.com/project-miracl/hagrid)

---
### Environment&Platform
- [2024/06/06] **AgentGym: Evolving Large Language Model-based Agents across Diverse Environments** | [[paper]](https://arxiv.org/abs/2406.04151) | [[code]](https://github.com/woooodyy/agentgym)

- [2024/05/24] **Hacc-Man: An Arcade Game for Jailbreaking LLMs** | [[paper]](https://arxiv.org/abs/2405.15902) | [code]

- [2024/05/23] **AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents** | [[paper]](https://arxiv.org/abs/2405.14573) | [code]

- [2024/04/01] **Rapid Mobile App Development for Generative AI Agents on MIT App Inventor** | [[paper]](https://arxiv.org/abs/2405.01561) | [code]

- [2024/03/28] **MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs** | [[paper]](https://arxiv.org/abs/2403.19267) | [[code]](https://github.com/cocacola-lab/mineland)

- [2024/03/26] **Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies** | [[paper]](https://arxiv.org/abs/2403.17497) | [[code]](https://github.com/clp-research/cost-sharing-reference-game)

- [2023/03/14] **CB2: Collaborative Natural Language Interaction Research Platform** | [[paper]](https://arxiv.org/abs/2303.08127) | [[code]](https://github.com/lil-lab/cb2)

---
### Agent Framework
- [2024/06/24] **OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer** | [[paper]](https://arxiv.org/abs/2406.16620) | [code]

- [2024/04/11] **Behavior Trees Enable Structured Programming of Language Model Agents** | [[paper]](https://arxiv.org/abs/2404.07439) | [[code]](https://github.com/RichardKelley/dendron)

- [2024/04/05] **Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents** | [[paper]](https://arxiv.org/abs/2404.04237) | [code]

- [2024/03/29] **ITCMA: A Generative Agent Based on a Computational Consciousness Structure** | [[paper]](https://arxiv.org/abs/2403.20097) | [code]

- [2024/03/18] **QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction** | [[paper]](https://arxiv.org/abs/2403.11886) | [code]

- [2024/02/26] **RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation** | [[paper]](https://arxiv.org/abs/2402.16667) | [code]

- [2024/02/26] **Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering** | [[paper]](https://arxiv.org/abs/2402.16313) | [code]

- [2024/02/22] **Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering** | [[paper]](https://arxiv.org/abs/2402.14320) | [code]

- [2024/02/17] **KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph** | [[paper]](https://arxiv.org/abs/2402.11163) | [code]

- [2024/01/05] **AFSPP: Agent Framework for Shaping Preference and Personality with Large Language Models** | [[paper]](https://arxiv.org/abs/2401.02870) | [code]

- [2023/11/02] **ProAgent: From Robotic Process Automation to Agentic Process Automation** | [[paper]](https://arxiv.org/abs/2311.10751) | [[code]](https://github.com/OpenBMB/ProAgent)

- [2023/09/29] **Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency** | [[paper]](https://arxiv.org/abs/2309.17382) | [[code]](https://github.com/agentification/RAFA_code)

- [2023/09/14] **Agents: An Open-source Framework for Autonomous Language Agents** | [[paper]](https://arxiv.org/abs/2309.07870) | [[code]](https://github.com/aiwaves-cn/agents)

- [2023/08/22] **ProAgent: Building Proactive Cooperative AI with Large Language Models** | [[paper]](https://arxiv.org/abs/2308.11339) | [[code]](https://github.com/PKU-Alignment/ProAgent)

- [2023/06/09] **Mind2Web: Towards a Generalist Agent for the Web** | [[paper]](https://arxiv.org/abs/2306.06070) | [[code]](https://github.com/OSU-NLP-Group/Mind2Web)

---
### Multi-Agent System
- [2024/06/26] **Simulating The U.S. Senate: An LLM-Driven Agent Approach to Modeling Legislative Behavior and Bipartisanship** | [[paper]](https://arxiv.org/abs/2406.18702) | [code]

- [2024/06/21] **Autonomous Agents for Collaborative Task under Information Asymmetry** | [[paper]](https://arxiv.org/abs/2406.14928) | [code]

- [2024/06/20] **Artificial Leviathan: Exploring Social Evolution of LLM Agents Through the Lens of Hobbesian Social Contract Theory** | [[paper]](https://arxiv.org/abs/2406.14373) | [code]

- [2024/06/17] **Improving Multi-Agent Debate with Sparse Communication Topology** | [[paper]](https://arxiv.org/abs/2406.11776) | [code]

- [2024/06/13] **Multi-Agent Software Development through Cross-Team Collaboration** | [[paper]](https://arxiv.org/abs/2406.08979) | [[code]](https://github.com/openbmb/chatdev)

- [2024/06/11] **CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation** | [[paper]](https://arxiv.org/abs/2406.07054) | [[code]](https://github.com/lirenhao1997/coevol)

- [2024/06/07] **Mixture-of-Agents Enhances Large Language Model Capabilities** | [[paper]](https://arxiv.org/abs/2406.04692) | [code]

- [2024/06/05] **Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework** | [[paper]](https://arxiv.org/abs/2406.03075) | [code]

- [2024/06/04] **Chain of Agents: Large Language Models Collaborating on Long-Context Tasks** | [[paper]](https://arxiv.org/abs/2406.02818) | [code]

- [2024/06/03] **Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration** | [[paper]](https://arxiv.org/abs/2406.01014) | [[code]](https://github.com/x-plug/mobileagent)

- [2024/05/30] **Safe Multi-agent Reinforcement Learning with Natural Language Constraints** | [[paper]](https://arxiv.org/abs/2405.20018) | [code]

- [2024/05/27] **LLM-Based Cooperative Agents using Information Relevance and Plan Validation** | [[paper]](https://arxiv.org/abs/2405.16751) | [code]

- [2024/05/23] **CityGPT: Towards Urban IoT Learning, Analysis and Interaction with Multi-Agent System** | [[paper]](https://arxiv.org/abs/2405.14691) | [code]

- [2024/05/20] **(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts** | [[paper]](https://arxiv.org/abs/2405.11804) | [code]

- [2024/05/17] **LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions** | [[paper]](https://arxiv.org/abs/2405.11106) | [code]

- [2024/05/07] **Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework** | [[paper]](https://arxiv.org/abs/2405.04294) | [code]

- [2024/05/06] **Conformity, Confabulation, and Impersonation: Persona Inconstancy in Multi-Agent LLM Collaboration** | [[paper]](https://arxiv.org/abs/2405.03862) | [code]

- [2024/05/05] **Language Evolution for Evading Social Media Regulation via LLM-based Multi-agent Simulation** | [[paper]](https://arxiv.org/abs/2405.02858) | [code]

- [2024/04/28] **ComposerX: Multi-Agent Symbolic Music Composition with LLMs** | [[paper]](https://arxiv.org/abs/2404.18081) | [[code]](https://github.com/lllindsey0615/composerx)

- [2024/04/25] **Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents** | [[paper]](https://arxiv.org/abs/2404.16698) | [code]

- [2024/04/23] **BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis** | [[paper]](https://arxiv.org/abs/2404.15532) | [[code]](https://github.com/agiresearch/battleagent)

- [2024/04/23] **CT-Agent: Clinical Trial Multi-Agent with Large Language Model-based Reasoning** | [[paper]](https://arxiv.org/abs/2404.14777) | [code]

- [2024/04/14] **Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation** | [[paper]](https://arxiv.org/abs/2404.09127) | [code]

- [2024/04/12] **Leveraging Multi-AI Agents for Cross-Domain Knowledge Discovery** | [[paper]](https://arxiv.org/abs/2404.08511) | [code]

- [2024/04/10] **MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education** | [[paper]](https://arxiv.org/abs/2404.06711) | [code]

- [2024/04/09] **Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems** | [[paper]](https://arxiv.org/abs/2404.06413) | [code]

- [2024/04/08] **360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System** | [[paper]](https://arxiv.org/abs/2404.05569) | [code]

- [2024/04/06] **MACM: Utilizing a Multi-Agent System for Condition Mining in Solving Complex Mathematical Problems** | [[paper]](https://arxiv.org/abs/2404.04735) | [[code]](https://github.com/bin123apple/macm)

- [2024/04/03] **Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game** | [[paper]](https://arxiv.org/abs/2404.02532) | [code]

- [2024/04/02] **Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization** | [[paper]](https://arxiv.org/abs/2404.02183) | [code]

- [2024/04/02] **CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models** | [[paper]](https://arxiv.org/abs/2404.01663) | [code]

- [2024/04/01] **TraveLER: A Multi-LMM Agent Framework for Video Question-Answering** | [[paper]](https://arxiv.org/abs/2404.01476) | [code]

- [2024/03/28] **MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation** | [[paper]](https://arxiv.org/abs/2403.19305) | [code]

- [2024/03/26] **MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution** | [[paper]](https://arxiv.org/abs/2403.17927) | [code]

- [2024/03/21] **Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering** | [[paper]](https://arxiv.org/abs/2403.14783) | [code]

- [2024/03/20] **Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior** | [[paper]](https://arxiv.org/abs/2403.13433) | [code]

- [2024/03/19] **Embodied LLM Agents Learn to Cooperate in Organized Teams** | [[paper]](https://arxiv.org/abs/2403.12482) | [code]

- [2024/03/18] **How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments** | [[paper]](https://arxiv.org/abs/2403.11807) | [code]

- [2024/03/12] **Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations** | [[paper]](https://arxiv.org/abs/2403.07769) | [code]

- [2024/03/02] **AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks** | [[paper]](https://arxiv.org/abs/2403.04783) | [code]

- [2024/02/28] **Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?** | [[paper]](https://arxiv.org/abs/2402.18272) | [code]

- [2024/02/26] **Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation** | [[paper]](https://arxiv.org/abs/2402.16333) | [code]

- [2024/02/26] **Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering** | [[paper]](https://arxiv.org/abs/2402.16313) | [code]

- [2024/02/26] **LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments** | [[paper]](https://arxiv.org/abs/2402.16499) | [code]

- [2024/02/21] **LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain** | [[paper]](https://arxiv.org/abs/2402.14871) | [code]

- [2024/02/20] **What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents** | [[paper]](https://arxiv.org/abs/2402.13184) | [code]

- [2024/02/18] **Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation** | [[paper]](https://arxiv.org/abs/2402.11443) | [[code]](https://github.com/nanshineloong/self-evolving-benchmark)

- [2024/02/18] **LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration** | [[paper]](https://arxiv.org/abs/2402.11550) | [code]

- [2024/02/15] **TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation** | [[paper]](https://arxiv.org/abs/2402.10178) | [code]

- [2024/02/03] **More Agents Is All You Need** | [[paper]](https://arxiv.org/abs/2402.05120) | [code]

- [2024/02/02] **A Multi-Agent Conversational Recommender System** | [[paper]](https://arxiv.org/abs/2402.01135) | [code]

- [2024/01/27] **ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning** | [[paper]](https://arxiv.org/abs/2402.04268) | [code]

- [2024/01/11] **Combating Adversarial Attacks with Multi-Agent Debate** | [[paper]](https://arxiv.org/abs/2401.05998) | [code]

- [2024/01/08] **MARG: Multi-Agent Review Generation for Scientific Papers** | [[paper]](https://arxiv.org/abs/2401.04259) | [code]

- [2024/01/08] **SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems** | [[paper]](https://arxiv.org/abs/2401.03945) | [code]

- [2024/01/08] **Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet** | [[paper]](https://arxiv.org/abs/2401.03630) | [code]

- [2023/12/20] **AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation** | [[paper]](https://arxiv.org/abs/2312.13010) | [code]

- [2023/12/01] **Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games** | [[paper]](https://arxiv.org/abs/2312.00746) | [code]

- [2023/10/31] **Multi-Agent Consensus Seeking via Large Language Models** | [[paper]](https://arxiv.org/abs/2310.20151) | [code]

- [2023/10/25] **MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning** | [[paper]](https://arxiv.org/abs/2310.16730) | [code]

- [2023/10/10] **MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents** | [[paper]](https://arxiv.org/abs/2310.06500) | [code]

- [2023/10/03] **Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View** | [[paper]](https://arxiv.org/abs/2310.02124) | [[code]](https://github.com/zjunlp/MachineSoM)

- [2023/09/22] **Learning to Coordinate with Anyone** | [[paper]](https://arxiv.org/abs/2309.12633) | [code]

- [2023/09/18] **MindAgent: Emergent Gaming Interaction** | [[paper]](https://arxiv.org/abs/2309.09971) | [[code]](https://mindagent.github.io/)

- [2023/08/21] **AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents** | [[paper]](https://arxiv.org/abs/2308.10848) | [[code]](https://github.com/OpenBMB/AgentVerse)

- [2023/08/03] **InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent** | [[paper]](https://arxiv.org/abs/2308.01552) | [code]

- [2023/08/01] **MetaGPT: Meta Programming for Multi-Agent Collaborative Framework** | [[paper]](https://arxiv.org/abs/2308.00352) | [[code]](https://github.com/geekan/MetaGPT)

- [2023/07/16] **Communicative Agents for Software Development** | [[paper]](https://arxiv.org/abs/2307.07924) | [[code]](https://github.com/openbmb/chatdev)

- [2023/07/11] **Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration** | [[paper]](https://arxiv.org/abs/2307.05300) | [[code]](https://github.com/MikeWangWZHL/Solo-Performance-Prompting)

- [2023/07/05] **Building Cooperative Embodied Agents Modularly with Large Language Models** | [[paper]](https://arxiv.org/abs/2307.02485) | [[code]](https://github.com/UMass-Foundation-Model/Co-LLM-Agents)

- [2023/06/05] **Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents** | [[paper]](https://arxiv.org/abs/2306.03314) | [code]

---
### Agent Fine-tuning
- [2024/06/11] **CoEvol: Constructing Better Responses for Instruction Finetuning through Multi-Agent Cooperation** | [[paper]](https://arxiv.org/abs/2406.07054) | [[code]](https://github.com/lirenhao1997/coevol)

- [2024/06/05] **LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback** | [[paper]](https://arxiv.org/abs/2406.03363) | [code]

- [2024/06/03] **Reflection-Reinforced Self-Training for Language Agents** | [[paper]](https://arxiv.org/abs/2406.01495) | [[code]](https://github.com/PlusLabNLP/Re-ReST)

- [2024/05/31] **Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training** | [[paper]](https://arxiv.org/abs/2406.00222) | [code]

- [2024/05/30] **Large Language Models Can Self-Improve At Web Agent Tasks** | [[paper]](https://arxiv.org/abs/2405.20309) | [code]

- [2024/05/16] **Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning** | [[paper]](https://arxiv.org/abs/2405.10292) | [code]

- [2024/05/01] **Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning** | [[paper]](https://arxiv.org/abs/2405.00516) | [code]

- [2024/04/17] **Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent** | [[paper]](https://arxiv.org/abs/2404.11459) | [code]

- [2024/04/16] **Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning** | [[paper]](https://arxiv.org/abs/2404.10887) | [code]

- [2024/04/05] **Social Skill Training with Large Language Models** | [[paper]](https://arxiv.org/abs/2404.04204) | [code]

- [2024/04/02] **CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models** | [[paper]](https://arxiv.org/abs/2404.01663) | [code]

- [2024/03/29] **Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning** | [[paper]](https://arxiv.org/abs/2403.19962) | [code]

- [2024/03/21] **ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training** | [[paper]](https://arxiv.org/abs/2403.14589) | [code]

- [2024/03/19] **Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models** | [[paper]](https://arxiv.org/abs/2403.12881) | [code]

- [2024/03/18] **EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents** | [[paper]](https://arxiv.org/abs/2403.12014) | [code]

- [2024/02/23] **AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning** | [[paper]](https://arxiv.org/abs/2402.15506) | [code]

- [2024/02/21] **Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent** | [[paper]](https://arxiv.org/abs/2402.13717) | [code]

- [2024/02/19] **A Critical Evaluation of AI Feedback for Aligning Large Language Models** | [[paper]](https://arxiv.org/abs/2402.12366) | [code]

- [2024/02/18] **Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents** | [[paper]](https://arxiv.org/abs/2402.11651) | [code]

- [2024/02/17] **Training Language Model Agents without Modifying Language Models** | [[paper]](https://arxiv.org/abs/2402.11359) | [code]

- [2024/01/10] **Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training** | [[paper]](https://arxiv.org/abs/2401.05566) | [code]

- [2024/01/10] **Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk** | [[paper]](https://arxiv.org/abs/2401.05033) | [code]

- [2024/01/05] **From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models** | [[paper]](https://arxiv.org/abs/2401.02777) | [code]

- [2023/12/22] **Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning** | [[paper]](https://arxiv.org/abs/2312.14878) | [code]

- [2023/12/20] **Machine Mindset: An MBTI Exploration of Large Language Models** | [[paper]](https://arxiv.org/abs/2312.12999) | [[code]](https://github.com/PKU-YuanGroup/Machine-Mindset)

- [2023/11/28] **Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld** | [[paper]](https://arxiv.org/abs/2311.16714) | [code]

- [2023/10/19] **AgentTuning: Enabling Generalized Agent Abilities for LLMs** | [[paper]](https://arxiv.org/abs/2310.12823) | [[code]](https://github.com/THUDM/AgentTuning)

- [2023/10/09] **FireAct: Toward Language Agent Fine-tuning** | [[paper]](https://arxiv.org/abs/2310.05915) | [[code]](https://github.com/anchen1011/FireAct)

- [2023/10/01] **Adapting LLM Agents Through Communication** | [[paper]](https://arxiv.org/abs/2310.01444v2) | [code]

---
### Others
---
## :star: Star History

[![Star History Chart](https://api.star-history.com/svg?repos=AGI-Edgerunners/LLM-Agents-Papers&type=Date)](https://star-history.com/#AGI-Edgerunners/LLM-Agents-Papers&Date)