Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.
https://github.com/AGI-Edgerunners/LLM-Agents-Papers

A repo lists papers related to LLM based agent
https://github.com/AGI-Edgerunners/LLM-Agents-Papers
agents large-language-models llm-agent paper-list
Last synced: 3 months ago
JSON representation
A repo lists papers related to LLM based agent
Host: GitHub
URL: https://github.com/AGI-Edgerunners/LLM-Agents-Papers
Owner: AGI-Edgerunners
Created: 2023-05-31T08:33:48.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2023-12-26T12:12:57.000Z (6 months ago)
Last Synced: 2023-12-27T13:04:34.342Z (6 months ago)
Topics: agents, large-language-models, llm-agent, paper-list
Language: Python
Homepage:
Size: 453 KB
Stars: 312
Watchers: 11
Forks: 26
Open Issues: 1
Metadata Files:
- Readme: README.md
Lists

awesome-language-agents - LLM-Agents-Papers
awesome-llm-powered-agent - LLM-Agents-Papers - A repo lists papers about LLM role playing, memory mechanism and LLM game playing. (Other Related Sources / Multi-Agent Simulation Projects)
README

        # LLM-Agents-Papers

## :writing_hand: Description

Last Updated Time: 2024/3/27

A repo lists papers related to LLM based agent. Includes

* methods of role playing, memory mechanism and game playing

* methods of feedback or reflection

* methods of tool usage or human-agent interaction

* multi-agent system

* benchmarks and surveys of the field

* environments or platforms

* agent fine-tuning

## :yellow_heart: Recommendation

For more comprehensive reading, we also recommend other paper lists:

* [zjunlp/LLMAgentPapers](https://github.com/zjunlp/LLMAgentPapers): Must-read Papers on Large Language Model Agents.

* [teacherpeterpan/self-correction-llm-papers](https://github.com/teacherpeterpan/self-correction-llm-papers): This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback.

* [Paitesanshi/LLM-Agent-Survey](https://github.com/Paitesanshi/LLM-Agent-Survey): A Survey on LLM-based Autonomous Agents.

* [woooodyy/llm-agent-paper-list](https://github.com/woooodyy/llm-agent-paper-list): Must-read papers for LLM-based agents.

## :newspaper: Papers

- Survey

	- [2024/03/07] **Promising and worth-to-try future directions for advancing state-of-the-art surrogates methods of agent-based models in social and health computational sciences** | [[paper]](https://arxiv.org/abs/2403.04417) | [code]

	- [2024/02/28] **Large Language Models and Games: A Survey and Roadmap** | [[paper]](https://arxiv.org/abs/2402.18659) | [code]

	- [2024/02/28] **A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems** | [[paper]](https://arxiv.org/abs/2402.18013) | [code]

	- [2024/02/07] **Can Large Language Model Agents Simulate Human Trust Behaviors?** | [[paper]](https://arxiv.org/abs/2402.04559) | [code]

	- [2024/02/06] **Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science** | [[paper]](https://arxiv.org/abs/2402.04247) | [code]

	- [2024/02/05] **Understanding the planning of LLM agents: A survey** | [[paper]](https://arxiv.org/abs/2402.02716) | [code]

	- [2024/02/02] **Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions** | [[paper]](https://arxiv.org/abs/2402.01108) | [code]

	- [2024/01/01] **If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents** | [[paper]](https://arxiv.org/abs/2401.00812) | [code]

	- [2023/12/31] **A Survey of Personality, Persona, and Profile in Conversational Agents and Chatbots** | [[paper]](https://arxiv.org/abs/2401.00609) | [code]

	- [2023/12/19] **Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives** | [[paper]](https://arxiv.org/abs/2312.11970) | [code]

	- [2023/09/14] **The Rise and Potential of Large Language Model Based Agents: A Survey** | [[paper]](https://arxiv.org/abs/2309.07864) | [[code]](https://github.com/woooodyy/llm-agent-paper-list)

	- [2023/08/22] **A Survey on Large Language Model based Autonomous Agents** | [[paper]](https://arxiv.org/abs/2308.11432) | [[code]](https://github.com/Paitesanshi/LLM-Agent-Survey)

	- [2023/06/27] **Next Steps for Human-Centered Generative AI: A Technical Perspective** | [[paper]](https://arxiv.org/abs/2306.15774) | [code]

	- [2023/04/06] **Can Large Language Models Play Text Games Well? Current State-of-the-Art and Open Questions** | [[paper]](https://arxiv.org/abs/2304.02868) | [code]

---

- Agent Fine-tuning

	- [2024/03/21] **ReAct Meets ActRe: Autonomous Annotation of Agent Trajectories for Contrastive Self-Training** | [[paper]](https://arxiv.org/abs/2403.14589) | [code]

	- [2024/03/19] **Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models** | [[paper]](https://arxiv.org/abs/2403.12881) | [code]

	- [2024/03/18] **EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents** | [[paper]](https://arxiv.org/abs/2403.12014) | [code]

	- [2024/02/23] **AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning** | [[paper]](https://arxiv.org/abs/2402.15506) | [code]

	- [2024/02/21] **Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent** | [[paper]](https://arxiv.org/abs/2402.13717) | [code]

	- [2024/02/19] **A Critical Evaluation of AI Feedback for Aligning Large Language Models** | [[paper]](https://arxiv.org/abs/2402.12366) | [code]

	- [2024/02/18] **Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents** | [[paper]](https://arxiv.org/abs/2402.11651) | [code]

	- [2024/02/17] **Training Language Model Agents without Modifying Language Models** | [[paper]](https://arxiv.org/abs/2402.11359) | [code]

	- [2024/01/10] **Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training** | [[paper]](https://arxiv.org/abs/2401.05566) | [code]

	- [2024/01/10] **Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk** | [[paper]](https://arxiv.org/abs/2401.05033) | [code]

	- [2024/01/05] **From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models** | [[paper]](https://arxiv.org/abs/2401.02777) | [code]

	- [2023/12/22] **Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning** | [[paper]](https://arxiv.org/abs/2312.14878) | [code]

	- [2023/12/20] **Machine Mindset: An MBTI Exploration of Large Language Models** | [[paper]](https://arxiv.org/abs/2312.12999) | [[code]](https://github.com/PKU-YuanGroup/Machine-Mindset)

	- [2023/10/19] **AgentTuning: Enabling Generalized Agent Abilities for LLMs** | [[paper]](https://arxiv.org/abs/2310.12823) | [[code]](https://github.com/THUDM/AgentTuning)

	- [2023/10/09] **FireAct: Toward Language Agent Fine-tuning** | [[paper]](https://arxiv.org/abs/2310.05915) | [[code]](https://github.com/anchen1011/FireAct)

	- [2023/10/01] **Adapting LLM Agents Through Communication** | [[paper]](https://arxiv.org/abs/2310.01444v2) | [code]

---

- Role Playing

	- [2024/03/22] **CACA Agent: Capability Collaboration based AI Agent** | [[paper]](https://arxiv.org/abs/2403.15137) | [code]

	- [2024/03/19] **Characteristic AI Agents via Large Language Models** | [[paper]](https://arxiv.org/abs/2403.12368) | [code]

	- [2024/03/15] **VideoAgent: Long-form Video Understanding with Large Language Model as Agent** | [[paper]](https://arxiv.org/abs/2403.10517) | [code]

	- [2024/03/05] **ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary** | [[paper]](https://arxiv.org/abs/2403.02574) | [code]

	- [2024/03/05] **SimuCourt: Building Judicial Decision-Making Agents with Real-world Judgement Documents** | [[paper]](https://arxiv.org/abs/2403.02959) | [code]

	- [2024/03/02] **SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code** | [[paper]](https://arxiv.org/abs/2403.01248) | [code]

	- [2024/02/29] **On the Decision-Making Abilities in Role-Playing using Large Language Models** | [[paper]](https://arxiv.org/abs/2402.18807) | [code]

	- [2024/02/28] **Prospect Personalized Recommendation on Large Language Model-based Agent Platform** | [[paper]](https://arxiv.org/abs/2402.18240) | [code]

	- [2024/02/28] **Data Interpreter: An LLM Agent For Data Science** | [[paper]](https://arxiv.org/abs/2402.18679) | [[code]](https://github.com/geekan/metagpt)

	- [2024/02/27] **BASES: Large-scale Web Search User Simulation with Large Language Model based Agents** | [[paper]](https://arxiv.org/abs/2402.17505) | [code]

	- [2024/02/26] **Language Agents as Optimizable Graphs** | [[paper]](https://arxiv.org/abs/2402.16823) | [[code]](https://github.com/metauto-ai/gptswarm)

	- [2024/02/26] **Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation** | [[paper]](https://arxiv.org/abs/2402.16333) | [code]

	- [2024/02/25] **Understanding Public Perceptions of AI Conversational Agents: A Cross-Cultural Analysis** | [[paper]](https://arxiv.org/abs/2402.16039) | [code]

	- [2024/02/25] **Bootstrapping Cognitive Agents with a Large Language Model** | [[paper]](https://arxiv.org/abs/2403.00810) | [code]

	- [2024/02/23] **On the Multi-turn Instruction Following for Conversational Web Agents** | [[paper]](https://arxiv.org/abs/2402.15057) | [code]

	- [2024/02/22] **Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation** | [[paper]](https://arxiv.org/abs/2402.14744) | [code]

	- [2024/02/21] **Neeko: Leveraging Dynamic LoRA for Efficient Multi-Character Role-Playing Agent** | [[paper]](https://arxiv.org/abs/2402.13717) | [code]

	- [2024/02/20] **Can Large Language Models be Used to Provide Psychological Counselling? An Analysis of GPT-4-Generated Responses Using Role-play Dialogues** | [[paper]](https://arxiv.org/abs/2402.12738) | [code]

	- [2024/02/20] **Soft Self-Consistency Improves Language Model Agents** | [[paper]](https://arxiv.org/abs/2402.13212) | [code]

	- [2024/02/20] **CHATATC: Large Language Model-Driven Conversational Agents for Supporting Strategic Air Traffic Flow Management** | [[paper]](https://arxiv.org/abs/2402.14850) | [code]

	- [2024/02/19] **Polarization of Autonomous Generative AI Agents Under Echo Chambers** | [[paper]](https://arxiv.org/abs/2402.12212) | [code]

	- [2024/02/19] **LLM Agents for Psychology: A Study on Gamified Assessments** | [[paper]](https://arxiv.org/abs/2402.12326) | [code]

	- [2024/02/19] **Stick to your Role! Stability of Personal Values Expressed in Large Language Models** | [[paper]](https://arxiv.org/abs/2402.14846) | [code]

	- [2024/02/19] **WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment** | [[paper]](https://arxiv.org/abs/2402.12275) | [code]

	- [2024/02/18] **Modelling Political Coalition Negotiations Using LLM-based Agents** | [[paper]](https://arxiv.org/abs/2402.11712) | [code]

	- [2024/02/17] **Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents** | [[paper]](https://arxiv.org/abs/2402.11208) | [code]

	- [2024/02/15] **Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients** | [[paper]](https://arxiv.org/abs/2402.10153) | [code]

	- [2024/02/13] **Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast** | [[paper]](https://arxiv.org/abs/2402.08567) | [[code]](https://github.com/sail-sg/agent-smith)

	- [2024/02/06] **Professional Agents -- Evolving Large Language Models into Autonomous Experts with Human-Level Competencies** | [[paper]](https://arxiv.org/abs/2402.03628) | [code]

	- [2024/02/06] **Can Generative Agents Predict Emotion?** | [[paper]](https://arxiv.org/abs/2402.04232) | [code]

	- [2024/02/05] **LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models** | [[paper]](https://arxiv.org/abs/2402.02896) | [code]

	- [2024/02/05] **GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models** | [[paper]](https://arxiv.org/abs/2402.03299) | [code]

	- [2024/02/04] **NavHint: Vision and Language Navigation Agent with a Hint Generator** | [[paper]](https://arxiv.org/abs/2402.02559) | [code]

	- [2024/02/02] **TrustAgent: Towards Safe and Trustworthy LLM-based Agents through Agent Constitution** | [[paper]](https://arxiv.org/abs/2402.01586) | [code]

	- [2024/02/01] **Executable Code Actions Elicit Better LLM Agents** | [[paper]](https://arxiv.org/abs/2402.01030) | [code]

	- [2024/01/31] **LLMs Simulate Big Five Personality Traits: Further Evidence** | [[paper]](https://arxiv.org/abs/2402.01765) | [code]

	- [2024/01/29] **Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues** | [[paper]](https://arxiv.org/abs/2402.01737) | [code]

	- [2024/01/09] **Agent Alignment in Evolving Social Norms** | [[paper]](https://arxiv.org/abs/2401.04620) | [code]

	- [2023/12/28] **Experiential Co-Learning of Software-Developing Agents** | [[paper]](https://arxiv.org/abs/2312.17025) | [code]

	- [2023/12/27] **Automating Knowledge Acquisition for Content-Centric Cognitive Agents Using LLMs** | [[paper]](https://arxiv.org/abs/2312.16378) | [code]

	- [2023/12/21] **ChatGPT as a commenter to the news: can LLMs generate human-like opinions?** | [[paper]](https://arxiv.org/abs/2312.13961) | [code]

	- [2023/12/19] **Can ChatGPT be Your Personal Medical Assistant?** | [[paper]](https://arxiv.org/abs/2312.12006) | [code]

	- [2023/12/06] **LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem** | [[paper]](https://arxiv.org/abs/2312.03815) | [code]

	- [2023/11/28] **War and Peace (WarAgent): Large Language Model-based Multi-Agent Simulation of World Wars** | [[paper]](https://arxiv.org/abs/2311.17227) | [code]

	- [2023/11/23] **Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach** | [[paper]](https://arxiv.org/abs/2311.13884) | [code]

	- [2023/11/10] **Smart Agent-Based Modeling: On the Use of Large Language Models in Computer Simulations** | [[paper]](https://arxiv.org/abs/2311.06330) | [[code]](https://github.com/Roihn/SABM)

	- [2023/10/01] **RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models** | [[paper]](https://arxiv.org/abs/2310.00746) | [[code]](https://github.com/InteractiveNLP-Team/RoleLLM-public)

	- [2023/09/08] **Unleashing the Power of Graph Learning through LLM-based Autonomous Agents** | [[paper]](https://arxiv.org/abs/2309.04565) | [code]

	- [2023/09/05] **Cognitive Architectures for Language Agents** | [[paper]](https://arxiv.org/abs/2309.02427) | [code]

	- [2023/08/22] **Towards an On-device Agent for Text Rewriting** | [[paper]](https://arxiv.org/abs/2308.11807) | [code]

	- [2023/08/14] **ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate** | [[paper]](https://arxiv.org/abs/2308.07201) | [[code]](https://github.com/thunlp/ChatEval)

	- [2023/08/10] **LLM As DBA** | [[paper]](https://arxiv.org/abs/2308.05481) | [[code]](https://github.com/TsinghuaDatabaseGroup/DB-GPT)

	- [2023/07/24] **To Infinity and Beyond: SHOW-1 and Showrunner Agents in Multi-Agent Simulations** | [[paper]](https://fablestudio.github.io/showrunner-agents/) | [code]

	- [2023/06/28] **Inferring the Goals of Communicating Agents from Actions and Instructions** | [[paper]](https://arxiv.org/abs/2306.16207) | [code]

	- [2023/05/30] **Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate** | [[paper]](https://arxiv.org/abs/2305.19118) | [[code]](https://github.com/Skytliang/Multi-Agents-Debate)

	- [2023/05/27] **SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks** | [[paper]](https://arxiv.org/abs/2305.17390) | [[code]](https://github.com/yuchenlin/swiftsage/)

	- [2023/05/26] **Training Socially Aligned Language Models in Simulated Human Society** | [[paper]](https://arxiv.org/abs/2305.16960) | [[code]](https://github.com/agi-templar/Stable-Alignment)

	- [2023/05/25] **Role-Play with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.16367) | [code]

	- [2023/05/17] **Tree of Thoughts: Deliberate Problem Solving with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.10601) | [[code]](https://github.com/ysymyth/tree-of-thought-llm)

	- [2023/05/09] **TidyBot: Personalized Robot Assistance with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.05658) | [[code]](https://github.com/jimmyyhwu/tidybot)

	- [2023/05/02] **The Role of Summarization in Generative Agents: A Preliminary Perspective** | [[paper]](https://arxiv.org/abs/2305.01253) | [code]

	- [2023/04/26] **Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models** | [[paper]](https://arxiv.org/abs/2304.13835) | [[code]](https://github.com/facebookresearch/LIGHT)

	- [2023/04/24] **ChatLLM Network: More brains, More intelligence** | [[paper]](https://arxiv.org/abs/2304.12998) | [code]

	- [2023/04/21] **Improving Grounded Language Understanding in a Collaborative Environment by Interacting with Agents Through Help Feedback** | [[paper]](https://arxiv.org/abs/2304.10750) | [code]

	- [2023/04/19] **Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models** | [paper] | [code]

	- [2023/04/15] **Self-collaboration Code Generation via ChatGPT** | [[paper]](https://arxiv.org/abs/2304.07590) | [code]

	- [2023/04/07] **Generative Agents: Interactive Simulacra of Human Behavior** | [[paper]](https://arxiv.org/abs/2304.03442) | [[code]](https://github.com/joonspk-research/generative_agents)

	- [2023/03/31] **CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society** | [[paper]](https://arxiv.org/abs/2303.17760) | [code]

	- [2022/12/08] **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models** | [[paper]](https://arxiv.org/abs/2212.04088) | [[code]](https://dki-lab.github.io/LLM-Planner/)

---

- Multi-Agent System

	- [2024/03/21] **Multi-Agent VQA: Exploring Multi-Agent Foundation Models in Zero-Shot Visual Question Answering** | [[paper]](https://arxiv.org/abs/2403.14783) | [code]

	- [2024/03/20] **Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior** | [[paper]](https://arxiv.org/abs/2403.13433) | [code]

	- [2024/03/19] **Embodied LLM Agents Learn to Cooperate in Organized Teams** | [[paper]](https://arxiv.org/abs/2403.12482) | [code]

	- [2024/03/18] **How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments** | [[paper]](https://arxiv.org/abs/2403.11807) | [code]

	- [2024/03/12] **Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations** | [[paper]](https://arxiv.org/abs/2403.07769) | [code]

	- [2024/03/02] **AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks** | [[paper]](https://arxiv.org/abs/2403.04783) | [code]

	- [2024/02/28] **Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?** | [[paper]](https://arxiv.org/abs/2402.18272) | [code]

	- [2024/02/26] **Unveiling the Truth and Facilitating Change: Towards Agent-based Large-scale Social Movement Simulation** | [[paper]](https://arxiv.org/abs/2402.16333) | [code]

	- [2024/02/26] **Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering** | [[paper]](https://arxiv.org/abs/2402.16313) | [code]

	- [2024/02/26] **LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments** | [[paper]](https://arxiv.org/abs/2402.16499) | [code]

	- [2024/02/21] **LLM Based Multi-Agent Generation of Semi-structured Documents from Semantic Templates in the Public Administration Domain** | [[paper]](https://arxiv.org/abs/2402.14871) | [code]

	- [2024/02/20] **What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents** | [[paper]](https://arxiv.org/abs/2402.13184) | [code]

	- [2024/02/18] **Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation** | [[paper]](https://arxiv.org/abs/2402.11443) | [[code]](https://github.com/nanshineloong/self-evolving-benchmark)

	- [2024/02/18] **LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration** | [[paper]](https://arxiv.org/abs/2402.11550) | [code]

	- [2024/02/15] **TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation** | [[paper]](https://arxiv.org/abs/2402.10178) | [code]

	- [2024/02/03] **More Agents Is All You Need** | [[paper]](https://arxiv.org/abs/2402.05120) | [code]

	- [2024/02/02] **A Multi-Agent Conversational Recommender System** | [[paper]](https://arxiv.org/abs/2402.01135) | [code]

	- [2024/01/27] **ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning** | [[paper]](https://arxiv.org/abs/2402.04268) | [code]

	- [2024/01/11] **Combating Adversarial Attacks with Multi-Agent Debate** | [[paper]](https://arxiv.org/abs/2401.05998) | [code]

	- [2024/01/08] **MARG: Multi-Agent Review Generation for Scientific Papers** | [[paper]](https://arxiv.org/abs/2401.04259) | [code]

	- [2024/01/08] **SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems** | [[paper]](https://arxiv.org/abs/2401.03945) | [code]

	- [2024/01/08] **Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet** | [[paper]](https://arxiv.org/abs/2401.03630) | [code]

	- [2023/12/20] **AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation** | [[paper]](https://arxiv.org/abs/2312.13010) | [code]

	- [2023/12/01] **Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games** | [[paper]](https://arxiv.org/abs/2312.00746) | [code]

	- [2023/10/31] **Multi-Agent Consensus Seeking via Large Language Models** | [[paper]](https://arxiv.org/abs/2310.20151) | [code]

	- [2023/10/25] **MultiPrompter: Cooperative Prompt Optimization with Multi-Agent Reinforcement Learning** | [[paper]](https://arxiv.org/abs/2310.16730) | [code]

	- [2023/10/10] **MetaAgents: Simulating Interactions of Human Behaviors for LLM-based Task-oriented Coordination via Collaborative Generative Agents** | [[paper]](https://arxiv.org/abs/2310.06500) | [code]

	- [2023/10/03] **Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View** | [[paper]](https://arxiv.org/abs/2310.02124) | [[code]](https://github.com/zjunlp/MachineSoM)

	- [2023/09/22] **Learning to Coordinate with Anyone** | [[paper]](https://arxiv.org/abs/2309.12633) | [code]

	- [2023/09/18] **MindAgent: Emergent Gaming Interaction** | [[paper]](https://arxiv.org/abs/2309.09971) | [[code]](https://mindagent.github.io/)

	- [2023/08/21] **AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents** | [[paper]](https://arxiv.org/abs/2308.10848) | [[code]](https://github.com/OpenBMB/AgentVerse)

	- [2023/08/03] **InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent** | [[paper]](https://arxiv.org/abs/2308.01552) | [code]

	- [2023/08/01] **MetaGPT: Meta Programming for Multi-Agent Collaborative Framework** | [[paper]](https://arxiv.org/abs/2308.00352) | [[code]](https://github.com/geekan/MetaGPT)

	- [2023/07/16] **Communicative Agents for Software Development** | [[paper]](https://arxiv.org/abs/2307.07924) | [[code]](https://github.com/openbmb/chatdev)

	- [2023/07/11] **Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration** | [[paper]](https://arxiv.org/abs/2307.05300) | [[code]](https://github.com/MikeWangWZHL/Solo-Performance-Prompting)

	- [2023/07/05] **Building Cooperative Embodied Agents Modularly with Large Language Models** | [[paper]](https://arxiv.org/abs/2307.02485) | [[code]](https://github.com/UMass-Foundation-Model/Co-LLM-Agents)

	- [2023/06/05] **Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents** | [[paper]](https://arxiv.org/abs/2306.03314) | [code]

---

- Feedback&Reflection

	- [2024/03/17] **Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback** | [[paper]](https://arxiv.org/abs/2403.11330) | [code]

	- [2024/03/08] **ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues** | [[paper]](https://arxiv.org/abs/2403.05326) | [code]

	- [2024/03/04] **Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents** | [[paper]](https://arxiv.org/abs/2403.02502) | [code]

	- [2024/02/27] **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization** | [[paper]](https://arxiv.org/abs/2402.17574) | [code]

	- [2024/02/26] **SelectIT: Selective Instruction Tuning for Large Language Models via Uncertainty-Aware Self-Reflection** | [[paper]](https://arxiv.org/abs/2402.16705) | [code]

	- [2024/02/24] **Empowering Large Language Model Agents through Action Learning** | [[paper]](https://arxiv.org/abs/2402.15809) | [[code]](https://github.com/zhao-ht/learnact)

	- [2024/02/22] **Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning** | [[paper]](https://arxiv.org/abs/2402.14963) | [code]

	- [2024/02/19] **A Critical Evaluation of AI Feedback for Aligning Large Language Models** | [[paper]](https://arxiv.org/abs/2402.12366) | [code]

	- [2024/02/06] **AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls** | [[paper]](https://arxiv.org/abs/2402.04253) | [[code]](https://github.com/dyabel/anytool)

	- [2024/02/02] **StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback** | [[paper]](https://arxiv.org/abs/2402.01391) | [code]

	- [2024/02/01] **Generation, Distillation and Evaluation of Motivational Interviewing-Style Reflections with a Foundational Language Model** | [[paper]](https://arxiv.org/abs/2402.01051) | [code]

	- [2023/12/18] **CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update** | [[paper]](https://arxiv.org/abs/2312.10908) | [[code]](https://clova-tool.github.io/)

	- [2023/11/14] **The ART of LLM Refinement: Ask, Refine, and Trust** | [[paper]](https://arxiv.org/abs/2311.07961) | [code]

	- [2023/10/31] **Learning From Mistakes Makes LLM Better Reasoner** | [[paper]](https://arxiv.org/abs/2310.20689) | [[code]](https://github.com/microsoft/LEMA)

	- [2023/08/01] **SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning** | [[paper]](https://arxiv.org/abs/2308.00436) | [[code]](https://github.com/ningmiao/selfcheck)

	- [2023/07/27] **PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback** | [[paper]](https://arxiv.org/abs/2307.14936) | [code]

	- [2023/05/26] **AdaPlanner: Adaptive Planning from Feedback with Language Models** | [[paper]](https://arxiv.org/abs/2305.16653) | [[code]](https://github.com/haotiansun14/adaplanner)

	- [2023/05/22] **Making Language Models Better Tool Learners with Execution Feedback** | [[paper]](https://arxiv.org/abs/2305.13068) | [[code]](https://github.com/zjunlp/trice)

	- [2023/04/11] **Teaching Large Language Models to Self-Debug** | [[paper]](https://arxiv.org/abs/2304.05128) | [code]

	- [2023/03/30] **Self-Refine: Iterative Refinement with Self-Feedback** | [[paper]](https://arxiv.org/abs/2303.17651) | [[code]](https://github.com/madaan/self-refine)

	- [2023/02/03] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents** | [[paper]](https://arxiv.org/abs/2302.01560) | [code]

---

- Memory Mechanism

	- [2024/02/27] **Evaluating Very Long-Term Conversational Memory of LLM Agents** | [[paper]](https://arxiv.org/abs/2402.17753) | [code]

	- [2024/02/19] **Compress to Impress: Unleashing the Potential of Compressive Memory in Real-World Long-Term Conversations** | [[paper]](https://arxiv.org/abs/2402.11975) | [code]

	- [2024/02/07] **InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory** | [[paper]](https://arxiv.org/abs/2402.04617) | [code]

	- [2024/02/06] **RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents** | [[paper]](https://arxiv.org/abs/2402.03610) | [code]

	- [2023/12/22] **Empowering Working Memory for Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2312.17259) | [code]

	- [2023/12/22] **Evolving Large Language Model Assistant with Long-Term Conditional Memory** | [[paper]](https://arxiv.org/abs/2312.17257) | [code]

	- [2023/11/10] **JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models** | [[paper]](https://arxiv.org/abs/2311.05997) | [[code]](https://github.com/CraftJarvis/JARVIS-1)

	- [2023/10/16] **CLIN: A Continually Learning Language Agent for Rapid Task Adaptation and Generalization** | [[paper]](https://arxiv.org/abs/2310.10134) | [[code]](https://github.com/allenai/clin)

	- [2023/06/06] **ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory** | [[paper]](https://arxiv.org/abs/2306.03901) | [code]

	- [2023/05/31] **Monotonic Location Attention for Length Generalization** | [[paper]](https://arxiv.org/abs/2305.20019) | [code]

	- [2023/05/26] **Randomized Positional Encodings Boost Length Generalization of Transformers** | [[paper]](https://arxiv.org/abs/2305.16843) | [code]

	- [2023/05/25] **Landmark Attention: Random-Access Infinite Context Length for Transformers** | [[paper]](https://arxiv.org/abs/2305.16300) | [code]

	- [2023/05/24] **Revisiting Parallel Context Windows: A Frustratingly Simple Alternative and Chain-of-Thought Deterioration** | [[paper]](https://arxiv.org/abs/2305.15262) | [code]

	- [2023/05/24] **Adapting Language Models to Compress Contexts** | [[paper]](https://arxiv.org/abs/2305.14788) | [code]

	- [2023/05/23] **RET-LLM: Towards a General Read-Write Memory for Large Language Models** | [[paper]](https://arxiv.org/abs/2305.14322) | [code]

	- [2023/05/22] **RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text** | [[paper]](https://arxiv.org/abs/2305.13304) | [code]

	- [2023/05/19] **ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings** | [[paper]](https://arxiv.org/abs/2305.11554) | [code]

	- [2023/05/17] **MemoryBank: Enhancing Large Language Models with Long-Term Memory** | [[paper]](https://arxiv.org/abs/2305.10250) | [code]

	- [2023/05/15] **Small Models are Valuable Plug-ins for Large Language Models** | [[paper]](https://arxiv.org/abs/2305.08848) | [code]

	- [2023/05/02] **Unlimiformer: Long-Range Transformers with Unlimited Length Input** | [[paper]](https://arxiv.org/abs/2305.01625) | [code]

	- [2023/05/01] **Learning to Reason and Memorize with Self-Notes** | [[paper]](https://arxiv.org/abs/2305.00833) | [code]

	- [2023/04/27] **ChatLog: Recording and Analyzing ChatGPT Across Time** | [[paper]](https://arxiv.org/abs/2304.14106) | [code]

	- [2023/04/26] **Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System** | [[paper]](https://arxiv.org/abs/2304.13343) | [code]

	- [2023/04/21] **Emergent and Predictable Memorization in Large Language Models** | [[paper]](https://arxiv.org/abs/2304.11158) | [code]

	- [2023/03/17] **CoLT5: Faster Long-Range Transformers with Conditional Computation** | [[paper]](https://arxiv.org/abs/2303.09752) | [code]

---

- Planning

	- [2024/03/13] **AutoGuide: Automated Generation and Selection of State-Aware Guidelines for Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2403.08978) | [code]

	- [2024/03/12] **AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production** | [[paper]](https://arxiv.org/abs/2403.07952) | [code]

	- [2024/03/11] **Strength Lies in Differences! Towards Effective Non-collaborative Dialogues via Tailored Strategy Planning** | [[paper]](https://arxiv.org/abs/2403.06769) | [code]

	- [2024/03/10] **TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision** | [[paper]](https://arxiv.org/abs/2403.06221) | [code]

	- [2024/03/05] **KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents** | [[paper]](https://arxiv.org/abs/2403.03101) | [code]

	- [2024/03/05] **Language Guided Exploration for RL Agents in Text Environments** | [[paper]](https://arxiv.org/abs/2403.03141) | [code]

	- [2024/02/29] **PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval** | [[paper]](https://arxiv.org/abs/2402.19273) | [code]

	- [2024/02/28] **Data Interpreter: An LLM Agent For Data Science** | [[paper]](https://arxiv.org/abs/2402.18679) | [[code]](https://github.com/geekan/metagpt)

	- [2024/02/18] **What's the Plan? Evaluating and Developing Planning-Aware Techniques for LLMs** | [[paper]](https://arxiv.org/abs/2402.11489) | [code]

	- [2024/02/18] **PreAct: Predicting Future in ReAct Enhances Agent's Planning Ability** | [[paper]](https://arxiv.org/abs/2402.11534) | [code]

	- [2024/02/16] **When is Tree Search Useful for LLM Planning? It Depends on the Discriminator** | [[paper]](https://arxiv.org/abs/2402.10890) | [[code]](https://github.com/osu-nlp-group/llm-planning-eval)

	- [2024/02/09] **Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty** | [[paper]](https://arxiv.org/abs/2402.06529) | [code]

	- [2024/02/06] **RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents** | [[paper]](https://arxiv.org/abs/2402.03610) | [code]

	- [2024/02/02] **TravelPlanner: A Benchmark for Real-World Planning with Language Agents** | [[paper]](https://arxiv.org/abs/2402.01622) | [[code]](https://github.com/OSU-NLP-Group/TravelPlanner)

	- [2024/01/10] **AUTOACT: Automatic Agent Learning from Scratch via Self-Planning** | [[paper]](https://arxiv.org/abs/2401.05268) | [code]

	- [2023/11/19] **TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems** | [[paper]](https://arxiv.org/abs/2311.11315) | [code]

	- [2023/10/09] **Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena** | [[paper]](https://arxiv.org/abs/2310.05746) | [[code]](https://github.com/jiangjiechen/auction-arena)

	- [2023/08/07] **TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents** | [[paper]](https://arxiv.org/abs/2308.03427) | [code]

	- [2023/05/26] **AdaPlanner: Adaptive Planning from Feedback with Language Models** | [[paper]](https://arxiv.org/abs/2305.16653) | [[code]](https://github.com/haotiansun14/adaplanner)

	- [2023/05/24] **Reasoning with Language Model is Planning with World Model** | [[paper]](https://arxiv.org/abs/2305.14992) | [code]

	- [2023/05/24] **Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning** | [[paper]](https://arxiv.org/abs/2305.14909) | [[code]](https://github.com/GuanSuns/LLMs-World-Models-for-Planning)

	- [2023/03/29] **Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks** | [[paper]](https://arxiv.org/abs/2303.16563) | [[code]](https://sites.google.com/view/plan4mc)

	- [2023/02/03] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents** | [[paper]](https://arxiv.org/abs/2302.01560) | [code]

	- [2022/12/08] **LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models** | [[paper]](https://arxiv.org/abs/2212.04088) | [[code]](https://dki-lab.github.io/LLM-Planner/)

---

- Game Playing

	- [2024/03/26] **Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies** | [[paper]](https://arxiv.org/abs/2403.17497) | [[code]](https://github.com/clp-research/cost-sharing-reference-game)

	- [2024/02/19] **LLM Agents for Psychology: A Study on Gamified Assessments** | [[paper]](https://arxiv.org/abs/2402.12326) | [code]

	- [2024/02/13] **Large Language Models as Minecraft Agents** | [[paper]](https://arxiv.org/abs/2402.08392) | [code]

	- [2024/02/12] **Large Language Models as Agents in Two-Player Games** | [[paper]](https://arxiv.org/abs/2402.08078) | [code]

	- [2024/02/04] **Enhance Reasoning for Large Language Models in the Game Werewolf** | [[paper]](https://arxiv.org/abs/2402.02330) | [code]

	- [2024/02/02] **PokeLLMon: A Human-Parity Agent for Pokemon Battles with Large Language Models** | [[paper]](https://arxiv.org/abs/2402.01118) | [code]

	- [2023/12/29] **Cooperation on the Fly: Exploring Language Agents for Ad Hoc Teamwork in the Avalon Game** | [[paper]](https://arxiv.org/abs/2312.17515) | [code]

	- [2023/11/10] **JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models** | [[paper]](https://arxiv.org/abs/2311.05997) | [[code]](https://github.com/CraftJarvis/JARVIS-1)

	- [2023/10/31] **Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models** | [[paper]](https://arxiv.org/abs/2310.20499) | [code]

	- [2023/09/29] **Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4** | [[paper]](https://arxiv.org/abs/2309.17277) | [[code]](https://github.com/CR-Gjx/Suspicion-Agent)

	- [2023/09/18] **MindAgent: Emergent Gaming Interaction** | [[paper]](https://arxiv.org/abs/2309.09971) | [[code]](https://mindagent.github.io/)

	- [2023/09/10] **An Appraisal-Based Chain-Of-Emotion Architecture for Affective Language Model Game Agents** | [[paper]](https://arxiv.org/abs/2309.05076) | [code]

	- [2023/09/09] **Exploring Large Language Models for Communication Games: An Empirical Study on Werewolf** | [[paper]](https://arxiv.org/abs/2309.04658) | [code]

	- [2023/08/23] **Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis** | [[paper]](https://arxiv.org/abs/2308.12466) | [code]

	- [2023/05/31] **Recursive Metropolis-Hastings Naming Game: Symbol Emergence in a Multi-agent System based on Probabilistic Generative Models** | [[paper]](https://arxiv.org/abs/2305.19761) | [code]

	- [2023/05/26] **Playing repeated games with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.16867) | [code]

	- [2023/05/25] **Voyager: An Open-Ended Embodied Agent with Large Language Models** | [[paper]](https://arxiv.org/abs/2305.16291) | [[code]](https://github.com/MineDojo/Voyager)

	- [2023/05/25] **Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory** | [[paper]](https://arxiv.org/abs/2305.17144) | [[code]](https://github.com/OpenGVLab/GITM)

	- [2023/05/19] **Examining the Inter-Consistency of Large Language Models: An In-depth Analysis via Debate** | [paper] | [code]

	- [2023/05/17] **Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback** | [[paper]](https://arxiv.org/abs/2305.10142) | [[code]](https://github.com/FranxYao/GPT-Bargaining)

	- [2023/05/08] **Knowledge-enhanced Agents for Interactive Text Games** | [[paper]](https://arxiv.org/abs/2305.05091) | [code]

	- [2023/03/29] **Plan4MC: Skill Reinforcement Learning and Planning for Open-World Minecraft Tasks** | [[paper]](https://arxiv.org/abs/2303.16563) | [[code]](https://sites.google.com/view/plan4mc)

	- [2023/02/03] **Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents** | [[paper]](https://arxiv.org/abs/2302.01560) | [code]

---

- Game Platform

	- [2024/03/26] **Sharing the Cost of Success: A Game for Evaluating and Learning Collaborative Multi-Agent Instruction Giving and Following Policies** | [[paper]](https://arxiv.org/abs/2403.17497) | [[code]](https://github.com/clp-research/cost-sharing-reference-game)

	- [2023/03/14] **CB2: Collaborative Natural Language Interaction Research Platform** | [[paper]](https://arxiv.org/abs/2303.08127) | [[code]](https://github.com/lil-lab/cb2)

---

- Benchmark&Evaluation&Framework

	- [2024/03/20] **RoleInteract: Evaluating the Social Interaction of Role-Playing Agents** | [[paper]](https://arxiv.org/abs/2403.13679) | [code]

	- [2024/03/18] **How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments** | [[paper]](https://arxiv.org/abs/2403.11807) | [code]

	- [2024/03/18] **Tur[k]ingBench: A Challenge Benchmark for Web Agents** | [[paper]](https://arxiv.org/abs/2403.11905) | [code]

	- [2024/03/18] **QueryAgent: A Reliable and Efficient Reasoning Framework with Environmental Feedback based Self-Correction** | [[paper]](https://arxiv.org/abs/2403.11886) | [code]

	- [2024/03/13] **Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation** | [[paper]](https://arxiv.org/abs/2403.09738) | [code]

	- [2024/03/05] **InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2403.02691) | [code]

	- [2024/02/27] **Benchmarking Data Science Agents** | [[paper]](https://arxiv.org/abs/2402.17168) | [code]

	- [2024/02/27] **OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web** | [[paper]](https://arxiv.org/abs/2402.17553) | [code]

	- [2024/02/26] **RepoAgent: An LLM-Powered Open-Source Framework for Repository-level Code Documentation Generation** | [[paper]](https://arxiv.org/abs/2402.16667) | [code]

	- [2024/02/26] **Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering** | [[paper]](https://arxiv.org/abs/2402.16313) | [code]

	- [2024/02/22] **Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering** | [[paper]](https://arxiv.org/abs/2402.14320) | [code]

	- [2024/02/18] **Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation** | [[paper]](https://arxiv.org/abs/2402.11443) | [[code]](https://github.com/nanshineloong/self-evolving-benchmark)

	- [2024/02/18] **MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization** | [[paper]](https://arxiv.org/abs/2402.11453) | [code]

	- [2024/02/17] **KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph** | [[paper]](https://arxiv.org/abs/2402.11163) | [code]

	- [2024/01/05] **AFSPP: Agent Framework for Shaping Preference and Personality with Large Language Models** | [[paper]](https://arxiv.org/abs/2401.02870) | [code]

	- [2024/01/02] **CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation** | [[paper]](https://arxiv.org/abs/2401.01275) | [code]

	- [2023/12/28] **How Far Are We from Believable AI Agents? A Framework for Evaluating the Believability of Human Behavior Simulation** | [[paper]](https://arxiv.org/abs/2312.17115) | [code]

	- [2023/12/26] **RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models** | [[paper]](https://arxiv.org/abs/2312.16132) | [code]

	- [2023/11/17] **Testing Language Model Agents Safely in the Wild** | [[paper]](https://arxiv.org/abs/2311.10538) | [code]

	- [2023/11/16] **ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks** | [[paper]](https://arxiv.org/abs/2311.09835) | [[code]](https://ml-bench.github.io/)

	- [2023/11/15] **ToolTalk: Evaluating Tool-Usage in a Conversational Setting** | [[paper]](https://arxiv.org/abs/2311.10775) | [[code]](https://github.com/microsoft/ToolTalk)

	- [2023/11/02] **ProAgent: From Robotic Process Automation to Agentic Process Automation** | [[paper]](https://arxiv.org/abs/2311.10751) | [[code]](https://github.com/OpenBMB/ProAgent)

	- [2023/10/24] **FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions** | [[paper]](https://arxiv.org/abs/2310.15421) | [code]

	- [2023/10/09] **Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena** | [[paper]](https://arxiv.org/abs/2310.05746) | [[code]](https://github.com/jiangjiechen/auction-arena)

	- [2023/10/02] **SmartPlay : A Benchmark for LLMs as Intelligent Agents** | [[paper]](https://arxiv.org/abs/2310.01557) | [[code]](https://github.com/microsoft/SmartPlay)

	- [2023/09/29] **Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency** | [[paper]](https://arxiv.org/abs/2309.17382) | [[code]](https://github.com/agentification/RAFA_code)

	- [2023/09/18] **MindAgent: Emergent Gaming Interaction** | [[paper]](https://arxiv.org/abs/2309.09971) | [[code]](https://mindagent.github.io/)

	- [2023/09/14] **Agents: An Open-source Framework for Autonomous Language Agents** | [[paper]](https://arxiv.org/abs/2309.07870) | [[code]](https://github.com/aiwaves-cn/agents)

	- [2023/08/22] **ProAgent: Building Proactive Cooperative AI with Large Language Models** | [[paper]](https://arxiv.org/abs/2308.11339) | [[code]](https://github.com/PKU-Alignment/ProAgent)

	- [2023/08/11] **BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents** | [[paper]](https://arxiv.org/abs/2308.05960) | [[code]](https://github.com/salesforce/BOLAA)

	- [2023/08/07] **AgentBench: Evaluating LLMs as Agents** | [[paper]](https://arxiv.org/abs/2308.03688) | [[code]](https://github.com/THUDM/AgentBench)

	- [2023/07/31] **HAGRID: A Human-LLM Collaborative Dataset for Generative Information-Seeking with Attribution** | [[paper]](https://arxiv.org/abs/2307.16883) | [[code]](https://github.com/project-miracl/hagrid)

	- [2023/06/09] **Mind2Web: Towards a Generalist Agent for the Web** | [[paper]](https://arxiv.org/abs/2306.06070) | [[code]](https://github.com/OSU-NLP-Group/Mind2Web)

---

- Tool Usage&Human-Agent Interaction

	- [2024/03/12] **AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production** | [[paper]](https://arxiv.org/abs/2403.07952) | [code]

	- [2024/03/05] **InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents** | [[paper]](https://arxiv.org/abs/2403.02691) | [code]

	- [2024/03/05] **Android in the Zoo: Chain-of-Action-Thought for GUI Agents** | [[paper]](https://arxiv.org/abs/2403.02713) | [code]

	- [2024/02/27] **BASES: Large-scale Web Search User Simulation with Large Language Model based Agents** | [[paper]](https://arxiv.org/abs/2402.17505) | [code]

	- [2024/02/26] **Look Before You Leap: Towards Decision-Aware and Generalizable Tool-Usage for Large Language Models** | [[paper]](https://arxiv.org/abs/2402.16696) | [code]

	- [2024/02/23] **On the Multi-turn Instruction Following for Conversational Web Agents** | [[paper]](https://arxiv.org/abs/2402.15057) | [code]

	- [2024/02/20] **Large Language Model-based Human-Agent Collaboration for Complex Task Solving** | [[paper]](https://arxiv.org/abs/2402.12914) | [code]

	- [2024/02/20] **AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning** | [[paper]](https://arxiv.org/abs/2402.13225) | [code]

	- [2024/02/18] **SciAgent: Tool-augmented Language Models for Scientific Reasoning** | [[paper]](https://arxiv.org/abs/2402.11451) | [code]

	- [2024/02/18] **Shaping Human-AI Collaboration: Varied Scaffolding Levels in Co-writing with Language Models** | [[paper]](https://arxiv.org/abs/2402.11723) | [code]

	- [2024/02/17] **Human-AI Interactions in the Communication Era: Autophagy Makes Large Models Achieving Local Optima** | [[paper]](https://arxiv.org/abs/2402.11271) | [code]

	- [2024/02/16] **ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages** | [[paper]](https://arxiv.org/abs/2402.10753) | [[code]](https://github.com/junjie-ye/toolsword)

	- [2024/02/14] **Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications** | [[paper]](https://arxiv.org/abs/2402.09015) | [code]

	- [2024/02/09] **CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models** | [[paper]](https://arxiv.org/abs/2402.06360) | [code]

	- [2024/02/08] **UFO: A UI-Focused Agent for Windows OS Interaction** | [[paper]](https://arxiv.org/abs/2402.07939) | [code]

	- [2024/02/06] **AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls** | [[paper]](https://arxiv.org/abs/2402.04253) | [[code]](https://github.com/dyabel/anytool)

	- [2024/01/03] **GPT-4V(ision) is a Generalist Web Agent, if Grounded** | [[paper]](https://arxiv.org/abs/2401.01614) | [code]

	- [2023/12/21] **Team Flow at DRC2023: Building Common Ground and Text-based Turn-taking in a Travel Agent Spoken Dialogue System** | [[paper]](https://arxiv.org/abs/2312.13816) | [code]

	- [2023/12/21] **AppAgent: Multimodal Agents as Smartphone Users** | [[paper]](https://arxiv.org/abs/2312.13771) | [[code]](https://github.com/mnotgod96/AppAgent)

	- [2023/12/18] **CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update** | [[paper]](https://arxiv.org/abs/2312.10908) | [[code]](https://clova-tool.github.io/)

	- [2023/12/14] **CogAgent: A Visual Language Model for GUI Agents** | [[paper]](https://arxiv.org/abs/2312.08914) | [code]

	- [2023/11/19] **TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems** | [[paper]](https://arxiv.org/abs/2311.11315) | [code]

	- [2023/10/18] **MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models** | [[paper]](https://arxiv.org/abs/2310.11954) | [code]

	- [2023/10/13] **AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems** | [[paper]](https://arxiv.org/abs/2310.09233) | [code]

	- [2023/10/12] **A Zero-Shot Language Agent for Computer Control with Structured Reflection** | [[paper]](https://arxiv.org/abs/2310.08740) | [code]

	- [2023/09/02] **ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models** | [[paper]](https://arxiv.org/abs/2309.00986) | [[code]](https://github.com/modelscope/modelscope-agent)

	- [2023/08/07] **TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents** | [[paper]](https://arxiv.org/abs/2308.03427) | [code]

	- [2023/06/05] **When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm** | [[paper]](https://arxiv.org/abs/2306.02552) | [[code]](https://github.com/RUC-GSAI/YuLan-Rec)

---

- Others

---

## :star: Star History

[![Star History Chart](https://api.star-history.com/svg?repos=AGI-Edgerunners/LLM-Agents-Papers&type=Date)](https://star-history.com/#AGI-Edgerunners/LLM-Agents-Papers&Date)