https://github.com/llm-evaluation-s-always-fatiguing/leaf-playground

A framework for building scenario simulation projects in which human and LLM-based agents can participate, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent action level.

agent agent-based-simulation agents automation chatgpt evaluations llm-evaluation


Host: GitHub
URL: https://github.com/llm-evaluation-s-always-fatiguing/leaf-playground
Owner: LLM-Evaluation-s-Always-Fatiguing
License: mit
Created: 2023-11-22T08:42:27.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-06-18T08:18:17.000Z (about 2 years ago)
Last Synced: 2025-03-02T05:52:23.805Z (over 1 year ago)
Topics: agent, agent-based-simulation, agents, automation, chatgpt, evaluations, llm-evaluation
Language: Python
Homepage:
Size: 868 KB
Stars: 24
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

## Introduction

**leaf-playground** is a "definition driven development" framework for building scenario simulation projects in which human and LLM-based agents can participate together, competing or cooperating with each other. It is primarily designed to efficiently evaluate the performance of LLM-based agents at the action level in specific scenarios or tasks, but it also has potential for LLM-native applications, such as developing [a language-based game](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-hub/tree/main/who_is_the_spy).

Apart from the framework itself, a set of CLI commands helps developers speed up building a scenario simulation project and easily deploy a server with a [web UI](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-webui) where users can create simulation tasks, manually and/or automatically evaluate agents' performance, and visualize the simulation process and evaluation results.

Below are sister projects of **leaf-playground**:
- [**leaf-playground-webui**](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-webui): the implementation of leaf-playground's web UI.
- [**leaf-playground-hub**](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-hub): hosts our officially implemented scenario simulation projects.

## Features

- **"Definition Driven Development"**: advanced syntax for structured scenario definitions and programming conventions.
- **Human + Multiple Agents**: facilitates interaction between humans and AI agents in designated scenarios.
- **Auto Evaluation**: automated action-level evaluation and report visualization for AI Agents.
- **Local server support**: one-click local service deployment for managing and executing scenario simulation tasks.
- **Containerization**: containerization support for running scenario simulation tasks.
- **Auto generate projects**: auto-generate and auto-complete code for scenario simulation projects.
- **Debug Friendly**: supports remote debugging across processes in the PyCharm Professional IDE.

## Installation

### Environment Setup

Make sure you have `Python` and `Node.js` installed on your computer. If not, you can set up the environment by following these instructions:
- install `Python`: we recommend using [miniconda](https://docs.conda.io/projects/miniconda/en/latest/miniconda-install.html) to configure a Python virtual environment.
- install `Node.js`: you can download and install Node.js from [Node.js official site](https://nodejs.org/en).
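
As a quick sanity check of the prerequisites above, the following minimal sketch reports the running Python version and whether a `node` executable is on `PATH`. The helper name `missing_tools` is hypothetical and is not part of **leaf-playground**; the sketch only assumes that Node.js installs an executable called `node`.

```python
import shutil
import sys


def missing_tools(tools=("node",)):
    """Return the executables from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # The interpreter running this script is the Python that pip will target.
    print("Python:", sys.version.split()[0])
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Node.js executable found.")
```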

### Quick Install
**leaf-playground** is published on PyPI, so you can install it quickly with `pip`:
```shell
pip install leaf-playground
```

![Static Badge](https://img.shields.io/badge/introduced%20in-0.5.0-brightgreen?style=plastic) If you want to save data in PostgreSQL instead of SQLite, you need to include the `postgresql` extra dependency:

```shell
pip install "leaf-playground[postgresql]"
```

![Static Badge](https://img.shields.io/badge/introduced%20in-0.5.0-brightgreen?style=plastic) If you are a framework or scenario simulation project developer who wants to debug the code, you need to include the `debug` extra dependency:

```shell
pip install "leaf-playground[debug]"
```
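
Because the `postgresql` and `debug` extras are marked as introduced in 0.5.0, it can help to confirm which version is installed. The sketch below uses the standard library's `importlib.metadata`; the helpers `parse_release` and `supports_extras` are hypothetical names for illustration, and the version comparison is deliberately simplified (it only looks at leading numeric components).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple:
    """Turn '0.5.1' into (0, 5, 1); stops at the first non-numeric component."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def supports_extras(dist_name: str = "leaf-playground", minimum: str = "0.5.0") -> bool:
    """Return True if `dist_name` is installed at version `minimum` or newer."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    print("extras available:", supports_extras())
```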

### Install from source
To install **leaf-playground** from source, clone the project with `git clone`, then run the following in your local `leaf-playground` directory:
```shell
pip install .
```

## Usage

### Start Server and Create a Task

To start a server that contains the projects hosted in **leaf-playground-hub**, first clone that project, then, in your local **leaf-playground-hub** directory, use the following CLI command to start the server with the web UI:
```shell
leaf-play start-server [--port PORT] [--ui_port UI_PORT]
```

By default, the backend service runs on port 8000 and the UI service runs on port 3000. Use the `--port` and `--ui_port` options to choose different ports.
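
If the server fails to start, one possible cause is that a default port is already taken. The following minimal sketch, which is not part of **leaf-playground**, checks whether the two default ports can be bound on the loopback interface; `port_is_free` is a hypothetical helper, and the check only reflects the state at the moment it runs.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Binding failed, typically because another process holds the port.
            return False
    return True


if __name__ == "__main__":
    # Defaults stated above: backend on 8000, UI on 3000.
    for label, port in (("backend", 8000), ("UI", 3000)):
        state = "free" if port_is_free(port) else "in use"
        print(f"{label} port {port}: {state}")
```

If a port is reported as in use, pass a different value through `--port` or `--ui_port`.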

The video below demonstrates how to create and run a task that uses the MMLU dataset to evaluate LLM-based agents.

https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground/assets/754493/0c980a97-1b7f-4884-bd85-fbdc60121ac8

## Maintainers

[@PanQiWei](https://github.com/panqiwei); [@Pandazki](https://github.com/pandazki).

## Roadmap

### The Framework

- [x] support human participation in the scenario simulation as a dynamic agent
- [x] run each scenario simulation task in a docker container
- [x] support managing task status (pause, restart, interrupt, etc.)
- [ ] support full task data persistence
  - [x] save task info, logs and messages in database
  - [x] save task results in database or remote file system
  - [ ] support resuming runtime state and information from a checkpoint and continuing execution
- [ ] support completing projects automatically
  - [x] complete scene definition automatically
  - [ ] complete agents automatically
    - [x] complete agent base classes automatically
    - [ ] complete specific agent class automatically
  - [ ] complete evaluator automatically
  - [x] complete scene automatically
- [ ] refactor `ai_backend` to `llm_backend_tools` to remove some heavy dependencies
- [ ] support streaming agents' responses

### The Hub

- [x] optimize scene flow of `who_is_the_spy` project and add metrics and evaluators
- [ ] create a new project to support using OpenAI [evals](https://github.com/openai/evals)
- [ ] create a new project to support using Microsoft [promptbench](https://github.com/microsoft/promptbench)
