{"id":25870693,"url":"https://github.com/llm-evaluation-s-always-fatiguing/leaf-playground","last_synced_at":"2025-03-02T06:20:35.795Z","repository":{"id":214502991,"uuid":"722008176","full_name":"LLM-Evaluation-s-Always-Fatiguing/leaf-playground","owner":"LLM-Evaluation-s-Always-Fatiguing","description":"A framework to build scenario simulation projects where human and LLM based agents can participant in, with a user-friendly web UI to visualize simulation, support automatically evaluation on agent action level.","archived":false,"fork":false,"pushed_at":"2024-06-18T08:18:17.000Z","size":889,"stargazers_count":24,"open_issues_count":1,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-02T05:52:23.805Z","etag":null,"topics":["agent","agent-based-simulation","agents","automation","chatgpt","evaluations","llm-evaluation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LLM-Evaluation-s-Always-Fatiguing.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-22T08:42:27.000Z","updated_at":"2024-10-06T10:47:59.000Z","dependencies_parsed_at":"2024-02-08T08:28:01.664Z","dependency_job_id":"7ca7703f-aa62-4e07-a825-167b64cdbd13","html_url":"https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground","commit_stats":null,"previous_names":["llm-evaluation-s-always-fatiguing/leaf-playground"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLM-Evaluatio
n-s-Always-Fatiguing%2Fleaf-playground","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLM-Evaluation-s-Always-Fatiguing%2Fleaf-playground/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLM-Evaluation-s-Always-Fatiguing%2Fleaf-playground/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LLM-Evaluation-s-Always-Fatiguing%2Fleaf-playground/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/LLM-Evaluation-s-Always-Fatiguing","download_url":"https://codeload.github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241466633,"owners_count":19967495,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent","agent-based-simulation","agents","automation","chatgpt","evaluations","llm-evaluation"],"created_at":"2025-03-02T06:20:34.707Z","updated_at":"2025-03-02T06:20:35.776Z","avatar_url":"https://github.com/LLM-Evaluation-s-Always-Fatiguing.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ca href=\"https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground\"\u003e\n    \u003cimg alt=\"Logo\" src=\"docs/leaf-playground-logo.svg\" width=\"550\"\u003e\n  \u003c/a\u003e\n  \u003cp\u003e\n    \u003ca target=\"_blank\" href=\"https://pypi.org/project/leaf-playground/\"\u003e\n      \u003cimg alt=\"PyPI - Version\" 
src=\"https://img.shields.io/pypi/v/leaf-playground.svg?color=369B5A\u0026labelColor=black\u0026logo=github\u0026style=plastic\"\u003e\n    \u003c/a\u003e\n    \u003cimg alt=\"GitHub commit activity\" src=\"https://img.shields.io/github/commit-activity/w/LLM-Evaluation-s-Always-Fatiguing/leaf-playground?color=369B5A\u0026labelColor=black\u0026logo=github\u0026style=plastic\"\u003e\n    \u003cimg alt=\"PyPI - Downloads\" src=\"https://img.shields.io/pypi/dd/leaf-playground?color=369B5A\u0026labelColor=black\u0026logo=github\u0026style=plastic\"\u003e\n    \u003cimg alt=\"PyPI - Python Version\" src=\"https://img.shields.io/pypi/pyversions/leaf-playground?color=369B5A\u0026labelColor=black\u0026logo=github\u0026style=plastic\"\u003e\n    \u003cimg alt=\"Static Badge\" src=\"https://img.shields.io/badge/node.js-%E2%89%A518.19.0-brightgreen?color=369B5A\u0026labelColor=black\u0026logo=github\u0026style=plastic\"\u003e\n  \u003c/p\u003e\n\u003c/div\u003e\n\n\n## Introduction\n\n**leaf-playground** is a \"definition driven development\" framework for building scenario simulation projects in which human and LLM-based agents can participate together, competing or cooperating with each other. 
It is primarily designed to efficiently evaluate the performance of LLM-based agents at the action level in specific scenarios or tasks, but it also has strong potential for LLM-native applications, such as developing [a language-based game](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-hub/tree/main/who_is_the_spy).\n\nApart from the framework itself, a set of CLI commands is provided to help developers speed up the process of building a scenario simulation project and easily deploy a server with a [web UI](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-webui) where users can create simulation tasks, evaluate agents' performance manually and/or automatically, and visualize the simulation process and evaluation results.\n\nBelow are the sister projects of **leaf-playground**:\n- [**leaf-playground-webui**](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-webui): the implementation of leaf-playground's web UI.\n- [**leaf-playground-hub**](https://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground-hub): hosts our officially implemented scenario simulation projects.\n\n## Features\n\n- **\"Definition Driven Development\"**: advanced syntax for structured scenario definitions and programming conventions.\n- **Human + Multiple Agents**: facilitates interaction between humans and AI agents in designated scenarios.\n- **Auto Evaluation**: automated action-level evaluation and report visualization for AI agents.\n- **Local server support**: one-click local service deployment for managing and executing scenario simulation tasks.\n- **Containerization**: containerization support for running scenario simulation tasks.\n- **Auto-generate projects**: auto-generate and auto-complete code for scenario simulation projects.\n- **Debug Friendly**: supports remote debugging across processes in the PyCharm Professional IDE.\n\n## Installation\n\n### Environment Setup\n\nMake sure you have `Python` and `Node.js` 
installed on your computer; if not, you can set up the environment by following these instructions:\n- install `Python`: we recommend using [miniconda](https://docs.conda.io/projects/miniconda/en/latest/miniconda-install.html) to configure a Python virtual environment.\n- install `Node.js`: you can download and install Node.js from the [official Node.js site](https://nodejs.org/en).\n\n### Quick Install\n**leaf-playground** has been uploaded to PyPI, so you can install it quickly with `pip`:\n```shell\npip install leaf-playground\n```\n\n![Static Badge](https://img.shields.io/badge/introduced%20in-0.5.0-brightgreen?style=plastic) If you want to save data in PostgreSQL instead of SQLite, you need to include the `postgresql` extra dependency:\n\n```shell\npip install leaf-playground[postgresql]\n```\n\n![Static Badge](https://img.shields.io/badge/introduced%20in-0.5.0-brightgreen?style=plastic) If you are a framework or scenario simulation project developer who wants to debug the code, you need to include the `debug` extra dependency:\n\n```shell\npip install leaf-playground[debug]\n```\n\n### Install from source\nTo install **leaf-playground** from source, clone the project with `git clone`, then, in your local `leaf-playground` directory, run:\n```shell\npip install .\n```\n\n## Usage\n\n### Start Server and Create a Task\n\nTo start the server that contains the projects hosted in **leaf-playground-hub**, first clone that project, then, in your local **leaf-playground-hub** directory, use the following CLI command to start the server with the web UI:\n```shell\nleaf-out start-server [--port PORT] [--ui_port UI_PORT]\n```\n\nBy default, the backend service runs on port 8000 and the UI service runs on port 3000; use the `--port` and `--ui_port` options, respectively, to choose different ports.\n\nThe video below demonstrates how to create and run a task that uses the MMLU dataset to evaluate LLM-based 
agents.\n\nhttps://github.com/LLM-Evaluation-s-Always-Fatiguing/leaf-playground/assets/754493/0c980a97-1b7f-4884-bd85-fbdc60121ac8\n\n## Maintainers\n\n[@PanQiWei](https://github.com/panqiwei); [@Pandazki](https://github.com/pandazki).\n\n## Roadmap\n\n### The Framework\n\n- [x] support human participation in the scenario simulation as a dynamic agent\n- [x] run each scenario simulation task in a docker container\n- [x] support managing task status (pause, restart, interrupt, etc.)\n- [ ] support full task data persistence\n  - [x] save task info, logs, and messages in the database\n  - [x] save task results in the database or a remote file system\n  - [ ] support resuming runtime state and information from a checkpoint and continuing execution\n- [ ] support completing projects automatically\n  - [x] complete scene definition automatically\n  - [ ] complete agents automatically\n    - [x] complete agent base classes automatically\n    - [ ] complete specific agent classes automatically\n  - [ ] complete evaluator automatically\n  - [x] complete scene automatically\n- [ ] refactor `ai_backend` to `llm_backend_tools` to remove some heavy dependencies\n- [ ] support streaming agents' responses\n\n### The Hub\n\n- [x] optimize the scene flow of the `who_is_the_spy` project and add metrics and evaluators\n- [ ] create a new project to support using OpenAI [evals](https://github.com/openai/evals)\n- [ ] create a new project to support using Microsoft [promptbench](https://github.com/microsoft/promptbench)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllm-evaluation-s-always-fatiguing%2Fleaf-playground","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fllm-evaluation-s-always-fatiguing%2Fleaf-playground","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fllm-evaluation-s-always-fatiguing%2Fleaf-playground/lists"}