{"id":15632355,"url":"https://github.com/xingyaoww/code-act","last_synced_at":"2025-05-16T13:07:40.854Z","repository":{"id":220483207,"uuid":"742738613","full_name":"xingyaoww/code-act","owner":"xingyaoww","description":"Official Repo for ICML 2024 paper \"Executable Code Actions Elicit Better LLM Agents\" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji.","archived":false,"fork":false,"pushed_at":"2024-05-23T23:14:59.000Z","size":25300,"stargazers_count":1055,"open_issues_count":9,"forks_count":86,"subscribers_count":13,"default_branch":"main","last_synced_at":"2025-04-12T14:55:55.095Z","etag":null,"topics":["llm","llm-agent","llm-finetuning","llm-framework"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xingyaoww.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-13T08:06:04.000Z","updated_at":"2025-04-12T14:41:53.000Z","dependencies_parsed_at":"2024-10-23T02:01:49.095Z","dependency_job_id":null,"html_url":"https://github.com/xingyaoww/code-act","commit_stats":null,"previous_names":["xingyaoww/code-act"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xingyaoww%2Fcode-act","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xingyaoww%2Fcode-act/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xingyaoww%2Fcode-act/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/xingyaoww%2Fcode-act/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/xingyaoww","download_url":"https://codeload.github.com/xingyaoww/code-act/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254535829,"owners_count":22087399,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["llm","llm-agent","llm-finetuning","llm-framework"],"created_at":"2024-10-03T10:43:41.077Z","updated_at":"2025-05-16T13:07:40.836Z","avatar_url":"https://github.com/xingyaoww.png","language":"Python","readme":"\u003ch1 align=\"center\"\u003e Executable Code Actions Elicit Better LLM Agents \u003c/h1\u003e\n\n\u003cp align=\"center\"\u003e\n\u003ca href=\"https://arxiv.org/abs/2402.01030\"\u003e📃 Paper\u003c/a\u003e\n•\n\u003ca href=\"https://huggingface.co/datasets/xingyaoww/code-act\" \u003e🤗 Data (CodeActInstruct)\u003c/a\u003e\n•\n\u003ca href=\"https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1\" \u003e🤗 Model (CodeActAgent-Mistral-7b-v0.1)\u003c/a\u003e\n•\n\u003ca href=\"https://chat.xwang.dev/\"\u003e🤖 Chat with CodeActAgent!\u003c/a\u003e\n\u003c/p\u003e\n\nWe propose to use executable **code** to consolidate LLM agents’ **act**ions into a unified action space (**CodeAct**).\nIntegrated with a Python interpreter, CodeAct can execute code actions and dynamically revise prior actions or emit new actions upon new observations (e.g., code execution results) through multi-turn interactions (check out [this example!](https://chat.xwang.dev/r/Vqn108G)).\n\n![Overview](figures/overview.png)\n\n## 
News\n\n**Apr 10, 2024**: CodeActAgent Mistral is [officially available at `ollama`](https://ollama.com/xingyaow/codeact-agent-mistral)!\n\n**Mar 11, 2024**: We also add [llama.cpp](https://github.com/ggerganov/llama.cpp) support for running CodeActAgent inference on a laptop (tested on macOS); check out the instructions [here](#using-llamacpp-for-laptop)!\n\n**Mar 11, 2024**: We now support serving all of CodeActAgent's components (LLM serving, code executor, MongoDB, Chat-UI) via Kubernetes ⎈! Check out [this guide](docs/KUBERNETES_DEPLOY.md)!\n\n**Feb 2, 2024**: CodeAct is released!\n\n## Why CodeAct?\n\nOur extensive analysis of 17 LLMs on API-Bank and a newly curated benchmark [M\u003csup\u003e3\u003c/sup\u003eToolEval](docs/EVALUATION.md) shows that CodeAct outperforms widely used alternatives like Text and JSON (up to a 20% higher success rate). Please check our paper for a more detailed analysis!\n\n![Comparison between CodeAct and Text/JSON](figures/codeact-comparison-table.png)\n*Comparison between CodeAct and Text / JSON as action.*\n\n![Comparison between CodeAct and Text/JSON](figures/codeact-comparison-perf.png)\n*Quantitative results comparing CodeAct and {Text, JSON} on M\u003csup\u003e3\u003c/sup\u003eToolEval.*\n\n\n## 📁 CodeActInstruct\n\nWe collect an instruction-tuning dataset, CodeActInstruct, consisting of 7k multi-turn interactions using CodeAct. The dataset is released at [huggingface dataset 🤗](https://huggingface.co/datasets/xingyaoww/code-act). Please refer to the paper and [this section](#-data-generation-optional) for details of data collection.\n\n\n![Data Statistics](figures/data-stats.png)\n*Dataset Statistics. Token statistics are computed using the Llama-2 tokenizer.*\n\n## 🪄 CodeActAgent\n\nTrained on **CodeActInstruct** and general conversations, **CodeActAgent** excels at out-of-domain agent tasks compared to open-source models of the same size, while not sacrificing generic performance (e.g., knowledge, dialog). 
We release two variants of CodeActAgent:\n- **CodeActAgent-Mistral-7b-v0.1** (recommended, [model link](https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1)): uses Mistral-7b-v0.1 as the base model with a 32k context window.\n- **CodeActAgent-Llama-7b** ([model link](https://huggingface.co/xingyaoww/CodeActAgent-Llama-2-7b)): uses Llama-2-7b as the base model with a 4k context window.\n\n![Model Performance](figures/model-performance.png)\n*Evaluation results for CodeActAgent. ID and OD stand for in-domain and out-of-domain evaluation, respectively. Overall averaged performance normalizes the MT-Bench score to be consistent with other tasks and excludes in-domain tasks for a fair comparison.*\n\n\nPlease check out [:page_with_curl: our paper](TODO) for more details about data collection, model training, and evaluation!\n\n\n## 🚀 Use CodeActAgent for Your Application!\n\n\u003cvideo src=\"https://github.com/xingyaoww/code-act/assets/38853559/62c80ada-62ce-447e-811c-fc801dd4beac\"\u003e \u003c/video\u003e\n*Demo of the chat interface.*\n\nA CodeActAgent system contains the following components:\n\n- **LLM Serving**: We use [vLLM as an example](#serve-the-model-using-vllm-into-openai-compatible-api), but any serving software that can expose the model through an OpenAI-compatible API should work.\n- **Interaction Interface**:\n  - [Chat-UI for the chat interface + MongoDB for chat history](#via-chat-ui)\n  - OR a [simple Python script](#via-simple-python-script)\n- **Code Execution Engine**: This service starts an [API](#start-your-code-execution-engine) that accepts code execution requests from Chat-UI or the Python script, then starts an individual Docker container to execute code for *each* chat session.\n\n🌟 **If you have access to a Kubernetes cluster**: You can follow [our Kubernetes setup guide](docs/KUBERNETES_DEPLOY.md), which lets you spin up all of these components with one command!\n\nFollow the guide below to set up with Docker.\n\n### Serve 
the Model into OpenAI Compatible API\n\n#### Using vLLM via Docker (requires [nvidia-docker](https://github.com/NVIDIA/nvidia-docker))\n\n```bash\n# Download the model first; here is an example for CodeActAgent-Mistral\ncd $YOUR_DIR_TO_DOWNLOADED_MISTRAL_MODEL\ngit lfs install\ngit clone https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1\n./scripts/chat/start_vllm.sh $YOUR_DIR_TO_DOWNLOADED_MISTRAL_MODEL/CodeActAgent-Mistral-7b-v0.1\n# OR\n# ./scripts/chat_ui/start_vllm.sh $YOUR_DIR_TO_DOWNLOADED_LLAMA_MODEL/CodeActAgent-Llama-7b\n```\n\nThis script (Docker required) serves the model on the GPUs specified by `CUDA_VISIBLE_DEVICES` at port `8080`; by default, you can access the model with an `OPENAI_API_BASE` of `http://localhost:8080/v1`. You may check the [OpenAI API's official documentation](https://platform.openai.com/docs/api-reference/chat/create) for detailed instructions. You may also check vLLM's [official documentation](https://vllm.ai/) for more information.\n\n#### Using llama.cpp (for laptop!)\n\nThis is tested on macOS (M2 Max, Ventura 13.6).\n\n**Install llama.cpp**\n```bash\ngit clone https://github.com/ggerganov/llama.cpp.git\n# Optionally create a conda environment for the installation\nconda create -n llamacpp python=3.10\n# Install dependencies for llama.cpp\ncd llama.cpp\nconda activate llamacpp\npip install -r requirements.txt\n# Build (refer to https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#build for more details)\nmake\n```\n\n**(Optional) Convert the Model into [gguf](https://github.com/ggerganov/ggml/blob/master/docs/gguf.md) Format**\n\nYou can skip the following commands by downloading the pre-converted quantized version (q8_0) [here](https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1.q8_0.gguf).\n```bash\n# Download the model if you haven't already\ngit lfs install\ngit clone https://huggingface.co/xingyaoww/CodeActAgent-Mistral-7b-v0.1\n# Assume you are in the directory of llama.cpp\npython convert.py 
./CodeActAgent-Mistral-7b-v0.1 --outtype f16 --outfile CodeActAgent-Mistral-7b-v0.1.f16.gguf\n# (Optional) Quantize for faster inference\n./quantize CodeActAgent-Mistral-7b-v0.1.f16.gguf CodeActAgent-Mistral-7b-v0.1.q8_0.gguf Q8_0\n```\n\n**Serve into an OpenAI-compatible API**\n\nSee [this](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#llamacpp-http-server) for a detailed description of the arguments.\n```bash\n./server -m CodeActAgent-Mistral-7b-v0.1.q8_0.gguf -c 8192 --port 8080\n```\n\nNow you can access the OpenAI-compatible server at `http://localhost:8080/v1` with the model name `CodeActAgent-Mistral-7b-v0.1.q8_0.gguf`. **You need to change the model name from `CodeActAgent-Mistral-7b-v0.1` to `CodeActAgent-Mistral-7b-v0.1.q8_0.gguf` for the interaction interface** in the following section (in the chat-ui configuration file or in the Python script)!\n\n#### (Optional) Test if the OpenAI-compatible API is working\n```bash\ncurl -X POST 'http://localhost:8080/v1/chat/completions' -H 'Content-Type: application/json' -d '{\n  \"model\": \"CodeActAgent-Mistral-7b-v0.1.q8_0.gguf\",\n  \"messages\": [\n      {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n      {\"role\": \"user\", \"content\": \"How to build a website?\"}\n  ]\n}'\n```\n\n\n### Start your Code Execution Engine!\n\nWe implemented a containerized code execution engine based on [JupyterKernelGateway](https://github.com/jupyter-server/kernel_gateway). The idea is to start a Jupyter server inside a [docker container](scripts/chat_ui/code_execution/Dockerfile) *per chat session* to support code execution requests from the model (the session will time out after a fixed period of time). 
It requires Docker to be installed locally.\n\n```bash\n# Start a code execution server on port 8081\n./scripts/chat/code_execution/start_jupyter_server.sh 8081\n```\n\n### Interact with the system!\n\n#### via simple Python script\n\nIf you don't want to spin up a fancy interface and just want to play with it from the command line, we've got you covered!\n\n```bash\n# Make sure you have started the model server (vLLM or llama.cpp) and the code execution engine before running this!\npython3 scripts/chat/demo.py --model_name xingyaoww/CodeActAgent-Mistral-7b-v0.1 --openai_api_base http://$YOUR_API_HOST:$YOUR_API_PORT/v1 --jupyter_kernel_url http://$YOUR_CODE_EXEC_ENGINE_HOST:$YOUR_CODE_EXEC_ENGINE_PORT/execute\n```\n\n\n#### via Chat-UI\n\nIf you've served the model and the code execution engine, you can run your own chat interface just like [this](https://chat.xwang.dev)!\n\nIf you want user management, you may need to start your own MongoDB instance:\n\n```bash\n./scripts/chat/start_mongodb.sh $YOUR_MONGO_DB_PASSWORD\n# The database will be created at `pwd`/data/mongodb and available at localhost:27017\n```\n\nThen, you can configure your `chat-ui` interface.\n\n```bash\ncp chat-ui/.env.template chat-ui/.env.local\n# Modify .env.local to match your configuration by correctly filling in\n# 1. JUPYTER_API_URL\n# 2. model endpoint (search for 'TODO_OPENAI_BASE_URL');\n#    You also need to change the model name to CodeActAgent-Mistral-7b-v0.1.q8_0.gguf if you are using llama.cpp for inference\n# 3. 
MONGODB_URL - You may leave this empty; chat-ui will automatically start a database (but it will be deleted once the container is stopped)\n```\n\nNow you can build and start your own web application (Docker required)!\n```bash\n./scripts/chat/run_chat_ui.sh\n# It starts the interface on localhost:5173 by default\n\n# Run this script for debug mode\n# ./scripts/chat/run_chat_ui_debug.sh\n```\n\nFor more information (e.g., if you don't want to use Docker), please check out chat-ui's [documentation](https://github.com/huggingface/chat-ui)!\n\n\n## 🎥 Reproduce Experiments in the Paper\n\n```bash\ngit clone https://github.com/xingyaoww/code-act\n# To clone all submodules\ngit submodule update --init --recursive\n```\n\n### 📂 Data Generation (Optional)\n\n**Recommended:** You may download the processed **CodeActInstruct** from the [huggingface dataset 🤗](https://huggingface.co/datasets/xingyaoww/code-act).\n\n**For reproducibility:** You can optionally follow the instructions in [docs/DATA_GENERATION.md](docs/DATA_GENERATION.md) to generate the interaction data yourself.\n\n### 📘 Model Training\n\nWe use a fork of [Megatron-LLM](https://github.com/xingyaoww/Megatron-LLM) for training. 
You can follow [docs/MODEL_TRAINING.md](docs/MODEL_TRAINING.md) for detailed instructions.\n\n\n### 📊 Evaluation \n\nPlease refer to [docs/EVALUATION.md](docs/EVALUATION.md) for detailed instructions.\n\n## 📚 Citation\n\n```bibtex\n@inproceedings{wang2024executable,\n      title={Executable Code Actions Elicit Better LLM Agents}, \n      author={Xingyao Wang and Yangyi Chen and Lifan Yuan and Yizhe Zhang and Yunzhu Li and Hao Peng and Heng Ji},\n      year={2024},\n      eprint={2402.01030},\n      booktitle={ICML}\n}\n```\n","funding_links":[],"categories":["Learning","🌐 Resources \u0026 Tools"],"sub_categories":["Repositories","Open-Source Projects"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxingyaoww%2Fcode-act","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxingyaoww%2Fcode-act","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxingyaoww%2Fcode-act/lists"}