https://github.com/kohjingyu/search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
agents llms machine-learning
- Host: GitHub
- URL: https://github.com/kohjingyu/search-agents
- Owner: kohjingyu
- License: mit
- Created: 2024-06-18T06:20:53.000Z
- Default Branch: main
- Last Pushed: 2024-07-25T01:54:49.000Z
- Last Synced: 2024-07-25T03:14:44.209Z
- Topics: agents, llms, machine-learning
- Language: Python
- Homepage: https://jykoh.com/search-agents
- Size: 30.9 MB
- Stars: 106
- Watchers: 3
- Forks: 10
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
- awesome-ui-agents - code
- awesome-ai-agents - kohjingyu/search-agents
# Tree Search for Language Model Agents

We propose an inference-time tree search algorithm to enable language model agents to perform exploration and multi-step planning in interactive web environments. This repository demonstrates how to run our method on the [VisualWebArena](https://jykoh.com/vwa) and [WebArena](https://webarena.dev/) benchmarks.
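To make the idea concrete, here is a minimal, generic best-first search over abstract states. It is a sketch under stated assumptions, not this repository's implementation: a hypothetical `expand` function stands in for proposing candidate next states (for example, by sampling agent actions), a hypothetical `value` function stands in for the model-based value function that scores each state, and the `budget`, `max_depth`, and `threshold` parameters are illustrative. The search always expands the highest-scoring frontier state and stops once some state reaches the threshold or the expansion budget runs out.

```python
import heapq
import itertools
from typing import Callable, Hashable, Iterable, List, Tuple

State = Hashable


def best_first_search(
    root: State,
    expand: Callable[[State], Iterable[State]],  # hypothetical: proposes child states
    value: Callable[[State], float],  # hypothetical: scores a state, higher is better
    budget: int = 20,  # maximum number of states to expand
    max_depth: int = 5,  # do not grow the tree deeper than this
    threshold: float = 1.0,  # stop as soon as a state scores at least this much
) -> Tuple[State, float, List[State]]:
    """Return (best_state, best_value, path_from_root) found by best-first search."""
    counter = itertools.count()  # tie-breaker so heapq never compares states directly
    root_value = value(root)
    best = (root, root_value, [root])
    # heapq is a min-heap, so store negated values to pop the best state first.
    frontier = [(-root_value, next(counter), root, 0, [root])]
    expanded = 0
    while frontier and expanded < budget and best[1] < threshold:
        _, _, state, depth, path = heapq.heappop(frontier)
        if depth >= max_depth:
            continue  # depth limit reached: keep the node but do not expand it
        expanded += 1
        for child in expand(state):
            child_value = value(child)
            child_path = path + [child]
            if child_value > best[1]:
                best = (child, child_value, child_path)
            heapq.heappush(frontier, (-child_value, next(counter), child, depth + 1, child_path))
    return best
```

For example, on a toy problem whose states are integers, with `expand = lambda n: [n + 1, n * 2]` and `value = lambda n: max(0.0, 1 - abs(10 - n) / 10)`, calling `best_first_search(1, expand, value)` returns state `10` via the path `[1, 2, 4, 8, 9, 10]`.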
## TODOs
- [ ] Add other options besides gpt-4o for the value function
## News
- [07/24/2024]: Released [trajectories](#agent-trajectories) of the gpt-4o agent.
- [06/19/2024]: GitHub repo released.
## Install
```bash
# Python 3.10 or 3.11 recommended
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install
pip install -e .
```
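Since Python 3.10 or 3.11 is recommended, a tiny check like the hypothetical helper below (not part of the repository) can warn you before you install the requirements into a different interpreter.

```python
import sys

# Versions recommended in this README's install instructions.
RECOMMENDED = {(3, 10), (3, 11)}


def is_recommended_python(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version is one of the recommended ones."""
    return tuple(version_info[:2]) in RECOMMENDED


if __name__ == "__main__":
    if not is_recommended_python():
        print(f"Warning: Python {sys.version.split()[0]} detected; 3.10 or 3.11 is recommended.")
```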
## End-to-end Evaluation on (V)WA
1. Set up the standalone environments.
Please check out [this page](environment_docker/README.md) for details.
2. Configure the URLs for each website.
First, set `DATASET` to `visualwebarena`:
```bash
export DATASET=visualwebarena
```
Then, set the URL for each website (each value is the address of the machine hosting that site, followed by the port shown):
```bash
export CLASSIFIEDS=":9980"
export CLASSIFIEDS_RESET_TOKEN="4b61655535e7ed388f0d40a93600254c" # Default reset token for classifieds site, change if you edited its docker-compose.yml
export SHOPPING=":7770"
export REDDIT=":9999"
export WIKIPEDIA=":8888"
export HOMEPAGE=":4399"
```
If you want to run on the WebArena tasks instead, make sure to also set up the [CMS](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md#e-commerce-content-management-system-cms), [GitLab](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md#gitlab-website), and [map](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md#map) environments, and then set their respective environment variables:
```bash
export DATASET=webarena
export SHOPPING_ADMIN=":7780/admin"
export GITLAB=":8023"
export MAP=":3000"
```
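Before generating configs, it can help to confirm that every URL variable for the chosen `DATASET` is actually set. The sketch below is a hypothetical helper, not a script shipped with the repository. Its variable lists mirror the exports in this README; reading the WebArena instructions literally ("also set"), it assumes a WebArena run keeps the VisualWebArena variables and adds `SHOPPING_ADMIN`, `GITLAB`, and `MAP`, so trim the lists if your setup differs.

```python
import os
from typing import List, Mapping

# Variable names copied from the export commands in this README.
VWA_VARS = ["CLASSIFIEDS", "CLASSIFIEDS_RESET_TOKEN", "SHOPPING", "REDDIT", "WIKIPEDIA", "HOMEPAGE"]
# Assumption: a WebArena run keeps the variables above and adds these three.
WA_EXTRA_VARS = ["SHOPPING_ADMIN", "GITLAB", "MAP"]


def required_vars(dataset: str) -> List[str]:
    """Return the URL-related variable names this sketch expects for a dataset."""
    if dataset == "visualwebarena":
        return list(VWA_VARS)
    if dataset == "webarena":
        return VWA_VARS + WA_EXTRA_VARS
    raise ValueError(f"Unknown DATASET: {dataset!r}")


def missing_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """List required variables that are unset or empty for the dataset in env['DATASET']."""
    dataset = env.get("DATASET", "")
    return [name for name in required_vars(dataset) if not env.get(name)]
```

Run in the same shell after the exports, `print(missing_vars())` should print an empty list; any names it prints still need to be exported.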
3. Generate config files for each test example:
```bash
python scripts/generate_test_data.py
```
You will see `*.json` files generated in the [config_files](./config_files) folder. Each file contains the configuration for one test example.
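To sanity-check the generated files, a short script can parse every JSON file under `config_files` and report how many loaded. This is a hypothetical helper: it makes no assumption about the keys inside each config, and it searches recursively because the exact subfolder layout may vary.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_configs(root: str = "config_files") -> Dict[str, Any]:
    """Parse every *.json file under `root` (recursively), keyed by its path relative to root."""
    configs: Dict[str, Any] = {}
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            configs[path.relative_to(root).as_posix()] = json.load(f)
    return configs


if __name__ == "__main__":
    configs = load_configs()
    print(f"Loaded {len(configs)} config files")
```

A file that fails to parse raises `json.JSONDecodeError` with its position, which points directly at a broken config.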
4. Obtain and save the auto-login cookies for all websites:
```bash
bash prepare.sh
```
5. Set up API keys.
If using OpenAI models, set a valid OpenAI API key (starting with `sk-`) as the `OPENAI_API_KEY` environment variable:
```bash
export OPENAI_API_KEY=your_key
```
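To catch a missing or mistyped key before a long run, a quick check like the hypothetical helper below can be run from the same shell. It only checks the format described above (non-empty and starting with `sk-`); it does not contact the OpenAI API, so an expired or revoked key would still pass.

```python
import os
from typing import Mapping, Optional


def openai_key_problem(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Describe what looks wrong with OPENAI_API_KEY, or return None if it looks well-formed."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        return "OPENAI_API_KEY is not set"
    if not key.startswith("sk-"):
        return "OPENAI_API_KEY does not start with 'sk-'"
    return None


if __name__ == "__main__":
    print(openai_key_problem() or "OPENAI_API_KEY looks well-formed")
```

Note that leaving the `your_key` placeholder in place is exactly the kind of mistake this flags.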
6. Launch the evaluation. For example, to reproduce our GPT-4o + Search agent, you can run the script provided:
```bash
bash scripts/run_vwa_shopping_search.sh
```
This script runs the search agent with the default hyperparameters from our paper on the full set of VWA shopping tasks. Note that baselines that include a captioning model run it on the GPU by default (e.g., BLIP-2-T5XL as the captioning model takes up approximately 12GB of GPU VRAM). Similarly, the other bash scripts in `scripts/` reproduce the results on the other VWA sites and the text-only WA environment.
By default, these scripts run the agent with search. To reproduce the baseline results (without search), set `--agent_type prompt` when executing `run.py`.
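If you drive `run.py` from Python, the hypothetical helper below turns an existing argument list into its baseline counterpart by replacing the value of `--agent_type` with `prompt` (or appending the flag if it is absent). Only the `--agent_type prompt` setting comes from this README; the helper and the surrounding arguments are assumptions for illustration.

```python
from typing import List


def with_agent_type(argv: List[str], agent_type: str = "prompt") -> List[str]:
    """Return a copy of argv whose --agent_type value is replaced, or appended if absent."""
    out: List[str] = []
    replaced = False
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--agent_type" and i + 1 < len(argv):
            out += [arg, agent_type]  # "--agent_type VALUE" form: swap the value
            replaced = True
            i += 2
            continue
        if arg.startswith("--agent_type="):
            out.append(f"--agent_type={agent_type}")  # "--agent_type=VALUE" form
            replaced = True
        else:
            out.append(arg)
        i += 1
    if not replaced:
        out += ["--agent_type", agent_type]
    return out
```

For example, `subprocess.run(with_agent_type(cmd), check=True)` would launch the baseline variant of an existing command list `cmd` while leaving every other argument unchanged.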
### Running Llama-3 models
If you wish to run the Llama-3 models used in our paper, first set up a [vLLM OpenAI compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). Then, update the `OPENAI_BASE_URL` environment variable in `scripts/run_llama_vwa_shopping_search.sh` to point to the URL the vLLM server is running on. This particular script runs the Llama-3 agent on the VWA shopping environment; it is otherwise very similar to the OpenAI scripts for the other environments.
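Before launching the script, it can be worth confirming that the vLLM server is reachable at the URL you configured. The sketch below uses only the Python standard library and is not part of the repository; the helper names are hypothetical, and it assumes `OPENAI_BASE_URL` includes the `/v1` prefix of the OpenAI-compatible API (for example `http://localhost:8000/v1`, port 8000 being vLLM's default). It queries the `/models` route and returns the served model IDs.

```python
import json
import urllib.request
from typing import List


def models_url(base_url: str) -> str:
    """Join an OpenAI-compatible base URL (e.g. http://host:8000/v1) with the /models route."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str, timeout: float = 10.0) -> List[str]:
    """Return the model IDs reported by an OpenAI-compatible server such as vLLM."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # OpenAI-style listings look like {"object": "list", "data": [{"id": ...}, ...]}.
    return [model["id"] for model in payload.get("data", [])]
```

For example, `list_served_models(os.environ["OPENAI_BASE_URL"])` (after `import os`) should return the Llama-3 model the server was started with. If the server was launched with an API key, this unauthenticated request may be rejected, and you would need to add an `Authorization` header.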
## Agent Trajectories
We release the agent trajectories and results of the gpt-4o agent (with gpt-4o as the reward function) [here](https://drive.google.com/file/d/127GqJ19qxpAcWlUKXlr5zBeAIW5Pi_0H/view). They are saved in the same format specified in [run.py](run.py).
## Citation
If you find our methods or code useful, please consider citing our paper:
```bibtex
@article{koh2024tree,
title={Tree Search for Language Model Agents},
author={Koh, Jing Yu and McAleer, Stephen and Fried, Daniel and Salakhutdinov, Ruslan},
journal={arXiv preprint arXiv:2407.01476},
year={2024}
}
```
## Acknowledgements
Our code is heavily based on the VisualWebArena and WebArena codebases.