https://github.com/kohjingyu/search-agents

Code for the paper 🌳 Tree Search for Language Model Agents

agents llms machine-learning

Host: GitHub
URL: https://github.com/kohjingyu/search-agents
Owner: kohjingyu
License: mit
Created: 2024-06-18T06:20:53.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-07-25T01:54:49.000Z (almost 2 years ago)
Last Synced: 2024-07-25T03:14:44.209Z (almost 2 years ago)
Topics: agents, llms, machine-learning
Language: Python
Homepage: https://jykoh.com/search-agents
Size: 30.9 MB
Stars: 106
Watchers: 3
Forks: 10
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff

Awesome Lists containing this project

awesome-ui-agents - code
awesome-ai-agents - kohjingyu/search-agents

README

# Tree Search for Language Model Agents

[Website](https://jykoh.com/search-agents)
[Paper](https://arxiv.org/abs/2407.01476)

![Overview](media/search_overview.gif)

We propose an inference-time tree search algorithm to enable language model agents to perform exploration and multi-step planning in interactive web environments. This repository demonstrates how to run our method on the [VisualWebArena](https://jykoh.com/vwa) and [WebArena](https://webarena.dev/) benchmarks.
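
To make the idea concrete, here is a minimal sketch of a best-first tree search on a toy problem. It is not the repository's implementation: `best_first_search`, `expand`, `value_fn`, the `budget` and `max_depth` parameters, and the toy number task are all illustrative assumptions. In the web setting, a state would be a page, `expand` would sample candidate actions from the language model, and `value_fn` would be a model-based estimate of how close the state is to completing the task.

```python
import heapq
import itertools
from typing import Callable, Iterable, List, Tuple


def best_first_search(
    start: int,
    expand: Callable[[int], Iterable[int]],
    value_fn: Callable[[int], float],
    budget: int = 20,
    max_depth: int = 5,
) -> Tuple[int, float]:
    """Return the highest-value state found within `budget` expansions."""
    counter = itertools.count()  # tie-breaker so heap entries stay orderable
    best_state, best_value = start, value_fn(start)
    # heapq is a min-heap, so values are negated to pop the best state first.
    frontier: List[Tuple[float, int, int, int]] = [(-best_value, next(counter), 0, start)]
    expansions = 0
    while frontier and expansions < budget:
        neg_value, _, depth, state = heapq.heappop(frontier)
        value = -neg_value
        if value > best_value:
            best_state, best_value = state, value
        if best_value >= 1.0:  # assumption: a value of 1.0 means "task solved"
            break
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        expansions += 1
        for child in expand(state):
            heapq.heappush(frontier, (-value_fn(child), next(counter), depth + 1, child))
    return best_state, best_value


# Toy task (hypothetical): reach the number 7 using "+1" and "*2" actions.
def toy_expand(n: int) -> List[int]:
    return [n + 1, n * 2]


def toy_value(n: int) -> float:
    return 1.0 / (1.0 + abs(7 - n))


print(best_first_search(3, toy_expand, toy_value))  # (7, 1.0)
```

The search always expands the most promising state seen so far, which lets it back out of a poor branch instead of committing to a single trajectory.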

## TODOs
- [ ] Add other options besides gpt-4o for the value function

## News
- [07/24/2024]: Released [trajectories](#agent-trajectories) of the gpt-4o agent.
- [06/19/2024]: GitHub repo released.

## Install
```bash
# Python 3.10 or 3.11 recommended
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install
pip install -e .
```

## End-to-end Evaluation on (V)WA
1. Set up the standalone environments.
Please check out [this page](environment_docker/README.md) for details.

2. Configure the URLs for each website.
First, set `DATASET` to `visualwebarena`:
```bash
export DATASET=visualwebarena
```
Then, set the URL for each website, prefixing each port below with the hostname or IP address where that site is served:

```bash
export CLASSIFIEDS=":9980"
export CLASSIFIEDS_RESET_TOKEN="4b61655535e7ed388f0d40a93600254c" # Default reset token for classifieds site, change if you edited its docker-compose.yml
export SHOPPING=":7770"
export REDDIT=":9999"
export WIKIPEDIA=":8888"
export HOMEPAGE=":4399"
```

If you want to run on the WebArena tasks instead, make sure to also set up the [CMS](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md#e-commerce-content-management-system-cms), [GitLab](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md#gitlab-website), and [map](https://github.com/web-arena-x/webarena/blob/main/environment_docker/README.md#map) environments, and then set their respective environment variables:
```bash
export DATASET=webarena
export SHOPPING_ADMIN=":7780/admin"
export GITLAB=":8023"
export MAP=":3000"
```

3. Generate config files for each test example:
```bash
python scripts/generate_test_data.py
```
You will see `*.json` files generated in the [config_files](./config_files) folder. Each file contains the configuration for one test example.

4. Obtain and save the auto-login cookies for all websites:
```bash
bash prepare.sh
```

5. Set up API keys.

If using OpenAI models, set a valid OpenAI API key (starting with `sk-`) as the environment variable:
```bash
export OPENAI_API_KEY=your_key
```

6. Launch the evaluation. For example, to reproduce our GPT-4o + Search agent, you can run the script provided:

```bash
bash scripts/run_vwa_shopping_search.sh
```

This script runs the search agent with the default hyperparameters from our paper on the full set of VWA shopping tasks. The other bash scripts in `scripts/` similarly reproduce the results on the other VWA sites and the text-only WA environment. Note that the baselines that include a captioning model run on GPU by default (e.g., BLIP-2-T5XL as the captioning model takes up approximately 12GB of GPU VRAM).

By default, the scripts run the agents with search. To reproduce the baseline results (without search), set `--agent_type prompt` when executing `run.py`.
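
Before launching a long run, it can help to confirm that the variables from steps 2 and 5 are set. The following is a minimal pre-flight sketch, not a script shipped with this repository: `missing_env_vars` and `REQUIRED_BY_DATASET` are hypothetical names, the variable lists are copied from the `export` lines above, and the check only tests that each variable is non-empty, not that the site is reachable.

```python
import os
from typing import Dict, List, Mapping

# Variable names copied from the export lines in this README.
# Assumption: the WebArena entry lists only the variables this README names
# for that dataset; a WebArena run may need other site variables as well.
REQUIRED_BY_DATASET: Dict[str, List[str]] = {
    "visualwebarena": [
        "CLASSIFIEDS",
        "CLASSIFIEDS_RESET_TOKEN",
        "SHOPPING",
        "REDDIT",
        "WIKIPEDIA",
        "HOMEPAGE",
    ],
    "webarena": ["SHOPPING_ADMIN", "GITLAB", "MAP"],
}


def missing_env_vars(env: Mapping[str, str], need_openai: bool = True) -> List[str]:
    """Return the names of expected variables that are unset or empty in `env`."""
    dataset = env.get("DATASET", "")
    if dataset not in REQUIRED_BY_DATASET:
        return ["DATASET"]  # unset, or not one of the two known dataset names
    names = list(REQUIRED_BY_DATASET[dataset])
    if need_openai:
        names.append("OPENAI_API_KEY")
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    print("All set." if not missing else "Missing: " + ", ".join(missing))
```

Pass `need_openai=False` when the agent talks to a non-OpenAI backend.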

### Running Llama-3 models

If you wish to run the Llama-3 models used in our paper, first set up a [vLLM OpenAI-compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html). Then, update the `OPENAI_BASE_URL` environment variable in `scripts/run_llama_vwa_shopping_search.sh` to point to the URL where the vLLM server is running. This particular script shows how to run the Llama-3 agent on the VWA shopping environment; it is otherwise very similar to the OpenAI scripts for the other environments.
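
One quick way to confirm the server is reachable before a run is to list the models it serves. This is a minimal sketch using only the Python standard library; `models_endpoint` and `list_served_models` are hypothetical helper names, and it assumes the server exposes the OpenAI-style model-listing route under the base URL and does not require an API key.

```python
import json
import urllib.request
from typing import List


def models_endpoint(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def list_served_models(base_url: str, timeout: float = 5.0) -> List[str]:
    """Query the server and return the IDs of the models it reports."""
    with urllib.request.urlopen(models_endpoint(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumption: the response follows the OpenAI list format,
    # i.e. {"object": "list", "data": [{"id": ...}, ...]}.
    return [model["id"] for model in payload["data"]]
```

For example, calling `list_served_models` with the same value you set for `OPENAI_BASE_URL` should return the name of the Llama-3 model the server was started with; a connection error suggests the URL or port is wrong.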

## Agent Trajectories

We release the agent trajectories and results of the gpt-4o agent (with gpt-4o as the value function) [here](https://drive.google.com/file/d/127GqJ19qxpAcWlUKXlr5zBeAIW5Pi_0H/view). They are saved in the format specified in [run.py](run.py).

## Citation
If you find our methods or code useful, please consider citing our paper:
```
@article{koh2024tree,
title={Tree Search for Language Model Agents},
author={Koh, Jing Yu and McAleer, Stephen and Fried, Daniel and Salakhutdinov, Ruslan},
journal={arXiv preprint arXiv:2407.01476},
year={2024}
}
```

## Acknowledgements

Our code is heavily based on the VisualWebArena and WebArena codebases.
