https://github.com/osu-nlp-group/explorer
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
https://github.com/osu-nlp-group/explorer
gui-agents machine-learning synthetic-dataset-generation web-agents
Last synced: 4 months ago
JSON representation
[ACL'25 (Findings)] Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents
- Host: GitHub
- URL: https://github.com/osu-nlp-group/explorer
- Owner: OSU-NLP-Group
- License: mit
- Created: 2025-02-21T00:06:49.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-29T03:41:30.000Z (11 months ago)
- Last Synced: 2025-07-29T05:38:32.062Z (11 months ago)
- Topics: gui-agents, machine-learning, synthetic-dataset-generation, web-agents
- Language: Python
- Homepage: https://osu-nlp-group.github.io/Explorer/
- Size: 10.3 MB
- Stars: 11
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Explorer
This is the official codebase for **Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents** [**ACL 2025 (Findings)**]. This project is a collaboration between The Ohio State University and Microsoft Research.
- [๐ Website](https://osu-nlp-group.github.io/Explorer/)
- [๐ Paper](https://aclanthology.org/2025.findings-acl.326.pdf)
## ๐ฆ Training
### Mind2Web-Live
```
cd train/
torchrun --nproc_per_node=4 train_qwen2vl.py \
--use_flash_attention --bf16 \
--train_dir \
--train_data_dir \
--output_dir \
--num_train_epochs 10 \
--batch_size 64 \
--use-google-search
```
### Multimodal-Mind2Web
```
cd train/
torchrun --nproc_per_node=4 train_qwen2vl.py \
--use_flash_attention --bf16 \
--train_dir \
--train_dir_order \
--train_data_dir \
--output_dir \
--num_train_epochs 2 \
--batch_size 64 \
--model_name_or_path Qwen/Qwen2-VL-7B-Instruct \
--use-nogoto-gs-format \
--order_all_steps \
--learning_rate 1e-5
```
## ๐งช Evaluation
### Mind2Web-Live
**Step 1:** Installation
```
conda create --name myenv python=3.12.5
pip install -r evals/mind2web_live_eval/requirements.txt
```
**Step 2:** Start x server and set the DISPLAY environment variable
```
Xvfb :99 -screen 0 1920x1280x16 &
export DISPLAY=:99
export OPENAI_API_KEY=xxxxxxxxxxxx
```
**Step 3:** Run the evaluation script:
```
python -m evals.mind2web_live_eval.evaluate_model --index -1 --planning_text_model {qwen2-vl-7b|phi-3.5v} --toml-path evals/mind2web_live_eval/configs/setting.toml --use-flash-attention --ckpt-path CKPT_PATH --temp 0.01 --log-dir LOG_DIR --viewport-width 1280
```
### Multimodal-Mind2Web
To evaluate the performance of the trained model on the Multimodal-Mind2Web benchmark:
**Step 1:** Installation
```
conda create --name myenv python=3.12.5
pip install -r evals/mind2web_orig_eval/requirements.txt
```
**Step 2:** Download the DeBERTa candidate generation scores from the following link:
[๐ DeBERTa Score File](https://buckeyemailosu-my.sharepoint.com/:u:/g/personal/deng_595_buckeyemail_osu_edu/EZllMua3lABAhXQnCN7-pr4BIP4YV8xPfbgyP5FXT18wag?e=yXkK8k)
**Step 3:** Run the evaluation script:
```
cd evals
python -m mind2web_orig_eval.eval \
--use-flash-attention \
--ckpt-path \
--log-dir \
--score-file \
--split {test_domain|test_task|test_website} \
--model {qwen-7b|phi-3.5}
```
### In-domain evaluation
**Step 1:** Installation
```
conda create --name myenv python=3.12.5
pip install -r evals/in_domain_eval/requirements.txt
```
**Step 2:** Set necessary env variables (`OPENAI_API_KEY` for evaluating API-based models)
```
export OPENAI_API_KEY=xxxxxxxxxxxx
```
**Step 3:** Run the evaluation script:
```
python -u -m evals.in_domain_eval.eval --input-file in_domain_test.json --ckpt-path --use-flash-attention --log-dir --use-spiral
```
Structure of `in_domain_test.json`:
```
[
,
,
...
,
]
```
### MiniWoB++
**Step 1:** Installation
```
conda create --name myenv python=3.12.5
pip install -r evals/miniwob/requirements.txt
```
**Step 2:** Run the evaluation script:
```
bash evals/miniwob/eval-explorer.sh
```
## ๐ Trajectory Synthesis
```
Xvfb :99 -screen 0 1920x1280x16 &
export DISPLAY=:99
export OPENAI_API_KEY=xxxxxxxxxxxx
python -m traj_gen.main \
--model-dir MODEL_DIR \
--init-url INIT_URL \
--max-steps MAX_STEPS
```
## Citation
If you find this work useful, please consider starring our repo and citing our paper:
```
@inproceedings{pahuja-etal-2025-explorer,
title = "Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents",
author = "Pahuja, Vardaan and
Lu, Yadong and
Rosset, Corby and
Gou, Boyu and
Mitra, Arindam and
Whitehead, Spencer and
Su, Yu and
Awadallah, Ahmed Hassan",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.326/",
pages = "6300--6323",
ISBN = "979-8-89176-256-5",
}
```