https://github.com/Alibaba-NLP/WebWalker

🌐 WebWalker: Benchmarking LLMs in Web Traversal

agent alibaba artificial-intelligence information-seeking llm multi-agent rag web-agent

Last synced: 5 months ago

Host: GitHub
URL: https://github.com/Alibaba-NLP/WebWalker
Owner: Alibaba-NLP
Created: 2025-01-09T11:07:35.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-01-21T16:41:30.000Z (5 months ago)
Last Synced: 2025-01-21T17:37:18.231Z (5 months ago)
Topics: agent, alibaba, artificial-intelligence, information-seeking, llm, multi-agent, rag, web-agent
Language: Python
Homepage: https://alibaba-nlp.github.io/WebWalker/
Size: 32 MB
Stars: 164
Watchers: 6
Forks: 10
Open Issues: 1
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

StarryDivineSky - Alibaba-NLP/WebWalker
awesome-llm-agent - WebWalker


WebWalker: Benchmarking LLMs in Web Traversal

_**Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang**_

_**Linhai Zhang, Yulan He, Deyu Zhou, Pengjun Xie, Fei Huang**_

_Tongyi Lab, Alibaba Group_

👏 Welcome to try web traversal via our **[ModelScope online demo](https://www.modelscope.cn/studios/iic/WebWalker/)** or **[🤗 HuggingFace online demo](https://huggingface.co/spaces/callanwu/WebWalker)**!

[🤖Project]
[📄Paper]
[🚩Citation]

Repo for [_WebWalker: Benchmarking LLMs in Web Traversal_](https://arxiv.org/pdf/2501.07572)

# 📖 Quick Start

- 🌏 The **Online Demo** is now available at [ModelScope](https://www.modelscope.cn/studios/jialongwu/WebWalker/) and [HuggingFace](https://huggingface.co/spaces/callanwu/WebWalker)!

- 🤗 The **WebWalkerQA** dataset is available at [HuggingFace Datasets](https://huggingface.co/datasets/callanwu/WebWalkerQA)!

- 🤗 The **WebWalkerQA** Leaderboard is available at [HuggingFace Space](https://huggingface.co/spaces/callanwu/WebWalkerQALeadeboard)!

# 📌 Introduction

- We construct a challenging benchmark, **WebWalkerQA**, which consists of **680** queries from four real-world scenarios, spanning over **1373** webpages.
- To tackle web-navigation tasks that require long context, we propose **WebWalker**, a multi-agent framework for effective memory management.
- Extensive experiments show that WebWalkerQA is **challenging** and that, for information-seeking tasks, **vertical exploration** within the page proves beneficial.

# 📚 WebWalkerQA Dataset

Each JSON item in the WebWalkerQA dataset is organized in the following format:

```json
{
  "Question": "When is the paper submission deadline for the ACL 2025 Industry Track, and what is the venue address for the conference?",
  "Answer": "The paper submission deadline for the ACL 2025 Industry Track is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
  "Root_Url": "https://2025.aclweb.org/",
  "Info": {
    "Hop": "multi-source",
    "Domain": "Conference",
    "Language": "English",
    "Difficulty_Level": "Medium",
    "Source_Website": [
      "https://2025.aclweb.org/calls/industry_track/",
      "https://2025.aclweb.org/venue/"
    ],
    "Golden_Path": ["root->call>student_research_workshop", "root->venue"]
  }
}
```
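To work with items programmatically, one option is to parse each record into a typed object. The sketch below is a minimal illustration and is not part of the WebWalker codebase: the `WebWalkerQAItem` class, its snake_case attribute names, and the `is_multi_source` helper are hypothetical, and it assumes each record carries the nested `Info` layout shown above.

```python
from dataclasses import dataclass, field


@dataclass
class WebWalkerQAItem:
    """Hypothetical typed view of one WebWalkerQA record."""

    question: str
    answer: str
    root_url: str
    hop: str
    domain: str
    language: str
    difficulty: str
    source_websites: list[str] = field(default_factory=list)
    golden_paths: list[str] = field(default_factory=list)

    @classmethod
    def from_dict(cls, record: dict) -> "WebWalkerQAItem":
        # Per-query metadata is nested under the "Info" key.
        info = record["Info"]
        return cls(
            question=record["Question"],
            answer=record["Answer"],
            root_url=record["Root_Url"],
            hop=info["Hop"],
            domain=info["Domain"],
            language=info["Language"],
            difficulty=info["Difficulty_Level"],
            # Copy the lists so the object does not alias the raw record.
            source_websites=list(info.get("Source_Website", [])),
            golden_paths=list(info.get("Golden_Path", [])),
        )

    @property
    def is_multi_source(self) -> bool:
        # "multi-source" items draw on several pages (two in the example above).
        return self.hop == "multi-source"


sample = {
    "Question": "When is the paper submission deadline for the ACL 2025 Industry Track, and what is the venue address for the conference?",
    "Answer": "The paper submission deadline for the ACL 2025 Industry Track is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
    "Root_Url": "https://2025.aclweb.org/",
    "Info": {
        "Hop": "multi-source",
        "Domain": "Conference",
        "Language": "English",
        "Difficulty_Level": "Medium",
        "Source_Website": [
            "https://2025.aclweb.org/calls/industry_track/",
            "https://2025.aclweb.org/venue/",
        ],
        "Golden_Path": ["root->call>student_research_workshop", "root->venue"],
    },
}

item = WebWalkerQAItem.from_dict(sample)
print(item.domain, item.is_multi_source)  # → Conference True
```

Using `.get` with empty-list defaults for the two list fields is a defensive choice in case some records omit them; the remaining keys are treated as required and raise `KeyError` if absent.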


You can load the dataset via the following code:

```python
from datasets import load_dataset
ds = load_dataset("callanwu/WebWalkerQA", split="main")
```
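Iterating over a 🤗 `Dataset` yields one dict per row, so simple statistics can be computed in plain Python. The helper below is a hypothetical sketch (the name `tally_info_field` is ours), assuming each row keeps the nested `Info` structure shown in the JSON example; if the loaded column is flattened differently, the key lookup would need adjusting. The toy rows are illustrative values, not drawn from the dataset.

```python
from collections import Counter
from typing import Iterable, Mapping


def tally_info_field(records: Iterable[Mapping], key: str) -> Counter:
    """Count records by a field nested under "Info", e.g. "Domain" or "Difficulty_Level"."""
    return Counter(record["Info"][key] for record in records)


# Toy rows standing in for `ds`; with the real dataset you could pass `ds` directly.
rows = [
    {"Info": {"Domain": "Conference", "Difficulty_Level": "Medium"}},
    {"Info": {"Domain": "Conference", "Difficulty_Level": "Easy"}},
    {"Info": {"Domain": "Organization", "Difficulty_Level": "Medium"}},
]
print(tally_info_field(rows, "Difficulty_Level"))
```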

Additionally, we provide a collection of approximately 14k silver QA pairs, which have not yet been carefully human-verified.
You can load the silver dataset by changing the split to `silver`.

## 💡 Performance

### 📊 Results on Web Agents

The performance of web agents is shown below:

### 📊 Results on RAG Systems

🤗 The WebWalkerQA Leaderboard is available at [HuggingFace](https://huggingface.co/spaces/callanwu/WebWalkerQALeadeboard)!

🚩 Welcome to submit your method to the leaderboard!

# 🛠 Dependencies

```bash
conda create -n webwalker python=3.10
conda activate webwalker
git clone https://github.com/alibaba-nlp/WebWalker.git
cd WebWalker
pip install -e .
# Install requirements
pip install -r requirements.txt
# Run post-installation setup
crawl4ai-setup
# Verify your installation
crawl4ai-doctor
```
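As a complementary sanity check to `crawl4ai-doctor`, one could confirm that the interpreter matches the `python=3.10` environment created above and that the main packages are importable. This is a hypothetical helper, not part of the repository; the module names in `DEFAULT_MODULES` (`crawl4ai`, `streamlit`, `datasets`, `qwen_agent`) are assumptions about what the code imports and should be adjusted to match `requirements.txt`.

```python
import importlib.util
import sys
from typing import Iterable

# Assumed import names; adjust to whatever requirements.txt actually installs.
DEFAULT_MODULES = ("crawl4ai", "streamlit", "datasets", "qwen_agent")


def check_environment(modules: Iterable[str] = DEFAULT_MODULES) -> dict[str, bool]:
    """Report whether each top-level module can be located, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # The conda environment above pins Python 3.10.
    if sys.version_info[:2] != (3, 10):
        print(f"Warning: running Python {sys.version_info.major}.{sys.version_info.minor}, expected 3.10")
    for name, found in check_environment().items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```

`importlib.util.find_spec` only locates the module, so the check stays fast and avoids side effects from importing heavy packages.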

### 💻 Running WebWalker Demo Locally

🔑 Before running, export either your OpenAI API key and base URL, or your DashScope API key, as environment variables:

```bash
export OPEN_AI_API_KEY=YOUR_API_KEY
export OPEN_AI_API_BASE_URL=YOUR_API_BASE_URL
```

```bash
export DASHSCOPE_API_KEY=YOUR_API_KEY
```

> You can also use other APIs supported by Qwen-Agent; for details, please refer to [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent/tree/main/qwen_agent/llm). To configure the API key, modify lines 44-53 of [`src/app.py`](https://github.com/Alibaba-NLP/WebWalker/blob/main/src/app.py#L44-L53).
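The two sets of variables above are alternatives. The sketch below shows one way to tell which credentials are present; `detect_backend` is hypothetical, and its precedence (OpenAI-compatible first, then DashScope) is our assumption rather than what `src/app.py` does, so check lines 44-53 of that file for the actual logic.

```python
import os
from typing import Mapping, Optional


def detect_backend(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return "openai", "dashscope", or None depending on the exported variables.

    The precedence order here is an assumption, not taken from src/app.py.
    """
    # The README exports both a key and a base URL for the OpenAI-style endpoint.
    if env.get("OPEN_AI_API_KEY") and env.get("OPEN_AI_API_BASE_URL"):
        return "openai"
    if env.get("DASHSCOPE_API_KEY"):
        return "dashscope"
    return None


if __name__ == "__main__":
    backend = detect_backend()
    if backend is None:
        print("No credentials found: export OPEN_AI_API_KEY and OPEN_AI_API_BASE_URL, or DASHSCOPE_API_KEY.")
    else:
        print(f"Using {backend} credentials")
```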

Then, run the `app.py` file with Streamlit:

```bash
cd src
streamlit run app.py
```

### Running RAG-System on WebWalkerQA

```bash
cd src
python rag_system.py --api_name [API_NAME] --output_file [OUTPUT_PATH]
```

Details of the environment setup can be found in the [README.md](./src/README.md) in the `src` folder.

# 🔍 Evaluation

To evaluate the accuracy of the output answers with GPT-4, run the evaluation script as follows:

```bash
cd src
python evaluate.py --input_path [INPUT_PATH] --output_path [OUTPUT_PATH]
```
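The output format of `evaluate.py` is not described here. As a hypothetical illustration of the metric only, suppose each judged item were reduced to a group label (for example its difficulty level) and a boolean verdict from the GPT-4 judge; accuracy is then the fraction of items judged correct, optionally broken down per group. The function name and input format below are assumptions.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def accuracy_by_group(judgements: Iterable[Tuple[str, bool]]) -> dict[str, float]:
    """Fraction of items judged correct per group label (hypothetical input format)."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for group, is_correct in judgements:
        totals[group] += 1
        correct[group] += int(is_correct)
    # Groups appear in the order they were first seen.
    return {group: correct[group] / totals[group] for group in totals}


judged = [("Easy", True), ("Easy", False), ("Medium", True), ("Medium", True)]
print(accuracy_by_group(judged))  # → {'Easy': 0.5, 'Medium': 1.0}
```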

## 🌻Acknowledgement

- This work is built on [ReAct](https://github.com/ysymyth/ReAct), [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent), and [LangChain](https://github.com/langchain-ai/langchain). Sincere thanks for their efforts.
- We sincerely thank the contributors and maintainers of [crawl4ai](https://github.com/unclecode/crawl4ai) for their open-source tool ❤️, which helped us retrieve web pages in a Markdown-like format.
- This repo is contributed by [Jialong Wu](https://callanwu.github.io/). If you have any questions, please feel free to get in touch via [email protected] or [email protected], or create an issue.

## 🚩Citation

If this work is helpful, please kindly cite as:

```bibtex
@misc{wu2025webwalker,
      title={WebWalker: Benchmarking LLMs in Web Traversal},
      author={Jialong Wu and Wenbiao Yin and Yong Jiang and Zhenglin Wang and Zekun Xi and Runnan Fang and Deyu Zhou and Pengjun Xie and Fei Huang},
      year={2025},
      eprint={2501.07572},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.07572},
}
```

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Alibaba-NLP/WebWalker&type=Date)](https://star-history.com/#Alibaba-NLP/WebWalker&Date)
