🌐 WebWalker: Benchmarking LLMs in Web Traversal
https://github.com/Alibaba-NLP/WebWalker

_**Jialong Wu, Wenbiao Yin, Yong Jiang, Zhenglin Wang, Zekun Xi, Runnan Fang**_

_**Linhai Zhang, Yulan He, Deyu Zhou, Pengjun Xie, Fei Huang**_

_Tongyi Lab, Alibaba Group_

👏 Welcome to try web traversal via our **[ModelScope online demo](https://www.modelscope.cn/studios/iic/WebWalker/)** or **[🤗 HuggingFace online demo](https://huggingface.co/spaces/callanwu/WebWalker)**!




Repo for [_WebWalker: Benchmarking LLMs in Web Traversal_](https://arxiv.org/pdf/2501.07572)

# 📖 Quick Start

- 🌐 The **Online Demo** is now available on [ModelScope](https://www.modelscope.cn/studios/jialongwu/WebWalker/) and [HuggingFace](https://huggingface.co/spaces/callanwu/WebWalker)!

- 🤗 The **WebWalkerQA** dataset is available at [HuggingFace Datasets](https://huggingface.co/datasets/callanwu/WebWalkerQA)!

- 🤗 The **WebWalkerQA** Leaderboard is available at [HuggingFace Space](https://huggingface.co/spaces/callanwu/WebWalkerQALeadeboard)!

# 📌 Introduction

- We construct a challenging benchmark, **WebWalkerQA**, which is composed of **680** queries from four real-world scenarios, spanning more than **1373** webpages.
- To tackle the challenge of web-navigation tasks requiring long context, we propose **WebWalker**, which utilizes a multi-agent framework for effective memory management.
- Extensive experiments show that WebWalkerQA is **challenging**, and that for information-seeking tasks, **vertical exploration** within the page proves to be beneficial.
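
To make the idea of vertical exploration with a managed memory more concrete, here is a toy sketch. It is not WebWalker's implementation: the `Page` type, the `explore` function, the in-memory `SITE`, and the keyword filter (a stand-in for an LLM's relevance judgment) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A toy webpage: some text plus outgoing links (label -> target URL)."""
    text: str
    links: dict[str, str] = field(default_factory=dict)


def explore(site, root, keywords, max_depth=3):
    """Follow links depth-first from `root`, keeping only relevant pages in memory."""
    memory = {}      # url -> text of pages judged relevant
    visited = set()  # avoid revisiting pages reachable by several paths

    def visit(url, depth):
        if url in visited or url not in site or depth > max_depth:
            return
        visited.add(url)
        page = site[url]
        # Hypothetical relevance test: a keyword match stands in for an LLM call.
        if any(k.lower() in page.text.lower() for k in keywords):
            memory[url] = page.text
        for target in page.links.values():
            visit(target, depth + 1)

    visit(root, 0)
    return memory


SITE = {
    "root": Page("Welcome to the conference site.",
                 {"Calls": "root/calls", "Venue": "root/venue"}),
    "root/calls": Page("Calls for papers.",
                       {"Industry Track": "root/calls/industry"}),
    "root/calls/industry": Page("Submission deadline: see the track page."),
    "root/venue": Page("Venue address and travel information."),
}

found = explore(SITE, "root", ["deadline", "venue address"])
print(sorted(found))  # → ['root/calls/industry', 'root/venue']
```

With `max_depth=1` the industry-track page is never reached, which illustrates why a query whose evidence sits several clicks below the root needs deeper traversal.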



# 📚 WebWalkerQA Dataset

Each JSON item in the WebWalkerQA dataset is organized in the following format:

```json
{
    "Question": "When is the paper submission deadline for the ACL 2025 Industry Track, and what is the venue address for the conference?",
    "Answer": "The paper submission deadline for the ACL 2025 Industry Track is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
    "Root_Url": "https://2025.aclweb.org/",
    "Info": {
        "Hop": "multi-source",
        "Domain": "Conference",
        "Language": "English",
        "Difficulty_Level": "Medium",
        "Source_Website": [
            "https://2025.aclweb.org/calls/industry_track/",
            "https://2025.aclweb.org/venue/"
        ],
        "Golden_Path": ["root->call>student_research_workshop", "root->venue"]
    }
}
```
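
As a minimal sketch of how records in this format could be sliced, the snippet below filters and counts items by their `Info` fields. The helpers `matches` and `count_by` are hypothetical names introduced here, not part of the repository, and `SAMPLE` is an abbreviated copy of the item shown above.

```python
from collections import Counter

# Abbreviated copy of the sample item above (Question and Answer omitted).
SAMPLE = {
    "Root_Url": "https://2025.aclweb.org/",
    "Info": {
        "Hop": "multi-source",
        "Domain": "Conference",
        "Language": "English",
        "Difficulty_Level": "Medium",
        "Source_Website": [
            "https://2025.aclweb.org/calls/industry_track/",
            "https://2025.aclweb.org/venue/",
        ],
    },
}


def matches(item, **criteria):
    """Return True if every given Info field equals the requested value."""
    info = item.get("Info", {})
    return all(info.get(key) == value for key, value in criteria.items())


def count_by(items, info_field):
    """Count items per value of one Info field, e.g. 'Difficulty_Level'."""
    return Counter(item["Info"][info_field] for item in items)


print(matches(SAMPLE, Hop="multi-source", Domain="Conference"))  # → True
print(count_by([SAMPLE], "Difficulty_Level"))  # → Counter({'Medium': 1})
```

The same helpers should work on any iterable of dict-like records with this layout, assuming the nested `Info` field is exposed as a dictionary.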


You can load the dataset via the following code:

```python
from datasets import load_dataset
ds = load_dataset("callanwu/WebWalkerQA", split="main")
```

Additionally, we provide a collection of approximately 14k silver QA pairs, which have not yet been carefully human-verified.
You can load the silver dataset by changing the split to `silver`.
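
A small wrapper can guard against mistyped split names before any download starts. This is only a sketch: `load_webwalkerqa` and `VALID_SPLITS` are hypothetical names, and the two split names are the ones mentioned above.

```python
VALID_SPLITS = ("main", "silver")  # split names taken from the text above


def load_webwalkerqa(split="main"):
    """Load one WebWalkerQA split, rejecting unknown split names early."""
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {VALID_SPLITS}")
    # Imported lazily so the name check above works without the library installed.
    from datasets import load_dataset
    return load_dataset("callanwu/WebWalkerQA", split=split)
```

Calling `load_webwalkerqa("silver")` would then fetch the silver pairs, while a typo such as `"sliver"` fails immediately with a `ValueError`.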

## 💡 Performance

### 📊 Results on Web Agents

The performance of web agents is shown below:

### 📊 Results on RAG Systems

🤗 The WebWalkerQA Leaderboard is available at [HuggingFace](https://huggingface.co/spaces/callanwu/WebWalkerQALeadeboard)!

🚩 Welcome to submit your method to the leaderboard!

# 🛠 Dependencies

```bash
# Create and activate a dedicated environment
conda create -n webwalker python=3.10
conda activate webwalker
git clone https://github.com/alibaba-nlp/WebWalker.git
cd WebWalker
pip install -e .
# Install requirements
pip install -r requirements.txt
# Run post-installation setup
crawl4ai-setup
# Verify your installation
crawl4ai-doctor
```

### 💻 Running WebWalker Demo Locally

🔑 Before running, export your OpenAI API key or DashScope API key as an environment variable:

```bash
export OPEN_AI_API_KEY=YOUR_API_KEY
export OPEN_AI_API_BASE_URL=YOUR_API_BASE_URL
```

or

```bash
export DASHSCOPE_API_KEY=YOUR_API_KEY
```

> You can also use other API keys supported by Qwen-Agent. For details, see the [Qwen-Agent LLM module](https://github.com/QwenLM/Qwen-Agent/tree/main/qwen_agent/llm). To configure the API key, modify lines 44-53 of [`src/app.py`](https://github.com/Alibaba-NLP/WebWalker/blob/main/src/app.py#L44-L53).
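
Before launching the demo, a quick preflight check can confirm that one of the two key setups above is present. This sketch is not taken from `src/app.py`: the function name `detect_backend` and the labels `"openai"` and `"dashscope"` are assumptions, as is the requirement that the OpenAI key and base URL are set together.

```python
import os


def detect_backend(env=None):
    """Report which of the documented API key setups is configured."""
    env = os.environ if env is None else env
    # Assumption: the OpenAI-style setup needs both the key and the base URL.
    if env.get("OPEN_AI_API_KEY") and env.get("OPEN_AI_API_BASE_URL"):
        return "openai"
    if env.get("DASHSCOPE_API_KEY"):
        return "dashscope"
    raise RuntimeError(
        "Set OPEN_AI_API_KEY and OPEN_AI_API_BASE_URL, or DASHSCOPE_API_KEY."
    )


print(detect_backend({"DASHSCOPE_API_KEY": "YOUR_API_KEY"}))  # → dashscope
```

Passing a plain dictionary, as in the last line, makes the check easy to try without touching the real environment; calling `detect_backend()` with no argument reads `os.environ`.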

Then, run the `app.py` file with Streamlit:

```bash
cd src
streamlit run app.py
```

### Running RAG-System on WebWalkerQA

```bash
cd src
python rag_system.py --api_name [API_NAME] --output_file [OUTPUT_PATH]
```

The details of environment setup can be found in the [README.md](./src/README.md) in the `src` folder.

# 🔍 Evaluation

Use the following script to evaluate the accuracy of the output answers with GPT-4:

```bash
cd src
python evaluate.py --input_path [INPUT_PATH] --output_path [OUTPUT_PATH]
```
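
Once answers have been judged, accuracy can be broken down by any `Info` field. The sketch below assumes a hypothetical record layout, a dataset item plus a boolean `correct` flag; the actual output format of `evaluate.py` is not described here and may differ.

```python
from collections import defaultdict


def accuracy_by(records, info_field):
    """Compute accuracy per value of one Info field.

    Each record is assumed (hypothetically) to carry the item's "Info"
    dictionary and a boolean "correct" flag set by the judge.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in records:
        key = record["Info"][info_field]
        totals[key] += 1
        hits[key] += bool(record["correct"])
    return {key: hits[key] / totals[key] for key in totals}


records = [
    {"Info": {"Difficulty_Level": "Medium"}, "correct": True},
    {"Info": {"Difficulty_Level": "Medium"}, "correct": False},
]
print(accuracy_by(records, "Difficulty_Level"))  # → {'Medium': 0.5}
```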

## 🌻 Acknowledgement

- This work builds on [ReAct](https://github.com/ysymyth/ReAct), [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent), and [LangChain](https://github.com/langchain-ai/langchain). Sincere thanks for their efforts.
- We sincerely thank the contributors and maintainers of [crawl4ai](https://github.com/unclecode/crawl4ai) for their open-source tool ❤️, which helped us get web pages in a Markdown-like format.
- This repo is contributed by [Jialong Wu](https://callanwu.github.io/). If you have any questions, please feel free to contact him by email or create an issue.

## 🚩 Citation

If this work is helpful, please kindly cite as:

```bibtex
@misc{wu2025webwalker,
    title={WebWalker: Benchmarking LLMs in Web Traversal},
    author={Jialong Wu and Wenbiao Yin and Yong Jiang and Zhenglin Wang and Zekun Xi and Runnan Fang and Linhai Zhang and Yulan He and Deyu Zhou and Pengjun Xie and Fei Huang},
    year={2025},
    eprint={2501.07572},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2501.07572},
}
```

## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=Alibaba-NLP/WebWalker&type=Date)](https://star-history.com/#Alibaba-NLP/WebWalker&Date)