Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/Alibaba-NLP/WebWalker
WebWalker: Benchmarking LLMs in Web Traversal
agent alibaba artificial-intelligence information-seeking llm multi-agent rag web-agent
- Host: GitHub
- URL: https://github.com/Alibaba-NLP/WebWalker
- Owner: Alibaba-NLP
- Created: 2025-01-09T11:07:35.000Z (26 days ago)
- Default Branch: main
- Last Pushed: 2025-01-21T16:41:30.000Z (14 days ago)
- Last Synced: 2025-01-21T17:37:18.231Z (14 days ago)
- Topics: agent, alibaba, artificial-intelligence, information-seeking, llm, multi-agent, rag, web-agent
- Language: Python
- Homepage: https://alibaba-nlp.github.io/WebWalker/
- Size: 32 MB
- Stars: 164
- Watchers: 6
- Forks: 10
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-llm-agent - WebWalker
README
WebWalker: Benchmarking LLMs in Web Traversal
_**Jialong Wu, Wenbiao Yin, Jiang Yong, Zhenglin Wang, Zekun Xi, Runnan Fang**_
_**Linhai Zhang, Yulan He, Deyu Zhou, Pengjun Xie, Fei Huang**_
_Tongyi Lab, Alibaba Group_
Try web traversal via our **[ModelScope online demo](https://www.modelscope.cn/studios/iic/WebWalker/)** or **[Hugging Face online demo](https://huggingface.co/spaces/callanwu/WebWalker)**!
Repo for [_WebWalker: Benchmarking LLMs in Web Traversal_](https://arxiv.org/pdf/2501.07572)
# Quick Start
- The **Online Demo** is now available at [ModelScope](https://www.modelscope.cn/studios/jialongwu/WebWalker/) and [HuggingFace](https://huggingface.co/spaces/callanwu/WebWalker)!
- The **WebWalkerQA** dataset is available at [HuggingFace Datasets](https://huggingface.co/datasets/callanwu/WebWalkerQA)!
- The **WebWalkerQA** Leaderboard is available at [HuggingFace Space](https://huggingface.co/spaces/callanwu/WebWalkerQALeadeboard)!
# Introduction
- We construct a challenging benchmark, **WebWalkerQA**, which is composed of **680** queries from four real-world scenarios across over **1373** webpages.
- To tackle the challenge of web-navigation tasks requiring long context, we propose **WebWalker**, which utilizes a multi-agent framework for effective memory management.
- Extensive experiments show that WebWalkerQA is **challenging**, and that for information-seeking tasks, **vertical exploration** within the page proves to be beneficial.
# WebWalkerQA Dataset
Each JSON item in the WebWalkerQA dataset is organized in the following format:
```json
{
    "Question": "When is the paper submission deadline for the ACL 2025 Industry Track, and what is the venue address for the conference?",
    "Answer": "The paper submission deadline for the ACL 2025 Industry Track is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
    "Root_Url": "https://2025.aclweb.org/",
    "Info": {
        "Hop": "multi-source",
        "Domain": "Conference",
        "Language": "English",
        "Difficulty_Level": "Medium",
        "Source_Website": [
            "https://2025.aclweb.org/calls/industry_track/",
            "https://2025.aclweb.org/venue/"
        ],
        "Golden_Path": ["root->call>student_research_workshop", "root->venue"]
    }
}
```
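To make the field layout concrete, here is a minimal sketch that parses one item in the format above into a typed record, using only the Python standard library. The `WebWalkerQAItem` class and the `parse_item` function are hypothetical helpers introduced for illustration; they are not part of the WebWalker repository.

```python
import json
from dataclasses import dataclass
from typing import List

# One item, copied from the format shown above.
SAMPLE_ITEM = """
{
    "Question": "When is the paper submission deadline for the ACL 2025 Industry Track, and what is the venue address for the conference?",
    "Answer": "The paper submission deadline for the ACL 2025 Industry Track is March 21, 2025. The conference will be held in Brune-Kreisky-Platz 1.",
    "Root_Url": "https://2025.aclweb.org/",
    "Info": {
        "Hop": "multi-source",
        "Domain": "Conference",
        "Language": "English",
        "Difficulty_Level": "Medium",
        "Source_Website": [
            "https://2025.aclweb.org/calls/industry_track/",
            "https://2025.aclweb.org/venue/"
        ],
        "Golden_Path": ["root->call>student_research_workshop", "root->venue"]
    }
}
"""


@dataclass
class WebWalkerQAItem:
    """Hypothetical typed view of one WebWalkerQA item (not part of the repo)."""
    question: str
    answer: str
    root_url: str
    hop: str
    domain: str
    language: str
    difficulty_level: str
    source_websites: List[str]
    golden_path: List[str]


def parse_item(raw: dict) -> WebWalkerQAItem:
    """Flatten the nested "Info" object into a single record."""
    info = raw["Info"]
    return WebWalkerQAItem(
        question=raw["Question"],
        answer=raw["Answer"],
        root_url=raw["Root_Url"],
        hop=info["Hop"],
        domain=info["Domain"],
        language=info["Language"],
        difficulty_level=info["Difficulty_Level"],
        source_websites=list(info["Source_Website"]),
        golden_path=list(info["Golden_Path"]),
    )


item = parse_item(json.loads(SAMPLE_ITEM))
print(item.hop, item.difficulty_level, len(item.source_websites))
```

Flattening `Info` this way makes it straightforward to filter or group items by hop type, domain, language, or difficulty level.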
You can load the dataset via the following code:
```python
from datasets import load_dataset
ds = load_dataset("callanwu/WebWalkerQA", split="main")
```
Additionally, we provide a collection of approximately 14k silver QA pairs, which have not yet been carefully human-verified.
You can load the silver dataset by changing the split to `silver`.
## Performance
### Results on Web Agents
The performance of web agents is shown below:
### Results on RAG-Systems
The WebWalkerQA Leaderboard is available at [HuggingFace](https://huggingface.co/spaces/callanwu/WebWalkerQALeadeboard)!
We welcome submissions of your method to the leaderboard!
# Dependencies
```bash
conda create -n webwalker python=3.10
# Activate the environment so the packages below are installed into it
conda activate webwalker
git clone https://github.com/alibaba-nlp/WebWalker.git
cd WebWalker
pip install -e .
# Install requirements
pip install -r requirements.txt
# Run post-installation setup
crawl4ai-setup
# Verify your installation
crawl4ai-doctor
```
### Running WebWalker Demo Locally
Before running, export your OpenAI API key or DashScope API key as an environment variable:
```bash
export OPEN_AI_API_KEY=YOUR_API_KEY
export OPEN_AI_API_BASE_URL=YOUR_API_BASE_URL
```
or
```bash
export DASHSCOPE_API_KEY=YOUR_API_KEY
```
> You can use other API keys supported by Qwen-Agent. For more details, please refer to [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent/tree/main/qwen_agent/llm). To configure the API key, modify lines 44-53 of [`src/app.py`](https://github.com/Alibaba-NLP/WebWalker/blob/main/src/app.py#L44-L53).
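
Before launching the demo, it can help to confirm that one of the keys above is actually visible to the Python process. The following is a minimal sketch, not code from the repository: `detect_api_provider` is a hypothetical helper, and checking the OpenAI key before the DashScope key is an assumption made for illustration, not necessarily the order `src/app.py` uses.

```python
import os
from typing import Mapping, Optional


def detect_api_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return which provider's key is set, or None if neither is.

    Hypothetical helper; the OpenAI-first precedence is an assumption.
    """
    if env.get("OPEN_AI_API_KEY"):
        return "openai"
    if env.get("DASHSCOPE_API_KEY"):
        return "dashscope"
    return None


if __name__ == "__main__":
    provider = detect_api_provider()
    if provider is None:
        print("No API key found: export OPEN_AI_API_KEY or DASHSCOPE_API_KEY first.")
    else:
        print(f"Found a {provider} API key in the environment.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary instead of the real process environment.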
Then, run the `app.py` file with Streamlit:
```bash
cd src
streamlit run app.py
```
### Running RAG-System on WebWalkerQA
```bash
cd src
python rag_system.py --api_name [API_NAME] --output_file [OUTPUT_PATH]
```
The details of the environment setup can be found in the [README.md](./src/README.md) in the `src` folder.
# Evaluation
The evaluation script, which uses GPT-4 to judge the accuracy of the output answers, can be run as follows:
```bash
cd src
python evaluate.py --input_path [INPUT_PATH] --output_path [OUTPUT_PATH]
```
## Acknowledgement
- This work builds on [ReAct](https://github.com/ysymyth/ReAct), [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent), and [LangChain](https://github.com/langchain-ai/langchain). Sincere thanks for their efforts.
- We sincerely thank the contributors and maintainers of [crawl4ai](https://github.com/unclecode/crawl4ai) for their open-source tool, which helped us get web pages in a Markdown-like format.
- The repo is contributed by [Jialong Wu](https://callanwu.github.io/). If you have any questions, please feel free to contact him via email or create an issue.
## Citation
If this work is helpful, please kindly cite as:
```bibtex
@misc{wu2025webwalker,
title={WebWalker: Benchmarking LLMs in Web Traversal},
author={Jialong Wu and Wenbiao Yin and Yong Jiang and Zhenglin Wang and Zekun Xi and Runnan Fang and Deyu Zhou and Pengjun Xie and Fei Huang},
year={2025},
eprint={2501.07572},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.07572},
}
```
## Star History
[![Star History Chart](https://api.star-history.com/svg?repos=Alibaba-NLP/WebWalker&type=Date)](https://star-history.com/#Alibaba-NLP/WebWalker&Date)