{"id":47403018,"url":"https://github.com/Tencent/WebAggregator","last_synced_at":"2026-04-03T18:00:52.310Z","repository":{"id":319201992,"uuid":"1066266856","full_name":"Tencent/WebAggregator","owner":"Tencent","description":null,"archived":false,"fork":false,"pushed_at":"2025-10-18T15:37:44.000Z","size":14397,"stargazers_count":67,"open_issues_count":1,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-03T10:06:36.444Z","etag":null,"topics":["agent"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2510.14438","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Tencent.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-29T08:55:30.000Z","updated_at":"2026-01-26T04:55:33.000Z","dependencies_parsed_at":"2025-10-18T12:24:50.506Z","dependency_job_id":"ff0f2755-6934-4a43-8a3d-8bf7568140a0","html_url":"https://github.com/Tencent/WebAggregator","commit_stats":null,"previous_names":["tencent/webaggregator"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Tencent/WebAggregator","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent%2FWebAggregator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent%2FWebAggregator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent%2FWebAggregator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/Tencent%2FWebAggregator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Tencent","download_url":"https://codeload.github.com/Tencent/WebAggregator/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Tencent%2FWebAggregator/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31368156,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-03T17:53:18.093Z","status":"ssl_error","status_checked_at":"2026-04-03T17:53:17.617Z","response_time":107,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agent"],"created_at":"2026-03-20T14:00:37.288Z","updated_at":"2026-04-03T18:00:52.282Z","avatar_url":"https://github.com/Tencent.png","language":"Python","readme":"# 🌐 *Explore to Evolve*: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents\n\n## 🌟 Introduction\n\n[![arXiv](https://img.shields.io/badge/arXiv-2510.14438-b31b1b.svg)](https://arxiv.org/abs/2510.14438) [![Data](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Data:WebAggregatorQA-ffc107?color=ffc107\u0026logoColor=white)](https://huggingface.co/datasets/CognitiveKernel/WebAggregatorQA) 
[![Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model:WebAggregator%208B-ffc107?color=ffc107\u0026logoColor=white)](https://huggingface.co/CognitiveKernel/WebAggregator-8B) \n[![Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model:WebAggregator%2032B-ffc107?color=ffc107\u0026logoColor=white)](https://huggingface.co/CognitiveKernel/WebAggregator-32B) \n\n![](assets/perfm-bar.svg)\n\n\n\n- ***Explore to Evolve*** aims to generate diverse, high-quality training data for web agent foundation models, enhancing their capabilities in multi-tool usage, **information seeking**, and **information aggregation**.\n\n- WebAggregator, the model fine-tuned on WebAggregatorQA, demonstrates strong performance on GAIA-text and the WebAggregatorQA test set.\n\n\n\n---\n\n## ✨ Features\n\n![](assets/illus.jpg)\n\n- 🤖 **Fully Automated and Verifiable QA Construction**  \n- 😄 **Open Source**: Complete codebase, including the QA construction engine, queries, trajectories, and models.\n- 👍 **Highly Customizable**: Collect data tailored to your needs with minimal human effort, and easily customize your own agent!\n\n\n\n---\n\n## ⚡ Quick Start\n\nFollow these steps to get started:\n\n### 1️⃣ Clone the Repository\n\n```bash\ngit clone https://github.com/Tencent/WebAggregator\n```\n\n### 2️⃣ Install Dependencies\n\n1. This project builds upon smolagents’ “open deep research” example 👉 [smolagents open_deep_research dependencies](https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research). Thanks to the authors for their great work; please cite them!\n\n2. Install this project’s requirements:\n\n```bash\npip install -r requirements.txt\n```\n\n\n\n\n3. **Please note**: the implementation must use the bundled `./smolagents`, which adds our trajectory-collection functionality. 
Alternatively, you can replace `smolagents/agents.py` in your installed library with the version from this repository.\n\n\n---\n\n## 🚀 Usage\n\n### ⚙️ Configuration\n\nSet the configuration in the following files:\n\n- `./config.py`: Contains settings for your agent's foundation LLM, the LLMs for specific tools, and dataset paths.\n- `./model_list.py`: Implements how your foundation models are called (e.g., via vLLM, LiteLLM, or Azure). It calls the models configured in `./config.py`. We provide an example implementation. For more details, please refer to the smolagents repository.\n\n\nOther files and directories:\n- `./web_tools.py`: Tools for the agent. You can modify them to suit your needs.\n- `./run_agent.py`: The agent implementation.\n- `./run`: Scripts for running the agent.\n- `./data`: Input data for QA construction (URLs), evaluation (benchmarks), and trajectory sampling (QAs).\n\n---\n\n### ▶️ Running the Project\n\n\u003e **Note:** Before running any scripts, ensure all paths, model checkpoints, and other necessary parameters are properly set in the source files.\n\n---\n\n#### 1️⃣ Evaluation\n\nTo evaluate your agent, serve your tuned checkpoint and update the corresponding settings in `config.py`. Make sure the correct `model_id` is set in the evaluation script `test.sh`, then run:\n\n\n```bash\nbash run/test.sh\n```\n\nThis command evaluates the specified model on the specified benchmark. After evaluation, it uses an LLM-as-judge to assess the answers and prints the accuracy.\n\n---\n\n#### 2️⃣ QA Construction\n\nTo start building web agent data automatically:\n\n1. Download our collected URLs 👉 [URLs](https://huggingface.co/datasets/CognitiveKernel/WebAggregatorQA) **or** gather URLs related to your domains of interest!\n\n2. 
Then, run the following command to collect the data.\n\n```bash\nbash run/QA_building.sh\n```\n\n---\n\n#### 3️⃣ Trajectory Sampling\n\nTraining trajectories for fine-tuning your agent foundation models are available at 👉 [WebAggregatorQA](https://huggingface.co/datasets/CognitiveKernel/WebAggregatorQA). Sample data can be found in `./data/train-samples` for initial testing purposes.\n\n```bash\nbash run/traj_sampling.sh\n```\n\n\n---\n\n## Friendly links to other works from Tencent AI Lab\n\n- Deep Research Agent framework: [Cognitive Kernel-Pro](https://github.com/Tencent/CognitiveKernel-Pro)\n- Agent Self-Evolving Research, including [WebEvolver, WebCoT](https://github.com/Tencent/SelfEvolvingAgent), [WebVoyager](https://github.com/MinorJerry/WebVoyager), [OpenWebVoyager](https://github.com/MinorJerry/OpenWebVoyager).\n\n## Citation\n\n```bibtex\n@misc{wang2025exploreevolvescalingevolved,\n      title={Explore to Evolve: Scaling Evolved Aggregation Logic via Proactive Online Exploration for Deep Research Agents}, \n      author={Rui Wang and Ce Zhang and Jun-Yu Ma and Jianshu Zhang and Hongru Wang and Yi Chen and Boyang Xue and Tianqing Fang and Zhisong Zhang and Hongming Zhang and Haitao Mi and Dong Yu and Kam-Fai Wong},\n      year={2025},\n      eprint={2510.14438},\n      archivePrefix={arXiv},\n      primaryClass={cs.CL},\n      url={https://arxiv.org/abs/2510.14438}, \n}\n\n@misc{fang2025cognitivekernelpro,\n      title={Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training}, \n      author={Tianqing Fang and Zhisong Zhang and Xiaoyang Wang and Rui Wang and Can Qin and Yuxuan Wan and Jun-Yu Ma and Ce Zhang and Jiaqi Chen and Xiyun Li and Hongming Zhang and Haitao Mi and Dong Yu},\n      year={2025},\n      eprint={2508.00414},\n      archivePrefix={arXiv},\n      primaryClass={cs.AI},\n      url={https://arxiv.org/abs/2508.00414}, 
\n}\n```\n","funding_links":[],"categories":["信息获取"],"sub_categories":["其他信息工具"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTencent%2FWebAggregator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FTencent%2FWebAggregator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FTencent%2FWebAggregator/lists"}