https://github.com/GAIR-NLP/DeepResearcher
Scaling Deep Research via Reinforcement Learning in Real-world Environments.
- Host: GitHub
- URL: https://github.com/GAIR-NLP/DeepResearcher
- Owner: GAIR-NLP
- License: apache-2.0
- Created: 2025-04-02T10:31:03.000Z (about 2 months ago)
- Default Branch: main
- Last Pushed: 2025-04-13T04:38:34.000Z (about 1 month ago)
- Last Synced: 2025-04-13T05:27:21.715Z (about 1 month ago)
- Language: Python
- Homepage:
- Size: 12.8 MB
- Stars: 184
- Watchers: 4
- Forks: 17
- Open Issues: 6
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-deep-research - DeepResearcher - A framework for scaling deep research via reinforcement learning in real-world environments. (🤖 Deep Research Systems / 🌐 Open-Source Deep Research Implementations)
README
# DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments
This is the official repository for [DeepResearcher](https://arxiv.org/abs/2504.03160).
## 📝 Introduction
DeepResearcher is the first comprehensive framework for end-to-end training of LLM-based deep research agents through scaling reinforcement learning (RL) in real-world environments with authentic web search interactions. Our qualitative analysis reveals emergent **cognitive behaviors** from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers.
## 📋 Table of Contents
- [Introduction](#-introduction)
- [Model](#-model)
- [Performance](#-performance)
- [Get started](#-get-started)
- [Acknowledgement](#-acknowledgement)
- [Citation](#✍️-citation)
## 🤖 Model
DeepResearcher is now available on the Hugging Face Hub:
| Model Name | HF Checkpoint | Size |
| ---------- | ------------------------------------------------------------ | :------: |
| DeepResearcher-7b | [🤗 GAIR/DeepResearcher-7b](https://huggingface.co/GAIR/DeepResearcher-7b) | **7B** |
## 🏆 Performance
Extensive experiments on open-domain research tasks demonstrate that DeepResearcher achieves substantial improvements of up to 28.9 points over prompt engineering-based baselines and up to 7.2 points over RAG-based RL agents. Our qualitative analysis reveals emergent cognitive behaviors from end-to-end RL training, including the ability to formulate plans, cross-validate information from multiple sources, engage in self-reflection to redirect research, and maintain honesty when unable to find definitive answers. Our results highlight that end-to-end training in real-world web environments is not merely an implementation detail but a fundamental requirement for developing robust research capabilities aligned with real-world applications.
## 🚀 Get Started
### Package Installation
To use this repo, install the required dependencies by running the following commands:
```bash
git clone https://github.com/GAIR-NLP/DeepResearcher.git
conda create -n deepresearcher python=3.10
conda activate deepresearcher
cd DeepResearcher
pip3 install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip3 install flash-attn --no-build-isolation
pip3 install -e .
pip3 install -r requirements.txt
```
### Start Ray before training and inference
We use Ray to train the model. Before starting Ray, you must set ```PET_NODE_RANK``` (**this is compulsory even if you only have 1 node**).
On the head node, run:
```bash
export PET_NODE_RANK=0
ray start --head
```
### Run backend handler
To launch the server handler:
1. Set ```serper_api_key``` or ```azure_bing_search_subscription_key```, along with ```search_engine```, in ```./scrl/handler/config.yaml```.
2. Add your ```qwen-plus``` API key in ```./scrl/handler/server_handler.py```:
```python
client = OpenAI(
    api_key="sk-xxx",  # replace with your qwen-plus API key
    base_url="xxxx"    # replace with the API endpoint URL
)
```
3. Start server handler:
```bash
python ./scrl/handler/server_handler.py
```
After launching all server handlers, replace ```server_url_list``` in ```./scrl/handler/config.yaml``` on your training host node, then run:
```bash
python ./scrl/handler/handler.py
```
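Step 2 above places the API key directly in the source file. One alternative is to read the credentials from environment variables so they stay out of version control. The following is a minimal sketch, not the repository's own method: the helper `load_client_kwargs` and the variable names `QWEN_API_KEY` and `QWEN_BASE_URL` are hypothetical and do not appear in the repository.

```python
import os


def load_client_kwargs(env=None):
    """Return keyword arguments for OpenAI(...) read from environment variables.

    QWEN_API_KEY and QWEN_BASE_URL are hypothetical names chosen for this
    sketch; the repository does not define them.
    """
    env = os.environ if env is None else env
    required = ("QWEN_API_KEY", "QWEN_BASE_URL")
    # Fail early with a clear message instead of sending an empty key.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"api_key": env["QWEN_API_KEY"], "base_url": env["QWEN_BASE_URL"]}


# Possible usage inside server_handler.py (requires the `openai` package):
#   from openai import OpenAI
#   client = OpenAI(**load_client_kwargs())
```

With this approach, the variables would be exported in the shell before running `python ./scrl/handler/server_handler.py`.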
### Training model
Use the following command to train the model:
```bash
bash train_grpo.sh
```
### Evaluate
Use the following command to generate rollouts:
```bash
bash evaluate.sh
```
You can find the rollout file in: ```./outputs/{project_name}/{experiment_name}/rollout/rollout_step_0.json```
You can rename and copy it into ```./evaluate/{experiment_name}_result.json```.
Then, run the following command:
```bash
python ./evaluate/cacluate_metrics.py {experiment_name}
```
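The rename-and-copy step that precedes the metrics command can also be scripted. The sketch below is only an illustration under stated assumptions: the helper `stage_rollout` is hypothetical, the paths follow the templates quoted in this section, and the `step` parameter assumes that other rollout files use the same `rollout_step_{N}.json` naming (only step 0 is mentioned here).

```python
import shutil
from pathlib import Path


def stage_rollout(project_name, experiment_name, root=".", step=0):
    """Copy a rollout file to the path the metrics script is described as reading.

    Hypothetical helper; the paths mirror the templates given in this README.
    """
    root = Path(root)
    src = (root / "outputs" / project_name / experiment_name
           / "rollout" / f"rollout_step_{step}.json")
    dst = root / "evaluate" / f"{experiment_name}_result.json"
    if not src.is_file():
        raise FileNotFoundError(f"rollout file not found: {src}")
    # Create ./evaluate if it does not exist yet, then copy the file contents.
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dst)
    return dst
```

For example, `stage_rollout("my_project", "my_experiment")` run from the repository root would prepare the input for `python ./evaluate/cacluate_metrics.py my_experiment` (both names here are placeholders).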
You can check the score in ```./evaluate/{experiment_name}_score.json```.
## 🙏 Acknowledgement
DeepResearcher is inspired by [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1), and its implementation is based on [veRL](https://github.com/volcengine/verl) and [Search-R1](https://github.com/PeterGriffinJin/Search-R1). We deeply appreciate the contributions of these teams to open-source research and development.
## ✍️ Citation
Please cite the repo if the model, code, or conclusions in this repo are helpful to you.
```
@misc{zheng2025deepresearcherscalingdeepresearch,
title={DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments},
author={Yuxiang Zheng and Dayuan Fu and Xiangkun Hu and Xiaojie Cai and Lyumanshan Ye and Pengrui Lu and Pengfei Liu},
year={2025},
eprint={2504.03160},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2504.03160},
}
```