{"id":30695962,"url":"https://github.com/openpipe/open_deep_research_training","last_synced_at":"2025-09-02T07:33:36.502Z","repository":{"id":312149660,"uuid":"1045938450","full_name":"OpenPipe/open_deep_research_training","owner":"OpenPipe","description":"Training setup for Langchain's Open Deep Research","archived":false,"fork":false,"pushed_at":"2025-08-28T22:23:47.000Z","size":29863,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-29T01:57:16.782Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenPipe.png","metadata":{"files":{"readme":"README-ORIGINAL.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-28T00:16:30.000Z","updated_at":"2025-08-28T22:23:50.000Z","dependencies_parsed_at":"2025-08-29T02:07:34.431Z","dependency_job_id":null,"html_url":"https://github.com/OpenPipe/open_deep_research_training","commit_stats":null,"previous_names":["openpipe/open_deep_research_training"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/OpenPipe/open_deep_research_training","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenPipe%2Fopen_deep_research_training","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenPipe%2Fopen_deep_research_training/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenPipe%2Fopen_deep_research_training/releases","manifests_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/repositories/OpenPipe%2Fopen_deep_research_training/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenPipe","download_url":"https://codeload.github.com/OpenPipe/open_deep_research_training/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenPipe%2Fopen_deep_research_training/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273250233,"owners_count":25072167,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-02T02:00:09.530Z","response_time":77,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-09-02T07:32:10.924Z","updated_at":"2025-09-02T07:33:36.392Z","avatar_url":"https://github.com/OpenPipe.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🔬 Open Deep Research\n\n\u003cimg width=\"1388\" height=\"298\" alt=\"full_diagram\" src=\"https://github.com/user-attachments/assets/12a2371b-8be2-4219-9b48-90503eb43c69\" /\u003e\n\nDeep research has broken out as one of the most popular agent applications. This is a simple, configurable, fully open source deep research agent that works across many model providers, search tools, and MCP servers. 
Its performance is on par with many popular deep research agents ([see Deep Research Bench leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard)).\n\n\u003cimg width=\"817\" height=\"666\" alt=\"Screenshot 2025-07-13 at 11 21 12 PM\" src=\"https://github.com/user-attachments/assets/052f2ed3-c664-4a4f-8ec2-074349dcaa3f\" /\u003e\n\n### 🔥 Recent Updates\n\n**August 2, 2025**: Achieved #6 ranking on the [Deep Research Bench Leaderboard](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard) with an overall score of 0.4344. \n\n**July 30, 2025**: Read about the evolution from our original implementations to the current version in our [blog post](https://rlancemartin.github.io/2025/07/30/bitter_lesson/).\n\n**July 16, 2025**: Read more in our [blog](https://blog.langchain.com/open-deep-research/) and watch our [video](https://www.youtube.com/watch?v=agGiWUpxkhg) for a quick overview.\n\n### 🚀 Quickstart\n\n1. Clone the repository and activate a virtual environment:\n```bash\ngit clone https://github.com/langchain-ai/open_deep_research.git\ncd open_deep_research\nuv venv\nsource .venv/bin/activate  # On Windows: .venv\\Scripts\\activate\n```\n\n2. Install dependencies:\n```bash\nuv sync\n# or\nuv pip install -r pyproject.toml\n```\n\n3. Set up your `.env` file to customize the environment variables (for model selection, search tools, and other configuration settings):\n```bash\ncp .env.example .env\n```\n\n4. Launch the agent with the LangGraph server locally:\n\n```bash\n# Install dependencies and start the LangGraph server\nuvx --refresh --from \"langgraph-cli[inmem]\" --with-editable . --python 3.11 langgraph dev --allow-blocking\n```\n\nThis will open the LangGraph Studio UI in your browser.\n\n```\n- 🚀 API: http://127.0.0.1:2024\n- 🎨 Studio UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024\n- 📚 API Docs: http://127.0.0.1:2024/docs\n```\n\nAsk a question in the `messages` input field and click `Submit`. 
Select a different configuration in the \"Manage Assistants\" tab.\n\n### ⚙️ Configurations\n\n#### LLM :brain:\n\nOpen Deep Research supports a wide range of LLM providers via the [init_chat_model() API](https://python.langchain.com/docs/how_to/chat_models_universal_init/). It uses LLMs for a few different tasks. See the model fields below in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) file for more details. These can be accessed via the LangGraph Studio UI. \n\n- **Summarization** (default: `openai:gpt-4.1-mini`): Summarizes search API results\n- **Research** (default: `openai:gpt-4.1`): Powers the search agent\n- **Compression** (default: `openai:gpt-4.1`): Compresses research findings\n- **Final Report Model** (default: `openai:gpt-4.1`): Writes the final report\n\n\u003e Note: the selected model will need to support [structured outputs](https://python.langchain.com/docs/integrations/chat/) and [tool calling](https://python.langchain.com/docs/how_to/tool_calling/).\n\n\u003e Note: For OpenRouter, follow [this guide](https://github.com/langchain-ai/open_deep_research/issues/75#issuecomment-2811472408), and for local models via Ollama, see [setup instructions](https://github.com/langchain-ai/open_deep_research/issues/65#issuecomment-2743586318).\n\n#### Search API :mag:\n\nOpen Deep Research supports a wide range of search tools. By default it uses the [Tavily](https://www.tavily.com/) search API. It also has full MCP compatibility and works with native web search for Anthropic and OpenAI. See the `search_api` and `mcp_config` fields in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) file for more details. These can be accessed via the LangGraph Studio UI. 
\n\n#### Other \n\nSee the fields in the [configuration.py](https://github.com/langchain-ai/open_deep_research/blob/main/src/open_deep_research/configuration.py) for various other settings to customize the behavior of Open Deep Research. \n\n### 📊 Evaluation\n\nOpen Deep Research is configured for evaluation with [Deep Research Bench](https://huggingface.co/spaces/Ayanami0730/DeepResearch-Leaderboard). This benchmark has 100 PhD-level research tasks (50 English, 50 Chinese), crafted by domain experts across 22 fields (e.g., Science \u0026 Tech, Business \u0026 Finance) to mirror real-world deep-research needs. It has 2 evaluation metrics, but the leaderboard is based on the RACE score. This uses LLM-as-a-judge (Gemini) to evaluate research reports against a golden set of reports compiled by experts across a set of metrics.\n\n#### Usage\n\n\u003e Warning: Running across the 100 examples can cost ~$20-$100 depending on the model selection.\n\nThe dataset is available on [LangSmith via this link](https://smith.langchain.com/public/c5e7a6ad-fdba-478c-88e6-3a388459ce8b/d). To kick off evaluation, run the following command:\n\n```bash\n# Run comprehensive evaluation on LangSmith datasets\npython tests/run_evaluate.py\n```\n\nThis will provide a link to a LangSmith experiment, which will have a name `YOUR_EXPERIMENT_NAME`. Once this is done, extract the results to a JSONL file that can be submitted to the Deep Research Bench.\n\n```bash\npython tests/extract_langsmith_data.py --project-name \"YOUR_EXPERIMENT_NAME\" --model-name \"your-model-name\" --dataset-name \"deep_research_bench\"\n```\n\nThis creates `tests/expt_results/deep_research_bench_model-name.jsonl` with the required format. 
Move the generated JSONL file to a local clone of the Deep Research Bench repository and follow their [Quick Start guide](https://github.com/Ayanami0730/deep_research_bench?tab=readme-ov-file#quick-start) for evaluation submission.\n\n#### Results \n\n| Name | Commit | Summarization | Research | Compression | Total Cost | Total Tokens | RACE Score | Experiment |\n|------|--------|---------------|----------|-------------|------------|--------------|------------|------------|\n| Defaults | [6532a41](https://github.com/langchain-ai/open_deep_research/commit/6532a4176a93cc9bb2102b3d825dcefa560c85d9) | openai:gpt-4.1-mini | openai:gpt-4.1 | openai:gpt-4.1 | $45.98 | 58,015,332 | 0.4309 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=cf4355d7-6347-47e2-a774-484f290e79bc\u0026baseline=undefined) |\n| Claude Sonnet 4 | [f877ea9](https://github.com/langchain-ai/open_deep_research/pull/163/commits/f877ea93641680879c420ea991e998b47aab9bcc) | openai:gpt-4.1-mini | anthropic:claude-sonnet-4-20250514 | openai:gpt-4.1 | $187.09 | 138,917,050 | 0.4401 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=04f6002d-6080-4759-bcf5-9a52e57449ea\u0026baseline=undefined) |\n| Deep Research Bench Submission | [c0a160b](https://github.com/langchain-ai/open_deep_research/commit/c0a160b57a9b5ecd4b8217c3811a14d8eff97f72) | openai:gpt-4.1-nano | openai:gpt-4.1 | openai:gpt-4.1 | $87.83 | 207,005,549 | 0.4344 | [Link](https://smith.langchain.com/o/ebbaf2eb-769b-4505-aca2-d11de10372a4/datasets/6e4766ca-6[…]ons=e6647f74-ad2f-4cb9-887e-acb38b5f73c0\u0026baseline=undefined) | \n\n### 🚀 Deployments and Usage\n\n#### LangGraph Studio\n\nFollow the [quickstart](#-quickstart) to start LangGraph server locally and test the agent out on LangGraph Studio.\n\n#### Hosted deployment\n \nYou can easily deploy to [LangGraph Platform](https://langchain-ai.github.io/langgraph/concepts/#deployment-options). 
\n\n#### Open Agent Platform\n\nOpen Agent Platform (OAP) is a UI from which non-technical users can build and configure their own agents. OAP is great for allowing users to configure the Deep Researcher with different MCP tools and search APIs that are best suited to their needs and the problems that they want to solve.\n\nWe've deployed Open Deep Research to our public demo instance of OAP. All you need to do is add your API Keys, and you can test out the Deep Researcher for yourself! Try it out [here](https://oap.langchain.com).\n\nYou can also deploy your own instance of OAP, and make your own custom agents (like Deep Researcher) available on it to your users.\n1. [Deploy Open Agent Platform](https://docs.oap.langchain.com/quickstart)\n2. [Add Deep Researcher to OAP](https://docs.oap.langchain.com/setup/agents)\n\n### Legacy Implementations 🏛️\n\nThe `src/legacy/` folder contains two earlier implementations that provide alternative approaches to automated research. They are less performant than the current implementation, but they offer alternative ideas for understanding different approaches to deep research.\n\n#### 1. Workflow Implementation (`legacy/graph.py`)\n- **Plan-and-Execute**: Structured workflow with human-in-the-loop planning\n- **Sequential Processing**: Creates sections one by one with reflection\n- **Interactive Control**: Allows feedback and approval of report plans\n- **Quality Focused**: Emphasizes accuracy through iterative refinement\n\n#### 2. 
Multi-Agent Implementation (`legacy/multi_agent.py`)  \n- **Supervisor-Researcher Architecture**: Coordinated multi-agent system\n- **Parallel Processing**: Multiple researchers work simultaneously\n- **Speed Optimized**: Faster report generation through concurrency\n- **MCP Support**: Extensive Model Context Protocol integration\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenpipe%2Fopen_deep_research_training","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fopenpipe%2Fopen_deep_research_training","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fopenpipe%2Fopen_deep_research_training/lists"}