https://github.com/mshumer/OpenDeepResearcher

Last synced: 5 months ago
JSON representation

Host: GitHub
URL: https://github.com/mshumer/OpenDeepResearcher
Owner: mshumer
License: mit
Created: 2025-02-03T23:08:25.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-02-03T23:55:28.000Z (5 months ago)
Last Synced: 2025-02-04T00:22:53.031Z (5 months ago)
Language: Jupyter Notebook
Size: 20.5 KB
Stars: 39
Watchers: 2
Forks: 7
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-generative-ai-data-scientist - OpenDeepResearcher
awesome-ai-papers - [OpenDeepResearcher - DeepResearch](https://github.com/jina-ai/node-DeepResearch)\]\[[open-deep-research](https://github.com/nickscamara/open-deep-research)\]\[[open-deep-research blog](https://huggingface.co/blog/open-deep-research)\]\[[open_deep_research](https://github.com/huggingface/smolagents/tree/main/examples/open_deep_research)\]\[[deep-research](https://github.com/dzhng/deep-research)\]\[[deep-research](https://github.com/u14app/deep-research)\]\[[Auto-Deep-Research](https://github.com/HKUDS/Auto-Deep-Research)\]\[[deep-searcher](https://github.com/zilliztech/deep-searcher)\]\[[local-deep-research](https://github.com/LearningCircuit/local-deep-research)\]\[[local-deep-researcher](https://github.com/langchain-ai/local-deep-researcher)\]\[[open_deep_research](https://github.com/langchain-ai/open_deep_research)\]\[[Agentic-Reasoning](https://github.com/theworldofagents/Agentic-Reasoning)\] (NLP / 3. Pretraining)
awesome-deep-research - OpenDeepResearcher - An AI researcher that continuously searches for information based on a user query, using SERPAPI for Google searches, Jina for webpage content extraction, and OpenRouter (default: claude-3.5-haiku) for generating search queries and evaluating relevance. (🤖 Deep Research Systems / 🌐 Open-Source Deep Research Implementations)
awesome-deep-research - OpenDeepResearcher - An AI researcher that continuously searches for information based on a user query, using SERPAPI for Google searches, Jina for webpage content extraction, and OpenRouter (default: claude-3.5-haiku) for generating search queries and evaluating relevance. (🤖 Deep Research Systems / 🌐 Open-Source Deep Research Implementations)

README

# OpenDeepResearcher

This notebook implements an **AI researcher** that continuously searches for information based on a user query until the system is confident that it has gathered all the necessary details. It makes use of several services to do so:

- **SERPAPI**: To perform Google searches.
- **Jina**: To fetch and extract webpage content.
- **OpenRouter** (default model: `anthropic/claude-3.5-haiku`): To interact with a LLM for generating search queries, evaluating page relevance, and extracting context.

## Features

- **Iterative Research Loop:** The system refines its search queries iteratively until no further queries are required.
- **Asynchronous Processing:** Searches, webpage fetching, evaluation, and context extraction are performed concurrently to improve speed.
- **Duplicate Filtering:** Aggregates and deduplicates links within each round, ensuring that the same link isn’t processed twice.
- **LLM-Powered Decision Making:** Uses the LLM to generate new search queries, decide on page usefulness, extract relevant context, and produce a final comprehensive report.
- **Gradio Interface:** Use the `open-deep-researcher - gradio` notebook if you want to use this in a functional UI

## Requirements

- API access and keys for:
- **OpenRouter API**
- **SERPAPI API**
- **Jina API**

## Setup

1. **Clone or Open the Notebook:**
- Download the notebook file or open it directly in [Google Colab](https://colab.research.google.com/github/mshumer/OpenDeepResearcher/blob/main/open_deep_researcher.ipynb).

2. **Install `nest_asyncio`:**

Run the first cell to set up `nest_asyncio`.

3. **Configure API Keys:**
- Replace the placeholder values in the notebook for `OPENROUTER_API_KEY`, `SERPAPI_API_KEY`, and `JINA_API_KEY` with your actual API keys.

## Usage

1. **Run the Notebook Cells:**
Execute all cells in order. The notebook will prompt you for:
- A research query/topic.
- An optional maximum number of iterations (default is 10).

2. **Follow the Research Process:**
- **Initial Query & Search Generation:** The notebook uses the LLM to generate initial search queries.
- **Asynchronous Searches & Extraction:** It performs SERPAPI searches for all queries concurrently, aggregates unique links, and processes each link in parallel to determine page usefulness and extract relevant context.
- **Iterative Refinement:** After each round, the aggregated context is analyzed by the LLM to determine if further search queries are needed.
- **Final Report:** Once the LLM indicates that no further research is needed (or the iteration limit is reached), a final report is generated based on all gathered context.

3. **View the Final Report:**
The final comprehensive report will be printed in the output.

## How It Works

1. **Input & Query Generation:**
The user enters a research topic, and the LLM generates up to four distinct search queries.

2. **Concurrent Search & Processing:**
- **SERPAPI:** Each search query is sent to SERPAPI concurrently.
- **Deduplication:** All retrieved links are aggregated and deduplicated within the current iteration.
- **Jina & LLM:** Each unique link is processed concurrently to fetch webpage content via Jina, evaluate its usefulness with the LLM, and extract relevant information if the page is deemed useful.

3. **Iterative Refinement:**
The system passes the aggregated context to the LLM to determine if further search queries are needed. New queries are generated if required; otherwise, the loop terminates.

4. **Final Report Generation:**
All gathered context is compiled and sent to the LLM to produce a final, comprehensive report addressing the original query.

## Troubleshooting

- **RuntimeError with asyncio:**
If you encounter an error like:
```
RuntimeError: asyncio.run() cannot be called from a running event loop
```
Ensure you have applied `nest_asyncio` as shown in the setup section.

- **API Issues:**
Verify that your API keys are correct and that you are not exceeding any rate limits.

---

Follow me on [X](https://x.com/mattshumer_) for updates on this and other AI things I'm working on.

OpenDeepResearcher is released under the MIT License. See the LICENSE file for more details.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/mshumer/OpenDeepResearcher

Awesome Lists containing this project

README