{"id":25143007,"url":"https://github.com/mshumer/OpenDeepResearcher","last_synced_at":"2025-10-23T16:30:58.017Z","repository":{"id":275672509,"uuid":"926822398","full_name":"mshumer/OpenDeepResearcher","owner":"mshumer","description":null,"archived":false,"fork":false,"pushed_at":"2025-02-03T23:55:28.000Z","size":21,"stargazers_count":39,"open_issues_count":0,"forks_count":7,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-02-04T00:22:53.031Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mshumer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-03T23:08:25.000Z","updated_at":"2025-02-04T00:22:17.000Z","dependencies_parsed_at":"2025-02-04T00:22:56.044Z","dependency_job_id":"5816ab21-5969-4491-8787-053ab6f9a9fa","html_url":"https://github.com/mshumer/OpenDeepResearcher","commit_stats":null,"previous_names":["mshumer/opendeepresearcher"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mshumer%2FOpenDeepResearcher","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mshumer%2FOpenDeepResearcher/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mshumer%2FOpenDeepResearcher/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mshumer%2FOpenDeepResearcher/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mshumer","download_
url":"https://codeload.github.com/mshumer/OpenDeepResearcher/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237855749,"owners_count":19377024,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-02-08T19:01:44.917Z","updated_at":"2025-10-23T16:30:58.012Z","avatar_url":"https://github.com/mshumer.png","language":"Jupyter Notebook","readme":"# OpenDeepResearcher\n\nThis notebook implements an **AI researcher** that continuously searches for information based on a user query until the system is confident that it has gathered all the necessary details. 
It makes use of several services to do so:\n\n- **SERPAPI**: To perform Google searches.\n- **Jina**: To fetch and extract webpage content.\n- **OpenRouter** (default model: `anthropic/claude-3.5-haiku`): To interact with an LLM for generating search queries, evaluating page relevance, and extracting context.\n\n## Features\n\n- **Iterative Research Loop:** The system refines its search queries iteratively until no further queries are required.\n- **Asynchronous Processing:** Searches, webpage fetching, evaluation, and context extraction are performed concurrently to improve speed.\n- **Duplicate Filtering:** Aggregates and deduplicates links within each round, ensuring that the same link isn’t processed twice.\n- **LLM-Powered Decision Making:** Uses the LLM to generate new search queries, decide on page usefulness, extract relevant context, and produce a final comprehensive report.\n- **Gradio Interface:** Use the `open-deep-researcher - gradio` notebook if you want to use this in a functional UI.\n\n## Requirements\n\n- API access and keys for:\n  - **OpenRouter API**\n  - **SERPAPI API**\n  - **Jina API**\n\n## Setup\n\n1. **Clone or Open the Notebook:**\n   - Download the notebook file or open it directly in [Google Colab](https://colab.research.google.com/github/mshumer/OpenDeepResearcher/blob/main/open_deep_researcher.ipynb).\n\n2. **Install `nest_asyncio`:**\n\n   Run the first cell to set up `nest_asyncio`.\n\n3. **Configure API Keys:**\n   - Replace the placeholder values in the notebook for `OPENROUTER_API_KEY`, `SERPAPI_API_KEY`, and `JINA_API_KEY` with your actual API keys.\n\n## Usage\n\n1. **Run the Notebook Cells:**\n   Execute all cells in order. The notebook will prompt you for:\n   - A research query/topic.\n   - An optional maximum number of iterations (default is 10).\n\n2. 
**Follow the Research Process:**\n   - **Initial Query \u0026 Search Generation:** The notebook uses the LLM to generate initial search queries.\n   - **Asynchronous Searches \u0026 Extraction:** It performs SERPAPI searches for all queries concurrently, aggregates unique links, and processes each link in parallel to determine page usefulness and extract relevant context.\n   - **Iterative Refinement:** After each round, the aggregated context is analyzed by the LLM to determine if further search queries are needed.\n   - **Final Report:** Once the LLM indicates that no further research is needed (or the iteration limit is reached), a final report is generated based on all gathered context.\n\n3. **View the Final Report:**\n   The final comprehensive report will be printed in the output.\n\n## How It Works\n\n1. **Input \u0026 Query Generation:**  \n   The user enters a research topic, and the LLM generates up to four distinct search queries.\n\n2. **Concurrent Search \u0026 Processing:**  \n   - **SERPAPI:** Each search query is sent to SERPAPI concurrently.\n   - **Deduplication:** All retrieved links are aggregated and deduplicated within the current iteration.\n   - **Jina \u0026 LLM:** Each unique link is processed concurrently to fetch webpage content via Jina, evaluate its usefulness with the LLM, and extract relevant information if the page is deemed useful.\n\n3. **Iterative Refinement:**  \n   The system passes the aggregated context to the LLM to determine if further search queries are needed. New queries are generated if required; otherwise, the loop terminates.\n\n4. 
**Final Report Generation:**  \n   All gathered context is compiled and sent to the LLM to produce a final, comprehensive report addressing the original query.\n\n## Troubleshooting\n\n- **RuntimeError with asyncio:**  \n  If you encounter an error like:\n  ```\n  RuntimeError: asyncio.run() cannot be called from a running event loop\n  ```\n  Ensure you have applied `nest_asyncio` as shown in the setup section.\n\n- **API Issues:**  \n  Verify that your API keys are correct and that you are not exceeding any rate limits.\n\n---\n\nFollow me on [X](https://x.com/mattshumer_) for updates on this and other AI things I'm working on.\n\nHead to [ShumerPrompt](https://shumerprompt.com), my \"Github for Prompts\"!\n\nOpenDeepResearcher is released under the MIT License. See the LICENSE file for more details.\n","funding_links":[],"categories":["Deep Research Agents","NLP","📊 Data Table","🤖 Deep Research Systems"],"sub_categories":["3. Pretraining","🌐 Open-Source Deep Research Implementations"],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmshumer%2FOpenDeepResearcher","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmshumer%2FOpenDeepResearcher","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmshumer%2FOpenDeepResearcher/lists"}