{"id":31753742,"url":"https://github.com/servicenow/drbench","last_synced_at":"2026-02-21T22:02:04.836Z","repository":{"id":305206527,"uuid":"1020872153","full_name":"ServiceNow/drbench","owner":"ServiceNow","description":"An enterprise deep research benchmark","archived":false,"fork":false,"pushed_at":"2025-11-05T22:00:22.000Z","size":13331,"stargazers_count":23,"open_issues_count":1,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-11-06T00:05:53.249Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ServiceNow.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-07-16T14:19:07.000Z","updated_at":"2025-11-05T22:00:27.000Z","dependencies_parsed_at":"2025-07-18T23:11:22.076Z","dependency_job_id":"af7aba40-b197-4325-9ecf-8bc32bc49d2b","html_url":"https://github.com/ServiceNow/drbench","commit_stats":null,"previous_names":["servicenow/drbench"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ServiceNow/drbench","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2Fdrbench","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2Fdrbench/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2Fdrbench/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2Fdrbench/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ServiceNow","download_url":"https://codeload.github.com/ServiceNow/drbench/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ServiceNow%2Fdrbench/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29694785,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-21T18:18:25.093Z","status":"ssl_error","status_checked_at":"2026-02-21T18:18:22.435Z","response_time":107,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-09T17:54:00.281Z","updated_at":"2026-02-21T22:02:04.827Z","avatar_url":"https://github.com/ServiceNow.png","language":"Python","readme":"# DrBench Enterprise Research Benchmark\n\n\n![drbench_banner.png](docs/drbench_banner.png)\n\n\n\n[![Read the Paper](https://img.shields.io/badge/Paper-arXiv-8512DA)](https://arxiv.org/abs/2510.00172) 
## Quick Start

### Install Requirements

```bash
uv pip install -e .
```

**Custom Data Directory:** By default, the library uses data from `drbench/data/`. To use a custom data location (e.g., cloned from HuggingFace), set the `DRBENCH_DATA_DIR` environment variable:

```bash
export DRBENCH_DATA_DIR=/path/to/custom/data
```

### (1) Quick Run (Without Docker)

```bash
python minimal_local.py
```

This loads task SANITY0, generates a basic report, and saves the results under `results/minimal_local`.

### (2) Quick Run (With Docker)

#### Install Docker

Install [Docker](https://www.docker.com/get-started/), then build the services:

```bash
cd services
make local-build
```

This takes around 30 minutes and only needs to be done once.

#### Run the agent in the Docker environment

```bash
python minimal.py
```

This loads task DR0001, generates a basic report, and saves the results under `results/minimal`.

### (3) Test Your Own Agent

Build and evaluate your own research agent in just 4 steps!

#### (a) Load a Task

First, pick a task to work with:

```python
from drbench import task_loader

task = task_loader.get_task_from_id("DR0001")
```

See what the task is about:

```python
print(task.summary())
print(task.get_dr_question())
```

#### (b) Create Your Agent

Your agent needs a `generate_report` method that returns a report with structured insights:

```python
class MyAgent:
    def generate_report(self, query, env):
        # Your research logic here:
        # 1. Search across files, emails, chats, and web
        # 2. Extract insights with supporting citations
        # 3. Synthesize them into a comprehensive report

        insights = [
            {
                "claim": "Key finding from your research",
                "citations": ["file.pdf", "https://example.com"]
            },
            # ... more insights
        ]

        report_text = "\n".join(insight["claim"] for insight in insights)

        return {
            "report_insights": insights,  # List of claim-citation pairs
            "report_text": report_text,   # Full report as a string
        }
```

Refer to `BasicAgent` in `drbench/agents/basic_agent.py` for a simple example, or use the full `DrBenchAgent` in `drbench/agents/drbench_agent/drbench_agent.py`.
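#### (c) Generate a Report

Run your agent on the task's deep research question to produce a report. The snippet below is a minimal sketch: passing `env=None` assumes your agent does not need the simulated enterprise environment; see `minimal.py` and `minimal_local.py` for how a full environment is set up.

```python
agent = MyAgent()

# env=None is a simplifying assumption for agents that ignore the simulated
# enterprise environment (see minimal.py / minimal_local.py for a full setup).
report = agent.generate_report(task.get_dr_question(), env=None)

print(report["report_text"])
print(f"Extracted {len(report['report_insights'])} insights")
```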
#### (d) Evaluate Your Report

See how well your agent did:

```python
from drbench.score_report import score_report

scores = score_report(
    predicted_report=report,
    task=task,
    savedir="my_results"
)

print(f"Insights Recall: {scores['insights_recall']:.3f}")
```

---

## 🧠 Why `drbench`?

- **🔎 Real Deep Research Tasks**  
  Not simple fact lookups. Tasks like _"What changes should we make to our product roadmap to ensure compliance?"_ require multi-step reasoning, synthesis, and reporting.

- **🏢 Enterprise Context Grounding**  
  Each task is rooted in a realistic **user persona** (e.g., Product Developer) and **organizational setting** (e.g., ServiceNow), demanding deep understanding and contextual awareness.

- **🧩 Multi-Modal, Multi-Source Reasoning**  
  Agents must search, retrieve, and reason across:
  - Internal chat logs 💬
  - Cloud file systems 📂
  - Spreadsheets 📊
  - PDFs 📄
  - Websites 🌐
  - Emails 📧

- **🧠 Insight-Centric Evaluation**  
  Reports are scored on whether agents extract the **most critical insights** and **properly cite** their sources.

---

## 📦 What You Get

✅ The **first benchmark** for deep research across hybrid enterprise environments  
✅ A suite of **real-world tasks** across enterprise use cases such as CRM  
✅ A **realistic simulated enterprise stack** (chat, docs, email, web, etc.)  
✅ A task generation framework blending **web-based facts** and **local context**  
✅ A **lightweight, scalable evaluation mechanism** for insightfulness and citation

---

## 🤝 Get Involved

Interested in early access, collaboration, or feedback?

- Reach out via email: <issam.laradji@servicenow.com>
- Join our [Discord channel](https://discord.gg/9rQ6HgBbkd)

---

## 🤝 Core Contributors

- Amirhossein Abaskohi – <amirhossein.abaskohi@servicenow.com>
- Tianyi Chen – <tianyi.chen@servicenow.com>
- Miguel Muñoz – <miguel.munoz@servicenow.com>
- Curtis Fox – <curtis.fox@servicenow.com>
- Alexandre Drouin – <alexandre.drouin@servicenow.com>
- Issam Laradji – <issam.laradji@servicenow.com>

#### Citation

```bibtex
@article{abaskohi2025drbench,
  title={DRBench: A Realistic Benchmark for Enterprise Deep Research},
  author={Abaskohi, Amirhossein and Chen, Tianyi and Mu{\~n}oz-M{\'a}rmol, Miguel and Fox, Curtis and Ramesh, Amrutha Varshini and Marcotte, {\'E}tienne and L{\`u}, Xing Han and Chapados, Nicolas and Gella, Spandana and Pal, Christopher and others},
  journal={arXiv preprint arXiv:2510.00172},
  year={2025}
}
```