{"id":20064452,"url":"https://github.com/xlang-ai/bright","last_synced_at":"2025-04-07T07:01:35.938Z","repository":{"id":244163756,"uuid":"801264839","full_name":"xlang-ai/BRIGHT","owner":"xlang-ai","description":"BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval","archived":false,"fork":false,"pushed_at":"2025-02-12T06:04:35.000Z","size":12978,"stargazers_count":92,"open_issues_count":4,"forks_count":9,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-31T06:01:36.286Z","etag":null,"topics":["benchmark","reasoning","retrieval"],"latest_commit_sha":null,"homepage":"https://brightbenchmark.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc-by-4.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/xlang-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-05-15T22:37:54.000Z","updated_at":"2025-03-26T04:10:03.000Z","dependencies_parsed_at":"2024-10-23T03:16:14.127Z","dependency_job_id":"43337b11-3511-40d0-974f-85f077634be5","html_url":"https://github.com/xlang-ai/BRIGHT","commit_stats":null,"previous_names":["xlang-ai/bright"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FBRIGHT","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FBRIGHT/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FBRIGHT/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/xlang-ai%2FBRIGHT/manifests","owner_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/owners/xlang-ai","download_url":"https://codeload.github.com/xlang-ai/BRIGHT/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247608150,"owners_count":20965952,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["benchmark","reasoning","retrieval"],"created_at":"2024-11-13T13:46:18.383Z","updated_at":"2025-04-07T07:01:35.895Z","avatar_url":"https://github.com/xlang-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[//]: # (# BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval)\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"figures/badge.png\" alt=\"BRIGHT\"\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://brightbenchmark.github.io/\"\u003eWebsite\u003c/a\u003e •\n  \u003ca href=\"https://arxiv.org/abs/2407.12883\"\u003ePaper\u003c/a\u003e •\n  \u003ca href=\"https://huggingface.co/datasets/xlangai/BRIGHT\"\u003eData (4k downloads)\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp align=\"center\"\u003e\n    \u003ca href=\"https://github.com/xlang-ai/BRIGHT/issues\"\u003e\n        \u003cimg src=\"https://img.shields.io/badge/PRs-Welcome-red\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://img.shields.io/github/last-commit/xlang-ai/BRIGHT?color=green\"\u003e\n        \u003cimg src=\"https://img.shields.io/github/last-commit/xlang-ai/BRIGHT?color=green\"\u003e\n    \u003c/a\u003e\n    \u003ca href=\"https://github.com/xlang-ai/BRIGHT?tab=CC-BY-4.0-1-ov-file#readme\"\u003e\n        
\u003cimg src=\"https://img.shields.io/badge/license-CC--BY--4.0-blue\"\u003e\n    \u003c/a\u003e\n    \u003cbr/\u003e\n\u003c/p\u003e\n\n[//]: # (\u003cp\u003e)\n\n[//]: # (Existing retrieval benchmarks primarily consist of information-seeking queries \u0026#40;e.g., aggregated questions from search engines\u0026#41; where keyword or semantic-based retrieval is usually sufficient. However, many real-world, complex queries necessitate in-depth reasoning to identify relevant documents that go beyond surface form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. We introduce BRIGHT to better benchmark retrieval on such challenging and realistic scenarios.)\n\n[//]: # (\u003c/p\u003e)\n\n\u003cp align=\"center\"\u003e\n    \u003cimg src=\"figures/figure1.png\" width=\"85%\" alt=\"Overview of BRIGHT benchmark\"\u003e\n\u003c/p\u003e\n\n## 📢 Updates\n- 2024-07-15: We released our [paper](https://brightbenchmark.github.io/), [code](https://github.com/xlang-ai/BRIGHT), and [data](https://huggingface.co/datasets/xlangai/BRIGHT). Check it out!\n\n\n\u003c!--\nThis repository contains the code for our paper BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval.\n\nWe introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents. We collect 1,385 real-world queries from diverse domains (StackExchange, LeetCode, and math competitions), sourced from naturally occurring or carefully curated human data. 
We pair these queries with web pages linked in StackExchange answers and with tagged theorems in math Olympiad questions, all of which require deliberate reasoning to identify the connections.\n--\u003e\n\n## 💾 Installation\nOn your local machine, we recommend first creating a virtual environment:\n```bash\nconda create -n bright python=3.10\nconda activate bright\ngit clone https://github.com/xlang-ai/BRIGHT\ncd BRIGHT\nwget https://download.oracle.com/java/22/latest/jdk-22_linux-x64_bin.deb\nsudo dpkg -i jdk-22_linux-x64_bin.deb\npip install -r requirements.txt\n```\nThis creates the `bright` environment with all the required packages installed.\n\n## 🤗 Data\nBRIGHT comprises 12 diverse datasets, spanning biology, economics, robotics, math, code, and more.\nThe queries can be long StackExchange posts, math questions, or code questions.\nThe documents can be blogs, news, articles, reports, etc.\nSee the [Hugging Face page](https://huggingface.co/datasets/xlangai/BRIGHT) for more details.\n\n## 📊 Evaluation\nWe evaluate 13 representative retrieval models of diverse sizes and architectures. Run the following command to get results:\n```bash\npython run.py --task {task} --model {model}\n```\n* `--task`: the task/dataset to evaluate. It can take one of `biology`,`earth_science`,`economics`,`psychology`,`robotics`,`stackoverflow`,`sustainable_living`,`leetcode`,`pony`,`aops`,`theoremqa`, \n* `--model`: the model to evaluate. The current implementation supports `bm25`,`cohere`,`e5`,`google`,`grit`,`inst-l`,`inst-xl`,`openai`,`qwen`,`sbert`,`sf`,`voyage` and `bge`. 
\\\nOptional:\n* `--long_context`: whether to evaluate in the long-context setting (defaults to `False`)\n* `--query_max_length`: the maximum length of the query\n* `--doc_max_length`: the maximum length of the document\n* `--encode_batch_size`: the encoding batch size\n* `--output_dir`: the directory to write results to\n* `--cache_dir`: the directory in which to cache document embeddings\n* `--config_dir`: the directory of instruction configurations\n* `--checkpoint`: the specific checkpoint to use\n* `--key`: the API key for proprietary models\n* `--debug`: whether to turn on debug mode, which loads only a few documents\n\n### 🔍 Add a custom model?\nIt is easy to add and evaluate custom models on BRIGHT. Just implement the following function in `retrievers.py` and add it to the `RETRIEVAL_FUNCS` mapping:\n```python\ndef retrieval_model_function_name(queries, query_ids, documents, doc_ids, excluded_ids, **kwargs):\n    ...\n    return scores\n```\nwhere `scores` is a nested dictionary that maps each query ID to a dictionary of document IDs and their scores:\n```python\n{\n  \"query_id_1\": {\n    \"doc_id_1\": score_1,\n    \"doc_id_2\": score_2,\n    ...\n    \"doc_id_n\": score_n\n  },\n  ...\n  \"query_id_m\": {\n    \"doc_id_1\": score_1,\n    \"doc_id_2\": score_2,\n    ...\n    \"doc_id_n\": score_n\n  }\n}\n```\n\n## ❓Bugs or questions?\nIf you have any questions about the code or the paper, feel free to email Hongjin (hjsu@cs.hku.hk), Howard (hyen@cs.princeton.edu), or Mengzhou (mengzhou@cs.princeton.edu). 
Please describe the problem in as much detail as possible so we can help you better and more quickly.\n\n## Citation\nIf you find our work helpful, please cite us:\n```bibtex\n@misc{BRIGHT,\n  title={BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval},\n  author={Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and Sun, Ruoxi and Yoon, Jinsung and Arik, Sercan O and Chen, Danqi and Yu, Tao},\n  url={https://arxiv.org/abs/2407.12883},\n  year={2024},\n}\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxlang-ai%2Fbright","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fxlang-ai%2Fbright","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fxlang-ai%2Fbright/lists"}