
# BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval

[Website](https://brightbenchmark.github.io/) | [Paper](https://arxiv.org/abs/2407.12883) | [Data](https://huggingface.co/datasets/xlangai/BRIGHT) (4k downloads)

Existing retrieval benchmarks primarily consist of information-seeking queries (e.g., aggregated questions from search engines) where keyword or semantic-based retrieval is usually sufficient. However, many real-world, complex queries necessitate in-depth reasoning to identify relevant documents that go beyond surface-form matching. For example, finding documentation for a coding question requires understanding the logic and syntax of the functions involved. We introduce BRIGHT to better benchmark retrieval in such challenging and realistic scenarios.

*Overview of the BRIGHT benchmark*

## 📢 Updates
- 2024-07-15: We released our [paper](https://arxiv.org/abs/2407.12883), [code](https://github.com/xlang-ai/BRIGHT), and [data](https://huggingface.co/datasets/xlangai/BRIGHT). Check it out!

## 💾 Installation
On your local machine, we recommend first creating a virtual environment:
```bash
conda create -n bright python=3.10
conda activate bright
git clone https://github.com/xlang-ai/BRIGHT
cd BRIGHT
# Install the JDK (required for the Java-based BM25 retriever)
wget https://download.oracle.com/java/22/latest/jdk-22_linux-x64_bin.deb
sudo dpkg -i jdk-22_linux-x64_bin.deb
pip install -r requirements.txt
```
This creates the `bright` environment with all required packages installed.

## 🤗 Data
BRIGHT comprises 12 diverse datasets spanning biology, economics, robotics, math, code, and more.
The queries can be long StackExchange posts, math problems, or code questions.
The documents can be blog posts, news articles, reports, etc.
See [Huggingface page](https://huggingface.co/datasets/xlangai/BRIGHT) for more details.
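
For a quick look at the data, the snippet below is a minimal sketch using the `datasets` library; the `examples`/`documents` configurations and the per-task splits follow the Hugging Face dataset card and should be verified there.
```python
from datasets import load_dataset

# 'examples' holds the queries and relevance labels; 'documents' holds the retrieval corpus.
examples = load_dataset("xlangai/BRIGHT", "examples")["biology"]
documents = load_dataset("xlangai/BRIGHT", "documents")["biology"]

print(examples.column_names)                      # inspect the available fields
print(len(documents), "documents in the biology corpus")
```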

## 📊 Evaluation
We evaluate 13 representative retrieval models of diverse sizes and architectures. Run the following command to get results:
```bash
python run.py --task {task} --model {model}
```
* `--task`: the task/dataset to evaluate. It can be one of `biology`, `earth_science`, `economics`, `psychology`, `robotics`, `stackoverflow`, `sustainable_living`, `leetcode`, `pony`, `aops`, `theoremqa`, etc.
* `--model`: the model to evaluate. The current implementation supports `bm25`, `cohere`, `e5`, `google`, `grit`, `inst-l`, `inst-xl`, `openai`, `qwen`, `sbert`, `sf`, `voyage`, and `bge`.

Optional arguments:
* `--long_context`: whether to evaluate in the long-context setting; defaults to `False`
* `--query_max_length`: the maximum length for the query
* `--doc_max_length`: the maximum length for the document
* `--encode_batch_size`: the encoding batch size
* `--output_dir`: the directory to output results
* `--cache_dir`: the directory to cache document embeddings
* `--config_dir`: the directory of instruction configurations
* `--checkpoint`: the specific checkpoint to use
* `--key`: key for proprietary models
* `--debug`: whether to turn on the debug mode and load only a few documents
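
For example, to evaluate BM25 on the biology task (the output and cache paths below are illustrative):
```bash
python run.py --task biology --model bm25 --output_dir outputs --cache_dir cache
```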

### 🔍 Add a custom model?
It is easy to evaluate custom models on BRIGHT. Just implement the following function in `retrievers.py` and add it to the `RETRIEVAL_FUNCS` mapping:
```python
def retrieval_model_function_name(queries, query_ids, documents, doc_ids, excluded_ids, **kwargs):
    # Score every (query, document) pair with your model.
    ...
    return scores
```
where `scores` is in the format:
```python
{
    "query_id_1": {
        "doc_id_1": score_1,
        "doc_id_2": score_2,
        ...
        "doc_id_n": score_n
    },
    ...
    "query_id_m": {
        "doc_id_1": score_1,
        "doc_id_2": score_2,
        ...
        "doc_id_n": score_n
    }
}
```
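
As a concrete illustration, here is a minimal sketch of such a function using a plain TF-IDF scorer. The function name is hypothetical, and the assumption that `excluded_ids` maps each query id to a list of document ids to skip should be checked against `retrievers.py`:
```python
# Hypothetical example: a plain TF-IDF retriever standing in for a custom model.
# Assumption: `excluded_ids` maps each query id to a list of document ids to skip;
# check retrievers.py for the exact contract used by the official retrievers.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieval_tfidf(queries, query_ids, documents, doc_ids, excluded_ids, **kwargs):
    # Fit a shared TF-IDF vocabulary on the corpus and encode queries with it.
    vectorizer = TfidfVectorizer(stop_words="english")
    doc_matrix = vectorizer.fit_transform(documents)
    query_matrix = vectorizer.transform(queries)
    sims = cosine_similarity(query_matrix, doc_matrix)  # shape: (num_queries, num_docs)

    scores = {}
    for qi, qid in enumerate(query_ids):
        excluded = set(excluded_ids.get(qid, []))
        scores[qid] = {
            did: float(sims[qi, di])
            for di, did in enumerate(doc_ids)
            if did not in excluded
        }
    return scores

# Register it so it can be selected with --model (the key name is illustrative):
# RETRIEVAL_FUNCS["tfidf"] = retrieval_tfidf
```
Any model that returns the `scores` structure shown above will plug into the rest of the evaluation pipeline.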

## ❓ Bugs or questions?
If you have any questions about the code or the paper, feel free to email Hongjin ([email protected]), Howard ([email protected]), or Mengzhou ([email protected]). Please describe the problem in detail so we can help you better and faster.

## Citation
If you find our work helpful, please cite us:
```bibtex
@misc{BRIGHT,
  title={BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval},
  author={Su, Hongjin and Yen, Howard and Xia, Mengzhou and Shi, Weijia and Muennighoff, Niklas and Wang, Han-yu and Liu, Haisu and Shi, Quan and Siegel, Zachary S and Tang, Michael and Sun, Ruoxi and Yoon, Jinsung and Arik, Sercan O and Chen, Danqi and Yu, Tao},
  url={https://arxiv.org/abs/2407.12883},
  year={2024},
}
```