{"id":21839089,"url":"https://github.com/illuin-tech/vidore-benchmark","last_synced_at":"2025-04-12T00:13:21.829Z","repository":{"id":246424797,"uuid":"816996516","full_name":"illuin-tech/vidore-benchmark","owner":"illuin-tech","description":"Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.","archived":false,"fork":false,"pushed_at":"2025-04-10T12:50:39.000Z","size":3119,"stargazers_count":191,"open_issues_count":6,"forks_count":24,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-12T00:13:11.406Z","etag":null,"topics":["colpali","rag","retrieval","search","vision-language-model"],"latest_commit_sha":null,"homepage":"https://huggingface.co/vidore","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/illuin-tech.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-18T20:02:03.000Z","updated_at":"2025-04-10T12:41:09.000Z","dependencies_parsed_at":"2024-06-27T22:59:27.416Z","dependency_job_id":"e0f57717-7088-4ddf-acf9-b48ccc90cabf","html_url":"https://github.com/illuin-tech/vidore-benchmark","commit_stats":null,"previous_names":["tonywu71/vidore-benchmark"],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/illuin-tech%2Fvidore-benchmark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/illuin-tech%2Fvidore-benchmark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/illuin-tech%2Fvidore-benchmark/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/illuin-tech%2Fvidore-benchmark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/illuin-tech","download_url":"https://codeload.github.com/illuin-tech/vidore-benchmark/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248497820,"owners_count":21113984,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colpali","rag","retrieval","search","vision-language-model"],"created_at":"2024-11-27T21:15:52.670Z","updated_at":"2025-04-12T00:13:21.798Z","avatar_url":"https://github.com/illuin-tech.png","language":"Python","readme":"# Vision Document Retrieval (ViDoRe): Benchmarks 👀\n\n[![arXiv](https://img.shields.io/badge/arXiv-2407.01449-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2407.01449)\n[![GitHub](https://img.shields.io/badge/ColPali_Engine-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/illuin-tech/colpali)\n[![Hugging 
Face](https://img.shields.io/badge/Vidore_Hf_Space-FFD21E?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000)](https://huggingface.co/vidore)\n\n[![Test](https://github.com/illuin-tech/vidore-benchmark/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/illuin-tech/vidore-benchmark/actions/workflows/test.yml)\n[![Version](https://img.shields.io/pypi/v/vidore-benchmark?color=%2334D058\u0026label=pypi%20package)](https://pypi.org/project/vidore-benchmark/)\n[![Downloads](https://static.pepy.tech/badge/vidore-benchmark)](https://pepy.tech/project/vidore-benchmark)\n\n---\n\n[[Model card]](https://huggingface.co/vidore/colpali)\n[[ViDoRe Leaderboard]](https://huggingface.co/spaces/vidore/vidore-leaderboard)\n[[Demo]](https://huggingface.co/spaces/manu/ColPali-demo)\n[[Blog Post]](https://huggingface.co/blog/manu/colpali)\n\n## Approach\n\nThe Visual Document Retrieval Benchmarks (ViDoRe v1 and v2) were introduced to evaluate the performance of document retrieval systems on visually rich documents across various tasks, domains, languages, and settings. They were used to evaluate ColPali, a VLM-powered retriever that efficiently retrieves documents based on their visual content and textual queries using a late-interaction mechanism.\n\n![ViDoRe Examples](assets/vidore_examples.webp)\n\n## Usage\n\nThis package comes with a Python API and a CLI to evaluate your own retriever on the ViDoRe benchmark. Both are compatible with `Python\u003e=3.9`.\n\n### CLI mode\n\n```bash\npip install vidore-benchmark\n```\n\nTo keep this package lightweight, only the essential dependencies are installed. Thus, you must specify the dependency groups for the models you want to evaluate with the CLI (see the list in `pyproject.toml`). For instance, if you are going to evaluate the ColVision models (e.g. ColPali, ColQwen2, ColSmol, ...), you should run:\n\n```bash\npip install \"vidore-benchmark[colpali-engine]\"\n```\n\n\u003e [!WARNING]\n\u003e If possible, do not `pip install colpali-engine` directly in the environment dedicated to the CLI.\n\u003e \n\u003e In particular, make sure not to install both `vidore-benchmark[colpali-engine]` and `colpali-engine[train]` simultaneously, as it will lead to a circular dependency conflict.\n\nIf you want to install all the dependencies for all the models, you can run:\n\n```bash\npip install \"vidore-benchmark[all-retrievers]\"\n```\n\nNote that in order to use `BM25Retriever`, you will need to download the `nltk` resources too:\n\n```bash\npip install \"vidore-benchmark[bm25]\"\npython -m nltk.downloader punkt punkt_tab stopwords\n```\n\n### Library mode\n\nInstall the base package using pip:\n\n```bash\npip install vidore-benchmark\n```\n\n## Command-line usage\n\n### Evaluate a retriever on ViDoRe\n\nYou can evaluate any off-the-shelf retriever on the ViDoRe benchmark v1. 
For instance, you can evaluate the ColPali model on the ViDoRe benchmark v1 to reproduce the results from our paper.\n\n```bash\nvidore-benchmark evaluate-retriever \\\n    --model-class colpali \\\n    --model-name vidore/colpali-v1.3 \\\n    --collection-name vidore/vidore-benchmark-667173f98e70a1c0fa4db00d \\\n    --dataset-format qa \\\n    --split test\n```\n\nIf you want to evaluate your models on the new ViDoRe benchmark v2 collection, a harder version of the previous benchmark, you can execute the following command:\n\n```bash\nvidore-benchmark evaluate-retriever \\\n    --model-class colpali \\\n    --model-name vidore/colpali-v1.3 \\\n    --collection-name https://huggingface.co/collections/vidore/vidore-benchmark-v2-67ae03e3924e85b36e7f53b0 \\\n    --dataset-format beir \\\n    --split test\n```\n\nAlternatively, you can evaluate your model on a single dataset. If your retriever uses visual embeddings, you can use any dataset path from the [ViDoRe Benchmark v1](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) collection or the [ViDoRe Benchmark v2](https://huggingface.co/collections/vidore/vidore-benchmark-v2-67ae03e3924e85b36e7f53b0) collection (use `--dataset-format beir` instead of `qa`), e.g.:\n\n```bash\nvidore-benchmark evaluate-retriever \\\n    --model-class colpali \\\n    --model-name vidore/colpali-v1.3 \\\n    --dataset-name vidore/docvqa_test_subsampled \\\n    --dataset-format qa \\\n    --split test\n```\n\nIf you want to evaluate a retriever that relies on pure-text retrieval (no visual embeddings), you should use the datasets from the [ViDoRe Chunk OCR (baseline)](https://huggingface.co/collections/vidore/vidore-chunk-ocr-baseline-666acce88c294ef415548a56) collection instead:\n\n```bash\nvidore-benchmark evaluate-retriever \\\n    --model-class bge-m3 \\\n    --model-name BAAI/bge-m3 \\\n    --dataset-name vidore/docvqa_test_subsampled_tesseract \\\n    --dataset-format qa \\\n    --split test\n```\n\nAll the above scripts will generate a JSON file in `outputs/{model_id}_metrics.json`. Follow the instructions on the [ViDoRe Leaderboard](https://huggingface.co/spaces/vidore/vidore-leaderboard) to learn how to publish your results on the leaderboard too!\n\n\u003e [!NOTE]\n\u003e The `vidore-benchmark` package supports two dataset formats:\n\u003e \n\u003e - QA: The dataset is formatted as a question-answering task, where the queries are questions and the passages are the image pages that provide the answers.\n\u003e - BEIR: Following the [BEIR paper](https://doi.org/10.48550/arXiv.2104.08663), the dataset is split into 3 sub-datasets: `corpus`, `queries`, and `qrels`. The `corpus` contains the documents, the `queries` contains the queries, and the `qrels` contains the relevance scores between the queries and the documents.\n\u003e\n\u003e In the first iteration of the ViDoRe benchmark, we **arbitrarily chose** to deduplicate the queries for the QA datasets. While this made sense given our data generation process, it wasn't suited for the ViDoRe benchmark v2, which aims to be broader and multilingual; v2 therefore uses the BEIR format and does not deduplicate queries (see the table below).\n\n| Dataset                                                                                                    | Dataset format | Deduplicated queries |\n|------------------------------------------------------------------------------------------------------------|----------------|---------------------|\n| [ViDoRe benchmark v1](https://huggingface.co/collections/vidore/vidore-benchmark-667173f98e70a1c0fa4db00d) | QA             | ✅                   |\n| [ViDoRe benchmark v2](https://huggingface.co/collections/vidore/vidore-benchmark-v2-67ae03e3924e85b36e7f53b0) (harder/multilingual) | BEIR           | ❌                   |\n\n### Documentation\n\nTo have more control over the evaluation process (e.g. the batch size used at inference), read the CLI documentation using:\n\n```bash\nvidore-benchmark evaluate-retriever --help\n```\n\nIn particular, feel free to play with the `--batch-query`, `--batch-passage`, `--batch-score`, and `--num-workers` inputs to speed up the evaluation process.\n\n
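As a rough example (the batch sizes and worker count below are purely illustrative; tune them to your hardware and model), these options can be combined with any of the commands above:\n\n```bash\nvidore-benchmark evaluate-retriever \\\n    --model-class colpali \\\n    --model-name vidore/colpali-v1.3 \\\n    --dataset-name vidore/docvqa_test_subsampled \\\n    --dataset-format qa \\\n    --split test \\\n    --batch-query 4 \\\n    --batch-passage 4 \\\n    --batch-score 4 \\\n    --num-workers 4\n```\n\n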
## Python usage\n\n### Quickstart example\n\nWhile the CLI can be used to evaluate a fixed list of models, you can also use the Python API to evaluate your own retriever. Here is an example of how to evaluate the ColPali model on the ViDoRe benchmark. Note that your processor must implement `process_images` and `process_queries` methods, similar to the ColVision processors.\n\n```python\nimport torch\nfrom colpali_engine.models import ColIdefics3, ColIdefics3Processor\nfrom datasets import load_dataset\nfrom tqdm import tqdm\n\nfrom vidore_benchmark.evaluation.vidore_evaluators import ViDoReEvaluatorQA, ViDoReEvaluatorBEIR\nfrom vidore_benchmark.retrievers import VisionRetriever\nfrom vidore_benchmark.utils.data_utils import get_datasets_from_collection\n\nmodel_name = \"vidore/colSmol-256M\"\nprocessor = ColIdefics3Processor.from_pretrained(model_name)\nmodel = ColIdefics3.from_pretrained(\n    model_name,\n    torch_dtype=torch.bfloat16,\n    device_map=\"cuda\",\n).eval()\n\n# Get retriever instance\nvision_retriever = VisionRetriever(model=model, processor=processor)\n\n# Evaluate on a single BEIR-format dataset (e.g. one of the ViDoRe benchmark v2 datasets)\nvidore_evaluator_beir = ViDoReEvaluatorBEIR(vision_retriever)\nds = {\n    \"corpus\": load_dataset(\"vidore/synthetic_axa_filtered_v1.0\", name=\"corpus\", split=\"test\"),\n    \"queries\": load_dataset(\"vidore/synthetic_axa_filtered_v1.0\", name=\"queries\", split=\"test\"),\n    \"qrels\": load_dataset(\"vidore/synthetic_axa_filtered_v1.0\", name=\"qrels\", split=\"test\"),\n}\nmetrics_dataset_beir = vidore_evaluator_beir.evaluate_dataset(\n    ds=ds,\n    batch_query=4,\n    batch_passage=4,\n)\nprint(metrics_dataset_beir)\n\n# Evaluate on a single QA-format dataset\nvidore_evaluator_qa = ViDoReEvaluatorQA(vision_retriever)\nds = load_dataset(\"vidore/tabfquad_test_subsampled\", split=\"test\")\nmetrics_dataset_qa = vidore_evaluator_qa.evaluate_dataset(\n    ds=ds,\n    batch_query=4,\n    batch_passage=4,\n)\nprint(metrics_dataset_qa)\n\n# Evaluate on a local directory or a HuggingFace collection (the ViDoRe benchmark v1 datasets use the QA format)\ndataset_names = get_datasets_from_collection(\"vidore/vidore-benchmark-667173f98e70a1c0fa4db00d\")\nmetrics_collection = {}\nfor dataset_name in tqdm(dataset_names, desc=\"Evaluating dataset(s)\"):\n    metrics_collection[dataset_name] = vidore_evaluator_qa.evaluate_dataset(\n        ds=load_dataset(dataset_name, split=\"test\"),\n        batch_query=4,\n        batch_passage=4,\n    )\nprint(metrics_collection)\n```\n\n### Implement your own retriever\n\nIf you want to evaluate your own retriever to use it with the CLI, you should clone the repository and add your own class that inherits from `BaseVisionRetriever`. You can find the detailed instructions [here](https://github.com/illuin-tech/vidore-benchmark/blob/main/src/vidore_benchmark/retrievers/README.md).\n\n
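As a rough illustration only (the method names, signatures, constructor argument, and import path below are assumptions inferred from the built-in retrievers; check `BaseVisionRetriever` and the linked README for the exact interface), a custom retriever could look like this:\n\n```python\nfrom typing import List, Optional, Union\n\nimport torch\nfrom PIL import Image\n\nfrom vidore_benchmark.retrievers import BaseVisionRetriever  # import path assumed\n\n\nclass MyRetriever(BaseVisionRetriever):\n    # Hypothetical example: replace the random embeddings with calls to your own model.\n    def __init__(self):\n        # Assumed constructor argument: set to True if your model embeds page images directly.\n        super().__init__(use_visual_embedding=False)\n\n    def forward_queries(self, queries: List[str], batch_size: int, **kwargs):\n        # Encode the text queries into embeddings (placeholder: random 128-dim vectors).\n        return torch.randn(len(queries), 128)\n\n    def forward_passages(self, passages: List[Union[str, Image.Image]], batch_size: int, **kwargs):\n        # Encode the passages (OCR chunks or page images) into embeddings (placeholder: random 128-dim vectors).\n        return torch.randn(len(passages), 128)\n\n    def get_scores(\n        self,\n        query_embeddings: torch.Tensor,\n        passage_embeddings: torch.Tensor,\n        batch_size: Optional[int] = None,\n    ):\n        # Plain dot-product similarity between stacked query and passage embeddings.\n        return torch.matmul(query_embeddings, passage_embeddings.T)\n```\n\n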
### Compare retrievers using the EvalManager\n\nTo easily process, visualize, and compare the evaluation metrics of multiple retrievers, you can use the `EvalManager` class. Assume you have a list of previously generated JSON metric files, *e.g.*:\n\n```bash\ndata/metrics/\n├── bisiglip.json\n└── colpali.json\n```\n\nThe data loaded with `EvalManager.from_dir` is stored in `eval_manager.data` as a multi-column DataFrame. Use the `get_df_for_metric`, `get_df_for_dataset`, and `get_df_for_model` methods to get the subset of the data you are interested in. For instance:\n\n```python\nfrom vidore_benchmark.evaluation import EvalManager\n\neval_manager = EvalManager.from_dir(\"data/metrics/\")\ndf = eval_manager.get_df_for_metric(\"ndcg_at_5\")\n```\n\n## Citation\n\n**ColPali: Efficient Document Retrieval with Vision Language Models**  \n\nAuthors: **Manuel Faysse**\\*, **Hugues Sibille**\\*, **Tony Wu**\\*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (\\* denotes equal contribution)\n\n```latex\n@misc{faysse2024colpaliefficientdocumentretrieval,\n      title={ColPali: Efficient Document Retrieval with Vision Language Models}, \n      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},\n      year={2024},\n      eprint={2407.01449},\n      archivePrefix={arXiv},\n      primaryClass={cs.IR},\n      url={https://arxiv.org/abs/2407.01449}, \n}\n```\n\nIf you want to reproduce the results from the ColPali paper, please read the [`REPRODUCIBILITY.md`](REPRODUCIBILITY.md) file for more information.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Filluin-tech%2Fvidore-benchmark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Filluin-tech%2Fvidore-benchmark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Filluin-tech%2Fvidore-benchmark/lists"}