{"id":21839091,"url":"https://github.com/illuin-tech/colpali","last_synced_at":"2025-04-27T04:58:23.561Z","repository":{"id":245436453,"uuid":"817877504","full_name":"illuin-tech/colpali","owner":"illuin-tech","description":"The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.","archived":false,"fork":false,"pushed_at":"2025-04-26T18:26:12.000Z","size":815,"stargazers_count":1771,"open_issues_count":12,"forks_count":151,"subscribers_count":18,"default_branch":"main","last_synced_at":"2025-04-27T04:58:19.388Z","etag":null,"topics":["colpali","colqwen2","colsmol","information-retrieval","retrieval-augmented-generation","vision-language-model"],"latest_commit_sha":null,"homepage":"https://huggingface.co/vidore","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/illuin-tech.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-20T16:14:06.000Z","updated_at":"2025-04-26T16:29:33.000Z","dependencies_parsed_at":"2024-06-27T16:56:14.690Z","dependency_job_id":"7cfdd01d-fab0-4e05-b8a5-a4f3e9764f08","html_url":"https://github.com/illuin-tech/colpali","commit_stats":null,"previous_names":["manuelfay/colpali","illuin-tech/colpali"],"tags_count":16,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/illuin-tech%2Fcolpali","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/illuin-tech%2Fcolpali/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Gi
tHub/repositories/illuin-tech%2Fcolpali/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/illuin-tech%2Fcolpali/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/illuin-tech","download_url":"https://codeload.github.com/illuin-tech/colpali/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251089620,"owners_count":21534523,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colpali","colqwen2","colsmol","information-retrieval","retrieval-augmented-generation","vision-language-model"],"created_at":"2024-11-27T21:15:53.962Z","updated_at":"2025-04-27T04:58:23.546Z","avatar_url":"https://github.com/illuin-tech.png","language":"Python","funding_links":[],"categories":["Machine Learning Models","Multimodal RAG","5. 
Retrieval-Augmented Generation (RAG) \u0026 Knowledge"],"sub_categories":["Frameworks \u0026 Tools"],"readme":"# ColPali: Efficient Document Retrieval with Vision Language Models 👀\n\n[![arXiv](https://img.shields.io/badge/arXiv-2407.01449-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2407.01449)\n[![GitHub](https://img.shields.io/badge/ViDoRe_Benchmark-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/illuin-tech/vidore-benchmark)\n[![Hugging Face](https://img.shields.io/badge/Vidore_Hf_Space-FFD21E?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000)](https://huggingface.co/vidore)\n[![GitHub](https://img.shields.io/badge/Cookbooks-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/tonywu71/colpali-cookbooks)\n\n[![Test](https://github.com/illuin-tech/colpali/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/illuin-tech/colpali/actions/workflows/test.yml)\n[![Version](https://img.shields.io/pypi/v/colpali-engine?color=%2334D058\u0026label=pypi%20package)](https://pypi.org/project/colpali-engine/)\n[![Downloads](https://static.pepy.tech/badge/colpali-engine)](https://pepy.tech/project/colpali-engine)\n\n---\n\n[[Model card]](https://huggingface.co/vidore/colpali)\n[[ViDoRe Leaderboard]](https://huggingface.co/spaces/vidore/vidore-leaderboard)\n[[Demo]](https://huggingface.co/spaces/manu/ColPali-demo)\n[[Blog Post]](https://huggingface.co/blog/manu/colpali)\n\n## Associated Paper\n\nThis repository contains the code used for training the vision retrievers in the [*ColPali: Efficient Document Retrieval with Vision Language Models*](https://arxiv.org/abs/2407.01449) paper. 
In particular, it contains the code for training the ColPali model, which is a vision retriever based on the ColBERT architecture and the PaliGemma model.\n\n## Introduction\n\nWith our new model *ColPali*, we propose to leverage VLMs to construct efficient multi-vector embeddings in the visual space for document retrieval. By feeding the ViT output patches from PaliGemma-3B to a linear projection, we create a multi-vector representation of documents. We train the model to maximize the similarity between these document embeddings and the query embeddings, following the ColBERT method.\n\nUsing ColPali removes the need for potentially complex and brittle layout recognition and OCR pipelines with a single model that can take into account both the textual and visual content (layout, charts, ...) of a document.\n\n![ColPali Architecture](assets/colpali_architecture.webp)\n\n## List of ColVision models\n\n| Model                                                               | Score on [ViDoRe](https://huggingface.co/spaces/vidore/vidore-leaderboard) 🏆 | License    | Comments                                                                                                                                                       | Currently supported |\n|---------------------------------------------------------------------|-------------------------------------------------------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|\n| [vidore/colpali](https://huggingface.co/vidore/colpali)             | 81.3                                                                          | Gemma      | • Based on `google/paligemma-3b-mix-448`.\u003cbr /\u003e• Checkpoint used in the ColPali paper.                                                                         
| ❌                   |\n| [vidore/colpali-v1.1](https://huggingface.co/vidore/colpali-v1.1)   | 81.5                                                                          | Gemma      | • Based on `google/paligemma-3b-mix-448`.\u003cbr /\u003e• Fix right padding for queries.                                                                                | ✅                   |\n| [vidore/colpali-v1.2](https://huggingface.co/vidore/colpali-v1.2)   | 83.9                                                                          | Gemma      | • Similar to `vidore/colpali-v1.1`.                                                                                                                            | ✅                   |\n| [vidore/colpali-v1.3](https://huggingface.co/vidore/colpali-v1.3)   | 84.8                                                                          | Gemma      | • Similar to `vidore/colpali-v1.2`.\u003cbr /\u003e• Trained with a larger effective batch size of 256 for 3 epochs.                                                     | ✅                   |\n| [vidore/colqwen2-v0.1](https://huggingface.co/vidore/colqwen2-v0.1) | 87.3                                                                          | Apache 2.0 | • Based on `Qwen/Qwen2-VL-2B-Instruct`.\u003cbr /\u003e• Supports dynamic resolution.\u003cbr /\u003e• Trained using 768 image patches per page and an effective batch size of 32. | ✅                   |\n| [vidore/colqwen2-v1.0](https://huggingface.co/vidore/colqwen2-v1.0) | 89.3                                                                          | Apache 2.0 | • Similar to `vidore/colqwen2-v0.1`, but trained with more powerful GPUs and with a larger effective batch size (256).                                         
| ✅                   |\n| [vidore/colqwen2.5-v0.1](https://huggingface.co/vidore/colqwen2.5-v0.1) | 88.8                                                                          | Apache 2.0 | • Based on `Qwen/Qwen2.5-VL-3B-Instruct`.\u003cbr /\u003e• Supports dynamic resolution.\u003cbr /\u003e• Trained using 768 image patches per page and an effective batch size of 32.                                        | ✅                   |\n| [vidore/colqwen2.5-v0.2](https://huggingface.co/vidore/colqwen2.5-v0.2) | 89.4                                                                          | Apache 2.0 | • Similar to `vidore/colqwen2.5-v0.1`, but trained with slightly different hyperparameters.                                        | ✅                   |\n| [vidore/colSmol-256M](https://huggingface.co/vidore/colSmol-256M)   | 80.1                                                                          | Apache 2.0 | • Based on `HuggingFaceTB/SmolVLM-256M-Instruct`.                                                                                                              | ✅                   |\n| [vidore/colSmol-500M](https://huggingface.co/vidore/colSmol-500M)   | 82.3                                                                          | Apache 2.0 | • Based on `HuggingFaceTB/SmolVLM-500M-Instruct`.                                                                                                              | ✅                   |\n\n## Setup\n\nWe used Python 3.11.6 and PyTorch 2.4 to train and test our models, but the codebase is compatible with Python \u003e=3.9 and recent PyTorch versions. To install the package, run:\n\n```bash\npip install colpali-engine # from PyPI\npip install git+https://github.com/illuin-tech/colpali # from source\n```\n\nMac users using MPS with the ColQwen models have reported errors with torch 2.6.0. 
These errors are fixed by downgrading to torch 2.5.1.\n\n\u003e [!WARNING]\n\u003e For ColPali versions above v1.0, make sure to install the `colpali-engine` package from source or with a version above v0.2.0.\n\n## Usage\n\n### Quick start\n\n```python\nimport torch\nfrom PIL import Image\nfrom transformers.utils.import_utils import is_flash_attn_2_available\n\nfrom colpali_engine.models import ColQwen2, ColQwen2Processor\n\nmodel_name = \"vidore/colqwen2-v1.0\"\n\nmodel = ColQwen2.from_pretrained(\n    model_name,\n    torch_dtype=torch.bfloat16,\n    device_map=\"cuda:0\",  # or \"mps\" if on Apple Silicon\n    attn_implementation=\"flash_attention_2\" if is_flash_attn_2_available() else None,\n).eval()\n\nprocessor = ColQwen2Processor.from_pretrained(model_name)\n\n# Your inputs\nimages = [\n    Image.new(\"RGB\", (128, 128), color=\"white\"),\n    Image.new(\"RGB\", (64, 32), color=\"black\"),\n]\nqueries = [\n    \"What is the organizational structure for our R\u0026D department?\",\n    \"Can you provide a breakdown of last year’s financial performance?\",\n]\n\n# Process the inputs\nbatch_images = processor.process_images(images).to(model.device)\nbatch_queries = processor.process_queries(queries).to(model.device)\n\n# Forward pass\nwith torch.no_grad():\n    image_embeddings = model(**batch_images)\n    query_embeddings = model(**batch_queries)\n\nscores = processor.score_multi_vector(query_embeddings, image_embeddings)\n```\n\n### Benchmarking\n\nTo benchmark ColPali on the [ViDoRe leaderboard](https://huggingface.co/spaces/vidore/vidore-leaderboard), use the [`vidore-benchmark`](https://github.com/illuin-tech/vidore-benchmark) package.\n\n### Interpretability with similarity maps\n\nBy superimposing the late interaction similarity maps on top of the original image, we can visualize the most salient image patches with respect to each term of the query, yielding interpretable insights into model focus zones.\n\nTo use the `interpretability` module, you 
need to install the `colpali-engine[interpretability]` package:\n\n```bash\npip install colpali-engine[interpretability]\n```\n\nThen, after generating your embeddings with ColPali, use the following code to plot the similarity maps for each query token:\n\n```python\nimport torch\nfrom PIL import Image\n\nfrom colpali_engine.interpretability import (\n    get_similarity_maps_from_embeddings,\n    plot_all_similarity_maps,\n)\nfrom colpali_engine.models import ColPali, ColPaliProcessor\nfrom colpali_engine.utils.torch_utils import get_torch_device\n\nmodel_name = \"vidore/colpali-v1.2\"\ndevice = get_torch_device(\"auto\")\n\n# Load the model\nmodel = ColPali.from_pretrained(\n    model_name,\n    torch_dtype=torch.bfloat16,\n    device_map=device,\n).eval()\n\n# Load the processor\nprocessor = ColPaliProcessor.from_pretrained(model_name)\n\n# Load the image and query\nimage = Image.open(\"shift_kazakhstan.jpg\")\nquery = \"Quelle partie de la production pétrolière du Kazakhstan provient de champs en mer ?\"\n\n# Preprocess inputs\nbatch_images = processor.process_images([image]).to(device)\nbatch_queries = processor.process_queries([query]).to(device)\n\n# Forward passes\nwith torch.no_grad():\n    image_embeddings = model.forward(**batch_images)\n    query_embeddings = model.forward(**batch_queries)\n\n# Get the number of image patches\nn_patches = processor.get_n_patches(image_size=image.size, patch_size=model.patch_size)\n\n# Get the tensor mask to filter out the embeddings that are not related to the image\nimage_mask = processor.get_image_mask(batch_images)\n\n# Generate the similarity maps\nbatched_similarity_maps = get_similarity_maps_from_embeddings(\n    image_embeddings=image_embeddings,\n    query_embeddings=query_embeddings,\n    n_patches=n_patches,\n    image_mask=image_mask,\n)\n\n# Get the similarity map for our (only) input image\nsimilarity_maps = batched_similarity_maps[0]  # (query_length, n_patches_x, n_patches_y)\n\n# Tokenize the 
query\nquery_tokens = processor.tokenizer.tokenize(query)\n\n# Plot and save the similarity maps for each query token\nplots = plot_all_similarity_maps(\n    image=image,\n    query_tokens=query_tokens,\n    similarity_maps=similarity_maps,\n)\nfor idx, (fig, ax) in enumerate(plots):\n    fig.savefig(f\"similarity_map_{idx}.png\")\n```\n\nFor a more detailed example, you can refer to the interpretability notebooks from the [ColPali Cookbooks 👨🏻‍🍳](https://github.com/tonywu71/colpali-cookbooks) repository.\n\n### Token pooling\n\n[Token pooling](https://doi.org/10.48550/arXiv.2409.14683) is a CRUDE-compliant method (document addition/deletion-friendly) that aims at reducing the sequence length of multi-vector embeddings. For ColPali, many image patches share redundant information, e.g. white background patches. By pooling these patches together, we can reduce the amount of embeddings while retaining most of the page's signal. Retrieval performance with hierarchical mean token pooling on image embeddings can be found in the [ColPali paper](https://doi.org/10.48550/arXiv.2407.01449). 
In our experiments, we found that a pool factor of 3 offered the optimal trade-off: the total number of vectors is reduced by $66.7\\%$ while $97.8\\%$ of the original performance is maintained.\n\nTo use token pooling, you can use the `HierarchicalTokenPooler` class from the `colpali-engine` package:\n\n```python\nimport torch\n\nfrom colpali_engine.compression.token_pooling import HierarchicalTokenPooler\n\n# Dummy multivector embeddings\nlist_embeddings = [\n    torch.rand(10, 768),\n    torch.rand(20, 768),\n]\n\n# Define the pooler with the desired level of compression\npooler = HierarchicalTokenPooler()\n\n# Pool the embeddings\noutputs = pooler.pool_embeddings(list_embeddings, pool_factor=2)\n```\n\nIf your inputs are padded 3D tensor embeddings instead of lists of 2D tensors, use `padding=True` and specify the padding side used by your tokenizer to make sure the `HierarchicalTokenPooler` correctly removes the padding values before pooling:\n\n```python\nimport torch\nfrom PIL import Image\nfrom transformers.utils.import_utils import is_flash_attn_2_available\n\nfrom colpali_engine.compression.token_pooling import HierarchicalTokenPooler\nfrom colpali_engine.models import ColQwen2, ColQwen2Processor\n\nmodel_name = \"vidore/colqwen2-v1.0\"\nmodel = ColQwen2.from_pretrained(\n    model_name,\n    torch_dtype=torch.bfloat16,\n    device_map=\"cuda:0\",  # or \"mps\" if on Apple Silicon\n    attn_implementation=\"flash_attention_2\" if is_flash_attn_2_available() else None,\n).eval()\nprocessor = ColQwen2Processor.from_pretrained(model_name)\n\ntoken_pooler = HierarchicalTokenPooler()\n\n# Your page images\nimages = [\n    Image.new(\"RGB\", (128, 128), color=\"white\"),\n    Image.new(\"RGB\", (32, 32), color=\"black\"),\n]\n\n# Process the inputs\nbatch_images = processor.process_images(images).to(model.device)\n\n# Forward pass\nwith torch.no_grad():\n    image_embeddings = model(**batch_images)\n\n# Apply token pooling (reduces the sequence length of the multi-vector embeddings)\nimage_embeddings = token_pooler.pool_embeddings(\n    image_embeddings,\n    pool_factor=2,\n    padding=True,\n    padding_side=processor.tokenizer.padding_side,\n)\n```\n\nAlternatively, you can use the `LambdaTokenPooler` to define your own custom pooling function:\n\n```python\nimport torch\n\nfrom colpali_engine.compression.token_pooling import LambdaTokenPooler\n\ndef custom_pooling(embedding: torch.Tensor) -\u003e torch.Tensor:\n    \"\"\"\n    Custom pooling function that reduces sequence length by half.\n    \"\"\"\n\n    token_length = embedding.size(0)\n    # Resize to half the original length by averaging pairs of tokens\n    half_length = token_length // 2 + (token_length % 2)\n    pooled_embeddings = torch.zeros((half_length, embedding.size(1)), dtype=embedding.dtype, device=embedding.device)\n\n    for i in range(half_length):\n        start_idx = i * 2\n        end_idx = min(start_idx + 2, token_length)\n        cluster_indices = torch.arange(start_idx, end_idx)\n        pooled_embeddings[i] = embedding[cluster_indices].mean(dim=0)\n        pooled_embeddings[i] = torch.nn.functional.normalize(pooled_embeddings[i], p=2, dim=-1)\n\n    return pooled_embeddings\n\npooler = LambdaTokenPooler(pool_func=custom_pooling)\n\n# Dummy multivector embeddings\nlist_embeddings = [\n    torch.rand(10, 768),\n    torch.rand(20, 768),\n]\n\n# Pool the embeddings\noutputs = pooler.pool_embeddings(list_embeddings)\n```\n\nThe custom pooling function should take a 2D tensor (token_length, embedding_dim) as input and return a tensor of shape (num_clusters, embedding_dim) representing the pooled embeddings.\n\n### Training\n\nTo keep the repository lightweight, only the essential packages are installed by default. To use the training script for ColPali, you must install the training dependencies. 
You can do this using the following command:\n\n```bash\npip install \"colpali-engine[train]\"\n```\n\nAll the model configs used can be found in `scripts/configs/` and rely on the [configue](https://github.com/illuin-tech/configue) package for straightforward configuration. They should be used with the `train_colbert.py` script.\n\n#### Example 1: Local training\n\n```bash\nUSE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml\n```\n\nor using `accelerate`:\n\n```bash\naccelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml\n```\n\n#### Example 2: Training on a SLURM cluster\n\n```bash\nsbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1  -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap=\"accelerate launch scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml\"\n\nsbatch --nodes=1  --time=5:00:00 -A cad15443 --gres=gpu:8  --constraint=MI250 --job-name=colpali --wrap=\"python scripts/train/train_colbert.py scripts/configs/pali/train_colpali_docmatix_hardneg_model.yaml\"\n```\n\n## Contributing\n\nWe welcome contributions to ColPali! 🤗\n\nTo contribute to ColPali, first install the development dependencies for proper testing/linting:\n\n```bash\npip install \"colpali-engine[dev]\"\n```\n\nTo run all the tests, you will have to install all optional dependencies (or you'll get an error in test discovery):\n\n```bash\npip install \"colpali-engine[all]\"\n```\n\nWhen your PR is ready, ping one of the repository maintainers. We will do our best to review it as soon as possible!\n\n## Community Projects\n\nSeveral community projects and resources have been developed around ColPali to facilitate its usage. 
Feel free to reach out if you want to add your project to this list!\n\n### Libraries 📚\n\n| Library Name  | Description                                                                                                                                                                                                                                          |\n|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  |\n| Byaldi        | [`Byaldi`](https://github.com/AnswerDotAI/byaldi) is [RAGatouille](https://github.com/AnswerDotAI/RAGatouille)'s equivalent for ColPali, leveraging the `colpali-engine` package to facilitate indexing and storing embeddings.                      |\n| PyVespa       | [`PyVespa`](https://pyvespa.readthedocs.io/en/latest/examples/colpali-document-retrieval-vision-language-models-cloud.html) allows interaction with [Vespa](https://vespa.ai/), a production-grade vector database, with detailed ColPali support.   |\n| Qdrant | Tutorial about using ColQwen2 with the [Qdrant](https://qdrant.tech/documentation/advanced-tutorials/pdf-retrieval-at-scale/) vector database. |\n| Elastic Search     | Tutorial about using ColPali with the [Elastic Search](https://www.elastic.co/search-labs/blog/elastiacsearch-colpali-document-search) vector database. |\n| Weaviate | Tutorial about using multi-vector embeddings with the [Weaviate](https://weaviate.io/developers/weaviate/tutorials/multi-vector-embeddings) vector database. |\n| Candle        | [Candle](https://github.com/huggingface/candle/tree/main/candle-examples/examples/colpali) enables ColPali inference with an efficient ML framework for Rust.                                                                                        
|\n| EmbedAnything | [`EmbedAnything`](https://github.com/StarlightSearch/EmbedAnything) allows end-to-end ColPali inference with both Candle and ONNX backends.                                                                                                          |\n| DocAI         | [DocAI](https://github.com/PragmaticMachineLearning/docai) uses ColPali with GPT-4o and LangChain to extract structured information from documents.                                                                                                  |\n| VARAG         | [VARAG](https://github.com/adithya-s-k/VARAG) uses ColPali in a vision-only and a hybrid RAG pipeline.                                                                                                                                               |\n| ColBERT Live! | [`ColBERT Live!`](https://github.com/jbellis/colbert-live/) enables ColPali usage with vector databases supporting large datasets, compression, and non-vector predicates.                                                                           |\n| ColiVara      | [`ColiVara`](https://github.com/tjmlabs/ColiVara/) is a retrieval API that allows you to store, search, and retrieve documents based on their visual embedding. It is a web-first implementation of the ColPali paper using ColQwen2 as the LLM model. |\n| BentoML       | Deploy ColPali easily with BentoML using [this example repository](https://github.com/bentoml/BentoColPali). BentoML features adaptive batching and zero-copy I/O to minimize overhead.                                                              |\n| NoOCR       | NoOCR is an end-to-end, [open source](https://github.com/kyryl-opens-ml/no-ocr) solution for complex PDFs, powered by ColPali embeddings. |\n| Astra Multi-vector     | [`Astra-multivector`](https://github.com/brian-ogrady/astradb-multivector) provides enterprise-grade integration with AstraDB for late-interaction models like ColPali, ColQwen2, and ColBERT. 
It implements efficient token pooling and embedding caching strategies to dramatically reduce latency and index size while maintaining retrieval quality. The library leverages Cassandra's distributed architecture for high-throughput vector search at scale. |\n\n### Notebooks 📙\n\n| Notebook Title                                               | Author \u0026 Link                                                |\n| ------------------------------------------------------------ | ------------------------------------------------------------ |\n| ColPali Cookbooks                                            | [Tony's Cookbooks (ILLUIN)](https://github.com/tonywu71/colpali-cookbooks) 🙋🏻 |\n| Vision RAG Tutorial                                          | [Manu's Vision Rag Tutorial (ILLUIN)](https://github.com/ManuelFay/Tutorials/blob/main/Tuesday_Practical_2_Vision_RAG.ipynb) 🙋🏻 |\n| ColPali (Byaldi) + Qwen2-VL for RAG                          | [Merve's Notebook (HuggingFace 🤗)](https://github.com/merveenoyan/smol-vision/blob/main/ColPali_%2B_Qwen2_VL.ipynb) |\n| Indexing ColPali with Qdrant                                 | [Daniel's Notebook (HuggingFace 🤗)](https://danielvanstrien.xyz/posts/post-with-code/colpali-qdrant/2024-10-02_using_colpali_with_qdrant.html) |\n| Weaviate Tutorial                                            | [Connor's ColPali POC (Weaviate)](https://github.com/weaviate/recipes/blob/main/weaviate-features/named-vectors/NamedVectors-ColPali-POC.ipynb) |\n| Use ColPali for Multi-Modal Retrieval with Milvus            | [Milvus Documentation](https://milvus.io/docs/use_ColPali_with_milvus.md) |\n| Data Generation                                              | [Daniel's Notebook (HuggingFace 🤗)](https://danielvanstrien.xyz/posts/post-with-code/colpali/2024-09-23-generate_colpali_dataset.html) |\n| Finance Report Analysis with ColPali and Gemini              | [Jaykumaran 
(LearnOpenCV)](https://github.com/spmallick/learnopencv/tree/master/Multimodal-RAG-with-ColPali-Gemini) |\n| Multimodal Retrieval-Augmented Generation (RAG) with Document Retrieval (ColPali) and Vision Language Models (VLMs) | [Sergio Paniego](https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_vlms) |\n| Document Similarity Search with ColPali                      | [Frank Sommers](https://colab.research.google.com/github/fsommers/documentai/blob/main/Document_Similarity_with_ColPali_0_2_2_version.ipynb) |\n| End-to-end ColPali inference with EmbedAnything              | [Akshay Ballal (EmbedAnything)](https://colab.research.google.com/drive/1-Eiaw8wMm8I1n69N1uKOHkmpw3yV22w8?usp=sharing) |\n| ColiVara: A ColPali Retrieval API                            | [A simple RAG Example](https://github.com/tjmlabs/ColiVara-docs/blob/main/cookbook/RAG.ipynb) |\n| Multimodal RAG with Document Retrieval (ColPali), Vision Language Model (ColQwen2) and Amazon Nova | [Suman's Notebook (AWS)](https://github.com/debnsuma/fcc-ai-engineering-aws/blob/main/05-multimodal-rag-with-colpali/01-multimodal-retrival-with-colpali-retreve-gen.ipynb) |\n| Multi-vector RAG: Using Weaviate to search a collection of PDF documents | [Weaviate's Notebook](https://github.com/weaviate/recipes/blob/main/weaviate-features/multi-vector/multi-vector-colipali-rag.ipynb) |\n\n### Other resources\n\n- 📝 = blog post\n- 📋 = PDF / slides\n- 📹 = video\n\n| Title                                                                                    | Author \u0026 Link                                                                                                                                                 |\n|------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| State of AI report 2024  
                                                                | [Nathan's report](https://www.stateof.ai/) 📋                                                                                                                 |\n| Technology Radar Volume 31 (October 2024)                                                | [thoughtworks's report](https://www.thoughtworks.com/radar) 📋                                                                                                |\n| LlamaIndex Webinar: ColPali - Efficient Document Retrieval with Vision Language Models   | [LlamaIndex's Youtube video](https://youtu.be/nzcBvba7mzI?si=WL9MsyiAFJMyEolz) 📹                                                                             |\n| PDF Retrieval with Vision Language Models                                                | [Jo's blog post #1 (Vespa)](https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/) 📝                                                          |\n| Scaling ColPali to billions of PDFs with Vespa                                           | [Jo's blog post #2 (Vespa)](https://blog.vespa.ai/scaling-colpali-to-billions/) 📝                                                                            |\n| Neural Search Talks: ColPali (with Manuel Faysse)                                        | [Zeta Alpha's Podcast](https://open.spotify.com/episode/2s6ljhd6VQTL2mIU9cFzCb) 📹                                                                            |\n| Multimodal Document RAG with Llama 3.2 Vision and ColQwen2                               | [Zain's blog post (Together AI)](https://www.together.ai/blog/multimodal-document-rag-with-llama-3-2-vision-and-colqwen2) 📝                                  |\n| ColPali: Document Retrieval with Vision Language Models                                  | [Antaripa Saha](https://antaripasaha.notion.site/ColPali-Efficient-Document-Retrieval-with-Vision-Language-Models-10f5314a5639803d94d0d7ac191bb5b1) 📝        |\n| 
Minimalist diagrams explaining ColPali                                                   | [Leonie's ColPali diagrams on X ](https://twitter.com/helloiamleonie/status/1839321865195851859)📝                                                            |\n| Multimodal RAG with ColPali and Gemini : Financial Report Analysis Application           | [Jaykumaran's blog post (LearnOpenCV)](https://learnopencv.com/multimodal-rag-with-colpali/) 📝                                                               |\n| Implement Multimodal RAG with ColPali and Vision Language Model Groq(Llava) and Qwen2-VL | [Plaban's blog post](https://medium.com/the-ai-forum/implement-multimodal-rag-with-colpali-and-vision-language-model-groq-llava-and-qwen2-vl-5c113b8c08fd) 📝 |\n| multimodal AI. open-source. in a nutshell.                                               | [Merve's Youtube video](https://youtu.be/IoGaGfU1CIg?si=yEhxMqJYxvMzGyUm) 📹                                                                                  |\n| Remove Complexity from Your RAG Applications                                             | [Kyryl's blog post (KOML)](https://kyrylai.com/2024/09/09/remove-complexity-from-your-rag-applications/) 📝                                                   |\n| Late interaction \u0026 efficient Multi-modal retrievers need more than a vector index        | [Ayush Chaurasia (LanceDB)](https://blog.lancedb.com/late-interaction-efficient-multi-modal-retrievers-need-more-than-just-a-vector-index/) 📝                |\n| Optimizing Document Retrieval with ColPali and Qdrant's Binary Quantization              | [Sabrina Aquino (Qdrant)]( https://youtu.be/_A90A-grwIc?si=MS5RV17D6sgirCRm)  📹                                                                              |\n| Hands-On Multimodal Retrieval and Interpretability (ColQwen + Vespa)                     | [Antaripa Saha](https://www.analyticsvidhya.com/blog/2024/10/multimodal-retrieval-with-colqwen-vespa/) 📝                             
                        |\n\n## Paper result reproduction\n\nTo reproduce the results from the paper, check out the `v0.1.1` tag or install the corresponding `colpali-engine` package release using:\n\n```bash\npip install colpali-engine==0.1.1\n```\n\n## Citation\n\n**ColPali: Efficient Document Retrieval with Vision Language Models**  \n\nAuthors: **Manuel Faysse**\\*, **Hugues Sibille**\\*, **Tony Wu**\\*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (\\* denotes equal contribution)\n\n```latex\n@misc{faysse2024colpaliefficientdocumentretrieval,\n      title={ColPali: Efficient Document Retrieval with Vision Language Models}, \n      author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},\n      year={2024},\n      eprint={2407.01449},\n      archivePrefix={arXiv},\n      primaryClass={cs.IR},\n      url={https://arxiv.org/abs/2407.01449}, \n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Filluin-tech%2Fcolpali","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Filluin-tech%2Fcolpali","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Filluin-tech%2Fcolpali/lists"}