{"id":30523914,"url":"https://github.com/lightonai/pylate","last_synced_at":"2026-02-25T10:17:23.303Z","repository":{"id":254447884,"uuid":"808180334","full_name":"lightonai/pylate","owner":"lightonai","description":"Late Interaction Models Training \u0026 Retrieval","archived":false,"fork":false,"pushed_at":"2025-08-07T07:29:18.000Z","size":2755,"stargazers_count":528,"open_issues_count":20,"forks_count":42,"subscribers_count":9,"default_branch":"main","last_synced_at":"2025-08-19T06:22:35.843Z","etag":null,"topics":["colbert","information-retrieval","language-model","rag"],"latest_commit_sha":null,"homepage":"https://lightonai.github.io/pylate/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lightonai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-05-30T14:44:30.000Z","updated_at":"2025-08-15T23:16:44.000Z","dependencies_parsed_at":"2024-08-26T14:09:20.022Z","dependency_job_id":"a644f3b4-4896-4e17-af5a-edfc446b4685","html_url":"https://github.com/lightonai/pylate","commit_stats":null,"previous_names":["lightonai/pylate","lightonai/giga-cherche"],"tags_count":7,"template":false,"template_full_name":null,"purl":"pkg:github/lightonai/pylate","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lightonai","download_url":"https://codeload.github.com/lightonai/pylate/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272254497,"owners_count":24901055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colbert","information-retrieval","language-model","rag"],"created_at":"2025-08-26T20:51:57.273Z","updated_at":"2026-02-25T10:17:23.296Z","avatar_url":"https://github.com/lightonai.png","language":"Python","funding_links":[],"categories":["Python","Large Language Models","SDKs \u0026 Libraries"],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n  \u003ch1\u003ePyLate\u003c/h1\u003e\n  \u003cp\u003eFlexible Training and Retrieval for Late Interaction Models\u003c/p\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg width=500 src=\"https://raw.githubusercontent.com/lightonai/pylate/refs/heads/main/docs/img/logo.png\"/\u003e\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n  \u003c!-- Documentation --\u003e\n  \u003ca href=\"https://lightonai.github.io/pylate/\"\u003e\u003cimg src=\"https://img.shields.io/badge/Documentation-purple.svg?style=flat-square\" alt=\"documentation\"\u003e\u003c/a\u003e\n  \u003c!-- License --\u003e\n  \u003ca href=\"https://opensource.org/licenses/MIT\"\u003e\u003cimg src=\"https://img.shields.io/badge/License-MIT-blue.svg?style=flat-square\" alt=\"license\"\u003e\u003c/a\u003e\n\u003c/div\u003e\n\n\u0026nbsp;\n\n\u003cp align=\"justify\"\u003e\nPyLate is a library built on top of Sentence Transformers, designed to simplify and optimize fine-tuning, inference, and retrieval with state-of-the-art ColBERT models. It enables easy fine-tuning on both single and multiple GPUs, providing flexibility for various hardware setups. PyLate also streamlines document retrieval and allows you to load a wide range of models, enabling you to construct ColBERT models from most pre-trained language models.\n\u003c/p\u003e\n\n\u0026nbsp;\n\n## Installation\n\nYou can install PyLate using pip:\n\n```bash\npip install pylate\n```\n\nFor evaluation dependencies, use:\n\n```bash\npip install \"pylate[eval]\"\n```\n\n## Documentation\n\nThe complete documentation is available [here](https://lightonai.github.io/pylate/), which includes in-depth guides, examples, and API references.\n\n\u0026nbsp;\n\n## Training\n\n### Contrastive Training\n\nHere’s a simple example of training a ColBERT model on the MS MARCO dataset triplet dataset using PyLate. This script demonstrates training with contrastive loss and evaluating the model on a held-out eval set:\n\n```python\nimport torch\nfrom datasets import load_dataset\nfrom sentence_transformers import (\n    SentenceTransformerTrainer,\n    SentenceTransformerTrainingArguments,\n)\n\nfrom pylate import evaluation, losses, models, utils\n\n# Define model parameters for contrastive training\nmodel_name = \"bert-base-uncased\"  # Choose the pre-trained model you want to use as base\nbatch_size = 32  # Larger batch size often improves results, but requires more memory\n\nnum_train_epochs = 1  # Adjust based on your requirements\n# Set the run name for logging and output directory\nrun_name = \"contrastive-bert-base-uncased\"\noutput_dir = f\"output/{run_name}\"\n\n# 1. Here we define our ColBERT model. If not a ColBERT model, will add a linear layer to the base encoder.\nmodel = models.ColBERT(model_name_or_path=model_name)\n\n# Compiling the model makes the training faster\nmodel = torch.compile(model)\n\n# Load dataset\ndataset = load_dataset(\"sentence-transformers/msmarco-bm25\", \"triplet\", split=\"train\")\n# Split the dataset (this dataset does not have a validation set, so we split the training set)\nsplits = dataset.train_test_split(test_size=0.01)\ntrain_dataset = splits[\"train\"]\neval_dataset = splits[\"test\"]\n\n# Define the loss function\ntrain_loss = losses.Contrastive(model=model)\n\n# Initialize the evaluator\ndev_evaluator = evaluation.ColBERTTripletEvaluator(\n    anchors=eval_dataset[\"query\"],\n    positives=eval_dataset[\"positive\"],\n    negatives=eval_dataset[\"negative\"],\n)\n\n# Configure the training arguments (e.g., batch size, evaluation strategy, logging steps)\nargs = SentenceTransformerTrainingArguments(\n    output_dir=output_dir,\n    num_train_epochs=num_train_epochs,\n    per_device_train_batch_size=batch_size,\n    per_device_eval_batch_size=batch_size,\n    fp16=True,  # Set to False if you get an error that your GPU can't run on FP16\n    bf16=False,  # Set to True if you have a GPU that supports BF16\n    run_name=run_name,  # Will be used in W\u0026B if `wandb` is installed\n    learning_rate=3e-6,\n)\n\n# Initialize the trainer for the contrastive training\ntrainer = SentenceTransformerTrainer(\n    model=model,\n    args=args,\n    train_dataset=train_dataset,\n    eval_dataset=eval_dataset,\n    loss=train_loss,\n    evaluator=dev_evaluator,\n    data_collator=utils.ColBERTCollator(model.tokenize),\n)\n# Start the training process\ntrainer.train()\n```\n\nAfter training, the model can be loaded using the output directory path:\n\n```python\nfrom pylate import models\n\nmodel = models.ColBERT(model_name_or_path=\"contrastive-bert-base-uncased\")\n```\n\nPlease note that temperature parameter has a [very high importance in contrastive learning](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_Understanding_the_Behaviour_of_Contrastive_Loss_CVPR_2021_paper.pdf), and a temperature around 0.02 is often used in the literature:\n\n```python\ntrain_loss = losses.Contrastive(model=model, temperature=0.02)\n```\n\nAs contrastive learning is not compatible with gradient accumulation, you can leverage [GradCache](https://arxiv.org/abs/2101.06983) to emulate bigger batch sizes without requiring more memory by using the `CachedContrastiveLoss` to define a mini_batch_size while increasing the `per_device_train_batch_size`:\n\n```python\ntrain_loss = losses.CachedContrastive(\n        model=model, mini_batch_size=mini_batch_size\n)\n```\n\nFinally, if you are in a multi-GPU setting, you can gather all the elements from the different GPUs to create even bigger batch sizes by setting `gather_across_devices` to `True` (for both `Contrastive` and `CachedContrastive` losses):\n\n```python\ntrain_loss = losses.Contrastive(model=model, gather_across_devices=True)\n```\n\n\u0026nbsp;\n\n### Knowledge Distillation\n\nTo get the best performance when training a ColBERT model, you should use knowledge distillation to train the model using the scores of a strong teacher model.\nHere's a simple example of how to train a model using knowledge distillation in PyLate on MS MARCO:\n\n```python\nimport torch\nfrom datasets import load_dataset\nfrom sentence_transformers import (\n    SentenceTransformerTrainer,\n    SentenceTransformerTrainingArguments,\n)\n\nfrom pylate import losses, models, utils\n\n# Load the datasets required for knowledge distillation (train, queries, documents)\ntrain = load_dataset(\n    path=\"lightonai/ms-marco-en-bge\",\n    name=\"train\",\n)\n\nqueries = load_dataset(\n    path=\"lightonai/ms-marco-en-bge\",\n    name=\"queries\",\n)\n\ndocuments = load_dataset(\n    path=\"lightonai/ms-marco-en-bge\",\n    name=\"documents\",\n)\n\n# Set the transformation to load the documents/queries texts using the corresponding ids on the fly\ntrain.set_transform(\n    utils.KDProcessing(queries=queries, documents=documents).transform,\n)\n\n# Define the base model, training parameters, and output directory\nmodel_name = \"bert-base-uncased\"  # Choose the pre-trained model you want to use as base\nbatch_size = 16\nnum_train_epochs = 1\n# Set the run name for logging and output directory\nrun_name = \"knowledge-distillation-bert-base\"\noutput_dir = f\"output/{run_name}\"\n\n# Initialize the ColBERT model from the base model\nmodel = models.ColBERT(model_name_or_path=model_name)\n\n# Compiling the model to make the training faster\nmodel = torch.compile(model)\n\n# Configure the training arguments (e.g., epochs, batch size, learning rate)\nargs = SentenceTransformerTrainingArguments(\n    output_dir=output_dir,\n    num_train_epochs=num_train_epochs,\n    per_device_train_batch_size=batch_size,\n    fp16=True,  # Set to False if you get an error that your GPU can't run on FP16\n    bf16=False,  # Set to True if you have a GPU that supports BF16\n    run_name=run_name,\n    learning_rate=1e-5,\n)\n\n# Use the Distillation loss function for training\ntrain_loss = losses.Distillation(model=model)\n\n# Initialize the trainer\ntrainer = SentenceTransformerTrainer(\n    model=model,\n    args=args,\n    train_dataset=train,\n    loss=train_loss,\n    data_collator=utils.ColBERTCollator(tokenize_fn=model.tokenize),\n)\n\n# Start the training process\ntrainer.train()\n```\n\n#### NanoBEIR evaluator\n\nIf you are training an English retrieval model, you can use [NanoBEIR evaluator](https://huggingface.co/collections/zeta-alpha-ai/nanobeir-66e1a0af21dfd93e620cd9f6), which allows to run small version of BEIR to get quick validation results.\n\n```python\nevaluator=evaluation.NanoBEIREvaluator(),\n```\n\n\u0026nbsp;\n\n## Datasets\n\nPyLate supports Hugging Face [Datasets](https://huggingface.co/docs/datasets/en/index), enabling seamless triplet / knowledge distillation based training. For contrastive training, you can use any of the existing sentence transformers triplet datasets. Below is an example of creating a custom triplet dataset for training:\n\n```python\nfrom datasets import Dataset\n\ndataset = [\n    {\n        \"query\": \"example query 1\",\n        \"positive\": \"example positive document 1\",\n        \"negative\": \"example negative document 1\",\n    },\n    {\n        \"query\": \"example query 2\",\n        \"positive\": \"example positive document 2\",\n        \"negative\": \"example negative document 2\",\n    },\n    {\n        \"query\": \"example query 3\",\n        \"positive\": \"example positive document 3\",\n        \"negative\": \"example negative document 3\",\n    },\n]\n\ndataset = Dataset.from_list(mapping=dataset)\n\ntrain_dataset, test_dataset = dataset.train_test_split(test_size=0.3)\n```\n\nNote that PyLate supports more than one negative per query, simply add the additional negatives after the first one in the row.\n\n```python\n{\n        \"query\": \"example query 1\",\n        \"positive\": \"example positive document 1\",\n        \"negative_1\": \"example negative document 1\",\n        \"negative_2\": \"example negative document 2\",\n}\n```\n\nTo create a knowledge distillation dataset, you can use the following snippet:\n\n```python\nfrom datasets import Dataset\n\ndataset = [\n    {\n        \"query_id\": 54528,\n        \"document_ids\": [\n            6862419,\n            335116,\n            339186,\n        ],\n        \"scores\": [\n            0.4546215673141326,\n            0.6575686537173476,\n            0.26825184192900203,\n        ],\n    },\n    {\n        \"query_id\": 749480,\n        \"document_ids\": [\n            6862419,\n            335116,\n            339186,\n        ],\n        \"scores\": [\n            0.2546215673141326,\n            0.7575686537173476,\n            0.96825184192900203,\n        ],\n    },\n]\n\n\ndataset = Dataset.from_list(mapping=dataset)\n\ndocuments = [\n    {\"document_id\": 6862419, \"text\": \"example doc 1\"},\n    {\"document_id\": 335116, \"text\": \"example doc 2\"},\n    {\"document_id\": 339186, \"text\": \"example doc 3\"},\n]\n\nqueries = [\n    {\"query_id\": 749480, \"text\": \"example query\"},\n]\n\ndocuments = Dataset.from_list(mapping=documents)\n\nqueries = Dataset.from_list(mapping=queries)\n```\n\n\u0026nbsp;\n\n## Retrieval\n\nPyLate provides an efficient index with [FastPLAID](https://github.com/lightonai/fast-plaid). Simply load a ColBERT model and initialize the index to perform retrieval.\n\n```python\nfrom pylate import indexes, models, retrieve\n\nmodel = models.ColBERT(\n    model_name_or_path=\"lightonai/GTE-ModernColBERT-v1\",\n)\n\nindex = indexes.PLAID(\n    index_folder=\"pylate-index\",\n    index_name=\"index\",\n    override=True,\n)\n\nretriever = retrieve.ColBERT(index=index)\n```\n\nOnce the model and index are set up, we can add documents to the index using their embeddings and corresponding ids:\n\n```python\ndocuments_ids = [\"1\", \"2\", \"3\"]\n\ndocuments = [\n    \"ColBERT’s late-interaction keeps token-level embeddings to deliver cross-encoder-quality ranking at near-bi-encoder speed, enabling fine-grained relevance, robustness across domains, and hardware-friendly scalable search.\",\n\n    \"PLAID compresses ColBERT token vectors via product quantization to shrink storage by 10×, uses two-stage centroid scoring for sub-200 ms latency, and plugs directly into existing ColBERT pipelines.\",\n\n    \"PyLate is a library built on top of Sentence Transformers, designed to simplify and optimize fine-tuning, inference, and retrieval with state-of-the-art ColBERT models. It enables easy fine-tuning on both single and multiple GPUs, providing flexibility for various hardware setups. PyLate also streamlines document retrieval and allows you to load a wide range of models, enabling you to construct ColBERT models from most pre-trained language models.\",\n]\n\n# Encode the documents\ndocuments_embeddings = model.encode(\n    documents,\n    batch_size=32,\n    is_query=False, # Encoding documents\n    show_progress_bar=True,\n)\n\n# Add the documents ids and embeddings to the PLAID index\nindex.add_documents(\n    documents_ids=documents_ids,\n    documents_embeddings=documents_embeddings,\n)\n```\n\nThen we can retrieve the top-k documents for a given set of queries:\n\n```python\nqueries_embeddings = model.encode(\n    [\"query for document 3\", \"query for document 1\"],\n    batch_size=32,\n    is_query=True, # Encoding queries\n    show_progress_bar=True,\n)\n\nscores = retriever.retrieve(\n    queries_embeddings=queries_embeddings,\n    k=10,\n)\n\nprint(scores)\n```\n\nSample Output:\n\n```python\n[\n    [\n        {\"id\": \"3\", \"score\": 11.266985893249512},\n        {\"id\": \"1\", \"score\": 10.303335189819336},\n        {\"id\": \"2\", \"score\": 9.502392768859863},\n    ],\n    [\n        {\"id\": \"1\", \"score\": 10.88800048828125},\n        {\"id\": \"3\", \"score\": 9.950843811035156},\n        {\"id\": \"2\", \"score\": 9.602447509765625},\n    ],\n]\n```\n\n\u0026nbsp;\n\n## Reranking\n\nIf you want to use the ColBERT model to perform reranking on top of your first-stage retrieval pipeline without building an index, you can simply use `rank.rerank` function which takes the queries and documents embeddings along with the documents ids to rerank them:\n\n```python\nfrom pylate import rank\n\nqueries = [\n    \"query A\",\n    \"query B\",\n]\n\ndocuments = [\n    [\"document A\", \"document B\"],\n    [\"document 1\", \"document C\", \"document B\"],\n]\n\ndocuments_ids = [\n    [1, 2],\n    [1, 3, 2],\n]\n\nqueries_embeddings = model.encode(\n    queries,\n    is_query=True,\n)\n\ndocuments_embeddings = model.encode(\n    documents,\n    is_query=False,\n)\n\nreranked_documents = rank.rerank(\n    documents_ids=documents_ids,\n    queries_embeddings=queries_embeddings,\n    documents_embeddings=documents_embeddings,\n)\n```\n\n\u0026nbsp;\n\n## Contributing\n\nWe welcome contributions! To get started:\n\n1. Install the development dependencies:\n\n```bash\npip install \"pylate[dev]\"\n```\n\n2. Run tests:\n\n```bash\nmake test\n```\n\n3. Format code with Ruff:\n\n```bash\nmake lint\n```\n\n## Citation\n\nYou can refer to the library with this BibTeX:\n\n```bibtex\n@inproceedings{DBLP:conf/cikm/ChaffinS25,\n  author       = {Antoine Chaffin and\n                  Rapha{\\\"{e}}l Sourty},\n  editor       = {Meeyoung Cha and\n                  Chanyoung Park and\n                  Noseong Park and\n                  Carl Yang and\n                  Senjuti Basu Roy and\n                  Jessie Li and\n                  Jaap Kamps and\n                  Kijung Shin and\n                  Bryan Hooi and\n                  Lifang He},\n  title        = {PyLate: Flexible Training and Retrieval for Late Interaction Models},\n  booktitle    = {Proceedings of the 34th {ACM} International Conference on Information\n                  and Knowledge Management, {CIKM} 2025, Seoul, Republic of Korea, November\n                  10-14, 2025},\n  pages        = {6334--6339},\n  publisher    = {{ACM}},\n  year         = {2025},\n  url          = {https://github.com/lightonai/pylate},\n  doi          = {10.1145/3746252.3761608},\n}\n```\n\n## DeepWiki\n\nPyLate is indexed on [DeepWiki](https://deepwiki.com/lightonai/pylate) so you can ask questions to LLMs using Deep Research to explore the codebase and get help to add new features.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flightonai%2Fpylate","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flightonai%2Fpylate","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flightonai%2Fpylate/lists"}