{"id":50189747,"url":"https://github.com/hcompai/late-interaction-kernels","last_synced_at":"2026-05-25T12:04:58.275Z","repository":{"id":359577790,"uuid":"1216378070","full_name":"hcompai/late-interaction-kernels","owner":"hcompai","description":"Fused Triton kernels for late-interaction (MaxSim) scoring — ColBERT, ColPali, ModernColBERT","archived":false,"fork":false,"pushed_at":"2026-05-22T17:39:17.000Z","size":1134,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T18:51:55.946Z","etag":null,"topics":["colbert","colpali","information-retrieval","kernel","late-interaction","maxsim","pylate","triton"],"latest_commit_sha":null,"homepage":"https://hcompai.github.io/late-interaction-kernels/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hcompai.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":"docs/supported_models.md","governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-04-20T21:02:03.000Z","updated_at":"2026-05-22T17:25:13.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/hcompai/late-interaction-kernels","commit_stats":null,"previous_names":["hcompai/late-interaction-kernels"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/hcompai/late-interaction-kernels","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcompai%2Flate-interaction-kernels","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcompai%2Flate-interaction-kernels/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcompai%2Flate-interaction-kernels/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcompai%2Flate-interaction-kernels/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hcompai","download_url":"https://codeload.github.com/hcompai/late-interaction-kernels/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hcompai%2Flate-interaction-kernels/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33473729,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-25T06:32:55.349Z","status":"ssl_error","status_checked_at":"2026-05-25T06:32:35.322Z","response_time":57,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colbert","colpali","information-retrieval","kernel","late-interaction","maxsim","pylate","triton"],"created_at":"2026-05-25T12:04:57.508Z","updated_at":"2026-05-25T12:04:58.269Z","avatar_url":"https://github.com/hcompai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n\n# late-interaction-kernels\n\n\u003cimg src=\"assets/banner.webp\" alt=\"late-interaction-kernels banner\" /\u003e\n\n[![ColBERT](https://img.shields.io/badge/ColBERT-2004.12832-b31b1b.svg?style=for-the-badge)](https://arxiv.org/abs/2004.12832)\n[![PyLate](https://img.shields.io/badge/PyLate-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/lightonai/pylate)\n[![colpali-engine](https://img.shields.io/badge/colpali--engine-100000?style=for-the-badge\u0026logo=github\u0026logoColor=white)](https://github.com/illuin-tech/colpali)\n[![Hugging Face](https://img.shields.io/badge/Hcompany-FFD21E?style=for-the-badge\u0026logo=huggingface\u0026logoColor=000)](https://huggingface.co/Hcompany)\n\n[![CI](https://github.com/hcompai/late-interaction-kernels/actions/workflows/ci.yml/badge.svg?branch=main)](https://github.com/hcompai/late-interaction-kernels/actions/workflows/ci.yml)\n[![Version](https://img.shields.io/pypi/v/late-interaction-kernels?color=%2334D058\u0026label=pypi%20package)](https://pypi.org/project/late-interaction-kernels/)\n[![Downloads](https://static.pepy.tech/badge/late-interaction-kernels)](https://pepy.tech/project/late-interaction-kernels)\n\n---\n\n[[How it works]](https://hcompai.github.io/late-interaction-kernels/how-it-works.html)\n[[Kernel picker]](https://hcompai.github.io/late-interaction-kernels/choose-a-kernel.html)\n[[Benchmarks]](docs/benchmarks.md)\n[[Design]](docs/design.md)\n[[Supported models]](docs/supported_models.md)\n[[Changelog]](CHANGELOG.md)\n\n\u003c/div\u003e\n\n\u003e [!NOTE]\n\u003e Full algorithmic walkthrough, animations and benchmark plots live on the docs site: **[hcompai.github.io/late-interaction-kernels](https://hcompai.github.io/late-interaction-kernels/how-it-works.html)**.\n\n## Introduction\n\n`late-interaction-kernels` provides fused Triton kernels for **MaxSim**, the late-interaction scoring used by ColBERT, ColPali, ModernColBERT, LateOn and ColBERTv2. The kernels are numerically identical to plain PyTorch and come with three APIs:\n\n- a one-line PyLate drop-in (`patch_pylate()`),\n- a stateless `nn.Module` (`MaxSimScorer`) for custom training loops,\n- function-level entry points (`maxsim`, `maxsim_varlen`, `maxsim_padded`, ...) for everything else.\n\nThis is **not** a search engine. For end-to-end training or retrieval use [PyLate](https://github.com/lightonai/pylate), [FastPlaid](https://github.com/lightonai/fast-plaid) or [NextPlaid](https://github.com/lightonai/next-plaid). This library is the MaxSim math they compile down to.\n\n## Install\n\n```bash\npip install late-interaction-kernels\n```\n\n| Platform                       | Backend                                                                       |\n| ------------------------------ | ----------------------------------------------------------------------------- |\n| Linux + CUDA (sm_75+)          | Fused Triton kernels (autotuned, FP8 on Hopper).                              |\n| macOS (Apple Silicon, MPS)     | Fused Metal `simdgroup_matrix` for inference, `torch.compile` for training.   |\n| CPU / Windows                  | Autograd-aware pure-PyTorch reference.                                        |\n\n## Quickstart\n\n### Patch PyLate (one line)\n\n```python\nfrom late_interaction_kernels import patch_pylate\n\npatch_pylate()\n# PyLate training / rerank code is unchanged\n```\n\nSet `LIK_DISABLE=1` in the environment to fall back to vanilla PyLate at runtime.\n\n### Custom training loop\n\n```python\nfrom late_interaction_kernels import MaxSimScorer\n\nscorer = MaxSimScorer(normalize=True)                # nn.Module, no parameters\nscores = scorer(Q, D, q_mask=q_mask, d_mask=d_mask)  # [Nq, Nd] fp32\nscores.mean().backward()\n```\n\n### Top-k retrieval\n\n```python\nfrom late_interaction_kernels import retrieve\n\nscores, indices = retrieve(Q, D, top_k=100, chunk=4096)\n# both [Nq, 100]; chunk= bounds peak HBM at Nq * (chunk + top_k)\n```\n\n### PLAID / ColBERTv2 on compressed, ragged docs\n\n```python\nfrom late_interaction_kernels.plaid import maxsim_residual_varlen\n\nscores = maxsim_residual_varlen(\n    Q, codes_flat, residuals_flat, cu_seqlens_d,\n    centroids=centroids, bucket_weights=bucket_weights,\n    nbits=2, normalize=True,\n)  # [Nd] fp32; one kernel does decompress + L2-normalize + MaxSim\n```\n\n## Benchmarks\n\n1×H100 80GB SXM, bf16 inputs / fp32 accumulator, 50-iter median. All\nspeedups are measured at **matched numerics** — every baseline runs the\neinsum with an fp32 accumulator (same as the fused kernel), and parity\nis asserted at `atol=1e-2` before timing.\n\n| Workload                                                    | Speedup            |\n| ----------------------------------------------------------- | ------------------ |\n| Reranking / inference (vs eager fp32-acc *and* `torch.compile`) | 2-11×          |\n| Long-context (`Ld ≥ 8k`) MaxSim fwd+bwd                     | runs; naive OOMs   |\n| PyLate cached-contrastive MaxSim + backward (vs vanilla)    | 4.0-5.5×           |\n| PLAID rerank vs `fast_plaid.engine.search()` (incl. top-k)  | 19-32×             |\n| Fused D-side head (training)                                | 1.2-4.2×           |\n| FP8 MaxSim inference vs same kernel in bf16 (Hopper)        | 1.9-2.5×           |\n| LateOn-Code-edge training (real MS MARCO triplets)          | 1.05-1.27× e2e     |\n\n`torch.compile` is within ±5% of eager on every forward shape because\nInductor still has to materialise the `[Nq · Nd · Lq · Ld]` similarity\ntensor before the `max(-1)` reduction — that materialisation *is* what\nthe fused kernel exists to skip. Full tables and reproduction commands:\n[`docs/benchmarks.md`](docs/benchmarks.md).\n\n## Choose a kernel\n\n\u003ctable\u003e\n\u003ctr\u003e\n\u003ctd width=\"55%\" valign=\"middle\"\u003e\n\nNot sure which entry point fits your stack? The docs site ships an interactive decision tree that narrows the public API down to the right function in four questions (stack · phase · layout · workload):\n\n**👉 [hcompai.github.io/late-interaction-kernels/choose-a-kernel.html](https://hcompai.github.io/late-interaction-kernels/choose-a-kernel.html#choose-a-kernel)**\n\n\u003c/td\u003e\n\u003ctd width=\"45%\" valign=\"middle\" align=\"center\"\u003e\n\n\u003ca href=\"https://hcompai.github.io/late-interaction-kernels/choose-a-kernel.html#choose-a-kernel\"\u003e\n  \u003cimg src=\"assets/kernel_picker_widget_preview.webp\" alt=\"Pick a kernel · interactive decision tree\" width=\"420\"\u003e\n\u003c/a\u003e\n\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/table\u003e\n\n## API\n\n| Symbol                                | What it does                                                          |\n| ------------------------------------- | --------------------------------------------------------------------- |\n| `patch_pylate()` / `unpatch_pylate()` | One-line PyLate drop-in. `LIK_DISABLE=1` kill switch.                 |\n| `MaxSimScorer(normalize=, backward=)` | Stateless `nn.Module`, autograd-aware.                                |\n| `retrieve(Q, D, top_k, chunk=)`       | Top-k retrieval, chunked for huge corpora.                            |\n| `maxsim`                              | Core MaxSim, dense layout. Autograd-aware; auto-skips argmax save when no input requires grad. |\n| `maxsim_varlen`                       | Packed (`cu_seqlens`) layout. Autograd-aware.                         |\n| `maxsim_padded`                       | Padded reranking wrapper: packs internally, returns `[B, C]` fp32.    |\n\nOther kernels are in submodules: `padded`, `score_pairs`, `fused_head`, `plaid`, `fp8`, `experimental`, `reference`. See [`docs/design.md`](docs/design.md) for details on every kernel, the autograd graph and the backward variants.\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cstrong\u003e🔽 Configuration knobs (env vars + kwargs)\u003c/strong\u003e\u003c/summary\u003e\n\n| Knob                                                              | Effect                                                            |\n| ----------------------------------------------------------------- | ----------------------------------------------------------------- |\n| `maxsim(..., backward=\"auto\" \\| \"unified\" \\| \"atomic\" \\| \"csr\")`  | Per-call `grad_D` strategy. `\"auto\"` picks per shape.             |\n| `patch_colpali_engine()` / `unpatch_colpali_engine()`             | colpali_engine drop-in: loss + scoring routes through the kernel. |\n| `LIK_DISABLE=1`                                                   | Patched entry points delegate to vanilla PyLate.                  |\n| `LIK_SUPPRESS_NORM_WARN=1`                                        | Silence the \"looks unnormalized\" one-shot warning.                |\n| `LIK_DISABLE_COMPILE=1`                                           | Skip `torch.compile` on the MPS path (eager fallback).            |\n| `LIK_FORCE_MPS_BACKEND={metal,compile,reference}`                 | Pin the MPS dispatch.                                             |\n\n\u003c/details\u003e\n\n## Development\n\n```bash\ngit clone https://github.com/hcompai/late-interaction-kernels\ncd late-interaction-kernels\nuv sync --extra dev --extra pylate --extra torch-cuda   # GPU dev; use --extra torch-cpu on CPU-only boxes\nuv run pytest -q                                        # CUDA tests auto-skip without a GPU\nuv run ruff check . \u0026\u0026 uv run ruff format --check .\n```\n\n\u003e [!NOTE]\n\u003e Pick exactly one of `--extra torch-cuda` (pulls torch from the CUDA index — `cu124`) or `--extra torch-cpu` (CPU-only wheel, what CI uses). The two are declared as conflicting in `pyproject.toml` so the lockfile resolves cleanly for both. On macOS, `--extra torch-cpu` falls back to PyPI's default (MPS-capable) wheel automatically.\n\nGPU tests run automatically on every push to `main`. To run them on a PR, apply the `run-gpu-tests` label.\n\nSee [`CONTRIBUTING.md`](CONTRIBUTING.md) for the contribution workflow.\n\n## Citation\n\n```bibtex\n@software{late_interaction_kernels_2026,\n  author  = {Lac, Aurélien and Wu, Tony},\n  title   = {{late-interaction-kernels}: Fused Triton kernels for late-interaction scoring},\n  year    = {2026},\n  url     = {https://github.com/hcompai/late-interaction-kernels},\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhcompai%2Flate-interaction-kernels","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhcompai%2Flate-interaction-kernels","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhcompai%2Flate-interaction-kernels/lists"}