{"id":29308643,"url":"https://github.com/lightonai/pylate-rs","last_synced_at":"2025-07-08T08:02:11.249Z","repository":{"id":302797104,"uuid":"1011833443","full_name":"lightonai/pylate-rs","owner":"lightonai","description":"PyLate efficient inference engine","archived":false,"fork":false,"pushed_at":"2025-07-04T09:43:06.000Z","size":0,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-04T09:45:01.012Z","etag":null,"topics":["colbert","late-interaction","python","rust","wasm"],"latest_commit_sha":null,"homepage":"https://lightonai.github.io/pylate-rs/","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lightonai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-01T12:04:17.000Z","updated_at":"2025-07-04T09:43:09.000Z","dependencies_parsed_at":"2025-07-05T13:15:18.190Z","dependency_job_id":null,"html_url":"https://github.com/lightonai/pylate-rs","commit_stats":null,"previous_names":["lightonai/pylate-rs"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lightonai/pylate-rs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate-rs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate-rs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate-rs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate-rs/manifests","owner_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lightonai","download_url":"https://codeload.github.com/lightonai/pylate-rs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fpylate-rs/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":263747300,"owners_count":23505087,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colbert","late-interaction","python","rust","wasm"],"created_at":"2025-07-07T07:14:33.999Z","updated_at":"2025-07-07T07:14:43.413Z","avatar_url":"https://github.com/lightonai.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align=\"center\"\u003e\n    \u003ch1\u003epylate-rs\u003c/h1\u003e\n\u003c/div\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg width=500 src=\"https://github.com/lightonai/pylate-rs/blob/01ee9895d83d6bc0a52ba826f6b634d33be479ce/docs/logo.jpg\"/\u003e\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://lightonai.github.io/pylate-rs/\"\u003e\u003cimg src=\"https://img.shields.io/badge/blog-%23000000.svg?style=for-the-badge\u0026logoColor=white\" alt=\"blog\"\u003e\u003c/a\u003e\n    \u003ca href=\"https://docs.rs/pylate-rs/latest/pylate_rs/all.html\"\u003e\u003cimg src=\"https://img.shields.io/badge/crate-%23000000.svg?style=for-the-badge\u0026logoColor=white\" alt=\"crate\"\u003e\u003c/a\u003e\n\n\u003c/div\u003e\n\n\u003cdiv align=\"center\"\u003e\n\u003cb\u003eEfficient Inference for PyLate\u003c/b\u003e\n\u003c/div\u003e\n\n\u0026nbsp;\n\n## ⭐️ 
Overview\n\n**pylate-rs** is a high-performance inference engine for [PyLate](https://github.com/lightonai/pylate) models, meticulously crafted in Rust for optimal speed and efficiency.\n\nWhile model training is handled by PyLate, which supports a variety of late interaction models, `pylate-rs` is engineered to execute these models at high speed.\n\n- **Accelerated Performance**: Experience significantly faster model loading and rapid cold starts, making it ideal for serverless environments and low-latency applications.\n\n- **Lightweight Design**: Built on the [Candle](https://github.com/huggingface/candle) ML framework, `pylate-rs` maintains a minimal footprint suitable for resource-constrained systems like serverless functions and edge computing.\n\n- **Broad Hardware Support**: Optimized for diverse hardware, with dedicated builds for standard CPUs, Intel (MKL), Apple Silicon (Accelerate \u0026 Metal), and NVIDIA GPUs (CUDA).\n\n- **Cross-Platform Integration**: Seamlessly integrate `pylate-rs` into your projects with bindings for Python, Rust, and JavaScript/WebAssembly.\n\nFor a complete, high-performance multi-vector search pipeline, pair `pylate-rs` with its companion library, [**FastPlaid**](https://github.com/lightonai/fast-plaid), at inference time.\n\nExplore our [**WebAssembly live demo**](https://lightonai.github.io/pylate-rs/).\n\n\u0026nbsp;\n\n## 💻 Installation\n\nInstall the version of `pylate-rs` that matches your hardware for optimal performance.\n\n### Python\n\n| Target Hardware          | Installation Command                                     |\n| :----------------------- | :------------------------------------------------------- |\n| **Standard CPU**         | `pip install pylate-rs`                                  |\n| **Apple CPU** (macOS)    | `pip install pylate-rs-accelerate`                       |\n| **Intel CPU** (MKL)      | `pip install pylate-rs-mkl`                              |\n| **Apple GPU** (M1/M2/M3) | `pip install 
pylate-rs-metal`                            |\n| **NVIDIA GPU** (CUDA)    | `pip install git+https://github.com/lightonai/pylate-rs` |\n\nCUDA wheels are not yet available on PyPI because of their size, but you can install them directly from the repository with the command above (work in progress).\n\n\u0026nbsp;\n\n### Rust\n\nAdd `pylate-rs` to your `Cargo.toml` by enabling the feature flag that corresponds to your backend.\n\n| Feature      | Target Hardware          | Installation Command                        |\n| :----------- | :----------------------- | :------------------------------------------ |\n| _(default)_  | **Standard CPU**         | `cargo add pylate-rs`                       |\n| `accelerate` | **Apple CPU** (macOS)    | `cargo add pylate-rs --features accelerate` |\n| `mkl`        | **Intel CPU** (MKL)      | `cargo add pylate-rs --features mkl`        |\n| `metal`      | **Apple GPU** (M1/M2/M3) | `cargo add pylate-rs --features metal`      |\n| `cuda`       | **NVIDIA GPU** (CUDA)    | `cargo add pylate-rs --features cuda`       |\n\n\u0026nbsp;\n\n## ⚡️ Quick Start\n\n### Python\n\nGet started in just a few lines of Python.\n\n```python\nfrom pylate_rs import models\n\n# Initialize the model for your target device (\"cpu\", \"cuda\", or \"mps\")\nmodel = models.ColBERT(\n    model_name_or_path=\"lightonai/GTE-ModernColBERT-v1\",\n    device=\"cuda\"\n)\n\n# Encode queries and documents\nqueries_embeddings = model.encode(\n    sentences=[\"What is the capital of France?\", \"How big is the sun?\"],\n    is_query=True\n)\n\ndocuments_embeddings = model.encode(\n    sentences=[\"Paris is the capital of France.\", \"The sun is a star.\"],\n    is_query=False\n)\n\n# Calculate similarity scores\nsimilarities = model.similarity(queries_embeddings, documents_embeddings)\n\nprint(f\"Similarity scores:\\n{similarities}\")\n\n# Use hierarchical pooling to reduce document embedding size and speed up downstream tasks\npooled_documents_embeddings = model.encode(\n    sentences=[\"Paris is 
the capital of France.\", \"The sun is a star.\"],\n    is_query=False,\n    pool_factor=2, # Halves the number of token embeddings\n)\n\nsimilarities_pooled = model.similarity(queries_embeddings, pooled_documents_embeddings)\n\nprint(f\"Similarity scores with pooling:\\n{similarities_pooled}\")\n```\n\n\u0026nbsp;\n\n### Rust\n\n```rust\nuse anyhow::Result;\nuse candle_core::Device;\nuse pylate_rs::{hierarchical_pooling, ColBERT};\n\nfn main() -\u003e Result\u003c()\u003e {\n    // Set the device (e.g., Cpu, Cuda, Metal)\n    let device = Device::Cpu;\n\n    // Initialize the model\n    let mut model: ColBERT = ColBERT::from(\"lightonai/GTE-ModernColBERT-v1\")\n        .with_device(device)\n        .try_into()?;\n\n    // Encode queries and documents\n    let queries = vec![\"What is the capital of France?\".to_string()];\n    let documents = vec![\"Paris is the capital of France.\".to_string()];\n\n    let query_embeddings = model.encode(\u0026queries, true)?;\n    let document_embeddings = model.encode(\u0026documents, false)?;\n\n    // Calculate similarity\n    let similarities = model.similarity(\u0026query_embeddings, \u0026document_embeddings)?;\n    println!(\"Similarity score: {}\", similarities.data[0][0]);\n\n    // Use hierarchical pooling\n    let pooled_document_embeddings = hierarchical_pooling(\u0026document_embeddings, 2)?;\n    let pooled_similarities = model.similarity(\u0026query_embeddings, \u0026pooled_document_embeddings)?;\n    println!(\"Similarity score after hierarchical pooling: {}\", pooled_similarities.data[0][0]);\n\n    Ok(())\n}\n```\n\n\u0026nbsp;\n\n## 📊 Benchmarks\n\n```python\nDevice    backend        Queries per seconds        Documents per seconds        Model loading time\ncpu       PyLate         350.10                     32.16                        2.06\ncpu       pylate-rs      386.21 (+10%)              42.15 (+31%)                 0.07 (-97%)\n\ncuda      PyLate         2236.48                    882.66                
       3.62\ncuda      pylate-rs      4046.88 (+81%)             976.23 (+11%)                1.95 (-46%)\n\nmps       PyLate         580.81                     103.10                       1.95\nmps       pylate-rs      291.71 (-50%)              23.26 (-77%)                 0.08 (-96%)\n```\n\nBenchmarks were run with Python. `pylate-rs` provides a significant performance improvement, especially in scenarios requiring fast startup times. While on a Mac it takes up to 5 seconds to load a model with the Transformers backend and encode a single query, `pylate-rs` achieves this in just 0.11 seconds, making it ideal for low-latency applications. Don't expect `pylate-rs` to be much faster than `PyLate` when encoding a lot of content at once, as PyTorch is heavily optimized.\n\n\u0026nbsp;\n\n## 📦 Using Custom Models\n\n`pylate-rs` is compatible with any model saved in the PyLate format, whether from the Hugging Face Hub or a local directory. PyLate itself is compatible with a wide range of models, including those from Sentence Transformers, Hugging Face Transformers, and custom models. Before using `pylate-rs`, ensure your model is saved in the PyLate format. You can easily convert and upload your own models using PyLate.\n\nPushing a model to the Hugging Face Hub in PyLate format is straightforward. 
Here’s how you can do it:\n\n```bash\npip install pylate\n```\n\nThen, you can use the following Python code snippet to push your model:\n\n```python\nfrom pylate import models\n\n# Load your model\nmodel = models.ColBERT(model_name_or_path=\"your-base-model-on-hf\")\n\n# Push in PyLate format\nmodel.push_to_hub(\n    repo_id=\"YourUsername/YourModelName\",\n    private=False,\n    token=\"YOUR_HUGGINGFACE_TOKEN\",\n)\n```\n\nIf you want to save a model in PyLate format locally, you can do so with the following code snippet:\n\n```python\nfrom pylate import models\n\n# Load your model\nmodel = models.ColBERT(model_name_or_path=\"your-base-model-on-hf\")\n\n# Save in PyLate format\nmodel.save_pretrained(\"path/to/save/GTE-ModernColBERT-v1-pylate\")\n```\n\nAn existing set of models compatible with `pylate-rs` is available on the Hugging Face Hub under the [**LightOn**](https://huggingface.co/collections/lightonai/pylate-6862b571946fe88330d65264) namespace.\n\n\u0026nbsp;\n\n## Retrieval pipeline\n\n```bash\npip install pylate-rs fast-plaid\n```\n\nHere is sample code for running ColBERT with `pylate-rs` and `fast-plaid`.\n\n```python\nimport torch\nfrom fast_plaid import search\nfrom pylate_rs import models\n\nmodel = models.ColBERT(\n    model_name_or_path=\"lightonai/GTE-ModernColBERT-v1\",\n    device=\"cpu\", # mps or cuda\n)\n\ndocuments = [\n    \"1st Arrondissement: Louvre, Tuileries Garden, Palais Royal, historic, tourist.\",\n    \"2nd Arrondissement: Bourse, financial, covered passages, Sentier, business.\",\n    \"3rd Arrondissement: Marais, Musée Picasso, galleries, trendy, historic.\",\n    \"4th Arrondissement: Notre-Dame, Marais, Hôtel de Ville, LGBTQ+.\",\n    \"5th Arrondissement: Latin Quarter, Sorbonne, Panthéon, student, intellectual.\",\n    \"6th Arrondissement: Saint-Germain-des-Prés, Luxembourg Gardens, chic, artistic, cafés.\",\n    \"7th Arrondissement: Eiffel Tower, Musée d'Orsay, Les Invalides, affluent, prestigious.\",\n    \"8th 
Arrondissement: Champs-Élysées, Arc de Triomphe, luxury, shopping, Élysée.\",\n    \"9th Arrondissement: Palais Garnier, department stores, shopping, theaters.\",\n    \"10th Arrondissement: Gare du Nord, Gare de l'Est, Canal Saint-Martin.\",\n    \"11th Arrondissement: Bastille, nightlife, Oberkampf, revolutionary, hip.\",\n    \"12th Arrondissement: Bois de Vincennes, Opéra Bastille, Bercy, residential.\",\n    \"13th Arrondissement: Chinatown, Bibliothèque Nationale, modern, diverse, street-art.\",\n    \"14th Arrondissement: Montparnasse, Catacombs, residential, artistic, quiet.\",\n    \"15th Arrondissement: Residential, family, populous, Parc André Citroën.\",\n    \"16th Arrondissement: Trocadéro, Bois de Boulogne, affluent, elegant, embassies.\",\n    \"17th Arrondissement: Diverse, Palais des Congrès, residential, Batignolles.\",\n    \"18th Arrondissement: Montmartre, Sacré-Cœur, Moulin Rouge, artistic, historic.\",\n    \"19th Arrondissement: Parc de la Villette, Cité des Sciences, canals, diverse.\",\n    \"20th Arrondissement: Père Lachaise, Belleville, cosmopolitan, artistic, historic.\",\n]\n\n# Encoding documents\ndocuments_embeddings = model.encode(\n    sentences=documents,\n    is_query=False,\n    pool_factor=2, # Let's divide the number of embeddings by 2.\n)\n\n# Creating the FastPlaid index\nfast_plaid = search.FastPlaid(index=\"index\")\n\n\nfast_plaid.create(\n    documents_embeddings=[torch.tensor(embedding) for embedding in documents_embeddings]\n)\n```\n\nWe can then load the existing index and search for the most relevant documents:\n\n```python\nimport torch\nfrom fast_plaid import search\nfrom pylate_rs import models\n\nfast_plaid = search.FastPlaid(index=\"index\")\n\nqueries = [\n    \"arrondissement with the Eiffel Tower and Musée d'Orsay\",\n    \"Latin Quarter and Sorbonne University\",\n    \"arrondissement with Sacré-Cœur and Moulin Rouge\",\n    \"arrondissement with the Louvre and Tuileries Garden\",\n    \"arrondissement 
with Notre-Dame Cathedral and the Marais\",\n]\n\n# Load the same model that was used to build the index\nmodel = models.ColBERT(\n    model_name_or_path=\"lightonai/GTE-ModernColBERT-v1\",\n    device=\"cpu\", # mps or cuda\n)\n\nqueries_embeddings = model.encode(\n    sentences=queries,\n    is_query=True,\n)\n\nscores = fast_plaid.search(\n    queries_embeddings=torch.tensor(queries_embeddings),\n    top_k=3,\n)\n\nprint(scores)\n```\n\n## 📝 Citation\n\nIf you use `pylate-rs` in your research or project, please cite it as follows:\n\n```bibtex\n@misc{PyLate,\n  title={PyLate: Flexible Training and Retrieval for Late Interaction Models},\n  author={Chaffin, Antoine and Sourty, Raphaël},\n  url={https://github.com/lightonai/pylate},\n  year={2024}\n}\n```\n\n\u0026nbsp;\n\n## WebAssembly\n\nFor JavaScript and TypeScript projects, install the WASM package from npm.\n\n```bash\nnpm install pylate-rs\n```\n\nLoad the model by fetching the required files from a local path or the Hugging Face Hub.\n\n```javascript\nimport { ColBERT } from \"pylate-rs\";\n\nconst REQUIRED_FILES = [\n  \"tokenizer.json\",\n  \"model.safetensors\",\n  \"config.json\",\n  \"config_sentence_transformers.json\",\n  \"1_Dense/model.safetensors\",\n  \"1_Dense/config.json\",\n  \"special_tokens_map.json\",\n];\n\nasync function loadModel(modelRepo) {\n  const fetchAllFiles = async (basePath) =\u003e {\n    const responses = await Promise.all(\n      REQUIRED_FILES.map((file) =\u003e fetch(`${basePath}/${file}`))\n    );\n    for (const response of responses) {\n      if (!response.ok) throw new Error(`File not found: ${response.url}`);\n    }\n    return Promise.all(\n      responses.map((res) =\u003e res.arrayBuffer().then((b) =\u003e new Uint8Array(b)))\n    );\n  };\n\n  try {\n    let modelFiles;\n    try {\n      // Attempt to load from a local `models` directory first\n      modelFiles = await fetchAllFiles(`models/${modelRepo}`);\n    } catch (e) {\n      console.warn(\n        `Local model not found, falling back to Hugging Face Hub.`,\n        e\n      );\n      // Fallback to fetching directly from the Hugging Face Hub\n      modelFiles = await fetchAllFiles(\n 
       `https://huggingface.co/${modelRepo}/resolve/main`\n      );\n    }\n\n    const [\n      tokenizer,\n      model,\n      config,\n      stConfig,\n      dense,\n      denseConfig,\n      tokensConfig,\n    ] = modelFiles;\n\n    // Instantiate the model with the loaded files\n    const colbertModel = new ColBERT(\n      model,\n      dense,\n      tokenizer,\n      config,\n      stConfig,\n      denseConfig,\n      tokensConfig,\n      32\n    );\n\n    // You can now use `colbertModel` for encoding\n    console.log(\"Model loaded successfully!\");\n    return colbertModel;\n  } catch (error) {\n    console.error(\"Model Loading Error:\", error);\n  }\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flightonai%2Fpylate-rs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flightonai%2Fpylate-rs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flightonai%2Fpylate-rs/lists"}