{"id":25449770,"url":"https://github.com/elchemista/vettore","last_synced_at":"2026-06-13T02:09:23.648Z","repository":{"id":277829878,"uuid":"933617168","full_name":"elchemista/vettore","owner":"elchemista","description":"Elixir  in memory VectorDB build with Rust  using rustler! It's small, fast, efficient, simple! ","archived":false,"fork":false,"pushed_at":"2025-09-01T21:12:45.000Z","size":35243,"stargazers_count":19,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2026-03-17T20:06:40.919Z","etag":null,"topics":["elixir","rag","rust","rustler","vector-database","vectors"],"latest_commit_sha":null,"homepage":"https://hex.pm/packages/vettore","language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elchemista.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-16T11:20:43.000Z","updated_at":"2026-03-13T11:05:43.000Z","dependencies_parsed_at":null,"dependency_job_id":"04d8b431-85b7-49a2-ac17-c9bb01e00386","html_url":"https://github.com/elchemista/vettore","commit_stats":null,"previous_names":["elchemista/vettore"],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/elchemista/vettore","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Fvettore","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Fvettore/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Fvettore/releases","manifests_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Fvettore/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elchemista","download_url":"https://codeload.github.com/elchemista/vettore/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elchemista%2Fvettore/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34269428,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["elixir","rag","rust","rustler","vector-database","vectors"],"created_at":"2025-02-17T21:18:10.736Z","updated_at":"2026-06-13T02:09:23.641Z","avatar_url":"https://github.com/elchemista.png","language":"Elixir","funding_links":[],"categories":["Vector Databases and RAG"],"sub_categories":[],"readme":"# Vettore\n\nVettore is a small vector toolkit for Elixir that keeps your data in ETS and\nuses Rust only where it helps: distance kernels, normalization, HNSW search, and\nMUVERA-style encodings.\n\nEarlier versions leaned toward a Rust-owned in-memory database. That was fast,\nbut it made the library feel less like an Elixir tool and more like an external\nengine with Elixir bindings. 
Vettore now chooses ETS as the canonical store on\npurpose:\n\n- records are visible and easy to inspect from Elixir\n- supervision, snapshots, and ownership stay simple\n- metadata and application values live beside vectors naturally\n- native indexes can be rebuilt from canonical ETS state\n- the public API stays small, predictable, and BEAM-friendly\n\nThe important idea is simple:\n\n- Elixir owns the records.\n- ETS is the source of truth.\n- Rust accelerates the expensive parts.\n- Search results say clearly what is a score and what is a distance.\n\nThat choice is not the absolute fastest possible architecture. A fully\nRust-owned vector database can beat ETS for large exact scans, but Vettore\noptimizes for a different kind of usefulness: simple integration with ordinary\nElixir systems, with Rust kept as acceleration rather than ownership.\n\n## What You Get\n\n- ETS-backed collections\n- exact flat search\n- native HNSW approximate search\n- Matryoshka-style funnel search\n- binary quantized candidate search\n- hybrid candidate pipelines with exact or multi-vector reranking\n- ColBERT-style late interaction over multi-vector records\n- MUVERA-style fixed-dimensional encodings\n- named distance, similarity, normalization, and MMR helpers\n- a top-level `Vettore.*` API, plus compatibility wrappers for the older\n  `Vettore.new/0` database-style API\n\n## Installation\n\n```elixir\ndef deps do\n  [\n    {:vettore, \"~\u003e 0.3.1\"}\n  ]\nend\n```\n\n## Quick Start\n\nCreate a collection, insert a few records, and search:\n\n```elixir\n{:ok, collection} =\n  Vettore.new(\n    name: :documents,\n    dimensions: 3,\n    index: :flat,\n    metric: :cosine,\n    normalize: :l2\n  )\n\n:ok =\n  Vettore.put_many(collection, [\n    %{id: \"east\", vector: [1.0, 0.0, 0.0], metadata: %{kind: :axis}},\n    %{id: \"north\", vector: [0.0, 1.0, 0.0]},\n    %{id: \"west\", vector: [-1.0, 0.0, 0.0]}\n  ])\n\n{:ok, results} =\n  Vettore.search(collection, [1.0, 0.0, 0.0], 
limit: 2)\n```\n\nResults are `%Vettore.Result{}` structs:\n\n```elixir\n%Vettore.Result{\n  id: \"east\",\n  value: \"east\",\n  score: 1.0,\n  distance: 0.0,\n  metric: :cosine,\n  metadata: %{kind: :axis}\n}\n```\n\n## Public API\n\nNew code can stay under the top-level `Vettore` module:\n\n```elixir\nVettore.new(opts)\nVettore.put(collection, embedding)\nVettore.put_many(collection, embeddings)\nVettore.get(collection, id)\nVettore.delete(collection, id)\nVettore.all(collection)\nVettore.search(collection, query, opts)\nVettore.funnel_search(collection, query, opts)\nVettore.quantized_search(collection, query, opts)\nVettore.multi_vector_search(collection, query_vectors, opts)\nVettore.hybrid_search(collection, query, opts)\nVettore.snapshot(collection, path)\nVettore.load_snapshot(path, opts)\n```\n\n`Vettore.new/1` creates a collection. `Vettore.new/0` still creates the older\ncompatibility database.\n\n## Choosing A Search Path\n\nStart with the simplest thing that matches your job.\n\n| Use this | When |\n| --- | --- |\n| `search/3` with `index: :flat` | Small data, tests, correctness baselines, exact results |\n| `search/3` with `index: :hnsw` | Fast approximate search over larger collections |\n| `funnel_search/3` | Matryoshka embeddings where early dimensions are meaningful |\n| `quantized_search/3` | Cheap sign-bit candidate search before exact reranking |\n| `multi_vector_search/3` | ColBERT-style late interaction over token/page vectors |\n| `hybrid_search/3` | Combine candidate generators, then rerank once |\n\nThe standalone helpers are nice while exploring. For production-style retrieval,\n`hybrid_search/3` is usually the most ergonomic surface.\n\n## Exact Search\n\nFlat search keeps ids and vectors in a Rust resource and scores the whole exact\nscan in one native call. 
ETS remains the canonical store for values, metadata,\nsnapshots, and usability.\n\n```elixir\n{:ok, collection} =\n  Vettore.new(\n    name: :exact_vectors,\n    dimensions: 384,\n    index: :flat,\n    metric: :cosine,\n    normalize: :l2\n  )\n\n{:ok, results} =\n  Vettore.search(collection, query_vector, limit: 10)\n```\n\nThis path is intentionally boring. It is great for small collections, local\ncaches, classifier centroids, deterministic tests, and recall baselines.\n\n## HNSW Search\n\nHNSW keeps a native graph beside the ETS store. ETS remains canonical; the graph\nis an acceleration structure.\n\n```elixir\n{:ok, collection} =\n  Vettore.new(\n    name: :ann_vectors,\n    dimensions: 768,\n    index: :hnsw,\n    index_options: [\n      m: 16,\n      m0: 32,\n      ef_construction: 100,\n      ef_search: 64,\n      max_level: 12\n    ],\n    metric: :cosine,\n    normalize: :l2\n  )\n\n:ok = Vettore.put(collection, %{id: \"doc-1\", vector: embedding})\n\n{:ok, results} =\n  Vettore.search(collection, query_vector, limit: 10)\n```\n\nSupported HNSW metrics:\n\n- `:l2`\n- `:cosine`\n- `:inner_product`\n\n## Adaptive Candidate Search\n\nThese helpers first find a candidate set, then rerank with full stored vectors.\nThey are useful when you want to make the first pass cheaper without changing\nthe canonical store.\n\n### Matryoshka Funnel\n\nFunnel search scores progressively larger vector prefixes. 
It works best with\nmodels trained for Matryoshka or nested embeddings.\n\n```elixir\n{:ok, results} =\n  Vettore.funnel_search(collection, query_vector,\n    stages: [128, 256, 384],\n    candidates: 200,\n    limit: 10\n  )\n```\n\n### Binary Quantized Candidates\n\nQuantized search uses stored sign bits for a cheap Hamming-distance first pass,\nthen reranks with the collection metric.\n\n```elixir\n{:ok, results} =\n  Vettore.quantized_search(collection, query_vector,\n    candidates: 200,\n    limit: 10\n  )\n```\n\nVettore generates `binary_vector` at insert time:\n\n```elixir\n{:ok, embedding} = Vettore.get(collection, \"doc-1\")\nembedding.binary_vector\n# [1, 0, 1, ...]\n```\n\n## Hybrid Search\n\n`hybrid_search/3` lets you combine candidate generators, union their ids, fetch\nthe canonical records from ETS, and rerank once.\n\n```elixir\n{:ok, results} =\n  Vettore.hybrid_search(collection, query_vector,\n    generators: [\n      funnel: [stages: [128, 384], candidates: 200],\n      quantized: [candidates: 200]\n    ],\n    rerank: :exact,\n    limit: 10\n  )\n```\n\nFor HNSW collections, add `:hnsw` as a generator:\n\n```elixir\n{:ok, results} =\n  Vettore.hybrid_search(collection, query_vector,\n    generators: [\n      hnsw: [candidates: 100],\n      quantized: [candidates: 200]\n    ],\n    rerank: :exact,\n    limit: 10\n  )\n```\n\nThe same pipeline can rerank with late interaction:\n\n```elixir\n{:ok, results} =\n  Vettore.hybrid_search(collection, query_vector,\n    generators: [quantized: [candidates: 200]],\n    rerank: {:multi_vector, query_vectors},\n    limit: 10\n  )\n```\n\nThat is the general pattern:\n\n1. Generate cheap candidates.\n2. Merge them by id.\n3. Rerank with the expensive scorer you actually care about.\n\n## Multi-Vector Search\n\nMulti-vector search is for ColBERT-style retrieval: each record can hold many\nvectors, usually token vectors or page-patch vectors. A query also has many\nvectors. 
For each query vector, Vettore finds the best matching document vector\nand sums those best scores.\n\n```elixir\n:ok =\n  Vettore.put(collection, %Vettore.Embedding{\n    id: \"page-1\",\n    vectors: [\n      [1.0, 0.0],\n      [0.0, 1.0]\n    ],\n    metadata: %{source: \"manual\"}\n  })\n\n{:ok, results} =\n  Vettore.multi_vector_search(\n    collection,\n    [[1.0, 0.0], [0.0, 1.0]],\n    metric: :inner_product,\n    limit: 10\n  )\n```\n\nThe lower-level scoring helper is available too:\n\n```elixir\nVettore.MultiVector.colbert_score(\n  [[1.0, 0.0], [0.0, 1.0]],\n  [[1.0, 0.0], [1.0, 1.0]],\n  metric: :inner_product\n)\n# {:ok, 2.0}\n```\n\n`Vettore.MultiVector.chamfer/3` is the same MaxSim-style operation under a more\ngeneral name.\n\n## MUVERA-Style Encodings\n\nMUVERA reduces multi-vector retrieval to fixed-dimensional vectors. The intended\nflow is:\n\n1. Encode query multi-vectors into a fixed-dimensional query vector.\n2. Encode document multi-vectors into fixed-dimensional document vectors.\n3. Search those vectors with inner product.\n4. 
Rerank candidates with exact MaxSim/Chamfer.\n\n```elixir\nvectors = [\n  [1.0, 0.0],\n  [0.0, 1.0]\n]\n\nconfig = [\n  num_repetitions: 1,\n  num_simhash_projections: 4,\n  seed: 42,\n  projection_dimension: 2\n]\n\n{:ok, query_fde} = Vettore.Encoding.Muvera.encode_query(vectors, config)\n{:ok, doc_fde} = Vettore.Encoding.Muvera.encode_document(vectors, config)\n```\n\nConfig options:\n\n- `:dimension` - inferred from vectors by default\n- `:num_repetitions` - defaults to `1`\n- `:num_simhash_projections` - defaults to `0`\n- `:seed` - defaults to `1`\n- `:projection_dimension` - defaults to input dimension\n- `:final_projection_dimension` - optional count-sketch compression size\n\n## Records And Storage\n\nRecords are `%Vettore.Embedding{}` structs or maps with equivalent keys.\n\n```elixir\n%Vettore.Embedding{\n  id: \"doc-1\",\n  value: \"optional external value\",\n  vector: [0.1, 0.2, 0.3],\n  vectors: [[0.1, 0.2, 0.3], [0.0, 0.5, 0.5]],\n  binary_vector: [1, 1, 1],\n  metadata: %{source: \"local\"}\n}\n```\n\nUseful details:\n\n- `id` is the preferred unique identifier.\n- If `id` is missing, a non-empty string `value` can be used as the id.\n- Duplicate ids are rejected.\n- Duplicate vectors are allowed.\n- Vectors are normalized at insertion according to the collection config.\n- If `vectors` is present but `vector` is omitted, Vettore stores an averaged\n  representative vector for ordinary search/indexing.\n- `binary_vector` is generated automatically for quantized candidate search.\n\nETS collections can be snapshotted:\n\n```elixir\n:ok = Vettore.snapshot(collection, \"priv/snapshots/docs.ets\")\n\n{:ok, loaded} =\n  Vettore.load_snapshot(\"priv/snapshots/docs.ets\")\n```\n\nSnapshots store the ETS table: records, metadata, normalized vectors, binary\nvectors, multi-vectors, and collection config. 
Native indexes are rebuilt from\nETS when loaded.\n\nYou can load the same data with a different index:\n\n```elixir\n{:ok, loaded} =\n  Vettore.load_snapshot(\"priv/snapshots/docs.ets\", index: :hnsw)\n```\n\nETS compression is available when you want to trade CPU for memory:\n\n```elixir\n{:ok, collection} =\n  Vettore.new(\n    name: :compressed_documents,\n    dimensions: 384,\n    metric: :cosine,\n    normalize: :l2,\n    compressed: true\n  )\n```\n\n## Metrics And Scoring\n\nCollection metrics:\n\n- `:l2`\n- `:l2_squared`\n- `:cosine`\n- `:inner_product`\n- `:negative_inner_product`\n- `:manhattan`\n- `:chebyshev`\n- `:hamming`\n- `:jaccard`\n\nAliases accepted by `Vettore.new/1`:\n\n- `:euclidean` -\u003e `:l2`\n- `:dot` -\u003e `:inner_product`\n- `:dot_product` -\u003e `:inner_product`\n\nWith `Vettore.Distance` you can call all distance functions directly:\n\n```elixir\nVettore.Distance.l2([0.0, 0.0], [3.0, 4.0])\n# {:ok, 5.0}\n\nVettore.Distance.cosine([1.0, 0.0], [0.0, 1.0])\n# {:ok, 0.0}\n\nVettore.Distance.inner_product([1.0, 2.0], [3.0, 4.0])\n# {:ok, 11.0}\n```\n\n## Normalization\n\nSupported normalization modes:\n\n- `:none`\n- `:l2`\n- `:zscore`\n- `:minmax`\n\n```elixir\nVettore.Distance.normalize([3.0, 4.0], :l2)\n# {:ok, [0.6, 0.8]}\n```\n\nCollection defaults:\n\n- `metric: :cosine` defaults to `normalize: :l2`\n- all other metrics default to `normalize: :none`\n\nInserted vectors and query vectors are prepared with the same collection\nnormalization mode.\n\n## Other Helpers\n\nMMR reranking:\n\n```elixir\ninitial = [{\"a\", 0.9}, {\"b\", 0.8}, {\"c\", 0.1}]\nembeddings = [{\"a\", [1.0, 0.0]}, {\"b\", [1.0, 0.0]}, {\"c\", [0.0, 1.0]}]\n\nVettore.Distance.mmr_rerank(initial, embeddings, :cosine, 0.5, 2)\n# {:ok, [{\"a\", 0.9}, {\"c\", 0.1}]}\n```\n\nSign compression:\n\n```elixir\nVettore.Distance.compress_f32_vector([1.0, -2.0, 0.0])\n# [5]\n\nleft = Vettore.Distance.compress_f32_vector([1.0, -2.0, 0.0])\nright = 
Vettore.Distance.compress_f32_vector([-1.0, -2.0, 0.0])\nVettore.Distance.packed_hamming(left, right, 3)\n# {:ok, 1.0}\n```\n\n## Compatibility API\n\nThe old top-level API still exists as a small compatibility layer backed by ETS\ncollections:\n\n```elixir\ndb = Vettore.new()\n\n{:ok, \"legacy\"} =\n  Vettore.create_collection(db, \"legacy\", 2, :cosine)\n\n{:ok, \"a\"} =\n  Vettore.insert(db, \"legacy\", %Vettore.Embedding{\n    value: \"a\",\n    vector: [1.0, 0.0]\n  })\n\n{:ok, results} =\n  Vettore.similarity_search(db, \"legacy\", [1.0, 0.0], limit: 1)\n```\n\nNew code should prefer the collection-style top-level API: `Vettore.new/1`, `Vettore.put/2`, and `Vettore.search/3`.\n\n## Development\n\nThe tests include a real `ex_fastembed` integration with\n`BAAI/bge-small-en-v1.5` over a small phrase corpus.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felchemista%2Fvettore","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felchemista%2Fvettore","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felchemista%2Fvettore/lists"}