{"id":13878328,"url":"https://github.com/ankane/neighbor","last_synced_at":"2025-11-17T14:05:06.881Z","repository":{"id":52076992,"uuid":"339285968","full_name":"ankane/neighbor","owner":"ankane","description":"Nearest neighbor search for Rails","archived":false,"fork":false,"pushed_at":"2025-10-24T02:36:07.000Z","size":334,"stargazers_count":765,"open_issues_count":3,"forks_count":17,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-11-13T08:15:31.642Z","etag":null,"topics":["nearest-neighbor-search"],"latest_commit_sha":null,"homepage":"","language":"Ruby","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ankane.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2021-02-16T04:36:33.000Z","updated_at":"2025-11-11T19:17:14.000Z","dependencies_parsed_at":"2023-09-26T06:31:11.577Z","dependency_job_id":"5d78f384-da9e-4754-8ea8-432d943c4ef0","html_url":"https://github.com/ankane/neighbor","commit_stats":{"total_commits":72,"total_committers":2,"mean_commits":36.0,"dds":0.01388888888888884,"last_synced_commit":"3e3e83dcc4f8bc1d74d3a9b848b8d0d5b769569f"},"previous_names":[],"tags_count":18,"template":false,"template_full_name":null,"purl":"pkg:github/ankane/neighbor","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fneighbor","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fneighbor/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fneighbor/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fneighbor/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ankane","download_url":"https://codeload.github.com/ankane/neighbor/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ankane%2Fneighbor/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":284369283,"owners_count":26993008,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-14T02:00:06.101Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["nearest-neighbor-search"],"created_at":"2024-08-06T08:01:46.484Z","updated_at":"2025-11-17T14:05:06.874Z","avatar_url":"https://github.com/ankane.png","language":"Ruby","funding_links":[],"categories":["Ruby","Gems","Sdks \u0026 Libraries"],"sub_categories":["Query Enhancement"],"readme":"# Neighbor\n\nNearest neighbor search for Rails\n\nSupports:\n\n- Postgres (cube and pgvector)\n- MariaDB 11.8\n- MySQL 9 (searching requires HeatWave) - experimental\n- SQLite (sqlite-vec) - experimental\n\nAlso available for [Redis](https://github.com/ankane/neighbor-redis) and [S3 Vectors](https://github.com/ankane/neighbor-s3)\n\n[![Build Status](https://github.com/ankane/neighbor/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/neighbor/actions)\n\n## Installation\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"neighbor\"\n```\n\n### For Postgres\n\nNeighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [pgvector](https://github.com/pgvector/pgvector). cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.\n\nFor cube, run:\n\n```sh\nrails generate neighbor:cube\nrails db:migrate\n```\n\nFor pgvector, [install the extension](https://github.com/pgvector/pgvector#installation) and run:\n\n```sh\nrails generate neighbor:vector\nrails db:migrate\n```\n\n### For SQLite\n\nAdd this line to your application’s Gemfile:\n\n```ruby\ngem \"sqlite-vec\"\n```\n\nAnd run:\n\n```sh\nrails generate neighbor:sqlite\n```\n\n## Getting Started\n\nCreate a migration\n\n```ruby\nclass AddEmbeddingToItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    # cube\n    add_column :items, :embedding, :cube\n\n    # pgvector, MariaDB, and MySQL\n    add_column :items, :embedding, :vector, limit: 3 # dimensions\n\n    # sqlite-vec\n    add_column :items, :embedding, :binary\n  end\nend\n```\n\nAdd to your model\n\n```ruby\nclass Item \u003c ApplicationRecord\n  has_neighbors :embedding\nend\n```\n\nUpdate the vectors\n\n```ruby\nitem.update(embedding: [1.0, 1.2, 0.5])\n```\n\nGet the nearest neighbors to a record\n\n```ruby\nitem.nearest_neighbors(:embedding, distance: \"euclidean\").first(5)\n```\n\nGet the nearest neighbors to a vector\n\n```ruby\nItem.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: \"euclidean\").first(5)\n```\n\nRecords returned from `nearest_neighbors` will have a `neighbor_distance` attribute\n\n```ruby\nnearest_item = item.nearest_neighbors(:embedding, distance: \"euclidean\").first\nnearest_item.neighbor_distance\n```\n\nSee the additional docs for:\n\n- [cube](#cube)\n- [pgvector](#pgvector)\n- [MariaDB](#mariadb)\n- [MySQL](#mysql)\n- [sqlite-vec](#sqlite-vec)\n\nOr check out some [examples](#examples)\n\n## cube\n\n### Distance\n\nSupported values are:\n\n- `euclidean`\n- `cosine`\n- `taxicab`\n- `chebyshev`\n\nFor cosine distance with cube, vectors must be normalized before being stored.\n\n```ruby\nclass Item \u003c ApplicationRecord\n  has_neighbors :embedding, normalize: true\nend\n```\n\nFor inner product with cube, see [this example](examples/disco/user_recs_cube.rb).\n\n### Dimensions\n\nThe `cube` type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this.\n\nFor cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.\n\n```ruby\nclass Item \u003c ApplicationRecord\n  has_neighbors :embedding, dimensions: 3\nend\n```\n\n## pgvector\n\n### Distance\n\nSupported values are:\n\n- `euclidean`\n- `inner_product`\n- `cosine`\n- `taxicab`\n- `hamming`\n- `jaccard`\n\n### Dimensions\n\nThe `vector` type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.\n\nThe `halfvec` type can have up to 16,000 dimensions, and half vectors with up to 4,000 dimensions can be indexed.\n\nThe `bit` type can have up to 83 million dimensions, and bit vectors with up to 64,000 dimensions can be indexed.\n\nThe `sparsevec` type can have up to 16,000 non-zero elements, and sparse vectors with up to 1,000 non-zero elements can be indexed.\n\n### Indexing\n\nAdd an approximate index to speed up queries. Create a migration with:\n\n```ruby\nclass AddIndexToItemsEmbedding \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops\n    # or\n    add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops\n  end\nend\n```\n\nUse `:vector_cosine_ops` for cosine distance and `:vector_ip_ops` for inner product.\n\nSet the size of the dynamic candidate list with HNSW\n\n```ruby\nItem.connection.execute(\"SET hnsw.ef_search = 100\")\n```\n\nOr the number of probes with IVFFlat\n\n```ruby\nItem.connection.execute(\"SET ivfflat.probes = 3\")\n```\n\n### Half-Precision Vectors\n\nUse the `halfvec` type to store half-precision vectors\n\n```ruby\nclass AddEmbeddingToItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_column :items, :embedding, :halfvec, limit: 3 # dimensions\n  end\nend\n```\n\n### Half-Precision Indexing\n\nIndex vectors at half precision for smaller indexes\n\n```ruby\nclass AddIndexToItemsEmbedding \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_index :items, \"(embedding::halfvec(3)) halfvec_l2_ops\", using: :hnsw\n  end\nend\n```\n\nGet the nearest neighbors\n\n```ruby\nItem.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: \"euclidean\", precision: \"half\").first(5)\n```\n\n### Binary Vectors\n\nUse the `bit` type to store binary vectors\n\n```ruby\nclass AddEmbeddingToItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_column :items, :embedding, :bit, limit: 3 # dimensions\n  end\nend\n```\n\nGet the nearest neighbors by Hamming distance\n\n```ruby\nItem.nearest_neighbors(:embedding, \"101\", distance: \"hamming\").first(5)\n```\n\n### Binary Quantization\n\nUse expression indexing for binary quantization\n\n```ruby\nclass AddIndexToItemsEmbedding \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_index :items, \"(binary_quantize(embedding)::bit(3)) bit_hamming_ops\", using: :hnsw\n  end\nend\n```\n\n### Sparse Vectors\n\nUse the `sparsevec` type to store sparse vectors\n\n```ruby\nclass AddEmbeddingToItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_column :items, :embedding, :sparsevec, limit: 3 # dimensions\n  end\nend\n```\n\nGet the nearest neighbors\n\n```ruby\nembedding = Neighbor::SparseVector.new({0 =\u003e 0.9, 1 =\u003e 1.3, 2 =\u003e 1.1}, 3)\nItem.nearest_neighbors(:embedding, embedding, distance: \"euclidean\").first(5)\n```\n\n## MariaDB\n\n### Distance\n\nSupported values are:\n\n- `euclidean`\n- `cosine`\n- `hamming`\n\n### Indexing\n\nVector columns must use `null: false` to add a vector index\n\n```ruby\nclass CreateItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    create_table :items do |t|\n      t.vector :embedding, limit: 3, null: false\n      t.index :embedding, type: :vector\n    end\n  end\nend\n```\n\n### Binary Vectors\n\nUse the `bigint` type to store binary vectors\n\n```ruby\nclass AddEmbeddingToItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_column :items, :embedding, :bigint\n  end\nend\n```\n\nNote: Binary vectors can have up to 64 dimensions\n\nGet the nearest neighbors by Hamming distance\n\n```ruby\nItem.nearest_neighbors(:embedding, 5, distance: \"hamming\").first(5)\n```\n\n## MySQL\n\n### Distance\n\nSupported values are:\n\n- `euclidean`\n- `cosine`\n- `hamming`\n\nNote: The `DISTANCE()` function is [only available on HeatWave](https://dev.mysql.com/doc/refman/9.0/en/vector-functions.html)\n\n### Binary Vectors\n\nUse the `binary` type to store binary vectors\n\n```ruby\nclass AddEmbeddingToItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    add_column :items, :embedding, :binary\n  end\nend\n```\n\nGet the nearest neighbors by Hamming distance\n\n```ruby\nItem.nearest_neighbors(:embedding, \"\\x05\", distance: \"hamming\").first(5)\n```\n\n## sqlite-vec\n\n### Distance\n\nSupported values are:\n\n- `euclidean`\n- `cosine`\n- `taxicab`\n- `hamming`\n\n### Dimensions\n\nFor sqlite-vec, it’s a good idea to specify the number of dimensions to ensure all records have the same number.\n\n```ruby\nclass Item \u003c ApplicationRecord\n  has_neighbors :embedding, dimensions: 3\nend\n```\n\n### Virtual Tables\n\nYou can also use [virtual tables](https://alexgarcia.xyz/sqlite-vec/features/knn.html)\n\n```ruby\nclass AddEmbeddingToItems \u003c ActiveRecord::Migration[8.1]\n  def change\n    # Rails 8+\n    create_virtual_table :items, :vec0, [\n      \"id integer PRIMARY KEY AUTOINCREMENT NOT NULL\",\n      \"embedding float[3] distance_metric=L2\"\n    ]\n\n    # Rails \u003c 8\n    execute \u003c\u003c~SQL\n      CREATE VIRTUAL TABLE items USING vec0(\n        id integer PRIMARY KEY AUTOINCREMENT NOT NULL,\n        embedding float[3] distance_metric=L2\n      )\n    SQL\n  end\nend\n```\n\nUse `distance_metric=cosine` for cosine distance\n\nYou can optionally ignore any shadow tables that are created\n\n```ruby\nActiveRecord::SchemaDumper.ignore_tables += [\n  \"items_chunks\", \"items_rowids\", \"items_vector_chunks00\"\n]\n```\n\nGet the `k` nearest neighbors\n\n```ruby\nItem.where(\"embedding MATCH ?\", [1, 2, 3].to_s).where(k: 5).order(:distance)\n```\n\nFilter by primary key\n\n```ruby\nItem.where(id: [2, 3]).where(\"embedding MATCH ?\", [1, 2, 3].to_s).where(k: 5).order(:distance)\n```\n\n### Int8 Vectors\n\nUse the `type` option for int8 vectors\n\n```ruby\nclass Item \u003c ApplicationRecord\n  has_neighbors :embedding, dimensions: 3, type: :int8\nend\n```\n\n### Binary Vectors\n\nUse the `type` option for binary vectors\n\n```ruby\nclass Item \u003c ApplicationRecord\n  has_neighbors :embedding, dimensions: 8, type: :bit\nend\n```\n\nGet the nearest neighbors by Hamming distance\n\n```ruby\nItem.nearest_neighbors(:embedding, \"\\x05\", distance: \"hamming\").first(5)\n```\n\n## Examples\n\n- [Embeddings](#openai-embeddings) with OpenAI\n- [Binary embeddings](#cohere-embeddings) with Cohere\n- [Sentence embeddings](#sentence-embeddings) with Informers\n- [Hybrid search](#hybrid-search) with Informers\n- [Sparse search](#sparse-search) with Transformers.rb\n- [Recommendations](#disco-recommendations) with Disco\n\n### OpenAI Embeddings\n\nGenerate a model\n\n```sh\nrails generate model Document content:text embedding:vector{1536}\nrails db:migrate\n```\n\nAnd add `has_neighbors`\n\n```ruby\nclass Document \u003c ApplicationRecord\n  has_neighbors :embedding\nend\n```\n\nCreate a method to call the [embeddings API](https://platform.openai.com/docs/guides/embeddings)\n\n```ruby\ndef embed(input)\n  url = \"https://api.openai.com/v1/embeddings\"\n  headers = {\n    \"Authorization\" =\u003e \"Bearer #{ENV.fetch(\"OPENAI_API_KEY\")}\",\n    \"Content-Type\" =\u003e \"application/json\"\n  }\n  data = {\n    input: input,\n    model: \"text-embedding-3-small\"\n  }\n\n  response = Net::HTTP.post(URI(url), data.to_json, headers).tap(\u0026:value)\n  JSON.parse(response.body)[\"data\"].map { |v| v[\"embedding\"] }\nend\n```\n\nPass your input\n\n```ruby\ninput = [\n  \"The dog is barking\",\n  \"The cat is purring\",\n  \"The bear is growling\"\n]\nembeddings = embed(input)\n```\n\nStore the embeddings\n\n```ruby\ndocuments = []\ninput.zip(embeddings) do |content, embedding|\n  documents \u003c\u003c {content: content, embedding: embedding}\nend\nDocument.insert_all!(documents)\n```\n\nAnd get similar documents\n\n```ruby\ndocument = Document.first\ndocument.nearest_neighbors(:embedding, distance: \"cosine\").first(5).map(\u0026:content)\n```\n\nSee the [complete code](examples/openai/example.rb)\n\n### Cohere Embeddings\n\nGenerate a model\n\n```sh\nrails generate model Document content:text embedding:bit{1536}\nrails db:migrate\n```\n\nAnd add `has_neighbors`\n\n```ruby\nclass Document \u003c ApplicationRecord\n  has_neighbors :embedding\nend\n```\n\nCreate a method to call the [embed API](https://docs.cohere.com/reference/embed)\n\n```ruby\ndef embed(input, input_type)\n  url = \"https://api.cohere.com/v2/embed\"\n  headers = {\n    \"Authorization\" =\u003e \"Bearer #{ENV.fetch(\"CO_API_KEY\")}\",\n    \"Content-Type\" =\u003e \"application/json\"\n  }\n  data = {\n    texts: input,\n    model: \"embed-v4.0\",\n    input_type: input_type,\n    embedding_types: [\"ubinary\"]\n  }\n\n  response = Net::HTTP.post(URI(url), data.to_json, headers).tap(\u0026:value)\n  JSON.parse(response.body)[\"embeddings\"][\"ubinary\"].map { |e| e.map { |v| v.chr.unpack1(\"B*\") }.join }\nend\n```\n\nPass your input\n\n```ruby\ninput = [\n  \"The dog is barking\",\n  \"The cat is purring\",\n  \"The bear is growling\"\n]\nembeddings = embed(input, \"search_document\")\n```\n\nStore the embeddings\n\n```ruby\ndocuments = []\ninput.zip(embeddings) do |content, embedding|\n  documents \u003c\u003c {content: content, embedding: embedding}\nend\nDocument.insert_all!(documents)\n```\n\nEmbed the search query\n\n```ruby\nquery = \"forest\"\nquery_embedding = embed([query], \"search_query\")[0]\n```\n\nAnd search the documents\n\n```ruby\nDocument.nearest_neighbors(:embedding, query_embedding, distance: \"hamming\").first(5).map(\u0026:content)\n```\n\nSee the [complete code](examples/cohere/example.rb)\n\n### Sentence Embeddings\n\nYou can generate embeddings locally with [Informers](https://github.com/ankane/informers).\n\nGenerate a model\n\n```sh\nrails generate model Document content:text embedding:vector{384}\nrails db:migrate\n```\n\nAnd add `has_neighbors`\n\n```ruby\nclass Document \u003c ApplicationRecord\n  has_neighbors :embedding\nend\n```\n\nLoad a [model](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)\n\n```ruby\nmodel = Informers.pipeline(\"embedding\", \"sentence-transformers/all-MiniLM-L6-v2\")\n```\n\nPass your input\n\n```ruby\ninput = [\n  \"The dog is barking\",\n  \"The cat is purring\",\n  \"The bear is growling\"\n]\nembeddings = model.(input)\n```\n\nStore the embeddings\n\n```ruby\ndocuments = []\ninput.zip(embeddings) do |content, embedding|\n  documents \u003c\u003c {content: content, embedding: embedding}\nend\nDocument.insert_all!(documents)\n```\n\nAnd get similar documents\n\n```ruby\ndocument = Document.first\ndocument.nearest_neighbors(:embedding, distance: \"cosine\").first(5).map(\u0026:content)\n```\n\nSee the [complete code](examples/informers/example.rb)\n\n### Hybrid Search\n\nYou can use Neighbor for hybrid search with [Informers](https://github.com/ankane/informers).\n\nGenerate a model\n\n```sh\nrails generate model Document content:text embedding:vector{768}\nrails db:migrate\n```\n\nAnd add `has_neighbors` and a scope for keyword search\n\n```ruby\nclass Document \u003c ApplicationRecord\n  has_neighbors :embedding\n\n  scope :search, -\u003e(query) {\n    where(\"to_tsvector(content) @@ plainto_tsquery(?)\", query)\n      .order(Arel.sql(\"ts_rank_cd(to_tsvector(content), plainto_tsquery(?)) DESC\", query))\n  }\nend\n```\n\nCreate some documents\n\n```ruby\nDocument.create!(content: \"The dog is barking\")\nDocument.create!(content: \"The cat is purring\")\nDocument.create!(content: \"The bear is growling\")\n```\n\nGenerate an embedding for each document\n\n```ruby\nembed = Informers.pipeline(\"embedding\", \"Snowflake/snowflake-arctic-embed-m-v1.5\")\nembed_options = {model_output: \"sentence_embedding\", pooling: \"none\"} # specific to embedding model\n\nDocument.find_each do |document|\n  embedding = embed.(document.content, **embed_options)\n  document.update!(embedding: embedding)\nend\n```\n\nPerform keyword search\n\n```ruby\nquery = \"growling bear\"\nkeyword_results = Document.search(query).limit(20).load_async\n```\n\nAnd semantic search in parallel (the query prefix is specific to the [embedding model](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v1.5))\n\n```ruby\nquery_prefix = \"Represent this sentence for searching relevant passages: \"\nquery_embedding = embed.(query_prefix + query, **embed_options)\nsemantic_results =\n  Document.nearest_neighbors(:embedding, query_embedding, distance: \"cosine\").limit(20).load_async\n```\n\nTo combine the results, use Reciprocal Rank Fusion (RRF)\n\n```ruby\nNeighbor::Reranking.rrf(keyword_results, semantic_results).first(5)\n```\n\nOr a reranking model\n\n```ruby\nrerank = Informers.pipeline(\"reranking\", \"mixedbread-ai/mxbai-rerank-xsmall-v1\")\nresults = (keyword_results + semantic_results).uniq\nrerank.(query, results.map(\u0026:content)).first(5).map { |v| results[v[:doc_id]] }\n```\n\nSee the [complete code](examples/hybrid/example.rb)\n\n### Sparse Search\n\nYou can generate sparse embeddings locally with [Transformers.rb](https://github.com/ankane/transformers-ruby).\n\nGenerate a model\n\n```sh\nrails generate model Document content:text embedding:sparsevec{30522}\nrails db:migrate\n```\n\nAnd add `has_neighbors`\n\n```ruby\nclass Document \u003c ApplicationRecord\n  has_neighbors :embedding\nend\n```\n\nLoad a [model](https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-v1) to generate embeddings\n\n```ruby\nclass EmbeddingModel\n  def initialize(model_id)\n    @model = Transformers::AutoModelForMaskedLM.from_pretrained(model_id)\n    @tokenizer = Transformers::AutoTokenizer.from_pretrained(model_id)\n    @special_token_ids = @tokenizer.special_tokens_map.map { |_, token| @tokenizer.vocab[token] }\n  end\n\n  def embed(input)\n    feature = @tokenizer.(input, padding: true, truncation: true, return_tensors: \"pt\", return_token_type_ids: false)\n    output = @model.(**feature)[0]\n    values = Torch.max(output * feature[:attention_mask].unsqueeze(-1), dim: 1)[0]\n    values = Torch.log(1 + Torch.relu(values))\n    values[0.., @special_token_ids] = 0\n    values.to_a\n  end\nend\n\nmodel = EmbeddingModel.new(\"opensearch-project/opensearch-neural-sparse-encoding-v1\")\n```\n\nPass your input\n\n```ruby\ninput = [\n  \"The dog is barking\",\n  \"The cat is purring\",\n  \"The bear is growling\"\n]\nembeddings = model.embed(input)\n```\n\nStore the embeddings\n\n```ruby\ndocuments = []\ninput.zip(embeddings) do |content, embedding|\n  documents \u003c\u003c {content: content, embedding: Neighbor::SparseVector.new(embedding)}\nend\nDocument.insert_all!(documents)\n```\n\nEmbed the search query\n\n```ruby\nquery = \"forest\"\nquery_embedding = model.embed([query])[0]\n```\n\nAnd search the documents\n\n```ruby\nDocument.nearest_neighbors(:embedding, Neighbor::SparseVector.new(query_embedding), distance: \"inner_product\").first(5).map(\u0026:content)\n```\n\nSee the [complete code](examples/sparse/example.rb)\n\n### Disco Recommendations\n\nYou can use Neighbor for online item-based recommendations with [Disco](https://github.com/ankane/disco). We’ll use MovieLens data for this example.\n\nGenerate a model\n\n```sh\nrails generate model Movie name:string factors:cube\nrails db:migrate\n```\n\nAnd add `has_neighbors`\n\n```ruby\nclass Movie \u003c ApplicationRecord\n  has_neighbors :factors, dimensions: 20, normalize: true\nend\n```\n\nFit the recommender\n\n```ruby\ndata = Disco.load_movielens\nrecommender = Disco::Recommender.new(factors: 20)\nrecommender.fit(data)\n```\n\nStore the item factors\n\n```ruby\nmovies = []\nrecommender.item_ids.each do |item_id|\n  movies \u003c\u003c {name: item_id, factors: recommender.item_factors(item_id)}\nend\nMovie.create!(movies)\n```\n\nAnd get similar movies\n\n```ruby\nmovie = Movie.find_by(name: \"Star Wars (1977)\")\nmovie.nearest_neighbors(:factors, distance: \"cosine\").first(5).map(\u0026:name)\n```\n\nSee the complete code for [cube](examples/disco/item_recs_cube.rb) and [pgvector](examples/disco/item_recs_vector.rb)\n\n## History\n\nView the [changelog](https://github.com/ankane/neighbor/blob/master/CHANGELOG.md)\n\n## Contributing\n\nEveryone is encouraged to help improve this project. Here are a few ways you can help:\n\n- [Report bugs](https://github.com/ankane/neighbor/issues)\n- Fix bugs and [submit pull requests](https://github.com/ankane/neighbor/pulls)\n- Write, clarify, or fix documentation\n- Suggest or add new features\n\nTo get started with development:\n\n```sh\ngit clone https://github.com/ankane/neighbor.git\ncd neighbor\nbundle install\n\n# Postgres\ncreatedb neighbor_test\nbundle exec rake test:postgresql\n\n# SQLite\nbundle exec rake test:sqlite\n\n# MariaDB\ndocker run -e MARIADB_ALLOW_EMPTY_ROOT_PASSWORD=1 -e MARIADB_DATABASE=neighbor_test -p 3307:3306 mariadb:11.8\nbundle exec rake test:mariadb\n\n# MySQL\ndocker run -e MYSQL_ALLOW_EMPTY_PASSWORD=1 -e MYSQL_DATABASE=neighbor_test -p 3306:3306 mysql:9\nbundle exec rake test:mysql\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fneighbor","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fankane%2Fneighbor","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fankane%2Fneighbor/lists"}