https://github.com/argmaxml/vecsim


Host: GitHub
URL: https://github.com/argmaxml/vecsim
Owner: argmaxml
License: mit
Created: 2022-11-24T09:26:18.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2023-11-30T13:40:25.000Z (over 2 years ago)
Last Synced: 2025-04-14T12:14:47.102Z (about 1 year ago)
Language: Python
Size: 58.6 KB
Stars: 4
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# VecSim - A unified interface for similarity servers

A standard, lightweight interface to popular vector similarity servers.

## The problems we are trying to solve:

1. **Standard API** - Different vector similarity servers expose different APIs, so switching between them is not trivial.

1. **Identifiers** - Some vector similarity servers support string IDs and some do not, so we keep track of the mapping between your IDs and the server's internal ones.

1. **Partitions** - In most cases, results must be pre-filtered before querying; we abstract this concept away as named partitions.

1. **Aggregations** - In some cases, a single item is indexed with multiple vectors.
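The identifier, partition and aggregation ideas above can be illustrated with a small in-memory sketch in plain NumPy. This is not VecSim's implementation: `ToyIndex`, `add` and `search` are hypothetical names, brute-force cosine similarity stands in for a real engine, and "aggregation" is assumed here to mean keeping the best-scoring vector per item.

```python
import numpy as np


class ToyIndex:
    """Hypothetical index: string IDs, partitions and per-item aggregation."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []      # row number acts as the engine's integer ID
        self.row_to_id = []    # row number -> caller's string ID
        self.row_to_part = []  # row number -> partition name

    def add(self, vectors, ids, partition=None):
        vectors = np.asarray(vectors, dtype=float).reshape(-1, self.dim)
        if len(vectors) != len(ids):
            raise ValueError("need exactly one ID per vector")
        for vec, item_id in zip(vectors, ids):
            self.vectors.append(vec)
            self.row_to_id.append(item_id)  # one item may own several rows
            self.row_to_part.append(partition)

    def search(self, query, k=10, partition=None):
        # Pre-filter: only rows in the requested partition are scored at all.
        rows = [r for r, p in enumerate(self.row_to_part)
                if partition is None or p == partition]
        if not rows:
            return [], []
        matrix = np.vstack([self.vectors[r] for r in rows])
        q = np.asarray(query, dtype=float)
        # Cosine similarity between the query and every candidate row.
        sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
        # Aggregate: keep the highest similarity per string ID.
        best = {}
        for row, score in zip(rows, sims):
            item_id = self.row_to_id[row]
            if item_id not in best or score > best[item_id]:
                best[item_id] = float(score)
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]
        return [s for _, s in ranked], [i for i, _ in ranked]


index = ToyIndex(dim=3)
index.add([[1, 0, 0], [0, 1, 0]], ["item_a", "item_a"], partition="items")  # one item, two vectors
index.add([[0, 0, 1]], ["user_1"], partition="users")
scores, ids = index.search([0, 1, 0], k=5, partition="items")
```

Because the partition filter is applied before scoring, a real engine could use it to shrink the candidate set rather than discarding results afterwards; here both give the same answer since the search is exhaustive.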

## Supported engines:

1. Scikit-learn, via [NearestNeighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html)

1. [RediSearch](https://redis.io/docs/stack/search/reference/vectors/)

1. [Faiss](https://github.com/facebookresearch/faiss)

1. [Elasticsearch](https://www.elastic.co)

1. [Pinecone](https://www.pinecone.io)

## QuickStart example

```python
import numpy as np

# Import a similarity server of your choice:
# scikit-learn (best for small datasets or testing)
from vecsim import SciKitIndex

sim = SciKitIndex(metric='cosine', dim=32)

# 100 users and 100 items, each with a random 32-dimensional vector
user_ids = ["user_" + str(1 + i) for i in range(100)]
user_data = np.random.random((100, 32))
item_ids = ["item_" + str(101 + i) for i in range(100)]
item_data = np.random.random((100, 32))

# Store users and items in separate partitions
sim.add_items(user_data, user_ids, partition="users")
sim.add_items(item_data, item_ids, partition="items")

# Index the data
sim.init()

# Run nearest neighbor vector search
query = np.random.random(32)
dists, items = sim.search(query, k=10)  # returns a list of users and items
dists, items = sim.search(query, k=10, partition="users")  # returns a list of users only
```
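To sanity-check results from the quickstart, an exact brute-force search over one partition can be written in plain NumPy. This sketch assumes `search` returns cosine distances (1 - cosine similarity) sorted from nearest to farthest, as scikit-learn's `NearestNeighbors` does with `metric='cosine'`; `brute_force_cosine` is a hypothetical helper, not part of VecSim.

```python
import numpy as np


def brute_force_cosine(query, data, ids, k=10):
    """Hypothetical helper: exact top-k by cosine distance, nearest first."""
    data = np.asarray(data, dtype=float)
    query = np.asarray(query, dtype=float)
    sims = data @ query / (np.linalg.norm(data, axis=1) * np.linalg.norm(query))
    dists = 1.0 - sims                            # cosine distance
    k = min(k, len(ids))
    top = np.argsort(dists, kind="stable")[:k]    # row indices of the k nearest
    return dists[top], [ids[i] for i in top]


# Same shapes as the quickstart's "users" partition
rng = np.random.default_rng(0)
user_ids = ["user_" + str(1 + i) for i in range(100)]
user_data = rng.random((100, 32))
query = rng.random(32)
dists, users = brute_force_cosine(query, user_data, user_ids, k=10)
```

Running the same `query` and `user_data` through `sim.search(query, k=10, partition="users")` should return the same users if the assumption about the distance convention holds; an approximate engine such as some Faiss index types may legitimately differ slightly.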

For more examples, please read our [documentation](https://vecsim.readthedocs.io/).
