https://github.com/argmaxml/vecsim
- Host: GitHub
- URL: https://github.com/argmaxml/vecsim
- Owner: argmaxml
- License: mit
- Created: 2022-11-24T09:26:18.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-11-30T13:40:25.000Z (over 2 years ago)
- Last Synced: 2025-04-14T12:14:47.102Z (about 1 year ago)
- Language: Python
- Size: 58.6 KB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# VecSim - A unified interface for similarity servers
A standard, lightweight interface to popular vector similarity servers.
## The problems we are trying to solve:
1. **Standard API** - Different vector similarity servers have different APIs - so switching is not trivial.
1. **Identifiers** - Some vector similarity servers support string IDs, some do not - we keep track of the mapping.
1. **Partitions** - In most cases, pre-filtering is needed before querying; we abstract this concept away.
1. **Aggregations** - In some cases, one item is indexed as multiple vectors.
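
To make the identifier, partition, and aggregation problems above concrete, here is a minimal brute-force sketch in plain NumPy. It is an illustration only and not how VecSim is implemented: the `TinyIndex` class and its internals are hypothetical, and only the method names `add_items` and `search` are borrowed from the QuickStart. It assumes cosine distance and non-zero vectors.

```python
import numpy as np


class TinyIndex:
    """Hypothetical brute-force index (illustration only, not VecSim's code)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []     # row i holds the vector with integer id i
        self.labels = []      # integer id -> caller's string ID
        self.partitions = []  # integer id -> partition name

    def add_items(self, data, ids, partition=None):
        data = np.asarray(data, dtype=float)
        if data.shape != (len(ids), self.dim):
            raise ValueError("data must have shape (len(ids), dim)")
        for vec, label in zip(data, ids):
            self.vectors.append(vec)
            self.labels.append(label)
            self.partitions.append(partition)

    def search(self, query, k=10, partition=None):
        if not self.vectors:
            return [], []
        # Partitions: pre-filter the candidate rows before computing distances.
        rows = np.array(
            [i for i, p in enumerate(self.partitions)
             if partition is None or p == partition],
            dtype=int,
        )
        candidates = np.vstack(self.vectors)[rows]
        query = np.asarray(query, dtype=float)
        # Cosine distance = 1 - cosine similarity (assumes non-zero vectors).
        sims = candidates @ query / (
            np.linalg.norm(candidates, axis=1) * np.linalg.norm(query)
        )
        dists = 1.0 - sims
        # Identifiers and aggregations: translate integer rows back to string
        # IDs and keep only the closest vector for each ID.
        best = {}
        for j in np.argsort(dists):
            label = self.labels[rows[j]]
            if label not in best:
                best[label] = float(dists[j])
            if len(best) >= k:
                break
        return list(best.values()), list(best.keys())


index = TinyIndex(dim=2)
index.add_items([[1.0, 0.0], [0.0, 1.0]], ["user_1", "user_2"], partition="users")
# One item indexed as two vectors shares a single string ID.
index.add_items([[1.0, 0.1], [0.1, 1.0]], ["item_1", "item_1"], partition="items")
dists, labels = index.search([1.0, 0.0], k=2, partition="items")
```

Here the partition filter restricts the search to the two `item_1` vectors, and the aggregation step collapses them into a single hit. A real similarity server would replace the brute-force distance computation with its own index.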
## Supported engines:
1. Scikit-learn, via [NearestNeighbors](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.NearestNeighbors.html)
1. [RediSearch](https://redis.io/docs/stack/search/reference/vectors/)
1. [Faiss](https://github.com/facebookresearch/faiss)
1. [ElasticSearch](https://www.elastic.co)
1. [Pinecone](https://www.pinecone.io)
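
The idea of putting several engines behind one API can be sketched as an abstract base class that each engine adapter would implement. This is a hypothetical sketch, not VecSim's actual class hierarchy: `SimilarityBackend`, `InMemoryBackend`, and `nearest_id` are made-up names, and the method names simply mirror the `add_items`, `init`, and `search` calls used in the QuickStart. The stand-in backend uses Euclidean distance from the standard library.

```python
import math
from abc import ABC, abstractmethod


class SimilarityBackend(ABC):
    """Hypothetical common interface that every engine adapter would satisfy."""

    @abstractmethod
    def add_items(self, data, ids, partition=None): ...

    @abstractmethod
    def init(self): ...

    @abstractmethod
    def search(self, query, k=10, partition=None): ...


class InMemoryBackend(SimilarityBackend):
    """Stand-in engine: exhaustive Euclidean search over a Python list."""

    def __init__(self):
        self._pending = []  # (label, vector, partition) staged before init()
        self._rows = None   # populated by init()

    def add_items(self, data, ids, partition=None):
        for vec, label in zip(data, ids):
            self._pending.append((label, list(vec), partition))

    def init(self):
        # A real engine would build its index here; this one copies the list.
        self._rows = list(self._pending)

    def search(self, query, k=10, partition=None):
        if self._rows is None:
            raise RuntimeError("call init() before search()")
        scored = sorted(
            (math.dist(vec, query), label)
            for label, vec, part in self._rows
            if partition is None or part == partition
        )[:k]
        return [d for d, _ in scored], [label for _, label in scored]


def nearest_id(backend: SimilarityBackend, query):
    """Caller code depends only on the interface, not on a specific engine."""
    _, items = backend.search(query, k=1)
    return items[0] if items else None


backend = InMemoryBackend()
backend.add_items([[0.0, 0.0], [1.0, 1.0]], ["a", "b"])
backend.init()
closest = nearest_id(backend, [0.9, 0.8])
```

Because `nearest_id` only calls `search`, swapping `InMemoryBackend` for an adapter around another engine would not require changes to the calling code, which is the point of a standard API.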
## QuickStart example
```python
import numpy as np
# Import a similarity server of your choice.
# scikit-learn is best for small datasets or testing.
from vecsim import SciKitIndex

sim = SciKitIndex(metric='cosine', dim=32)

# Build 100 random user vectors and 100 random item vectors, each with a string ID
user_ids = ["user_" + str(1 + i) for i in range(100)]
user_data = np.random.random((100, 32))
item_ids = ["item_" + str(101 + i) for i in range(100)]
item_data = np.random.random((100, 32))

# Add each group to its own partition
sim.add_items(user_data, user_ids, partition="users")
sim.add_items(item_data, item_ids, partition="items")
# Index the data
sim.init()
# Run nearest neighbor vector search
query = np.random.random(32)
dists, items = sim.search(query, k=10)  # nearest neighbors among users and items
dists, items = sim.search(query, k=10, partition="users")  # nearest neighbors among users only
```
For more examples, please read our [documentation](https://vecsim.readthedocs.io/).