Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/microsoft/SPTAG

A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.
https://github.com/microsoft/SPTAG

approximate-nearest-neighbor-search distributed-serving fresh-update neighborhood-graph space-partition-tree vector-search

Last synced: about 2 months ago
JSON representation

A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search scenario.

Awesome Lists containing this project

README

        

# SPTAG: A library for fast approximate nearest neighbor search

[![MIT licensed](https://img.shields.io/badge/license-MIT-yellow.svg)](https://github.com/Microsoft/SPTAG/blob/master/LICENSE)
[![Build status](https://sysdnn.visualstudio.com/SPTAG/_apis/build/status/SPTAG-GITHUB)](https://sysdnn.visualstudio.com/SPTAG/_build/latest?definitionId=2)

## **SPTAG**
SPTAG (Space Partition Tree And Graph) is a library for large scale vector approximate nearest neighbor search scenario released by [Microsoft Research (MSR)](https://www.msra.cn/) and [Microsoft Bing](http://bing.com).


architecture

## What's NEW
* Result Iterator with Relaxed Monotonicity Signal Support
* New Research Paper [SPFresh: Incremental In-Place Update for Billion-Scale Vector Search](https://dl.acm.org/doi/10.1145/3600006.3613166) - _published in SOSP 2023_
* New Research Paper [VBASE: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity](https://www.usenix.org/system/files/osdi23-zhang-qianxi_1.pdf) - _published in OSDI 2023_

## **Introduction**

This library assumes that the samples are represented as vectors and that the vectors can be compared by L2 distances or cosine distances.
Vectors returned for a query vector are the vectors that have smallest L2 distance or cosine distances with the query vector.

SPTAG provides two methods: kd-tree and relative neighborhood graph (SPTAG-KDT)
and balanced k-means tree and relative neighborhood graph (SPTAG-BKT).
SPTAG-KDT is advantageous in index building cost, and SPTAG-BKT is advantageous in search accuracy in very high-dimensional data.

## **How it works**

SPTAG is inspired by the NGS approach [[WangL12](#References)]. It contains two basic modules: index builder and searcher.
The RNG is built on the k-nearest neighborhood graph [[WangWZTG12](#References), [WangWJLZZH14](#References)]
for boosting the connectivity. Balanced k-means trees are used to replace kd-trees to avoid the inaccurate distance bound estimation in kd-trees for very high-dimensional vectors.
The search begins with the search in the space partition trees for
finding several seeds to start the search in the RNG.
The searches in the trees and the graph are iteratively conducted.

## **Highlights**
* Fresh update: Support online vector deletion and insertion
* Distributed serving: Search over multiple machines

## **Build**

### **Requirements**

* swig >= 4.0.2
* cmake >= 3.12.0
* boost >= 1.67.0

### **Fast clone**

```
set GIT_LFS_SKIP_SMUDGE=1
git clone --recurse-submodules https://github.com/microsoft/SPTAG

OR

git config --global filter.lfs.smudge "git-lfs smudge --skip -- %f"
git config --global filter.lfs.process "git-lfs filter-process --skip"
```

### **Install**

> For Linux:
```bash
mkdir build
cd build && cmake .. && make
```
It will generate a Release folder in the code directory which contains all the build targets.

> For Windows:
```bash
mkdir build
cd build && cmake -A x64 ..
```
It will generate a SPTAGLib.sln in the build directory.
Compiling the ALL_BUILD project in the Visual Studio (at least 2019) will generate a Release directory which contains all the build targets.

For detailed instructions on installing Windows binaries, please see [here](docs/WindowsInstallation.md)

> Using Docker:
```bash
docker build -t sptag .
```
Will build a docker container with binaries in `/app/Release/`.

### **Verify**

Run the SPTAGTest (or Test.exe) in the Release folder to verify all the tests have passed.

### **Usage**

The detailed usage can be found in [Get started](docs/GettingStart.md). There is also an end-to-end tutorial for building vector search online service using Python Wrapper in [Python Tutorial](docs/Tutorial.ipynb).
The detailed parameters tunning can be found in [Parameters](docs/Parameters.md).

## **References**
Please cite SPTAG in your publications if it helps your research:
```
@inproceedings{xu2023spfresh,
title={SPFresh: Incremental In-Place Update for Billion-Scale Vector Search},
author={Xu, Yuming and Liang, Hengyu and Li, Jin and Xu, Shuotao and Chen, Qi and Zhang, Qianxi and Li, Cheng and Yang, Ziyue and Yang, Fan and Yang, Yuqing and others},
booktitle={Proceedings of the 29th Symposium on Operating Systems Principles},
pages={545--561},
year={2023}
}

@inproceedings{zhang2023vbase,
title={$\{$VBASE$\}$: Unifying Online Vector Similarity Search and Relational Queries via Relaxed Monotonicity},
author={Zhang, Qianxi and Xu, Shuotao and Chen, Qi and Sui, Guoxin and Xie, Jiadong and Cai, Zhizhen and Chen, Yaoqi and He, Yinxuan and Yang, Yuqing and Yang, Fan and others},
booktitle={17th USENIX Symposium on Operating Systems Design and Implementation (OSDI 23)},
year={2023}
}

@inproceedings{ChenW21,
author = {Qi Chen and
Bing Zhao and
Haidong Wang and
Mingqin Li and
Chuanjie Liu and
Zengzhong Li and
Mao Yang and
Jingdong Wang},
title = {SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search},
booktitle = {35th Conference on Neural Information Processing Systems (NeurIPS 2021)},
year = {2021}
}

@manual{ChenW18,
author = {Qi Chen and
Haidong Wang and
Mingqin Li and
Gang Ren and
Scarlett Li and
Jeffery Zhu and
Jason Li and
Chuanjie Liu and
Lintao Zhang and
Jingdong Wang},
title = {SPTAG: A library for fast approximate nearest neighbor search},
url = {https://github.com/Microsoft/SPTAG},
year = {2018}
}

@inproceedings{WangL12,
author = {Jingdong Wang and
Shipeng Li},
title = {Query-driven iterated neighborhood graph search for large scale indexing},
booktitle = {ACM Multimedia 2012},
pages = {179--188},
year = {2012}
}

@inproceedings{WangWZTGL12,
author = {Jing Wang and
Jingdong Wang and
Gang Zeng and
Zhuowen Tu and
Rui Gan and
Shipeng Li},
title = {Scalable k-NN graph construction for visual descriptors},
booktitle = {CVPR 2012},
pages = {1106--1113},
year = {2012}
}

@article{WangWJLZZH14,
author = {Jingdong Wang and
Naiyan Wang and
You Jia and
Jian Li and
Gang Zeng and
Hongbin Zha and
Xian{-}Sheng Hua},
title = {Trinary-Projection Trees for Approximate Nearest Neighbor Search},
journal = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
volume = {36},
number = {2},
pages = {388--403},
year = {2014
}
```

## **Contribute**

This project welcomes contributions and suggestions from all the users.

We use [GitHub issues](https://github.com/Microsoft/SPTAG/issues) for tracking suggestions and bugs.

## **License**
The entire codebase is under [MIT license](https://github.com/Microsoft/SPTAG/blob/master/LICENSE)