https://github.com/intellabs/vectorsearchdatasets



# Vector Search Datasets

This repository provides code to generate several datasets for benchmarking and evaluating similarity search on
high-dimensional vectors produced by recent deep learning models. The available datasets are:

* [DPR](dpr/README.md) [[1]](#1)
* [open-images](openimages/README.md) [[2]](#2)
* [rqa](rqa/README.md) [[3]](#3)
* [wit](wit/README.md) [[3]](#3)
* [wikipedia](text/README.md)

Please see the details of each dataset in the respective README files.
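
As a rough illustration of how vectors like these are commonly used in benchmarking, the sketch below computes exact nearest neighbors by brute force and scores a result list with recall@k. It is not taken from this repository: the helper names `exact_top_k` and `recall_at_k` are hypothetical, the data is random rather than one of the datasets above, and inner product is assumed as the similarity (check each dataset's README for the metric it actually uses).

```python
import numpy as np


def exact_top_k(base, queries, k):
    """Return, per query, the indices of the k base vectors with the largest inner product."""
    scores = queries @ base.T  # shape: (n_queries, n_base)
    # argpartition selects the top-k set without a full sort; the k winners are sorted afterwards.
    top = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    order = np.argsort(-np.take_along_axis(scores, top, axis=1), axis=1)
    return np.take_along_axis(top, order, axis=1)


def recall_at_k(found, ground_truth):
    """Average fraction of true neighbors that appear in the returned lists."""
    hits = [len(set(f) & set(g)) for f, g in zip(found, ground_truth)]
    return float(np.mean(hits)) / ground_truth.shape[1]


# Synthetic stand-in data (assumption): 1000 base vectors and 10 queries in 64 dimensions.
rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 64)).astype(np.float32)
queries = rng.standard_normal((10, 64)).astype(np.float32)

ground_truth = exact_top_k(base, queries, k=10)
```

The results of an approximate index can then be passed as `found` to `recall_at_k` and compared against `ground_truth`. Brute force is only practical for small samples; at full dataset scale the ground truth is usually computed once and stored.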

## References

[1]
Aguerrebere, C.; Bhati, I.; Hildebrand, M.; Tepper, M.; Willke, T.: Similarity search in the blink of an eye with compressed
indices. In: Proceedings of the VLDB Endowment, 16, 11, 3433-3446. (2023)

[2]
Aguerrebere, C.; Hildebrand, M.; Bhati, I.; Willke, T.; Tepper, M.: Locally-adaptive Quantization for Streaming Vector
Search. arXiv preprint (2024)

[3]
Tepper, M.; Bhati, I.; Aguerrebere, C.; Hildebrand, M.; Willke, T.: LeanVec: Search your vectors faster by making them fit.
arXiv preprint arXiv:2312.16335 (2024)