https://github.com/intellabs/vectorsearchdatasets
- Host: GitHub
- URL: https://github.com/intellabs/vectorsearchdatasets
- Owner: IntelLabs
- License: mit
- Created: 2023-05-19T20:09:29.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-24T20:37:31.000Z (over 1 year ago)
- Last Synced: 2024-10-26T07:52:04.581Z (over 1 year ago)
- Language: Python
- Size: 63.7 MB
- Stars: 13
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Security: security.md
# Vector Search Datasets
This repository provides code to generate several datasets for similarity search benchmarking and evaluation on
high-dimensional vectors stemming from recent deep learning models. The available datasets are:
* [DPR](dpr/README.md) [[1]](#1)
* [open-images](openimages/README.md) [[2]](#2)
* [rqa](rqa/README.md) [[3]](#3)
* [wit](wit/README.md) [[3]](#3)
* [wikipedia](text/README.md)
Please see the details of each dataset in the respective README files.
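As a rough illustration of how vectors like these are typically used in similarity search benchmarking, the sketch below computes exact nearest neighbors by brute force and scores a candidate result list with recall@k. It is a minimal, standard-library-only sketch under stated assumptions: the data is random rather than loaded from any dataset in this repository, inner product is assumed as the similarity function, and `exact_top_k` and `recall_at_k` are hypothetical helper names, not part of this repository's code.

```python
import random


def inner_product(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(base, query, k):
    """Indices of the k base vectors with the largest inner product with query.

    Ties are broken by the smaller index so the result is deterministic.
    """
    scores = [(inner_product(v, query), i) for i, v in enumerate(base)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


def recall_at_k(ground_truth, candidates, k):
    """Fraction of true top-k neighbors found in the candidate top-k lists."""
    hits = sum(
        len(set(gt[:k]) & set(c[:k])) for gt, c in zip(ground_truth, candidates)
    )
    return hits / (k * len(ground_truth))


# Assumption: random stand-in data; a real benchmark would load the base and
# query vectors produced by one of the dataset generators instead.
rng = random.Random(0)
dim, n_base, n_queries, k = 16, 200, 5, 10
base = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_base)]
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_queries)]

# Exact neighbors serve as the ground truth that an approximate index is scored against.
ground_truth = [exact_top_k(base, q, k) for q in queries]
```

An approximate index would produce its own candidate lists, and `recall_at_k(ground_truth, candidates, k)` would then report how many of the true neighbors it recovered.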
## References
[1]
Aguerrebere, C.; Bhati, I.; Hildebrand, M.; Tepper, M.; Willke, T.: Similarity search in the blink of an eye with compressed
indices. In: Proceedings of the VLDB Endowment, 16, 11, 3433-3446. (2023)
[2]
Aguerrebere, C.; Hildebrand, M.; Bhati, I.; Willke, T.; Tepper, M.: Locally-adaptive Quantization for Streaming Vector
Search. arXiv preprint (2024)
[3]
Tepper, M.; Bhati, I.; Aguerrebere, C.; Hildebrand, M.; Willke, T.: LeanVec: Search your vectors faster by making them fit.
arXiv preprint arXiv:2312.16335 (2024)