https://github.com/azurecosmosdb/vectorindexscenariosuite
Vector Search Scenarios with CosmosDB.
Last synced: 5 months ago
- Host: GitHub
- URL: https://github.com/azurecosmosdb/vectorindexscenariosuite
- Owner: AzureCosmosDB
- License: mit
- Created: 2024-07-18T22:30:51.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-11-08T20:56:25.000Z (6 months ago)
- Last Synced: 2024-11-08T21:35:22.484Z (6 months ago)
- Language: C#
- Homepage:
- Size: 15.8 MB
- Stars: 4
- Watchers: 5
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# VectorIndexScenarioSuite
This repository contains a suite of scenarios designed to explore vector indexing capabilities in CosmosDB NoSQL.

## Blog Post Series
For a detailed explanation and walkthrough of each scenario, refer to our [Cosmos DB Blog post series](https://aka.ms/CosmosDiskANNBlogPart1).

## List of supported Scenarios
1. Wiki-Cohere-English-EmbeddingOnly Scenario:
   The Wiki Cohere English Embedding Only Scenario contains 768-dimensional embeddings of English Wikipedia articles (without the corresponding passage text).
   The embeddings were generated using Cohere’s multilingual-22-12 model.
   For simplicity, we use a pre-processed version of the dataset hosted at [BigANN](https://github.com/harsha-simhadri/big-ann-benchmarks/blob/main/benchmark/datasets.py).
   This dataset contains:
   - Base data slices of 100K, 10 million, and 35 million vectors.
   - Query vectors and the corresponding ground truth neighbor identifiers / distances for 5000 vectors not in the base dataset.

   The dataset uses the BigANNBinary format documented at [BigANNBenchmarks](https://big-ann-benchmarks.com/neurips21.html#bench-datasets).
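To get a feel for the file layout before running a scenario, the following is a minimal Python sketch of a reader for the binary format described on the BigANN Benchmarks page. It is not code from this repository (which is written in C#). It assumes the layout documented there: a little-endian `uint32` point count and a `uint32` dimension, followed by the vector values; ground truth files hold a `uint32` query count and a `uint32` K, followed by all neighbor ids (`uint32`) and then all distances (`float32`). The function names `read_vectors` and `read_ground_truth` are hypothetical, and treating the vector elements as `float32` is an assumption to verify against the dataset you download.

```python
import struct


def read_vectors(path, max_vectors=None):
    """Read a vector file: uint32 count, uint32 dim, then count*dim float32 values.

    Assumption: elements are little-endian float32.
    """
    with open(path, "rb") as f:
        num_points, dim = struct.unpack("<II", f.read(8))
        if max_vectors is not None:
            # Allows reading only a prefix of a large base slice.
            num_points = min(num_points, max_vectors)
        vectors = [
            struct.unpack(f"<{dim}f", f.read(4 * dim))
            for _ in range(num_points)
        ]
    return dim, vectors


def read_ground_truth(path):
    """Read a ground truth file: uint32 num_queries, uint32 k,
    then num_queries*k uint32 ids, then num_queries*k float32 distances.
    """
    with open(path, "rb") as f:
        num_queries, k = struct.unpack("<II", f.read(8))
        ids = [
            struct.unpack(f"<{k}I", f.read(4 * k))
            for _ in range(num_queries)
        ]
        distances = [
            struct.unpack(f"<{k}f", f.read(4 * k))
            for _ in range(num_queries)
        ]
    return ids, distances
```

Passing `max_vectors` is one way to inspect the first few rows of the 10 million or 35 million slices without loading the whole file into memory.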
Please Watch / Star this repository as we will be adding multiple new scenarios in the near future.