awesome-vector-database
A curated list of awesome works related to high-dimensional structure/vector search & database
https://github.com/dangkhoasdc/awesome-vector-database
📰 Articles & Talks
- What is a Vector Database?
- What makes each one different?
- [Article
- Computer Vision Meetup: Computer Vision Applications at Scale with Vector Databases
- Do we really need a specialized vector database?
- Vector database is not a separate database category
- Vector Databases: A First-Principles Approach
- Vector Search RAG Tutorial – Combine Your Data with LLMs with Advanced Search
- Efficient Vector Similarity Search in Recommender Workflows Using Milvus with NVIDIA Merlin
- Vector Databases: A Beginner’s Guide!
- eBay’s Blazingly Fast Billion-Scale Vector Similarity Engine
- Vector Database and Spring AI
- How to handle a Million Vector Embeddings in the RAG Applications
- How Meilisearch Updates a Millions Vector Embeddings Database in Under a Minute
- Common Pitfalls To Avoid When Using Vector Databases
- Getting Started With Vector Databases
- How to choose your vector database in 2023?
📈 Evaluation & Metrics
Comparisons
Courses
Graph-based Methods
- [Paper
- [Paper - cv/hnsw), [Go Version](https://github.com/coder/hnsw)]
- [Paper
- [Paper
- [Paper
- Graph- and Tree-based Indexes for High-dimensional Vector Similarity Search: Analyses, Comparisons, and Future Directions.
- BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU.
- CAGRA: Highly parallel graph construction and approximate nearest neighbor search for GPUs.
- Theoretical and Empirical Analysis of Adaptive Entry Point Selection for Graph-based Approximate Nearest Neighbor Search.
- General and practical tuning method for off-the-shelf graph-based index: SISAP indexing challenge report by team UTokyo.
- Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment
- An Efficient and Robust Framework for Approximate Nearest Neighbor Search with Attribute Constraint
- PECANN: Parallel efficient clustering with graph-based approximate nearest neighbor search
- ELPIS: Graph-Based Similarity Search for Scalable Data Science.
- Worst-case performance of popular approximate nearest neighbor search implementations: Guarantees and limitations
- Optimizing Graph-based Approximate Nearest Neighbor Search: Stronger and Smarter.
- Graph-based Approximate NN Search: A Revisit
- Speed-ANN: Low-Latency and High-Accuracy Nearest Neighbor Search via Intra-Query Parallelism
- HVS: hierarchical graph structure based on Voronoi diagrams for solving approximate nearest neighbor search [[Code](https://github.com/chuanxiao1983/HVS)]
- Revisiting k-Nearest Neighbor Graph Construction on High-Dimensional Data: Experiments and Analyses
- Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search
- FreshDiskANN: A fast and accurate graph-based ANN index for streaming similarity search
- ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data
- Graph based nearest neighbor search: Promises and failures
- Understanding and Generalizing Monotonic Proximity Graphs for Approximate Nearest Neighbor Search
- Large-Scale Approximate k-NN Graph Construction on GPU
- An Exploration Graph with Continuous Refinement for Efficient Multimedia Retrieval
- Enhancing HNSW Index for Real-Time Updates: Addressing Unreachable Points and Performance Degradation
- SeRF: Segment Graph for Range-Filtering Approximate Nearest Neighbor Search.
- ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms
- Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search.
- SymphonyQG: Towards Symphonious Integration of Quantization and Graph for Approximate Nearest Neighbor Search.
- CSPG: Crossing Sparse Proximity Graphs for Approximate Nearest Neighbor Search. - eighth Annual Conference on Neural Information Processing Systems.
- UNIFY: Unified Index for Range Filtered Approximate Nearest Neighbors Search.
- Efficient approximate nearest neighbor search on high-dimensional vectors by graph and quantization.
- iRangeGraph: Improvising Range-dedicated Graphs for Range-filtering Nearest Neighbor Search. [[Code](https://github.com/YuexuanXu7/iRangeGraph)]
- Link and code: Fast indexing with graphs and compact regression codes.
- DEG: Efficient Hybrid Vector Search Using the Dynamic Edge Navigation Graph.
- Scalable Overload-Aware Graph-Based Index Construction for 10-Billion-Scale Vector Similarity Search
- PilotANN: Memory-Bounded GPU Acceleration for Vector Search.
- Locality-Sensitive Indexing for Graph-Based Approximate Nearest Neighbor Search
- Breaking the Storage-Compute Bottleneck in Billion-Scale ANNS: A GPU-Driven Asynchronous I/O Framework.
- Hi-PNG: Efficient Interval-Filtering ANNS via Hierarchical Interval Partition Navigating Graph
- Instance-based Approximation Guarantees for Graph-based Nearest Neighbor Search.
- VSAG: An Optimized Search Framework for Graph-based Approximate Nearest Neighbor Search.
- MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search.
- Scalable Distributed Vector Search via Accuracy Preserving Index Construction.
- JAG: Joint Attribute Graphs for Filtered Nearest Neighbor Search.
- OdinANN: Direct insert for consistently stable performance in billion-scale graph-based vector search.
- RED-ANNS: An RDMA-Enabled Distributed Framework for Graph-Based Approximate Nearest Neighbor Search.
- PiPNN: Ultra-Scalable Graph-Based Nearest Neighbor Indexing
- d-HNSW: A High-performance Vector Search Engine on Disaggregated Memory.
- FGIM: a Fast Graph-based Indexes Merging Framework for Approximate Nearest Neighbor Search.
- FlashANNS: GPU-Driven Asynchronous I/O Pipelining for Eliminating Storage-Compute Bottlenecks in Billion-Scale Similarity Search.
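Many of the graph-based indexes listed above (FreshDiskANN, CAGRA, and the HNSW variants, for example) answer a query with some form of greedy best-first search over a proximity graph. The Python sketch below illustrates only that search loop, under stated assumptions: Euclidean distance, a brute-force k-NN graph instead of a pruned index, and a single fixed entry point. `build_knn_graph`, `beam_search`, and the beam-width parameter `ef` (named after the analogous HNSW parameter) are hypothetical helpers for illustration, not any listed library's API.

```python
import heapq
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(data: List[Vector], k: int) -> Dict[int, List[int]]:
    """Hypothetical helper: brute-force k-NN graph, node i -> its k closest nodes.

    O(n^2) and unpruned; it exists here only to have a graph to search.
    """
    graph: Dict[int, List[int]] = {}
    for i, v in enumerate(data):
        others = sorted((l2(v, u), j) for j, u in enumerate(data) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph


def beam_search(data: List[Vector], graph: Dict[int, List[int]], query: Vector,
                entry: int, ef: int, topk: int) -> List[int]:
    """Greedy best-first search from `entry` keeping at most `ef` results.

    Returns up to `topk` node ids, closest first.
    """
    d0 = l2(data[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap via negation: best `ef` seen so far
    while candidates:
        dist, node = heapq.heappop(candidates)
        # Stop once the result list is full and the closest unexpanded node
        # is farther than the worst kept result.
        if len(results) >= ef and dist > -results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = l2(data[nb], query)
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:topk]]


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(500)]
    graph = build_knn_graph(data, k=12)
    query = [random.random() for _ in range(8)]
    approx = beam_search(data, graph, query, entry=0, ef=40, topk=10)
    exact = sorted(range(len(data)), key=lambda i: l2(data[i], query))[:10]
    print("recall@10:", len(set(approx) & set(exact)) / 10)
```

Raising `ef` usually trades latency for recall; with a complete graph and `ef` equal to the dataset size, the loop visits every node and returns the exact neighbors.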
Hashing
- Awesome Papers on Learning to Hash
- [Paper
- [Paper - to-hash/blob/master/itq.py), [Matlab code](https://github.com/dangkhoasdc/sah/tree/master/itq)]
- Binary Embedding-based Retrieval at Tencent
- Binary code based hash embedding for web-scale applications
- Unsupervised Online Hashing with Multi-Bit Quantization
- PM-LSH: A fast and accurate LSH framework for high-dimensional approximate NN search.
- Scalable Nearest Neighbor Search with Compact Codes
- Fast Search on Binary Codes by Weighted Hamming Distance
- Fast top-K cosine similarity search through XOR-friendly binary quantization on GPUs
- Locality-sensitive hashing scheme based on longest circular co-substring
- DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search
- Point-to-hyperplane nearest neighbor search beyond the unit hypersphere
- Learning-Based Hashing for ANN Search: Foundations and Early Advances.
- Efficient locality sensitive hashing: solutions, primitives, and applications.
- LLMs Meet Isolation Kernel: Lightweight, Learning-free Binary Embeddings for Fast Retrieval.
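Several hashing entries above (for instance the binary-embedding retrieval and weighted-Hamming papers) rank candidates by Hamming distance between compact bit codes. The sketch below shows one classic way to produce such codes, random-hyperplane (sign) LSH for angular similarity, in plain Python. It is an illustration under that assumption, not the method of any paper listed; `make_hyperplanes`, `encode`, `hamming`, and `search` are hypothetical names, and the linear scan stands in for the indexing structures real systems use.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, nbits: int, seed: int = 0) -> List[List[float]]:
    """Draw `nbits` random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(nbits)]


def encode(v: Sequence[float], planes: List[List[float]]) -> int:
    """Pack the sign of each projection <v, p> into one integer bit code."""
    code = 0
    for bit, p in enumerate(planes):
        if sum(a * b for a, b in zip(v, p)) >= 0.0:
            code |= 1 << bit
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")


def search(codes: List[int], qcode: int, topk: int) -> List[int]:
    """Rank database ids by Hamming distance to the query code (ties by id)."""
    order = sorted(range(len(codes)), key=lambda i: (hamming(codes[i], qcode), i))
    return order[:topk]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
    planes = make_hyperplanes(dim=16, nbits=64)
    codes = [encode(v, planes) for v in data]
    # A slightly perturbed copy of item 123 is expected to rank near it.
    query = [x + 0.05 * rng.gauss(0.0, 1.0) for x in data[123]]
    print(search(codes, encode(query, planes), topk=5))
```

For two vectors at angle θ, each bit disagrees with probability θ/π, so the Hamming distance over `nbits` bits estimates the angle; more bits lower the variance of that estimate at the cost of longer codes.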
Multidimensional data / Vectors
- Vector DB Feature Matrix
- Faiss
- Typesense
- Qdrant
- Video tutorial
- Epsilla
- Vald
- vearch
- milvus
- annoy
- NGT
- pgvector
- Chroma - AI memory with semantic, full-text, & regex search
- jvector
- RAFT
- Voyager
- tinyvector
- USearch
- MRPT
- infinity
- havenask
- chromem-go
- OasysDB - iJRL5XyL7?usp=sharing)]
- arroy
- bleve
- cuVS
- sqlite-vec
- MyScaleDB
- Meilisearch - Search engine API for Semantic (vectors), full-text & hybrid search
- hora
- KGraph
- Video tutorial
- NearestNeighbors.jl
- MuopDB
- puck
- Denser Retriever
- LlamaIndex
- seekdb
- ArcadeDB - open-source multi-model database with native vector embedding support alongside graph, document, key-value and time series models
- vsag
- brinicle - Resource-efficient C++ vector index engine built for low-RAM production workloads
- VelesDB - Embedded vector + graph + columnar database. Rust core (~6MB), HNSW with 5 distance metrics, VelesQL (SQL + NEAR + MATCH). Python and Rust SDKs.
Other Approaches
- SPANN: Highly-efficient billion-scale approximate nearest neighbor search
- Index-based, high-dimensional, cosine threshold querying with optimality guarantees.
- Semi-convex hull tree: Fast nearest neighbor queries for large scale data on GPUs
- Practical near neighbor search via group testing. [[Supplement](https://proceedings.neurips.cc/paper_files/paper/2021/file/5248e5118c84beea359b6ea385393661-Supplemental.pdf)]
- iDEC: indexable distance estimating codes for approximate nearest neighbor search
- VHP: approximate nearest neighbor search via virtual hypersphere partitioning.
- FusionANNS: An Efficient CPU/GPU Cooperative Processing Architecture for Billion-scale Approximate Nearest Neighbor Search.
- Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space.
- GleanVec: Accelerating vector search with minimalist nonlinear dimensionality reduction.
- PANTHER: Private Approximate Nearest Neighbor Search in the Single Server Setting.
- Subspace Collision: An Efficient and Accurate Framework for High-dimensional Approximate Nearest Neighbor Search.