https://github.com/httpjamesm/small-world-rs
The easiest HNSW vector index you'll ever use.
- Host: GitHub
- URL: https://github.com/httpjamesm/small-world-rs
- Owner: httpjamesm
- License: MIT
- Created: 2024-12-07T00:30:06.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-09T06:44:40.000Z (10 months ago)
- Last Synced: 2025-01-24T08:38:32.099Z (9 months ago)
- Topics: ai, cosine-similarity, embeddings, euclidean-distances, hnsw, machine-learning, rust, simd, vectordb, vectors
- Language: Rust
- Homepage:
- Size: 154 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# small-world-rs
small-world-rs is an HNSW vector index written in Rust.
## Features
- Fast, accurate and easy to implement
- Choose your precision (16 or 32 bit floats)
- Choose your distance metric
- Supports cosine distance (recommended for text) and euclidean distance (recommended for images)
- Serialize and deserialize for persistence

## Example
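For intuition on the metric choice in the features list, here is what the two supported distances actually compute, in plain Rust with no dependency on this crate. Note how cosine distance ignores magnitude (parallel vectors score near zero) while euclidean distance does not:

```rust
/// Cosine distance: 1 minus the cosine of the angle between the vectors.
/// Magnitude-invariant, which is why it suits text embeddings.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Euclidean distance: straight-line distance between the two points.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    // b is a scaled copy of a: cosine distance is ~0, euclidean is not.
    let a = [1.0, 2.0, 3.0];
    let b = [2.0, 4.0, 6.0];
    println!("cosine:    {}", cosine_distance(&a, &b)); // ~0
    println!("euclidean: {}", euclidean_distance(&a, &b)); // sqrt(14), ~3.74
}
```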
See the [text-embeddings example](./examples/text-embeddings/src/main.rs) for a simple example of how to use small-world-rs to perform semantic search over a set of text embeddings.
Basically, it works like this:
1. Get your embeddings, be that from OpenAI, Ollama, or wherever
2. Create a `World` with `World::new` or `World::new_from_dump`
3. Insert your vectors into the world with `world.insert_vector`
4. Perform a search with `world.search`
5. Dump the world with `world.dump` to save for later

## What config values should I use?
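As a rough sketch of how the steps above fit together with values from the recommended ranges below — the method names (`World::new`, `insert_vector`, `search`, `dump`) come from this README, but every argument list here is an assumption, so check the crate's own docs for the real signatures:

```rust
// Sketch only: argument lists and types are guesses, not the crate's real API.
use small_world_rs::World;

fn build_and_search(embeddings: Vec<Vec<f32>>, query: Vec<f32>) {
    // m = 32 (the sweet spot), ef_construction = 200 (2-4x a target ef_search of ~100)
    let mut world = World::new(/* m */ 32, /* ef_construction */ 200);

    // Insert each embedding under an id you can map back to your documents
    for (id, embedding) in embeddings.into_iter().enumerate() {
        world.insert_vector(id, embedding);
    }

    // ef_search can be raised per query when recall matters more than latency
    let neighbours = world.search(&query, /* k */ 10, /* ef_search */ 100);

    // Persist; reload later with World::new_from_dump
    let bytes = world.dump();
}
```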
Key parameters:

- `m`: connections per layer
  - Recommended: 16-64
  - Sweet spot: 32
  - Higher values increase recall but consume more memory
- `ef_construction`: construction-time exploration factor
  - Recommended: 100-500
  - Trade-off: higher values give better recall but slower build time
  - Rule of thumb: 2-4× your target `ef_search`
- `ef_search`: query-time exploration factor
  - Recommended: 50-150
  - Adjustable at search time
  - Higher values increase accuracy but slow down search
  - Tune based on your recall requirements