https://github.com/httpjamesm/small-world-rs
The easiest HNSW vector index you'll ever use.
- Host: GitHub
- URL: https://github.com/httpjamesm/small-world-rs
- Owner: httpjamesm
- License: MIT
- Created: 2024-12-07T00:30:06.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2024-12-09T06:44:40.000Z (10 months ago)
- Last Synced: 2025-01-24T08:38:32.099Z (9 months ago)
- Topics: ai, cosine-similarity, embeddings, euclidean-distances, hnsw, machine-learning, rust, simd, vectordb, vectors
- Language: Rust
- Homepage:
- Size: 154 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: readme.md
# small-world-rs
small-world-rs is an HNSW vector index written in Rust.
## Features
- Fast, accurate and easy to implement
- Choose your precision (16 or 32 bit floats)
- Choose your distance metric
- Supports cosine distance (recommended for text) and euclidean distance (recommended for images)
- Serialize and deserialize for persistence

## Example
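For intuition on the metric choice in the features list, here is what the two supported distances actually compute, in plain Rust with no dependency on this crate. Note how cosine distance ignores magnitude (parallel vectors score near zero) while euclidean distance does not:

```rust
/// Cosine distance: 1 minus the cosine of the angle between the vectors.
/// Magnitude-invariant, which is why it suits text embeddings.
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Euclidean distance: straight-line distance between the two points.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    // b is a scaled copy of a: cosine distance is ~0, euclidean is not.
    let a = [1.0, 2.0, 3.0];
    let b = [2.0, 4.0, 6.0];
    println!("cosine:    {}", cosine_distance(&a, &b)); // ~0
    println!("euclidean: {}", euclidean_distance(&a, &b)); // sqrt(14), ~3.74
}
```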
See the [text-embeddings example](./examples/text-embeddings/src/main.rs) for a simple example of how to use small-world-rs to perform semantic search over a set of text embeddings.
Basically, it works like this:
1. Get your embeddings, be that from OpenAI, Ollama, or wherever
2. Create a `World` with `World::new` or `World::new_from_dump`
3. Insert your vectors into the world with `world.insert_vector`
4. Perform a search with `world.search`
5. Dump the world with `world.dump` to save for later

## What config values should I use?
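As a rough sketch of how the steps above fit together with values from the recommended ranges below — the method names (`World::new`, `insert_vector`, `search`, `dump`) come from this README, but every argument list here is an assumption, so check the crate's own docs for the real signatures:

```rust
// Sketch only: argument lists and types are guesses, not the crate's real API.
use small_world_rs::World;

fn build_and_search(embeddings: Vec<Vec<f32>>, query: Vec<f32>) {
    // m = 32 (the sweet spot), ef_construction = 200 (2-4x a target ef_search of ~100)
    let mut world = World::new(/* m */ 32, /* ef_construction */ 200);

    // Insert each embedding under an id you can map back to your documents
    for (id, embedding) in embeddings.into_iter().enumerate() {
        world.insert_vector(id, embedding);
    }

    // ef_search can be raised per query when recall matters more than latency
    let neighbours = world.search(&query, /* k */ 10, /* ef_search */ 100);

    // Persist; reload later with World::new_from_dump
    let bytes = world.dump();
}
```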
Key parameters:

- `m`: connections per layer
  - Recommended: 16-64
  - Sweet spot: 32
  - Higher values increase recall but consume more memory
- `ef_construction`: construction-time exploration factor
  - Recommended: 100-500
  - Trade-off: higher values give better recall but slower build time
  - Rule of thumb: 2-4× your target `ef_search`
- `ef_search`: query-time exploration factor
  - Recommended: 50-150
  - Adjustable at search time
  - Higher values increase accuracy but slow down search
  - Tune based on your recall requirements