https://github.com/kn0sys/valentinus
A thread-safe vector database for model inference inside LMDB.
- Host: GitHub
- URL: https://github.com/kn0sys/valentinus
- Owner: kn0sys
- License: apache-2.0
- Created: 2024-07-08T21:36:26.000Z (over 1 year ago)
- Default Branch: stable
- Last Pushed: 2026-02-10T05:14:05.000Z (about 1 month ago)
- Last Synced: 2026-02-10T09:53:05.631Z (about 1 month ago)
- Topics: ai, embeddings, lmdb, ml, rust, vector-database
- Language: Rust
- Homepage: https://docs.rs/valentinus
- Size: 484 KB
- Stars: 15
- Watchers: 1
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-rust - valentinus - Next generation vector database built with LMDB bindings ([crates.io](https://crates.io/crates/valentinus)) (Applications / Database)
- fucking-awesome-rust - valentinus - Next generation vector database built with LMDB bindings ([crates.io](https://crates.io/crates/valentinus)) (Applications / Database)
- awesome-rust-with-stars - valentinus (Applications / Database)
README
[build](https://github.com/kn0sys/valentinus/actions/workflows/rust.yml) · [tests](https://github.com/kn0sys/valentinus/actions/workflows/test.yml) · [crates.io](https://crates.io/crates/valentinus) · [docs.rs](https://docs.rs/valentinus) · [commits](https://github.com/kn0sys/valentinus/commits/main/) · [matrix chat](https://app.element.io/#/room/#valentinus:matrix.org)

# valentinus
A thread-safe vector database for model inference inside LMDB.
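Conceptually, a vector database stores embeddings and answers queries by ranking stored vectors by similarity to a query vector. The sketch below is plain Rust with illustrative names only (it is *not* the valentinus API) showing the cosine-similarity lookup that such a query typically reduces to:

```rust
/// Cosine similarity between two equal-length embedding vectors.
/// Illustrative only; valentinus performs this kind of ranking internally.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    // Toy 3-dimensional "embeddings"; real all-MiniLM-L6-v2 vectors are 384-dim.
    let query = [1.0, 0.0, 0.0];
    let docs = [("doc_a", [0.9, 0.1, 0.0]), ("doc_b", [0.0, 1.0, 0.0])];

    // Nearest neighbor = the stored vector with the highest cosine similarity.
    let best = docs
        .iter()
        .max_by(|x, y| {
            cosine_similarity(&query, &x.1)
                .partial_cmp(&cosine_similarity(&query, &y.1))
                .unwrap()
        })
        .unwrap();
    println!("nearest: {}", best.0); // prints "nearest: doc_a"
}
```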
### dependencies
* bincode/serde - serialize/deserialize
* lmdb-rs - database bindings
* ndarray - numpy equivalent
* ort/onnx - embeddings
### getting started
NOTE: ensure the development packages below are installed (Fedora package names shown)
* `sudo dnf install openssl-devel`
* `sudo dnf install gcc-c++`
```bash
git clone https://github.com/kn0sys/valentinus && cd valentinus
```
### optional environment variables
| var | usage | default |
|-----|-------|---------|
| `LMDB_USER` | working directory of the user for the database | `$USER` |
| `LMDB_MAP_SIZE` | sets the maximum environment size, i.e. the size in memory/on disk of all data | 20% of available memory |
| `ONNX_PARALLEL_THREADS` | parallel execution mode for this session | 1 |
| `VALENTINUS_CUSTOM_DIM` | embedding dimensions for custom models | 384 (all-MiniLM-L6-v2) |
| `VALENTINUS_LMDB_ENV` | environment for the database (e.g. test, prod) | test |
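All of these are optional; a typical session might override a few of them before running the binary or the tests. The values below are examples chosen for illustration (the 8 GiB map size in particular is an arbitrary choice, not a recommendation):

```bash
export LMDB_USER="$USER"                          # user whose directory holds the database
export LMDB_MAP_SIZE=$((8 * 1024 * 1024 * 1024))  # cap the LMDB environment at 8 GiB
export ONNX_PARALLEL_THREADS=4                    # ONNX parallel execution threads
export VALENTINUS_LMDB_ENV=test                   # keep test data separate from prod
```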
### tests
* Note: all tests currently require the `all-MiniLM-L6-v2_onnx` directory.
* Download `model.onnx`, `tokenizer.json`, and the supporting files from Hugging Face, or [export them yourself](https://huggingface.co/docs/optimum/en/exporters/onnx/usage_guides/export_a_model):
```bash
mkdir all-MiniLM-L6-v2_onnx \
&& cd all-MiniLM-L6-v2_onnx \
&& wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/config.json \
&& wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
&& wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/special_tokens_map.json \
&& wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer_config.json \
&& wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json \
&& wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt
```
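Before running the tests, it can help to verify that every file the commands above fetch actually landed in the directory. A minimal sketch (the helper function is hypothetical, not part of the project):

```bash
# Check that a model directory contains every file downloaded above.
check_model_dir() {
  local dir="$1" missing=0
  for f in config.json model.onnx special_tokens_map.json \
           tokenizer_config.json tokenizer.json vocab.txt; do
    [ -f "$dir/$f" ] || { echo "missing: $f"; missing=1; }
  done
  return "$missing"
}

check_model_dir all-MiniLM-L6-v2_onnx \
  && echo "model files present" \
  || echo "some files missing"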
`cargo test`
### examples
see [examples](https://github.com/kn0sys/valentinus/tree/stable/examples)
### reference
[Inspired by this ChromaDB Python tutorial](https://realpython.com/chromadb-vector-database/#what-is-a-vector-database)