https://github.com/kn0sys/valentinus
next generation vector db built with lmdb bindings
ai embeddings lmdb ml rust vector-database
- Host: GitHub
- URL: https://github.com/kn0sys/valentinus
- Owner: kn0sys
- License: apache-2.0
- Created: 2024-07-08T21:36:26.000Z (9 months ago)
- Default Branch: stable
- Last Pushed: 2025-03-19T16:49:32.000Z (27 days ago)
- Last Synced: 2025-03-25T06:22:30.876Z (21 days ago)
- Topics: ai, embeddings, lmdb, ml, rust, vector-database
- Language: Rust
- Homepage: https://docs.rs/valentinus
- Size: 338 KB
- Stars: 12
- Watchers: 2
- Forks: 2
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-rust - valentinus - Next generation vector database built with LMDB bindings [](https://crates.io/crates/valentinus) (Applications / Database)
- fucking-awesome-rust - valentinus - Next generation vector database built with LMDB bindings [](https://crates.io/crates/valentinus) (Applications / Database)
README

# valentinus
next generation vector db built with lmdb bindings
### dependencies
* bincode/serde - serialize/deserialize
* lmdb-rs - database bindings
* ndarray - numpy equivalent
* ort/onnx - embeddings

### getting started
```bash
git clone https://github.com/kn0sys/valentinus && cd valentinus
```

### optional environment variables
| var| usage | default |
|----|-------| --------|
|`LMDB_USER` | working directory of the user for database | $USER|
|`LMDB_MAP_SIZE` | Sets max environment size, i.e. size in memory/disk of all data | 20% of available memory |
|`ONNX_PARALLEL_THREADS` | parallel execution mode for this session | 1 |
|`VALENTINUS_CUSTOM_DIM` | embeddings dimensions for custom models | all-mini-lm-6 -> 384 |
|`VALENTINUS_LMDB_ENV`| environment for the database (i.e. test, prod) | test |

### tests
* Note: all tests currently require the `all-MiniLM-L6-v2_onnx` directory
* Get the model.onnx and tokenizer.json from huggingface or [build them](https://huggingface.co/docs/optimum/en/exporters/onnx/usage_guides/export_a_model)

```bash
mkdir all-MiniLM-L6-v2_onnx
cd all-MiniLM-L6-v2_onnx && wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/config.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/special_tokens_map.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer_config.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/tokenizer.json
wget https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/vocab.txt
```

`RUST_TEST_THREADS=1 cargo test`
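
Because the tests depend on the `all-MiniLM-L6-v2_onnx` directory, it can help to confirm the downloads succeeded before running them. The following is a minimal sketch of such a check, not part of valentinus: the helper name `missing_model_files` is hypothetical, and the file list simply mirrors the download commands above.

```rust
use std::path::Path;

/// Files fetched by the download commands in the tests section.
const MODEL_FILES: [&str; 6] = [
    "config.json",
    "model.onnx",
    "special_tokens_map.json",
    "tokenizer_config.json",
    "tokenizer.json",
    "vocab.txt",
];

/// Hypothetical helper (not a valentinus API): returns the names of the
/// expected files that are not present as regular files in `dir`.
fn missing_model_files(dir: &Path) -> Vec<&'static str> {
    MODEL_FILES
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}

fn main() {
    let dir = Path::new("all-MiniLM-L6-v2_onnx");
    let missing = missing_model_files(dir);
    if missing.is_empty() {
        println!("model directory looks complete");
    } else {
        eprintln!("missing from {}: {}", dir.display(), missing.join(", "));
    }
}
```

A failed `wget` leaves no file behind, so running this from the repository root catches a broken download before `cargo test` does. A partial download may still leave a truncated file, which this check does not detect.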
### examples
see [examples](https://github.com/kn0sys/valentinus/tree/main/examples)
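
As a small illustration of the optional environment variables listed earlier, the sketch below shows one way an application could resolve them with fallbacks. It is an assumption-laden example: `env_or` is a hypothetical helper, the defaults are copied from the table, and how valentinus reads these variables internally is not shown here.

```rust
use std::env;
use std::str::FromStr;

/// Hypothetical helper (not a valentinus API): parse an environment
/// variable, falling back to `default` when it is unset or fails to parse.
fn env_or<T: FromStr>(key: &str, default: T) -> T {
    env::var(key)
        .ok()
        .and_then(|value| value.parse().ok())
        .unwrap_or(default)
}

fn main() {
    // Defaults below are the ones documented in the table.
    let lmdb_env: String = env_or("VALENTINUS_LMDB_ENV", "test".to_string());
    let threads: usize = env_or("ONNX_PARALLEL_THREADS", 1);
    let dim: usize = env_or("VALENTINUS_CUSTOM_DIM", 384);
    println!("env={lmdb_env} threads={threads} dim={dim}");
}
```

`LMDB_USER` and `LMDB_MAP_SIZE` are left out because their defaults depend on the running system (`$USER` and available memory) rather than on a fixed value.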
### reference
[inspired by this chromadb python tutorial](https://realpython.com/chromadb-vector-database/#what-is-a-vector-database)