https://github.com/cyberlife-coder/VelesDB
VelesDB: A high-performance, local-first vector database written in Rust. The SQLite of vector search.
https://github.com/cyberlife-coder/VelesDB
ai embeddings hnsw local-first machine-learning rag rust search-engine vector-database
Last synced: 5 months ago
JSON representation
VelesDB: A high-performance, local-first vector database written in Rust. The SQLite of vector search.
- Host: GitHub
- URL: https://github.com/cyberlife-coder/VelesDB
- Owner: cyberlife-coder
- License: other
- Created: 2025-12-17T18:01:02.000Z (6 months ago)
- Default Branch: main
- Last Pushed: 2026-01-20T16:41:20.000Z (5 months ago)
- Last Synced: 2026-01-20T16:44:06.436Z (5 months ago)
- Topics: ai, embeddings, hnsw, local-first, machine-learning, rag, rust, search-engine, vector-database
- Language: Rust
- Homepage: https://velesdb.com
- Size: 17.4 MB
- Stars: 10
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: docs/SECURITY_AUDIT_2025_01_07.md
Awesome Lists containing this project
- awesome-vector-database - VelesDB - Embedded vector + graph + columnar database. Rust core (~6MB), HNSW with 5 distance metrics, VelesQL (SQL + NEAR + MATCH). Python and Rust SDKs. (Multidimensional data / Vectors)
- awesome-local-first - VelesDB - first vector + graph + columnar database written in Rust (~6MB binary). Source-available (Elastic License 2.0) (โ๏ธ Development Tools & Libraries / Database Solutions)
- awesome-tauri - tauri-plugin-velesdb - Native vector database plugin. 70ยตs semantic search, โฅ95% recall, hybrid BM25+vector, offline-first, full ecosystem integrations and more. (Development / Plugins)
README
๐ง The Local Knowledge Engine for AI Agents ๐ง
Vector + Graph + ColumnStore Fusion โข 57ยตs Search โข Single Binary โข Privacy-First
๐ง Work in Progress โ We're actively building the ultimate AI memory engine.
Ideas, feedback, and contributions are welcome!
[](https://star-history.com/#cyberlife-coder/velesdb&Date)
---
## ๐ฏ The Problem We Solve
> **"My RAG agent needs both semantic search AND knowledge relationships. Existing tools force me to choose or glue multiple systems together."**
### Three Pain Points That Cost You Time & Money
| Pain Point | Business Impact | VelesDB Solution |
|------------|-----------------|------------------|
| **๐ Latency kills UX** | Cloud vector DBs add 50-100ms/query. 10 retrievals = **1+ second delay** | **57ยตs local** โ 1000x faster |
| **๐ Vectors alone aren't enough** | Semantic similarity misses relationships ("Who authored this?") | **Vector + Graph unified** in one query |
| **๐ Privacy & deployment friction** | Cloud dependencies, API keys, GDPR concerns | **15MB binary** โ works offline, air-gapped |
### ๐ฐ The ROI of Switching to VelesDB
| Metric | Before (Cloud Stack) | After (VelesDB) | Savings |
|--------|---------------------|-----------------|---------|
| **Infrastructure** | 3 databases + sync | 1 binary | **70% less code** |
| **Cloud costs** | $500-5000/mo | $0 (local) | **100% savings** |
| **Latency** | 100-300ms | < 1ms | **100x faster** |
| **Compliance** | Complex (data leaves premises) | Simple (local-first) | **HIPAA/GDPR ready** |
| **Dev time** | 3 integrations to maintain | 1 API | **3x faster shipping** |
---
## ๐ Why Developers Choose VelesDB
๐ง Vector + Graph + Columns
Unified semantic search, relationships, AND structured data.
No glue code needed.
โก 57ยตs Search
Native HNSW + AVX-512 SIMD.
1000x faster than cloud.
๐ฆ 15MB Binary
Zero dependencies.
Works offline, air-gapped.
๐ Run Anywhere
Server, Browser, Mobile, Desktop.
Same Rust codebase.
---
## ๐ข Coming From Another Vector DB?
| If you use... | VelesDB gives you... |
|---------------|----------------------|
| **Pinecone** | No API keys, no cloud costs, **100x faster locally**, + Graph + Columns |
| **Qdrant** | Single binary (15MB vs 100MB+), native WASM/Mobile, **unified Vector+Graph** |
| **Milvus** | Zero config vs complex cluster setup, **embedded mode** |
| **pgvector** | Purpose-built for vectors, **700x faster search**, native graph support |
| **ChromaDB** | Production-grade Rust vs Python prototype, **enterprise-ready** |
| **Neo4j + Pinecone** | **One database instead of two**, unified query language |
```sql
-- The query that defines VelesDB: Vector + Graph in ONE statement
MATCH (doc:Document)-[:AUTHORED_BY]->(author:Person)
WHERE similarity(doc.embedding, $question) > 0.8
AND doc.category = 'research'
RETURN author.name, author.email, doc.title
ORDER BY similarity() DESC
LIMIT 5;
```
**This query would require 2 databases and complex sync logic elsewhere. With VelesDB: one query, sub-millisecond response.**
---
## ๐ Full Ecosystem / รcosystรจme Complet
VelesDB is designed to run **where your agents live** โ from cloud servers to mobile devices to browsers.
| Domain | Component | Description | Install |
|-------------|------------------------------------|------------------------------------------|----------------------------|
| **๐ฆ Core** | [velesdb-core](crates/velesdb-core) | Core engine (HNSW, SIMD, VelesQL) | `cargo add velesdb-core` |
| **๐ Server**| [velesdb-server](crates/velesdb-server) | REST API (11 endpoints, OpenAPI) | `cargo install velesdb-server` |
| **๐ป CLI** | [velesdb-cli](crates/velesdb-cli) | Interactive REPL for VelesQL | `cargo install velesdb-cli` |
| **๐ Python** | [velesdb-python](crates/velesdb-python) | PyO3 bindings + NumPy | `pip install velesdb` |
| **๐ TypeScript** | [typescript-sdk](sdks/typescript) | Node.js & Browser SDK | `npm i @wiscale/velesdb` |
| **๐ WASM** | [velesdb-wasm](crates/velesdb-wasm) | Browser-side vector search | `npm i @wiscale/velesdb-wasm` |
| **๐ฑ Mobile** | [velesdb-mobile](crates/velesdb-mobile) | iOS (Swift) & Android (Kotlin) | [Build instructions](#-mobile-build) |
| **๐ฅ๏ธ Desktop** | [tauri-plugin](crates/tauri-plugin-velesdb) | Tauri v2 AI-powered apps | `cargo add tauri-plugin-velesdb` |
| **๐ฆ LangChain** | [langchain-velesdb](integrations/langchain) | Official VectorStore | `pip install langchain-velesdb` |
| **๐ฆ LlamaIndex** | [llamaindex-velesdb](integrations/llamaindex) | Document indexing | `pip install llama-index-vector-stores-velesdb` |
| **๐ Migration** | [velesdb-migrate](crates/velesdb-migrate) | From Qdrant, Pinecone, Supabase | `cargo install velesdb-migrate` |
---
## ๐ฏ Use Cases
| Use Case | VelesDB Feature |
|-------------------------------|-------------------------------------|
| **RAG Pipelines** | Sub-ms retrieval |
| **AI Agents** | Embedded memory, local context |
| **Desktop Apps (Tauri/Electron)** | Single binary, no server needed |
| **Mobile AI (iOS/Android)** | Native SDKs with 32x memory compression |
| **Browser-side Search** | WASM module, zero backend |
| **Edge/IoT Devices** | 15MB footprint, ARM NEON optimized |
| **On-Prem / Air-Gapped** | No cloud dependency, full data sovereignty |
---
## ๐ Quick Start
### Option 1: Linux Package (.deb) โญ Recommended for Linux
Download from [GitHub Releases](https://github.com/cyberlife-coder/VelesDB/releases):
```bash
# Install
sudo dpkg -i velesdb-1.1.0-amd64.deb
# Binaries installed to /usr/bin
velesdb --version
velesdb-server --version
```
### Option 2: One-liner Script
**Linux / macOS:**
```bash
curl -fsSL https://raw.githubusercontent.com/cyberlife-coder/VelesDB/main/scripts/install.sh | bash
```
**Windows (PowerShell):**
```powershell
irm https://raw.githubusercontent.com/cyberlife-coder/VelesDB/main/scripts/install.ps1 | iex
```
### Option 3: Python (from source)
```bash
# Build from source (requires Rust)
cd crates/velesdb-python
pip install maturin
maturin develop --release
```
```python
import velesdb
db = velesdb.Database("./my_vectors")
collection = db.create_collection("docs", dimension=768, metric="cosine")
collection.upsert([{"id": 1, "vector": [...], "payload": {"title": "Hello"}}])
results = collection.search([...], top_k=10)
```
```bash
# Install from PyPI
pip install velesdb
```
### Option 4: Rust (from source)
```bash
# Clone and build
git clone https://github.com/cyberlife-coder/VelesDB.git
cd VelesDB
cargo build --release
# Binaries in target/release/
./target/release/velesdb-server --help
```
```bash
# Install from crates.io
cargo install velesdb-cli
```
### Option 5: Docker (build locally)
```bash
# Build and run locally
git clone https://github.com/cyberlife-coder/VelesDB.git
cd VelesDB
docker build -t velesdb .
docker run -d -p 8080:8080 -v velesdb_data:/data velesdb
```
```bash
# Pull from GitHub Container Registry
docker pull ghcr.io/cyberlife-coder/velesdb:latest
```
### Option 6: Portable Archives
Download from [GitHub Releases](https://github.com/cyberlife-coder/VelesDB/releases):
| Platform | File |
|----------|------|
| Windows | `velesdb-windows-x86_64.zip` |
| Linux | `velesdb-linux-x86_64.tar.gz` |
| macOS (ARM) | `velesdb-macos-arm64.tar.gz` |
| macOS (Intel) | `velesdb-macos-x86_64.tar.gz` |
### Start Using VelesDB
```bash
# Start the REST API server (data persisted in ./data)
velesdb-server --data-dir ./my_data
# Or use the interactive CLI with VelesQL REPL
velesdb repl
# Verify server is running
curl http://localhost:8080/health
# {"status":"healthy","version":"1.1.0"}
```
๐ **Full installation guide:** [docs/INSTALLATION.md](docs/INSTALLATION.md)
### ๐ฑ Mobile Build (iOS/Android)
```bash
# iOS (macOS required)
rustup target add aarch64-apple-ios aarch64-apple-ios-sim
cargo build --release --target aarch64-apple-ios -p velesdb-mobile
# Android (NDK required)
cargo install cargo-ndk
cargo ndk -t arm64-v8a -t armeabi-v7a build --release -p velesdb-mobile
```
๐ **Full mobile guide:** [crates/velesdb-mobile/README.md](crates/velesdb-mobile/README.md)
---
## ๐ Your First Vector Search
```bash
# 1. Create a collection
curl -X POST http://localhost:8080/collections \
-H "Content-Type: application/json" \
-d '{"name": "my_vectors", "dimension": 4, "metric": "cosine"}'
# 2. Insert vectors with metadata
curl -X POST http://localhost:8080/collections/my_vectors/points \
-H "Content-Type: application/json" \
-d '{
"points": [
{"id": 1, "vector": [1.0, 0.0, 0.0, 0.0], "payload": {"title": "AI Introduction", "category": "tech"}},
{"id": 2, "vector": [0.0, 1.0, 0.0, 0.0], "payload": {"title": "ML Basics", "category": "tech"}},
{"id": 3, "vector": [0.0, 0.0, 1.0, 0.0], "payload": {"title": "History of Computing", "category": "history"}}
]
}'
# 3. Search for similar vectors
curl -X POST http://localhost:8080/collections/my_vectors/search \
-H "Content-Type: application/json" \
-d '{"vector": [0.9, 0.1, 0.0, 0.0], "top_k": 2}'
# 4. Or use VelesQL (SQL-like queries)
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "SELECT * FROM my_vectors WHERE vector NEAR $v AND category = '\''tech'\'' LIMIT 5",
"params": {"v": [0.9, 0.1, 0.0, 0.0]}
}'
```
---
## ๐ API Reference
### Collections
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections` | `GET` | List all collections |
| `/collections` | `POST` | Create a collection |
| `/collections/{name}` | `GET` | Get collection info |
| `/collections/{name}` | `DELETE` | Delete a collection |
### Points
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/points` | `POST` | Upsert points |
| `/collections/{name}/points/{id}` | `GET` | Get a point by ID |
| `/collections/{name}/points/{id}` | `DELETE` | Delete a point |
### Search (Vector)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/search` | `POST` | Vector similarity search |
| `/collections/{name}/search/batch` | `POST` | Batch search (multiple queries) |
| `/collections/{name}/search/multi` | `POST` | Multi-query search |
### Graph
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/graph/edges` | `GET` | Get edges for a node |
| `/collections/{name}/graph/edges` | `POST` | Add edge between nodes |
| `/collections/{name}/graph/traverse` | `POST` | BFS/DFS graph traversal |
| `/collections/{name}/graph/nodes/{node_id}/degree` | `GET` | Get node degree (in/out) |
### Indexes
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/indexes` | `GET` | List indexes |
| `/collections/{name}/indexes` | `POST` | Create index on property |
| `/collections/{name}/indexes/{label}/{property}` | `DELETE` | Delete index |
### VelesQL v2.0 (Unified Query)
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/query` | `POST` | Execute VelesQL (Vector + Graph + ColumnStore queries) |
**VelesQL v2.0 Features:**
- `GROUP BY` / `HAVING` with AND/OR operators
- `ORDER BY` multi-column + `similarity()` function
- `JOIN` with aliases across collections
- `UNION` / `INTERSECT` / `EXCEPT` set operations
- `USING FUSION(strategy='rrf')` hybrid search
- `WITH (max_groups=100)` query-time config
```sql
-- Example: Analytics with aggregation
SELECT category, COUNT(*), AVG(price) FROM products
GROUP BY category HAVING COUNT(*) > 5
-- Example: Hybrid search with fusion
SELECT * FROM docs USING FUSION(strategy='rrf', k=60) LIMIT 20
-- Example: Set operations
SELECT * FROM active UNION SELECT * FROM archived
```
> **Note:** ColumnStore operations (INSERT, UPDATE, SELECT on structured data) are performed via the `/query` endpoint using VelesQL syntax.
### Health
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | `GET` | Health check |
### Request/Response Examples
Create Collection
```bash
curl -X POST http://localhost:8080/collections \
-H "Content-Type: application/json" \
-d '{
"name": "my_vectors",
"dimension": 768,
"metric": "cosine" # Options: cosine, euclidean, dot, hamming, jaccard
}'
```
**Response:**
```json
{"message": "Collection created", "name": "my_vectors"}
```
Upsert Points
```bash
curl -X POST http://localhost:8080/collections/my_vectors/points \
-H "Content-Type: application/json" \
-d '{
"points": [
{
"id": 1,
"vector": [0.1, 0.2, 0.3, ...],
"payload": {"title": "Document 1", "tags": ["ai", "ml"]}
}
]
}'
```
**Response:**
```json
{"message": "Points upserted", "count": 1}
```
Vector Search
```bash
curl -X POST http://localhost:8080/collections/my_vectors/search \
-H "Content-Type: application/json" \
-d '{
"vector": [0.1, 0.2, 0.3, ...],
"top_k": 10
}'
```
**Response:**
```json
{
"results": [
{"id": 1, "score": 0.95, "payload": {"title": "Document 1"}},
{"id": 42, "score": 0.87, "payload": {"title": "Document 42"}}
]
}
```
Batch Search
```bash
curl -X POST http://localhost:8080/collections/my_vectors/search/batch \
-H "Content-Type: application/json" \
-d '{
"searches": [
{"vector": [0.1, 0.2, ...], "top_k": 5},
{"vector": [0.3, 0.4, ...], "top_k": 5}
]
}'
```
**Response:**
```json
{
"results": [
{"results": [{"id": 1, "score": 0.95, "payload": {...}}]},
{"results": [{"id": 2, "score": 0.89, "payload": {...}}]}
],
"timing_ms": 1.23
}
```
VelesQL Query
```bash
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "SELECT * FROM my_vectors WHERE vector NEAR $v LIMIT 10",
"params": {"v": [0.1, 0.2, 0.3, ...]}
}'
```
**Response:**
```json
{
"results": [
{"id": 1, "score": 0.95, "payload": {"title": "Document 1"}}
],
"timing_ms": 2.34,
"rows_returned": 1
}
```
---
## ๐งช Real-World Business Scenarios
> **Each scenario shows a business problem that traditionally requires 2-3 databases. VelesDB solves it with ONE query.**
---
### ๐ผ Business Scenario 1: E-commerce Product Discovery
**Industry:** Retail / E-commerce
**Problem:** "Show me products similar to this photo, from trusted suppliers, under $500"
```sql
-- Traditional approach: Pinecone (image search) + Neo4j (supplier trust) + PostgreSQL (price)
-- VelesDB: ONE query
MATCH (product:Product)-[:SUPPLIED_BY]->(supplier:Supplier)
WHERE
similarity(product.image_embedding, $uploaded_photo) > 0.7 -- Vector: visual similarity
AND supplier.trust_score > 4.5 -- Graph: relationship data
AND (SELECT price FROM inventory WHERE sku = product.sku) < 500 -- Column: real-time price
ORDER BY similarity() DESC
LIMIT 12
```
**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Query latency | 350ms (3 DBs) | **2ms** |
| Infrastructure | $2,400/mo | **$0** (local) |
| Dev complexity | 3 integrations | **1 API** |
---
### ๐ผ Business Scenario 2: Fraud Detection in Real-Time
**Industry:** Banking / FinTech
**Problem:** "Flag transactions that look suspicious based on pattern + network + history"
```sql
-- Detect fraud: semantic pattern + transaction network + account history
MATCH (tx:Transaction)-[:FROM]->(account:Account)-[:LINKED_TO*1..3]->(related:Account)
WHERE
similarity(tx.behavior_embedding, $known_fraud_pattern) > 0.6 -- Vector: behavioral similarity
AND related.risk_level = 'high' -- Graph: network analysis
AND (SELECT SUM(amount) FROM transactions
WHERE account_id = account.id
AND timestamp > NOW() - INTERVAL '24 hours') > 10000 -- Column: velocity check
RETURN tx.id, account.id, similarity() as fraud_score
```
**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Detection time | 2-5 seconds | **< 10ms** |
| False positives | 15% | **8%** (better context) |
| Compliance | Cloud concerns | **On-premise OK** |
---
### ๐ผ Business Scenario 3: Healthcare Diagnosis Assistant
**Industry:** Healthcare / MedTech
**Problem:** "Find similar patient cases with treatment outcomes, HIPAA-compliant"
```sql
-- Medical RAG: symptoms + patient network + treatment history
MATCH (patient:Patient)-[:HAS_CONDITION]->(condition:Condition)
-[:TREATED_WITH]->(treatment:Treatment)
WHERE
similarity(condition.symptoms_embedding, $current_symptoms) > 0.75 -- Vector: symptom matching
AND condition.icd10_code IN ('J18.9', 'J12.89') -- Column: specific diagnoses
AND (SELECT success_rate FROM treatment_outcomes
WHERE treatment_id = treatment.id) > 0.8 -- Column: outcome data
RETURN treatment.name, AVG(success_rate) as effectiveness
```
**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Data location | Cloud (HIPAA risk) | **100% on-premise** |
| Query time | 500ms+ | **< 5ms** |
| Integration | 3 vendors | **1 binary** |
---
### ๐ผ Business Scenario 4: AI Agent Memory (RAG + Context)
**Industry:** AI / SaaS
**Problem:** "My AI agent needs conversation history + knowledge base + user preferences"
```sql
-- Agent memory: semantic recall + conversation graph + user context
MATCH (user:User)-[:HAD_CONVERSATION]->(conv:Conversation)
-[:CONTAINS]->(message:Message)
WHERE
similarity(message.embedding, $current_query) > 0.7 -- Vector: relevant past messages
AND conv.timestamp > NOW() - INTERVAL '7 days' -- Column: recent conversations
AND (SELECT preference_value FROM user_preferences
WHERE user_id = user.id AND key = 'topic') = message.topic -- Column: user prefs
ORDER BY conv.timestamp DESC, similarity() DESC
LIMIT 10
```
**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Context retrieval | 100-200ms | **< 1ms** |
| Memory footprint | 500MB+ | **15MB binary** |
| Works offline | โ | **โ
** |
---
### Scenario 0: Technical Deep-Dive (Vector + Graph + ColumnStore)
**Goal:** Demonstrate the power of cross-model queries - finding semantically similar documents through graph relationships with structured data filtering
```sql
-- ๐ฎ The VelesDB Advantage: One query across all three stores
MATCH (doc:Document)-[:AUTHORED_BY]->(author:Author)
WHERE
similarity(doc.embedding, $research_question) > 0.8 -- Vector: semantic search
AND doc.category = 'peer-reviewed' -- Column: structured filter
AND (SELECT citation_count FROM author_metrics -- Column: subquery
WHERE author_id = author.id) > 50
ORDER BY similarity() DESC
LIMIT 5
```
**What's happening:**
1. **Graph traversal**: `MATCH` finds documentโauthor relationships
2. **Vector search**: `similarity()` ranks by semantic relevance to your question
3. **Columnar filter**: `category = 'peer-reviewed'` filters structured metadata
4. **Columnar subquery**: Joins with `author_metrics` table for citation counts
**Expected Output:**
```json
{
"results": [
{
"doc.title": "Neural Memory Consolidation in AI Agents",
"author.name": "Dr. Sarah Chen",
"similarity": 0.94,
"citation_count": 127
}
],
"timing_ms": 0.8
}
```
**Why this matters:** This query would require 3 separate databases and complex synchronization logic in a traditional stack. With VelesDB: **one query, sub-millisecond response**.
---
### Scenario 0b: Multi-Vector Fusion Search (NEAR_FUSED)
**Goal:** Search using multiple query vectors simultaneously with intelligent result fusion
```sql
-- ๐ฎ Multi-modal search: combine text + image embeddings
SELECT * FROM products
WHERE vector NEAR_FUSED [$text_embedding, $image_embedding]
USING FUSION 'rrf' (k=60)
AND category = 'electronics'
ORDER BY similarity() DESC
LIMIT 10
```
**Fusion Strategies Available:**
| Strategy | Syntax | Best For |
|----------|--------|----------|
| **RRF** | `USING FUSION 'rrf' (k=60)` | Robust rank-based fusion (recommended) |
| **Average** | `USING FUSION 'average'` | General purpose, balanced results |
| **Maximum** | `USING FUSION 'maximum'` | Emphasize documents scoring high in ANY query |
| **Weighted** | `USING FUSION 'weighted' (avg_weight=0.5, max_weight=0.3, hit_weight=0.2)` | Custom control over fusion factors |
**Real-World Use Cases:**
```sql
-- E-commerce: "show me products like this photo that match 'wireless headphones'"
SELECT * FROM products
WHERE vector NEAR_FUSED [$image_vector, $text_vector]
USING FUSION 'weighted' (avg_weight=0.6, max_weight=0.3, hit_weight=0.1)
LIMIT 8
-- RAG: Multi-perspective document retrieval
SELECT * FROM documents
WHERE vector NEAR_FUSED [$question_embedding, $context_embedding, $user_profile_embedding]
USING FUSION 'rrf'
LIMIT 5
-- Semantic + Lexical hybrid (BM25 + Vector)
SELECT * FROM articles
WHERE content MATCH 'artificial intelligence'
AND vector NEAR $semantic_embedding
ORDER BY similarity() DESC
LIMIT 10
```
**Expected Output:**
```json
{
"results": [
{"id": 42, "score": 0.91, "fusion_details": {"rrf_rank": 1, "sources": 2}},
{"id": 17, "score": 0.87, "fusion_details": {"rrf_rank": 2, "sources": 2}}
],
"timing_ms": 1.2
}
```
---
### Scenario 0c: Distance Metrics for Every Use Case
**Goal:** Choose the right metric for your data type and domain
VelesDB supports **5 distance metrics** - each optimized for specific use cases:
| Metric | Best For | Example Domain |
|--------|----------|----------------|
| **Cosine** | Text embeddings, normalized vectors | NLP, semantic search |
| **Euclidean** | Spatial data, absolute distances | Geolocation, clustering |
| **DotProduct** | Pre-normalized embeddings, retrieval | RAG, recommendations |
| **Hamming** | Binary vectors, fingerprints | Image hashing, DNA |
| **Jaccard** | Set similarity, sparse features | Tags, categories |
**1. Cosine Similarity (NLP / Semantic Search)**
```bash
# Create collection with cosine metric
curl -X POST http://localhost:8080/collections \
-d '{"name": "documents", "dimension": 768, "metric": "cosine"}'
```
```sql
-- Find semantically similar documents (angle-based, ignores magnitude)
SELECT * FROM documents
WHERE vector NEAR $query_embedding
ORDER BY similarity() DESC
LIMIT 10
```
**Use case:** ChatGPT-style RAG, document similarity, semantic Q&A
---
**2. Euclidean Distance (Spatial / Clustering)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "locations", "dimension": 3, "metric": "euclidean"}'
```
```sql
-- Find nearest physical locations (absolute distance matters)
SELECT * FROM locations
WHERE vector NEAR $gps_coordinates
AND category = 'restaurant'
ORDER BY similarity() ASC -- Lower = closer
LIMIT 5
```
**Use case:** Geospatial search, k-means clustering, anomaly detection
---
**3. Dot Product (RAG / Recommendations)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "products", "dimension": 512, "metric": "dot"}'
```
```sql
-- Maximize relevance score (pre-normalized embeddings)
SELECT * FROM products
WHERE vector NEAR $user_preference_vector
AND in_stock = true
ORDER BY similarity() DESC
LIMIT 8
```
**Use case:** Recommendation engines, MaxIP retrieval, MIPS problems
---
**4. Hamming Distance (Binary Vectors / Fingerprints)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "image_hashes", "dimension": 256, "metric": "hamming"}'
```
```sql
-- Find near-duplicate images (bit-level comparison, 6ns latency!)
SELECT * FROM image_hashes
WHERE vector NEAR $perceptual_hash
AND source = 'user_uploads'
ORDER BY similarity() ASC -- Fewer bit differences = more similar
LIMIT 10
```
**Use case:** Image deduplication, perceptual hashing, DNA sequence matching, malware signatures
---
**5. Jaccard Similarity (Sets / Sparse Features)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "user_tags", "dimension": 100, "metric": "jaccard"}'
```
```sql
-- Find users with similar interests (set overlap)
SELECT * FROM user_tags
WHERE vector NEAR $current_user_tags
ORDER BY similarity() DESC
LIMIT 20
```
**Use case:** Tag-based recommendations, category matching, collaborative filtering
---
**Performance by Metric (768D vectors):**
| Metric | Latency | Throughput | SIMD Optimized |
|--------|---------|------------|----------------|
| **Cosine** | 78 ns | 13M ops/sec | โ
AVX-512 |
| **Euclidean** | 44 ns | 23M ops/sec | โ
AVX-512 |
| **DotProduct** | 66 ns | 15M ops/sec | โ
AVX-512 |
| **Hamming** | **6 ns** | **164M ops/sec** | โ
POPCNT |
| **Jaccard** | 89 ns | 11M ops/sec | โ
AVX2 |
> **Tip:** Hamming is 10x faster than float metrics - ideal for binary embeddings on edge devices!
---
### Scenario 1: Medical Research Assistant
**Goal:** Find recent oncology studies with specific gene mentions, ordered by relevance
```sql
SELECT study_id, title, publication_date
FROM medical_studies
WHERE
vector NEAR $cancer_research_embedding
AND content LIKE '%BRCA1%'
AND publication_date > '2025-01-01'
ORDER BY similarity() DESC
LIMIT 5
```
**Parameters:**
- `$cancer_research_embedding`: [0.23, 0.87, -0.12, ...] (embedding for "advanced cancer immunotherapy")
**Expected Output:**
```json
{
"results": [
{
"study_id": "onco-2025-042",
"title": "BRCA1 Mutations in Immunotherapy Response",
"publication_date": "2025-03-15",
"score": 0.92
},
{
"study_id": "onco-2025-017",
"title": "Gene Editing Approaches for Metastatic Cancer",
"publication_date": "2025-02-28",
"score": 0.87
}
]
}
```
---
### Scenario 2: E-commerce Recommendation Engine
**Goal:** Recommend products similar to a user's purchase history, within their price range
```sql
SELECT product_id, name, price
FROM products
WHERE
vector NEAR $user_preferences
AND price BETWEEN 20.00 AND 100.00
AND category = 'electronics'
ORDER BY similarity() DESC, price ASC
LIMIT 8
```
**Parameters:**
- `$user_preferences`: [0.78, -0.23, 0.45, ...] (embedding based on user's purchase history)
**Expected Output:**
```json
{
"results": [
{
"product_id": "prod-67890",
"name": "Wireless Noise-Cancelling Headphones",
"price": 89.99,
"score": 0.95
},
{
"product_id": "prod-54321",
"name": "Bluetooth Portable Speaker",
"price": 59.99,
"score": 0.91
}
]
}
```
---
### Scenario 3: Cybersecurity Threat Detection
**Goal:** Find similar malware patterns observed in the last 7 days
```sql
SELECT malware_hash, threat_level, first_seen
FROM threat_intel
WHERE
vector NEAR $current_threat_embedding
AND first_seen > NOW() - INTERVAL '7 days'
AND threat_level > 0.8
ORDER BY similarity() DESC, first_seen DESC
LIMIT 10
```
**Parameters:**
- `$current_threat_embedding`: [0.12, -0.87, 0.34, ...] (embedding of current malware signature)
**Troubleshooting Tip:** If no results appear, verify:
1. Threat intelligence feed is updating daily
2. Vector dimensions match collection configuration
3. Timestamp format matches ISO 8601 (YYYY-MM-DD HH:MM:SS)
---
## โก Performance
### ๐ฅ Core Vector Operations (768D - BERT/OpenAI dimensions)
| Operation | Latency | Throughput | vs. Naive |
|-----------|---------|------------|----------|
| **Dot Product (1536D)** | **66 ns** | **15M ops/sec** | ๐ **8x faster** |
| **Euclidean (768D)** | **44 ns** | **23M ops/sec** | ๐ **6x faster** |
| **Cosine (768D)** | **78 ns** | **13M ops/sec** | ๐ **4x faster** |
| **Hamming (Binary)**| **6 ns** | **164M ops/sec** | ๐ **10x faster** |
### ๐ System Performance (10K Vectors, Local)
| Benchmark | Result | Details |
|-----------|--------|---------|
| **HNSW Search** | **57 ยตs** | p50 latency (Cosine) |
| **VelesQL Parsing**| **554 ns** | Simple SELECT |
| **VelesQL Cache Hit**| **48 ns** | HashMap pre-allocation |
| **Recall@10** | **100%** | Perfect mode (brute-force SIMD) |
| **BM25 Search** | **33 ยตs** | Adaptive PostingList (10K docs) |
### ๐ฏ Search Quality (Recall)
| Mode | Recall@10 | Latency (128D) | Use Case |
|------|-----------|----------------|----------|
| Fast | 92.2% | ~26ยตs | Real-time, high throughput |
| Balanced | 98.8% | ~39ยตs | Production recommended |
| Accurate | 100% | ~67ยตs | High precision |
| **Perfect** | **100%** | ~220ยตs | Brute-force SIMD |
### ๐ ๏ธ Optimizations Under the Hood
- **SIMD**: AVX-512/AVX2 auto-detection with 32-wide FMA
- **Prefetch**: CPU cache warming for HNSW traversal (+12% throughput)
- **Contiguous Layout**: 64-byte aligned memory for cache efficiency
- **Batch WAL**: Single disk write per batch import
- **Zero-Copy**: Memory-mapped files for fast startup
> ๐ Full benchmarks: [docs/BENCHMARKS.md](docs/BENCHMARKS.md)
### ๐ฆ Vector Quantization (Memory Reduction)
Reduce memory usage by **4-32x** with minimal recall loss:
| Method | Compression | Recall Loss | Use Case |
|--------|-------------|-------------|----------|
| **SQ8** (8-bit) | **4x** | < 2% | General purpose, Edge |
| **Binary** (1-bit) | **32x** | ~10-15% | Fingerprints, IoT |
```rust
use velesdb_core::quantization::{QuantizedVector, dot_product_quantized_simd};
// Compress 768D vector: 3072 bytes โ 776 bytes (4x reduction)
let quantized = QuantizedVector::from_f32(&embedding);
// SIMD-optimized search (only ~30% slower than f32)
let similarity = dot_product_quantized_simd(&query, &quantized);
```
> ๐ Full guide: [docs/QUANTIZATION.md](docs/QUANTIZATION.md)
---
## ๐ Transformative Benefits: How VelesDB Changes Development
### โก Eliminates Database Sprawl
VelesDB replaces 3+ specialized databases (vector DB + graph DB + document store) with a **single unified engine**.
```mermaid
graph LR
A[App] --> V[VelesDB]
```
**Impact:**
- โ
**70% less infrastructure code**
- โ
**No synchronization headaches**
- โ
**Single query language for all operations**
### ๐ก Enables New Application Categories
With air-gapped deployment and 15MB binary size:
```mermaid
pie title Deployment Locations
"On-Prem Servers" : 35
"Edge Devices" : 25
"Mobile Apps" : 20
"Browser WASM" : 20
```
**Impact:**
- โ
**Build HIPAA-compliant healthcare apps**
- โ
**Create military-grade analytics** for air-gapped environments
- โ
**Enable privacy-first consumer apps** with zero data sharing
### ๐ Redefines Performance Expectations
| Pipeline Step | Cloud Vector DB | VelesDB |
|---------------|-----------------|---------|
| Network round-trip | 50-100ms | **0ms** (local) |
| Vector search | 10-50ms | **0.057ms** |
| Graph traversal | 20-100ms | **0.1ms** |
| **Total latency** | **100-250ms** | **< 1ms** |
> ๐ก **100x faster** enables use cases impossible with cloud: real-time autocomplete, instant RAG, sub-frame game AI
**Impact:**
- โ
**Build real-time AI agents** that respond faster than human perception
- โ
**Enable complex RAG chains** with 10+ sequential retrievals
- โ
**Create instant search experiences** with no loading spinners
### ๐ผ Unified API Simplifies Development
One consistent API across all platforms:
```rust
// Same API everywhere
let results = db.search(query_vector, filters, graph_traversal);
```
**Impact:**
- โ
**Learn once, deploy everywhere**
- โ
**Shared codebase** between web, mobile, and desktop
- โ
**Eliminate platform-specific database code**
---
## โจ Core Features That Transform Development
| Feature | Technical Capability | Real-World Impact |
|---------|----------------------|-------------------|
| **๐ง Vector + Graph Fusion** | Unified query language for semantic + relationship queries | **Build smarter AI agents** with contextual understanding |
| **โก 57ยตs Search** | Native HNSW + AVX-512 SIMD | **Create real-time experiences** previously impossible |
| **๐ฆ 15MB Binary** | Zero dependencies, single executable | **Deploy anywhere** - from servers to edge devices |
| **๐ Air-Gapped Deployment** | Full functionality without internet | **Meet strict compliance** in healthcare/finance |
| **๐ Everywhere Runtime** | Consistent API across server/mobile/browser | **Massive code reuse** across platforms |
| **๐ง SQ8 Quantization** | 4x memory reduction | **Run complex AI** on resource-constrained devices |
| **๐ VelesQL** | SQL-like unified query language | **Simplify complex queries** - no DSL learning curve |
---
## ๐ Real-World Impact Stories
### ๐ฅ Healthcare Diagnostics Assistant
**Before VelesDB:**
- 300ms latency per query
- Patient data in cloud
- Separate systems for medical knowledge and patient data
**With VelesDB:**
- **0.6ms diagnosis suggestions**
- **On-device patient data**
- **Unified medical knowledge graph**
```mermaid
pie title Performance Improvement
"Diagnosis Speed" : 85
"Accuracy" : 10
"Privacy" : 5
```
### ๐ญ Manufacturing Quality Control
**Before VelesDB:**
- Cloud dependency caused production delays
- Separate systems for defect images and part metadata
**With VelesDB:**
```sql
MATCH (part)-[HAS_DEFECT]->(defect)
WHERE defect.vector NEAR $image_vec
AND part.material = 'titanium'
```
- **50% fewer defective shipments**
- **Offline factory floor operation**
- **Unified defect database**
---
## ๐ค Contributing
We welcome contributions! Here's how to get started:
### Development Setup
```bash
# Clone the repo
git clone https://github.com/cyberlife-coder/VelesDB.git
cd VelesDB
# Run tests
cargo test --all-features
# Run with checks (before committing)
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
```
### Project Structure
```
VelesDB/
โโโ crates/
โ โโโ velesdb-core/ # Core engine library
โ โ โโโ src/
โ โ โ โโโ collection/ # Collection management
โ โ โ โโโ index/ # HNSW index
โ โ โ โโโ storage/ # Persistence layer
โ โ โ โโโ velesql/ # Query language parser
โ โ โ โโโ simd/ # SIMD optimizations
โ โ โโโ tests/
โ โโโ velesdb-server/ # REST API server
โ โโโ velesdb-mobile/ # iOS/Android bindings (UniFFI)
โ โโโ velesdb-wasm/ # WebAssembly module
โ โโโ velesdb-python/ # Python bindings (PyO3)
โโโ benches/ # Benchmarks
โโโ docs/ # Documentation
```
### Good First Issues
Looking for a place to start? Check out issues labeled [`good first issue`](https://github.com/cyberlife-coder/VelesDB/labels/good%20first%20issue).
---
## ๐ Roadmap
```mermaid
gantt
title VelesDB Development Timeline
dateFormat YYYY-MM
section v1.2 โ
Knowledge Graph Storage :done, 2025-10, 2025-12
VelesQL MATCH Clause :done, 2025-11, 2025-12
Vector-Graph Fusion :done, 2025-12, 2026-01
ColumnStore CRUD :done, 2026-01, 2026-01
VelesQL JOIN Cross-Store :done, 2026-01, 2026-01
section v1.3 ๐
Aggregations VelesQL :active, 2026-02, 2026-03
Agent Memory Patterns :2026-02, 2026-04
Documentation :2026-02, 2026-03
section v1.4 ๐
E2E Test Suite :2026-04, 2026-05
Durability & Recovery :2026-04, 2026-06
```
### Progress Overview
| Version | Status | EPICs Done | Progress |
|---------|--------|------------|----------|
| **v1.2.0** | โ
Released | 15/15 |  |
| **v1.3.0** | ๐ In Progress | 0/6 |  |
| **v1.4.0** | ๐ Planned | 0/5 |  |
---
### v1.2.0 โ
Released (January 2026)
15 EPICs Completed - Click to expand
| EPIC | Feature | Impact |
|------|---------|--------|
| EPIC-001 | โ
Code Quality Refactoring | Clean architecture |
| EPIC-002 | โ
GPU Acceleration (wgpu) | 10x throughput |
| EPIC-003 | โ
PyO3 Migration | Python 3.12+ support |
| EPIC-004 | โ
Knowledge Graph Storage | GraphNode, GraphEdge, BFS |
| EPIC-005 | โ
VelesQL MATCH Clause | Cypher-inspired queries |
| EPIC-006 | โ
Agent Toolkit SDK | Python, WASM, Mobile |
| EPIC-007 | โ
Python Bindings Refactoring | Clean API |
| EPIC-008 | โ
Vector-Graph Fusion | `similarity()` in MATCH |
| EPIC-009 | โ
Graph Property Index | 10x faster MATCH |
| EPIC-019 | โ
Scalability 10M entries | Enterprise datasets |
| EPIC-020 | โ
ColumnStore CRUD | Real-time updates |
| EPIC-021 | โ
VelesQL JOIN Cross-Store | Graph โ Table queries |
| EPIC-028 | โ
ORDER BY Multi-Columns | Complex sorting |
| EPIC-029 | โ
Python SDK Core Delegation | DRY bindings |
| EPIC-031 | โ
Multimodel Query Engine | Unified execution |
**Highlights:**
- ๐ง **Knowledge Graph** - Full MATCH clause with BFS traversal
- ๐ฎ **Vector-Graph Fusion** - `WHERE similarity() > 0.8` in graph queries
- ๐ **ColumnStore** - Real-time CRUD with JOIN support
- ๐ฆ **Published** - crates.io, PyPI, npm
---
### v1.3.0 ๐ In Progress (Q1 2026)
| EPIC | Feature | Priority | Progress |
|------|---------|----------|----------|
| EPIC-016 | **SDK Ecosystem Sync** | ๐ฅ Critical | ๐ 21% (3/14 US) |
| EPIC-017 | **Aggregations VelesQL** | ๐ฅ Critical | ๐ 0% |
| EPIC-010 | **Agent Memory Patterns SDK** | ๐ High | ๐ 0% |
| EPIC-018 | **Documentation & Examples** | ๐ High | ๐ 0% |
| EPIC-012 | **TypeScript SDK** | ๐ Medium | ๐ 0% |
| EPIC-013 | **LangChain/LlamaIndex** | ๐ Medium | ๐ 0% |
**Coming Soon:**
```sql
-- Aggregations (EPIC-017)
SELECT category, COUNT(*), AVG(price)
FROM products
GROUP BY category
HAVING COUNT(*) > 10
-- Agent Memory Patterns (EPIC-010)
INSERT INTO agent_memory (episode, embedding, context)
VALUES ('task_123', $vec, '{"goal": "find documents"}')
```
---
### v1.4.0 ๐ Planned (Q2 2026)
| EPIC | Feature | Focus |
|------|---------|-------|
| EPIC-011 | **E2E Test Suite** | Quality assurance |
| EPIC-024 | **Durability & Crash Recovery** | Database-grade reliability |
| EPIC-022 | **Unsafe Auditability** | Security audit |
| EPIC-023 | **Loom Concurrency Proofs** | Thread safety |
| EPIC-025 | **Miri/Fuzzing** | Memory safety |
---
### Future Vision
```mermaid
pie title VelesDB Feature Distribution
"Vector Search" : 30
"Knowledge Graph" : 25
"ColumnStore" : 20
"Query Engine" : 15
"SDKs & Integrations" : 10
```
| Horizon | Features |
|---------|----------|
| **2026 H2** | Sparse vectors, Product Quantization (PQ) |
| **2027** | Distributed mode (Premium), Cluster HA |
| **Beyond** | Agent Hooks & Triggers, Multi-tenancy |
---
## ๐ License
VelesDB is licensed under the [Elastic License 2.0 (ELv2)](LICENSE).
ELv2 is a source-available license that allows free use, modification, and distribution, with restrictions only on providing the software as a managed service.
---
## โญ Support VelesDB
If VelesDB helps your project, here's how you can support us:
### โ Buy Me A Coffee
Building VelesDB takes countless hours. If you find it useful, consider supporting development:
### ๐ท๏ธ Show You Use VelesDB
Add the badge to your project:
[](https://github.com/cyberlife-coder/VelesDB)
```markdown
[](https://github.com/cyberlife-coder/VelesDB)
```
---
Built with โค๏ธ and ๐ฆ Rust
Original Author: Julien Lange โ WiScale
โญ GitHub โข
๐ Documentation โข
๐ Issues โข
๐ฆ Releases
Don't forget to โญ star the repo if you find VelesDB useful!