An open API service indexing awesome lists of open source software.

https://github.com/cyberlife-coder/VelesDB

VelesDB: A high-performance, local-first vector database written in Rust. The SQLite of vector search.
https://github.com/cyberlife-coder/VelesDB

ai embeddings hnsw local-first machine-learning rag rust search-engine vector-database

Last synced: 5 months ago
JSON representation

VelesDB: A high-performance, local-first vector database written in Rust. The SQLite of vector search.

Awesome Lists containing this project

README

          


VelesDB Logo


VelesDB


๐Ÿง  The Local Knowledge Engine for AI Agents ๐Ÿง 

Vector + Graph + ColumnStore Fusion โ€ข 57ยตs Search โ€ข Single Binary โ€ข Privacy-First


๐Ÿšง Work in Progress โ€” We're actively building the ultimate AI memory engine.

Ideas, feedback, and contributions are welcome!


Build
License
Stars
Ask DeepWiki


Search Latency
Binary Size
Recall

[![Star History Chart](https://api.star-history.com/svg?repos=cyberlife-coder/velesdb&type=Date)](https://star-history.com/#cyberlife-coder/velesdb&Date)

---

## ๐ŸŽฏ The Problem We Solve

> **"My RAG agent needs both semantic search AND knowledge relationships. Existing tools force me to choose or glue multiple systems together."**

### Three Pain Points That Cost You Time & Money

| Pain Point | Business Impact | VelesDB Solution |
|------------|-----------------|------------------|
| **๐ŸŒ Latency kills UX** | Cloud vector DBs add 50-100ms/query. 10 retrievals = **1+ second delay** | **57ยตs local** โ€” 1000x faster |
| **๐Ÿ”— Vectors alone aren't enough** | Semantic similarity misses relationships ("Who authored this?") | **Vector + Graph unified** in one query |
| **๐Ÿ”’ Privacy & deployment friction** | Cloud dependencies, API keys, GDPR concerns | **15MB binary** โ€” works offline, air-gapped |

### ๐Ÿ’ฐ The ROI of Switching to VelesDB

| Metric | Before (Cloud Stack) | After (VelesDB) | Savings |
|--------|---------------------|-----------------|---------|
| **Infrastructure** | 3 databases + sync | 1 binary | **70% less code** |
| **Cloud costs** | $500-5000/mo | $0 (local) | **100% savings** |
| **Latency** | 100-300ms | < 1ms | **100x faster** |
| **Compliance** | Complex (data leaves premises) | Simple (local-first) | **HIPAA/GDPR ready** |
| **Dev time** | 3 integrations to maintain | 1 API | **3x faster shipping** |

---

## ๐Ÿ† Why Developers Choose VelesDB

๐Ÿง  Vector + Graph + Columns


Unified semantic search, relationships, AND structured data.
No glue code needed.

โšก 57ยตs Search


Native HNSW + AVX-512 SIMD.
1000x faster than cloud.

๐Ÿ“ฆ 15MB Binary


Zero dependencies.
Works offline, air-gapped.

๐ŸŒ Run Anywhere


Server, Browser, Mobile, Desktop.
Same Rust codebase.

---

## ๐Ÿข Coming From Another Vector DB?

| If you use... | VelesDB gives you... |
|---------------|----------------------|
| **Pinecone** | No API keys, no cloud costs, **100x faster locally**, + Graph + Columns |
| **Qdrant** | Single binary (15MB vs 100MB+), native WASM/Mobile, **unified Vector+Graph** |
| **Milvus** | Zero config vs complex cluster setup, **embedded mode** |
| **pgvector** | Purpose-built for vectors, **700x faster search**, native graph support |
| **ChromaDB** | Production-grade Rust vs Python prototype, **enterprise-ready** |
| **Neo4j + Pinecone** | **One database instead of two**, unified query language |

```sql
-- The query that defines VelesDB: Vector + Graph in ONE statement
MATCH (doc:Document)-[:AUTHORED_BY]->(author:Person)
WHERE similarity(doc.embedding, $question) > 0.8
AND doc.category = 'research'
RETURN author.name, author.email, doc.title
ORDER BY similarity() DESC
LIMIT 5;
```

**This query would require 2 databases and complex sync logic elsewhere. With VelesDB: one query, sub-millisecond response.**

---

## ๐ŸŒ Full Ecosystem / ร‰cosystรจme Complet

VelesDB is designed to run **where your agents live** โ€” from cloud servers to mobile devices to browsers.

| Domain | Component | Description | Install |
|-------------|------------------------------------|------------------------------------------|----------------------------|
| **๐Ÿฆ€ Core** | [velesdb-core](crates/velesdb-core) | Core engine (HNSW, SIMD, VelesQL) | `cargo add velesdb-core` |
| **๐ŸŒ Server**| [velesdb-server](crates/velesdb-server) | REST API (11 endpoints, OpenAPI) | `cargo install velesdb-server` |
| **๐Ÿ’ป CLI** | [velesdb-cli](crates/velesdb-cli) | Interactive REPL for VelesQL | `cargo install velesdb-cli` |
| **๐Ÿ Python** | [velesdb-python](crates/velesdb-python) | PyO3 bindings + NumPy | `pip install velesdb` |
| **๐Ÿ“œ TypeScript** | [typescript-sdk](sdks/typescript) | Node.js & Browser SDK | `npm i @wiscale/velesdb` |
| **๐ŸŒ WASM** | [velesdb-wasm](crates/velesdb-wasm) | Browser-side vector search | `npm i @wiscale/velesdb-wasm` |
| **๐Ÿ“ฑ Mobile** | [velesdb-mobile](crates/velesdb-mobile) | iOS (Swift) & Android (Kotlin) | [Build instructions](#-mobile-build) |
| **๐Ÿ–ฅ๏ธ Desktop** | [tauri-plugin](crates/tauri-plugin-velesdb) | Tauri v2 AI-powered apps | `cargo add tauri-plugin-velesdb` |
| **๐Ÿฆœ LangChain** | [langchain-velesdb](integrations/langchain) | Official VectorStore | `pip install langchain-velesdb` |
| **๐Ÿฆ™ LlamaIndex** | [llamaindex-velesdb](integrations/llamaindex) | Document indexing | `pip install llama-index-vector-stores-velesdb` |
| **๐Ÿ”„ Migration** | [velesdb-migrate](crates/velesdb-migrate) | From Qdrant, Pinecone, Supabase | `cargo install velesdb-migrate` |

---

## ๐ŸŽฏ Use Cases

| Use Case | VelesDB Feature |
|-------------------------------|-------------------------------------|
| **RAG Pipelines** | Sub-ms retrieval |
| **AI Agents** | Embedded memory, local context |
| **Desktop Apps (Tauri/Electron)** | Single binary, no server needed |
| **Mobile AI (iOS/Android)** | Native SDKs with 32x memory compression |
| **Browser-side Search** | WASM module, zero backend |
| **Edge/IoT Devices** | 15MB footprint, ARM NEON optimized |
| **On-Prem / Air-Gapped** | No cloud dependency, full data sovereignty |

---

## ๐Ÿš€ Quick Start

### Option 1: Linux Package (.deb) โญ Recommended for Linux

Download from [GitHub Releases](https://github.com/cyberlife-coder/VelesDB/releases):

```bash
# Install
sudo dpkg -i velesdb-1.1.0-amd64.deb

# Binaries installed to /usr/bin
velesdb --version
velesdb-server --version
```

### Option 2: One-liner Script

**Linux / macOS:**
```bash
curl -fsSL https://raw.githubusercontent.com/cyberlife-coder/VelesDB/main/scripts/install.sh | bash
```

**Windows (PowerShell):**
```powershell
irm https://raw.githubusercontent.com/cyberlife-coder/VelesDB/main/scripts/install.ps1 | iex
```

### Option 3: Python (from source)

```bash
# Build from source (requires Rust)
cd crates/velesdb-python
pip install maturin
maturin develop --release
```

```python
import velesdb

db = velesdb.Database("./my_vectors")
collection = db.create_collection("docs", dimension=768, metric="cosine")
collection.upsert([{"id": 1, "vector": [...], "payload": {"title": "Hello"}}])
results = collection.search([...], top_k=10)
```

```bash
# Install from PyPI
pip install velesdb
```

### Option 4: Rust (from source)

```bash
# Clone and build
git clone https://github.com/cyberlife-coder/VelesDB.git
cd VelesDB
cargo build --release

# Binaries in target/release/
./target/release/velesdb-server --help
```

```bash
# Install from crates.io
cargo install velesdb-cli
```

### Option 5: Docker (build locally)

```bash
# Build and run locally
git clone https://github.com/cyberlife-coder/VelesDB.git
cd VelesDB
docker build -t velesdb .
docker run -d -p 8080:8080 -v velesdb_data:/data velesdb
```

```bash
# Pull from GitHub Container Registry
docker pull ghcr.io/cyberlife-coder/velesdb:latest
```

### Option 6: Portable Archives

Download from [GitHub Releases](https://github.com/cyberlife-coder/VelesDB/releases):

| Platform | File |
|----------|------|
| Windows | `velesdb-windows-x86_64.zip` |
| Linux | `velesdb-linux-x86_64.tar.gz` |
| macOS (ARM) | `velesdb-macos-arm64.tar.gz` |
| macOS (Intel) | `velesdb-macos-x86_64.tar.gz` |

### Start Using VelesDB

```bash
# Start the REST API server (data persisted in ./data)
velesdb-server --data-dir ./my_data

# Or use the interactive CLI with VelesQL REPL
velesdb repl

# Verify server is running
curl http://localhost:8080/health
# {"status":"healthy","version":"1.1.0"}
```

๐Ÿ“– **Full installation guide:** [docs/INSTALLATION.md](docs/INSTALLATION.md)


### ๐Ÿ“ฑ Mobile Build (iOS/Android)

```bash
# iOS (macOS required)
rustup target add aarch64-apple-ios aarch64-apple-ios-sim
cargo build --release --target aarch64-apple-ios -p velesdb-mobile

# Android (NDK required)
cargo install cargo-ndk
cargo ndk -t arm64-v8a -t armeabi-v7a build --release -p velesdb-mobile
```

๐Ÿ“– **Full mobile guide:** [crates/velesdb-mobile/README.md](crates/velesdb-mobile/README.md)

---

## ๐Ÿ“– Your First Vector Search

```bash
# 1. Create a collection
curl -X POST http://localhost:8080/collections \
-H "Content-Type: application/json" \
-d '{"name": "my_vectors", "dimension": 4, "metric": "cosine"}'

# 2. Insert vectors with metadata
curl -X POST http://localhost:8080/collections/my_vectors/points \
-H "Content-Type: application/json" \
-d '{
"points": [
{"id": 1, "vector": [1.0, 0.0, 0.0, 0.0], "payload": {"title": "AI Introduction", "category": "tech"}},
{"id": 2, "vector": [0.0, 1.0, 0.0, 0.0], "payload": {"title": "ML Basics", "category": "tech"}},
{"id": 3, "vector": [0.0, 0.0, 1.0, 0.0], "payload": {"title": "History of Computing", "category": "history"}}
]
}'

# 3. Search for similar vectors
curl -X POST http://localhost:8080/collections/my_vectors/search \
-H "Content-Type: application/json" \
-d '{"vector": [0.9, 0.1, 0.0, 0.0], "top_k": 2}'

# 4. Or use VelesQL (SQL-like queries)
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "SELECT * FROM my_vectors WHERE vector NEAR $v AND category = '\''tech'\'' LIMIT 5",
"params": {"v": [0.9, 0.1, 0.0, 0.0]}
}'
```

---

## ๐Ÿ”Œ API Reference

### Collections

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections` | `GET` | List all collections |
| `/collections` | `POST` | Create a collection |
| `/collections/{name}` | `GET` | Get collection info |
| `/collections/{name}` | `DELETE` | Delete a collection |

### Points

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/points` | `POST` | Upsert points |
| `/collections/{name}/points/{id}` | `GET` | Get a point by ID |
| `/collections/{name}/points/{id}` | `DELETE` | Delete a point |

### Search (Vector)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/search` | `POST` | Vector similarity search |
| `/collections/{name}/search/batch` | `POST` | Batch search (multiple queries) |
| `/collections/{name}/search/multi` | `POST` | Multi-query search |

### Graph

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/graph/edges` | `GET` | Get edges for a node |
| `/collections/{name}/graph/edges` | `POST` | Add edge between nodes |
| `/collections/{name}/graph/traverse` | `POST` | BFS/DFS graph traversal |
| `/collections/{name}/graph/nodes/{node_id}/degree` | `GET` | Get node degree (in/out) |

### Indexes

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/collections/{name}/indexes` | `GET` | List indexes |
| `/collections/{name}/indexes` | `POST` | Create index on property |
| `/collections/{name}/indexes/{label}/{property}` | `DELETE` | Delete index |

### VelesQL v2.0 (Unified Query)

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/query` | `POST` | Execute VelesQL (Vector + Graph + ColumnStore queries) |

**VelesQL v2.0 Features:**
- `GROUP BY` / `HAVING` with AND/OR operators
- `ORDER BY` multi-column + `similarity()` function
- `JOIN` with aliases across collections
- `UNION` / `INTERSECT` / `EXCEPT` set operations
- `USING FUSION(strategy='rrf')` hybrid search
- `WITH (max_groups=100)` query-time config

```sql
-- Example: Analytics with aggregation
SELECT category, COUNT(*), AVG(price) FROM products
GROUP BY category HAVING COUNT(*) > 5

-- Example: Hybrid search with fusion
SELECT * FROM docs USING FUSION(strategy='rrf', k=60) LIMIT 20

-- Example: Set operations
SELECT * FROM active UNION SELECT * FROM archived
```

> **Note:** ColumnStore operations (INSERT, UPDATE, SELECT on structured data) are performed via the `/query` endpoint using VelesQL syntax.

### Health

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | `GET` | Health check |

### Request/Response Examples

Create Collection

```bash
curl -X POST http://localhost:8080/collections \
-H "Content-Type: application/json" \
-d '{
"name": "my_vectors",
"dimension": 768,
"metric": "cosine" # Options: cosine, euclidean, dot, hamming, jaccard
}'
```

**Response:**
```json
{"message": "Collection created", "name": "my_vectors"}
```

Upsert Points

```bash
curl -X POST http://localhost:8080/collections/my_vectors/points \
-H "Content-Type: application/json" \
-d '{
"points": [
{
"id": 1,
"vector": [0.1, 0.2, 0.3, ...],
"payload": {"title": "Document 1", "tags": ["ai", "ml"]}
}
]
}'
```

**Response:**
```json
{"message": "Points upserted", "count": 1}
```

Vector Search

```bash
curl -X POST http://localhost:8080/collections/my_vectors/search \
-H "Content-Type: application/json" \
-d '{
"vector": [0.1, 0.2, 0.3, ...],
"top_k": 10
}'
```

**Response:**
```json
{
"results": [
{"id": 1, "score": 0.95, "payload": {"title": "Document 1"}},
{"id": 42, "score": 0.87, "payload": {"title": "Document 42"}}
]
}
```

Batch Search

```bash
curl -X POST http://localhost:8080/collections/my_vectors/search/batch \
-H "Content-Type: application/json" \
-d '{
"searches": [
{"vector": [0.1, 0.2, ...], "top_k": 5},
{"vector": [0.3, 0.4, ...], "top_k": 5}
]
}'
```

**Response:**
```json
{
"results": [
{"results": [{"id": 1, "score": 0.95, "payload": {...}}]},
{"results": [{"id": 2, "score": 0.89, "payload": {...}}]}
],
"timing_ms": 1.23
}
```

VelesQL Query

```bash
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"query": "SELECT * FROM my_vectors WHERE vector NEAR $v LIMIT 10",
"params": {"v": [0.1, 0.2, 0.3, ...]}
}'
```

**Response:**
```json
{
"results": [
{"id": 1, "score": 0.95, "payload": {"title": "Document 1"}}
],
"timing_ms": 2.34,
"rows_returned": 1
}
```

---

## ๐Ÿงช Real-World Business Scenarios

> **Each scenario shows a business problem that traditionally requires 2-3 databases. VelesDB solves it with ONE query.**

---

### ๐Ÿ’ผ Business Scenario 1: E-commerce Product Discovery
**Industry:** Retail / E-commerce
**Problem:** "Show me products similar to this photo, from trusted suppliers, under $500"

```sql
-- Traditional approach: Pinecone (image search) + Neo4j (supplier trust) + PostgreSQL (price)
-- VelesDB: ONE query

MATCH (product:Product)-[:SUPPLIED_BY]->(supplier:Supplier)
WHERE
similarity(product.image_embedding, $uploaded_photo) > 0.7 -- Vector: visual similarity
AND supplier.trust_score > 4.5 -- Graph: relationship data
AND (SELECT price FROM inventory WHERE sku = product.sku) < 500 -- Column: real-time price
ORDER BY similarity() DESC
LIMIT 12
```

**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Query latency | 350ms (3 DBs) | **2ms** |
| Infrastructure | $2,400/mo | **$0** (local) |
| Dev complexity | 3 integrations | **1 API** |

---

### ๐Ÿ’ผ Business Scenario 2: Fraud Detection in Real-Time
**Industry:** Banking / FinTech
**Problem:** "Flag transactions that look suspicious based on pattern + network + history"

```sql
-- Detect fraud: semantic pattern + transaction network + account history
MATCH (tx:Transaction)-[:FROM]->(account:Account)-[:LINKED_TO*1..3]->(related:Account)
WHERE
similarity(tx.behavior_embedding, $known_fraud_pattern) > 0.6 -- Vector: behavioral similarity
AND related.risk_level = 'high' -- Graph: network analysis
AND (SELECT SUM(amount) FROM transactions
WHERE account_id = account.id
AND timestamp > NOW() - INTERVAL '24 hours') > 10000 -- Column: velocity check
RETURN tx.id, account.id, similarity() as fraud_score
```

**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Detection time | 2-5 seconds | **< 10ms** |
| False positives | 15% | **8%** (better context) |
| Compliance | Cloud concerns | **On-premise OK** |

---

### ๐Ÿ’ผ Business Scenario 3: Healthcare Diagnosis Assistant
**Industry:** Healthcare / MedTech
**Problem:** "Find similar patient cases with treatment outcomes, HIPAA-compliant"

```sql
-- Medical RAG: symptoms + patient network + treatment history
MATCH (patient:Patient)-[:HAS_CONDITION]->(condition:Condition)
-[:TREATED_WITH]->(treatment:Treatment)
WHERE
similarity(condition.symptoms_embedding, $current_symptoms) > 0.75 -- Vector: symptom matching
AND condition.icd10_code IN ('J18.9', 'J12.89') -- Column: specific diagnoses
AND (SELECT success_rate FROM treatment_outcomes
WHERE treatment_id = treatment.id) > 0.8 -- Column: outcome data
RETURN treatment.name, AVG(success_rate) as effectiveness
```

**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Data location | Cloud (HIPAA risk) | **100% on-premise** |
| Query time | 500ms+ | **< 5ms** |
| Integration | 3 vendors | **1 binary** |

---

### ๐Ÿ’ผ Business Scenario 4: AI Agent Memory (RAG + Context)
**Industry:** AI / SaaS
**Problem:** "My AI agent needs conversation history + knowledge base + user preferences"

```sql
-- Agent memory: semantic recall + conversation graph + user context
MATCH (user:User)-[:HAD_CONVERSATION]->(conv:Conversation)
-[:CONTAINS]->(message:Message)
WHERE
similarity(message.embedding, $current_query) > 0.7 -- Vector: relevant past messages
AND conv.timestamp > NOW() - INTERVAL '7 days' -- Column: recent conversations
AND (SELECT preference_value FROM user_preferences
WHERE user_id = user.id AND key = 'topic') = message.topic -- Column: user prefs
ORDER BY conv.timestamp DESC, similarity() DESC
LIMIT 10
```

**Business Impact:**
| Metric | Before | After VelesDB |
|--------|--------|---------------|
| Context retrieval | 100-200ms | **< 1ms** |
| Memory footprint | 500MB+ | **15MB binary** |
| Works offline | โŒ | **โœ…** |

---

### Scenario 0: Technical Deep-Dive (Vector + Graph + ColumnStore)
**Goal:** Demonstrate the power of cross-model queries - finding semantically similar documents through graph relationships with structured data filtering

```sql
-- ๐Ÿ”ฎ The VelesDB Advantage: One query across all three stores
MATCH (doc:Document)-[:AUTHORED_BY]->(author:Author)
WHERE
similarity(doc.embedding, $research_question) > 0.8 -- Vector: semantic search
AND doc.category = 'peer-reviewed' -- Column: structured filter
AND (SELECT citation_count FROM author_metrics -- Column: subquery
WHERE author_id = author.id) > 50
ORDER BY similarity() DESC
LIMIT 5
```

**What's happening:**
1. **Graph traversal**: `MATCH` finds documentโ†’author relationships
2. **Vector search**: `similarity()` ranks by semantic relevance to your question
3. **Columnar filter**: `category = 'peer-reviewed'` filters structured metadata
4. **Columnar subquery**: Joins with `author_metrics` table for citation counts

**Expected Output:**
```json
{
"results": [
{
"doc.title": "Neural Memory Consolidation in AI Agents",
"author.name": "Dr. Sarah Chen",
"similarity": 0.94,
"citation_count": 127
}
],
"timing_ms": 0.8
}
```

**Why this matters:** This query would require 3 separate databases and complex synchronization logic in a traditional stack. With VelesDB: **one query, sub-millisecond response**.

---

### Scenario 0b: Multi-Vector Fusion Search (NEAR_FUSED)
**Goal:** Search using multiple query vectors simultaneously with intelligent result fusion

```sql
-- ๐Ÿ”ฎ Multi-modal search: combine text + image embeddings
SELECT * FROM products
WHERE vector NEAR_FUSED [$text_embedding, $image_embedding]
USING FUSION 'rrf' (k=60)
AND category = 'electronics'
ORDER BY similarity() DESC
LIMIT 10
```

**Fusion Strategies Available:**

| Strategy | Syntax | Best For |
|----------|--------|----------|
| **RRF** | `USING FUSION 'rrf' (k=60)` | Robust rank-based fusion (recommended) |
| **Average** | `USING FUSION 'average'` | General purpose, balanced results |
| **Maximum** | `USING FUSION 'maximum'` | Emphasize documents scoring high in ANY query |
| **Weighted** | `USING FUSION 'weighted' (avg_weight=0.5, max_weight=0.3, hit_weight=0.2)` | Custom control over fusion factors |

**Real-World Use Cases:**

```sql
-- E-commerce: "show me products like this photo that match 'wireless headphones'"
SELECT * FROM products
WHERE vector NEAR_FUSED [$image_vector, $text_vector]
USING FUSION 'weighted' (avg_weight=0.6, max_weight=0.3, hit_weight=0.1)
LIMIT 8

-- RAG: Multi-perspective document retrieval
SELECT * FROM documents
WHERE vector NEAR_FUSED [$question_embedding, $context_embedding, $user_profile_embedding]
USING FUSION 'rrf'
LIMIT 5

-- Semantic + Lexical hybrid (BM25 + Vector)
SELECT * FROM articles
WHERE content MATCH 'artificial intelligence'
AND vector NEAR $semantic_embedding
ORDER BY similarity() DESC
LIMIT 10
```

**Expected Output:**
```json
{
"results": [
{"id": 42, "score": 0.91, "fusion_details": {"rrf_rank": 1, "sources": 2}},
{"id": 17, "score": 0.87, "fusion_details": {"rrf_rank": 2, "sources": 2}}
],
"timing_ms": 1.2
}
```

---

### Scenario 0c: Distance Metrics for Every Use Case
**Goal:** Choose the right metric for your data type and domain

VelesDB supports **5 distance metrics** - each optimized for specific use cases:

| Metric | Best For | Example Domain |
|--------|----------|----------------|
| **Cosine** | Text embeddings, normalized vectors | NLP, semantic search |
| **Euclidean** | Spatial data, absolute distances | Geolocation, clustering |
| **DotProduct** | Pre-normalized embeddings, retrieval | RAG, recommendations |
| **Hamming** | Binary vectors, fingerprints | Image hashing, DNA |
| **Jaccard** | Set similarity, sparse features | Tags, categories |

**1. Cosine Similarity (NLP / Semantic Search)**
```bash
# Create collection with cosine metric
curl -X POST http://localhost:8080/collections \
-d '{"name": "documents", "dimension": 768, "metric": "cosine"}'
```
```sql
-- Find semantically similar documents (angle-based, ignores magnitude)
SELECT * FROM documents
WHERE vector NEAR $query_embedding
ORDER BY similarity() DESC
LIMIT 10
```
**Use case:** ChatGPT-style RAG, document similarity, semantic Q&A

---

**2. Euclidean Distance (Spatial / Clustering)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "locations", "dimension": 3, "metric": "euclidean"}'
```
```sql
-- Find nearest physical locations (absolute distance matters)
SELECT * FROM locations
WHERE vector NEAR $gps_coordinates
AND category = 'restaurant'
ORDER BY similarity() ASC -- Lower = closer
LIMIT 5
```
**Use case:** Geospatial search, k-means clustering, anomaly detection

---

**3. Dot Product (RAG / Recommendations)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "products", "dimension": 512, "metric": "dot"}'
```
```sql
-- Maximize relevance score (pre-normalized embeddings)
SELECT * FROM products
WHERE vector NEAR $user_preference_vector
AND in_stock = true
ORDER BY similarity() DESC
LIMIT 8
```
**Use case:** Recommendation engines, MaxIP retrieval, MIPS problems

---

**4. Hamming Distance (Binary Vectors / Fingerprints)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "image_hashes", "dimension": 256, "metric": "hamming"}'
```
```sql
-- Find near-duplicate images (bit-level comparison, 6ns latency!)
SELECT * FROM image_hashes
WHERE vector NEAR $perceptual_hash
AND source = 'user_uploads'
ORDER BY similarity() ASC -- Fewer bit differences = more similar
LIMIT 10
```
**Use case:** Image deduplication, perceptual hashing, DNA sequence matching, malware signatures

---

**5. Jaccard Similarity (Sets / Sparse Features)**
```bash
curl -X POST http://localhost:8080/collections \
-d '{"name": "user_tags", "dimension": 100, "metric": "jaccard"}'
```
```sql
-- Find users with similar interests (set overlap)
SELECT * FROM user_tags
WHERE vector NEAR $current_user_tags
ORDER BY similarity() DESC
LIMIT 20
```
**Use case:** Tag-based recommendations, category matching, collaborative filtering

---

**Performance by Metric (768D vectors):**

| Metric | Latency | Throughput | SIMD Optimized |
|--------|---------|------------|----------------|
| **Cosine** | 78 ns | 13M ops/sec | โœ… AVX-512 |
| **Euclidean** | 44 ns | 23M ops/sec | โœ… AVX-512 |
| **DotProduct** | 66 ns | 15M ops/sec | โœ… AVX-512 |
| **Hamming** | **6 ns** | **164M ops/sec** | โœ… POPCNT |
| **Jaccard** | 89 ns | 11M ops/sec | โœ… AVX2 |

> **Tip:** Hamming is 10x faster than float metrics - ideal for binary embeddings on edge devices!

---

### Scenario 1: Medical Research Assistant
**Goal:** Find recent oncology studies with specific gene mentions, ordered by relevance

```sql
SELECT study_id, title, publication_date
FROM medical_studies
WHERE
vector NEAR $cancer_research_embedding
AND content LIKE '%BRCA1%'
AND publication_date > '2025-01-01'
ORDER BY similarity() DESC
LIMIT 5
```

**Parameters:**
- `$cancer_research_embedding`: [0.23, 0.87, -0.12, ...] (embedding for "advanced cancer immunotherapy")

**Expected Output:**
```json
{
"results": [
{
"study_id": "onco-2025-042",
"title": "BRCA1 Mutations in Immunotherapy Response",
"publication_date": "2025-03-15",
"score": 0.92
},
{
"study_id": "onco-2025-017",
"title": "Gene Editing Approaches for Metastatic Cancer",
"publication_date": "2025-02-28",
"score": 0.87
}
]
}
```

---

### Scenario 2: E-commerce Recommendation Engine
**Goal:** Recommend products similar to a user's purchase history, within their price range

```sql
SELECT product_id, name, price
FROM products
WHERE
vector NEAR $user_preferences
AND price BETWEEN 20.00 AND 100.00
AND category = 'electronics'
ORDER BY similarity() DESC, price ASC
LIMIT 8
```

**Parameters:**
- `$user_preferences`: [0.78, -0.23, 0.45, ...] (embedding based on user's purchase history)

**Expected Output:**
```json
{
"results": [
{
"product_id": "prod-67890",
"name": "Wireless Noise-Cancelling Headphones",
"price": 89.99,
"score": 0.95
},
{
"product_id": "prod-54321",
"name": "Bluetooth Portable Speaker",
"price": 59.99,
"score": 0.91
}
]
}
```

---

### Scenario 3: Cybersecurity Threat Detection
**Goal:** Find similar malware patterns observed in the last 7 days

```sql
SELECT malware_hash, threat_level, first_seen
FROM threat_intel
WHERE
vector NEAR $current_threat_embedding
AND first_seen > NOW() - INTERVAL '7 days'
AND threat_level > 0.8
ORDER BY similarity() DESC, first_seen DESC
LIMIT 10
```

**Parameters:**
- `$current_threat_embedding`: [0.12, -0.87, 0.34, ...] (embedding of current malware signature)

**Troubleshooting Tip:** If no results appear, verify:
1. Threat intelligence feed is updating daily
2. Vector dimensions match collection configuration
3. Timestamp format matches ISO 8601 (YYYY-MM-DD HH:MM:SS)

---

## โšก Performance

### ๐Ÿ”ฅ Core Vector Operations (768D - BERT/OpenAI dimensions)

| Operation | Latency | Throughput | vs. Naive |
|-----------|---------|------------|----------|
| **Dot Product (1536D)** | **66 ns** | **15M ops/sec** | ๐Ÿš€ **8x faster** |
| **Euclidean (768D)** | **44 ns** | **23M ops/sec** | ๐Ÿš€ **6x faster** |
| **Cosine (768D)** | **78 ns** | **13M ops/sec** | ๐Ÿš€ **4x faster** |
| **Hamming (Binary)**| **6 ns** | **164M ops/sec** | ๐Ÿš€ **10x faster** |

### ๐Ÿ“Š System Performance (10K Vectors, Local)

| Benchmark | Result | Details |
|-----------|--------|---------|
| **HNSW Search** | **57 ยตs** | p50 latency (Cosine) |
| **VelesQL Parsing**| **554 ns** | Simple SELECT |
| **VelesQL Cache Hit**| **48 ns** | HashMap pre-allocation |
| **Recall@10** | **100%** | Perfect mode (brute-force SIMD) |
| **BM25 Search** | **33 ยตs** | Adaptive PostingList (10K docs) |

### ๐ŸŽฏ Search Quality (Recall)

| Mode | Recall@10 | Latency (128D) | Use Case |
|------|-----------|----------------|----------|
| Fast | 92.2% | ~26ยตs | Real-time, high throughput |
| Balanced | 98.8% | ~39ยตs | Production recommended |
| Accurate | 100% | ~67ยตs | High precision |
| **Perfect** | **100%** | ~220ยตs | Brute-force SIMD |

### ๐Ÿ› ๏ธ Optimizations Under the Hood

- **SIMD**: AVX-512/AVX2 auto-detection with 32-wide FMA
- **Prefetch**: CPU cache warming for HNSW traversal (+12% throughput)
- **Contiguous Layout**: 64-byte aligned memory for cache efficiency
- **Batch WAL**: Single disk write per batch import
- **Zero-Copy**: Memory-mapped files for fast startup

> ๐Ÿ“Š Full benchmarks: [docs/BENCHMARKS.md](docs/BENCHMARKS.md)

### ๐Ÿ“ฆ Vector Quantization (Memory Reduction)

Reduce memory usage by **4-32x** with minimal recall loss:

| Method | Compression | Recall Loss | Use Case |
|--------|-------------|-------------|----------|
| **SQ8** (8-bit) | **4x** | < 2% | General purpose, Edge |
| **Binary** (1-bit) | **32x** | ~10-15% | Fingerprints, IoT |

```rust
use velesdb_core::quantization::{QuantizedVector, dot_product_quantized_simd};

// Compress 768D vector: 3072 bytes โ†’ 776 bytes (4x reduction)
let quantized = QuantizedVector::from_f32(&embedding);

// SIMD-optimized search (only ~30% slower than f32)
let similarity = dot_product_quantized_simd(&query, &quantized);
```

> ๐Ÿ“– Full guide: [docs/QUANTIZATION.md](docs/QUANTIZATION.md)

---

## ๐Ÿš€ Transformative Benefits: How VelesDB Changes Development

### โšก Eliminates Database Sprawl
VelesDB replaces 3+ specialized databases (vector DB + graph DB + document store) with a **single unified engine**.

```mermaid
graph LR
A[App] --> V[VelesDB]
```

**Impact:**
- โœ… **70% less infrastructure code**
- โœ… **No synchronization headaches**
- โœ… **Single query language for all operations**

### ๐Ÿ’ก Enables New Application Categories
With air-gapped deployment and 15MB binary size:
```mermaid
pie title Deployment Locations
"On-Prem Servers" : 35
"Edge Devices" : 25
"Mobile Apps" : 20
"Browser WASM" : 20
```

**Impact:**
- โœ… **Build HIPAA-compliant healthcare apps**
- โœ… **Create military-grade analytics** for air-gapped environments
- โœ… **Enable privacy-first consumer apps** with zero data sharing

### ๐Ÿš€ Redefines Performance Expectations

| Pipeline Step | Cloud Vector DB | VelesDB |
|---------------|-----------------|---------|
| Network round-trip | 50-100ms | **0ms** (local) |
| Vector search | 10-50ms | **0.057ms** |
| Graph traversal | 20-100ms | **0.1ms** |
| **Total latency** | **100-250ms** | **< 1ms** |

> ๐Ÿ’ก **100x faster** enables use cases impossible with cloud: real-time autocomplete, instant RAG, sub-frame game AI

**Impact:**
- โœ… **Build real-time AI agents** that respond faster than human perception
- โœ… **Enable complex RAG chains** with 10+ sequential retrievals
- โœ… **Create instant search experiences** with no loading spinners

### ๐Ÿ’ผ Unified API Simplifies Development
One consistent API across all platforms:
```rust
// Same API everywhere
let results = db.search(query_vector, filters, graph_traversal);
```

**Impact:**
- โœ… **Learn once, deploy everywhere**
- โœ… **Shared codebase** between web, mobile, and desktop
- โœ… **Eliminate platform-specific database code**

---

## โœจ Core Features That Transform Development

| Feature | Technical Capability | Real-World Impact |
|---------|----------------------|-------------------|
| **๐Ÿง  Vector + Graph Fusion** | Unified query language for semantic + relationship queries | **Build smarter AI agents** with contextual understanding |
| **โšก 57ยตs Search** | Native HNSW + AVX-512 SIMD | **Create real-time experiences** previously impossible |
| **๐Ÿ“ฆ 15MB Binary** | Zero dependencies, single executable | **Deploy anywhere** - from servers to edge devices |
| **๐Ÿ”’ Air-Gapped Deployment** | Full functionality without internet | **Meet strict compliance** in healthcare/finance |
| **๐ŸŒ Everywhere Runtime** | Consistent API across server/mobile/browser | **Massive code reuse** across platforms |
| **๐Ÿง  SQ8 Quantization** | 4x memory reduction | **Run complex AI** on resource-constrained devices |
| **๐Ÿ“ VelesQL** | SQL-like unified query language | **Simplify complex queries** - no DSL learning curve |

---

## ๐Ÿ† Real-World Impact Stories

### ๐Ÿฅ Healthcare Diagnostics Assistant
**Before VelesDB:**
- 300ms latency per query
- Patient data in cloud
- Separate systems for medical knowledge and patient data

**With VelesDB:**
- **0.6ms diagnosis suggestions**
- **On-device patient data**
- **Unified medical knowledge graph**

```mermaid
pie title Performance Improvement
"Diagnosis Speed" : 85
"Accuracy" : 10
"Privacy" : 5
```

### ๐Ÿญ Manufacturing Quality Control
**Before VelesDB:**
- Cloud dependency caused production delays
- Separate systems for defect images and part metadata

**With VelesDB:**
```sql
MATCH (part)-[HAS_DEFECT]->(defect)
WHERE defect.vector NEAR $image_vec
AND part.material = 'titanium'
```
- **50% fewer defective shipments**
- **Offline factory floor operation**
- **Unified defect database**

---

## ๐Ÿค Contributing

We welcome contributions! Here's how to get started:

### Development Setup

```bash
# Clone the repo
git clone https://github.com/cyberlife-coder/VelesDB.git
cd VelesDB

# Run tests
cargo test --all-features

# Run with checks (before committing)
cargo fmt --all
cargo clippy --all-targets --all-features -- -D warnings
```

### Project Structure

```
VelesDB/
โ”œโ”€โ”€ crates/
โ”‚ โ”œโ”€โ”€ velesdb-core/ # Core engine library
โ”‚ โ”‚ โ”œโ”€โ”€ src/
โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€ collection/ # Collection management
โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€ index/ # HNSW index
โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€ storage/ # Persistence layer
โ”‚ โ”‚ โ”‚ โ”œโ”€โ”€ velesql/ # Query language parser
โ”‚ โ”‚ โ”‚ โ””โ”€โ”€ simd/ # SIMD optimizations
โ”‚ โ”‚ โ””โ”€โ”€ tests/
โ”‚ โ”œโ”€โ”€ velesdb-server/ # REST API server
โ”‚ โ”œโ”€โ”€ velesdb-mobile/ # iOS/Android bindings (UniFFI)
โ”‚ โ”œโ”€โ”€ velesdb-wasm/ # WebAssembly module
โ”‚ โ””โ”€โ”€ velesdb-python/ # Python bindings (PyO3)
โ”œโ”€โ”€ benches/ # Benchmarks
โ””โ”€โ”€ docs/ # Documentation
```

### Good First Issues

Looking for a place to start? Check out issues labeled [`good first issue`](https://github.com/cyberlife-coder/VelesDB/labels/good%20first%20issue).

---

## ๐Ÿ“Š Roadmap

```mermaid
gantt
title VelesDB Development Timeline
dateFormat YYYY-MM
section v1.2 โœ…
Knowledge Graph Storage :done, 2025-10, 2025-12
VelesQL MATCH Clause :done, 2025-11, 2025-12
Vector-Graph Fusion :done, 2025-12, 2026-01
ColumnStore CRUD :done, 2026-01, 2026-01
VelesQL JOIN Cross-Store :done, 2026-01, 2026-01
section v1.3 ๐Ÿ”„
Aggregations VelesQL :active, 2026-02, 2026-03
Agent Memory Patterns :2026-02, 2026-04
Documentation :2026-02, 2026-03
section v1.4 ๐Ÿ“‹
E2E Test Suite :2026-04, 2026-05
Durability & Recovery :2026-04, 2026-06
```

### Progress Overview

| Version | Status | EPICs Done | Progress |
|---------|--------|------------|----------|
| **v1.2.0** | โœ… Released | 15/15 | ![100%](https://progress-bar.xyz/100?title=Complete) |
| **v1.3.0** | ๐Ÿ”„ In Progress | 0/6 | ![4%](https://progress-bar.xyz/4?title=Building) |
| **v1.4.0** | ๐Ÿ“‹ Planned | 0/5 | ![0%](https://progress-bar.xyz/0?title=Planned) |

---

### v1.2.0 โœ… Released (January 2026)

15 EPICs Completed - Click to expand

| EPIC | Feature | Impact |
|------|---------|--------|
| EPIC-001 | โœ… Code Quality Refactoring | Clean architecture |
| EPIC-002 | โœ… GPU Acceleration (wgpu) | 10x throughput |
| EPIC-003 | โœ… PyO3 Migration | Python 3.12+ support |
| EPIC-004 | โœ… Knowledge Graph Storage | GraphNode, GraphEdge, BFS |
| EPIC-005 | โœ… VelesQL MATCH Clause | Cypher-inspired queries |
| EPIC-006 | โœ… Agent Toolkit SDK | Python, WASM, Mobile |
| EPIC-007 | โœ… Python Bindings Refactoring | Clean API |
| EPIC-008 | โœ… Vector-Graph Fusion | `similarity()` in MATCH |
| EPIC-009 | โœ… Graph Property Index | 10x faster MATCH |
| EPIC-019 | โœ… Scalability 10M entries | Enterprise datasets |
| EPIC-020 | โœ… ColumnStore CRUD | Real-time updates |
| EPIC-021 | โœ… VelesQL JOIN Cross-Store | Graph โ†” Table queries |
| EPIC-028 | โœ… ORDER BY Multi-Columns | Complex sorting |
| EPIC-029 | โœ… Python SDK Core Delegation | DRY bindings |
| EPIC-031 | โœ… Multimodel Query Engine | Unified execution |

**Highlights:**
- ๐Ÿง  **Knowledge Graph** - Full MATCH clause with BFS traversal
- ๐Ÿ”ฎ **Vector-Graph Fusion** - `WHERE similarity() > 0.8` in graph queries
- ๐Ÿ“Š **ColumnStore** - Real-time CRUD with JOIN support
- ๐Ÿ“ฆ **Published** - crates.io, PyPI, npm

---

### v1.3.0 ๐Ÿ”„ In Progress (Q1 2026)

| EPIC | Feature | Priority | Progress |
|------|---------|----------|----------|
| EPIC-016 | **SDK Ecosystem Sync** | ๐Ÿ”ฅ Critical | ๐Ÿ”„ 21% (3/14 US) |
| EPIC-017 | **Aggregations VelesQL** | ๐Ÿ”ฅ Critical | ๐Ÿ“‹ 0% |
| EPIC-010 | **Agent Memory Patterns SDK** | ๐Ÿš€ High | ๐Ÿ“‹ 0% |
| EPIC-018 | **Documentation & Examples** | ๐Ÿš€ High | ๐Ÿ“‹ 0% |
| EPIC-012 | **TypeScript SDK** | ๐Ÿ“‹ Medium | ๐Ÿ“‹ 0% |
| EPIC-013 | **LangChain/LlamaIndex** | ๐Ÿ“‹ Medium | ๐Ÿ“‹ 0% |

**Coming Soon:**
```sql
-- Aggregations (EPIC-017)
SELECT category, COUNT(*), AVG(price)
FROM products
GROUP BY category
HAVING COUNT(*) > 10

-- Agent Memory Patterns (EPIC-010)
INSERT INTO agent_memory (episode, embedding, context)
VALUES ('task_123', $vec, '{"goal": "find documents"}')
```

---

### v1.4.0 ๐Ÿ“‹ Planned (Q2 2026)

| EPIC | Feature | Focus |
|------|---------|-------|
| EPIC-011 | **E2E Test Suite** | Quality assurance |
| EPIC-024 | **Durability & Crash Recovery** | Database-grade reliability |
| EPIC-022 | **Unsafe Auditability** | Security audit |
| EPIC-023 | **Loom Concurrency Proofs** | Thread safety |
| EPIC-025 | **Miri/Fuzzing** | Memory safety |

---

### Future Vision

```mermaid
pie title VelesDB Feature Distribution
"Vector Search" : 30
"Knowledge Graph" : 25
"ColumnStore" : 20
"Query Engine" : 15
"SDKs & Integrations" : 10
```

| Horizon | Features |
|---------|----------|
| **2026 H2** | Sparse vectors, Product Quantization (PQ) |
| **2027** | Distributed mode (Premium), Cluster HA |
| **Beyond** | Agent Hooks & Triggers, Multi-tenancy |

---

## ๐Ÿ“œ License

VelesDB is licensed under the [Elastic License 2.0 (ELv2)](LICENSE).

ELv2 is a source-available license that allows free use, modification, and distribution, with restrictions only on providing the software as a managed service.

---

## โญ Support VelesDB

If VelesDB helps your project, here's how you can support us:



Star on GitHub


Share on Twitter

### โ˜• Buy Me A Coffee

Building VelesDB takes countless hours. If you find it useful, consider supporting development:



Buy Me A Coffee

### ๐Ÿท๏ธ Show You Use VelesDB

Add the badge to your project:

[![Powered by VelesDB](https://img.shields.io/badge/Powered_by-VelesDB-blue?style=flat-square)](https://github.com/cyberlife-coder/VelesDB)

```markdown
[![Powered by VelesDB](https://img.shields.io/badge/Powered_by-VelesDB-blue?style=flat-square)](https://github.com/cyberlife-coder/VelesDB)
```

---


Built with โค๏ธ and ๐Ÿฆ€ Rust


Original Author: Julien Lange โ€” WiScale


โญ GitHub โ€ข
๐Ÿ“– Documentation โ€ข
๐Ÿ› Issues โ€ข
๐Ÿ“ฆ Releases


Don't forget to โญ star the repo if you find VelesDB useful!