https://github.com/cyberlife-coder/VelesDB

VelesDB: A high-performance, local-first vector database written in Rust. The SQLite of vector search.
https://github.com/cyberlife-coder/VelesDB
ai embeddings hnsw local-first machine-learning rag rust search-engine vector-database
Last synced: 6 months ago
JSON representation
VelesDB: A high-performance, local-first vector database written in Rust. The SQLite of vector search.
Host: GitHub
URL: https://github.com/cyberlife-coder/VelesDB
Owner: cyberlife-coder
License: other
Created: 2025-12-17T18:01:02.000Z (7 months ago)
Default Branch: main
Last Pushed: 2026-01-20T16:41:20.000Z (6 months ago)
Last Synced: 2026-01-20T16:44:06.436Z (6 months ago)
Topics: ai, embeddings, hnsw, local-first, machine-learning, rag, rust, search-engine, vector-database
Language: Rust
Homepage: https://velesdb.com
Size: 17.4 MB
Stars: 10
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: docs/SECURITY_AUDIT_2025_01_07.md
Awesome Lists containing this project

fucking-awesome-rust - VelesDB - core](crates.io/crates/velesdb-core)] - Embeddable, local-first database whose tri-engine fuses vector search, a property graph, and a columnar store behind one query language (VelesQL), in a single binary. Ships an in-core agentic-memory SDK — semantic / episodic / procedural — with cross-session `why()` recall that traverses the graph to surface linked facts vector search alone misses. (Applications / Database)
awesome-rust - VelesDB - core](https://crates.io/crates/velesdb-core)] - Embeddable, local-first database whose tri-engine fuses vector search, a property graph, and a columnar store behind one query language (VelesQL), in a single binary. Ships an in-core agentic-memory SDK — semantic / episodic / procedural — with cross-session `why()` recall that traverses the graph to surface linked facts vector search alone misses. (Applications / Database)
awesome-vector-database - VelesDB - Embedded vector + graph + columnar database. Rust core (~6MB), HNSW with 5 distance metrics, VelesQL (SQL + NEAR + MATCH). Python and Rust SDKs. (Multidimensional data / Vectors)
awesome-local-first - VelesDB - first vector + graph + columnar database written in Rust (~6MB binary). Source-available (Elastic License 2.0) (⚙️ Development Tools & Libraries / Database Solutions)
awesome-tauri - tauri-plugin-velesdb - Native vector database plugin. 70µs semantic search, ≥95% recall, hybrid BM25+vector, offline-first, full ecosystem integrations and more. (Development / Plugins)
awesome-rust-with-stars - VelesDB - core) | Embeddable, local-first database whose tri-engine fuses vector search, a property graph, and a columnar store behind one query language (VelesQL), in a single binary. Ships an in-core agentic-memory SDK — semantic / episodic / procedural — with cross-session why() recall that traverses the graph to surface linked facts vector search alone misses. | 2026-07-14 | (Applications / Database)
README

          


  





  





  🧠 The Local Knowledge Engine for AI Agents 🧠


  Vector + Graph + ColumnStore Fusion • 57µs Search • Single Binary • Privacy-First





  🚧 Work in Progress — We're actively building the ultimate AI memory engine.


  Ideas, feedback, and contributions are welcome!





  

  

  

  





  

  

  



[![Star History Chart](https://api.star-history.com/svg?repos=cyberlife-coder/velesdb&type=Date)](https://star-history.com/#cyberlife-coder/velesdb&Date)

---

## 🎯 The Problem We Solve

> **"My RAG agent needs both semantic search AND knowledge relationships. Existing tools force me to choose or glue multiple systems together."**

### Three Pain Points That Cost You Time & Money

| Pain Point | Business Impact | VelesDB Solution |

|------------|-----------------|------------------|

| **🐌 Latency kills UX** | Cloud vector DBs add 50-100ms/query. 10 retrievals = **1+ second delay** | **57µs local** — 1000x faster |

| **🔗 Vectors alone aren't enough** | Semantic similarity misses relationships ("Who authored this?") | **Vector + Graph unified** in one query |

| **🔒 Privacy & deployment friction** | Cloud dependencies, API keys, GDPR concerns | **15MB binary** — works offline, air-gapped |

### 💰 The ROI of Switching to VelesDB

| Metric | Before (Cloud Stack) | After (VelesDB) | Savings |

|--------|---------------------|-----------------|---------|

| **Infrastructure** | 3 databases + sync | 1 binary | **70% less code** |

| **Cloud costs** | $500-5000/mo | $0 (local) | **100% savings** |

| **Latency** | 100-300ms | < 1ms | **100x faster** |

| **Compliance** | Complex (data leaves premises) | Simple (local-first) | **HIPAA/GDPR ready** |

| **Dev time** | 3 integrations to maintain | 1 API | **3x faster shipping** |

---

## 🏆 Why Developers Choose VelesDB

🧠 Vector + Graph + Columns

Unified semantic search, relationships, AND structured data.
No glue code needed.


⚡ 57µs Search

Native HNSW + AVX-512 SIMD.
1000x faster than cloud.


📦 15MB Binary

Zero dependencies.
Works offline, air-gapped.


🌍 Run Anywhere

Server, Browser, Mobile, Desktop.
Same Rust codebase.


---

## 🏢 Coming From Another Vector DB?

| If you use... | VelesDB gives you... |

|---------------|----------------------|

| **Pinecone** | No API keys, no cloud costs, **100x faster locally**, + Graph + Columns |

| **Qdrant** | Single binary (15MB vs 100MB+), native WASM/Mobile, **unified Vector+Graph** |

| **Milvus** | Zero config vs complex cluster setup, **embedded mode** |

| **pgvector** | Purpose-built for vectors, **700x faster search**, native graph support |

| **ChromaDB** | Production-grade Rust vs Python prototype, **enterprise-ready** |

| **Neo4j + Pinecone** | **One database instead of two**, unified query language |

```sql

-- The query that defines VelesDB: Vector + Graph in ONE statement

MATCH (doc:Document)-[:AUTHORED_BY]->(author:Person)

WHERE similarity(doc.embedding, $question) > 0.8

  AND doc.category = 'research'

RETURN author.name, author.email, doc.title

ORDER BY similarity() DESC

LIMIT 5;

```

**This query would require 2 databases and complex sync logic elsewhere. With VelesDB: one query, sub-millisecond response.**

---

## 🌍 Full Ecosystem / Écosystème Complet

VelesDB is designed to run **where your agents live** — from cloud servers to mobile devices to browsers.

| Domain      | Component                          | Description                              | Install                     |

|-------------|------------------------------------|------------------------------------------|----------------------------|

| **🦀 Core** | [velesdb-core](crates/velesdb-core) | Core engine (HNSW, SIMD, VelesQL)        | `cargo add velesdb-core`   |

| **🌐 Server**| [velesdb-server](crates/velesdb-server) | REST API (11 endpoints, OpenAPI)         | `cargo install velesdb-server` |

| **💻 CLI**  | [velesdb-cli](crates/velesdb-cli)   | Interactive REPL for VelesQL             | `cargo install velesdb-cli` |

| **🐍 Python** | [velesdb-python](crates/velesdb-python) | PyO3 bindings + NumPy                    | `pip install velesdb`      |

| **📜 TypeScript** | [typescript-sdk](sdks/typescript) | Node.js & Browser SDK                    | `npm i @wiscale/velesdb`   |

| **🌍 WASM** | [velesdb-wasm](crates/velesdb-wasm) | Browser-side vector search               | `npm i @wiscale/velesdb-wasm` |

| **📱 Mobile** | [velesdb-mobile](crates/velesdb-mobile) | iOS (Swift) & Android (Kotlin)           | [Build instructions](#-mobile-build) |

| **🖥️ Desktop** | [tauri-plugin](crates/tauri-plugin-velesdb) | Tauri v2 AI-powered apps               | `cargo add tauri-plugin-velesdb` |

| **🦜 LangChain** | [langchain-velesdb](integrations/langchain) | Official VectorStore                   | `pip install langchain-velesdb` |

| **🦙 LlamaIndex** | [llamaindex-velesdb](integrations/llamaindex) | Document indexing                     | `pip install llama-index-vector-stores-velesdb` |

| **🔄 Migration** | [velesdb-migrate](crates/velesdb-migrate) | From Qdrant, Pinecone, Supabase        | `cargo install velesdb-migrate` |

---

## 🎯 Use Cases

| Use Case                      | VelesDB Feature                     |

|-------------------------------|-------------------------------------|

| **RAG Pipelines**             | Sub-ms retrieval                    |

| **AI Agents**                 | Embedded memory, local context      |

| **Desktop Apps (Tauri/Electron)** | Single binary, no server needed     |

| **Mobile AI (iOS/Android)**   | Native SDKs with 32x memory compression |

| **Browser-side Search**       | WASM module, zero backend           |

| **Edge/IoT Devices**          | 15MB footprint, ARM NEON optimized  |

| **On-Prem / Air-Gapped**      | No cloud dependency, full data sovereignty |

---

## 🚀 Quick Start

### Option 1: Linux Package (.deb) ⭐ Recommended for Linux

Download from [GitHub Releases](https://github.com/cyberlife-coder/VelesDB/releases):

```bash

# Install

sudo dpkg -i velesdb-1.1.0-amd64.deb

# Binaries installed to /usr/bin

velesdb --version

velesdb-server --version

```

### Option 2: One-liner Script

**Linux / macOS:**

```bash

curl -fsSL https://raw.githubusercontent.com/cyberlife-coder/VelesDB/main/scripts/install.sh | bash

```

**Windows (PowerShell):**

```powershell

irm https://raw.githubusercontent.com/cyberlife-coder/VelesDB/main/scripts/install.ps1 | iex

```

### Option 3: Python (from source)

```bash

# Build from source (requires Rust)

cd crates/velesdb-python

pip install maturin

maturin develop --release

```

```python

import velesdb

db = velesdb.Database("./my_vectors")

collection = db.create_collection("docs", dimension=768, metric="cosine")

collection.upsert([{"id": 1, "vector": [...], "payload": {"title": "Hello"}}])

results = collection.search([...], top_k=10)

```

```bash

# Install from PyPI

pip install velesdb

```

### Option 4: Rust (from source)

```bash

# Clone and build

git clone https://github.com/cyberlife-coder/VelesDB.git

cd VelesDB

cargo build --release

# Binaries in target/release/

./target/release/velesdb-server --help

```

```bash

# Install from crates.io

cargo install velesdb-cli

```

### Option 5: Docker (build locally)

```bash

# Build and run locally

git clone https://github.com/cyberlife-coder/VelesDB.git

cd VelesDB

docker build -t velesdb .

docker run -d -p 8080:8080 -v velesdb_data:/data velesdb

```

```bash

# Pull from GitHub Container Registry

docker pull ghcr.io/cyberlife-coder/velesdb:latest

```

### Option 6: Portable Archives

Download from [GitHub Releases](https://github.com/cyberlife-coder/VelesDB/releases):

| Platform | File |

|----------|------|

| Windows | `velesdb-windows-x86_64.zip` |

| Linux | `velesdb-linux-x86_64.tar.gz` |

| macOS (ARM) | `velesdb-macos-arm64.tar.gz` |

| macOS (Intel) | `velesdb-macos-x86_64.tar.gz` |

### Start Using VelesDB

```bash

# Start the REST API server (data persisted in ./data)

velesdb-server --data-dir ./my_data

# Or use the interactive CLI with VelesQL REPL

velesdb repl

# Verify server is running

curl http://localhost:8080/health

# {"status":"healthy","version":"1.1.0"}

```

📖 **Full installation guide:** [docs/INSTALLATION.md](docs/INSTALLATION.md)



### 📱 Mobile Build (iOS/Android)

```bash

# iOS (macOS required)

rustup target add aarch64-apple-ios aarch64-apple-ios-sim

cargo build --release --target aarch64-apple-ios -p velesdb-mobile

# Android (NDK required)

cargo install cargo-ndk

cargo ndk -t arm64-v8a -t armeabi-v7a build --release -p velesdb-mobile

```

📖 **Full mobile guide:** [crates/velesdb-mobile/README.md](crates/velesdb-mobile/README.md)

---

## 📖 Your First Vector Search

```bash

# 1. Create a collection

curl -X POST http://localhost:8080/collections \

  -H "Content-Type: application/json" \

  -d '{"name": "my_vectors", "dimension": 4, "metric": "cosine"}'

# 2. Insert vectors with metadata

curl -X POST http://localhost:8080/collections/my_vectors/points \

  -H "Content-Type: application/json" \

  -d '{

    "points": [

      {"id": 1, "vector": [1.0, 0.0, 0.0, 0.0], "payload": {"title": "AI Introduction", "category": "tech"}},

      {"id": 2, "vector": [0.0, 1.0, 0.0, 0.0], "payload": {"title": "ML Basics", "category": "tech"}},

      {"id": 3, "vector": [0.0, 0.0, 1.0, 0.0], "payload": {"title": "History of Computing", "category": "history"}}

    ]

  }'

# 3. Search for similar vectors

curl -X POST http://localhost:8080/collections/my_vectors/search \

  -H "Content-Type: application/json" \

  -d '{"vector": [0.9, 0.1, 0.0, 0.0], "top_k": 2}'

# 4. Or use VelesQL (SQL-like queries)

curl -X POST http://localhost:8080/query \

  -H "Content-Type: application/json" \

  -d '{

    "query": "SELECT * FROM my_vectors WHERE vector NEAR $v AND category = '\''tech'\'' LIMIT 5",

    "params": {"v": [0.9, 0.1, 0.0, 0.0]}

  }'

```

---

## 🔌 API Reference

### Collections

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/collections` | `GET` | List all collections |

| `/collections` | `POST` | Create a collection |

| `/collections/{name}` | `GET` | Get collection info |

| `/collections/{name}` | `DELETE` | Delete a collection |

### Points

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/collections/{name}/points` | `POST` | Upsert points |

| `/collections/{name}/points/{id}` | `GET` | Get a point by ID |

| `/collections/{name}/points/{id}` | `DELETE` | Delete a point |

### Search (Vector)

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/collections/{name}/search` | `POST` | Vector similarity search |

| `/collections/{name}/search/batch` | `POST` | Batch search (multiple queries) |

| `/collections/{name}/search/multi` | `POST` | Multi-query search |

### Graph

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/collections/{name}/graph/edges` | `GET` | Get edges for a node |

| `/collections/{name}/graph/edges` | `POST` | Add edge between nodes |

| `/collections/{name}/graph/traverse` | `POST` | BFS/DFS graph traversal |

| `/collections/{name}/graph/nodes/{node_id}/degree` | `GET` | Get node degree (in/out) |

### Indexes

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/collections/{name}/indexes` | `GET` | List indexes |

| `/collections/{name}/indexes` | `POST` | Create index on property |

| `/collections/{name}/indexes/{label}/{property}` | `DELETE` | Delete index |

### VelesQL v2.0 (Unified Query)

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/query` | `POST` | Execute VelesQL (Vector + Graph + ColumnStore queries) |

**VelesQL v2.0 Features:**

- `GROUP BY` / `HAVING` with AND/OR operators

- `ORDER BY` multi-column + `similarity()` function

- `JOIN` with aliases across collections

- `UNION` / `INTERSECT` / `EXCEPT` set operations

- `USING FUSION(strategy='rrf')` hybrid search

- `WITH (max_groups=100)` query-time config

```sql

-- Example: Analytics with aggregation

SELECT category, COUNT(*), AVG(price) FROM products 

GROUP BY category HAVING COUNT(*) > 5

-- Example: Hybrid search with fusion

SELECT * FROM docs USING FUSION(strategy='rrf', k=60) LIMIT 20

-- Example: Set operations

SELECT * FROM active UNION SELECT * FROM archived

```

> **Note:** ColumnStore operations (INSERT, UPDATE, SELECT on structured data) are performed via the `/query` endpoint using VelesQL syntax.

### Health

| Endpoint | Method | Description |

|----------|--------|-------------|

| `/health` | `GET` | Health check |

### Request/Response Examples

Create Collection

```bash

curl -X POST http://localhost:8080/collections \

  -H "Content-Type: application/json" \

  -d '{

    "name": "my_vectors",

    "dimension": 768,

    "metric": "cosine"  # Options: cosine, euclidean, dot, hamming, jaccard

  }'

```

**Response:**

```json

{"message": "Collection created", "name": "my_vectors"}

```

Upsert Points

```bash

curl -X POST http://localhost:8080/collections/my_vectors/points \

  -H "Content-Type: application/json" \

  -d '{

    "points": [

      {

        "id": 1,

        "vector": [0.1, 0.2, 0.3, ...],

        "payload": {"title": "Document 1", "tags": ["ai", "ml"]}

      }

    ]

  }'

```

**Response:**

```json

{"message": "Points upserted", "count": 1}

```

Vector Search

```bash

curl -X POST http://localhost:8080/collections/my_vectors/search \

  -H "Content-Type: application/json" \

  -d '{

    "vector": [0.1, 0.2, 0.3, ...],

    "top_k": 10

  }'

```

**Response:**

```json

{

  "results": [

    {"id": 1, "score": 0.95, "payload": {"title": "Document 1"}},

    {"id": 42, "score": 0.87, "payload": {"title": "Document 42"}}

  ]

}

```

Batch Search

```bash

curl -X POST http://localhost:8080/collections/my_vectors/search/batch \

  -H "Content-Type: application/json" \

  -d '{

    "searches": [

      {"vector": [0.1, 0.2, ...], "top_k": 5},

      {"vector": [0.3, 0.4, ...], "top_k": 5}

    ]

  }'

```

**Response:**

```json

{

  "results": [

    {"results": [{"id": 1, "score": 0.95, "payload": {...}}]},

    {"results": [{"id": 2, "score": 0.89, "payload": {...}}]}

  ],

  "timing_ms": 1.23

}

```

VelesQL Query

```bash

curl -X POST http://localhost:8080/query \

  -H "Content-Type: application/json" \

  -d '{

    "query": "SELECT * FROM my_vectors WHERE vector NEAR $v LIMIT 10",

    "params": {"v": [0.1, 0.2, 0.3, ...]}

  }'

```

**Response:**

```json

{

  "results": [

    {"id": 1, "score": 0.95, "payload": {"title": "Document 1"}}

  ],

  "timing_ms": 2.34,

  "rows_returned": 1

}

```

---

## 🧪 Real-World Business Scenarios

> **Each scenario shows a business problem that traditionally requires 2-3 databases. VelesDB solves it with ONE query.**

---

### 💼 Business Scenario 1: E-commerce Product Discovery

**Industry:** Retail / E-commerce  

**Problem:** "Show me products similar to this photo, from trusted suppliers, under $500"

```sql

-- Traditional approach: Pinecone (image search) + Neo4j (supplier trust) + PostgreSQL (price)

-- VelesDB: ONE query

MATCH (product:Product)-[:SUPPLIED_BY]->(supplier:Supplier)

WHERE 

  similarity(product.image_embedding, $uploaded_photo) > 0.7  -- Vector: visual similarity

  AND supplier.trust_score > 4.5                               -- Graph: relationship data

  AND (SELECT price FROM inventory WHERE sku = product.sku) < 500  -- Column: real-time price

ORDER BY similarity() DESC

LIMIT 12

```

**Business Impact:**

| Metric | Before | After VelesDB |

|--------|--------|---------------|

| Query latency | 350ms (3 DBs) | **2ms** |

| Infrastructure | $2,400/mo | **$0** (local) |

| Dev complexity | 3 integrations | **1 API** |

---

### 💼 Business Scenario 2: Fraud Detection in Real-Time

**Industry:** Banking / FinTech  

**Problem:** "Flag transactions that look suspicious based on pattern + network + history"

```sql

-- Detect fraud: semantic pattern + transaction network + account history

MATCH (tx:Transaction)-[:FROM]->(account:Account)-[:LINKED_TO*1..3]->(related:Account)

WHERE 

  similarity(tx.behavior_embedding, $known_fraud_pattern) > 0.6  -- Vector: behavioral similarity

  AND related.risk_level = 'high'                                 -- Graph: network analysis

  AND (SELECT SUM(amount) FROM transactions 

       WHERE account_id = account.id 

       AND timestamp > NOW() - INTERVAL '24 hours') > 10000       -- Column: velocity check

RETURN tx.id, account.id, similarity() as fraud_score

```

**Business Impact:**

| Metric | Before | After VelesDB |

|--------|--------|---------------|

| Detection time | 2-5 seconds | **< 10ms** |

| False positives | 15% | **8%** (better context) |

| Compliance | Cloud concerns | **On-premise OK** |

---

### 💼 Business Scenario 3: Healthcare Diagnosis Assistant

**Industry:** Healthcare / MedTech  

**Problem:** "Find similar patient cases with treatment outcomes, HIPAA-compliant"

```sql

-- Medical RAG: symptoms + patient network + treatment history

MATCH (patient:Patient)-[:HAS_CONDITION]->(condition:Condition)

      -[:TREATED_WITH]->(treatment:Treatment)

WHERE 

  similarity(condition.symptoms_embedding, $current_symptoms) > 0.75  -- Vector: symptom matching

  AND condition.icd10_code IN ('J18.9', 'J12.89')                     -- Column: specific diagnoses

  AND (SELECT success_rate FROM treatment_outcomes 

       WHERE treatment_id = treatment.id) > 0.8                       -- Column: outcome data

RETURN treatment.name, AVG(success_rate) as effectiveness

```

**Business Impact:**

| Metric | Before | After VelesDB |

|--------|--------|---------------|

| Data location | Cloud (HIPAA risk) | **100% on-premise** |

| Query time | 500ms+ | **< 5ms** |

| Integration | 3 vendors | **1 binary** |

---

### 💼 Business Scenario 4: AI Agent Memory (RAG + Context)

**Industry:** AI / SaaS  

**Problem:** "My AI agent needs conversation history + knowledge base + user preferences"

```sql

-- Agent memory: semantic recall + conversation graph + user context

MATCH (user:User)-[:HAD_CONVERSATION]->(conv:Conversation)

      -[:CONTAINS]->(message:Message)

WHERE 

  similarity(message.embedding, $current_query) > 0.7     -- Vector: relevant past messages

  AND conv.timestamp > NOW() - INTERVAL '7 days'          -- Column: recent conversations

  AND (SELECT preference_value FROM user_preferences 

       WHERE user_id = user.id AND key = 'topic') = message.topic  -- Column: user prefs

ORDER BY conv.timestamp DESC, similarity() DESC

LIMIT 10

```

**Business Impact:**

| Metric | Before | After VelesDB |

|--------|--------|---------------|

| Context retrieval | 100-200ms | **< 1ms** |

| Memory footprint | 500MB+ | **15MB binary** |

| Works offline | ❌ | **✅** |

---

### Scenario 0: Technical Deep-Dive (Vector + Graph + ColumnStore)

**Goal:** Demonstrate the power of cross-model queries - finding semantically similar documents through graph relationships with structured data filtering

```sql

-- 🔮 The VelesDB Advantage: One query across all three stores

MATCH (doc:Document)-[:AUTHORED_BY]->(author:Author)

WHERE 

  similarity(doc.embedding, $research_question) > 0.8   -- Vector: semantic search

  AND doc.category = 'peer-reviewed'                     -- Column: structured filter

  AND (SELECT citation_count FROM author_metrics         -- Column: subquery

       WHERE author_id = author.id) > 50

ORDER BY similarity() DESC

LIMIT 5

```

**What's happening:**

1. **Graph traversal**: `MATCH` finds document→author relationships

2. **Vector search**: `similarity()` ranks by semantic relevance to your question

3. **Columnar filter**: `category = 'peer-reviewed'` filters structured metadata

4. **Columnar subquery**: Joins with `author_metrics` table for citation counts

**Expected Output:**

```json

{

  "results": [

    {

      "doc.title": "Neural Memory Consolidation in AI Agents",

      "author.name": "Dr. Sarah Chen",

      "similarity": 0.94,

      "citation_count": 127

    }

  ],

  "timing_ms": 0.8

}

```

**Why this matters:** This query would require 3 separate databases and complex synchronization logic in a traditional stack. With VelesDB: **one query, sub-millisecond response**.

---

### Scenario 0b: Multi-Vector Fusion Search (NEAR_FUSED)

**Goal:** Search using multiple query vectors simultaneously with intelligent result fusion

```sql

-- 🔮 Multi-modal search: combine text + image embeddings

SELECT * FROM products 

WHERE vector NEAR_FUSED [$text_embedding, $image_embedding] 

  USING FUSION 'rrf' (k=60)

  AND category = 'electronics'

ORDER BY similarity() DESC

LIMIT 10

```

**Fusion Strategies Available:**

| Strategy | Syntax | Best For |

|----------|--------|----------|

| **RRF** | `USING FUSION 'rrf' (k=60)` | Robust rank-based fusion (recommended) |

| **Average** | `USING FUSION 'average'` | General purpose, balanced results |

| **Maximum** | `USING FUSION 'maximum'` | Emphasize documents scoring high in ANY query |

| **Weighted** | `USING FUSION 'weighted' (avg_weight=0.5, max_weight=0.3, hit_weight=0.2)` | Custom control over fusion factors |

**Real-World Use Cases:**

```sql

-- E-commerce: "show me products like this photo that match 'wireless headphones'"

SELECT * FROM products 

WHERE vector NEAR_FUSED [$image_vector, $text_vector] 

  USING FUSION 'weighted' (avg_weight=0.6, max_weight=0.3, hit_weight=0.1)

LIMIT 8

-- RAG: Multi-perspective document retrieval

SELECT * FROM documents 

WHERE vector NEAR_FUSED [$question_embedding, $context_embedding, $user_profile_embedding]

  USING FUSION 'rrf'

LIMIT 5

-- Semantic + Lexical hybrid (BM25 + Vector)

SELECT * FROM articles

WHERE content MATCH 'artificial intelligence'

  AND vector NEAR $semantic_embedding

ORDER BY similarity() DESC

LIMIT 10

```

**Expected Output:**

```json

{

  "results": [

    {"id": 42, "score": 0.91, "fusion_details": {"rrf_rank": 1, "sources": 2}},

    {"id": 17, "score": 0.87, "fusion_details": {"rrf_rank": 2, "sources": 2}}

  ],

  "timing_ms": 1.2

}

```

---

### Scenario 0c: Distance Metrics for Every Use Case

**Goal:** Choose the right metric for your data type and domain

VelesDB supports **5 distance metrics** - each optimized for specific use cases:

| Metric | Best For | Example Domain |

|--------|----------|----------------|

| **Cosine** | Text embeddings, normalized vectors | NLP, semantic search |

| **Euclidean** | Spatial data, absolute distances | Geolocation, clustering |

| **DotProduct** | Pre-normalized embeddings, retrieval | RAG, recommendations |

| **Hamming** | Binary vectors, fingerprints | Image hashing, DNA |

| **Jaccard** | Set similarity, sparse features | Tags, categories |

**1. Cosine Similarity (NLP / Semantic Search)**

```bash

# Create collection with cosine metric

curl -X POST http://localhost:8080/collections \

  -d '{"name": "documents", "dimension": 768, "metric": "cosine"}'

```

```sql

-- Find semantically similar documents (angle-based, ignores magnitude)

SELECT * FROM documents 

WHERE vector NEAR $query_embedding

ORDER BY similarity() DESC

LIMIT 10

```

**Use case:** ChatGPT-style RAG, document similarity, semantic Q&A

---

**2. Euclidean Distance (Spatial / Clustering)**

```bash

curl -X POST http://localhost:8080/collections \

  -d '{"name": "locations", "dimension": 3, "metric": "euclidean"}'

```

```sql

-- Find nearest physical locations (absolute distance matters)

SELECT * FROM locations 

WHERE vector NEAR $gps_coordinates

  AND category = 'restaurant'

ORDER BY similarity() ASC  -- Lower = closer

LIMIT 5

```

**Use case:** Geospatial search, k-means clustering, anomaly detection

---

**3. Dot Product (RAG / Recommendations)**

```bash

curl -X POST http://localhost:8080/collections \

  -d '{"name": "products", "dimension": 512, "metric": "dot"}'

```

```sql

-- Maximize relevance score (pre-normalized embeddings)

SELECT * FROM products 

WHERE vector NEAR $user_preference_vector

  AND in_stock = true

ORDER BY similarity() DESC

LIMIT 8

```

**Use case:** Recommendation engines, MaxIP retrieval, MIPS problems

---

**4. Hamming Distance (Binary Vectors / Fingerprints)**

```bash

curl -X POST http://localhost:8080/collections \

  -d '{"name": "image_hashes", "dimension": 256, "metric": "hamming"}'

```

```sql

-- Find near-duplicate images (bit-level comparison, 6ns latency!)

SELECT * FROM image_hashes 

WHERE vector NEAR $perceptual_hash

  AND source = 'user_uploads'

ORDER BY similarity() ASC  -- Fewer bit differences = more similar

LIMIT 10

```

**Use case:** Image deduplication, perceptual hashing, DNA sequence matching, malware signatures

---

**5. Jaccard Similarity (Sets / Sparse Features)**

```bash

curl -X POST http://localhost:8080/collections \

  -d '{"name": "user_tags", "dimension": 100, "metric": "jaccard"}'

```

```sql

-- Find users with similar interests (set overlap)

SELECT * FROM user_tags 

WHERE vector NEAR $current_user_tags

ORDER BY similarity() DESC

LIMIT 20

```

**Use case:** Tag-based recommendations, category matching, collaborative filtering

---

**Performance by Metric (768D vectors):**

| Metric | Latency | Throughput | SIMD Optimized |

|--------|---------|------------|----------------|

| **Cosine** | 78 ns | 13M ops/sec | ✅ AVX-512 |

| **Euclidean** | 44 ns | 23M ops/sec | ✅ AVX-512 |

| **DotProduct** | 66 ns | 15M ops/sec | ✅ AVX-512 |

| **Hamming** | **6 ns** | **164M ops/sec** | ✅ POPCNT |

| **Jaccard** | 89 ns | 11M ops/sec | ✅ AVX2 |

> **Tip:** Hamming is 10x faster than float metrics - ideal for binary embeddings on edge devices!

---

### Scenario 1: Medical Research Assistant

**Goal:** Find recent oncology studies with specific gene mentions, ordered by relevance

```sql

SELECT study_id, title, publication_date 

FROM medical_studies 

WHERE 

  vector NEAR $cancer_research_embedding 

  AND content LIKE '%BRCA1%' 

  AND publication_date > '2025-01-01'

ORDER BY similarity() DESC 

LIMIT 5

```

**Parameters:**

- `$cancer_research_embedding`: [0.23, 0.87, -0.12, ...] (embedding for "advanced cancer immunotherapy")

**Expected Output:**

```json

{

  "results": [

    {

      "study_id": "onco-2025-042", 

      "title": "BRCA1 Mutations in Immunotherapy Response",

      "publication_date": "2025-03-15",

      "score": 0.92

    },

    {

      "study_id": "onco-2025-017",

      "title": "Gene Editing Approaches for Metastatic Cancer",

      "publication_date": "2025-02-28",

      "score": 0.87

    }

  ]

}

```

---

### Scenario 2: E-commerce Recommendation Engine

**Goal:** Recommend products similar to a user's purchase history, within their price range

```sql

SELECT product_id, name, price 

FROM products 

WHERE 

  vector NEAR $user_preferences 

  AND price BETWEEN 20.00 AND 100.00 

  AND category = 'electronics'

ORDER BY similarity() DESC, price ASC 

LIMIT 8

```

**Parameters:**

- `$user_preferences`: [0.78, -0.23, 0.45, ...] (embedding based on user's purchase history)

**Expected Output:**

```json

{

  "results": [

    {

      "product_id": "prod-67890",

      "name": "Wireless Noise-Cancelling Headphones",

      "price": 89.99,

      "score": 0.95

    },

    {

      "product_id": "prod-54321",

      "name": "Bluetooth Portable Speaker",

      "price": 59.99,

      "score": 0.91

    }

  ]

}

```

---

### Scenario 3: Cybersecurity Threat Detection

**Goal:** Find similar malware patterns observed in the last 7 days

```sql

SELECT malware_hash, threat_level, first_seen 

FROM threat_intel 

WHERE 

  vector NEAR $current_threat_embedding 

  AND first_seen > NOW() - INTERVAL '7 days'

  AND threat_level > 0.8

ORDER BY similarity() DESC, first_seen DESC

LIMIT 10

```

**Parameters:**

- `$current_threat_embedding`: [0.12, -0.87, 0.34, ...] (embedding of current malware signature)

**Troubleshooting Tip:** If no results appear, verify:

1. Threat intelligence feed is updating daily

2. Vector dimensions match collection configuration

3. Timestamp format matches ISO 8601 (YYYY-MM-DD HH:MM:SS)

---

## ⚡ Performance

### 🔥 Core Vector Operations (768D - BERT/OpenAI dimensions)

| Operation | Latency | Throughput | vs. Naive |

|-----------|---------|------------|----------|

| **Dot Product (1536D)** | **66 ns** | **15M ops/sec** | 🚀 **8x faster** |

| **Euclidean (768D)** | **44 ns** | **23M ops/sec** | 🚀 **6x faster** |

| **Cosine (768D)** | **78 ns** | **13M ops/sec** | 🚀 **4x faster** |

| **Hamming (Binary)**| **6 ns** | **164M ops/sec** | 🚀 **10x faster** |

### 📊 System Performance (10K Vectors, Local)

| Benchmark | Result | Details |

|-----------|--------|---------|

| **HNSW Search** | **57 µs** | p50 latency (Cosine) |

| **VelesQL Parsing**| **554 ns** | Simple SELECT |

| **VelesQL Cache Hit**| **48 ns** | HashMap pre-allocation |

| **Recall@10** | **100%** | Perfect mode (brute-force SIMD) |

| **BM25 Search** | **33 µs** | Adaptive PostingList (10K docs) |

### 🎯 Search Quality (Recall)

| Mode | Recall@10 | Latency (128D) | Use Case |

|------|-----------|----------------|----------|

| Fast | 92.2% | ~26µs | Real-time, high throughput |

| Balanced | 98.8% | ~39µs | Production recommended |

| Accurate | 100% | ~67µs | High precision |

| **Perfect** | **100%** | ~220µs | Brute-force SIMD |

### 🛠️ Optimizations Under the Hood

- **SIMD**: AVX-512/AVX2 auto-detection with 32-wide FMA

- **Prefetch**: CPU cache warming for HNSW traversal (+12% throughput)

- **Contiguous Layout**: 64-byte aligned memory for cache efficiency

- **Batch WAL**: Single disk write per batch import

- **Zero-Copy**: Memory-mapped files for fast startup

> 📊 Full benchmarks: [docs/BENCHMARKS.md](docs/BENCHMARKS.md)

### 📦 Vector Quantization (Memory Reduction)

Reduce memory usage by **4-32x** with minimal recall loss:

| Method | Compression | Recall Loss | Use Case |

|--------|-------------|-------------|----------|

| **SQ8** (8-bit) | **4x** | < 2% | General purpose, Edge |

| **Binary** (1-bit) | **32x** | ~10-15% | Fingerprints, IoT |

```rust

use velesdb_core::quantization::{QuantizedVector, dot_product_quantized_simd};

// Compress 768D vector: 3072 bytes → 776 bytes (4x reduction)

let quantized = QuantizedVector::from_f32(&embedding);

// SIMD-optimized search (only ~30% slower than f32)

let similarity = dot_product_quantized_simd(&query, &quantized);

```

> 📖 Full guide: [docs/QUANTIZATION.md](docs/QUANTIZATION.md)

---

## 🚀 Transformative Benefits: How VelesDB Changes Development

### ⚡ Eliminates Database Sprawl

VelesDB replaces 3+ specialized databases (vector DB + graph DB + document store) with a **single unified engine**.

```mermaid

graph LR

A[App] --> V[VelesDB]

```

**Impact:**

- ✅ **70% less infrastructure code**

- ✅ **No synchronization headaches**

- ✅ **Single query language for all operations**

### 💡 Enables New Application Categories

With air-gapped deployment and 15MB binary size:

```mermaid

pie title Deployment Locations

    "On-Prem Servers" : 35

    "Edge Devices" : 25

    "Mobile Apps" : 20

    "Browser WASM" : 20

```

**Impact:**

- ✅ **Build HIPAA-compliant healthcare apps**

- ✅ **Create military-grade analytics** for air-gapped environments

- ✅ **Enable privacy-first consumer apps** with zero data sharing

### 🚀 Redefines Performance Expectations

| Pipeline Step | Cloud Vector DB | VelesDB |

|---------------|-----------------|---------|

| Network round-trip | 50-100ms | **0ms** (local) |

| Vector search | 10-50ms | **0.057ms** |

| Graph traversal | 20-100ms | **0.1ms** |

| **Total latency** | **100-250ms** | **< 1ms** |

> 💡 **100x faster** enables use cases impossible with cloud: real-time autocomplete, instant RAG, sub-frame game AI

**Impact:**

- ✅ **Build real-time AI agents** that respond faster than human perception

- ✅ **Enable complex RAG chains** with 10+ sequential retrievals

- ✅ **Create instant search experiences** with no loading spinners

### 💼 Unified API Simplifies Development

One consistent API across all platforms:

```rust

// Same API everywhere

let results = db.search(query_vector, filters, graph_traversal);

```

**Impact:**

- ✅ **Learn once, deploy everywhere**

- ✅ **Shared codebase** between web, mobile, and desktop

- ✅ **Eliminate platform-specific database code**

---

## ✨ Core Features That Transform Development

| Feature | Technical Capability | Real-World Impact |

|---------|----------------------|-------------------|

| **🧠 Vector + Graph Fusion** | Unified query language for semantic + relationship queries | **Build smarter AI agents** with contextual understanding |

| **⚡ 57µs Search** | Native HNSW + AVX-512 SIMD | **Create real-time experiences** previously impossible |

| **📦 15MB Binary** | Zero dependencies, single executable | **Deploy anywhere** - from servers to edge devices |

| **🔒 Air-Gapped Deployment** | Full functionality without internet | **Meet strict compliance** in healthcare/finance |

| **🌍 Everywhere Runtime** | Consistent API across server/mobile/browser | **Massive code reuse** across platforms |

| **🧠 SQ8 Quantization** | 4x memory reduction | **Run complex AI** on resource-constrained devices |

| **📝 VelesQL** | SQL-like unified query language | **Simplify complex queries** - no DSL learning curve |

---

## 🏆 Real-World Impact Stories

### 🏥 Healthcare Diagnostics Assistant

**Before VelesDB:**

- 300ms latency per query

- Patient data in cloud

- Separate systems for medical knowledge and patient data

**With VelesDB:**

- **0.6ms diagnosis suggestions**

- **On-device patient data**

- **Unified medical knowledge graph**

```mermaid

pie title Performance Improvement

    "Diagnosis Speed" : 85

    "Accuracy" : 10

    "Privacy" : 5

```

### 🏭 Manufacturing Quality Control

**Before VelesDB:**

- Cloud dependency caused production delays

- Separate systems for defect images and part metadata

**With VelesDB:**

```sql

MATCH (part)-[HAS_DEFECT]->(defect)

WHERE defect.vector NEAR $image_vec

AND part.material = 'titanium'

```

- **50% fewer defective shipments**

- **Offline factory floor operation**

- **Unified defect database**

---

## 🤝 Contributing

We welcome contributions! Here's how to get started:

### Development Setup

```bash

# Clone the repo

git clone https://github.com/cyberlife-coder/VelesDB.git

cd VelesDB

# Run tests

cargo test --all-features

# Run with checks (before committing)

cargo fmt --all

cargo clippy --all-targets --all-features -- -D warnings

```

### Project Structure

```

VelesDB/

├── crates/

│   ├── velesdb-core/     # Core engine library

│   │   ├── src/

│   │   │   ├── collection/   # Collection management

│   │   │   ├── index/        # HNSW index

│   │   │   ├── storage/      # Persistence layer

│   │   │   ├── velesql/      # Query language parser

│   │   │   └── simd/         # SIMD optimizations

│   │   └── tests/

│   ├── velesdb-server/   # REST API server

│   ├── velesdb-mobile/   # iOS/Android bindings (UniFFI)

│   ├── velesdb-wasm/     # WebAssembly module

│   └── velesdb-python/   # Python bindings (PyO3)

├── benches/              # Benchmarks

└── docs/                 # Documentation

```

### Good First Issues

Looking for a place to start? Check out issues labeled [`good first issue`](https://github.com/cyberlife-coder/VelesDB/labels/good%20first%20issue).

---

## 📊 Roadmap

```mermaid

gantt

    title VelesDB Development Timeline

    dateFormat YYYY-MM

    section v1.2 ✅

    Knowledge Graph Storage     :done, 2025-10, 2025-12

    VelesQL MATCH Clause        :done, 2025-11, 2025-12

    Vector-Graph Fusion         :done, 2025-12, 2026-01

    ColumnStore CRUD            :done, 2026-01, 2026-01

    VelesQL JOIN Cross-Store    :done, 2026-01, 2026-01

    section v1.3 🔄

    Aggregations VelesQL        :active, 2026-02, 2026-03

    Agent Memory Patterns       :2026-02, 2026-04

    Documentation               :2026-02, 2026-03

    section v1.4 📋

    E2E Test Suite              :2026-04, 2026-05

    Durability & Recovery       :2026-04, 2026-06

```

### Progress Overview

| Version | Status | EPICs Done | Progress |

|---------|--------|------------|----------|

| **v1.2.0** | ✅ Released | 15/15 | ![100%](https://progress-bar.xyz/100?title=Complete) |

| **v1.3.0** | 🔄 In Progress | 0/6 | ![4%](https://progress-bar.xyz/4?title=Building) |

| **v1.4.0** | 📋 Planned | 0/5 | ![0%](https://progress-bar.xyz/0?title=Planned) |

---

### v1.2.0 ✅ Released (January 2026)

15 EPICs Completed - Click to expand

| EPIC | Feature | Impact |

|------|---------|--------|

| EPIC-001 | ✅ Code Quality Refactoring | Clean architecture |

| EPIC-002 | ✅ GPU Acceleration (wgpu) | 10x throughput |

| EPIC-003 | ✅ PyO3 Migration | Python 3.12+ support |

| EPIC-004 | ✅ Knowledge Graph Storage | GraphNode, GraphEdge, BFS |

| EPIC-005 | ✅ VelesQL MATCH Clause | Cypher-inspired queries |

| EPIC-006 | ✅ Agent Toolkit SDK | Python, WASM, Mobile |

| EPIC-007 | ✅ Python Bindings Refactoring | Clean API |

| EPIC-008 | ✅ Vector-Graph Fusion | `similarity()` in MATCH |

| EPIC-009 | ✅ Graph Property Index | 10x faster MATCH |

| EPIC-019 | ✅ Scalability 10M entries | Enterprise datasets |

| EPIC-020 | ✅ ColumnStore CRUD | Real-time updates |

| EPIC-021 | ✅ VelesQL JOIN Cross-Store | Graph ↔ Table queries |

| EPIC-028 | ✅ ORDER BY Multi-Columns | Complex sorting |

| EPIC-029 | ✅ Python SDK Core Delegation | DRY bindings |

| EPIC-031 | ✅ Multimodel Query Engine | Unified execution |

**Highlights:**

- 🧠 **Knowledge Graph** - Full MATCH clause with BFS traversal

- 🔮 **Vector-Graph Fusion** - `WHERE similarity() > 0.8` in graph queries

- 📊 **ColumnStore** - Real-time CRUD with JOIN support

- 📦 **Published** - crates.io, PyPI, npm

---

### v1.3.0 🔄 In Progress (Q1 2026)

| EPIC | Feature | Priority | Progress |

|------|---------|----------|----------|

| EPIC-016 | **SDK Ecosystem Sync** | 🔥 Critical | 🔄 21% (3/14 US) |

| EPIC-017 | **Aggregations VelesQL** | 🔥 Critical | 📋 0% |

| EPIC-010 | **Agent Memory Patterns SDK** | 🚀 High | 📋 0% |

| EPIC-018 | **Documentation & Examples** | 🚀 High | 📋 0% |

| EPIC-012 | **TypeScript SDK** | 📋 Medium | 📋 0% |

| EPIC-013 | **LangChain/LlamaIndex** | 📋 Medium | 📋 0% |

**Coming Soon:**

```sql

-- Aggregations (EPIC-017)

SELECT category, COUNT(*), AVG(price) 

FROM products 

GROUP BY category

HAVING COUNT(*) > 10

-- Agent Memory Patterns (EPIC-010)  

INSERT INTO agent_memory (episode, embedding, context)

VALUES ('task_123', $vec, '{"goal": "find documents"}')

```

---

### v1.4.0 📋 Planned (Q2 2026)

| EPIC | Feature | Focus |

|------|---------|-------|

| EPIC-011 | **E2E Test Suite** | Quality assurance |

| EPIC-024 | **Durability & Crash Recovery** | Database-grade reliability |

| EPIC-022 | **Unsafe Auditability** | Security audit |

| EPIC-023 | **Loom Concurrency Proofs** | Thread safety |

| EPIC-025 | **Miri/Fuzzing** | Memory safety |

---

### Future Vision

```mermaid

pie title VelesDB Feature Distribution

    "Vector Search" : 30

    "Knowledge Graph" : 25

    "ColumnStore" : 20

    "Query Engine" : 15

    "SDKs & Integrations" : 10

```

| Horizon | Features |

|---------|----------|

| **2026 H2** | Sparse vectors, Product Quantization (PQ) |

| **2027** | Distributed mode (Premium), Cluster HA |

| **Beyond** | Agent Hooks & Triggers, Multi-tenancy |

---

## 📜 License

VelesDB is licensed under the [Elastic License 2.0 (ELv2)](LICENSE).

ELv2 is a source-available license that allows free use, modification, and distribution, with restrictions only on providing the software as a managed service.

---

## ⭐ Support VelesDB

If VelesDB helps your project, here's how you can support us:



  

    

  

  

    

  



### ☕ Buy Me A Coffee

Building VelesDB takes countless hours. If you find it useful, consider supporting development:



  

    

  



### 🏷️ Show You Use VelesDB

Add the badge to your project:

[![Powered by VelesDB](https://img.shields.io/badge/Powered_by-VelesDB-blue?style=flat-square)](https://github.com/cyberlife-coder/VelesDB)

```markdown

[![Powered by VelesDB](https://img.shields.io/badge/Powered_by-VelesDB-blue?style=flat-square)](https://github.com/cyberlife-coder/VelesDB)

```

---



  Built with ❤️ and 🦀 Rust





  Original Author: Julien Lange — WiScale





  ⭐ GitHub •

  📖 Documentation •

  🐛 Issues •

  📦 Releases





  _{Don't forget to ⭐ star the repo if you find VelesDB useful!}