https://github.com/knowusuboaky/vectrixdb
Where vectors come alive - A lightweight, visual-first vector database with embedded ML models. No API keys required.
https://github.com/knowusuboaky/vectrixdb
embeddings graphrag machine-learning onnx python semantic-search similarity-search vector-database
Last synced: about 1 month ago
JSON representation
Where vectors come alive - A lightweight, visual-first vector database with embedded ML models. No API keys required.
- Host: GitHub
- URL: https://github.com/knowusuboaky/vectrixdb
- Owner: knowusuboaky
- License: other
- Created: 2026-01-31T05:47:14.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2026-05-15T18:32:09.000Z (about 1 month ago)
- Last Synced: 2026-05-15T20:58:01.363Z (about 1 month ago)
- Topics: embeddings, graphrag, machine-learning, onnx, python, semantic-search, similarity-search, vector-database
- Language: Python
- Homepage: https://pypi.org/project/vectrixdb/
- Size: 689 KB
- Stars: 2
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# VectrixDB
[](https://opensource.org/licenses/Apache-2.0)
[](https://pypi.org/project/vectrixdb/)
[](https://pypi.org/project/vectrixdb/)
[](https://pepy.tech/project/vectrixdb)
[](https://github.com/knowusuboaky/VectrixDB/issues)
[](mailto:kwadwo.owusuboakye@outlook.com)
**Where vectors come alive.**
A lightweight vector database with embedded ML models, beautiful dashboard, and GraphRAG - no API keys required.
---
## Features
- **4 Search Modes** - Dense, Hybrid, Ultimate, and Graph (GraphRAG)
- **8 Storage Backends** - Memory, SQLite, Lakebase, DeltaLake, CosmosDB, PostgreSQL, OpenSearch, Aurora PostgreSQL
- **Embedded Models** - Works offline with bundled ONNX models
- **Model Selection** - Choose from bundled, HuggingFace, or GitHub release models
- **Document Index** - Hierarchical document storage with chunking
- **Visual Dashboard** - Built-in web UI for managing collections
- **Zero Config** - Just `pip install` and start using
---
## Installation
### From PyPI (Recommended)
```bash
pip install vectrixdb
```
### From GitHub (Latest)
```bash
pip install git+https://github.com/knowusuboaky/VectrixDB.git
```
### Specific Version from GitHub
```bash
pip install git+https://github.com/knowusuboaky/VectrixDB.git@v2.0.0
```
### From Source
```bash
git clone https://github.com/knowusuboaky/VectrixDB.git
cd VectrixDB
pip install -e .
```
### Optional Dependencies
```bash
# HuggingFace sentence-transformers
pip install vectrixdb[hf]
# FastEmbed (lightweight ONNX embeddings)
pip install vectrixdb[fastembed]
# All embedding providers
pip install vectrixdb[embeddings]
# Visualization (UMAP)
pip install vectrixdb[viz]
# Everything
pip install vectrixdb[all]
```
---
## Quick Start
```python
from vectrixdb import Vectrix
db = Vectrix("my_docs")
db.add(["Python is great", "JavaScript powers the web", "Rust is fast"])
results = db.search("programming")
print(results.top.text)
```
---
## Search Modes
VectrixDB offers 4 search modes, each building on the previous:
| Mode | Components | Best For |
|------|------------|----------|
| `dense` | Vector similarity | Fast semantic search |
| `hybrid` | Dense + Sparse + Reranker | Keyword + semantic matching |
| `ultimate` | Hybrid + ColBERT | Maximum accuracy |
| `graph` | Ultimate + Knowledge Graph | Complex reasoning (GraphRAG) |
```python
# Choose your mode
db = Vectrix("docs", mode="dense") # Fastest
db = Vectrix("docs", mode="hybrid") # Balanced
db = Vectrix("docs", mode="ultimate") # Best quality
db = Vectrix("docs", mode="graph") # GraphRAG
```
---
## Model Selection
Customize models for each component. Models load from 3 sources:
### 1. Bundled Models (Offline, No Downloads)
Pre-packaged ONNX models that work without internet (~100MB total):
```python
db = Vectrix(
"docs",
mode="ultimate",
dense_model="e5-small", # 384 dim, 33MB
sparse_model="bm25", # 1MB
reranker_model="L12", # 33MB
late_interaction_model="colbert", # 33MB
)
```
| Component | Alias | Model | Dimension | Size |
|-----------|-------|-------|-----------|------|
| Dense | `e5-small` | intfloat/e5-small-v2 | 384 | 33MB |
| Sparse | `bm25` | BM25 vocabulary | - | 1MB |
| Reranker | `L12` | ms-marco-MiniLM-L12-v2 | - | 33MB |
| ColBERT | `colbert` | answerai-colbert-small-v1 | 128 | 33MB |
### 2. GitHub Release Models (Auto-Downloaded)
Larger models hosted on GitHub releases (downloaded on first use):
```python
db = Vectrix(
"docs",
mode="ultimate",
dense_model="bge-base", # 768 dim, higher quality
sparse_model="bm25",
reranker_model="bge-reranker", # Higher quality
late_interaction_model="colbert-v2",
)
```
| Alias | Model | Dimension | Size |
|-------|-------|-----------|------|
| `bge-base` | BAAI/bge-base-en-v1.5 | 768 | 110MB |
| `bge-small` | BAAI/bge-small-en-v1.5 | 384 | 127MB |
| `bge-reranker` | BAAI/bge-reranker-base | - | 212MB |
| `colbert-v2` | colbert-ir/colbertv2.0 | 128 | 67MB |
| `splade` | SPLADE++ | - | 508MB |
### 3. HuggingFace Models
Use any compatible model from HuggingFace (requires `pip install vectrixdb[hf]`):
```python
db = Vectrix(
"docs",
mode="hybrid",
dense_model="BAAI/bge-large-en-v1.5",
sparse_model="naver/splade-cocondenser-ensembledistil",
reranker_model="cross-encoder/ms-marco-MiniLM-L-12-v2",
)
```
**Compatible models:**
- Dense: `BAAI/bge-large-en-v1.5`, `intfloat/e5-large-v2`, `sentence-transformers/all-mpnet-base-v2`
- Sparse: `naver/splade-cocondenser-ensembledistil`
- Reranker: `cross-encoder/ms-marco-MiniLM-L-12-v2`, `BAAI/bge-reranker-base`
- ColBERT: `jinaai/jina-colbert-v2`, `colbert-ir/colbertv2.0`
---
## Storage Backends
VectrixDB supports 8 storage backends:
| Backend | Type | Persistence | Modes | Best For |
|---------|------|-------------|-------|----------|
| `memory` | In-Memory | No | All | Testing, small datasets |
| `sqlite` | File-based | Yes | All | Local development |
| `lakebase` | PostgreSQL + pgvector | Yes | All | Databricks Lakebase |
| `delta_lake` | Delta Lake | Yes | All | Databricks Unity Catalog |
| `cosmosdb` | Azure CosmosDB | Yes | All | Azure cloud |
| `postgresql` | PostgreSQL + pgvector | Yes | All | Self-hosted PostgreSQL |
| `opensearch` | AWS OpenSearch | Yes | Dense, Hybrid | AWS managed search |
| `aurora_postgresql` | AWS Aurora + pgvector | Yes | All | AWS managed PostgreSQL |
### Memory Storage (Default)
```python
from vectrixdb import VectrixDB, StorageConfig, StorageBackend
# In-memory (default, no persistence)
db = VectrixDB()
# Or explicitly
config = StorageConfig(backend=StorageBackend.MEMORY)
db = VectrixDB(storage_config=config)
```
### SQLite Storage (Local Persistence)
```python
from vectrixdb import VectrixDB
# SQLite with file path
db = VectrixDB(path="./my_vectors")
# Creates: ./my_vectors/vectrix.db
```
### Lakebase Storage (Databricks)
```python
from vectrixdb import Vectrix, VectrixDB
# Connect to Lakebase (PostgreSQL + pgvector)
lakebase = VectrixDB.with_lakebase(
host="your-lakebase-host.cloud.databricks.com",
database="databricks_postgres",
user="your-user",
password="your-oauth-token", # OAuth JWT from Lakebase Connect
port=5432,
schema="public", # Optional, defaults to "public"
)
# Use with Vectrix
db = Vectrix(
"products",
mode="ultimate",
storage_backend=lakebase,
)
db.add(texts=["Product A", "Product B"])
results = db.search("query")
```
### Delta Lake Storage (Databricks Unity Catalog)
```python
from vectrixdb import VectrixDB
# Connect to Delta Lake via Databricks SQL
delta = VectrixDB.with_delta_lake(
workspace_url="https://your-workspace.cloud.databricks.com",
token="dapi_your_token",
catalog="main",
schema="vectrixdb",
warehouse_id="your_warehouse_id",
)
# Use with Vectrix
db = Vectrix("products", mode="hybrid", storage_backend=delta)
```
### CosmosDB Storage (Azure)
```python
from vectrixdb import VectrixDB, StorageConfig, StorageBackend
config = StorageConfig(
backend=StorageBackend.COSMOSDB,
cosmos_endpoint="https://your-account.documents.azure.com:443/",
cosmos_key="your-primary-key",
cosmos_database="vectrixdb",
)
db = VectrixDB(storage_config=config)
```
### OpenSearch Storage (AWS)
AWS OpenSearch Serverless with native k-NN vector search.
> **Note**: OpenSearch supports `dense` and `hybrid` modes only.
```python
from vectrixdb import VectrixDB
opensearch = VectrixDB.with_opensearch(
endpoint="https://xxx.us-east-1.aoss.amazonaws.com",
region="us-east-1",
)
```
### Aurora PostgreSQL Storage (AWS)
AWS Aurora PostgreSQL with pgvector. Supports all modes including `ultimate`.
```python
from vectrixdb import VectrixDB
aurora = VectrixDB.with_aurora_postgresql(
host="cluster.xxx.us-east-1.rds.amazonaws.com",
database="vectrixdb",
user="admin",
password="password",
)
```
### Adaptive Schema
Schema adapts based on selected mode:
| Mode | Columns Created |
|------|-----------------|
| `dense` | `id`, `dense_embedding`, `metadata`, `text_content`, `created_at`, `updated_at` |
| `hybrid` | + `sparse_embedding` |
| `ultimate` | + `late_interaction_embedding` |
| `graph` | Same as ultimate + graph tables |
---
## Document Index
Hierarchical document storage with automatic chunking:
```python
from vectrixdb import DocumentIndex, chunk_text, chunk_with_context
# Create document index
doc_index = DocumentIndex("./docs_index")
# Chunk text (simple)
chunks = chunk_text(
"Your long document text here...",
chunk_size=1000,
chunk_overlap=200,
)
# Chunk markdown with context (preserves headings)
chunks = chunk_with_context(
markdown_text,
chunk_size=1200,
chunk_overlap=200,
)
# Returns: [{"content": "...", "heading": "Section Title", "level": 2}, ...]
# Build tree from markdown
from vectrixdb import build_tree_from_markdown, build_tree_from_pdf
tree = build_tree_from_markdown(markdown_content)
tree = build_tree_from_pdf(pdf_path)
```
### Document Index with Storage Backend
```python
from vectrixdb import DocumentIndex, VectrixDB
# Connect to storage
lakebase = VectrixDB.with_lakebase(...)
# Document index uses storage backend
doc_index = DocumentIndex(storage=lakebase)
# Save documents and nodes
doc_index.save_document({
"doc_id": "doc_001",
"title": "My Document",
"doc_type": "markdown",
"page_count": 5,
})
# Query documents
docs = doc_index.list_documents()
nodes = doc_index.get_document_nodes("doc_001")
```
---
## Metadata & Filtering
```python
db.add(
texts=["iPhone 15", "Galaxy S24", "Pixel 8"],
metadata=[
{"brand": "Apple", "price": 999},
{"brand": "Samsung", "price": 899},
{"brand": "Google", "price": 699}
]
)
# Filter by metadata
results = db.search("smartphone", filter={"brand": "Apple"})
# Complex filters
results = db.search("phone", filter={
"brand": {"$in": ["Apple", "Samsung"]},
"price": {"$lt": 1000}
})
```
---
## Advanced API
For full control, use the `VectrixDB` class directly:
```python
from vectrixdb import VectrixDB, Collection
# Create database
db = VectrixDB(path="./my_db")
# Create collection with specific dimension
coll = db.create_collection("products", dimension=384)
# Add vectors directly
coll.add(
ids=["p1", "p2"],
vectors=[[0.1, 0.2, ...], [0.3, 0.4, ...]],
metadata=[{"name": "Product A"}, {"name": "Product B"}],
)
# Search with vectors
results = coll.search(query=[0.1, 0.2, ...], limit=10)
# List collections
collections = db.list_collections()
# Delete collection
db.delete_collection("products")
```
---
## Embedded Models API
Use embedding models directly:
```python
from vectrixdb import (
DenseEmbedder,
SparseEmbedder,
RerankerEmbedder,
LateInteractionEmbedder,
)
# Dense embeddings
dense = DenseEmbedder(model="e5-small")
vectors = dense.embed(["Hello world", "How are you?"])
# Sparse embeddings (BM25)
sparse = SparseEmbedder()
sparse_vectors = sparse.embed(["Hello world"])
# Reranker
reranker = RerankerEmbedder(model="L12")
scores = reranker.rerank("query", ["doc1", "doc2", "doc3"])
# Late interaction (ColBERT)
colbert = LateInteractionEmbedder(model="colbert")
token_embeddings = colbert.embed(["Hello world"])
```
---
## REST API
Start the server:
```bash
VECTRIXDB_API_KEY=your_secret vectrixdb serve --port 7337
```
Open the dashboard at `http://localhost:7337/dashboard`
### API Examples
```bash
# Create collection
curl -X POST http://localhost:7337/api/v1/collections \
-H "Content-Type: application/json" \
-H "api-key: your_secret" \
-d '{"name": "docs", "dimension": 384}'
# Add documents (auto-embedding)
curl -X POST http://localhost:7337/api/v1/collections/docs/text-upsert \
-H "Content-Type: application/json" \
-H "api-key: your_secret" \
-d '{"points": [{"id": "1", "text": "Hello world"}]}'
# Search
curl -X POST http://localhost:7337/api/v1/collections/docs/text-search \
-H "Content-Type: application/json" \
-H "api-key: your_secret" \
-d '{"query_text": "greeting", "limit": 10}'
```
---
## GraphRAG
Build knowledge graphs from documents:
```python
from vectrixdb import Vectrix, create_openai_config
# Create with graph mode
db = Vectrix("docs", mode="graph")
# Or with custom LLM config
config = create_openai_config(
api_key="your-openai-key",
model="gpt-4o-mini",
)
db = Vectrix(
"docs",
mode="graph",
graphrag_config=config,
)
# Add documents (extracts entities & relationships)
db.add(["Apple announced the iPhone 15 in September 2023."])
# Search with graph reasoning
results = db.search("What products did Apple release?")
```
---
## Project Structure
```
VectrixDB/
├── vectrixdb/
│ ├── core/ # Vector index, storage, search
│ │ ├── storage.py # All storage backends
│ │ ├── collection.py
│ │ ├── database.py
│ │ ├── document_index.py
│ │ ├── graphrag/ # Knowledge graph
│ │ └── search/ # Search algorithms
│ ├── api/ # FastAPI server
│ ├── models/ # Embedded ONNX models
│ │ └── data/ # Bundled model files
│ ├── dashboard/ # Web UI
│ ├── easy.py # Vectrix simple API
│ └── cli.py # Command line
├── tests/
└── pyproject.toml
```
---
## Requirements
- Python 3.9+
- No API keys needed (for bundled models)
- Models are bundled or auto-downloaded
---
## License
Apache 2.0
---
## Author
**Kwadwo Daddy Nyame Owusu - Boakye**
GitHub: [@knowusuboaky](https://github.com/knowusuboaky)