{"id":33817105,"url":"https://github.com/hivellm/vectorizer","last_synced_at":"2026-04-29T23:00:36.904Z","repository":{"id":316192605,"uuid":"1062311976","full_name":"hivellm/vectorizer","owner":"hivellm","description":"A high-performance, in-memory vector database written in Rust, designed for semantic search and top-k nearest neighbor queries in AI-driven applications, with binary file persistence for durability.","archived":false,"fork":false,"pushed_at":"2026-04-29T22:44:02.000Z","size":461703,"stargazers_count":21,"open_issues_count":10,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-04-29T23:00:33.982Z","etag":null,"topics":["database","embbedings","rust","rust-lang","vector","vector-database","vector-search","vectorization"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hivellm.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":"audit.toml","citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2025-09-23T05:12:35.000Z","updated_at":"2026-04-29T22:44:07.000Z","dependencies_parsed_at":"2025-10-08T15:19:03.393Z","dependency_job_id":null,"html_url":"https://github.com/hivellm/vectorizer","commit_stats":null,"previous_names":["hivellm/vectorizer"],"tags_count":112,"template":false,"template_full_name":null,"purl":"pkg:github/hivellm/vectorizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hivellm%2Fvectori
zer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hivellm%2Fvectorizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hivellm%2Fvectorizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hivellm%2Fvectorizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hivellm","download_url":"https://codeload.github.com/hivellm/vectorizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hivellm%2Fvectorizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32447312,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-29T22:27:22.272Z","status":"ssl_error","status_checked_at":"2026-04-29T22:10:49.234Z","response_time":110,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["database","embbedings","rust","rust-lang","vector","vector-database","vector-search","vectorization"],"created_at":"2025-12-06T22:00:35.867Z","updated_at":"2026-04-29T23:00:36.897Z","avatar_url":"https://github.com/hivellm.png","language":"Rust","readme":"# Vectorizer\n\n[![Rust](https://img.shields.io/badge/rust-1.92%2B-orange.svg)](https://www.rust-lang.org/)\n[![Rust 
Edition](https://img.shields.io/badge/edition-2024-blue.svg)](https://doc.rust-lang.org/edition-guide/rust-2024/index.html)\n[![License](https://img.shields.io/badge/license-Apache--2.0-green.svg)](LICENSE)\n[![Crates.io](https://img.shields.io/crates/v/vectorizer.svg)](https://crates.io/crates/vectorizer)\n[![GitHub release](https://img.shields.io/github/release/hivellm/vectorizer.svg)](https://github.com/hivellm/vectorizer/releases)\n[![Production Ready](https://img.shields.io/badge/status-production%20ready-success.svg)](https://github.com/hivellm/vectorizer)\n\nHigh-performance vector database and search engine in Rust for semantic search, document indexing, and AI applications. Ships as a Cargo workspace (5 crates) with binary RPC + HTTP transports, a React dashboard, and native SDKs for Rust, Python, TypeScript, Go, and C#.\n\n## ✨ Key Features\n\n### Transport \u0026 API\n- **VectorizerRPC** (default, port `15503`) — binary MessagePack over TCP, multiplexed connection pool. See [wire spec](docs/specs/VECTORIZER_RPC.md).\n- **REST API** (port `15002`) — universal HTTP fallback, powers the dashboard and any caller that doesn't speak raw TCP.\n- **gRPC** — Qdrant-compatible service.\n- **GraphQL** — full REST parity with async-graphql + GraphiQL playground.\n- **MCP** — 31 focused tools for AI model integration (Cursor, Claude Desktop, etc.).\n- **UMICP Protocol** — native JSON types + tool discovery endpoint.\n\n### Performance\n- **SIMD acceleration** — AVX2-optimized vector ops with runtime CPU detection (5-10x faster).\n- **Metal GPU** — macOS Apple Silicon via [`hive-gpu`](https://github.com/hivellm/hive-gpu) 0.2; logs render real device name, driver, VRAM.\n- **Sub-3ms search** (CPU) / **\u003c1ms** (GPU) via HNSW indexing.\n- **4-5x faster than Qdrant** in head-to-head benchmarks (0.16-0.23ms vs 0.80-0.87ms avg latency).\n\n### Storage\n- **`.vecdb` unified format** — 20-30% space savings, automatic snapshots.\n- **Memory-mapped storage** — datasets 
larger than RAM, efficient OS paging.\n- **Product Quantization** — 64x memory reduction with minimal accuracy loss.\n- **Scalar Quantization** + cache hit ratio metrics.\n\n### High Availability \u0026 Scaling\n- **Raft consensus** via openraft (pinned `=0.10.0-alpha.17`) — automatic leader election in 1-5s, write-redirect via HTTP 307, WAL-backed durable replication, DNS discovery for Kubernetes headless services.\n- **Master-Replica** — TCP streaming replication with full/partial sync, exponential reconnect backoff (5s→60s).\n- **Distributed sharding** — horizontal scaling with automatic routing; distributed hybrid search via `RemoteHybridSearch` RPC with dense-only fallback for mixed-version clusters.\n- **HiveHub cluster mode** — multi-tenant with quotas, usage tracking, tenant isolation, mandatory MMap storage, 1GB cache cap.\n\n### Search\n- **Semantic similarity** — Cosine, Euclidean, Dot Product.\n- **Hybrid search** — Dense + Sparse with Reciprocal Rank Fusion (RRF).\n- **Intelligent search** — query expansion, semantic reranking.\n- **Multi-collection search** across projects.\n- **Graph relationships** — automatic edge discovery, neighbor exploration, shortest-path finding.\n\n### Embeddings \u0026 Docs\n- **Built-in providers** — TF-IDF, BM25, FastEmbed, BERT, MiniLM, custom models.\n- **Document conversion** — PDF, DOCX, XLSX, PPTX, HTML, XML, images (14 formats).\n- **Qdrant API compatibility** — Snapshots, Sharding, Cluster Management, Query (with prefetch), Search Groups, Matrix, Named Vectors (partial), PQ/Binary quantization config.\n- **Summarization** — extractive, keyword, sentence, abstractive (OpenAI GPT).\n\n### Security\n- **JWT + API Key** authentication with RBAC.\n- **JWT secret is mandatory** — boot refuses to start with empty / default / \u003c32 char secrets when auth is enabled.\n- **First-run root credentials** written to `{data_dir}/.root_credentials` (0o600), never logged.\n- **Payload encryption** — optional ECC-P256 + 
AES-256-GCM, zero-knowledge, per-collection policies ([docs](docs/features/encryption/README.md)).\n- **TLS 1.2/1.3** with mTLS, configurable cipher suites, ALPN.\n- **Per-API-key rate limiting** with tiers + overrides.\n- **Path-traversal guard** on file discovery; canonicalized base, symlink-escape refusal.\n\n### UI\n- **Web Dashboard** — React + TypeScript; JWT login, graph CRUD (edges, neighbors, paths), collection management, API sandbox, setup wizard with glassmorphism design. Embedded in the binary (~26MB, no external assets needed).\n- **Desktop GUI** — Electron + vis-network for visual database management.\n\n## 🎉 Latest Release: v3.1.0\n\nHighlights — see [CHANGELOG.md](./CHANGELOG.md) for the full breakdown.\n\n**Added**\n- **`POST /insert_vectors`** — bulk-insert pre-computed embeddings with caller-supplied vector ids. Skips the embedding pipeline; the request body carries the vectors as raw `Vec\u003cf32\u003e`. For clients with their own embedder, idempotent re-ingest by client id, or upsert without auto-chunking. See [`docs/users/api/BATCH.md`](docs/users/api/BATCH.md).\n- **Client `id` honored on `/insert` and `/insert_texts`** — the `id` field on each text entry is now used as the resulting `Vector.id` (non-chunked) or as the prefix for `\u003cid\u003e#\u003cchunk_index\u003e` chunk ids. Re-ingesting the same id upserts in place instead of duplicating; delete-by-doc and citation round-trips no longer need a UUID lookup.\n- **`payload.parent_id`** on chunked vectors — links every chunk back to its source document (the request's `id`, or a single shared UUID v4 when omitted). 
Lets clients group, count, or delete every chunk of a logical document without re-deriving membership from a defensive `_id` duplicate.\n\n**Changed**\n- **`/insert_texts` chunked payload layout flipped from nested to flat — BREAKING for clients that read `payload.metadata.\u003cfield\u003e` directly.** Pre-3.1.0 chunks landed as `{content, metadata: {file_path, chunk_index, _id, casa, parlamentar, ...}}` — Qdrant payload filters `payload.parlamentar = \"X\"` silently missed every chunked row. 3.1.0 emits `{content, file_path, chunk_index, parent_id, _id, casa, parlamentar, ...}` with every key at the root. Server-side readers (`FileOperations`, `file_watcher`, MCP `search_semantic`) tolerate both shapes during the deprecation window. Migration guide: [CHANGELOG `[3.1.0]`](./CHANGELOG.md#migrating-from-30x-chunked-payloads).\n\n## Previous Release: v3.0.0\n\nHighlights — see [CHANGELOG.md](./CHANGELOG.md) for the full breakdown.\n\n**Breaking**\n- **RPC is default transport** (`rpc.enabled: true`, port `15503`). REST stays on `15002`. Migration guide: [`docs/migration/rpc-default.md`](docs/migration/rpc-default.md). Opt out with `rpc.enabled: false`.\n- **gRPC `SearchResult.score` narrowed `double` → `float`**. Clients on the pre-v3 proto must regenerate.\n- **JWT secret must be explicitly configured** — no more insecure default. Generate via `openssl rand -hex 64` and inject via `VECTORIZER_JWT_SECRET`.\n- **Configs moved under `config/`** — `config.yml` → `config/config.yml`, presets under `config/presets/`. Legacy `./config.yml` still works with a deprecation warning (removed in v3.1).\n- **Cargo workspace split** — `vectorizer-core`, `vectorizer-protocol`, `vectorizer`, `vectorizer-server`, `vectorizer-cli`. Callers reaching into the server layer need to switch from `vectorizer::{server,api,grpc,logging,umicp}::*` to `vectorizer_server::*`.\n\n**Removed**\n- **Standalone JavaScript SDK dropped** — TypeScript SDK ships compiled CJS + ESM, usable from plain JS. 
Migrate `@hivehub/vectorizer-sdk-js` → `@hivehub/vectorizer-sdk`.\n- **TypeScript SDK scope is `@hivehub`**, not `@hivellm` (docs corrected).\n- **Framework integration packages dropped** — `langchain`, `langchain-js`, `langflow`, `n8n`, `tensorflow`, `pytorch` adapters. Published versions stay installable; integrate against native SDKs directly.\n\n**Added**\n- **Layered config loader** — `VECTORIZER_MODE=dev|production` merges `config/modes/\u003cmode\u003e.yml` over base. Deep YAML merge with null-clear semantics. See [`docs/deployment/configuration.md`](docs/deployment/configuration.md).\n- **Docker collapsed to one compose** with profiles — `docker compose --profile \u003cdefault|dev|ha|hub\u003e up -d`.\n- **C# SDK RPC transport** (`Vectorizer.Sdk.Rpc` 3.0.0) — TCP + MessagePack framing, connection pool, ASP.NET Core DI.\n- **`#![deny(missing_docs)]` + `cargo doc -D warnings` CI gate** — cleared 2,219 missing-docs warnings to 0.\n- **`unwrap_used` / `expect_used` denied workspace-wide** — every production `.unwrap()` either returns `Result` or sits behind a documented `#[allow]`.\n\n**Changed**\n- **`rmcp` 0.10 → 1.5** — MCP SDK major rewrite; builder-based construction across every handler.\n- **Second-pass dep migrations** — reqwest 0.13, arrow/parquet 58, zip 8, tantivy 0.26, hmac 0.13 + sha2 0.11, hf-hub 0.5, sysinfo 0.38, candle 0.10.2, bcrypt 0.19, openraft pinned `=0.10.0-alpha.17`.\n- **Frontend majors** — React 19, react-router 7, TypeScript 6 (dashboard), vitest 4, eslint 10, Electron 41, Vue-router 5 (GUI).\n- **`parking_lot` migration complete** — all `std::sync::{Mutex,RwLock}` off the hot path; CI grep gate prevents regression.\n- **Hot-path `rand` / `hmac` / `tonic 0.14` / `prost 0.14` / `bincode 2.0`** upgraded.\n\n## 🚀 Quick Start\n\n### Install Script (Linux/macOS)\n\n```bash\ncurl -fsSL https://raw.githubusercontent.com/hivellm/vectorizer/main/scripts/install.sh | bash\n```\n\nInstalls CLI + systemd service. 
Commands: `sudo systemctl {status|restart|stop} vectorizer`, `sudo journalctl -u vectorizer -f`.\n\n### Install Script (Windows)\n\n```powershell\npowershell -c \"irm https://raw.githubusercontent.com/hivellm/vectorizer/main/scripts/install.ps1 | iex\"\n```\n\nInstalls CLI + Windows Service (requires Admin). Commands: `Get-Service Vectorizer`, `{Start|Stop|Restart}-Service Vectorizer`.\n\n### Docker\n\n```bash\ndocker run -d \\\n  --name vectorizer \\\n  -p 15002:15002 -p 15503:15503 \\\n  -v $(pwd)/vectorizer-data:/vectorizer/data \\\n  -e VECTORIZER_AUTH_ENABLED=true \\\n  -e VECTORIZER_ADMIN_USERNAME=admin \\\n  -e VECTORIZER_ADMIN_PASSWORD=your-secure-password \\\n  -e VECTORIZER_JWT_SECRET=$(openssl rand -hex 64) \\\n  --restart unless-stopped \\\n  hivehub/vectorizer:latest\n```\n\n**Docker Compose with profiles:**\n\n```bash\ncp .env.example .env\n# Edit .env with your credentials\ndocker compose --profile default up -d          # standalone\ndocker compose --profile dev up -d              # dev overlay\ndocker compose --profile ha up -d               # Raft cluster\ndocker compose --profile hub up -d              # multi-tenant\n```\n\nProfiles are mutually exclusive on host port `15002`.\n\nImages: [Docker Hub](https://hub.docker.com/r/hivehub/vectorizer) · [GHCR](https://github.com/hivellm/vectorizer/pkgs/container/vectorizer)\n\n### Build from Source\n\n```bash\ngit clone https://github.com/hivellm/vectorizer.git\ncd vectorizer\n\ncargo build --release                          # Basic\ncargo build --release --features hive-gpu      # macOS Metal\ncargo build --release --features full          # All features\n./target/release/vectorizer\n```\n\n### Access Points\n\n| Surface | URL | Notes |\n|---|---|---|\n| **VectorizerRPC** (primary) | `vectorizer://localhost:15503` | Binary MessagePack over TCP — see [operator guide](docs/deployment/rpc.md) |\n| **REST API** | `http://localhost:15002` | Universal HTTP fallback |\n| **Web Dashboard** | 
`http://localhost:15002/dashboard/` | React UI, embedded in binary |\n| **MCP Server** | `http://localhost:15002/mcp` | 31 tools for AI agents |\n| **GraphQL** | `http://localhost:15002/graphql` | GraphiQL at `/graphql` |\n| **UMICP Discovery** | `http://localhost:15002/umicp/discover` | |\n| **Health Check** | `http://localhost:15002/health` | |\n\n\u003e **Upgrading from v2.x?** RPC is now on by default on port `15503`. REST is unchanged. If you can't expose the new port, set `rpc.enabled: false`. See [v3.x migration guide](docs/migration/rpc-default.md).\n\n### Configuration\n\nConfigs live under `config/`:\n\n```\nconfig/\n├── config.yml             # Base config (your deployment)\n├── config.example.yml     # Reference\n├── modes/\n│   ├── dev.yml            # Layered override: verbose logs, loopback, watcher on\n│   └── production.yml     # Layered override: warn logs, larger threads/cache, zstd, scheduled snapshots\n└── presets/               # Standalone full configs (legacy style)\n    ├── production.yml\n    ├── cluster.yml\n    ├── hub.yml\n    └── development.yml\n```\n\n**Layered loader (recommended):**\n\n```bash\nVECTORIZER_MODE=production ./target/release/vectorizer\n```\n\nMerges `config/modes/production.yml` over `config/config.yml`. Typos in the mode override fail fast at boot.\n\n### Authentication\n\nAuth is **enabled by default in Docker**. 
Default creds — **change in production**.\n\n```bash\n# Login\ncurl -X POST http://localhost:15002/auth/login \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"username\":\"admin\",\"password\":\"admin\"}'\n\n# JWT in requests\ncurl http://localhost:15002/collections \\\n  -H \"Authorization: Bearer YOUR_JWT_TOKEN\"\n\n# Create API key (JWT required)\ncurl -X POST http://localhost:15002/auth/keys \\\n  -H \"Content-Type: application/json\" \\\n  -H \"Authorization: Bearer YOUR_JWT_TOKEN\" \\\n  -d '{\"name\":\"Production\",\"permissions\":[\"read\",\"write\"],\"expires_in_days\":90}'\n\n# API key in requests (NO Bearer prefix)\ncurl http://localhost:15002/collections \\\n  -H \"Authorization: YOUR_API_KEY\"\n```\n\n| Method | Header | Use case |\n|---|---|---|\n| JWT | `Authorization: Bearer \u003ctoken\u003e` | Dashboard, short-lived sessions |\n| API Key | `Authorization: \u003ckey\u003e` | MCP, CLI, long-lived integrations |\n\n**Production must set:**\n- `VECTORIZER_JWT_SECRET` — ≥32 chars, not the historical default. Boot aborts otherwise.\n- `VECTORIZER_ADMIN_PASSWORD` — strong, ≥32 chars.\n\nFirst-run root credentials are written to `{data_dir}/.root_credentials` (0o600), never printed to stdout. 
Read and delete after first login.\n\nSee [Docker Authentication Guide](docs/users/getting-started/DOCKER_AUTHENTICATION.md) and [Security Policy](SECURITY.md).\n\n## 📊 Performance\n\n| Metric | Value |\n|---|---|\n| Search latency (CPU) | \u003c 3ms |\n| Search latency (Metal GPU) | \u003c 1ms |\n| Throughput | 4,400-6,000 QPS (vs Qdrant 1,100-1,300) |\n| Storage reduction | 20-30% (`.vecdb`) + PQ 64x |\n| MCP tools | 31 |\n| Document formats | 14 |\n\n### Benchmark vs Qdrant\n\n- **Search**: 4-5x faster (0.16-0.23ms vs 0.80-0.87ms avg latency).\n- **Insert**: Fire-and-forget pattern, configurable batch / body limits, background processing.\n- **Scenarios**: Small (1K) / Medium (5K) / Large (10K) vectors × dimensions 384 / 512 / 768.\n\nSee [Benchmark Documentation](./docs/specs/BENCHMARKING.md).\n\n## 🔄 Feature Comparison\n\n| Feature | Vectorizer | Qdrant | pgvector | Pinecone | Weaviate | Milvus | Chroma |\n|---|---|---|---|---|---|---|---|\n| **Core** |\n| Language | Rust | Rust | C | C++/Go | Go | C++/Go | Python |\n| License | Apache 2.0 | Apache 2.0 | PostgreSQL | Proprietary | BSD | Apache 2.0 | Apache 2.0 |\n| **APIs** |\n| REST | ✅ | ✅ | via PG | ✅ | ✅ | ✅ | ✅ |\n| gRPC (Qdrant-compat) | ✅ | ✅ | ❌ | ✅ | ✅ | ✅ | ❌ |\n| GraphQL | ✅ + GraphiQL | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |\n| MCP | ✅ 31 tools | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |\n| Binary RPC | ✅ MessagePack | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |\n| **SDKs** | Rust, Python, TS, Go, C# | All | All | Most | Most | Most | Python |\n| **Performance** |\n| Search latency | \u003c 3ms CPU / \u003c 1ms GPU | 1-5ms | 5-50ms | 50-100ms | 10-50ms | 5-20ms | 10-100ms |\n| SIMD | ✅ AVX2 | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |\n| GPU | ✅ Metal | ✅ CUDA | ❌ | ✅ Cloud | ❌ | ✅ CUDA | ❌ |\n| **Storage** |\n| HNSW | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |\n| PQ (64x) | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ |\n| Scalar Quantization | ✅ | ✅ | ❌ | ✅ | ❌ | ✅ | ❌ |\n| MMap | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | ❌ |\n| **Advanced** |\n| Graph Relationships | ✅ auto + GUI | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |\n| Document 
Processing | ✅ 14 formats | ❌ | ❌ | ❌ | ✅ | ❌ | ✅ |\n| Hybrid Search | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |\n| Query Expansion | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |\n| Qdrant API compat | ✅ + migration | N/A | ❌ | ❌ | ❌ | ❌ | ❌ |\n| **Scaling** |\n| Sharding | ✅ | ✅ | via PG | ✅ Cloud | ✅ | ✅ | ❌ |\n| Replication | ✅ Raft + Master-Replica | ✅ | via PG | ✅ Cloud | ✅ | ✅ | ❌ |\n| **Management** |\n| Dashboard | ✅ React + graph GUI | ✅ basic | pgAdmin | ✅ Cloud | ✅ | ✅ | ✅ basic |\n| Desktop GUI | ✅ Electron | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |\n| **Security** |\n| JWT + API Keys | ✅ | ✅ | via PG | ✅ Cloud | ✅ | ✅ | ✅ |\n| Payload Encryption | ✅ ECC-P256 + AES-GCM | ❌ | via PG | ✅ Cloud | ❌ | ❌ | ❌ |\n\n### Key Differentiators\n\n- **MCP integration** (31 tools) — native AI-agent protocol.\n- **Graph relationships** — auto-discovery + full GUI (edges, path-finding, neighbor exploration).\n- **GraphQL** — full REST parity + GraphiQL.\n- **Document processing** — 14 formats built in.\n- **Qdrant compatibility** — full API + migration tools.\n- **Performance** — 4-5x faster than Qdrant in benchmarks.\n- **Binary RPC default** — MessagePack over TCP on port 15503 for low-overhead client traffic.\n- **Complete SDK coverage** — Rust, Python, TypeScript (+JS), Go, C# — all on v3.x.\n\n**Best fit:** AI apps needing MCP, document ingestion, graph relationships, and sub-ms search with an embedded dashboard.\n\n## 🎯 Use Cases\n\n- **RAG systems** — semantic search with automatic document conversion.\n- **Document search** — PDFs, Office, web content.\n- **Code analysis** — semantic code navigation.\n- **Knowledge bases** — enterprise multi-format search.\n\n## 🔧 MCP Integration\n\nCursor / Claude Desktop config:\n\n```json\n{\n  \"mcpServers\": {\n    \"vectorizer\": {\n      \"url\": \"http://localhost:15002/mcp\",\n      \"type\": \"streamablehttp\"\n    }\n  }\n}\n```\n\n### Available Tools (31)\n\n**Core operations (9)**\n`list_collections` · `create_collection` · `get_collection_info` · 
`insert_text` · `get_vector` · `update_vector` · `delete_vector` · `search` · `multi_collection_search`\n\n**Advanced search (4)**\n`search_intelligent` (query expansion) · `search_semantic` (reranking) · `search_extra` (combined) · `search_hybrid` (dense + sparse RRF)\n\n**Discovery \u0026 files (7)**\n`filter_collections` · `expand_queries` · `get_file_content` · `list_files` · `get_file_chunks` · `get_project_outline` · `get_related_files`\n\n**Graph (8)**\n`graph_list_nodes` · `graph_get_neighbors` · `graph_find_related` · `graph_find_path` · `graph_create_edge` · `graph_delete_edge` · `graph_discover_edges` · `graph_discover_status`\n\n**Maintenance (3)**\n`list_empty_collections` · `cleanup_empty_collections` · `get_collection_stats`\n\n\u003e Cluster-management operations are REST-only for security.\n\n## 📦 Client SDKs\n\nServer-side at **v3.1.0**. The Rust SDK tracks server versioning and is also at v3.1.0; the TypeScript, Python, Go, and C# SDKs are on v3.0.x and bump when they need a breaking server contract. 
The TypeScript SDK ships compiled CJS + ESM — usable from plain JavaScript, no separate JS package needed.\n\n| SDK | Install |\n|---|---|\n| Python | `pip install vectorizer-sdk` |\n| TypeScript / JS | `npm install @hivehub/vectorizer-sdk` |\n| Rust | `cargo add vectorizer-sdk` |\n| C# | `dotnet add package Vectorizer.Sdk` (REST) · `Vectorizer.Sdk.Rpc` (RPC) |\n| Go | `go get github.com/hivellm/vectorizer-sdk-go` |\n\nEvery SDK accepts both `vectorizer://host[:port]` (RPC, default port 15503) and `http(s)://host[:port]` (REST) URLs through the same endpoint parser.\n\n## 🔄 Qdrant Migration\n\n- **Config migration** — parse Qdrant YAML/JSON → Vectorizer format.\n- **Data migration** — export from Qdrant, import into Vectorizer.\n- **Validation** — integrity + compatibility checks.\n- **REST compatibility** — full Qdrant API at `/qdrant/*`.\n\n```rust\nuse vectorizer::migration::qdrant::{QdrantDataExporter, QdrantDataImporter};\n\nlet exported = QdrantDataExporter::export_collection(\n    \"http://localhost:6333\",\n    \"my_collection\"\n).await?;\n\nlet result = QdrantDataImporter::import_collection(\u0026store, \u0026exported).await?;\n```\n\nSee [Qdrant Migration Guide](./docs/specs/QDRANT_MIGRATION.md).\n\n## ☁️ HiveHub Cloud\n\nMulti-tenant cluster mode integration with [HiveHub.Cloud](https://hivehub.cloud).\n\n- **Tenant isolation** — owner-scoped collections.\n- **Quota enforcement** — collections / vectors / storage per tenant.\n- **Usage tracking** — automatic reporting.\n- **User-scoped backups**.\n\n```yaml\nhub:\n  enabled: true\n  api_url: \"https://api.hivehub.cloud\"\n  tenant_isolation: \"collection\"\n  usage_report_interval: 300\n```\n\n```bash\nexport HIVEHUB_SERVICE_API_KEY=\"your-service-api-key\"\n```\n\n**Cluster-mode requirements** (enforced at boot):\n\n| Requirement | Default |\n|---|---|\n| MMap storage (Memory storage rejected) | Enforced |\n| Max cache memory across all caches | 1 GB |\n| File watcher | Disabled |\n| Strict config 
validation | Enabled |\n\n```yaml\ncluster:\n  enabled: true\n  node_id: \"node-1\"\n  memory:\n    max_cache_memory_bytes: 1073741824\n    enforce_mmap_storage: true\n    disable_file_watcher: true\n    strict_validation: true\n```\n\nSee [HiveHub Integration](./docs/features/HUB_INTEGRATION.md) and [Cluster Memory Limits](./docs/specs/CLUSTER_MEMORY.md).\n\n## 🏗️ Workspace Layout\n\n```\ncrates/\n├── vectorizer-core/       # Foundation: error, codec, quantization, simd, compression, paths\n├── vectorizer-protocol/   # RPC wire types + tonic-generated gRPC\n├── vectorizer/            # Engine (umbrella): db, embedding, models, cache, persistence, search, ...\n├── vectorizer-server/     # Transport: HTTP / gRPC / MCP / RPC + binary\n└── vectorizer-cli/        # CLI binaries\nsdks/rust/                 # Rust SDK — re-exports vectorizer-protocol wire types\n```\n\nRuntime directories resolve to platform-standard locations (`~/.local/share/vectorizer/` on Linux, `~/Library/Application Support/vectorizer/` on macOS, `%APPDATA%\\vectorizer\\` on Windows), overridable via `VECTORIZER_DATA_DIR` / `VECTORIZER_LOGS_DIR`.\n\n## 📚 Documentation\n\n- [User Documentation](./docs/users/) — install + tutorials\n- [API Reference](./docs/specs/API_REFERENCE.md) — REST\n- [VectorizerRPC Spec](./docs/specs/VECTORIZER_RPC.md) — wire protocol\n- [RPC Operator Guide](./docs/deployment/rpc.md)\n- [Configuration](./docs/deployment/configuration.md) — layered loader\n- [v3.x Migration](./docs/migration/rpc-default.md) — RPC-default rollout\n- [Dashboard Integration](./docs/features/DASHBOARD_INTEGRATION.md)\n- [Qdrant Compatibility](./docs/users/qdrant/)\n- [HiveHub Integration](./docs/features/HUB_INTEGRATION.md)\n- [Cluster Memory Limits](./docs/specs/CLUSTER_MEMORY.md)\n- [MCP Guide](./docs/specs/MCP.md)\n- [Encryption](./docs/features/encryption/README.md)\n- [Technical Specs](./docs/specs/) — architecture, performance, implementation\n\n## 📄 License\n\nApache License 2.0 — see 
[LICENSE](./LICENSE).\n\n## 🤝 Contributing\n\nSee [CONTRIBUTING.md](./CONTRIBUTING.md).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhivellm%2Fvectorizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhivellm%2Fvectorizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhivellm%2Fvectorizer/lists"}