{"id":15396597,"url":"https://github.com/hupe1980/vecgo","last_synced_at":"2026-01-19T13:02:13.207Z","repository":{"id":227875920,"uuid":"769834931","full_name":"hupe1980/vecgo","owner":"hupe1980","description":"🧬🔍🗄️ Unlock the power of vector indexing and search in your Go applications with the HNSW algorithm for approximate nearest neighbor search, seamlessly embedded within your application.","archived":false,"fork":false,"pushed_at":"2026-01-15T23:43:57.000Z","size":1553,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-16T03:09:28.341Z","etag":null,"topics":["ann","embeddings","golang","hnsw","vector","vectorstore"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hupe1980.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-03-10T07:33:57.000Z","updated_at":"2026-01-15T23:44:00.000Z","dependencies_parsed_at":"2024-06-21T12:57:27.017Z","dependency_job_id":"8bea5c04-e64d-4a02-a64c-9fda92548411","html_url":"https://github.com/hupe1980/vecgo","commit_stats":null,"previous_names":["hupe1980/vecgo"],"tags_count":12,"template":false,"template_full_name":null,"purl":"pkg:github/hupe1980/vecgo","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hupe1980%2Fvecgo","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hupe1980%2Fvecgo/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hupe1980%2Fvecgo/releases","manifests_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hupe1980%2Fvecgo/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hupe1980","download_url":"https://codeload.github.com/hupe1980/vecgo/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hupe1980%2Fvecgo/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28568833,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T12:50:50.164Z","status":"ssl_error","status_checked_at":"2026-01-19T12:50:42.704Z","response_time":67,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ann","embeddings","golang","hnsw","vector","vectorstore"],"created_at":"2024-10-01T15:34:19.983Z","updated_at":"2026-01-19T13:02:13.195Z","avatar_url":"https://github.com/hupe1980.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🧬🔍 Vecgo\n\n![CI](https://github.com/hupe1980/vecgo/workflows/CI/badge.svg)\n[![Go 
Reference](https://pkg.go.dev/badge/github.com/hupe1980/vecgo.svg)](https://pkg.go.dev/github.com/hupe1980/vecgo)\n[![goreportcard](https://goreportcard.com/badge/github.com/hupe1980/vecgo)](https://goreportcard.com/report/github.com/hupe1980/vecgo)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n\n**Vecgo** is a **pure Go, embeddable, hybrid vector database** designed for high-performance production workloads. It combines commit-oriented durability with [HNSW](https://arxiv.org/abs/1603.09320) + [DiskANN](https://papers.nips.cc/paper/2019/hash/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Abstract.html) indexing for best-in-class performance.\n\n⚠️ This is experimental and subject to breaking changes.\n\n## ✨ Key Differentiators\n\n- ⚡ **Faster \u0026 lighter than external services** — no network overhead, no sidecar, 15MB binary\n- 🔧 **More capable than simple libraries** — durability, MVCC, hybrid search, cloud storage\n- 🎯 **Simpler than CGO wrappers** — pure Go toolchain, static binaries, cross-compilation\n- 🏗️ **Modern architecture** — commit-oriented durability (append-only versioned commits), no WAL complexity\n\n## 📊 Performance\n\nVecgo is optimized for high-throughput, low-latency vector search with:\n- **FilterCursor** — zero-allocation push-based iteration\n- **Zero-Copy Vectors** — direct access to mmap'd memory\n- **SIMD Distance** — AVX-512/AVX2/NEON/SVE2 runtime detection\n\nRun benchmarks locally to see performance on your hardware:\n\n```bash\ncd benchmark_test \u0026\u0026 go test -bench=. 
-benchmem -timeout=15m\n```\n\n\u003e See [benchmark_test/baseline.txt](benchmark_test/baseline.txt) for reference results.\n\n## 🎯 Features\n\n### 📊 Index Types\n\n| Index | Description | Use Case |\n|-------|-------------|----------|\n| **[HNSW](https://arxiv.org/abs/1603.09320)** | Hierarchical Navigable Small World graph | In-memory L0 (16-way sharded, lock-free search, arena allocator) |\n| **[DiskANN/Vamana](https://papers.nips.cc/paper/2019/hash/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Abstract.html)** | Disk-resident graph with quantization | Large-scale on-disk segments with PQ/RaBitQ |\n| **[FreshDiskANN](https://dl.acm.org/doi/10.1145/3448016.3457550)** | Streaming updates for Vamana | Lock-free reads, soft deletion, background consolidation |\n| **Flat** | Exact nearest-neighbor with SIMD | Exact search, small segments |\n\n### 🗜️ Quantization\n\nQuantization reduces **in-memory index size** for DiskANN segments. Full vectors remain on disk for reranking.\n\n| Method | RAM Reduction | Recall | Best For |\n|--------|---------------|--------|----------|\n| **[Product Quantization (PQ)](https://hal.inria.fr/inria-00514462v2/document)** | 8-64× | 90-95% | Large-scale, high compression |\n| **[Optimized PQ (OPQ)](https://www.microsoft.com/en-us/research/publication/optimized-product-quantization-for-approximate-nearest-neighbor-search/)** | 8-64× | 93-97% | Best recall with compression |\n| **Scalar Quantization (SQ8)** | 4× | 95-99% | General purpose, balanced |\n| **Binary Quantization (BQ)** | 32× | 70-85% | Pre-filtering, coarse search |\n| **[RaBitQ](https://arxiv.org/abs/2405.12497)** | ~30× | 80-90% | Better BQ alternative (SIGMOD '24) |\n| **INT4** | 8× | 90-95% | Memory-constrained |\n\n\u003e 📖 See [Performance Tuning Guide](docs/tuning.md#quantization) for detailed quantization configuration.\n\n### 🏢 Enterprise Features\n\n- ☁️ **Cloud-Native Storage** — S3/GCS/Azure via pluggable BlobStore interface\n- 🔒 **Commit-Oriented Durability** — Atomic commits 
with immutable segments\n- 🔀 **[Hybrid Search](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)** — BM25 + vector similarity with RRF fusion\n- 📸 **Snapshot Isolation** — Lock-free reads via MVCC\n- ⏰ **Time-Travel Queries** — `WithTimestamp()` / `WithVersion()` to query historical state\n- 🏷️ **Typed Metadata** — Schema-enforced metadata with filtering\n- 📊 **Query Statistics** — `WithStats()` + `Explain()` for debugging\n- 🎯 **Segment Pruning** — Triangle inequality, Bloom filters, numeric range stats\n- 🚀 **SIMD Optimized** — AVX-512/AVX2/NEON/SVE2 runtime detection\n\n## 🚀 Quick Start\n\n### 📦 Installation\n\n```bash\ngo get github.com/hupe1980/vecgo\n```\n\n**Platform Requirements:** Vecgo requires a **64-bit** architecture (amd64 or arm64). SIMD optimizations use AVX-512/AVX2 on x86-64 and NEON/SVE2 on ARM64.\n\n### 💻 Basic Usage\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"fmt\"\n    \"log\"\n\n    \"github.com/hupe1980/vecgo\"\n    \"github.com/hupe1980/vecgo/metadata\"\n)\n\nfunc main() {\n    ctx := context.Background()\n\n    // Create a new index (128 dimensions, L2 distance)\n    db, err := vecgo.Open(ctx, vecgo.Local(\"./data\"), vecgo.Create(128, vecgo.MetricL2))\n    if err != nil {\n        log.Fatal(err)\n    }\n    defer db.Close()\n\n    // Insert with fluent builder API\n    vector := make([]float32, 128)\n    rec := vecgo.NewRecord(vector).\n        WithMetadata(\"category\", metadata.String(\"electronics\")).\n        WithMetadata(\"price\", metadata.Float(99.99)).\n        WithPayload([]byte(`{\"desc\": \"Product description\"}`)).\n        Build()\n    \n    id, err := db.InsertRecord(ctx, rec)\n    if err != nil {\n        log.Fatal(err)\n    }\n    fmt.Printf(\"Inserted ID: %d\\n\", id)\n\n    // Or use the simple API\n    id, err = db.Insert(ctx, vector, nil, nil)\n\n    // Commit to disk (data is durable after this)\n    if err := db.Commit(ctx); err != nil {\n        log.Fatal(err)\n    }\n\n    // Search — 
returns IDs, scores, metadata, and payload by default\n    query := make([]float32, 128)\n    results, err := db.Search(ctx, query, 10)\n    if err != nil {\n        log.Fatal(err)\n    }\n\n    for _, r := range results {\n        fmt.Printf(\"ID: %d, Score: %.4f\\n\", r.ID, r.Score)\n    }\n\n    // High-throughput mode (IDs + scores only)\n    results, _ = db.Search(ctx, query, 10, vecgo.WithoutData())\n}\n```\n\n### 🔄 Re-open Existing Index\n\n```go\n// Dimension and metric are auto-loaded from manifest\ndb, err := vecgo.Open(ctx, vecgo.Local(\"./data\"))\n```\n\n### ☁️ Cloud Storage (Writer/Reader Separation)\n\n```go\nimport (\n    \"github.com/hupe1980/vecgo\"\n    \"github.com/hupe1980/vecgo/blobstore/s3\"\n)\n\n// === Writer Node (build index locally, then sync to S3) ===\ndb, _ := vecgo.Open(ctx, vecgo.Local(\"/data/vecgo\"), vecgo.Create(128, vecgo.MetricL2))\ndb.Insert(ctx, vector, nil, nil)\ndb.Close()\n// Sync: aws s3 sync /data/vecgo s3://my-bucket/vecgo/\n\n// === Reader Nodes (stateless, horizontally scalable) ===\nstore, _ := s3.New(ctx, \"my-bucket\", s3.WithPrefix(\"vecgo/\"))\n\n// Remote() is automatically read-only\ndb, err := vecgo.Open(ctx, vecgo.Remote(store))\n\n// Writes return ErrReadOnly\n_, err = db.Insert(ctx, vec, nil, nil)  // err == vecgo.ErrReadOnly\n\n// With explicit cache directory for faster repeated queries\ndb, err := vecgo.Open(ctx, vecgo.Remote(store),\n    vecgo.WithCacheDir(\"/fast/nvme\"),\n    vecgo.WithBlockCacheSize(4 \u003c\u003c 30),  // 4GB\n)\n```\n\n### 🔍 Filtered Search\n\n```go\n// Define schema for type safety\nschema := metadata.Schema{\n    \"category\": metadata.FieldTypeString,\n    \"price\":    metadata.FieldTypeFloat,\n}\n\ndb, _ := vecgo.Open(ctx, vecgo.Local(\"./data\"),\n    vecgo.Create(128, vecgo.MetricL2),\n    vecgo.WithSchema(schema),\n)\n\n// Search with filter\nfilter := metadata.NewFilterSet(\n    metadata.Filter{Key: \"category\", Operator: metadata.OpEqual, Value: 
metadata.String(\"electronics\")},\n    metadata.Filter{Key: \"price\", Operator: metadata.OpLessThan, Value: metadata.Float(100.0)},\n)\n\nresults, _ := db.Search(ctx, query, 10, vecgo.WithFilter(filter))\n```\n\n### 🔀 Hybrid Search (Vector + BM25)\n\n```go\n// Insert with text for BM25 indexing\ndoc := metadata.Document{\n    \"text\": metadata.String(\"machine learning neural networks\"),\n}\ndb.Insert(ctx, vector, doc, nil)\n\n// Hybrid search with RRF fusion\nresults, _ := db.HybridSearch(ctx, vector, \"neural networks\", 10)\n```\n\n### ⏰ Time-Travel Queries\n\nQuery historical snapshots without affecting the current state:\n\n```go\n// Open at a specific point in time\nyesterday := time.Now().Add(-24 * time.Hour)\ndb, _ := vecgo.Open(ctx, vecgo.Local(\"./data\"), vecgo.WithTimestamp(yesterday))\n\n// Or open at a specific version ID\ndb, _ := vecgo.Open(ctx, vecgo.Local(\"./data\"), vecgo.WithVersion(42))\n\n// Query as if it were that moment in time\nresults, _ := db.Search(ctx, query, 10)\n```\n\n**How it works:**\n- Old manifests are preserved (each points to immutable segments)\n- Compaction still runs — creates NEW optimized segments\n- Old segments retained until `Vacuum()` removes expired manifests\n- Storage: `~current_data × (1 + retained_versions × churn_rate)`\n\n**Use cases:**\n- 🔍 Debug production issues: \"What did the index look like before the bad deployment?\"\n- 📊 A/B testing: Compare recall against historical versions\n- 🔄 Recovery: Roll back to a known-good state\n\n**Managing retention:**\n```go\n// Configure retention policy\npolicy := vecgo.RetentionPolicy{KeepVersions: 10}\ndb, _ := vecgo.Open(ctx, vecgo.Local(\"./data\"), vecgo.WithRetentionPolicy(policy))\n\n// Reclaim disk space from expired versions\ndb.Vacuum(ctx)\n```\n\n### 📊 Query Statistics \u0026 Explain\n\nUnderstand query execution for debugging and optimization:\n\n```go\nvar stats vecgo.QueryStats\nresults, _ := db.Search(ctx, query, 10, 
vecgo.WithStats(\u0026stats))\n\n// Summary explanation\nfmt.Println(stats.Explain())\n// Output: \"searched 3 segments (1 pruned by stats, 0 by bloom), \n//          scanned 1200 vectors in 2.1ms, recalled 847 candidates (0.7 hit rate)\"\n\n// Detailed statistics\nfmt.Printf(\"Segments searched: %d\\n\", stats.SegmentsSearched)\nfmt.Printf(\"Segments pruned (stats): %d\\n\", stats.SegmentsPrunedByStats)\nfmt.Printf(\"Segments pruned (bloom): %d\\n\", stats.SegmentsPrunedByBloom)\nfmt.Printf(\"Vectors scanned: %d\\n\", stats.VectorsScanned)\nfmt.Printf(\"Candidates recalled: %d\\n\", stats.CandidatesRecalled)\nfmt.Printf(\"Latency: %v\\n\", stats.Latency)\nfmt.Printf(\"Graph hops: %d\\n\", stats.GraphHops)\nfmt.Printf(\"Cost estimate: %.2f\\n\", stats.CostEstimate())\n```\n\n### 🎯 Segment Pruning \u0026 Manifest Stats\n\nVecgo automatically prunes irrelevant segments using advanced statistics:\n\n| Pruning Strategy | Description |\n|------------------|-------------|\n| **Triangle Inequality** | Skip segments where `|query - centroid| \u003e radius + threshold` |\n| **Bloom Filters** | Skip segments missing required categorical values |\n| **Numeric Range Stats** | Skip segments with min/max outside filter range |\n| **Categorical Cardinality** | Prioritize high-entropy segments for broad queries |\n\nThese statistics are automatically computed during `Commit()` and stored in the manifest (v3 format).\n\n```go\n// Get current statistics\ndbStats := db.Stats()\nfmt.Printf(\"Manifest version: %d\\n\", dbStats.ManifestID)\nfmt.Printf(\"Total vectors: %d\\n\", dbStats.TotalVectors)\nfmt.Printf(\"Segment count: %d\\n\", dbStats.SegmentCount)\n```\n\n### 📦 Insert Modes\n\nVecgo offers three insert modes optimized for different workloads:\n\n| Mode | Method | Searchable | Best For |\n|------|--------|------------|----------|\n| **Single** | `Insert()` | ✅ Immediately | Real-time updates |\n| **Batch** | `BatchInsert()` | ✅ Immediately | Medium batches (10-100) |\n| 
**Deferred** | `BatchInsertDeferred()` | ❌ After flush | Bulk loading |\n\n```go\n// 1. SINGLE INSERT — Real-time updates (HNSW-indexed immediately)\n//    Use when: you need vectors searchable immediately\nid, err := db.Insert(ctx, vector, metadata, payload)\n\n// 2. BATCH INSERT — Indexed batch (HNSW-indexed immediately)\n//    Use when: you have medium batches and need immediate search\nids, err := db.BatchInsert(ctx, vectors, metadatas, payloads)\n\n// 3. DEFERRED INSERT — Bulk loading (NO HNSW indexing)\n//    Use when: you're bulk loading and don't need immediate search\n//    Vectors become searchable after Commit() triggers flush\nids, err := db.BatchInsertDeferred(ctx, vectors, metadatas, payloads)\ndb.Commit(ctx) // Flush to disk, now searchable via DiskANN\n```\n\n**When to use Deferred mode:**\n- Initial data loading (embeddings from a corpus)\n- Periodic bulk updates (nightly reindex)\n- Migration from another database\n\n**When NOT to use Deferred mode:**\n- Real-time RAG (documents must be searchable immediately)\n- Interactive applications with instant feedback\n\n```go\n// Batch delete\nerr = db.BatchDelete(ctx, ids)\n```\n\n## 💾 Durability Model\n\nVecgo uses **commit-oriented durability** — append-only versioned commits:\n\n```mermaid\nsequenceDiagram\n    participant App as Application\n    participant MT as MemTable (RAM)\n    participant Seg as Segment (Disk)\n    participant Man as Manifest\n\n    App-\u003e\u003eMT: Insert(vector, metadata)\n    Note over MT: Buffered in memory\u003cbr/\u003e❌ NOT durable\n    \n    App-\u003e\u003eMT: Insert(vector, metadata)\n    \n    App-\u003e\u003eSeg: Commit()\n    MT-\u003e\u003eSeg: Write immutable segment\n    Seg-\u003e\u003eMan: Update manifest atomically\n    Note over Seg,Man: ✅ DURABLE after Commit()\n```\n\n| State | Survives Crash? 
|\n|-------|-----------------|\n| After `Insert()`, before `Commit()` | ❌ No |\n| After `Commit()` | ✅ Yes |\n| After `Close()` | ✅ Yes (auto-commits pending) |\n\n**Why commit-oriented?**\n- 🧹 Simpler code — no WAL rotation, recovery, or checkpointing\n- ⚡ Faster batch inserts — no fsync per insert\n- ☁️ Cloud-native — pure segment writes, ideal for S3/GCS\n- 🚀 Instant startup — no recovery/replay, just read manifest\n\n## 📚 Documentation\n\n- 📖 **API Reference**: [pkg.go.dev/github.com/hupe1980/vecgo](https://pkg.go.dev/github.com/hupe1980/vecgo)\n- 🏗️ **[Architecture Guide](docs/architecture.md)** — Engine internals, storage tiers, concurrency model\n- ⚙️ **[Performance Tuning](docs/tuning.md)** — HNSW parameters, compaction, caching\n- 🔧 **[Operations Guide](docs/operations.md)** — Monitoring, troubleshooting\n- 💾 **[Recovery \u0026 Durability](docs/recovery.md)** — Crash safety, data guarantees\n- 🚀 **[Deployment Guide](docs/deployment.md)** — Local vs. cloud patterns\n\n## Examples\n\n| Example | Description |\n|---------|-------------|\n| [basic](examples/basic) | Create index, insert, search, commit |\n| [modern](examples/modern) | Fluent API, schema-enforced metadata, scan iterator |\n| [rag](examples/rag) | Retrieval-Augmented Generation workflow |\n| [cloud_tiered](examples/cloud_tiered) | Writer/reader separation with S3 |\n| [bulk_load](examples/bulk_load) | High-throughput ingestion with `BatchInsertDeferred` |\n| [time_travel](examples/time_travel) | Query historical versions by time or version ID |\n| [explain](examples/explain) | Query statistics, cost estimation, performance debugging |\n| [observability](examples/observability) | Prometheus metrics integration |\n\n## 📄 Algorithm References\n\n- **HNSW**: Malkov \u0026 Yashunin, \"[Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs](https://arxiv.org/abs/1603.09320)\", IEEE TPAMI 2018\n- **DiskANN/Vamana**: Subramanya et al., \"[DiskANN: 
Fast Accurate Billion-point Nearest Neighbor Search on a Single Node](https://papers.nips.cc/paper/2019/hash/09853c7fb1d3f8ee67a61b6bf4a7f8e6-Abstract.html)\", NeurIPS 2019\n- **FreshDiskANN**: Singh et al., \"[FreshDiskANN: A Fast and Accurate Graph-Based ANN Index for Streaming Similarity Search](https://dl.acm.org/doi/10.1145/3448016.3457550)\", SIGMOD 2021\n- **Product Quantization**: Jégou et al., \"[Product Quantization for Nearest Neighbor Search](https://hal.inria.fr/inria-00514462v2/document)\", IEEE TPAMI 2011\n- **OPQ**: Ge et al., \"[Optimized Product Quantization](https://www.microsoft.com/en-us/research/publication/optimized-product-quantization-for-approximate-nearest-neighbor-search/)\", IEEE CVPR 2013\n- **RaBitQ**: Gao \u0026 Long, \"[RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search](https://arxiv.org/abs/2405.12497)\", SIGMOD 2024\n- **RRF**: Cormack et al., \"[Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf)\", SIGIR 2009\n\n## 🤝 Contributing\n\nContributions welcome! Please open an issue or pull request.\n\n## 📜 License\n\nLicensed under the Apache License 2.0. See [LICENSE](LICENSE) for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhupe1980%2Fvecgo","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhupe1980%2Fvecgo","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhupe1980%2Fvecgo/lists"}