{"id":50961554,"url":"https://github.com/elcruzo/vectorhub","last_synced_at":"2026-06-18T14:33:21.553Z","repository":{"id":328129739,"uuid":"1059159025","full_name":"elcruzo/vectorhub","owner":"elcruzo","description":"Embeddings are heavy, and storing them at scale is painful. VectorHub is my fix. It shards Redis for speed, exposes a gRPC interface for fast insert/search, and replicates cleanly","archived":false,"fork":false,"pushed_at":"2025-12-11T09:16:48.000Z","size":112,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-12T05:32:19.410Z","etag":null,"topics":["embeddings","golang","grpc","redis","sharding","vector-database"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elcruzo.png","metadata":{"files":{"readme":"README.MD","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-09-18T04:40:16.000Z","updated_at":"2025-12-11T09:16:51.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/elcruzo/vectorhub","commit_stats":null,"previous_names":["elcruzo/vectorhub"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/elcruzo/vectorhub","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcruzo%2Fvectorhub","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcruzo%2Fvectorhub/tags","releases_url":"https://repo
s.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcruzo%2Fvectorhub/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcruzo%2Fvectorhub/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elcruzo","download_url":"https://codeload.github.com/elcruzo/vectorhub/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcruzo%2Fvectorhub/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34495378,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-18T02:00:06.871Z","response_time":128,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["embeddings","golang","grpc","redis","sharding","vector-database"],"created_at":"2026-06-18T14:33:19.893Z","updated_at":"2026-06-18T14:33:21.506Z","avatar_url":"https://github.com/elcruzo.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# VectorHub\n\nA high-performance distributed vector database system designed for scale. VectorHub shards Redis for speed, exposes a gRPC interface for fast insert/search operations, and replicates cleanly for high availability. 
Stress-tested to handle over **1 million vector writes per minute** while keeping lookups under **100ms**.\n\n## Features\n\n- **Horizontal Scaling**: Consistent hashing-based sharding across multiple Redis instances\n- **High Performance**: Optimized for 1M+ vector operations/minute with sub-100ms search latency\n- **Replication**: Built-in replication with automatic failover and lag monitoring\n- **gRPC API**: Fast binary protocol with streaming support\n- **Multiple Distance Metrics**: Cosine similarity, Euclidean distance, and dot product\n- **Monitoring**: Prometheus metrics and health endpoints\n- **Production Ready**: Docker support, comprehensive testing, and operational tooling\n\n## Quick Start\n\n### Prerequisites\n\n- Go 1.21+\n- Docker \u0026 Docker Compose\n- Redis (for local development)\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/elcruzo/vectorhub\ncd vectorhub\n\n# Start with Docker Compose (recommended)\ndocker-compose up -d\n\n# Or build and run locally\nmake build\n./bin/vectorhub -config configs/config.yaml\n```\n\n### Basic Usage\n\n```go\npackage main\n\nimport (\n    \"context\"\n    \"log\"\n    \n    \"github.com/elcruzo/vectorhub/pkg/client\"\n)\n\nfunc main() {\n    // Connect to VectorHub. The variable is named vh so it does not shadow the client package.\n    vh, err := client.NewClient(\u0026client.Config{\n        Address: \"localhost:50051\",\n    })\n    if err != nil {\n        log.Fatal(err)\n    }\n    defer vh.Close()\n    \n    ctx := context.Background()\n    \n    // Create an index\n    err = vh.CreateIndex(ctx, client.CreateIndexOptions{\n        Name:         \"embeddings\",\n        Dimension:    128,\n        Metric:       \"cosine\",\n        ShardCount:   8,\n        ReplicaCount: 2,\n    })\n    if err != nil {\n        log.Fatal(err)\n    }\n    \n    // Insert a vector\n    err = vh.Insert(ctx, \"embeddings\", \"doc-1\", \n        []float32{0.1, 0.2, 0.3, /* ... 
*/}, \n        map[string]string{\"category\": \"document\"})\n    if err != nil {\n        log.Fatal(err)\n    }\n    \n    // Search for similar vectors\n    results, err := vh.Search(ctx, \"embeddings\", \n        []float32{0.1, 0.2, 0.3, /* ... */}, \n        client.SearchOptions{\n            TopK: 10,\n            IncludeMetadata: true,\n        })\n    if err != nil {\n        log.Fatal(err)\n    }\n    \n    for _, result := range results {\n        log.Printf(\"ID: %s, Score: %f\", result.ID, result.Score)\n    }\n}\n```\n\n## Architecture\n\n```\n┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐\n│   gRPC Client   │    │   gRPC Client   │    │   gRPC Client   │\n└─────────────────┘    └─────────────────┘    └─────────────────┘\n         │                       │                       │\n         └───────────────────────┼───────────────────────┘\n                                 │\n                    ┌─────────────────┐\n                    │  VectorHub      │\n                    │  gRPC Server    │\n                    └─────────────────┘\n                                 │\n                    ┌─────────────────┐\n                    │  Shard Manager  │\n                    │ (Consistent Hash)│\n                    └─────────────────┘\n                                 │\n        ┌────────────┬────────────┼────────────┬────────────┐\n        │            │            │            │            │\n   ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐\n   │ Redis 0 │  │ Redis 1 │  │ Redis 2 │  │ Redis 3 │  │ Redis N │\n   │ Primary │  │ Replica │  │ Primary │  │ Replica │  │   ...   
│\n   └─────────┘  └─────────┘  └─────────┘  └─────────┘  └─────────┘\n```\n\n### Key Components\n\n- **Vector Service**: Core gRPC service handling CRUD operations\n- **Shard Manager**: Routes requests using consistent hashing\n- **Replication Manager**: Handles primary-replica synchronization\n- **Storage Layer**: Redis adapter with connection pooling\n- **Metrics Collector**: Prometheus metrics for monitoring\n\n## Performance\n\n### Benchmarks\n\n- **Insert Throughput**: 1M+ vectors/minute\n- **Search Latency**: \u003c100ms for 99th percentile\n- **Batch Operations**: 10K+ vectors per batch\n- **Memory Efficiency**: \u003c1KB overhead per vector\n\n### Optimization Features\n\n- Connection pooling and keepalive\n- Parallel batch operations\n- Efficient vector serialization\n- Query result caching\n- Background health monitoring\n\n## Configuration\n\n### Basic Configuration (config.yaml)\n\n```yaml\nserver:\n  grpc_port: 50051\n  metrics_port: 9090\n\nredis:\n  addresses:\n    - \"localhost:6379\"\n    - \"localhost:6380\"\n  password: \"\"\n  pool_size: 100\n\nsharding:\n  shard_count: 8\n  replica_count: 2\n  virtual_nodes: 150\n\nreplication:\n  factor: 2\n  sync_interval_seconds: 5\n  async_replication: true\n\nmetrics:\n  enabled: true\n  namespace: \"vectorhub\"\n```\n\n### Environment Variables\n\nAll configuration options can be overridden with environment variables:\n\n```bash\nexport VECTORHUB_REDIS__ADDRESSES=\"redis1:6379,redis2:6379\"\nexport VECTORHUB_SHARDING__SHARD_COUNT=16\nexport VECTORHUB_LOGGING__LEVEL=debug\n```\n\n## API Reference\n\n### gRPC Service Methods\n\n- `Insert(vector)` - Insert a single vector\n- `BatchInsert(vectors)` - Insert multiple vectors in parallel\n- `Search(query, options)` - Find similar vectors\n- `Get(id)` - Retrieve a vector by ID\n- `Update(id, vector)` - Update an existing vector\n- `Delete(id)` - Delete a vector\n- `CreateIndex(options)` - Create a new vector index\n- `DropIndex(name)` - Delete an index\n- 
`GetStats(index)` - Get index statistics\n\n### Distance Metrics\n\n- **Cosine Similarity**: `cosine` (default for normalized vectors)\n- **Euclidean Distance**: `euclidean` (good for spatial data)\n- **Dot Product**: `dot_product` (fast for high-dimensional data)\n\n## Monitoring\n\n### Metrics\n\nVectorHub exposes Prometheus metrics on `/metrics`:\n\n- `vectorhub_vectors_inserted_total`\n- `vectorhub_searches_total`\n- `vectorhub_latency_search_seconds`\n- `vectorhub_shards_status`\n- `vectorhub_replication_lag_seconds`\n\n### Health Checks\n\n- gRPC health checks: Use `grpc_health_probe`\n- HTTP health endpoint: `GET /health` on metrics port\n- Shard health monitoring with automatic failover\n\n## Development\n\n### Building\n\n```bash\n# Install dependencies\nmake install-tools\n\n# Generate protobuf code\nmake proto\n\n# Run tests\nmake test\n\n# Build binary\nmake build\n\n# Run benchmarks\nmake benchmark\n```\n\n### Testing\n\n```bash\n# Unit tests\nmake test-unit\n\n# Integration tests (requires Redis)\nmake test-integration\n\n# Benchmark tests\nmake test-benchmark\n\n# Coverage report\nmake coverage\n```\n\n### Docker Development\n\n```bash\n# Build Docker image\nmake docker-build\n\n# Run with Docker Compose\ndocker-compose up -d\n\n# View logs\ndocker-compose logs -f vectorhub\n\n# Scale Redis instances\ndocker-compose up -d --scale redis=6\n```\n\n## Production Deployment\n\n### Docker Compose\n\nThe included `docker-compose.yml` provides a production-ready setup with:\n\n- 4 Redis instances for sharding\n- VectorHub server\n- Prometheus for metrics\n- Grafana for dashboards\n\n### Kubernetes\n\nDeploy to Kubernetes using the provided manifests:\n\n```bash\nkubectl apply -f deployments/k8s/\n```\n\n### Scaling Considerations\n\n- **Horizontal Scaling**: Add more Redis shards\n- **Vertical Scaling**: Increase memory and CPU resources\n- **Replication**: Increase replica count for higher availability\n- **Load Balancing**: Use multiple VectorHub 
instances behind a load balancer\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch (`git checkout -b feature/amazing-feature`)\n3. Commit your changes (`git commit -m 'Add amazing feature'`)\n4. Push to the branch (`git push origin feature/amazing-feature`)\n5. Open a Pull Request\n\n### Development Guidelines\n\n- Follow Go best practices and `gofmt` formatting\n- Write comprehensive tests for new features\n- Update documentation for API changes\n- Benchmark performance-critical code\n\n## License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## Roadmap\n\n- [ ] Vector compression and quantization\n- [ ] GPU acceleration for similarity computation\n- [ ] Graph-based indexing (HNSW)\n- [ ] Multi-tenant isolation\n- [ ] REST API gateway\n- [ ] Vector analytics and insights\n\n## Support\n\n- **Issues**: [GitHub Issues](https://github.com/elcruzo/vectorhub/issues)\n- **Discussions**: [GitHub Discussions](https://github.com/elcruzo/vectorhub/discussions)\n- **Documentation**: [Wiki](https://github.com/elcruzo/vectorhub/wiki)\n\n---\n\n**VectorHub** - Built for scale, optimized for speed, designed for reliability.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felcruzo%2Fvectorhub","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felcruzo%2Fvectorhub","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felcruzo%2Fvectorhub/lists"}