<p align="center">
  <h1 align="center">VecStream</h1>
  <h3 align="center">
    A lightweight, efficient vector database with similarity search capabilities, designed for machine learning and AI applications.
  </h3>

  <p align="center">
    <a href="https://github.com/torinriley/VecStream/actions/workflows/tests.yml">
      <img src="https://github.com/torinriley/VecStream/actions/workflows/tests.yml/badge.svg" alt="Tests">
    </a>
    <a href="https://github.com/torinriley/VecStream/actions/workflows/benchmarks.yml">
      <img src="https://github.com/torinriley/VecStream/actions/workflows/benchmarks.yml/badge.svg" alt="Benchmarks">
    </a>
    <a href="https://badge.fury.io/py/vecstream">
      <img src="https://badge.fury.io/py/vecstream.svg" alt="PyPI version">
    </a>
    <a href="https://pypi.org/project/vecstream/">
      <img src="https://img.shields.io/pypi/pyversions/vecstream.svg" alt="Python versions">
    </a>
    <a href="https://github.com/torinriley/VecStream/blob/main/LICENSE">
      <img src="https://img.shields.io/github/license/torinriley/VecStream.svg" alt="License">
    </a>
    <a href="https://pepy.tech/project/vecstream">
      <img src="https://static.pepy.tech/badge/vecstream" alt="Downloads">
    </a>
    <a href="https://github.com/torinriley/VecStream/issues">
      <img src="https://img.shields.io/github/issues/torinriley/VecStream.svg" alt="GitHub issues">
    </a>
  </p>
</p>


## Features

- Fast similarity search using optimized indexing
- HNSW (Hierarchical Navigable Small World) indexing for significantly faster approximate nearest-neighbor search
- Vector collections/namespaces for organizing different types of embeddings
- Metadata filtering for fine-grained search control
- Efficient binary storage format for vectors and metadata
- Automatic text embedding with sentence-transformers
- Command-line interface with rich, colored terminal output
- Cross-platform support (Windows, macOS, Linux)
- Customizable storage locations
- Metadata support for enhanced document management
- Built-in text similarity search

## Installation

```bash
pip install vecstream
```

## Quick Start

### Using the CLI

```bash
# Add a document
vecstream add "Machine learning is transforming technology" doc1

# Search for similar documents
vecstream search "AI and machine learning" --k 3

# Search with metadata filtering
vecstream search "cloud computing" --filter '{"category": "ai", "year": 2023}'

# Get a document by ID
vecstream get doc1

# View database information
vecstream info

# Create and use a collection
vecstream create_collection research
vecstream add "Neural networks research" doc2 --collection research

# Use a custom storage location
vecstream add "Custom storage test" doc3 --db-path "./my_vectors"

# Remove a document
vecstream remove doc1
```

### Using the Python API

```python
from vecstream.collections import CollectionManager
from vecstream.binary_store import BinaryVectorStore

# Using collections for different vector types
manager = CollectionManager("./vector_db")
research_collection = manager.create_collection("research")
products_collection = manager.create_collection("products")

# Add vectors with metadata to collections
research_collection.add_vector(
    id="paper1",
    vector=[1.0, 0.0, 0.0],
    metadata={"topic": "AI", "year": 2023, "author": "Smith"}
)

# Search with metadata filtering
results = research_collection.search_similar(
    query=[1.0, 0.0, 0.0],
    k=5,
    filter_metadata={"year": 2023, "topic": "AI"}
)

# Basic binary store usage (compatible with earlier versions)
store = BinaryVectorStore("./vector_db")

# Add vectors with metadata
store.add_vector(
    id="doc1",
    vector=[1.0, 0.0, 0.0],
    metadata={"text": "Example document", "tags": ["test"]}
)

# Search similar vectors
results = store.search_similar([1.0, 0.0, 0.0], k=5)

# Get vector with metadata
vector, metadata = store.get_vector_with_metadata("doc1")
```

## Storage Locations

By default, VecStream stores its data in:
- Windows: `%APPDATA%/VecStream/store/`
- macOS/Linux: `~/.vecstream/store/`

You can specify a custom storage location with the `--db-path` option in CLI commands, or by passing a path to the `CollectionManager` or `BinaryVectorStore` constructor.

## Storage Format

VecStream uses an efficient binary storage format:
- Vectors: NumPy `.npy` format for fast access
- Metadata: JSON format for flexibility
- Automatic compression and optimization
- Collections organized in subdirectories

## CLI Features

The command-line interface provides:
- **Vector Management**: Add, retrieve, and remove vectors with the `add`, `get`, and `remove` commands
- **Similarity Search**: k-nearest-neighbor search with the `search` command; set the number of results with `--k`
- **HNSW Indexing**: Significantly faster search on large datasets (up to 100x faster)
- **Collections**: Organize vectors by type with `collection create`, `collection list`, and related commands
- **Metadata Filtering**: Filter search results with the `--filter '{"key": "value"}'` syntax
- **Nested Filters**: Dot notation in filters, such as `--filter '{"details.color": "red"}'`
- **Rich Output**: Colored output and progress indicators for long-running operations
- **Database Stats**: View detailed database information with the `info` command
- **Custom Storage**: Choose the storage location with the `--db-path` option

## Python API

The Python API offers:
- **HNSW Indexing**: Fast approximate nearest-neighbor search with customizable parameters:
  ```python
  from vecstream.hnsw_index import HNSWIndex
  index = HNSWIndex(dim=128, M=16, ef_construction=200)
  ```
- **Collections**: Organize vectors with the `CollectionManager`:
  ```python
  from vecstream.collections import CollectionManager
  manager = CollectionManager("./vector_db", use_hnsw=True)
  collection = manager.create_collection("images")
  ```
- **Metadata Filtering**: Fine-grained search control:
  ```python
  results = collection.search_similar(query, filter_metadata={"category": "electronics"})
  ```
- **Nested Filtering**: Access nested properties with dot notation:
  ```python
  results = collection.search_similar(query, filter_metadata={"details.color": "black"})
  ```
- **Binary Storage**: Efficient serialization for large datasets:
  ```python
  from vecstream.binary_store import BinaryVectorStore
  store = BinaryVectorStore("./vector_db")
  ```
- **Vector Operations**: Direct access to similarity calculations, normalization, and more
- **Type Safety**: Strong typing and error handling with descriptive exceptions

## Requirements

- Python 3.8 or higher
- NumPy
- SciPy
- sentence-transformers
- Rich (for CLI)
- Click (for CLI)

## Contributing

Contributions are welcome! Please feel free to submit a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.

## Version History

- 0.3.0 (2024-03-XX)
  - Added HNSW indexing for faster similarity search
  - Added collections/namespaces for organizing vectors
  - Added metadata filtering for search results
  - Improved CLI with collection management commands
  - Performance optimizations

- 0.2.0 (2024-03-XX)
  - Added binary vector store
  - Improved persistent storage
  - Enhanced CLI functionality
  - Added metadata support

- 0.1.0 (2024-03-XX)
  - Initial release
  - Basic vector storage and search functionality
  - CLI interface
  - Client-server architecture

## Documentation

| Document | Description | Link |
|----------|-------------|------|
| API Reference | Complete reference of VecStream's classes, methods, and CLI commands | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md) |
| Advanced Usage | Detailed examples and best practices for using VecStream | [Advanced Usage](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md) |

## Key Features

| Feature | Description | Documentation |
|---------|-------------|---------------|
| HNSW Indexing | Fast approximate nearest-neighbor search for large datasets | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#hnswindex), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#hnsw-indexing-for-faster-search) |
| Collections | Group vectors and their metadata into named collections (namespaces) | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#collection), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#working-with-collections) |
| Metadata Filtering | Filter search results using metadata properties | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#metadata-filtering), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#advanced-metadata-filtering) |
| Binary Storage | Efficient storage format for large vector datasets | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#binaryvectorstore), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#binary-storage-for-efficiency) |
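The Quick Start calls `search_similar(query, k=..., filter_metadata=...)`. As a rough mental model of what such a call returns, here is a minimal exact (brute-force) sketch. It is not VecStream's implementation: it assumes cosine similarity as the metric and exact-match AND semantics for `filter_metadata` (the README states neither), and `brute_force_search` and `cosine_similarity` are hypothetical names. An HNSW index avoids scanning every vector and returns approximate results instead.

```python
import heapq
import math
from typing import Any, List, Mapping, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def _passes(meta: Mapping[str, Any], filter_metadata: Mapping[str, Any]) -> bool:
    # Assumed semantics: every filter key must be present and equal (logical AND).
    return all(key in meta and meta[key] == value for key, value in filter_metadata.items())


def brute_force_search(
    vectors: Mapping[str, Sequence[float]],
    metadata: Mapping[str, Mapping[str, Any]],
    query: Sequence[float],
    k: int = 5,
    filter_metadata: Optional[Mapping[str, Any]] = None,
) -> List[Tuple[str, float]]:
    """Exact top-k search: filter first, score every survivor, keep the k highest."""
    scored = (
        (vec_id, cosine_similarity(query, vec))
        for vec_id, vec in vectors.items()
        if not filter_metadata or _passes(metadata.get(vec_id, {}), filter_metadata)
    )
    # heapq.nlargest keeps a heap of at most k items while consuming the generator.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    vectors = {"paper1": [1.0, 0.0, 0.0], "paper2": [0.9, 0.1, 0.0], "paper3": [0.0, 1.0, 0.0]}
    metadata = {
        "paper1": {"topic": "AI", "year": 2023},
        "paper2": {"topic": "AI", "year": 2022},
        "paper3": {"topic": "AI", "year": 2023},
    }
    hits = brute_force_search(vectors, metadata, [1.0, 0.0, 0.0], k=5,
                              filter_metadata={"year": 2023, "topic": "AI"})
    print(hits)  # [('paper1', 1.0), ('paper3', 0.0)]
```

A linear scan like this costs O(n * d) per query for n vectors of dimension d; that per-query cost is what an HNSW index is meant to avoid on large collections.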
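The defaults listed under Storage Locations, together with the rule that an explicit `--db-path` (or a path passed to `CollectionManager` / `BinaryVectorStore`) overrides them, can be expressed as a small resolver. This is a hypothetical helper written from the README's description, not code taken from VecStream; `default_store_path` and `resolve_store_path` are illustrative names, and the fallback used when `APPDATA` is unset is an assumption.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def default_store_path(platform: str, env: Mapping[str, str], home: Path) -> Path:
    """Return the documented default store directory for the given platform."""
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        # Assumption: fall back to the usual roaming profile folder if APPDATA is unset.
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "VecStream" / "store"
    # Every non-Windows platform (macOS reports "darwin", Linux "linux") uses the dot-directory.
    return home / ".vecstream" / "store"


def resolve_store_path(
    db_path: Optional[str], platform: str, env: Mapping[str, str], home: Path
) -> Path:
    """An explicit path (e.g. from --db-path) takes precedence over the platform default."""
    if db_path:
        return Path(db_path).expanduser()
    return default_store_path(platform, env, home)


if __name__ == "__main__":
    print(resolve_store_path(None, sys.platform, os.environ, Path.home()))
```

Passing the platform, environment, and home directory in as arguments keeps the function pure, so both branches can be checked on any operating system.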
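The CLI's `--filter` option takes a JSON object, and dot notation such as `details.color` reaches into nested metadata (the same shape as `filter_metadata` in the Python API). A minimal sketch of those semantics, assuming every key must match exactly (logical AND) and that a missing key never matches; `parse_filter`, `get_path`, and `matches` are hypothetical helpers, not VecStream functions.

```python
import json
from typing import Any, Dict, Mapping

_MISSING = object()  # sentinel: a missing key never equals any filter value, even None


def parse_filter(raw: str) -> Dict[str, Any]:
    """Parse a --filter argument such as '{"details.color": "red"}' into a dict."""
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("--filter must be a JSON object")
    return parsed


def get_path(metadata: Mapping[str, Any], dotted_key: str) -> Any:
    """Follow a dot-separated key through nested dicts; return _MISSING if any step is absent."""
    current: Any = metadata
    for part in dotted_key.split("."):
        if not isinstance(current, Mapping) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(metadata: Mapping[str, Any], filter_metadata: Mapping[str, Any]) -> bool:
    """True when every filter key resolves to a value equal to the filter value (AND)."""
    return all(get_path(metadata, key) == expected for key, expected in filter_metadata.items())


if __name__ == "__main__":
    doc = {"category": "electronics", "details": {"color": "black"}}
    print(matches(doc, parse_filter('{"details.color": "black"}')))  # True
    print(matches(doc, parse_filter('{"details.color": "red"}')))    # False
```

`json.JSONDecodeError` is a subclass of `ValueError`, so malformed JSON and non-object filters both surface as `ValueError` in this sketch.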
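HNSW search is approximate, so the speedup it brings can come with missed neighbors. In HNSW, `M` (links per node) and `ef_construction` (candidate-list size while building), as passed to `HNSWIndex` in the Python API section, are the usual knobs that trade build time and memory for accuracy. A common way to measure that accuracy is recall@k against an exhaustive search. The sketch below is a generic evaluation helper under that assumption, not part of VecStream; `recall_at_k` and `mean_recall_at_k` are hypothetical names.

```python
from typing import Sequence, Tuple


def recall_at_k(exact_ids: Sequence[str], approx_ids: Sequence[str], k: int) -> float:
    """Share of the true top-k neighbors that also appear in the approximate top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    if not truth:
        return 1.0  # no ground-truth neighbors, so nothing could be missed
    return len(truth.intersection(approx_ids[:k])) / len(truth)


def mean_recall_at_k(results: Sequence[Tuple[Sequence[str], Sequence[str]]], k: int) -> float:
    """Average recall@k over (exact_ids, approx_ids) pairs, one pair per query."""
    if not results:
        raise ValueError("need at least one query")
    return sum(recall_at_k(exact, approx, k) for exact, approx in results) / len(results)


if __name__ == "__main__":
    # Exact top-3 is {a, b, c}; the approximate top-3 found a and c but not b.
    print(recall_at_k(["a", "b", "c", "d"], ["a", "c", "x", "b"], k=3))
```

Running the same query set through an index built with different `M` / `ef_construction` values and comparing mean recall alongside query latency gives a direct view of the speed/accuracy trade-off.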