https://github.com/torinriley/vecstream
Efficient, scalable, in memory vector database
https://github.com/torinriley/vecstream
databse db in-memory in-memory-database vector vector-database
Last synced: 2 months ago
JSON representation
Efficient, scalable, in memory vector database
- Host: GitHub
- URL: https://github.com/torinriley/vecstream
- Owner: torinriley
- Created: 2025-03-12T20:10:40.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2025-03-12T20:22:11.000Z (2 months ago)
- Last Synced: 2025-03-12T21:26:30.753Z (2 months ago)
- Topics: databse, db, in-memory, in-memory-database, vector, vector-database
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
VecStream
A lightweight, efficient vector database with similarity search capabilities, designed for machine learning and AI applications.
## Features
- Fast similarity search using optimized indexing
- HNSW indexing for significantly improved search performance
- Vector collections/namespaces for organizing different types of embeddings
- Metadata filtering for fine-grained search control
- Efficient binary storage format for vectors and metadata
- Automatic text embedding with sentence-transformers
- Rich command-line interface with beautiful output
- Cross-platform support (Windows, macOS, Linux)
- Customizable storage locations
- Metadata support for enhanced document management
- Built-in text similarity search## Installation
```bash
pip install vecstream
```## Quick Start
### Using the CLI
```bash
# Add a document
vecstream add "Machine learning is transforming technology" doc1# Search for similar documents
vecstream search "AI and machine learning" --k 3# Search with metadata filtering
vecstream search "cloud computing" --filter '{"category": "ai", "year": 2023}'# Get document by ID
vecstream get doc1# View database information
vecstream info# Create and use a collection
vecstream create_collection research
vecstream add "Neural networks research" doc2 --collection research# Use custom storage location
vecstream add "Custom storage test" doc3 --db-path "./my_vectors"# Remove a document
vecstream remove doc1
```### Using the Python API
```python
from vecstream.collections import CollectionManager
from vecstream.binary_store import BinaryVectorStore# Using collections for different vector types
manager = CollectionManager("./vector_db")
research_collection = manager.create_collection("research")
products_collection = manager.create_collection("products")# Add vectors with metadata to collections
research_collection.add_vector(
id="paper1",
vector=[1.0, 0.0, 0.0],
metadata={"topic": "AI", "year": 2023, "author": "Smith"}
)# Search with metadata filtering
results = research_collection.search_similar(
query=[1.0, 0.0, 0.0],
k=5,
filter_metadata={"year": 2023, "topic": "AI"}
)# Basic binary store usage (compatible with earlier versions)
store = BinaryVectorStore("./vector_db")# Add vectors with metadata
store.add_vector(
id="doc1",
vector=[1.0, 0.0, 0.0],
metadata={"text": "Example document", "tags": ["test"]}
)# Search similar vectors
results = store.search_similar([1.0, 0.0, 0.0], k=5)# Get vector with metadata
vector, metadata = store.get_vector_with_metadata("doc1")
```## Storage Locations
By default, VecStream stores its data in:
- Windows: `%APPDATA%/VecStream/store/`
- macOS/Linux: `~/.vecstream/store/`You can specify a custom storage location using the `--db-path` option in CLI commands or by passing the path to `CollectionManager` or `BinaryVectorStore`.
## Storage Format
VecStream uses an efficient binary storage format:
- Vectors: NumPy `.npy` format for fast access
- Metadata: JSON format for flexibility
- Automatic compression and optimization
- Collections organized in subdirectories## CLI Features
The command-line interface provides:
- **Vector Management**: Add, get, update and remove vectors with `add`, `get`, and `remove` commands
- **Similarity Search**: Fast vector search with `search` command with adjustable k-nearest neighbors
- **HNSW Indexing**: Significantly faster search performance for large datasets (up to 100x faster)
- **Collections**: Organize vectors by type with `collection create`, `collection list`, and other commands
- **Metadata Filtering**: Filter search results with `--filter '{"key": "value"}'` syntax
- **Nested Filters**: Support for dot notation in filters like `--filter '{"details.color": "red"}'`
- **Beautiful UI**: Rich, colored output and progress indicators for long operations
- **Database Stats**: View detailed database information with `info` command
- **Custom Storage**: Specify storage locations with `--db-path` option## Python API
The Python API offers:
- **HNSW Indexing**: Fast approximate nearest-neighbor search with customizable parameters:
```python
from vecstream.hnsw_index import HNSWIndex
index = HNSWIndex(dim=128, M=16, ef_construction=200)
```
- **Collections**: Organize vectors with the CollectionManager:
```python
from vecstream.collections import CollectionManager
manager = CollectionManager("./vector_db", use_hnsw=True)
collection = manager.create_collection("images")
```
- **Metadata Filtering**: Fine-grained search control:
```python
results = collection.search_similar(query, filter_metadata={"category": "electronics"})
```
- **Nested Filtering**: Access nested properties with dot notation:
```python
results = collection.search_similar(query, filter_metadata={"details.color": "black"})
```
- **Binary Storage**: Efficient serialization for large datasets:
```python
from vecstream.binary_store import BinaryVectorStore
store = BinaryVectorStore("./vector_db")
```
- **Vector Operations**: Direct access to similarity calculations, normalization, and more
- **Type Safety**: Strong typing and error handling with descriptive exceptions## Requirements
- Python 3.8 or higher
- NumPy
- SciPy
- sentence-transformers
- Rich (for CLI)
- Click (for CLI)## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Version History
- 0.3.0 (2024-03-XX)
- Added HNSW indexing for faster similarity search
- Added collections/namespaces for organizing vectors
- Added metadata filtering for search results
- Improved CLI with collection management commands
- Performance optimizations- 0.2.0 (2024-03-XX)
- Added binary vector store
- Improved persistent storage
- Enhanced CLI functionality
- Added metadata support- 0.1.0 (2024-03-XX)
- Initial release
- Basic vector storage and search functionality
- CLI interface
- Client-server architecture# Documentation
| Document | Description | Link |
|----------|-------------|------|
| API Reference | Complete reference of VecStream's classes, methods, and CLI commands | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md) |
| Advanced Usage | Detailed examples and best practices for using VecStream | [Advanced Usage](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md) |## Key Features
| Feature | Description | Documentation |
|---------|-------------|---------------|
| HNSW Indexing | Fast approximate nearest neighbor search for large datasets | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#hnswindex), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#hnsw-indexing-for-faster-search) |
| Collections | Organize vectors with metadata for better organization | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#collection), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#working-with-collections) |
| Metadata Filtering | Filter search results using metadata properties | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#metadata-filtering), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#advanced-metadata-filtering) |
| Binary Storage | Efficient storage format for large vector datasets | [API Reference](https://github.com/torinriley/VecStream/blob/main/docs/api_reference.md#binaryvectorstore), [Usage Examples](https://github.com/torinriley/VecStream/blob/main/docs/advanced_usage.md#binary-storage-for-efficiency) |