https://github.com/rryam/vecturakit

Swift-based vector database for on-device RAG using MLTensor

Host: GitHub
URL: https://github.com/rryam/vecturakit
Owner: rryam
License: mit
Created: 2025-01-17T06:12:21.000Z (5 months ago)
Default Branch: main
Last Pushed: 2025-01-25T12:14:06.000Z (4 months ago)
Last Synced: 2025-01-25T13:19:48.313Z (4 months ago)
Topics: mlx-swift, rag, swift
Language: Swift
Homepage:
Size: 65.4 KB
Stars: 31
Watchers: 2
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# VecturaKit

VecturaKit is a Swift-based vector database for on-device apps. It stores and searches vectors locally, so features such as retrieval-augmented generation (RAG) can run without sending data off the device. Inspired by [Dripfarm's SVDB](https://github.com/Dripfarm/SVDB), **VecturaKit** builds on `MLTensor` and [`swift-embeddings`](https://github.com/jkrukowski/swift-embeddings).

## Features

- On-Device Storage: Maintain data privacy and reduce latency by storing vectors directly on the device.

- Batch Processing: Efficiently add multiple documents in parallel.

- Persistent Storage: Documents are automatically saved and loaded between sessions.

- Configurable Search: Customize search results with thresholds and result limits.
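To make the threshold and result-limit options concrete, here is a minimal sketch of a similarity search in plain Swift. It is not VecturaKit's implementation: the `StoredVector` type and the `cosineSimilarity` and `topMatches` functions are hypothetical names introduced for illustration, and cosine similarity is assumed as the scoring function.

```swift
import Foundation

// Hypothetical stored record; VecturaKit's own document type may differ.
struct StoredVector {
    let id: UUID
    let text: String
    let embedding: [Float]
}

// Cosine similarity between two equal-length vectors.
// Returns 0 when either vector has zero length or zero norm.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "Vectors must have the same dimension")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

// Scores every record, drops those below `threshold`,
// and keeps the `numResults` highest-scoring matches.
func topMatches(
    for query: [Float],
    in records: [StoredVector],
    numResults: Int,
    threshold: Float
) -> [(record: StoredVector, score: Float)] {
    let scored = records
        .map { (record: $0, score: cosineSimilarity(query, $0.embedding)) }
        .filter { $0.score >= threshold }
        .sorted { $0.score > $1.score }
    return Array(scored.prefix(numResults))
}
```

In this sketch a higher threshold trades recall for precision, while the result limit only caps how many of the surviving matches are returned.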

## Installation

To integrate VecturaKit into your project using Swift Package Manager, add the following dependency in your `Package.swift` file:

```swift
dependencies: [
    .package(url: "https://github.com/rryam/VecturaKit.git", branch: "main"),
],
```

## Usage

1. Import VecturaKit

```swift
import VecturaKit
```

2. Create Configuration and Initialize Database

```swift
let config = VecturaConfig(
    name: "my-vector-db",
    dimension: 384,  // Matches the default BERT model dimension
    searchOptions: VecturaConfig.SearchOptions(
        defaultNumResults: 10,
        minThreshold: 0.7
    )
)

let vectorDB = try VecturaKit(config: config)
```

3. Add Documents

Single document:

```swift
let text = "Sample text to be embedded"

let documentId = try await vectorDB.addDocument(
    text: text,
    id: UUID(),  // Optional, will be generated if not provided
    modelId: "sentence-transformers/all-MiniLM-L6-v2"  // Optional, this is the default
)
```

Multiple documents in batch:

```swift
let texts = [
    "First document text",
    "Second document text",
    "Third document text"
]

let documentIds = try await vectorDB.addDocuments(
    texts: texts,
    ids: nil,  // Optional array of UUIDs
    modelId: "sentence-transformers/all-MiniLM-L6-v2"
)
```

4. Search Documents

Search by text:

```swift
let results = try await vectorDB.search(
    query: "search query",
    numResults: 5,  // Optional
    threshold: 0.8,  // Optional
    modelId: "sentence-transformers/all-MiniLM-L6-v2"  // Optional
)

for result in results {
    print("Document ID: \(result.id)")
    print("Text: \(result.text)")
    print("Similarity Score: \(result.score)")
    print("Created At: \(result.createdAt)")
}
```

Search by vector embedding:

```swift
let results = try await vectorDB.search(
    query: embeddingArray,  // [Float] matching config.dimension
    numResults: 5,  // Optional
    threshold: 0.8  // Optional
)
```
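Because a vector query must match `config.dimension`, it can help to validate an embedding before passing it to `search`. The helper below is a minimal sketch under stated assumptions: `preparedQueryVector` and `QueryVectorError` are hypothetical names, not part of VecturaKit, and the L2 normalization step is optional since the README does not say whether the library normalizes vectors itself.

```swift
// Hypothetical error type for this sketch; VecturaKit defines its own errors.
enum QueryVectorError: Error, Equatable {
    case dimensionMismatch(expected: Int, actual: Int)
    case zeroVector
}

// Checks the vector length against the configured dimension
// and returns an L2-normalized copy (unit length).
func preparedQueryVector(_ vector: [Float], dimension: Int) throws -> [Float] {
    guard vector.count == dimension else {
        throw QueryVectorError.dimensionMismatch(expected: dimension, actual: vector.count)
    }
    // Euclidean norm: square root of the sum of squared components.
    let norm = vector.reduce(Float(0)) { $0 + $1 * $1 }.squareRoot()
    guard norm > 0 else {
        throw QueryVectorError.zeroVector
    }
    return vector.map { $0 / norm }
}
```

Failing early on a dimension mismatch gives a clearer error than a mismatch discovered during similarity computation.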

5. Document Management

Update document:

```swift
try await vectorDB.updateDocument(
    id: documentId,
    newText: "Updated text",
    modelId: "sentence-transformers/all-MiniLM-L6-v2"  // Optional
)
```

Delete documents:

```swift
try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])
```

Reset database:

```swift
try await vectorDB.reset()
```
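Since documents persist between sessions, every add, update, delete, or reset also has to be reflected in storage. The following sketch shows one way such persistence could work, assuming one JSON file per document in a directory. The `PersistedDocument` type, the file layout, and the `save`, `loadAll`, and `delete` functions are all hypothetical; the README does not describe VecturaKit's actual storage format.

```swift
import Foundation

// Hypothetical on-disk record; its fields mirror what search results expose.
struct PersistedDocument: Codable, Equatable {
    let id: UUID
    let text: String
    let embedding: [Float]
    let createdAt: Date
}

// Writes one JSON file per document, named after its UUID.
// Saving again with the same ID overwrites the file, which covers updates.
func save(_ document: PersistedDocument, in directory: URL) throws {
    try FileManager.default.createDirectory(at: directory, withIntermediateDirectories: true)
    let data = try JSONEncoder().encode(document)
    let fileURL = directory.appendingPathComponent("\(document.id.uuidString).json")
    try data.write(to: fileURL, options: .atomic)
}

// Loads every JSON document found in the directory.
func loadAll(from directory: URL) throws -> [PersistedDocument] {
    let files = try FileManager.default.contentsOfDirectory(
        at: directory,
        includingPropertiesForKeys: nil
    )
    let decoder = JSONDecoder()
    return try files
        .filter { $0.pathExtension == "json" }
        .map { try decoder.decode(PersistedDocument.self, from: Data(contentsOf: $0)) }
}

// Deleting a document removes its file.
func delete(id: UUID, in directory: URL) throws {
    let fileURL = directory.appendingPathComponent("\(id.uuidString).json")
    try FileManager.default.removeItem(at: fileURL)
}
```

One file per document keeps updates and deletions cheap, at the cost of reading many small files on startup.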

## Command Line Interface

VecturaKit comes with a built-in CLI tool for database operations:

```bash
# Add documents
vectura add "First document" "Second document" "Third document" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Search documents
vectura search "search query" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --threshold 0.7 \
  --num-results 5 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Update a document (replace <document-id> with the document's UUID)
vectura update <document-id> "Updated text content" \
  --db-name "my-vector-db" \
  --dimension 384 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"

# Delete documents (replace the placeholders with document UUIDs)
vectura delete <document-id-1> <document-id-2> \
  --db-name "my-vector-db" \
  --dimension 384

# Reset database
vectura reset \
  --db-name "my-vector-db" \
  --dimension 384

# Run demo with sample data
vectura mock \
  --db-name "my-vector-db" \
  --dimension 384 \
  --threshold 0.7 \
  --num-results 10 \
  --model-id "sentence-transformers/all-MiniLM-L6-v2"
```

Common options:

- `--db-name, -d`: Database name (default: "vectura-cli-db")

- `--dimension, -v`: Vector dimension (default: 384)

- `--threshold, -t`: Minimum similarity threshold (default: 0.7)

- `--num-results, -n`: Number of results to return (default: 10)

- `--model-id, -m`: Model ID for embeddings (default: "sentence-transformers/all-MiniLM-L6-v2")

## Contributing

Contributions are welcome! Please fork the repository and submit a pull request with your improvements.

## License

VecturaKit is released under the MIT License. See the LICENSE file for more information.
