https://github.com/alash3al/vecdb

a vector embedding database with multiple storage engines and AI embedding integrations

ai database gemini machine-learning text-embedding vector-database vector-embeddings

Host: GitHub
URL: https://github.com/alash3al/vecdb
Owner: alash3al
License: mit
Created: 2024-07-08T21:06:47.000Z (about 2 years ago)
Default Branch: main
Last Pushed: 2024-08-08T08:39:34.000Z (almost 2 years ago)
Last Synced: 2025-03-30T10:33:39.250Z (over 1 year ago)
Topics: ai, database, gemini, machine-learning, text-embedding, vector-database, vector-embeddings
Language: Go
Homepage:
Size: 53.7 KB
Stars: 33
Watchers: 1
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

VecDB
=====

> A very simple vector embedding database.
> You can think of it as a hash table that lets you find items similar to the one you're searching for.

Why!
====

> I'm a database enthusiast, and this is a for-fun learning project that could also be used in production ;).
>
> **P.S.**: I like to reinvent the wheel in my free time, because it is my free time!

Data Model
==========

> VecDB uses a `{key => value}` model:
> - `key` should be a unique value that identifies the item.
> - `value` should be the vector itself (a list of floats).

Configuration
=============

> By default, `vecdb` looks for `config.yml` in the current working directory.
> You can override this by passing your own file path with the `--config /path/to/config.yml` flag.

```yaml
# HTTP server settings
server:
  # the address to listen on, in the form '[host]:port'
  listen: "0.0.0.0:3000"

# storage settings
store:
  # the storage driver to use
  # currently vecdb supports "bolt", which is based on BoltDB, an in-process embedded database
  driver: "bolt"
  # the arguments required by the driver
  # bolt requires a `database` key pointing to the path where the data should be stored
  args:
    database: "./vec.db"

# embedding settings
embedder:
  # whether to enable the embedder and all endpoints that use it
  enabled: true
  # the embedding driver to use; currently vecdb supports gemini
  driver: gemini
  # the arguments required by the driver
  # the gemini driver currently requires `api_key` and `text_embedding_model`
  args:
    # vecdb replaces anything of the form ${..} with the value of the matching environment variable
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"
```
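To illustrate the `${..}` substitution mentioned in the config comments, here is a minimal Go sketch of one way such placeholders could be expanded. It is not taken from the vecdb source: the `expandPlaceholders` name, the regular expression, and the choice to leave bare `$NAME` untouched are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// placeholder matches ${NAME} only; a bare $NAME is deliberately left untouched.
// This pattern is an assumption for the sketch, not vecdb's actual rule.
var placeholder = regexp.MustCompile(`\$\{([^}]+)\}`)

// expandPlaceholders replaces each ${NAME} in s with lookup(NAME).
func expandPlaceholders(s string, lookup func(string) string) string {
	return placeholder.ReplaceAllStringFunc(s, func(m string) string {
		name := m[2 : len(m)-1] // strip the leading "${" and the trailing "}"
		return lookup(name)
	})
}

func main() {
	// Illustrative value only; in practice the variable is set outside the program.
	os.Setenv("GEMINI_API_KEY", "example-key")
	fmt.Println(expandPlaceholders(`api_key: "${GEMINI_API_KEY}"`, os.Getenv))
}
```

Passing the lookup function as a parameter keeps the expansion testable without touching the real process environment.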

Components
==========

- Raw Vectors Layer (low-level)
  - Send a [VectorWriteRequest](#VectorWriteRequest) to `POST /v1/vectors/write` when you already have a vector and want to store it.
  - Send a [VectorSearchRequest](#VectorSearchRequest) to `POST /v1/vectors/search` when you already have a vector and want the keys/ids of all similar vectors, ordered by cosine similarity in descending order.
- Embedding Layer (optional)
  - Send a [TextEmbeddingWriteRequest](#TextEmbeddingWriteRequest) to `POST /v1/embeddings/text/write` when you have a piece of text and want `vecdb` to build and store its vector using the configured embedder (gemini for now).
  - Send a [TextEmbeddingSearchRequest](#TextEmbeddingSearchRequest) to `POST /v1/embeddings/text/search` when you have a piece of text and want `vecdb` to build a vector from it and return the keys of similar vectors, ordered by cosine similarity in descending order.
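Both layers rank results by cosine similarity, i.e. the dot product of two vectors divided by the product of their magnitudes. The following Go function is a minimal sketch of that metric, not vecdb's own implementation; the name `cosineSimilarity` and the decision to return 0 for mismatched or zero-magnitude vectors are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity returns dot(a, b) / (|a| * |b|).
// It returns 0 when the lengths differ, the vectors are empty, or either
// vector has zero magnitude (a choice made for this sketch only).
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	// Orthogonal vectors share no direction.
	fmt.Println(cosineSimilarity([]float64{1, 0}, []float64{0, 1}))
	// Vectors pointing the same way score close to 1 regardless of length.
	fmt.Println(cosineSimilarity([]float64{1, 2}, []float64{2, 4}))
}
```

Because the metric depends only on direction, scaling a vector does not change its similarity to other vectors.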

Requests
========

### VectorWriteRequest

```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "key": "product-id-1", // should be unique and refer to a valid value in your main data store (e.g. the row id in your MySQL/Postgres, etc.)
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ] // the vector you want to store
}
```
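As a rough illustration of sending the body above from Go, the sketch below defines a struct with matching JSON field names and posts it to `/v1/vectors/write`. The helper names (`encodeWriteRequest`, `writeVector`), the `application/json` content type, the 5-second timeout, the bucket name `products`, and treating any non-2xx status as an error are assumptions; the README does not describe the response format, so the response body is ignored here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// VectorWriteRequest mirrors the JSON fields documented above.
type VectorWriteRequest struct {
	Bucket string    `json:"bucket"`
	Key    string    `json:"key"`
	Vector []float64 `json:"vector"`
}

// encodeWriteRequest serializes the request as a JSON body.
func encodeWriteRequest(req VectorWriteRequest) ([]byte, error) {
	return json.Marshal(req)
}

// writeVector POSTs req to {baseURL}/v1/vectors/write.
// Treating non-2xx statuses as failures is an assumption of this sketch.
func writeVector(client *http.Client, baseURL string, req VectorWriteRequest) error {
	body, err := encodeWriteRequest(req)
	if err != nil {
		return err
	}
	resp, err := client.Post(baseURL+"/v1/vectors/write", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("vecdb write failed: %s", resp.Status)
	}
	return nil
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	req := VectorWriteRequest{
		Bucket: "products", // hypothetical bucket name
		Key:    "product-id-1",
		Vector: []float64{1.929292, 0.3848484, -1.9383838383},
	}
	// Port 3000 matches the `server.listen` value in the sample config.
	if err := writeVector(client, "http://localhost:3000", req); err != nil {
		fmt.Println("error:", err)
	}
}
```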

### VectorSearchRequest

```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ], // results are ordered by cosine similarity to this vector, in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you will get
  "max_result_count": 10 // max vectors to return (vecdb first orders by cosine similarity, then applies the limit)
}
```
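The interaction between `min_cosine_similarity` and `max_result_count` can be sketched in Go as a filter, a descending sort, and a truncation, in that order. This is an illustration of the documented ordering rather than vecdb's code: the `Match` type, the `selectMatches` name, and the inclusive `>=` threshold comparison are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Match pairs a stored key with its cosine similarity to the query vector.
type Match struct {
	Key    string
	Cosine float64
}

// selectMatches keeps matches with Cosine >= minCosine (inclusive comparison
// is an assumption), orders them by descending similarity, and only then
// truncates the list to at most maxCount entries.
func selectMatches(all []Match, minCosine float64, maxCount int) []Match {
	kept := make([]Match, 0, len(all))
	for _, m := range all {
		if m.Cosine >= minCosine {
			kept = append(kept, m)
		}
	}
	sort.SliceStable(kept, func(i, j int) bool {
		return kept[i].Cosine > kept[j].Cosine
	})
	if maxCount >= 0 && len(kept) > maxCount {
		kept = kept[:maxCount]
	}
	return kept
}

func main() {
	candidates := []Match{{"a", 0.2}, {"b", 0.9}, {"c", 0.5}, {"d", 0.7}}
	for _, m := range selectMatches(candidates, 0.4, 2) {
		fmt.Println(m.Key, m.Cosine)
	}
}
```

Sorting before applying the limit matters: truncating first could drop the best matches if they happen to appear late in the candidate list.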

### TextEmbeddingWriteRequest

> Available only if `embedder.enabled` is set to `true`.

```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "key": "product-id-1", // should be unique and refer to a valid value in your main data store (e.g. the row id in your MySQL/Postgres, etc.)
  "content": "This is some text representing the product" // this will be converted to a vector using the configured embedder
}
```

### TextEmbeddingSearchRequest

> Available only if `embedder.enabled` is set to `true`.

```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "content": "A Product Text", // results are ordered by cosine similarity to this text's vector, in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you will get
  "max_result_count": 10 // max vectors to return (vecdb first orders by cosine similarity, then applies the limit)
}
```

Download/Install
================

- [Binary](https://github.com/alash3al/vecdb/releases)

- [Docker Image](https://github.com/alash3al/vecdb/pkgs/container/vecdb)
