https://github.com/alash3al/vecdb
a vector embedding database with multiple storage engines and AI embedding integrations
ai database gemini machine-learning text-embedding vector-database vector-embeddings
- Host: GitHub
- URL: https://github.com/alash3al/vecdb
- Owner: alash3al
- License: mit
- Created: 2024-07-08T21:06:47.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-08T08:39:34.000Z (almost 2 years ago)
- Last Synced: 2025-03-30T10:33:39.250Z (about 1 year ago)
- Topics: ai, database, gemini, machine-learning, text-embedding, vector-database, vector-embeddings
- Language: Go
- Homepage:
- Size: 53.7 KB
- Stars: 33
- Watchers: 1
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
VecDB
======
> A very simple vector embedding database;
> you can think of it as a hash table that lets you find items similar to the one you're searching for.

Why!
====
> I'm a database enthusiast, and this is a for-fun learning project that could also be used in production ;).
>
> **P.S.**: I like to reinvent the wheel in my free time, because it is my free time!

Data Model
==========
> I'm using a `{key => value}` model:
> - `key` should be a unique value that identifies the item.
> - `value` should be the vector itself (a list of floats).

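
To make the model concrete, here is a minimal in-memory sketch in Go. It assumes vectors are `[]float32` and that keys are grouped by bucket (the `bucket` field used in the request bodies further down). The `memStore` type and its `Put`/`Get` methods are hypothetical illustrations, not vecdb's storage engine.

```go
package main

import "fmt"

// memStore is a hypothetical in-memory illustration of the {key => value} model:
// each bucket maps a unique key to its vector (a list of floats).
type memStore struct {
	buckets map[string]map[string][]float32
}

func newMemStore() *memStore {
	return &memStore{buckets: make(map[string]map[string][]float32)}
}

// Put stores (or overwrites) the vector for key inside bucket.
func (s *memStore) Put(bucket, key string, vec []float32) {
	b, ok := s.buckets[bucket]
	if !ok {
		b = make(map[string][]float32)
		s.buckets[bucket] = b
	}
	// Copy the slice so later changes by the caller don't alter the stored vector.
	b[key] = append([]float32(nil), vec...)
}

// Get returns the vector stored under key in bucket, if any.
func (s *memStore) Get(bucket, key string) ([]float32, bool) {
	vec, ok := s.buckets[bucket][key] // indexing a missing (nil) bucket is safe in Go
	return vec, ok
}

func main() {
	s := newMemStore()
	s.Put("products", "product-id-1", []float32{0.1, 0.2, 0.3})
	s.Put("products", "product-id-1", []float32{0.4, 0.5, 0.6}) // same key: overwritten
	vec, ok := s.Get("products", "product-id-1")
	fmt.Println(vec, ok) // [0.4 0.5 0.6] true
}
```

Because `key` is unique per bucket, writing the same key twice replaces the old vector rather than adding a second entry.
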
Configurations
==============
> By default, `vecdb` looks for `config.yml` in the current working directory,
> but you can point it at your own file with the `--config /path/to/config.yml` flag.

```yaml
# http server related configs
server:
  # the address to listen on in the form of '[host]:port'
  listen: "0.0.0.0:3000"

# storage related configs
store:
  # the driver you want to use
  # currently vecdb supports "bolt", which is based on boltdb, an in-process embedded database
  driver: "bolt"
  # the arguments required by the driver
  # for bolt, `database` is the path of the file you want to store the data in
  args:
    database: "./vec.db"

# embeddings related configs
embedder:
  # whether to enable the embedder (and all endpoints that use it)
  enabled: true
  # the driver you want to use, currently vecdb supports gemini
  driver: gemini
  # the arguments required by the driver
  # currently the gemini driver requires `api_key` and `text_embedding_model`
  args:
    # by default vecdb replaces anything between ${..} with the value of that ENV var
    api_key: "${GEMINI_API_KEY}"
    text_embedding_model: "text-embedding-004"
```
Components
===========
- Raw Vectors Layer (low-level)
  - send [VectorWriteRequest](#VectorWriteRequest) to `POST /v1/vectors/write` when you have a vector and want to store it.
  - send [VectorSearchRequest](#VectorSearchRequest) to `POST /v1/vectors/search` when you have a vector and want the keys of similar vectors, sorted by cosine similarity in descending order.
- Embedding Layer (optional)
  - send [TextEmbeddingWriteRequest](#TextEmbeddingWriteRequest) to `POST /v1/embeddings/text/write` when you have some text and want `vecdb` to build and store the vector for you using the configured embedder (currently gemini).
  - send [TextEmbeddingSearchRequest](#TextEmbeddingSearchRequest) to `POST /v1/embeddings/text/search` when you have some text and want `vecdb` to embed it and return the keys of similar vectors, sorted by cosine similarity in descending order.

Requests
========
### VectorWriteRequest
```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "key": "product-id-1", // must be unique and should reference a record in your main data store (e.g. the row id in MySQL/Postgres)
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ] // the vector you want to store
}
```
### VectorSearchRequest
```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "vector": [1.929292, 0.3848484, -1.9383838383, ... ], // the query vector; results are sorted by cosine similarity in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you get
  "max_result_count": 10 // max number of results to return (vecdb sorts by cosine similarity first, then applies the limit)
}
```
### TextEmbeddingWriteRequest
> Available only when `embedder.enabled` is set to `true`.
```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "key": "product-id-1", // must be unique and should reference a record in your main data store (e.g. the row id in MySQL/Postgres)
  "content": "This is some text representing the product" // converted to a vector by the configured embedder
}
```
### TextEmbeddingSearchRequest
> Available only when `embedder.enabled` is set to `true`.
```json5
{
  "bucket": "BUCKET_NAME", // think of it as a collection or a table
  "content": "A Product Text", // embedded into a query vector; results are sorted by cosine similarity in descending order
  "min_cosine_similarity": 0.0, // the higher the threshold, the fewer results you get
  "max_result_count": 10 // max number of results to return (vecdb sorts by cosine similarity first, then applies the limit)
}
```
Download/Install
================
- [Binary](https://github.com/alash3al/vecdb/releases)
- [Docker Image](https://github.com/alash3al/vecdb/pkgs/container/vecdb)