https://github.com/awa-ai/awadb

AI Native database for embedding vectors

Host: GitHub
URL: https://github.com/awa-ai/awadb
Owner: awa-ai
License: apache-2.0
Created: 2023-05-19T16:22:02.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-01-11T14:00:38.000Z (over 2 years ago)
Last Synced: 2024-05-17T13:42:47.554Z (about 2 years ago)
Topics: ai-native, aigc, chatgpt, embedding-vectors, llm, vectordb
Language: C++
Homepage: https://ljeagle.github.io/awadb
Size: 4.13 MB
Stars: 159
Watchers: 6
Forks: 14
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: ROADMAP.md


# AwaDB - AI Native Database for embedding vectors

Easy to Use - No tedious database schema definitions. No need to worry about vector indexing details.

Realtime Search - A lock-free realtime index keeps new data searchable with millisecond-level latency. No waiting and no manual operations.

Stability - AwaDB builds on more than 5 years of experience running production workloads at scale with a system called [Vearch](https://github.com/vearch/vearch), combined with best-of-breed ideas and practices from the community.

## Run awadb locally on macOS or Linux

First install awadb:

```bash
pip3 install awadb
```

Then use it as follows:

```python
import awadb

# 1. Initialize the awadb client
awadb_client = awadb.Client()

# 2. Create a table
awadb_client.Create("test_llm1")

# 3. Add sentences. By default each sentence is embedded with SentenceTransformer.
#    You can also embed the sentences yourself with OpenAI or other LLMs.
awadb_client.Add([{'embedding_text': 'The man is happy'}, {'source': 'pic1'}])
awadb_client.Add([{'embedding_text': 'The man is very happy'}, {'source': 'pic2'}])
awadb_client.Add([{'embedding_text': 'The cat is happy'}, {'source': 'pic3'}])
awadb_client.Add([{'embedding_text': 'The man is eating'}, {'source': 'pic4'}])

# 4. Search for the top 3 sentences most similar to the query
query = "The man is happy"
results = awadb_client.Search(query, 3)

# Output the results
print(results)
```

Here the text is embedded by SentenceTransformer, whose models are available on [Hugging Face](https://huggingface.co).

For more detailed usage of the local Python library, see the [documentation](https://ljeagle.github.io/awadb/).

## Run AwaDB as a service 

If you are on Windows or want to run awadb as a service, you can download and deploy the awadb Docker image.

For installation instructions for the awadb Docker image, see [here](https://github.com/awa-ai/awadb/tree/main/docs/source/docker_deploy.md).

- Python Usage

First, install gRPC and the awadb service Python client:

```bash
pip3 install grpcio
pip3 install awadb-client
```

A simple example:

```python
# Import the package and module
from awadb_client import Awa

# Initialize the awadb client
client = Awa()

# Add dicts with vectors to table 'example1'
client.add("example1", {'name':'david', 'feature':[1.3, 2.5, 1.9]})
client.add("example1", {'name':'jim', 'feature':[1.1, 1.4, 2.3]})

# Search
results = client.search("example1", [1.0, 2.0, 3.0])

# Output the results
print(results)
```

The output is shown below. `_id` is the primary key of each document. It can be specified explicitly when adding documents; here no `_id` field was given, so the awadb server generated one.

```text
db_name: "default"
table_name: "example1"
results {
  total: 2
  msg: "Success"
  result_items {
    score: 0.860000074
    fields {
      name: "_id"
      value: "64ddb69d-6038-4311-9118-605686d758d9"
    }
    fields {
      name: "name"
      value: "jim"
    }
  }
  result_items {
    score: 1.55
    fields {
      name: "_id"
      value: "f9f3035b-faaf-48d4-a947-801416c005b3"
    }
    fields {
      name: "name"
      value: "david"
    }
  }
}
result_code: SUCCESS
```

More of the Python SDK for the service is documented [here](https://ljeagle.github.io/awadb/).
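The scores in the sample output above happen to match the squared Euclidean distance between the query vector and each stored vector. The sketch below recomputes them in plain Python. It is an observation drawn only from the numbers in this example, not a statement about how awadb computes scores internally, and the helper name `squared_l2` is made up for illustration.

```python
# Query and stored vectors copied from the service example above.
query = [1.0, 2.0, 3.0]
stored = {
    "david": [1.3, 2.5, 1.9],
    "jim": [1.1, 1.4, 2.3],
}


def squared_l2(a, b):
    """Sum of squared component-wise differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# A smaller distance means a closer match, so sort in ascending order.
ranked = sorted(
    ((name, squared_l2(query, vector)) for name, vector in stored.items()),
    key=lambda item: item[1],
)

for name, score in ranked:
    print(f"{name}: {score:.2f}")
# jim: 0.86
# david: 1.55
```

Under this reading, a lower score means a closer match, which would explain why "jim" is listed before "david".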

- RESTful Usage

```bash
# Add documents to table 'test' of db 'default'; there is no need to create the table first
curl -H "Content-Type: application/json" -X POST -d '{"db":"default", "table":"test", "docs":[{"_id":1, "name":"lj", "age":23, "f":[1,0]},{"_id":2, "name":"david", "age":32, "f":[1,2]}]}' http://localhost:8080/add

# Search documents by the vector field 'f' with the value '[1, 1]'
curl -H "Content-Type: application/json" -X POST -d '{"db":"default", "table":"test", "vector_query":{"f":[1, 1]}}' http://localhost:8080/search
```

The detailed RESTful API is documented [here](https://github.com/awa-ai/awadb/tree/main/docs/source/restful_tutorial.md).
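The same two requests can also be built from Python using only the standard library. This is a minimal sketch that mirrors the curl commands above: the endpoints, port, and payloads are copied from them, while the helper names `build_request` and `send` are hypothetical and not part of any awadb package. The sketch only constructs the requests; calling `send` requires a running awadb service.

```python
import json
import urllib.request

# Assumed to match the host and port used in the curl examples.
BASE_URL = "http://localhost:8080"


def build_request(endpoint, payload):
    """Build a JSON POST request for the given endpoint without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


# Same payload as the first curl command: add two documents to table 'test'.
add_request = build_request("add", {
    "db": "default",
    "table": "test",
    "docs": [
        {"_id": 1, "name": "lj", "age": 23, "f": [1, 0]},
        {"_id": 2, "name": "david", "age": 32, "f": [1, 2]},
    ],
})

# Same payload as the second curl command: search by the vector field 'f'.
search_request = build_request("search", {
    "db": "default",
    "table": "test",
    "vector_query": {"f": [1, 1]},
})

# With the service running, the requests could be sent like this:
# print(send(add_request))
# print(send(search_request))
```

Keeping request construction separate from sending makes the payloads easy to inspect before any network call is made.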

## What are Embeddings?

Any unstructured data (image, text, audio, or video) can be transformed by AI models (LLMs or other deep neural networks) into vectors, which computers can process directly.

For example, the sentence "The man is happy" can be transformed into a 384-dimensional vector (a list of numbers such as `[0.23, 1.98, ...]`) by a SentenceTransformer language model. This process is called embedding.

For more information about embeddings, see the [OpenAI guide](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).

AwaDB uses [Sentence Transformers](https://huggingface.co/sentence-transformers) to embed sentences by default, but you can also use OpenAI or other LLMs to generate the embeddings according to your needs.
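To make the idea of "text in, vector out, search by similarity" concrete, here is a toy sketch using the four sentences from the local example. It deliberately replaces the real embedding model with a word-count vector, which is an assumption made purely for illustration: it is not what SentenceTransformer or AwaDB does, and it captures none of the semantic meaning a real model would. The function names `toy_embed` and `cosine_similarity` are likewise made up for this example.

```python
import math
from collections import Counter

sentences = [
    "The man is happy",
    "The man is very happy",
    "The cat is happy",
    "The man is eating",
]


def toy_embed(text, vocabulary):
    """Toy stand-in for an embedding model: one word count per vocabulary entry."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Build a shared vocabulary so every sentence maps to a vector of the same length.
vocabulary = sorted({word for s in sentences for word in s.lower().split()})
vectors = [toy_embed(s, vocabulary) for s in sentences]

query = "The man is happy"
query_vector = toy_embed(query, vocabulary)

# Rank stored sentences by similarity to the query, most similar first.
ranked = sorted(
    zip(sentences, vectors),
    key=lambda pair: cosine_similarity(query_vector, pair[1]),
    reverse=True,
)
top3 = [sentence for sentence, _ in ranked[:3]]
print(top3)
```

A real embedding model produces dense vectors in which sentences with similar meaning end up close together even when they share few words, which word counts cannot do.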

## Get involved

- [Issues and PR](https://github.com/awa-ai/awadb/issues)  

- [Roadmap and Contribution](https://github.com/awa-ai/awadb/blob/main/ROADMAP.md)

## License

[Apache 2.0](./LICENSE)

## Community

Join the AwaDB community to share problems, suggestions, or discussions with us:

- [Discord](https://discord.gg/GP7QxRrDjB)

- [Slack](https://awadbhq.slack.com)

- [Reddit](https://www.reddit.com/r/Awadb/)
