https://github.com/awa-ai/awadb
AI Native database for embedding vectors
- Host: GitHub
- URL: https://github.com/awa-ai/awadb
- Owner: awa-ai
- License: apache-2.0
- Created: 2023-05-19T16:22:02.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-01-11T14:00:38.000Z (10 months ago)
- Last Synced: 2024-05-17T13:42:47.554Z (6 months ago)
- Topics: ai-native, aigc, chatgpt, embedding-vectors, llm, vectordb
- Language: C++
- Homepage: https://ljeagle.github.io/awadb
- Size: 4.13 MB
- Stars: 159
- Watchers: 6
- Forks: 14
- Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
- Roadmap: ROADMAP.md
Awesome Lists containing this project
- awesome-llmops - Awadb (Search / Vector search)
README
# AwaDB - AI Native Database for embedding vectors
- Easy to use: No database schema to define, and no need to manage vector indexing details.
- Realtime search: A lock-free realtime index keeps new data fresh with millisecond-level latency, with no waiting and no manual operation.
- Stability: AwaDB builds on over 5 years of experience running production workloads at scale with a system called [Vearch](https://github.com/vearch/vearch), combined with best-of-breed ideas and practices from the community.
## Run awadb locally on Mac OSX or Linux
First install awadb:
```bash
pip3 install awadb
```

Then use it as below:

```python
import awadb

# 1. Initialize awadb client
awadb_client = awadb.Client()

# 2. Create table
awadb_client.Create("test_llm1")

# 3. Add sentences; each sentence is embedded with SentenceTransformer by default
# You can also embed the sentences yourself with OpenAI or other LLMs
awadb_client.Add([{'embedding_text': 'The man is happy'}, {'source': 'pic1'}])
awadb_client.Add([{'embedding_text': 'The man is very happy'}, {'source': 'pic2'}])
awadb_client.Add([{'embedding_text': 'The cat is happy'}, {'source': 'pic3'}])
awadb_client.Add([{'embedding_text': 'The man is eating'}, {'source': 'pic4'}])

# 4. Search for the top 3 sentences most similar to the query
query = "The man is happy"
results = awadb_client.Search(query, 3)

# Output the results
print(results)
```
Here the text is embedded by SentenceTransformer, which is supported by [Hugging Face](https://huggingface.co).
For more detailed usage of the local Python library, read [here](https://ljeagle.github.io/awadb/).

## Run AwaDB as a service
If you are on Windows or want to run awadb as a service, you can download and deploy the awadb Docker image.
For installation of the awadb Docker image, see [here](https://github.com/awa-ai/awadb/tree/main/docs/source/docker_deploy.md).

- Python Usage

First, install gRPC and the awadb service Python client as below:
```bash
pip3 install grpcio
pip3 install awadb-client
```

A simple example is shown below:

```python
# Import the package and module
from awadb_client import Awa

# Initialize awadb client
client = Awa()

# Add dict with vector to table 'example1'
client.add("example1", {'name': 'david', 'feature': [1.3, 2.5, 1.9]})
client.add("example1", {'name': 'jim', 'feature': [1.1, 1.4, 2.3]})

# Search
results = client.search("example1", [1.0, 2.0, 3.0])

# Output results
print(results)

# '_id' is the primary key of each document
# It can be specified explicitly when adding documents
# Here no field '_id' is specified, so it is generated by the awadb server
```

```text
db_name: "default"
table_name: "example1"
results {
  total: 2
  msg: "Success"
  result_items {
    score: 0.860000074
    fields {
      name: "_id"
      value: "64ddb69d-6038-4311-9118-605686d758d9"
    }
    fields {
      name: "name"
      value: "jim"
    }
  }
  result_items {
    score: 1.55
    fields {
      name: "_id"
      value: "f9f3035b-faaf-48d4-a947-801416c005b3"
    }
    fields {
      name: "name"
      value: "david"
    }
  }
}
result_code: SUCCESS
```
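
The scores in the sample output above are consistent with the squared Euclidean (L2) distance between the query and each stored vector, which would mean a lower score is a closer match. This is inferred from the numbers shown, not from a documented statement about AwaDB's metric. The sketch below checks the arithmetic with plain Python; `squared_l2` is a hypothetical helper and not part of `awadb_client`.

```python
# Sanity check under the assumption that the score is squared L2 distance.
# squared_l2 is a hypothetical helper, not an awadb_client function.

def squared_l2(a, b):
    """Return the squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


query = [1.0, 2.0, 3.0]
docs = {
    "david": [1.3, 2.5, 1.9],
    "jim": [1.1, 1.4, 2.3],
}

# Rank documents by ascending distance, closest first.
ranked = sorted((squared_l2(query, vec), name) for name, vec in docs.items())
for score, name in ranked:
    print(f"{name}: {score:.2f}")
```

Run as written, this prints `jim: 0.86` followed by `david: 1.55`, matching the order and values of the `score` fields in the service response.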
More of the Python SDK for the service is documented [here](https://ljeagle.github.io/awadb/).

- RESTful Usage

```bash
# Add documents to table 'test' of db 'default'; no need to create the table first
curl -H "Content-Type: application/json" -X POST -d '{"db":"default", "table":"test", "docs":[{"_id":1, "name":"lj", "age":23, "f":[1,0]},{"_id":2, "name":"david", "age":32, "f":[1,2]}]}' http://localhost:8080/add

# Search documents by the vector field 'f' with the value '[1, 1]'
curl -H "Content-Type: application/json" -X POST -d '{"db":"default", "table":"test", "vector_query":{"f":[1, 1]}}' http://localhost:8080/search
```
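
If you would rather call the REST endpoints from Python than from curl, the same two requests can be sketched with the standard library alone. The host, port, paths, and JSON fields are copied from the curl commands above. The helper names `build_request`, `post_json`, and `main` are hypothetical, and the response is returned as raw text because its format is not described here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # same host and port as the curl examples


def build_request(path, payload):
    """Build a JSON POST request for a REST endpoint (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, timeout=10):
    """Send the request and return the raw response body as text."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Payloads mirror the bodies passed to curl with -d.
add_payload = {
    "db": "default",
    "table": "test",
    "docs": [
        {"_id": 1, "name": "lj", "age": 23, "f": [1, 0]},
        {"_id": 2, "name": "david", "age": 32, "f": [1, 2]},
    ],
}
search_payload = {"db": "default", "table": "test", "vector_query": {"f": [1, 1]}}


def main():
    """Add the documents, then search; needs a running service on localhost:8080."""
    print(post_json("/add", add_payload))
    print(post_json("/search", search_payload))
```

Call `main()` once the awadb service is up. Keeping `build_request` separate from `post_json` lets you inspect the URL and body without touching the network.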
For more details on the RESTful API, see [here](https://github.com/awa-ai/awadb/tree/main/docs/source/restful_tutorial.md).

## What are Embeddings?
Any unstructured data (image/text/audio/video) can be converted by AI (LLMs or other deep neural networks) into vectors, which computers can readily process.
For example, the sentence "The man is happy" can be converted to a 384-dimension vector (a list of numbers such as `[0.23, 1.98, ....]`) by a SentenceTransformer language model. This process is called embedding.
More detailed information about embeddings is available from [OpenAI](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).

Awadb uses [Sentence Transformers](https://huggingface.co/sentence-transformers) to embed sentences by default, but you can also use OpenAI or other LLMs to produce the embeddings according to your needs.
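
To make the idea concrete, here is a minimal sketch of how embedding vectors support similarity search. The three-dimensional vectors are invented toy values for illustration only (real SentenceTransformer embeddings have hundreds of dimensions and are produced by a model). Cosine similarity is used as one common metric and is not necessarily the one AwaDB applies; `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query_vec, embeddings, k):
    """Return the k (similarity, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Toy "embeddings": invented values, not produced by any real model.
toy_embeddings = {
    "The man is happy": [0.9, 0.8, 0.1],
    "The man is very happy": [1.0, 0.7, 0.1],
    "The cat is happy": [0.2, 0.8, 0.7],
    "The man is eating": [0.9, 0.1, 0.6],
}

# Pretend this is the embedding of the query "The man is happy".
query_vec = [0.9, 0.8, 0.1]

for score, text in top_k(query_vec, toy_embeddings, 3):
    print(f"{score:.3f}  {text}")
```

A vector database does the same kind of ranking, but over real model-generated vectors and with an index so it does not have to compare the query against every stored vector.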
## Get involved
- [Issues and PR](https://github.com/awa-ai/awadb/issues)
- [Roadmap and Contribution](https://github.com/awa-ai/awadb/blob/main/ROADMAP.md)

## License
[Apache 2.0](./LICENSE)
## Community
Join the AwaDB community to share any problem, suggestion, or discussion with us:
- [Discord](https://discord.gg/GP7QxRrDjB)
- [Slack](https://awadbhq.slack.com)
- [Reddit](https://www.reddit.com/r/Awadb/)