https://github.com/epsilla-cloud/vectordb

Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/

ai chatgpt data data-science database embeddings embeddings-similarity infrastructure llms machine-learning neural-network neural-search rag retrieval search-engine vector-database vector-search


Host: GitHub
URL: https://github.com/epsilla-cloud/vectordb
Owner: epsilla-cloud
License: gpl-3.0
Created: 2023-07-09T02:28:31.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-08-26T23:39:44.000Z (almost 2 years ago)
Last Synced: 2024-10-29T15:48:13.787Z (over 1 year ago)
Topics: ai, chatgpt, data, data-science, database, embeddings, embeddings-similarity, infrastructure, llms, machine-learning, neural-network, neural-search, rag, retrieval, search-engine, vector-database, vector-search
Language: C++
Homepage: https://www.epsilla.com
Size: 994 KB
Stars: 905
Watchers: 5
Forks: 37
Open Issues: 12
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-ChatGPT-repositories - vectordb - Epsilla is a high performance Vector Database Management System. Try out hosted Epsilla at https://cloud.epsilla.com/ (Others)
awesome-llmops - Epsilla (Search / Vector search)
awesome-vector-search - Epsilla - A High Performance Vector Database Management System, Hippocampus For AI
awesome-infra-for-ai - epsilla-cloud/vectordb - Epsilla is a high-performance, open-source vector database management system focused on scalable and cost-effective similarity search for embedding vectors. (Model Serving & Inference / Vector Databases & Retrieval Infrastructure)

README

          






**A 10x faster, cheaper, and better vector database**

Documentation • Discord • Twitter • Blog • YouTube • Feedback

Epsilla is an open-source vector database. Our focus is on ensuring scalability, high performance, and cost-effectiveness of vector search. EpsillaDB bridges the gap between information retrieval and memory retention in Large Language Models.

## Quick Start using Docker

**1. Run Backend in Docker**

```shell
docker pull epsilla/vectordb
docker run --pull=always -d -p 8888:8888 -v /data:/data epsilla/vectordb
```

**2. Interact with Python Client**

```shell
pip install pyepsilla
```

```python
from pyepsilla import vectordb

client = vectordb.Client(host='localhost', port='8888')
client.load_db(db_name="MyDB", db_path="/data/epsilla")
client.use_db(db_name="MyDB")

client.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"},
    ],
    indices=[
        {"name": "Index", "field": "Doc"},
    ]
)

client.insert(
    table_name="MyTable",
    records=[
        {"ID": 1, "Doc": "Jupiter is the largest planet in our solar system."},
        {"ID": 2, "Doc": "Cheetahs are the fastest land animals, reaching speeds over 60 mph."},
        {"ID": 3, "Doc": "Vincent van Gogh painted the famous work \"Starry Night.\""},
        {"ID": 4, "Doc": "The Amazon River is the longest river in the world."},
        {"ID": 5, "Doc": "The Moon completes one orbit around Earth every 27 days."},
    ],
)

client.query(
    table_name="MyTable",
    query_text="Celestial bodies and their characteristics",
    limit=2
)

# Result
# {
#     'message': 'Query search successfully.',
#     'result': [
#         {'Doc': 'Jupiter is the largest planet in our solar system.', 'ID': 1},
#         {'Doc': 'The Moon completes one orbit around Earth every 27 days.', 'ID': 5}
#     ],
#     'statusCode': 200
# }
```
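The result shown above is a plain dictionary, so pulling out the matched documents is ordinary dictionary access. The following is a minimal sketch that assumes the response has exactly the shape printed in the result comment (`statusCode`, `message`, `result`). The helper name `extract_docs` is hypothetical and is not part of pyepsilla.

```python
def extract_docs(response):
    """Return the 'Doc' strings from a response shaped like the quick start result."""
    # Treat anything other than 200 as a failure and surface the server message.
    if response.get("statusCode") != 200:
        raise RuntimeError(response.get("message", "query failed"))
    return [record["Doc"] for record in response.get("result", [])]


# Sample response copied from the result comment in the quick start.
sample_response = {
    "message": "Query search successfully.",
    "result": [
        {"Doc": "Jupiter is the largest planet in our solar system.", "ID": 1},
        {"Doc": "The Moon completes one orbit around Earth every 27 days.", "ID": 5},
    ],
    "statusCode": 200,
}

docs = extract_docs(sample_response)
for doc in docs:
    print(doc)
```

Raising on a non-200 status keeps failures visible instead of silently returning an empty list.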

## Features:

* High performance and production-scale similarity search for embedding vectors.

* Full-fledged database management system with familiar database, table, and field concepts. Vector is just another field type.

* Metadata filtering.

* Hybrid search with a fusion of dense and sparse vectors.

* Built-in embedding support, with a natural-language-in, natural-language-out search experience.

* Cloud native architecture with compute storage separation, serverless, and multi-tenancy.

* Rich ecosystem integrations including LangChain and LlamaIndex.

* Python/JavaScript/Ruby clients, and REST API interface.

Epsilla's core is written in C++ and uses advanced parallel graph traversal techniques from academic research for vector indexing, achieving vector search 10 times faster than HNSW while maintaining precision above 99.9%.
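A precision figure like the one above is commonly checked by comparing an approximate index's top-k results against exact brute-force results for the same queries. The sketch below shows one usual way to measure that agreement (recall@k). It is not Epsilla's benchmark code; the function name `recall_at_k` and the toy ID lists are assumptions made purely for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k neighbors that the approximate search also returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    exact_top = set(exact_ids[:k])
    if not exact_top:
        raise ValueError("exact_ids must not be empty")
    approx_top = set(approx_ids[:k])
    return len(exact_top & approx_top) / len(exact_top)


# Toy data (hypothetical): IDs returned by an exact scan and by an approximate index.
exact = [7, 2, 9, 4, 1]
approx = [7, 9, 2, 1, 8]

# Four of the five exact neighbors were found by the approximate search.
print(recall_at_k(approx, exact, k=5))
```

In practice the value is averaged over many queries, and the exact results come from a full scan using the same distance metric as the index.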

## Epsilla Cloud

Try our fully managed vector DBaaS at Epsilla Cloud: https://cloud.epsilla.com/

## (Experimental) Use Epsilla as a Python library without starting a Docker container

**1. Build Epsilla Python Bindings lib package**

```shell
cd engine/scripts
# If on Ubuntu, run this first:
# bash setup-dev.sh
bash install_oatpp_modules.sh
cd ..
bash build.sh
ls -lh build/*.so
```

**2. Run the test with the Python bindings libraries "epsilla.so" and "libvectordb_dylib.so" from the "build" folder created in the previous step**

```shell
cd engine
export PYTHONPATH=./build/
export DB_PATH=/tmp/db33
python3 test/bindings/python/test.py
```

Here is some sample code:

```python
import epsilla

epsilla.load_db(db_name="db", db_path="/data/epsilla")
epsilla.use_db(db_name="db")

epsilla.create_table(
    table_name="MyTable",
    table_fields=[
        {"name": "ID", "dataType": "INT", "primaryKey": True},
        {"name": "Doc", "dataType": "STRING"},
        {"name": "EmbeddingEuclidean", "dataType": "VECTOR_FLOAT", "dimensions": 4, "metricType": "EUCLIDEAN"}
    ]
)

epsilla.insert(
    table_name="MyTable",
    records=[
        {"ID": 1, "Doc": "Berlin", "EmbeddingEuclidean": [0.05, 0.61, 0.76, 0.74]},
        {"ID": 2, "Doc": "London", "EmbeddingEuclidean": [0.19, 0.81, 0.75, 0.11]},
        {"ID": 3, "Doc": "Moscow", "EmbeddingEuclidean": [0.36, 0.55, 0.47, 0.94]}
    ]
)

(code, response) = epsilla.query(
    table_name="MyTable",
    query_field="EmbeddingEuclidean",
    response_fields=["ID", "Doc", "EmbeddingEuclidean"],
    query_vector=[0.35, 0.55, 0.47, 0.94],
    filter="ID < 6",
    limit=10,
    with_distance=True
)

print(code, response)
```
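To see what ordering a Euclidean query over these three records should produce, the ranking can be reproduced with a brute-force scan in plain Python. This is only an illustrative sketch using the sample records and query vector from the code above. It does not call Epsilla, and the distance values Epsilla reports with `with_distance=True` may be scaled or formatted differently.

```python
import math

# Sample records and query vector copied from the example above.
records = [
    {"ID": 1, "Doc": "Berlin", "EmbeddingEuclidean": [0.05, 0.61, 0.76, 0.74]},
    {"ID": 2, "Doc": "London", "EmbeddingEuclidean": [0.19, 0.81, 0.75, 0.11]},
    {"ID": 3, "Doc": "Moscow", "EmbeddingEuclidean": [0.36, 0.55, 0.47, 0.94]},
]
query_vector = [0.35, 0.55, 0.47, 0.94]

# Apply the same metadata filter as the query ("ID < 6"), compute the
# Euclidean distance to the query vector, and sort nearest first.
ranked = sorted(
    (math.dist(r["EmbeddingEuclidean"], query_vector), r["ID"], r["Doc"])
    for r in records
    if r["ID"] < 6
)

for distance, record_id, doc in ranked:
    print(record_id, doc, round(distance, 4))
```

Moscow ranks first because its vector differs from the query only in the first component (0.36 versus 0.35), followed by Berlin and then London.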
