https://github.com/0xDebabrata/citrus

(distributed) vector database

approximate-nearest-neighbor-search embeddings hnsw nearest-neighbor-search semantic-search semantic-search-engine similarity-search vector-database vector-search-engine vectors

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/0xDebabrata/citrus
Owner: 0xDebabrata
License: apache-2.0
Created: 2023-04-16T21:00:38.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-08-18T05:47:30.000Z (about 1 year ago)
Last Synced: 2025-06-26T16:18:56.389Z (3 months ago)
Topics: approximate-nearest-neighbor-search, embeddings, hnsw, nearest-neighbor-search, semantic-search, semantic-search-engine, similarity-search, vector-database, vector-search-engine, vectors
Language: Python
Homepage: https://searchcitrus.com
Size: 935 KB
Stars: 104
Watchers: 3
Forks: 13
Open Issues: 2
Metadata Files:
- Readme: README.md
- Funding: FUNDING.yml
- License: LICENSE

Awesome Lists containing this project

awesome-vector-databases - citrus - A distributed vector database designed for scalable and efficient vector similarity search. It is purpose-built for handling large-scale vector data and search workloads. ([Read more](/details/citrus.md)) `open-source` `distributed` `vector search` `scalable` (Vector Database Engines)

README

# 🍋 citrus.
### open-source (distributed) vector database

## Installation

```bash
pip install citrusdb
```

## Getting started

#### 1. Create index
```py
import citrusdb

# Initialize client
citrus = citrusdb.Client()

# Create index
citrus.create_index(
    name="example",
    max_elements=1000,  # increases dynamically as you insert more vectors
)
```

#### 2. Insert elements
```py
ids = [1, 2, 3]
documents = [
    "Your time is limited, so don't waste it living someone else's life",
    "I'd rather be optimistic and wrong than pessimistic and right.",
    "Running a start-up is like chewing glass and staring into the abyss."
]

citrus.add(index="example", ids=ids, documents=documents)
```
You can directly pass vector embeddings as well. If you're passing a list of strings like we have done here, ensure you have your `OPENAI_API_KEY` in the environment. By default we use OpenAI to generate the embeddings. Please reach out if you're looking for support from a different provider!
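
As a rough sketch of the two input paths above, the snippet below checks for the API key and sanity-checks precomputed vectors before insertion. The helper names `has_openai_key` and `check_embeddings` are hypothetical and not part of citrusdb, and the `embeddings=` keyword in the final comment is an assumption about the `add` signature, so verify it against the library before relying on it.

```python
import os


def has_openai_key(env=os.environ):
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(env.get("OPENAI_API_KEY"))


def check_embeddings(ids, embeddings):
    """Validate precomputed vectors: one per ID, all with the same dimension.

    Returns the shared dimension (0 if the input is empty).
    """
    if len(ids) != len(embeddings):
        raise ValueError("need exactly one embedding per id")
    dims = {len(vec) for vec in embeddings}
    if len(dims) > 1:
        raise ValueError("all embeddings must have the same dimension")
    return dims.pop() if dims else 0


ids = [1, 2, 3]
# Toy 3-dimensional vectors; real embeddings are much larger.
embeddings = [[0.1, 0.2, 0.3], [0.0, 0.5, 0.5], [0.9, 0.1, 0.0]]
dim = check_embeddings(ids, embeddings)

# Assumed call shape (keyword name is not confirmed by this README):
# citrus.add(index="example", ids=ids, embeddings=embeddings)
```

If you pass strings instead, calling `has_openai_key()` first gives a clearer failure than an error raised midway through embedding generation.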

#### 3. Search
```py
results = citrus.query(
    index="example",
    documents=["What is it like to launch a startup"],
    k=1,
    include=["document", "metadata"]
)

print(results)
```
You can specify if you want the associated text document and metadata to be returned.
By default, only the IDs are returned.

Go launch a repl on [Replit](https://replit.com) and see what result you get after running the query! `results` will contain the `ids` of the top `k` search hits.
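
To make "top `k` search hits" concrete, here is a minimal brute-force sketch of a similarity search in plain Python. It is not how citrus works internally: the project is tagged `hnsw` and approximate nearest-neighbor search, while this version compares the query against every stored vector. The use of cosine similarity and the function names `cosine_similarity` and `top_k` are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=1):
    """Return the IDs of the k stored vectors most similar to the query.

    `vectors` maps each ID to its vector; the best match comes first.
    """
    ranked = sorted(
        vectors,
        key=lambda vec_id: cosine_similarity(query, vectors[vec_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy 2-dimensional "embeddings" keyed by ID.
vectors = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.7, 0.7]}
hits = top_k([0.9, 0.1], vectors, k=2)
print(hits)  # [1, 3]
```

Exact search like this costs one comparison per stored vector for every query, which is the cost that approximate indexes such as HNSW are designed to avoid.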

## Example
[chat w/ replit ai podcast](https://replit.searchcitrus.com)

[pokedex search](https://replit.com/@debabratajr/pokedex-search)

## Facing issues?
Feel free to open issues on this repository! Discord server coming soon!

*PS: citrus isn't fully distributed just yet. We're getting there though ;)*

---

Special thanks to

- DevKit - The Essential Developer Toolkit
- DSoC 2023
