https://github.com/jina-ai/executor-annlite-indexer


Host: GitHub
URL: https://github.com/jina-ai/executor-annlite-indexer
Owner: jina-ai
Created: 2022-05-02T10:41:36.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-11-24T08:46:25.000Z (almost 3 years ago)
Last Synced: 2025-03-07T03:46:28.762Z (7 months ago)
Language: Python
Size: 38.1 KB
Stars: 1
Watchers: 25
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# AnnLiteIndexer

AnnLiteIndexer indexes Documents into a `DocumentArray` using `storage='annlite'`. Underneath, the `DocumentArray` uses [AnnLite](https://github.com/jina-ai/annlite) to store and search Documents efficiently.

## Usage

#### via Docker image (recommended)

```python
from jina import Flow

f = Flow().add(uses='jinahub+docker://AnnLiteIndexer')
```

#### via source code

```python
from jina import Flow

f = Flow().add(uses='jinahub://AnnLiteIndexer')
```

- To override `__init__` args & kwargs, use `.add(..., uses_with={'key': 'value'})`

- To override class metas, use `.add(..., uses_metas={'key': 'value'})`

## Vector Search

The following example indexes two Documents and then performs a vector search with `f.post(on='/search', inputs=[Document(embedding=np.array([1, 1]))])`.

```python
from jina import Flow
from docarray import Document
import numpy as np

# n_dim must match the dimensionality of the embeddings being indexed
f = Flow().add(
    uses='jinahub://AnnLiteIndexer',
    uses_with={'n_dim': 2},
)

with f:
    # index two Documents with 2-dimensional embeddings
    f.post(
        on='/index',
        inputs=[
            Document(id='a', embedding=np.array([1, 3])),
            Document(id='b', embedding=np.array([1, 1])),
        ],
    )

    # search with a query embedding; matches are attached to the returned Documents
    docs = f.post(
        on='/search',
        inputs=[Document(embedding=np.array([1, 1]))],
    )

# will print "The ID of the best match of [1,1] is: b"
print('The ID of the best match of [1,1] is:', docs[0].matches[0].id)
```
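To make the expected result above concrete, the sketch below reproduces the same lookup with an exhaustive scan in plain Python. It is only an illustration of what "best match" means: AnnLite performs approximate nearest-neighbour search rather than a full scan, and the use of cosine distance here is an assumption. The helper names `cosine_distance` and `best_match` are hypothetical and not part of any library.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def best_match(query, indexed):
    """Return the ID whose vector is closest to `query` (exhaustive scan)."""
    return min(indexed, key=lambda doc_id: cosine_distance(query, indexed[doc_id]))


# Same toy data as the Flow example: IDs mapped to 2-dimensional embeddings.
indexed = {'a': [1, 3], 'b': [1, 1]}
best_id = best_match([1, 1], indexed)
print(best_id)
```

For this toy data the choice of metric does not change the answer: `b` is identical to the query, so it is the closest vector under Euclidean distance as well.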

### Filter the indexed Documents

You can filter the indexed Documents by calling the `/filter` endpoint. The columns you want to filter on must be declared when the Executor is added to the Flow.

For instance:

```python
from jina import Flow

f = Flow().add(
    uses='jinahub+docker://AnnLiteIndexer',
    uses_with={
        'data_path': 'data_path/',
        'n_dim': 256,
        'columns': [('price', 'float')],
    },
)
```

Then you can pass a filter as a parameter when calling the `/filter` endpoint:

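```python
from docarray import Document, DocumentArray
import numpy as np

# embeddings must have the same dimensionality as `n_dim` (256) configured above
docs = DocumentArray(
    [
        Document(id=f'r{i}', embedding=np.random.rand(256), tags={'price': i})
        for i in range(50)
    ]
)

# keep only Documents whose `price` tag equals 3
filter_ = {'price': {'$eq': 3}}

with f:
    f.index(docs)
    response_docs = f.post(on='/filter', parameters={'filter': filter_})
    # print the `price` tag of every Document that passed the filter
    print(response_docs[:, 'tags__price'])
```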
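The filter dictionary above maps a column name to one or more operator conditions. A minimal sketch of how such a dictionary can be read, written in plain Python, is shown below. It is not AnnLite's implementation: the `matches` helper and the `OPERATORS` table are hypothetical, and only `$eq` and `$lte` appear in this README (the remaining operators are included by analogy and are an assumption).

```python
import operator

# Operator names mapped to Python comparisons.
# Only '$eq' and '$lte' are taken from the README; the others are assumed.
OPERATORS = {
    '$eq': operator.eq,
    '$lt': operator.lt,
    '$lte': operator.le,
    '$gt': operator.gt,
    '$gte': operator.ge,
}


def matches(tags, filter_):
    """Return True if every condition in `filter_` holds for `tags`."""
    for field, conditions in filter_.items():
        if field not in tags:
            return False
        for op_name, expected in conditions.items():
            # An unknown operator name raises KeyError in this sketch.
            if not OPERATORS[op_name](tags[field], expected):
                return False
    return True


# Stand-ins for the 50 indexed Documents: only the ID and tags matter here.
docs = [{'id': f'r{i}', 'tags': {'price': i}} for i in range(50)]
selected = [d['id'] for d in docs if matches(d['tags'], {'price': {'$eq': 3}})]
print(selected)
```

With prices `0` to `49`, exactly one Document has `price` equal to 3, so the selection contains only `r3`.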

### Using filtering in search

To filter during search with the AnnLiteIndexer, first define the filterable columns and specify the dimensionality of your embedding space.

For instance:

```python
from jina import Flow

f = Flow().add(
    uses='jinahub+docker://AnnLiteIndexer',
    uses_with={
        'data_path': 'data_path/',
        'n_dim': 256,
        'columns': [('price', 'float')],
    },
)
```

Then you can pass a filter as a parameter when searching for Documents:

```python
from docarray import Document, DocumentArray
import numpy as np

# embeddings must have the same dimensionality as `n_dim` (256) configured above
docs = DocumentArray(
    [
        Document(id=f'r{i}', embedding=np.random.rand(256), tags={'price': i})
        for i in range(50)
    ]
)

# only Documents with `price` <= 30 are eligible as matches
filter_ = {'price': {'$lte': 30}}

with f:
    f.index(docs)
    doc_query = DocumentArray([Document(embedding=np.random.rand(256))])
    f.search(doc_query, parameters={'filter': filter_})
```
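Conceptually, a filtered search restricts the matches to Documents that satisfy the filter and ranks only those by distance to the query. The plain-Python sketch below illustrates that result for a `price <= max_price` condition. It is an assumption-laden illustration, not a description of how AnnLite applies the filter internally: `filtered_search` is a hypothetical helper, the scan is exhaustive rather than approximate, and Euclidean distance is chosen arbitrarily.

```python
import math
import random


def filtered_search(query, docs, max_price, limit=3):
    """Rank only the docs with price <= max_price by Euclidean distance to `query`."""
    candidates = [d for d in docs if d['price'] <= max_price]
    candidates.sort(key=lambda d: math.dist(query, d['embedding']))
    return candidates[:limit]


random.seed(0)  # fixed seed so repeated runs use the same toy data

# Toy stand-ins for indexed Documents, with 3-dimensional embeddings for brevity.
docs = [
    {'id': f'r{i}', 'price': i, 'embedding': [random.random() for _ in range(3)]}
    for i in range(50)
]
query = [random.random() for _ in range(3)]

matches = filtered_search(query, docs, max_price=30)
print([(m['id'], m['price']) for m in matches])
```

Every returned match has a price of at most 30, even if a Document with a higher price happens to lie closer to the query.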

For more information, please refer to the DocArray [documentation](https://docarray.jina.ai/advanced/document-store/annlite/#vector-search-with-filter).

## Tests

Tests can be run by adding the root of this repository to `PYTHONPATH`

```
export PYTHONPATH=$PYTHONPATH:`pwd`
```

and then running

```
pytest tests
```
