https://github.com/shibing624/text2vec-encoder

**Text2vecEncoder** wraps the text2vec model with jina. It encodes text data into dense vectors.

Host: GitHub
URL: https://github.com/shibing624/text2vec-encoder
Owner: shibing624
License: apache-2.0
Created: 2022-05-19T14:39:00.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-07-16T02:29:40.000Z (about 3 years ago)
Last Synced: 2025-03-01T00:41:40.114Z (7 months ago)
Language: Python
Size: 28.3 KB
Stars: 4
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE


# Text2vecEncoder

**Text2vecEncoder** wraps the PyTorch version of Hugging Face Transformers. It encodes text data into dense vectors.

**Text2vecEncoder** receives [`Documents`](https://docs.jina.ai/fundamentals/document/) with `text` attributes.

The `text` attribute holds the text to be encoded. This Executor encodes each `text` into a dense vector and stores it in the `embedding` attribute of the corresponding `Document`.

## Usage

Deploying from source:

Example: [examples/client_demo.py](examples/client_demo.py)

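```python
from docarray import Document, DocumentArray

# Sample inputs: '如何更换花呗绑定银行卡' means "how to change the bank card
# linked to Huabei"; '你好' means "hello".
da = DocumentArray([Document(text='如何更换花呗绑定银行卡'), Document(text='hello'), Document(text='你好')])

# Send the documents to the Executor on Jina Hub and collect the encoded results.
r = da.post('jinahub://Text2vecEncoder')

print(r.to_json())
```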

Output:

```shell
"embedding": [-0.0004445354570634663, -0.2973471283 ...]
```
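Once each text has an `embedding`, a common next step is to compare two texts by the cosine similarity of their vectors. The following is a minimal sketch that uses only the standard library. It assumes the JSON printed above is a list of objects with `text` and `embedding` keys; `sample_json` and its three-dimensional vectors are toy stand-ins made up for illustration, not real model output, and `cosine_similarity` is a hypothetical helper, not part of the Executor.

```python
import json
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy stand-in for the encoder output (assumed shape); real embeddings are much longer.
sample_json = (
    '[{"text": "hello", "embedding": [1.0, 0.0, 1.0]},'
    ' {"text": "你好", "embedding": [1.0, 0.0, 0.0]}]'
)

docs = json.loads(sample_json)
score = cosine_similarity(docs[0]["embedding"], docs[1]["embedding"])
print(round(score, 4))
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is usually read as semantically similar text.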

Use the prebuilt image from Jina Hub in your Flow to encode a text into a dense vector.

```python
from jina import Flow, Document

f = Flow().add(uses='jinahub+docker://Text2vecEncoder')

doc = Document(content='如何更换花呗绑定银行卡')

with f:
    f.post(on='/encode', inputs=doc, on_done=lambda resp: print(resp.docs[0].embedding))
```

### Set `volumes`

With the `volumes` attribute, you can map the container's cache directory to a local cache directory to avoid downloading the model each time you start the Flow.

```python
from jina import Flow

flow = Flow().add(
    uses='jinahub+docker://Text2vecEncoder',
    volumes='.cache/huggingface:/root/.cache/huggingface'
)
```
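The `volumes` value has the form `host_path:container_path`. As a small sketch, the host side can be resolved to an absolute path up front so it is explicit which local directory is mapped. `cache_volume` is a hypothetical helper, not part of Jina; the container path is the one used in the example above.

```python
from pathlib import Path

# Container-side cache directory, taken from the volumes example above.
CONTAINER_CACHE = "/root/.cache/huggingface"


def cache_volume(host_dir=".cache/huggingface", container_dir=CONTAINER_CACHE):
    """Hypothetical helper: return a 'host_path:container_path' mapping string.

    The host path is expanded and made absolute; it is not created here.
    """
    host_path = Path(host_dir).expanduser().resolve()
    return f"{host_path}:{container_dir}"


mapping = cache_volume()
print(mapping)
```

The resulting string could then be passed as `volumes=mapping` when adding the Executor to a Flow.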

Alternatively, you can reference the docker image in the `yml` config and specify the `volumes` configuration.

`flow.yml`:

```yaml
jtype: Flow
executors:
  - name: encoder
    uses: 'jinahub+docker://Text2vecEncoder'
    volumes: '.cache/huggingface:/root/.cache/huggingface'
```

And then use it like so:

```python
from jina import Flow

flow = Flow.load_config('flow.yml')
```

### Use other pre-trained models

You can specify the model to use with the parameter `pretrained_model_name_or_path`:

```python
from jina import Flow, Document

f = Flow().add(
    uses='jinahub+docker://Text2vecEncoder',
    uses_with={'pretrained_model_name_or_path': 'sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2'}
)

doc = Document(content='如何更换花呗绑定银行卡')

with f:
    f.post(on='/encode', inputs=doc, on_done=lambda resp: print(resp.docs[0].embedding))
```

You can check the [list of supported pre-trained models](https://huggingface.co/transformers/pretrained_models.html).

### Use GPUs

To enable GPU support, set the `device` parameter to a CUDA device. Make sure your machine is CUDA-compatible.

If you're using a Docker container, make sure to add the `gpu` tag and enable GPU access to Docker with `gpus='all'`.

Furthermore, make sure you satisfy the prerequisites mentioned in the [Executor on GPU tutorial](https://docs.jina.ai/tutorials/gpu_executor/#prerequisites).

```python
from jina import Flow, Document

f = Flow().add(
    uses='jinahub+docker://Text2vecEncoder/gpu',
    uses_with={'device': 'cuda'},
    gpus='all'
)

doc = Document(content='如何更换花呗绑定银行卡')

with f:
    f.post(on='/encode', inputs=doc, on_done=lambda resp: print(resp.docs[0].embedding))
```
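The CPU and GPU setups differ only in the image tag, the `device` parameter, and the `gpus` argument. One way to keep both in a single code path is to build the keyword arguments first and pass them to `Flow().add(...)`. The sketch below is plain Python and does not import Jina; `encoder_kwargs` is a hypothetical helper, and the values it returns are the ones shown in the examples above.

```python
def encoder_kwargs(use_gpu=False, model_name=None):
    """Hypothetical helper: build keyword arguments for Flow().add(...)."""
    uses = "jinahub+docker://Text2vecEncoder"
    uses_with = {}
    kwargs = {}

    # Optionally override the default pre-trained model.
    if model_name is not None:
        uses_with["pretrained_model_name_or_path"] = model_name

    # GPU mode: use the `gpu` tag, run on CUDA, and expose GPUs to Docker.
    if use_gpu:
        uses += "/gpu"
        uses_with["device"] = "cuda"
        kwargs["gpus"] = "all"

    kwargs["uses"] = uses
    if uses_with:
        kwargs["uses_with"] = uses_with
    return kwargs


cpu_kwargs = encoder_kwargs()
gpu_kwargs = encoder_kwargs(use_gpu=True)
print(cpu_kwargs)
print(gpu_kwargs)
```

With Jina installed, these could be used as `Flow().add(**encoder_kwargs(use_gpu=True))`.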

## Reference

- [Huggingface Transformers](https://huggingface.co/transformers/pretrained_models.html)
