https://github.com/llukas22/retsim-pytorch
A pytorch port of Google's RETSim model used in UniSim
Last synced: 10 days ago
- Host: GitHub
- URL: https://github.com/llukas22/retsim-pytorch
- Owner: LLukas22
- License: mit
- Created: 2024-03-24T14:29:39.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-03-25T14:56:58.000Z (8 months ago)
- Last Synced: 2024-04-25T17:21:29.068Z (7 months ago)
- Topics: embedding, pytorch, word-similarity
- Language: Python
- Homepage:
- Size: 9.38 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# retsim-pytorch
[![PyPI Version](https://img.shields.io/pypi/v/retsim-pytorch.svg)](https://pypi.org/project/retsim-pytorch)
[![Supported Python Versions](https://img.shields.io/pypi/pyversions/retsim-pytorch.svg)](https://pypi.org/project/retsim-pytorch)

Welcome to `retsim-pytorch`, the PyTorch adaptation of Google's [RETSim](https://arxiv.org/abs/2311.17264) (Resilient and Efficient Text Similarity) model, which is part of the [UniSim (Universal Similarity)](https://github.com/google/unisim) framework.
This model is designed for efficient and accurate multilingual fuzzy string matching, near-duplicate detection, and assessing string similarity. For more information, please refer to the [UniSim documentation](https://github.com/google/unisim).
## Installation
You can easily install `retsim-pytorch` via pip:
```shell
pip install retsim-pytorch
```

## Usage
You can configure the model using the `RETSimConfig` class. By default, it uses the same configuration as the original UniSim model. To use the same weights as the original Google model, download the SafeTensors port of the weights [here](./weights/model.safetensors).
Here's how to use the model in your code:
```python
import torch
from retsim_pytorch import RETSim, RETSimConfig
from retsim_pytorch.preprocessing import binarize

# Configure the model
config = RETSimConfig()
model = RETSim(config)

# Prepare and run inference
binarized_inputs, chunk_ids = binarize(["hello world"])
embedded, unpooled = model(torch.tensor(binarized_inputs))
```
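
Once you have embeddings, a common next step for near-duplicate detection is to compare them with cosine similarity. The sketch below is illustrative only and is not part of the `retsim-pytorch` API. It assumes each input string has been reduced to a single embedding vector and converted to a plain list of floats (for example with `.tolist()`). The helper names `cosine_similarity` and `is_near_duplicate`, the `0.9` threshold, and the toy vectors are all hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(
    a: Sequence[float], b: Sequence[float], threshold: float = 0.9
) -> bool:
    """Flag a pair as a near-duplicate when similarity reaches the threshold.

    The default threshold is a placeholder, not a value recommended by RETSim.
    """
    return cosine_similarity(a, b) >= threshold


# Toy vectors standing in for real model outputs (hypothetical values).
emb_a = [0.6, 0.8, 0.0]
emb_b = [0.6, 0.8, 0.0]
emb_c = [0.0, 0.0, 1.0]

print(is_near_duplicate(emb_a, emb_b))  # identical direction
print(is_near_duplicate(emb_a, emb_c))  # orthogonal vectors
```

With real embeddings, tune the threshold on labeled pairs from your own data, since a suitable cutoff depends on the task.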