https://github.com/paithiov909/apportita

Utility for Handling ‘magnitude’ Pretrained Word Embeddings

---
output: github_document
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```

# apportita

[![apportita status badge](https://paithiov909.r-universe.dev/badges/apportita)](https://paithiov909.r-universe.dev)
[![R-CMD-check](https://github.com/paithiov909/apportita/workflows/R-CMD-check/badge.svg)](https://github.com/paithiov909/apportita/actions)

Apportita is a partial R port of [plasticityai/magnitude](https://github.com/plasticityai/magnitude), a fast, simple utility library for handling vector embeddings.

Apportita covers only a subset of the features of the original Magnitude library. The main goal of this package is to enable access to a user's local magnitude data store.
In other words, apportita does not support streaming access to remote SQLite files.

The package mainly targets the same scope as the [magnitude-light](https://github.com/davebulaval/magnitude-light) library, so some functionality of the original library is not implemented.

## Usage

### Construct a Magnitude connection

```{r load_file}
library(apportita)

## apportita includes a sample magnitude file
## trained on the `movie_review` dataset from the 'text2vec' package.
conn <- apportita::magnitude(system.file("magnitude/w2v_en_sample.magnitude", package = "apportita"))

dim(conn) ## check the dimensions of the Magnitude table
```

Remember to close the connection after querying the Magnitude table.

```r
close(conn)
```
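
One way to make sure the connection is always closed, even when a query fails, is to register `close()` with `on.exit()` inside a small wrapper. The helper below is only a sketch: `with_conn` is a hypothetical name, not part of apportita, and it assumes nothing beyond the connection working with `close()` as shown above. It is demonstrated with a base R text connection so that it runs on its own.

```r
## Hypothetical helper (not part of apportita): run `f` on a connection
## and close the connection afterwards, even if `f` raises an error.
with_conn <- function(conn, f) {
  on.exit(close(conn), add = TRUE)
  f(conn)
}

## Demonstrated with a base R text connection to keep the sketch self-contained.
first_line <- with_conn(textConnection("hello magnitude"), readLines)
first_line
```

With apportita, the same pattern would presumably read `with_conn(apportita::magnitude(path), function(conn) apportita::query(conn, "movie"))`, where `path` points to a local magnitude file.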

### Querying

```{r query}
apportita::query(conn, c("i", "watch", "a", "movie"))

apportita::doesnt_match(conn, "book", c("i", "love", "movie"), n = 3)

apportita::most_similar(conn, "book", c("i", "love", "movie"), n = 3)
```

### Calculate distance/similarity from words to words

Apportita supports methods provided in the [proxyC](https://github.com/koheiw/proxyC) package for calculating distances and similarities.

```{r calc}
apportita::calc_dist(conn, "book", c("i", "love", "movie"))

apportita::calc_simil(conn, "book", c("i", "love", "movie"))
```
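
To make concrete what such measures compute, here is a minimal base R sketch of cosine similarity and Euclidean distance from one word vector to several others. It does not reproduce how apportita or proxyC work internally; the embedding values and the helper names `cosine_simil` and `euclid_dist` are made up for illustration.

```r
## Cosine similarity between two numeric vectors.
cosine_simil <- function(x, y) {
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

## Euclidean distance between two numeric vectors.
euclid_dist <- function(x, y) {
  sqrt(sum((x - y)^2))
}

## Toy 3-dimensional "embeddings" (made-up values, one row per word).
emb <- rbind(
  book  = c(1, 0, 1),
  novel = c(2, 0, 2),
  sky   = c(0, 3, 0)
)

## Similarity and distance from "book" to each of the other words.
others <- emb[c("novel", "sky"), , drop = FALSE]
simil <- apply(others, 1, cosine_simil, y = emb["book", ])
dists <- apply(others, 1, euclid_dist, y = emb["book", ])
simil
dists
```

In this toy data, "novel" points in the same direction as "book" and "sky" is orthogonal to it, so cosine similarity separates the two cases regardless of vector length, while Euclidean distance also reflects the difference in magnitude.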

Experimentally, `apportita::calc_wrd` also supports computing the Word Rotator's Distance, a textual similarity measure based on optimal transport.

```{r wrd}
apportita::calc_wrd(conn, c("i", "love", "movie"), c("the", "movie", "shows", "blue", "sky"))
```
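
As background, Word Rotator's Distance treats each sentence as a distribution over its words, with each word's mass proportional to the norm of its vector and the cost of moving mass between two words given by their cosine distance. The base R sketch below only builds these two inputs for toy sentences. It is not the implementation behind `calc_wrd`, the embedding values and the helper names `word_mass` and `cosine_cost` are made up, and the final step of solving the transport problem is left out because it needs a dedicated solver.

```r
## Toy 2-dimensional embeddings (made-up values): one row per word.
s1 <- rbind(i = c(1, 0), love = c(0, 2))
s2 <- rbind(the = c(3, 0), movie = c(0, 1), sky = c(1, 1))

## Mass of each word: its vector norm, normalised to sum to 1 per sentence.
word_mass <- function(m) {
  norms <- sqrt(rowSums(m^2))
  norms / sum(norms)
}

## Transport cost between words: cosine distance between their vectors.
cosine_cost <- function(a, b) {
  unit_a <- a / sqrt(rowSums(a^2)) # scale each row to unit length
  unit_b <- b / sqrt(rowSums(b^2))
  1 - unit_a %*% t(unit_b)
}

mass1 <- word_mass(s1)
mass2 <- word_mass(s2)
cost <- cosine_cost(s1, s2) # rows: words of s1, columns: words of s2
mass1
mass2
cost
```

The distance itself would be the minimum total cost of moving `mass1` onto `mass2` under `cost`.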

### Slicing samples

```{r slice}
apportita::slice_n(conn, n = 2, offset = 5)

apportita::slice_index(conn, index = c(20, 100, 600))

apportita::slice_frac(conn, frac = .01)
```

```{r close_conn, include=FALSE}
close(conn)
```

## License

MIT license.

Icons made by [Freepik](https://www.freepik.com) from
[www.flaticon.com](https://www.flaticon.com/).