https://github.com/paithiov909/apportita
Utility for Handling ‘magnitude’ Pretrained Word Embeddings
https://github.com/paithiov909/apportita
embeddings r r-package
Last synced: 6 months ago
JSON representation
Utility for Handling ‘magnitude’ Pretrained Word Embeddings
- Host: GitHub
- URL: https://github.com/paithiov909/apportita
- Owner: paithiov909
- License: other
- Created: 2021-07-19T07:18:24.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2025-01-25T19:05:11.000Z (9 months ago)
- Last Synced: 2025-01-25T20:17:16.539Z (9 months ago)
- Topics: embeddings, r, r-package
- Language: R
- Homepage: https://paithiov909.github.io/apportita/
- Size: 7.79 MB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.Rmd
- License: LICENSE
Awesome Lists containing this project
README
---
output: github_document
---```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```[](https://paithiov909.r-universe.dev)
[](https://github.com/paithiov909/apportita/actions)Apportita is a partial R port from [plasticityai/magnitude](https://github.com/plasticityai/magnitude), which is a fast, simple utility library for handling vector embeddings.
Apportita would cover only partial features of the original Magnitude library. The main goal of this package is to enable access to user's local magnitude data store.
In other words, apportita would not support streaming access to remote sqlite files.The package mainly targets the range where the [magnitude-light](https://github.com/davebulaval/magnitude-light) library does, thus some functionalities would not be implemented.
## Usage
### Construct a Magnitude connection
```{r load_file}
library(apportita)## apportita includes a sample magnitude file
## trained with `movie_review` dataset from 'text2vec' package.
conn <- apportita::magnitude(system.file("magnitude/w2v_en_sample.magnitude", package = "apportita"))dim(conn) ## check the dimension of Magnitude table
```Remember to close the connection after querying to the Magnitude table.
```r
close(conn)
```### Querying
```{r query}
apportita::query(conn, c("i", "watch", "a", "movie"))apportita::doesnt_match(conn, "book", c("i", "love", "movie"), n = 3)
apportita::most_similar(conn, "book", c("i", "love", "movie"), n = 3)
```### Calculate distance/similarity from words to words
Apportita supports methods provided in the [proxyC](https://github.com/koheiw/proxyC) package for calculating distances and similarities.
```{r calc}
apportita::calc_dist(conn, "book", c("i", "love", "movie"))apportita::calc_simil(conn, "book", c("i", "love", "movie"))
```Experimentally, `apportita::calc_wrd` also supports computing the Word Rotator's Distance, which is a textual similarity measure based on optimal transport, presented in .
```{r wrd}
apportita::calc_wrd(conn, c("i", "love", "movie"), c("the", "movie", "shows", "blue", "sky"))
```### Slicing samples
```{r slice}
apportita::slice_n(conn, n = 2, offset = 5)apportita::slice_index(conn, index = c(20, 100, 600))
apportita::slice_frac(conn, frac = .01)
``````{r close_conn, include=FALSE}
close(conn)
```## License
MIT license.
Icons made by [Freepik](https://www.freepik.com) from
[www.flaticon.com](https://www.flaticon.com/).