https://github.com/vspinu/simdist
High performance similarity and distance metrics for sparse representations
r-package rcpp similarity-measures sparse-matrices
- Host: GitHub
- URL: https://github.com/vspinu/simdist
- Owner: vspinu
- Created: 2017-03-04T20:42:31.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2018-09-18T18:36:36.000Z (almost 7 years ago)
- Last Synced: 2025-04-01T18:57:26.533Z (3 months ago)
- Topics: r-package, rcpp, similarity-measures, sparse-matrices
- Language: R
- Size: 984 KB
- Stars: 3
- Watchers: 4
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: NEWS.md

High-performance distances and similarities for various dense and sparse
representations, with a primary focus on applications in NLP and recommender
systems.

## Supported and Planned Object Types
- `matrix` from base R
- `dgCMatrix`, `dgRMatrix` and `dgTMatrix` from Matrix package
- `simple_triplet_matrix` from `slam` package
- `data.frames` in primary-secondary-value (psv) format
- `list` of named numeric or character vectors

## Distances for 2D Representations
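
To make the psv format and the two implemented measures concrete, here is a minimal base-R sketch. It does not use the simdist API: the helper names `cosine_dist` and `euclidean_dist` and the psv column names `primary`, `secondary` and `value` are assumptions chosen for illustration.

```r
# Illustration only: base R, not the simdist API.
# Three documents (rows) over four terms (columns).
m <- matrix(c(1, 0, 2, 0,
              0, 3, 0, 1,
              2, 0, 4, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"), c("a", "b", "c", "d")))

# The same data as a long primary-secondary-value (psv) data.frame:
# one row per non-zero cell. Column names are hypothetical.
nz <- which(m != 0, arr.ind = TRUE)
psv <- data.frame(primary   = rownames(m)[nz[, "row"]],
                  secondary = colnames(m)[nz[, "col"]],
                  value     = m[nz],
                  stringsAsFactors = FALSE)

# Pairwise cosine distance between rows: 1 - <x, y> / (|x| * |y|).
cosine_dist <- function(x) {
  norms <- sqrt(rowSums(x^2))
  1 - tcrossprod(x) / tcrossprod(norms)
}

# Pairwise Euclidean distance between rows, via base R's dist().
euclidean_dist <- function(x) as.matrix(dist(x, method = "euclidean"))

cd <- cosine_dist(m)     # d1 and d3 are parallel, d1 and d2 orthogonal
ed <- euclidean_dist(m)
```
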
| | `matrix` | `dgCMatrix` | `dgRMatrix` | `dgTMatrix` | `slam` | `psv` | `list` |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| `cosine` | ✔ | ✔ | ✔ | ✔ | | ✔ | |
| `euclidean` | ✔ | ✔ | ✔ | ✔ | | ✔ | |
| `mahalanobis` | | | | | | | |
| `jaccard` | | | | | | | |

## Aggregation Distances for 3D Representations
| | `dgCMatrix` | `dgRMatrix` | `dgTMatrix` | `slam` | `psv` | `list` |
| ---: | :---: | :---: | :---: | :---: | :---: | :---: |
| `centroid` | ✔ | ✔ | ✔ | | ✔ | |
| `semantic_min_max` [1] | ✔ | ✔ | ✔ | | ✔ | |
| `semantic_min_sum` [2] | ✔ | ✔ | ✔ | | ✔ | |

[1] More commonly known as "Relaxed Word Mover Distance" (RWMD), proposed in _Kusner et al., ['From Word Embeddings To Document Distances'](http://jmlr.org/proceedings/papers/v37/kusnerb15.pdf) (2015)_.

[2] Similar to the RWMD measure; proposed in _Mihalcea et al., ['Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity'](https://pdfs.semanticscholar.org/1374/617e135eaa772e52c9a2e8253f49483676d6.pdf) (2006)_.
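
For readers unfamiliar with RWMD, the following is a small dense sketch of the measure as defined in Kusner et al. (2015), written in base R. It is not the simdist implementation and says nothing about the package's actual function signatures; the function name `rwmd` and its arguments are hypothetical.

```r
# Illustration only: a dense base-R sketch of RWMD as defined in
# Kusner et al. (2015); not the simdist implementation or API.

# emb:    term-by-dimension embedding matrix (rows named by term).
# w1, w2: named numeric vectors of term weights for two documents.
rwmd <- function(w1, w2, emb) {
  # Normalise weights so each document's mass sums to 1.
  w1 <- w1 / sum(w1)
  w2 <- w2 / sum(w2)
  e1 <- emb[names(w1), , drop = FALSE]
  e2 <- emb[names(w2), , drop = FALSE]
  # Euclidean cost between every term of doc 1 and every term of doc 2.
  sq <- outer(rowSums(e1^2), rowSums(e2^2), "+") - 2 * tcrossprod(e1, e2)
  cost <- sqrt(pmax(sq, 0))
  # Each term sends all of its mass to its nearest term in the other document.
  d12 <- sum(w1 * apply(cost, 1, min))
  d21 <- sum(w2 * apply(cost, 2, min))
  # Keep the tighter of the two relaxed lower bounds.
  max(d12, d21)
}

emb <- matrix(c(0, 0,
                1, 0,
                0, 1,
                3, 4),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("a", "b", "c", "d"), NULL))
doc1 <- c(a = 1, b = 1)
doc2 <- c(b = 1, d = 1)
res <- rwmd(doc1, doc2, emb)
```

The minimum is taken over terms and the maximum over the two directions, which matches the `min_max` naming in the table above.
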
## Transformations
`norm_l1`, `norm_l2`.
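
The README does not document these functions further. As a rough sketch of what L1 and L2 normalisation usually mean, assuming normalisation is applied row-wise, the base-R helpers below (hypothetical names `l1_rows` and `l2_rows`, not the simdist functions) rescale each row to unit L1 or L2 norm.

```r
# Illustration only: base-R row normalisation, not the simdist functions.
# Assumes rows are the observations; all-zero rows are left unchanged.
l1_rows <- function(x) {
  s <- rowSums(abs(x))
  s[s == 0] <- 1          # avoid division by zero
  x / s                   # recycling divides row i by s[i]
}

l2_rows <- function(x) {
  s <- sqrt(rowSums(x^2))
  s[s == 0] <- 1
  x / s
}

x <- matrix(c(3,  4,
              0,  0,
              1, -1), nrow = 3, byrow = TRUE)
a <- l1_rows(x)  # non-zero rows have absolute values summing to 1
b <- l2_rows(x)  # non-zero rows have Euclidean length 1
```
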