https://github.com/frankie567/word-similarity
Utility to compute proximity factor between two sets of words.
- Host: GitHub
- URL: https://github.com/frankie567/word-similarity
- Owner: frankie567
- License: mit
- Created: 2017-08-03T21:05:37.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2017-08-04T07:17:10.000Z (about 8 years ago)
- Last Synced: 2025-02-08T20:47:30.737Z (8 months ago)
- Language: JavaScript
- Size: 11.7 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Word similarity
[Build Status](https://travis-ci.org/frankie567/word-similarity)
A JavaScript utility to compute a proximity factor between two sets of words.
## Algorithm
The algorithm is quite simple:
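* First, join (implode) the words of each set `A` and `B`, in order and without a separator, into single strings `Sa` and `Sb`;
* Compute the set of k-grams (substrings of length `k`) of `Sa` and `Sb`, respectively `kSa` and `kSb`;
* Return the Jaccard coefficient: `| kSa ∩ kSb | / | kSa ∪ kSb |`.

Example with `A = [usa, basket]`, `B = [basket, usa]` and `k = 2`:

* `Sa = usabasket` and `Sb = basketusa`
* `kSa = [us, sa, ab, ba, as, sk, ke, et]` and `kSb = [ba, as, sk, ke, et, tu, us, sa]`
* `| kSa ∩ kSb | = 7` (only `ab` and `tu` are not shared) and `| kSa ∪ kSb | = 9`, so the similarity is `7 / 9 = 0.7777777777777778`

Because the words are concatenated in order, the k-grams that span a word boundary (here `ab` and `tu`) differ, which is why two sets containing the same words in a different order score below 1.

## Installation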
Install the dependencies and build:
```
npm install && npm run build
```

## Usage
A command-line script is provided to test the algorithm quickly:
```
node dist/command.js -a [word] -a [word] -b [word] -b [word] -b [word]
```

Each `word` after an `-a` option is appended to the first set of words, and each `word` after a `-b` option is appended to the second set. The order of the options matters, since the words of each set are joined in the order given.
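The README does not show how `dist/command.js` parses its options. As a rough sketch of the behaviour described above, assuming Node.js 18.3 or later, the built-in `util.parseArgs` can collect repeated options into ordered arrays with `multiple: true`. The function name `parseWordSets` and the long option names `first`, `second` and `kgram` are hypothetical; the sketch also accepts the optional `-k` parameter described next.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical sketch (not the package's dist/command.js): collect repeated
// -a / -b options into two ordered word lists. Requires Node.js >= 18.3.
function parseWordSets(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      // multiple: true makes every occurrence append one word, in order.
      first: { type: "string", short: "a", multiple: true },
      second: { type: "string", short: "b", multiple: true },
      // Optional k-gram length (-k); falls back to the documented default of 2.
      kgram: { type: "string", short: "k" },
    },
  });
  return {
    a: values.first ?? [],
    b: values.second ?? [],
    k: values.kgram === undefined ? 2 : Number(values.kgram),
  };
}

const parsed = parseWordSets(["-a", "usa", "-a", "basket", "-b", "basket", "-b", "usa"]);
// parsed.a is ["usa", "basket"], parsed.b is ["basket", "usa"], parsed.k is 2
console.log(parsed.a, parsed.b, parsed.k);
```

In an actual script the arguments would come from `process.argv.slice(2)`.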
Optionally, you can provide the parameter `k`, the length of the k-grams (**defaults to 2**):
```
node dist/command.js -a [word] -a [word] -b [word] -b [word] -b [word] -k 3
```

## Examples
### With 2-grams
```
node dist/command.js -a basket -b basket
# 1
```

```
node dist/command.js -a usa -a basket -b basket -b usa
# 0.7777777777777778
```

```
node dist/command.js -a basket -a usa -b basket -b ball -b usa
# 0.5833333333333334
```

```
node dist/command.js -a usa -a basket -b basket -b ball -b usa
# 0.5833333333333334
```

### With 3-grams
```
node dist/command.js -a basket -b basket -k 3
# 1
```

```
node dist/command.js -a usa -a basket -b basket -b usa -k 3
# 0.5555555555555556
```

```
node dist/command.js -a basket -a usa -b basket -b ball -b usa -k 3
# 0.38461538461538464
```

```
node dist/command.js -a usa -a basket -b basket -b ball -b usa -k 3
# 0.38461538461538464
```

### Real-world examples (STS2016)
*abortion vote delayed amid debate irish politicians will spend a second day debating divisive laws that will legislate for the first time for abortion in limited circumstances*
*abortion bill second night of debating in the dail new irish politicians have begun a second night of debating divisive laws that will legislate for the first time for abortion in limited circumstances*
**With 2-grams**:
```
node dist/command.js -a abortion -a vote -a delayed -a amid -a debate -a irish -a politicians -a will -a spend -a a -a second -a day -a debating -a divisive -a laws -a that -a will -a legislate -a for -a the -a first -a time -a for -a abortion -a in -a limited -a circumstances -b abortion -b bill -b second -b night -b of -b debating -b in -b the -b dail -b new -b irish -b politicians -b have -b begun -b a -b second -b night -b of -b debating -b divisive -b laws -b that -b will -b legislate -b for -b the -b first -b time -b for -b abortion -b in -b limited -b circumstances -k 2
# 0.7027027027027027
```

**With 3-grams**:
```
node dist/command.js -a abortion -a vote -a delayed -a amid -a debate -a irish -a politicians -a will -a spend -a a -a second -a day -a debating -a divisive -a laws -a that -a will -a legislate -a for -a the -a first -a time -a for -a abortion -a in -a limited -a circumstances -b abortion -b bill -b second -b night -b of -b debating -b in -b the -b dail -b new -b irish -b politicians -b have -b begun -b a -b second -b night -b of -b debating -b divisive -b laws -b that -b will -b legislate -b for -b the -b first -b time -b for -b abortion -b in -b limited -b circumstances -k 3
# 0.608433734939759
```

---
*how should i store eggs in the refrigerator*
*how should i store steaks in the freezer*
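Typing one `-a` or `-b` flag per word gets tedious for whole sentences. The following hypothetical helper (`toCommandArgs` is not part of the package) is a sketch that assumes the words are separated by whitespace. It splits each sentence and prefixes every word with the matching flag.

```javascript
// Hypothetical helper: turn two whitespace-separated sentences into the
// argument list expected by dist/command.js (one -a / -b flag per word).
function toCommandArgs(sentenceA, sentenceB, k) {
  const withFlag = (flag, sentence) =>
    sentence.trim().split(/\s+/).flatMap((word) => [flag, word]);
  return [...withFlag("-a", sentenceA), ...withFlag("-b", sentenceB), "-k", String(k)];
}

const args = toCommandArgs(
  "how should i store eggs in the refrigerator",
  "how should i store steaks in the freezer",
  2,
);
// Prints the full command line for this sentence pair with k = 2.
console.log(["node", "dist/command.js", ...args].join(" "));
```

The printed line is identical to the 2-gram command for this pair.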
**With 2-grams**:
```
node dist/command.js -a how -a should -a i -a store -a eggs -a in -a the -a refrigerator -b how -b should -b i -b store -b steaks -b in -b the -b freezer -k 2
# 0.5945945945945946
```

**With 3-grams**:
```
node dist/command.js -a how -a should -a i -a store -a eggs -a in -a the -a refrigerator -b how -b should -b i -b store -b steaks -b in -b the -b freezer -k 3
# 0.4222222222222222
```
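## Implementation sketch

For experimenting outside the CLI, the steps from the Algorithm section can be sketched in plain JavaScript. This is a reconstruction from the description, not the package's own source, and the names `kGrams` and `similarity` are hypothetical. The behaviour when a string is shorter than `k` is not documented, so that case is handled by an explicit assumption.

```javascript
// Sketch of the README's algorithm (not the package source): join each word
// list, collect its character k-grams as a set, return the Jaccard coefficient.

function kGrams(str, k) {
  const grams = new Set();
  // Every substring of length k, starting at each position that leaves room for it.
  for (let i = 0; i + k <= str.length; i++) {
    grams.add(str.slice(i, i + k));
  }
  return grams;
}

function similarity(wordsA, wordsB, k = 2) {
  // "Implode" each set: the words are concatenated in order, without a separator.
  const kSa = kGrams(wordsA.join(""), k);
  const kSb = kGrams(wordsB.join(""), k);
  let intersection = 0;
  for (const gram of kSa) {
    if (kSb.has(gram)) intersection++;
  }
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  const union = kSa.size + kSb.size - intersection;
  // Assumption: the README does not say what happens when both strings are
  // shorter than k; this sketch returns 0 instead of dividing by zero.
  return union === 0 ? 0 : intersection / union;
}

console.log(similarity(["usa", "basket"], ["basket", "usa"])); // 0.7777777777777778
console.log(similarity(["basket", "usa"], ["basket", "ball", "usa"])); // 0.5833333333333334
console.log(similarity(["usa", "basket"], ["basket", "usa"], 3)); // 0.5555555555555556
console.log(similarity(["basket", "usa"], ["basket", "ball", "usa"], 3)); // 0.38461538461538464
```

Storing the k-grams in a `Set`, so each one is counted once, reproduces the figures in the examples above. For instance, `basketusa` against `basketballusa` gives 7 shared bigrams out of 12 distinct ones, even though `ba` occurs twice in `basketballusa`.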