https://github.com/zoobereq/semantic_similarities

A tool to assess semantic similarity between English words

embeddings nlp semantics similarity similarity-measures similarity-score word-embedding-evaluation word-embeddings word2vec word2vec-embeddinngs wordnet

Host: GitHub
URL: https://github.com/zoobereq/semantic_similarities
Owner: zoobereq
Created: 2022-10-12T01:31:17.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2022-10-12T01:59:07.000Z (about 3 years ago)
Last Synced: 2025-04-04T02:24:15.439Z (7 months ago)
Topics: embeddings, nlp, semantics, similarity, similarity-measures, similarity-score, word-embedding-evaluation, word-embeddings, word2vec, word2vec-embeddinngs, wordnet
Language: Python
Homepage:
Size: 3.91 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: Readme.md

## Semantic Word Similarity

### Motivation
Computing word similarity is a fundamental problem in NLP and is used in many applications, such as plagiarism detection, question answering, and surveying diachronic language change.

### Method
The program implements and evaluates several methods of computing semantic word similarity:
- WordNet shortest-path similarity
- Wu-Palmer WordNet semantic depth similarity
- Word embeddings cosine similarity
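
To make the two WordNet measures concrete, the following is a minimal sketch on a toy is-a tree. The hierarchy and node names are hypothetical and are not taken from WordNet or from this repository's code. It assumes the usual definitions: shortest-path similarity is `1 / (1 + edges on the shortest path)`, and Wu-Palmer similarity is `2 * depth(LCS) / (depth(a) + depth(b))`, where LCS is the deepest shared ancestor. Real WordNet is a graph with multiple inheritance, so a tree is a simplification.

```python
# Toy is-a hierarchy (child -> parent); hypothetical, not real WordNet data.
PARENT = {
    "animal": "entity",
    "artifact": "entity",
    "feline": "animal",
    "cat": "feline",
    "big_cat": "feline",
    "jaguar": "big_cat",
    "tiger": "big_cat",
    "car": "artifact",
}


def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(node))


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(a))
    return next(n for n in ancestors(b) if n in seen)


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between a and b)."""
    lcs = lowest_common_subsumer(a, b)
    edges = (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))
    return 1 / (1 + edges)


def wup_similarity(a, b):
    """Wu-Palmer: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


for pair in [("jaguar", "cat"), ("jaguar", "car"), ("jaguar", "tiger")]:
    print(pair, path_similarity(*pair), round(wup_similarity(*pair), 3))
```

In this toy tree, *jaguar* and *tiger* share a deep ancestor and score higher on both measures than *jaguar* and *car*, whose only shared ancestor is the root.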

### Code
The program first computes semantic similarity between the following six word pairs:
- *jaguar : cat*
- *jaguar : car*
- *king : queen*
- *king : rook*
- *tiger : zoo*
- *tiger : cat*

For both the shortest-path and Wu-Palmer algorithms, the WordNet-based score for a word pair is the highest similarity found across all pairs of senses of the two words. Cosine similarity is computed over dense 50-dimensional word vectors from [GloVe Wiki Gigaword 50](https://nlp.stanford.edu/projects/glove/). Users are free to substitute a different word embedding model.
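
The two scoring steps described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's actual code: `best_sense_score` and `cosine_similarity` are hypothetical helper names, and the three-dimensional vectors are made up for the example. With NLTK, the sense lists could come from `wordnet.synsets(word)` and the scoring function could wrap `Synset.path_similarity` or `Synset.wup_similarity`, which may return `None` when two senses cannot be compared.

```python
import math
from itertools import product


def best_sense_score(senses_a, senses_b, score):
    """Highest score over all sense pairs; None if no pair can be scored."""
    scores = (score(a, b) for a, b in product(senses_a, senses_b))
    valid = [s for s in scores if s is not None]
    return max(valid) if valid else None


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional vectors, for illustration only;
# the GloVe model named above uses 50 dimensions.
toy_vectors = {
    "king": [0.9, 0.1, 0.3],
    "queen": [0.8, 0.2, 0.4],
    "rook": [0.1, 0.9, 0.2],
}

print(cosine_similarity(toy_vectors["king"], toy_vectors["queen"]))
print(cosine_similarity(toy_vectors["king"], toy_vectors["rook"]))
```

Taking the maximum over sense pairs means an ambiguous word such as *jaguar* is scored by whichever of its senses is closest to the other word.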

The resulting similarity scores are then compared against human ratings extracted from the [WordSimilarity-353 Test Collection](https://aclweb.org/aclwiki/WordSimilarity-353_Test_Collection_(State_of_the_art)). Here too, users are free to substitute their own baseline.

### Evaluation
The correlation between machine and human scores is measured with Spearman's rank correlation coefficient, first for the six word pairs listed above and then for 203 word pairs extracted from the WordSimilarity-353 Test Collection.
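
As a sketch of how this evaluation works, Spearman's coefficient can be computed as the Pearson correlation of the two rank sequences, with tied values sharing an average rank. The function names and the score lists below are hypothetical and are not taken from the repository or from WordSimilarity-353. In practice a library routine such as `scipy.stats.spearmanr` would likely be used instead.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up scores for four word pairs, for illustration only.
machine_scores = [0.9, 0.3, 0.8, 0.2]
human_ratings = [9.0, 4.0, 7.5, 5.0]
print(spearman(machine_scores, human_ratings))
```

Because only ranks matter, the machine scores (between 0 and 1) and the human ratings (on a different scale) can be compared directly without rescaling.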
