https://github.com/zoobereq/semantic_similarities
A tool to assess semantic similarity between English words
embeddings nlp semantics similarity similarity-measures similarity-score word-embedding-evaluation word-embeddings word2vec word2vec-embeddinngs wordnet
- Host: GitHub
- URL: https://github.com/zoobereq/semantic_similarities
- Owner: zoobereq
- Created: 2022-10-12T01:31:17.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2022-10-12T01:59:07.000Z (about 3 years ago)
- Last Synced: 2025-04-04T02:24:15.439Z (7 months ago)
- Topics: embeddings, nlp, semantics, similarity, similarity-measures, similarity-score, word-embedding-evaluation, word-embeddings, word2vec, word2vec-embeddinngs, wordnet
- Language: Python
- Homepage:
- Size: 3.91 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: Readme.md
README
## Semantic Word Similarity
### Motivation
Computing word similarity is a fundamental problem in NLP, with applications such as plagiarism detection, question answering, and surveying diachronic language change.

### Method
The program implements and evaluates several methods of computing semantic word similarity:
- WordNet shortest-path similarity
- Wu-Palmer WordNet semantic depth similarity
- Word embeddings cosine similarity

### Code
The program first computes semantic similarity between the following six word pairs:
- *jaguar : cat*
- *jaguar : car*
- *king : queen*
- *king : rook*
- *tiger : zoo*
- *tiger : cat*

For both the shortest-path and Wu-Palmer measures, the WordNet-based score for a word pair is the highest similarity found over all pairs of the two words' senses. The cosine similarity is computed on dense vector representations derived from [GloVe Wiki Gigaword 50](https://nlp.stanford.edu/projects/glove/). Users are free to substitute a different word embedding model.
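
The three measures described above can be illustrated with a minimal, standard-library-only sketch. This is not the repository's code: the taxonomy `PARENT`, the sense inventory `SENSES`, and all function names are hypothetical stand-ins for WordNet and for the program's own helpers, and the Wu-Palmer formula shown is one common formulation that counts the root as depth 1.

```python
import math
from itertools import product

# Hypothetical toy hypernym tree (child -> parent); NOT real WordNet data.
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "feline": "animal",
    "big_cat": "feline",
    "jaguar.animal": "big_cat",
    "cat.animal": "feline",
    "vehicle": "artifact",
    "car.vehicle": "vehicle",
    "jaguar.car": "car.vehicle",
}

# Hypothetical sense inventory: each word maps to its candidate senses.
SENSES = {
    "jaguar": ["jaguar.animal", "jaguar.car"],
    "cat": ["cat.animal"],
    "car": ["car.vehicle"],
}


def ancestors(node):
    """Return the chain from `node` up to the root, `node` first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def lowest_common_subsumer(a, b):
    """Deepest node that is an ancestor of (or equal to) both senses."""
    chain_b = set(ancestors(b))
    return next(n for n in ancestors(a) if n in chain_b)


def path_similarity(a, b):
    """1 / (1 + number of edges on the shortest path between two senses)."""
    lcs = lowest_common_subsumer(a, b)
    edges = ancestors(a).index(lcs) + ancestors(b).index(lcs)
    return 1.0 / (1.0 + edges)


def wup_similarity(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b)), with the root at depth 1."""
    lcs = lowest_common_subsumer(a, b)
    depth = {n: len(ancestors(n)) for n in (a, b, lcs)}
    return 2.0 * depth[lcs] / (depth[a] + depth[b])


def best_sense_score(word1, word2, measure):
    """Maximum of `measure` over every pair of senses of the two words."""
    pairs = product(SENSES[word1], SENSES[word2])
    return max(measure(s1, s2) for s1, s2 in pairs)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    print(best_sense_score("jaguar", "cat", path_similarity))
    print(best_sense_score("jaguar", "car", wup_similarity))
    print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

Taking the maximum over sense pairs lets an ambiguous word such as *jaguar* score well against both *cat* and *car*, since each comparison picks the sense that fits best. With real data, the toy tree would be replaced by WordNet synsets and the short vectors by pretrained embeddings.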
The resulting similarity scores are then compared against human ratings extracted from the [WordSimilarity-353 Test Collection](https://aclweb.org/aclwiki/WordSimilarity-353_Test_Collection_(State_of_the_art)). Here again, users are free to substitute their own baseline.
### Evaluation
The correlation between machine and human scores is measured with the Spearman rank correlation coefficient, first for the six word pairs listed above, and then for 203 word pairs extracted from the WordSimilarity-353 Test Collection.
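
As a rough illustration of the evaluation step, the sketch below computes the Spearman coefficient as the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The function names and the example scores are made up for illustration and are not results from the program. In practice a library routine such as `scipy.stats.spearmanr` would typically do this job.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up scores for four hypothetical word pairs (illustration only).
    machine_scores = [0.25, 0.50, 0.90, 0.10]
    human_ratings = [7.0, 6.5, 8.6, 3.0]
    print(spearman(machine_scores, human_ratings))
```

Because only the rank order matters, machine scores in [0, 1] can be compared directly with human ratings on a different scale, with no rescaling.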