https://github.com/Huffon/sentence-similarity
This repository contains various ways to calculate sentence vector similarity using NLP models
- Host: GitHub
- URL: https://github.com/Huffon/sentence-similarity
- Owner: Huffon
- Archived: true
- Created: 2019-11-04T09:37:35.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2020-04-14T06:03:17.000Z (over 5 years ago)
- Last Synced: 2025-05-14T00:52:06.067Z (5 months ago)
- Topics: natural-language-processing, sentence-embedding, sentence-similarity, vector-similarity
- Language: Python
- Homepage:
- Size: 215 KB
- Stars: 199
- Watchers: 11
- Forks: 34
- Open Issues: 1
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
- awesome-document-similarity - Sentence Similarity Calculator (ELMo, BERT and Universal Sentence Encoder, and different similarity measures)
README
# Sentence Similarity Calculator
This repo contains various ways to calculate the similarity between source and target sentences. You can choose **the pre-trained model** you want to use, such as _ELMo_, _BERT_, or _Universal Sentence Encoder (USE)_, and you can also choose **the method** used to compute the similarity:
1. Cosine similarity
2. Manhattan distance
3. Euclidean distance
4. Angular distance
5. Inner product
6. TS-SS score
7. Pairwise-cosine similarity
8. Pairwise-cosine similarity + IDF
You can experiment with (**The number of models**) x (**The number of methods**) combinations!
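As a rough illustration of the vector-level measures listed above (not the repository's actual code), the following NumPy sketch shows one way to compute them for two sentence embeddings; the TS-SS formula follows the referenced Heidarian & Dinneen paper, and all function and variable names here are illustrative.

```python
# Minimal sketch of the similarity measures for two embedding vectors a and b.
import numpy as np

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def euclidean(a, b):
    return np.linalg.norm(a - b)

def angular_distance(a, b):
    # Angle between the vectors, normalized to [0, 1]
    return np.arccos(np.clip(cosine(a, b), -1.0, 1.0)) / np.pi

def inner_product(a, b):
    return np.dot(a, b)

def ts_ss(a, b):
    # TS-SS as described in the referenced paper: triangle area similarity
    # multiplied by sector area similarity (theta in degrees, offset by 10).
    theta = np.degrees(np.arccos(np.clip(cosine(a, b), -1.0, 1.0))) + 10.0
    triangle = np.linalg.norm(a) * np.linalg.norm(b) * np.sin(np.radians(theta)) / 2.0
    magnitude_diff = abs(np.linalg.norm(a) - np.linalg.norm(b))
    sector = np.pi * (euclidean(a, b) + magnitude_diff) ** 2 * (theta / 360.0)
    return triangle * sector

if __name__ == "__main__":
    a, b = np.random.rand(512), np.random.rand(512)  # e.g. USE embeddings are 512-d
    print(cosine(a, b), euclidean(a, b), ts_ss(a, b))
```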
## Installation
- This project was developed in a **conda** environment
- After cloning this repository, you can install all of the dependencies listed in `requirements.txt` by running `bash install.sh`:
```
conda create -n sensim python=3.7
conda activate sensim
git clone https://github.com/Huffon/sentence-similarity.git
cd sentence-similarity
bash install.sh
```
## Usage
- To **test** your own sentences, fill [**corpus.txt**](corpus.txt) with sentences like the following:
```
I ate an apple.
I went to the Apple.
I ate an orange.
...
```
- Then, **choose** the **model** and the **method** used to calculate the similarity between source and target sentences:
```
python sensim.py
--model MODEL_NAME [use, bert, elmo]
--method METHOD_NAME [cosine, manhattan, euclidean, inner,
ts-ss, angular, pairwise, pairwise-idf]
--verbose LOG_OPTION (bool)
```
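Conceptually, the script embeds every sentence in `corpus.txt` with the chosen pre-trained model and then scores the source sentence against the targets. The sketch below illustrates that flow using the Sentence Transformers library listed in the References; the model name and file handling are illustrative assumptions, not the repository's implementation.

```python
# Hedged sketch: embed sentences from corpus.txt and score the first (source)
# sentence against the remaining (target) sentences with cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

with open("corpus.txt") as f:
    sentences = [line.strip() for line in f if line.strip()]

embeddings = model.encode(sentences)            # shape: (n_sentences, dim)
source, targets = embeddings[0], embeddings[1:]

# Cosine similarity between the source sentence and each target sentence
scores = targets @ source / (np.linalg.norm(targets, axis=1) * np.linalg.norm(source))
for sentence, score in zip(sentences[1:], scores):
    print(f"{score:.3f}\t{sentence}")
```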
## Examples
- This section shows an example result of `sentence-similarity`
- Note that there is no **silver bullet** that can compute a **_perfect similarity_** between sentences
- You should conduct various experiments with your own dataset
- _**Caution**_: the `TS-SS score` may not be a good fit for the **sentence** similarity task, since the method was originally devised to measure similarity between long documents
- **Result**: see the example output image in the original repository
## References
### Papers
- [Universal Sentence Encoder](https://arxiv.org/abs/1803.11175)
- [Deep contextualized word representations](https://arxiv.org/abs/1802.05365)
- [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
- [Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks](https://arxiv.org/abs/1908.10084)
- [BERTScore: Evaluating Text Generation with BERT](https://arxiv.org/abs/1904.09675)
- [A Hybrid Geometric Approach for Measuring Similarity Level Among Documents and Document Clustering](https://ieeexplore.ieee.org/document/7474366/metrics#metrics)
### Libraries
- [TF-hub's Universal Sentence Encoder](https://tfhub.dev/google/universal-sentence-encoder/2)
- [Allen NLP's ELMo](https://github.com/allenai/allennlp)
- [Sentence Transformers](https://github.com/UKPLab/sentence-transformers)
- [BERTScore](https://github.com/Tiiiger/bert_score)
- [Vector Similarity](https://github.com/taki0112/Vector_Similarity)
### Articles
- [An Overview of Sentence Embedding Methods](http://mlexplained.com/2017/12/28/an-overview-of-sentence-embedding-methods/)
- [Comparing Sentence Similarity Methods](http://nlp.town/blog/sentence-similarity/)
- [The Current Best of Universal Word Embeddings and Sentence Embeddings](https://medium.com/huggingface/universal-word-sentence-embeddings-ce48ddc8fc3a)