https://github.com/scionoftech/text_similarity

gensim, Jaccard Similarity and Cosine Similarity to measure the TextSimilarity

# Text_Similarity
Text similarity measures how 'close' two pieces of text are, both in surface form (**lexical similarity**) and in meaning (**semantic similarity**). For instance, how similar are the phrases “the cat ate the mouse” and “the mouse ate the cat food” if we look only at the words?

![./images/text_similarity.png](./images/text_similarity.png)
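
To make the lexical side of the question concrete, here is a minimal sketch that scores the two example phrases with Jaccard similarity (overlap of word sets) and cosine similarity (angle between word-count vectors). The helper names `tokenize`, `jaccard_similarity`, and `cosine_similarity` are illustrative and not taken from this repository, and the whitespace tokenizer is a simplifying assumption.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is enough for this toy example.
    return text.lower().split()


def jaccard_similarity(a, b):
    # |intersection| / |union| of the two word sets.
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    # Cosine of the angle between the two bag-of-words count vectors.
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = "the cat ate the mouse"
s2 = "the mouse ate the cat food"
print(f"Jaccard: {jaccard_similarity(s1, s2):.3f}")
print(f"Cosine:  {cosine_similarity(s1, s2):.3f}")
```

Both scores come out high because the phrases share almost all of their words, even though they mean different things. That gap is what semantic similarity methods try to close.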

## Quora Question Pairs Dataset
The dataset contains over 400,000 lines of potential duplicate question pairs. Each line contains an ID for each question in the pair, the full text of each question, and a binary value that indicates whether the pair is truly a duplicate.

The dataset can be downloaded from [Quora Question Pairs Dataset](https://www.kaggle.com/quora/question-pairs-dataset).
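
As a rough sketch of how the pairs might be read once downloaded, the snippet below parses rows with Python's `csv` module. The column names (`id`, `qid1`, `qid2`, `question1`, `question2`, `is_duplicate`) are an assumption based on the description above, so check them against the header of the file you download. The two sample rows are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Hypothetical sample in the assumed column layout; not real dataset rows.
SAMPLE_CSV = """id,qid1,qid2,question1,question2,is_duplicate
0,1,2,How do I learn Python?,What is the best way to learn Python?,1
1,3,4,What is the capital of France?,How far is the Moon from Earth?,0
"""


def load_pairs(handle):
    # Return (question1, question2, is_duplicate) tuples from a CSV file handle.
    reader = csv.DictReader(handle)
    return [
        (row["question1"], row["question2"], int(row["is_duplicate"]))
        for row in reader
    ]


pairs = load_pairs(io.StringIO(SAMPLE_CSV))
for q1, q2, label in pairs:
    print(label, "|", q1, "|", q2)
```

For the real file, replace the `io.StringIO` object with something like `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.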