https://github.com/scionoftech/text_similarity
Measures text similarity using gensim, Jaccard similarity, and cosine similarity
- Host: GitHub
- URL: https://github.com/scionoftech/text_similarity
- Owner: scionoftech
- Created: 2019-12-31T16:18:20.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-12-31T16:29:56.000Z (about 6 years ago)
- Last Synced: 2024-12-27T17:23:33.766Z (about 1 year ago)
- Language: Jupyter Notebook
- Size: 123 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Text_Similarity
Text similarity measures how 'close' two pieces of text are, both in surface form (**lexical similarity**) and in meaning (**semantic similarity**). For instance, how similar are the phrases “the cat ate the mouse” and “the mouse ate the cat food” when judged by their words alone?
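
As a rough illustration of lexical similarity, the sketch below scores the two example phrases with Jaccard similarity (overlap of word sets) and cosine similarity (angle between word-count vectors). The function names and the whitespace tokenizer are assumptions made for this example and are not taken from the repository's notebook.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def jaccard_similarity(a, b):
    # Size of the shared vocabulary divided by the size of the combined vocabulary.
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    # Cosine of the angle between the raw word-count vectors of the two texts.
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = "the cat ate the mouse"
s2 = "the mouse ate the cat food"
print(jaccard_similarity(s1, s2))  # 4 shared words out of 5 distinct words
print(cosine_similarity(s1, s2))
```

Both scores come out high even though the two phrases describe different events, which is why word overlap alone cannot capture semantic similarity.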

## Quora Question Pairs Dataset
The dataset contains over 400,000 lines of potential duplicate question pairs. Each line contains the IDs of both questions in the pair, the full text of each question, and a binary value indicating whether the pair is truly a duplicate.
The dataset can be downloaded from [Quora Question Pairs Dataset](https://www.kaggle.com/quora/question-pairs-dataset).
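
A minimal sketch of reading such a file with Python's standard `csv` module follows. The column names (`id`, `qid1`, `qid2`, `question1`, `question2`, `is_duplicate`) are an assumption based on the description above and should be checked against the downloaded file; the two sample rows are invented purely for illustration.

```python
import csv
import io


def load_question_pairs(file_obj):
    # Yield (question1, question2, is_duplicate) from an open CSV file object.
    # Column names are assumed; adjust them to match the actual header row.
    for row in csv.DictReader(file_obj):
        yield row["question1"], row["question2"], row["is_duplicate"] == "1"


# Invented sample rows standing in for the real file.
sample = io.StringIO(
    "id,qid1,qid2,question1,question2,is_duplicate\n"
    '0,1,2,"How do I learn Python?","What is the best way to learn Python?",1\n'
    '1,3,4,"What is a cat?","What is a mouse?",0\n'
)

for q1, q2, is_dup in load_question_pairs(sample):
    print(is_dup, "|", q1, "|", q2)
```

With the real dataset, the same function can be given a file opened with `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.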