https://github.com/scionoftech/text_similarity

Measuring text similarity with gensim, Jaccard similarity, and cosine similarity

Host: GitHub
URL: https://github.com/scionoftech/text_similarity
Owner: scionoftech
Created: 2019-12-31T16:18:20.000Z (about 6 years ago)
Default Branch: master
Last Pushed: 2019-12-31T16:29:56.000Z (about 6 years ago)
Last Synced: 2024-12-27T17:23:33.766Z (about 1 year ago)
Language: Jupyter Notebook
Size: 123 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Text_Similarity
Text similarity measures how 'close' two pieces of text are, both in surface form (**lexical similarity**) and in meaning (**semantic similarity**). For instance, how similar are the phrases “the cat ate the mouse” and “the mouse ate the cat food” if we look only at the words?

![./images/text_similarity.png](./images/text_similarity.png)
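
To make the "just looking at the words" comparison concrete, here is a minimal sketch of two lexical measures, Jaccard similarity over word sets and cosine similarity over word counts, applied to the two example phrases. It uses only the Python standard library and is not taken from the repository's notebooks; the function names `tokenize`, `jaccard_similarity`, and `cosine_similarity` and the whitespace tokenizer are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def jaccard_similarity(a, b):
    """Size of the shared vocabulary divided by the size of the combined vocabulary."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


def cosine_similarity(a, b):
    """Cosine of the angle between the two bag-of-words count vectors."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


s1 = "the cat ate the mouse"
s2 = "the mouse ate the cat food"
print(f"Jaccard: {jaccard_similarity(s1, s2):.2f}")  # Jaccard: 0.80
print(f"Cosine:  {cosine_similarity(s1, s2):.2f}")   # Cosine:  0.94
```

Both scores are high even though the sentences mean different things, which is the gap between lexical and semantic similarity described above.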

## Quora Question Pairs Dataset
The dataset contains over 400,000 lines of potential duplicate question pairs. Each line contains the IDs of both questions in the pair, the full text of each question, and a binary value indicating whether the pair is truly a duplicate.

The dataset can be downloaded from [Quora Question Pairs Dataset](https://www.kaggle.com/quora/question-pairs-dataset).
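
As a rough sketch of how such lines could be read, the snippet below parses CSV rows into question pairs with a boolean duplicate flag. The column names (`id`, `qid1`, `qid2`, `question1`, `question2`, `is_duplicate`) are an assumption about the downloaded file's header and should be checked against the actual file; the two sample rows are made up for illustration and are not from the dataset. `read_pairs` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows in the ASSUMED column layout (not real dataset content).
SAMPLE = """id,qid1,qid2,question1,question2,is_duplicate
0,1,2,How do I learn Python?,What is the best way to learn Python?,1
1,3,4,What is the capital of France?,How tall is the Eiffel Tower?,0
"""


def read_pairs(handle):
    """Yield (question1, question2, is_duplicate) tuples from a CSV file handle."""
    for row in csv.DictReader(handle):
        yield row["question1"], row["question2"], row["is_duplicate"] == "1"


pairs = list(read_pairs(io.StringIO(SAMPLE)))
duplicates = sum(1 for _, _, is_dup in pairs if is_dup)
print(f"{len(pairs)} pairs, {duplicates} marked duplicate")
```

For the real file, the `io.StringIO(SAMPLE)` handle would be replaced by an opened file, for example `open(path, newline="", encoding="utf-8")`, where `path` points to the downloaded CSV.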
