https://github.com/sayamalt/text-similarity-quantifier

Successfully developed a machine learning model for computing the similarity score between two text paragraphs taken as input from a webpage.


Host: GitHub
URL: https://github.com/sayamalt/text-similarity-quantifier
Owner: SayamAlt
Created: 2022-04-15T20:13:04.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2022-06-01T20:18:02.000Z (about 3 years ago)
Last Synced: 2024-12-28T08:09:42.032Z (6 months ago)
Topics: bag-of-words, cosine-similarity, cosine-similarity-scores, countvectorizer, flask, machine-learning, nlp, pandas, python, text-preprocessing, tfidf
Language: Jupyter Notebook
Homepage:
Size: 8.04 MB
Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Text-Similarity-Quantifier

## Objective

Build an algorithm that quantifies the degree of semantic similarity between two text documents.

Semantic Textual Similarity (STS) assesses the degree to which two sentences
are semantically equivalent to each other.

A score of 1 means the two texts are highly similar; a score of 0 means they are highly dissimilar.
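
As a rough illustration of how a score in this range can be produced, here is a minimal sketch of bag-of-words cosine similarity using only the standard library. The function names `tokenize` and `cosine_similarity` are hypothetical and are not taken from the repository. Word-count overlap is only a lexical proxy for semantic similarity, so treat this as a baseline under those assumptions rather than the project's actual method.

```python
import math
import string
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def cosine_similarity(text1: str, text2: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    counts1 = Counter(tokenize(text1))
    counts2 = Counter(tokenize(text2))
    # Only words present in both texts contribute to the dot product.
    dot = sum(counts1[w] * counts2[w] for w in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty text has no direction, so report no similarity
    return dot / (norm1 * norm2)
```

Because word counts are never negative, the result always falls between 0 and 1, which matches the scale described above. Texts that share no words score 0, and texts with identical word counts score 1.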

## Technologies Used

- Python
- Libraries used:
  - NumPy
  - Pandas
  - Seaborn
  - Matplotlib.pyplot
  - Joblib
  - warnings
  - string
  - Gensim Downloader
  - scikit-learn (sklearn)
  - NLTK
  - math
  - json
  - requests
  - Flask
- Machine Learning
- Natural Language Processing

## API Endpoint

The final algorithm should be exposed as a server API endpoint. To test the API, send a request to the server; the result is returned in the response. The request and response bodies should use the following format:

Request body: {"text1": "nuclear body seeks new tech …....", "text2": "terror suspects face arrest ……"}

Response body: {"similarity score": 0.2}

Note: the keys "text1", "text2", and "similarity score" must be kept exactly as shown, without any change.
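
The request and response contract above could be wired up in Flask roughly as follows. This is a sketch under assumptions: the route path `/similarity` and the placeholder scorer `similarity` (a simple Jaccard word overlap) are hypothetical and stand in for whatever model the project actually serves. Only the three JSON keys come from the specification.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def similarity(text1: str, text2: str) -> float:
    """Placeholder scorer (an assumption): Jaccard overlap of lowercase word sets."""
    words1 = set(text1.lower().split())
    words2 = set(text2.lower().split())
    if not words1 or not words2:
        return 0.0
    return len(words1 & words2) / len(words1 | words2)


@app.route("/similarity", methods=["POST"])  # hypothetical route name
def similarity_endpoint():
    # silent=True yields None instead of raising on a missing or invalid JSON body.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        payload = {}
    text1, text2 = payload.get("text1"), payload.get("text2")
    if not isinstance(text1, str) or not isinstance(text2, str):
        return jsonify({"error": "text1 and text2 must be strings"}), 400
    # The response key is spelled exactly as the specification requires.
    return jsonify({"similarity score": round(similarity(text1, text2), 4)})
```

Assuming the file is saved as `app.py`, the server can be started with `flask --app app run` and exercised by sending a POST request with a JSON body containing `text1` and `text2`.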

## Important aspect to consider

The given dataset contains no labels, so the task can be treated as an unsupervised learning problem. However, this does not mean that supervised techniques or algorithms are inapplicable. The candidate is free to use any technique.
