https://github.com/harshpatel44/cosine-text-similarity-algorithm
This repository contains Cosine text similarity algorithm to compare 2 documents
- Host: GitHub
- URL: https://github.com/harshpatel44/cosine-text-similarity-algorithm
- Owner: Harshpatel44
- Created: 2020-06-07T00:18:09.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2020-06-07T02:16:16.000Z (almost 5 years ago)
- Last Synced: 2025-01-24T15:41:51.631Z (4 months ago)
- Language: Python
- Size: 2.93 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Cosine-Text-Similarity-Algorithm
This repository contains a cosine text-similarity algorithm for comparing two documents.
To compare two files, 'file1' and 'file2', the algorithm also needs a third file, 'term_list', that contains all the tokens.
The algorithm is based on the formula: (a · b) / (||a|| × ||b||)
Here a and b are the term-frequency vectors of 'file1' and 'file2' respectively, a · b is their dot product, and ||a|| and ||b|| are their Euclidean lengths.
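
The formula can be sketched in a few lines of Python. This is a minimal illustration, not the code from this repository: the function names `term_frequencies` and `cosine_similarity` are hypothetical, short in-memory strings stand in for 'file1' and 'file2', a plain list stands in for 'term_list', and tokens are assumed to be lowercased, whitespace-separated words.

```python
import math
from collections import Counter


def term_frequencies(text, terms):
    """Count how often each term occurs in text (assumes whitespace tokens)."""
    counts = Counter(text.lower().split())
    # One entry per term, in the order the terms are listed.
    return [counts[term] for term in terms]


def cosine_similarity(a, b):
    """Return (a . b) / (||a|| * ||b||), or 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an empty document as dissimilar
    return dot / (norm_a * norm_b)


# Stand-ins for 'term_list', 'file1' and 'file2' (illustrative data only).
terms = ["cat", "dog", "fish"]
a = term_frequencies("cat dog cat", terms)   # [2, 1, 0]
b = term_frequencies("dog fish dog", terms)  # [0, 2, 1]
score = cosine_similarity(a, b)
print(score)
```

For these vectors the dot product is 2 and both lengths are √5, so the score is about 0.4. A score near 1 means the two documents use the listed terms in similar proportions, and a score of 0 means they share none of them.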