https://github.com/navierula/document-similarity
https://github.com/navierula/document-similarity
Last synced: 3 months ago
JSON representation
- Host: GitHub
- URL: https://github.com/navierula/document-similarity
- Owner: navierula
- License: mit
- Created: 2017-06-27T00:14:02.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-06-28T00:27:14.000Z (almost 8 years ago)
- Last Synced: 2025-01-21T22:43:43.946Z (5 months ago)
- Language: Python
- Size: 1.95 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Document Similarity
There exists a **plethora of textual data** on the Internet. The is good...because the data can be used to perhaps assist
machine learning algorithms. This is also bad...because the data may repeat itself sometimes.I want to examine the bad, particularly in regards to the idea
of deduplication.