https://github.com/navierula/document-similarity

Last synced: 3 months ago
JSON representation

Host: GitHub
URL: https://github.com/navierula/document-similarity
Owner: navierula
License: mit
Created: 2017-06-27T00:14:02.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2017-06-28T00:27:14.000Z (almost 8 years ago)
Last Synced: 2025-01-21T22:43:43.946Z (5 months ago)
Language: Python
Size: 1.95 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Document Similarity

There exists a **plethora of textual data** on the Internet. The is good...because the data can be used to perhaps assist
machine learning algorithms. This is also bad...because the data may repeat itself sometimes.

I want to examine the bad, particularly in regards to the idea
of deduplication.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/navierula/document-similarity

Awesome Lists containing this project

README