https://github.com/kakshay21/ml-documentretrieval

Clustering a Wikipedia dataset with k-NN based on TF-IDF

Host: GitHub
URL: https://github.com/kakshay21/ml-documentretrieval
Owner: kakshay21
Created: 2017-08-15T12:25:58.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2017-10-25T11:51:12.000Z (over 7 years ago)
Last Synced: 2025-01-09T10:49:21.798Z (6 months ago)
Topics: clustering, graphlab, knn-model, machine-learning, nearest-neighbours, tf-idf
Language: Jupyter Notebook
Homepage:
Size: 55.1 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# ML-DocumentRetrieval

In this project, I explored a Wikipedia dataset containing articles on famous celebrities.

You can see the notebook [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrieval.ipynb).
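The notebook does this with GraphLab Create. As a library-free illustration only (not the notebook's code), the sketch below weights each word by TF-IDF and then finds a document's nearest neighbors by brute force. The toy documents are hypothetical, and the weighting `count * log(N / df)` and the cosine distance are assumptions; GraphLab's exact formula and default distance may differ.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for the Wikipedia articles on celebrities.
docs = {
    "singer_a": "british singer songwriter pop music album tour",
    "singer_b": "singer songwriter piano pop album rock music",
    "footballer": "english footballer midfielder club league goals",
}

def tf_idf(corpus):
    """Return {name: {word: weight}} with weight = count * log(N / df) (assumed formula)."""
    counts = {name: Counter(text.split()) for name, text in corpus.items()}
    n_docs = len(counts)
    # Document frequency: in how many documents each word appears at least once.
    df = Counter(word for c in counts.values() for word in c)
    return {
        name: {w: tf * math.log(n_docs / df[w]) for w, tf in c.items()}
        for name, c in counts.items()
    }

def cosine_distance(u, v):
    """1 - cosine similarity between two sparse word->weight vectors."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

def nearest_neighbors(query, vectors, k=1):
    """Brute-force k-NN: rank every other document by cosine distance to the query."""
    others = [name for name in vectors if name != query]
    return sorted(others, key=lambda n: cosine_distance(vectors[query], vectors[n]))[:k]

vectors = tf_idf(docs)
print(nearest_neighbors("singer_a", vectors, k=2))
```

Here `singer_a` and `singer_b` share several topical words, so `singer_b` ranks first, while `footballer` shares none and sits at the maximum distance of 1.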

Later, I compared TF-IDF with raw word counts [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrievalPractice.ipynb).
## RESULTS
Please allow a few seconds for this [link](https://render.githubusercontent.com/view/ipynb?commit=6e14d951092ca8bb6e84826898a9402344a6167a&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6b616b7368617932312f4d4c2d446f63756d656e7452657472696576616c2f366531346439353130393263613862623665383438323638393861393430323334346136313637612f646f63756d656e7452657472696576616c50726163746963652e6970796e62&nwo=kakshay21%2FML-DocumentRetrieval&path=documentRetrievalPractice.ipynb&repository_id=100374879&repository_type=Repository#Comparing-the-difference-in-clustering-with-tf-idf-than-to-raw-word-count) to render.
The two examples in that notebook, Victoria Beckham and Elton John, show that TF-IDF gives more accurate nearest-neighbor results than raw word counts.
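One common explanation for this result is that raw counts let frequent, uninformative words dominate the similarity, while TF-IDF down-weights words that appear in many documents. The plain-Python sketch below (not the notebook's GraphLab code) reproduces that effect on three hypothetical toy documents; the `idf = log(N / df)` weighting and cosine distance are assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical toy documents: "the" appears in every document, as stop words do
# in real articles, while only doc_match shares the query's topical words.
docs = {
    "query": "the the the the singer album pop",
    "doc_match": "the singer album pop tour",
    "doc_filler": "the the the the the football league",
}

def raw_counts(corpus):
    """Bag-of-words vectors: {name: {word: count}}."""
    return {name: dict(Counter(text.split())) for name, text in corpus.items()}

def tf_idf(corpus):
    """TF-IDF vectors with the assumed weighting count * log(N / df)."""
    counts = raw_counts(corpus)
    n_docs = len(counts)
    df = Counter(w for c in counts.values() for w in c)
    # A word present in every document gets idf = log(N / N) = 0, so it drops out.
    return {name: {w: tf * math.log(n_docs / df[w]) for w, tf in c.items()}
            for name, c in counts.items()}

def cosine_distance(u, v):
    """1 - cosine similarity between two sparse word->weight vectors."""
    dot = sum(weight * v.get(w, 0.0) for w, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def nearest(query, vectors):
    """Return the single closest other document (1-NN)."""
    others = [n for n in vectors if n != query]
    return min(others, key=lambda n: cosine_distance(vectors[query], vectors[n]))

print("raw counts:", nearest("query", raw_counts(docs)))  # doc_filler
print("tf-idf:", nearest("query", tf_idf(docs)))          # doc_match
```

With raw counts, the repeated "the" makes `doc_filler` look closest; with TF-IDF, "the" gets zero weight and the topically related `doc_match` wins.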

To run the notebooks yourself, install GraphLab Create from [here](https://turi.com/download/academic.html).
