https://github.com/kakshay21/ml-documentretrieval

Clustering a Wikipedia dataset with k-NN based on TF-IDF

clustering graphlab knn-model machine-learning nearest-neighbours tf-idf

# ML-DocumentRetrieval

In this project, I explored a Wikipedia dataset that contains articles about celebrities.

You can see the notebook [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrieval.ipynb).

Later, I compared TF-IDF with raw word counts [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrievalPractice.ipynb).

## RESULTS
Please wait a few seconds for this [link](https://render.githubusercontent.com/view/ipynb?commit=6e14d951092ca8bb6e84826898a9402344a6167a&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6b616b7368617932312f4d4c2d446f63756d656e7452657472696576616c2f366531346439353130393263613862623665383438323638393861393430323334346136313637612f646f63756d656e7452657472696576616c50726163746963652e6970796e62&nwo=kakshay21%2FML-DocumentRetrieval&path=documentRetrievalPractice.ipynb&repository_id=100374879&repository_type=Repository#Comparing-the-difference-in-clustering-with-tf-idf-than-to-raw-word-count) to render.
In the two examples shown there, Victoria Beckham and Elton John, TF-IDF gives more accurate nearest neighbours than raw word counts.
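
To show why TF-IDF can beat raw word counts, here is a minimal sketch in plain Python. It is not the notebook's code and does not use GraphLab. The three tiny documents, their names, the `log(N / df)` form of IDF, and the use of cosine distance are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not the Wikipedia dataset used in the notebooks).
docs = {
    "singer_a": "the the the the the singer sang the song",
    "singer_b": "the singer sang a song on stage",
    "footballer": "the the the the the player kicked the ball",
}


def word_counts(text):
    """Raw count of each whitespace-separated, lower-cased word."""
    return Counter(text.lower().split())


def tf_idf(corpus):
    """Weight each raw count by log(N / document frequency).

    This is one common IDF variant, assumed here for simplicity.
    """
    counts = {name: word_counts(text) for name, text in corpus.items()}
    n_docs = len(corpus)
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    return {
        name: {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in c.items()}
        for name, c in counts.items()
    }


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse word-weight dictionaries."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbours(query, vectors, k=1):
    """Names of the k documents closest to `query`, excluding itself."""
    ranked = sorted(
        (cosine_distance(vectors[query], vec), name)
        for name, vec in vectors.items()
        if name != query
    )
    return [name for _, name in ranked[:k]]


raw_vectors = {name: dict(word_counts(text)) for name, text in docs.items()}
tfidf_vectors = tf_idf(docs)

# Raw counts are dominated by the shared word "the".
print(nearest_neighbours("singer_a", raw_vectors))
# TF-IDF gives "the" zero weight because it appears in every document.
print(nearest_neighbours("singer_a", tfidf_vectors))
```

With raw counts, the repeated word "the" dominates the vectors, so `singer_a` ends up closest to `footballer`. With TF-IDF, "the" gets zero weight and the match is driven by the rarer words "singer", "sang" and "song", so `singer_b` becomes the nearest neighbour.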

If you want to try this yourself, please install GraphLab from [here](https://turi.com/download/academic.html).