https://github.com/kakshay21/ml-documentretrieval
Clustering a Wikipedia dataset with k-NN based on TF-IDF
clustering graphlab knn-model machine-learning nearest-neighbours tf-idf
- Host: GitHub
- URL: https://github.com/kakshay21/ml-documentretrieval
- Owner: kakshay21
- Created: 2017-08-15T12:25:58.000Z (almost 8 years ago)
- Default Branch: master
- Last Pushed: 2017-10-25T11:51:12.000Z (over 7 years ago)
- Last Synced: 2025-01-09T10:49:21.798Z (6 months ago)
- Topics: clustering, graphlab, knn-model, machine-learning, nearest-neighbours, tf-idf
- Language: Jupyter Notebook
- Homepage:
- Size: 55.1 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# ML-DocumentRetrieval
In this project, I explored a Wikipedia dataset that contains articles on celebrities.
You can see the notebook [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrieval.ipynb).
Later, I compared TF-IDF with raw word counts [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrievalPractice.ipynb).
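
To give a feel for why the two representations can disagree, here is a minimal sketch in plain Python. It is not the notebook's GraphLab code: the toy sentences, the `log(N / df)` weighting, the cosine distance, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter

# Toy corpus (hypothetical text, NOT taken from the Wikipedia dataset).
docs = {
    "singer_a": "the singer and the band released the album",
    "singer_b": "a singer released an album in the studio",
    "footballer": "the footballer and the club won the league",
}

def word_counts(text):
    """Raw bag-of-words counts for one document."""
    return Counter(text.split())

def tf_idf(counts_by_doc):
    """Weight each count by log(N / document frequency) (assumed variant)."""
    n_docs = len(counts_by_doc)
    doc_freq = Counter()
    for counts in counts_by_doc.values():
        doc_freq.update(counts.keys())
    return {
        name: {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        for name, counts in counts_by_doc.items()
    }

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbour(query, vectors):
    """Name of the closest document to `query`, excluding `query` itself."""
    others = (name for name in vectors if name != query)
    return min(others, key=lambda n: cosine_distance(vectors[query], vectors[n]))

raw = {name: word_counts(text) for name, text in docs.items()}
weighted = tf_idf(raw)

print(nearest_neighbour("singer_a", raw))       # footballer
print(nearest_neighbour("singer_a", weighted))  # singer_b
```

With raw counts, the repeated word "the" dominates the vectors, so the footballer text looks closest to `singer_a`. TF-IDF gives "the" a weight of zero because it appears in every document, and the shared topical words then pull `singer_b` closer.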
## RESULTS
Please wait a few seconds for this [link](https://render.githubusercontent.com/view/ipynb?commit=6e14d951092ca8bb6e84826898a9402344a6167a&enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6b616b7368617932312f4d4c2d446f63756d656e7452657472696576616c2f366531346439353130393263613862623665383438323638393861393430323334346136313637612f646f63756d656e7452657472696576616c50726163746963652e6970796e62&nwo=kakshay21%2FML-DocumentRetrieval&path=documentRetrievalPractice.ipynb&repository_id=100374879&repository_type=Repository#Comparing-the-difference-in-clustering-with-tf-idf-than-to-raw-word-count) to render.
From the two examples above, Victoria Beckham and Elton John, we can clearly say that TF-IDF returns more accurate nearest neighbours than raw word counts.

If you want to try this yourself, please install GraphLab from [here](https://turi.com/download/academic.html).