{"id":19617553,"url":"https://github.com/kakshay21/ml-documentretrieval","last_synced_at":"2026-06-07T21:01:24.179Z","repository":{"id":97810335,"uuid":"100374879","full_name":"kakshay21/ML-DocumentRetrieval","owner":"kakshay21","description":"Clustering wikipedia dataset based with K-NN by TF-IDF","archived":false,"fork":false,"pushed_at":"2017-10-25T11:51:12.000Z","size":57780,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-09T10:49:21.798Z","etag":null,"topics":["clustering","graphlab","knn-model","machine-learning","nearest-neighbours","tf-idf"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kakshay21.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-08-15T12:25:58.000Z","updated_at":"2018-04-13T12:39:32.000Z","dependencies_parsed_at":null,"dependency_job_id":"c9b2bda7-39b8-496e-b03f-0b955986ffeb","html_url":"https://github.com/kakshay21/ML-DocumentRetrieval","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kakshay21%2FML-DocumentRetrieval","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kakshay21%2FML-DocumentRetrieval/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kakshay21%2FML-DocumentRetrieval/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kakshay21%2FML-D
ocumentRetrieval/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kakshay21","download_url":"https://codeload.github.com/kakshay21/ML-DocumentRetrieval/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240907855,"owners_count":19876691,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","graphlab","knn-model","machine-learning","nearest-neighbours","tf-idf"],"created_at":"2024-11-11T11:04:44.187Z","updated_at":"2026-06-07T21:01:24.143Z","avatar_url":"https://github.com/kakshay21.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# ML-DocumentRetrieval\n\nIn this project, I explored a Wikipedia dataset that contains articles on famous celebrities.\n\nYou can see it [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrieval.ipynb)\n\nLater, I compared TF-IDF with raw word counts [here](https://github.com/kakshay21/ML-DocumentRetrieval/blob/master/documentRetrievalPractice.ipynb)\n## RESULTS\nPlease wait a few seconds to render this 
[link](https://render.githubusercontent.com/view/ipynb?commit=6e14d951092ca8bb6e84826898a9402344a6167a\u0026enc_url=68747470733a2f2f7261772e67697468756275736572636f6e74656e742e636f6d2f6b616b7368617932312f4d4c2d446f63756d656e7452657472696576616c2f366531346439353130393263613862623665383438323638393861393430323334346136313637612f646f63756d656e7452657472696576616c50726163746963652e6970796e62\u0026nwo=kakshay21%2FML-DocumentRetrieval\u0026path=documentRetrievalPractice.ipynb\u0026repository_id=100374879\u0026repository_type=Repository#Comparing-the-difference-in-clustering-with-tf-idf-than-to-raw-word-count)\nClearly, the two examples above, Victoria Beckham and Elton John, show that TF-IDF is more accurate than raw word counts.\n\nTo try this yourself, install GraphLab from [here](https://turi.com/download/academic.html)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkakshay21%2Fml-documentretrieval","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkakshay21%2Fml-documentretrieval","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkakshay21%2Fml-documentretrieval/lists"}