{"id":21561528,"url":"https://github.com/subhadarship/text-clustering","last_synced_at":"2025-03-18T04:46:31.807Z","repository":{"id":102480387,"uuid":"221847052","full_name":"subhadarship/text-clustering","owner":"subhadarship","description":"Clustering text data (data mining fall 2019)","archived":false,"fork":false,"pushed_at":"2021-11-14T02:35:05.000Z","size":26868,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-24T11:28:43.040Z","etag":null,"topics":["bert","clustering","glove-embeddings","lda","nlp","roberta","topic-modeling","visualization"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/subhadarship.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-11-15T04:57:23.000Z","updated_at":"2024-08-01T13:59:24.000Z","dependencies_parsed_at":"2023-03-13T15:19:27.435Z","dependency_job_id":null,"html_url":"https://github.com/subhadarship/text-clustering","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Ftext-clustering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Ftext-clustering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Ftext-clustering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/subhadarship%2Ftext-clusteri
ng/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/subhadarship","download_url":"https://codeload.github.com/subhadarship/text-clustering/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244159990,"owners_count":20408019,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bert","clustering","glove-embeddings","lda","nlp","roberta","topic-modeling","visualization"],"created_at":"2024-11-24T09:27:01.325Z","updated_at":"2025-03-18T04:46:31.782Z","avatar_url":"https://github.com/subhadarship.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text Clustering\n\n![Alt Text](https://media.giphy.com/media/VSYyrK0MzKSyI/giphy.gif)\n\nclustering text data (data mining fall 2019)\n\n## Notes\n- [`present.html`](https://github.com/subhadarship/text-clustering/blob/master/present.html) is created using reveal.js\n- [`topic_modeling_big.ipynb`](https://github.com/subhadarship/text-clustering/blob/master/topic_modeling_big.ipynb) uses more data for LDA than [`topic_modeling.ipynb`](https://github.com/subhadarship/text-clustering/blob/master/topic_modeling.ipynb). But it still cannot handle very large data (e.g. 
1M)\n\n## TODO\n- [x] big data LDA (still less than a million samples)\n- [x] visualize results LDA\n- [x] big data neural models (check k-means time)\n- [x] find optimum number of clusters for neural models\n- [x] visualize results neural models\n\n## Write-up\nsee [`text_clustering.pdf`](https://github.com/subhadarship/text-clustering/blob/master/text_clustering.pdf)\n\n## LICENSE\n\n[MIT](https://github.com/subhadarship/text-clustering/tree/master/LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubhadarship%2Ftext-clustering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsubhadarship%2Ftext-clustering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsubhadarship%2Ftext-clustering/lists"}