{"id":20975374,"url":"https://github.com/ternion-1121/yt-comments-clustering","last_synced_at":"2026-05-06T14:31:33.353Z","repository":{"id":243251575,"uuid":"811851571","full_name":"TERNION-1121/YT-Comments-Clustering","owner":"TERNION-1121","description":"An NLP project to cluster YouTube comments on the basis of their similarity of words.","archived":false,"fork":false,"pushed_at":"2024-08-07T12:45:50.000Z","size":7540,"stargazers_count":1,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-12-31T00:31:41.341Z","etag":null,"topics":["clustering","google-youtube-api","grouping","kmeans","kmeans-clustering","matplotlib-pyplot","natural-language-processing","nlp","pandas","python","python3","sentiment-analysis","tfidf","wordcloud","youtube","youtube-api"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TERNION-1121.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-06-07T12:38:13.000Z","updated_at":"2025-03-21T17:08:54.000Z","dependencies_parsed_at":"2024-06-07T16:24:23.420Z","dependency_job_id":"949ce95b-fe75-4dac-813c-e1e7e6ae9ecb","html_url":"https://github.com/TERNION-1121/YT-Comments-Clustering","commit_stats":null,"previous_names":["ternion-1121/yt-comments-clustering"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/TERNION-1121/YT-Comments-Clustering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TERNION-1121%2FYT-Comments-Clustering","tags_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TERNION-1121%2FYT-Comments-Clustering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TERNION-1121%2FYT-Comments-Clustering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TERNION-1121%2FYT-Comments-Clustering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TERNION-1121","download_url":"https://codeload.github.com/TERNION-1121/YT-Comments-Clustering/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TERNION-1121%2FYT-Comments-Clustering/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32698094,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T08:33:17.875Z","status":"ssl_error","status_checked_at":"2026-05-06T08:33:17.221Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","google-youtube-api","grouping","kmeans","kmeans-clustering","matplotlib-pyplot","natural-language-processing","nlp","pandas","python","python3","sentiment-analysis","tfidf","wordcloud","youtube","youtube-api"],"created_at":"2024-11-19T04:43:25.006Z","updated_at":"2026-05-06T14:31:33.332Z","avatar_url":"https://github.com/TERNION-1121.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e 💻 YouTube Comments Clustering 👾 \u003c/h1\u003e\n\u003cp align=\"center\"\u003e An NLP project to cluster YouTube comments based on the similarity of their words \u003c/p\u003e\n\n## 📜 Description\n\nAn [NLP](https://en.wikipedia.org/wiki/Natural_language_processing) project in **Python3** that clusters the YouTube comments on a particular video into distinct groups based on the similarity of their words, and visualises the results with word clouds and a bar graph plot. It primarily uses [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) to weight the words in each comment and [k-means clustering](https://en.wikipedia.org/wiki/K-means_clustering) to group similar comments.\n\n\n\u003cbr\u003e\n\n\u003cdiv style=\"text-align: center;\"\u003e\n  \u003ctable style=\"margin: auto;\"\u003e\n    \u003ctr\u003e\n      \u003ctd rowspan=\"2\" align=\"center\"\u003e\n        \u003cimg src=\"data/comments_1_figures/Figure_1.png\" alt=\"Image 1\" width=\"200\"\u003e\n      \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\n        \u003cimg 
src=\"data/comments_1_figures/Figure_2.png\" alt=\"Image 2\" width=\"200\"\u003e\n      \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\n        \u003cimg src=\"data/comments_1_figures/Figure_3.png\" alt=\"Image 3\" width=\"200\"\u003e\n      \u003c/td\u003e\n    \u003c/tr\u003e\n    \u003ctr\u003e\n      \u003ctd align=\"center\"\u003e\n        \u003cimg src=\"data/comments_1_figures/Figure_4.png\" alt=\"Image 4\" width=\"200\"\u003e\n      \u003c/td\u003e\n      \u003ctd align=\"center\"\u003e\n        \u003cimg src=\"data/comments_1_figures/Figure_5.png\" alt=\"Image 5\" width=\"200\"\u003e\n      \u003c/td\u003e\n    \u003c/tr\u003e\n  \u003c/table\u003e\n\u003c/div\u003e\n\n\n\u003cimg src=\"data/comments_1_figures/plot.png\"\u003e\n\n\n\u003e Sample word clouds and a bar graph plot for analysing the clustered comments; the comments come from this [video](https://youtu.be/IUTGFQpKaPU?si=pTZMHHYwLmggecWe).\n\n\u003cbr\u003e\n\u003chr\u003e\n\n### The \"Why\" of the project\n[This video](https://youtu.be/a-AqvPtjjts?si=jhjXuKKShjwqg_gb) inspired me to build something like this someday. And who knew this would be the best time to begin fulfilling that long-held wish!\n\nAfter pondering for a few days, I hit upon the idea of clustering YouTube comments.\n\nWhy, you ask? :thinking:\n- Firstly, it could help identify the kinds of comments made most often on a particular video, and\n- Secondly, it could show how many people resonated with them (i.e. which kinds of comments were liked the most).\n\nA simple yet effective way to analyse people's reviews and opinions of a particular video. \nSounds fair and square?\n\n\u003cbr\u003e\n\u003chr\u003e\n\n### ⌨ Usage\nClick [here](/USAGE.md) to open the `USAGE.md` file and follow the steps to use this project yourself!\n\n\n\u003cbr\u003e\n\u003chr\u003e\n\n### 🎯 Learnings\nThis was my first NLP project, and in Python at that! 
\n\nIt was a nice experience learning the basics of _what NLP is_, _the NLP pipeline_, and _text pre-processing and representation_, and then applying these concepts in actual code.\n\nOne of the resources (in Hindi) I found really helpful was this [YouTube playlist](https://youtube.com/playlist?list=PLKnIA16_RmvZo7fp5kkIth6nRTeQQsjfX\u0026si=a96yQTCTpoyOLMWO); its videos were insightful and helped me work out my requirements and plan of action while building this project.\n\nNot only did I get familiar with the basics of `pandas`, but a major part of this project also focused on fetching YouTube comments using the Google API. Coding that up with the help of documentation, references and other resources available online turned out to be quite an adventure of its own.\n\n\u003cbr\u003e\n\u003chr\u003e\n\n### ✏️ On Contributions\nI have tried my best to structure the code nicely, and I also spent considerable time speeding up the text pre-processing. However, any help with better code, overall project organisation, or more optimised methods in any part of the project would be highly appreciated!\n\nEven README contributions would be a great help!\n\n\u003cbr\u003e\n\nI hope you found this project and its explanation valuable. Let me know about anything that could be improved. \nThanks for your time!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fternion-1121%2Fyt-comments-clustering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fternion-1121%2Fyt-comments-clustering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fternion-1121%2Fyt-comments-clustering/lists"}