
https://github.com/ternion-1121/yt-comments-clustering

An NLP project to cluster YouTube comments on the basis of their similarity of words.

clustering google-youtube-api grouping kmeans kmeans-clustering matplotlib-pyplot natural-language-processing nlp pandas python python3 sentiment-analysis tfidf wordcloud youtube youtube-api



# 💻 YouTube Comments Clustering 👾


An NLP project to cluster YouTube comments on the basis of their similarity of words

## 📜 Description

An [NLP](https://en.wikipedia.org/wiki/Natural_language_processing) project in **Python3** that clusters the comments made on a particular YouTube video into distinct groups based on the similarity of their words, and visualises the results using word clouds and a bar graph plot. It relies primarily on [k-Means clustering](https://en.wikipedia.org/wiki/K-means_clustering) and [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) weighting.
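
As a rough illustration of the two techniques named above, here is a minimal sketch that uses only the Python standard library. It is not the project's actual code: the helper names (`tokenise`, `tfidf_vectors`, `kmeans`), the sample comments, and the deterministic farthest-first initialisation are all assumptions made for this example, and a real pipeline would more likely rely on a library implementation.

```python
import math
import re
from collections import Counter


def tokenise(text):
    """Lowercase a comment and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Turn tokenised documents into L2-normalised tf-idf vectors."""
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: the number of documents each token appears in.
    df = Counter(tok for doc in docs for tok in set(doc))
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid dividing by zero for an empty comment
        vec = [(counts[tok] / total) * math.log(n_docs / df[tok]) for tok in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means with farthest-first initialisation (an assumption here)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Pick the point that is farthest from every centroid chosen so far.
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up comments, purely for illustration.
comments = [
    "great song love it",
    "love this song so much",
    "bad audio quality",
    "audio quality is bad",
]
_, vectors = tfidf_vectors([tokenise(c) for c in comments])
labels = kmeans(vectors, k=2)
for label, comment in zip(labels, comments):
    print(label, comment)
```

Comments that share words end up with nearby tf-idf vectors, so k-means places them in the same group; the number of groups `k` has to be chosen up front.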






[Images 1 to 5]



> Sample word clouds and a bar graph plot for analysing the clustered comments; the comments come from this [video](https://youtu.be/IUTGFQpKaPU?si=pTZMHHYwLmggecWe)




### The "Why" of the project
[This video](https://youtu.be/a-AqvPtjjts?si=jhjXuKKShjwqg_gb) inspired me to create something like this at some point in the future. And who knew this would be the best time to start fulfilling that long-held wish!

After pondering for a few days, I hit upon the idea of clustering YouTube comments.

Why, you ask? :thinking:
- Firstly, it could help one identify which kinds of comments were made the most on a particular video, and
- Secondly, it could show how many people resonated with them (i.e. which kinds of comments were liked the most).

A simple yet effective way to analyse people's reviews and opinions on a particular video.
Sounds fair and square?
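
Both questions boil down to a per-cluster aggregation, which could be sketched as below. The function name `summarise_clusters` and the sample labels and like counts are hypothetical and made up for this example; they are not taken from the project.

```python
from collections import defaultdict


def summarise_clusters(labels, like_counts):
    """Count the comments and add up the likes for each cluster label."""
    summary = defaultdict(lambda: {"comments": 0, "likes": 0})
    for label, likes in zip(labels, like_counts):
        summary[label]["comments"] += 1
        summary[label]["likes"] += likes
    return dict(summary)


# Hypothetical cluster labels and like counts, one entry per comment.
labels = [0, 0, 1, 1, 1]
like_counts = [120, 3, 0, 40, 5]

summary = summarise_clusters(labels, like_counts)
# Cluster holding the most comments.
most_common = max(summary, key=lambda c: summary[c]["comments"])
# Cluster whose comments gathered the most likes.
most_liked = max(summary, key=lambda c: summary[c]["likes"])
print(summary, most_common, most_liked)
```

In this made-up data the largest cluster is not the most liked one, which is exactly the kind of difference the two questions are meant to surface.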




### ⌨ Usage
Click [here](/USAGE.md) to open the `USAGE.md` file and go through the steps to use this project yourself!




### 🎯 Learnings
This was my first NLP project, and in Python at that!

It was a nice experience learning the basics of _what NLP is_, _the NLP pipeline_, and _text pre-processing and representation_, and applying these concepts in actual code.

One of the resources (in Hindi) I found really helpful was this [YouTube playlist](https://youtube.com/playlist?list=PLKnIA16_RmvZo7fp5kkIth6nRTeQQsjfX&si=a96yQTCTpoyOLMWO). These videos were really insightful and helped me understand my requirements and plan of action while making this project.

Not only did I get familiar with the basics of `pandas`, but a major part of this project also focused on how to fetch YouTube comments using the Google API. Trying to code that, with the help of documentation, references and other resources available online, turned out to be a profound adventure of its own.
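
For a sense of what fetching comments involves, here is a hedged sketch that calls the YouTube Data API v3 `commentThreads` endpoint over plain HTTP with the standard library. The project may well use a client library instead; the function names, the `YOUR_API_KEY` placeholder, and the trimmed-down sample response are assumptions for illustration only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://www.googleapis.com/youtube/v3/commentThreads"


def build_request_url(video_id, api_key, page_token=None):
    """Build the URL for one page of top-level comment threads."""
    params = {
        "part": "snippet",
        "videoId": video_id,
        "maxResults": 100,
        "textFormat": "plainText",
        "key": api_key,
    }
    if page_token:
        params["pageToken"] = page_token
    return f"{API_URL}?{urlencode(params)}"


def extract_comments(page):
    """Pull the text and like count of each top-level comment from one page."""
    rows = []
    for item in page.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        rows.append({"text": snippet["textDisplay"], "likes": snippet["likeCount"]})
    return rows


def fetch_all_comments(video_id, api_key):
    """Follow nextPageToken until every page has been read (needs network access)."""
    rows, page_token = [], None
    while True:
        with urlopen(build_request_url(video_id, api_key, page_token)) as response:
            page = json.load(response)
        rows.extend(extract_comments(page))
        page_token = page.get("nextPageToken")
        if not page_token:
            return rows


# A trimmed-down, made-up response page used to try the parser offline.
sample_page = {
    "items": [
        {"snippet": {"topLevelComment": {"snippet": {
            "textDisplay": "Loved this video!", "likeCount": 7}}}},
    ],
}
rows = extract_comments(sample_page)
url = build_request_url("IUTGFQpKaPU", "YOUR_API_KEY")
print(rows)
```

The resulting list of dictionaries can be handed straight to `pandas.DataFrame(rows)` for further processing.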




### ✏️ On Contributions
I have done what I could to structure the code nicely, and I also spent considerable time speeding up the text pre-processing. However, if you can help with better code, better overall project organisation, or more optimised methods in various parts of the project, that would be highly appreciated!

Even README contributions would be a great help!


I hope you found this project and its explanation valuable. Let me know about anything that could be made better.
Thanks for your time!