# NLP-Powered-YouTube-Analytics
This project aims to make it easier to evaluate YouTube videos by using advanced NLP techniques. The system will give short summaries of videos and their comments, automatically analyze sentiments and emotions, and identify key themes. This helps users make well-informed decisions about the quality and relevance of the video content.
Report Link: [Click Here](https://docs.google.com/document/d/1WBjYbMlW2Iq_buCSvS_iXdT2j3Uz7f7lNQM_kZJp7hw/edit?usp=sharing)
## Project Requirements
```bash
pip install -r requirements.txt
```

You need to start the NLTK Downloader and download all the data you need. Open a Python console and do the following:

```
>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
```

Install the spaCy **en_core_web_sm** model:

```bash
python -m spacy download en_core_web_sm
```
## Run the Project

Start the Flask-based server:

```bash
python server.py
```

Start the React.js server:

```bash
cd frontend
npm i
npm start
```

## Links
- [Dataset](https://docs.google.com/spreadsheets/d/19Ovg-9q9wAAQVc9SHOT6oYHjEuG4deT_lI7yojaJajQ/edit?usp=sharing)
- [Report](https://docs.google.com/document/d/1WBjYbMlW2Iq_buCSvS_iXdT2j3Uz7f7lNQM_kZJp7hw/edit?usp=sharing)
- [Slides](https://docs.google.com/presentation/d/1sQjjrjBkwPi_WgRv4XcIHdtHa8g8-kIgA3fjDy42N-8/edit?usp=sharing)

## Simple Explanation
This project was inspired by a real-life incident. I was trying to learn React.js, so I searched YouTube for "Reactjs beginners tutorial" and got hundreds of videos, each ranging from 25 to 50 hours long. To pick a genuinely useful resource, I started analyzing each video manually: what the comments said, whether learning material and exercises were provided, whether the teaching was friendly, whether the course was complete, whether a given major topic was covered, and so on. This took a lot of time, so the idea came up to automate the procedure.
So, "NLP Powered Youtube Analyzes" aims to make it easier to evaluate YouTube videos by using advanced NLP techniques. The system will give short summaries of videos and their comments, automatically analyze sentiments and emotions, and identify key themes. This helps users make well-informed decisions about the quality and relevance of the video content.
This project has five key sections:

1. YouTube API Testing: fetching up to 500 comments per video, plus the video transcript where one is available (see the fetching sketch after this list).
2. Data Preprocessing: since comments can be multilingual, we first run language detection to keep only the English comments, followed by standard NLP preprocessing such as punctuation removal, lowercasing, and lemmatization (sketched below).
3. Sentiment Analysis: we used pretrained models from NLTK and Hugging Face. To select the best one, we manually created a dataset of over 1,000 comments, annotated by me and my friends, ran each pretrained model on it, and kept the one with the highest accuracy: Hugging Face's `finiteautomata/bertweet-base-sentiment-analysis` (inference sketched below).
4. Transcript Summarization
5. Comment Summarization
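The fetching code for step 1 isn't shown in the README, so here is a minimal sketch, assuming the official `google-api-python-client` and the `youtube-transcript-api` package; `API_KEY` and the helper names are illustrative placeholders, not names from the repo.

```python
from googleapiclient.discovery import build
from youtube_transcript_api import YouTubeTranscriptApi

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder; use your own YouTube Data API v3 key

def fetch_comments(video_id, limit=500):
    """Fetch up to `limit` top-level comments via the YouTube Data API v3."""
    youtube = build("youtube", "v3", developerKey=API_KEY)
    comments, page_token = [], None
    while len(comments) < limit:
        response = youtube.commentThreads().list(
            part="snippet",
            videoId=video_id,
            maxResults=100,          # API maximum per page
            pageToken=page_token,
            textFormat="plainText",
        ).execute()
        for item in response["items"]:
            comments.append(item["snippet"]["topLevelComment"]["snippet"]["textDisplay"])
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return comments[:limit]

def fetch_transcript(video_id):
    """Return the transcript as one string, or None if unavailable."""
    try:
        segments = YouTubeTranscriptApi.get_transcript(video_id)
        return " ".join(segment["text"] for segment in segments)
    except Exception:
        return None  # not every video has a transcript
```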
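For step 2, the README names language detection but not a library; this is a minimal sketch assuming `langdetect` for the detection step and NLTK for the rest (it needs the NLTK data downloaded in the setup above).

```python
import string
from langdetect import detect, LangDetectException
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

lemmatizer = WordNetLemmatizer()

def preprocess_comment(comment):
    """Return a cleaned comment, or None if it is not English."""
    try:
        if detect(comment) != "en":  # language detection: keep English only
            return None
    except LangDetectException:      # e.g. emoji-only or empty comments
        return None
    comment = comment.lower()                                               # lowercase
    comment = comment.translate(str.maketrans("", "", string.punctuation))  # strip punctuation
    tokens = word_tokenize(comment)
    return " ".join(lemmatizer.lemmatize(token) for token in tokens)        # lemmatize

print(preprocess_comment("Loved the hooks section, thanks!"))  # -> "loved the hook section thanks"
```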
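Step 3 names the winning model explicitly, so the inference sketch below uses the standard Hugging Face `transformers` pipeline; the example comments are made up.

```python
from transformers import pipeline

# The model the README reports as most accurate on the hand-annotated comments
classifier = pipeline(
    "sentiment-analysis",
    model="finiteautomata/bertweet-base-sentiment-analysis",
)

comments = [
    "Best React course I have found, the exercises really helped!",
    "Audio is terrible, I could not finish the video.",
]
for comment, result in zip(comments, classifier(comments)):
    print(result["label"], round(result["score"], 3), comment)  # labels: POS / NEU / NEG
```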
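The README gives no details for steps 4 and 5; one common approach (not necessarily what this repo implements) is an abstractive summarization pipeline, sketched here with an off-the-shelf model chosen purely for illustration.

```python
from transformers import pipeline

# distilbart-cnn is an illustrative choice; the repo may use a different model or method
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

# In practice the input would be a transcript chunk (kept under the model's token
# limit) or the concatenated preprocessed comments.
text = (
    "The course starts with JSX and components, then covers hooks in depth, "
    "including useState, useEffect, and custom hooks. Later chapters introduce "
    "Redux for state management, React Router for navigation, and finish with "
    "testing and deployment. Each chapter ends with exercises and a small project."
)
summary = summarizer(text, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])
```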
------------------------------------------------
Key Phrase Extraction: keyphrase extraction is the process of automatically identifying and extracting the most important phrases or terms from a piece of text, such as the main topics, themes, or concepts it discusses; phrases are typically selected based on their relevance, frequency, or significance within the document.
I did this using the `YAKE` Python library (sketched below).
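The README names YAKE but doesn't show the call; here is a minimal sketch using YAKE's standard API, with illustrative parameter values and sample text.

```python
import yake

text = (
    "This React tutorial covers hooks, state management with Redux, and "
    "deployment to production. The exercises after each chapter are very useful."
)

# lan = language, n = maximum n-gram size, top = number of phrases to return
extractor = yake.KeywordExtractor(lan="en", n=3, top=5)
for phrase, score in extractor.extract_keywords(text):
    print(f"{score:.4f}  {phrase}")  # in YAKE, lower score = more relevant
```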
-----------------------------------------