https://github.com/rahulguptagzb09/youtube-scrapper-and-category-classifier

YouTube Scrapper And Category Classifier

deep-learning keras machine-learning sklearn youtube youtube-api youtube-api-v3 youtube-search

Host: GitHub
URL: https://github.com/rahulguptagzb09/youtube-scrapper-and-category-classifier
Owner: rahulguptagzb09
License: gpl-3.0
Created: 2019-04-07T08:19:21.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2019-04-07T08:51:16.000Z (almost 6 years ago)
Last Synced: 2024-10-19T19:47:38.524Z (3 months ago)
Topics: deep-learning, keras, machine-learning, sklearn, youtube, youtube-api, youtube-api-v3, youtube-search
Language: Python
Size: 1.39 MB
Stars: 1
Watchers: 1
Forks: 1
Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: History_2.csv
- License: LICENSE

README

# YouTube-Scrapper-And-Category-Classifier

YouTube Scrapper And Category Classifier

Scraping Data-


The scraping is done with the YouTube Data API v3. The API provides a search list function (`search.list`) that takes a search query along with other parameters such as region and type, and returns its results in JSON format.


I wrote a function that calls this API and returns a dictionary with column names as keys and the corresponding content as values. This let me collect as many accurate and relevant results as possible.


The scraping script generates a CSV file from the results.
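
The scraping steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual script: the function names, the chosen columns, and the `category` label column are assumptions made for the example. The endpoint and the `part`, `q`, `type`, `regionCode`, `maxResults` and `key` parameters are those of the `search.list` method.

```python
import csv
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def fetch_search_page(query, api_key, region="US", max_results=50):
    """Call search.list once and return the decoded JSON response."""
    params = urllib.parse.urlencode({
        "part": "snippet",
        "q": query,
        "type": "video",
        "regionCode": region,
        "maxResults": max_results,  # search.list accepts at most 50 per page
        "key": api_key,
    })
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}") as resp:
        return json.load(resp)


def response_to_columns(response, category):
    """Turn one JSON response into a dict of column name -> list of values.

    The column names and the `category` label are hypothetical choices.
    """
    columns = {"video_id": [], "title": [], "description": [], "category": []}
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        columns["video_id"].append(item.get("id", {}).get("videoId", ""))
        columns["title"].append(snippet.get("title", ""))
        columns["description"].append(snippet.get("description", ""))
        columns["category"].append(category)
    return columns


def write_csv(columns, path):
    """Write the column dictionary to a CSV file, one row per video."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns.keys())
        writer.writerows(zip(*columns.values()))
```

With a valid API key, something like `write_csv(response_to_columns(fetch_search_page("travel vlog", api_key), "travel"), "videos.csv")` would produce the CSV file; the query and file name here are placeholders.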


Text Classification-


For text classification I used one model from each category mentioned in the assignment.


1.	From the first category, I used an SVM model because it was more accurate and scalable. SVMs can efficiently perform non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces. An SVM is a supervised classifier trained on labelled data; the related support vector clustering algorithm applies the same ideas to categorize unlabelled data and is one of the most widely used clustering algorithms in industrial applications.


SVM Accuracy Score (percent): 32.91015625


Precision: 0.329102 


Recall: 0.329102 


F1: 0.329102
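
To make the kernel trick mentioned for the SVM more concrete, here is a small worked example in plain Python. It is only an illustration: the degree-2 polynomial kernel and the sample points are chosen for the example, and nothing here says which kernel the project actually used. It shows that evaluating the kernel in the original 2-D space gives the same number as a dot product in an explicit 3-D feature space, which is why the mapping never has to be computed.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 polynomial kernel k(x, y) = (x . y)^2, computed in input space."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2


def explicit_features(x):
    """Explicit feature map for 2-D inputs that corresponds to poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def feature_space_dot(x, y):
    """Dot product taken after mapping both points into the 3-D feature space."""
    fx, fy = explicit_features(x), explicit_features(y)
    return sum(a * b for a, b in zip(fx, fy))


# Two illustrative points (not taken from the dataset).
x, y = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, y)         # never builds the 3-D features
explicit_value = feature_space_dot(x, y)  # builds them explicitly
```

Both values agree up to floating-point rounding, so a classifier that only needs dot products can work in the larger space at the cost of the cheaper kernel evaluation.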


2.	From the second category, I used a shallow NN model because it is based on learning data representations, as opposed to task-specific algorithms. Learning can be supervised, semi-supervised or unsupervised. The NN finds the mathematical transformation that turns the input into the output, whether the relationship is linear or non-linear. The network passes the input through its layers and calculates the probability of each output. The NN gives better results on datasets that are not easily separable and are too complicated for naïve algorithms to classify.


Loss: 0.166


Accuracy: 0.941
  

F1 Score: 0.789 


Precision: 0.950 


Recall: 0.680
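
The way a shallow network turns an input into a probability for each output can be sketched in plain Python. This is not the Keras model from the repository: the layer sizes, the ReLU activation, and all weights below are made-up values for illustration only.

```python
import math


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Element-wise non-linearity applied to the hidden layer."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features, params):
    """One hidden layer followed by a softmax output layer."""
    hidden = relu(dense(features, params["W1"], params["b1"]))
    return softmax(dense(hidden, params["W2"], params["b2"]))


# Hypothetical parameters: 3 input features, 2 hidden units, 3 classes.
params = {
    "W1": [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
    "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0], [0.2, 0.3], [-0.5, 0.9]],
    "b2": [0.0, 0.0, 0.0],
}
probabilities = forward([1.0, 0.0, 2.0], params)
predicted_class = probabilities.index(max(probabilities))
```

In a real text classifier the input features would come from a vectorizer or embedding, and the weights would be learned during training rather than written by hand.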


3.	From the third category, I used a shallow RNN model. RNNs, in which connections can form cycles so that information from earlier inputs feeds back into later steps, are used for applications such as language modelling. Long short-term memory (LSTM) is particularly effective for this use. RNNs can use their internal state (memory) to process sequences of inputs. This makes them applicable to tasks such as unsegmented, connected handwriting recognition or speech recognition. RNNs are better suited to modelling the sequence of a text than models that ignore word order, because they do not lose the order of the text.


Loss: 0.464


Accuracy: 0.833
 

F1 Score: 0.000 


Precision: 0.000 


Recall: 0.000
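
The point about RNNs keeping the order of the text can be illustrated with a tiny recurrent update in plain Python. This is a sketch under simplifying assumptions: a single hidden unit, scalar inputs standing in for word embeddings, and hand-picked weights. It is not the model trained in this project.

```python
import math


def rnn_final_state(inputs, w_x=1.0, w_h=0.5, bias=0.0):
    """Run a one-unit Elman-style RNN over a sequence.

    Each step mixes the current input with the previous hidden state,
    so the final state depends on the order of the inputs.
    """
    hidden = 0.0
    for x in inputs:
        hidden = math.tanh(w_x * x + w_h * hidden + bias)
    return hidden


# The same three values in two different orders (illustrative numbers).
sequence = [1.0, -1.0, 0.5]
reordered = [0.5, -1.0, 1.0]

state_a = rnn_final_state(sequence)
state_b = rnn_final_state(reordered)

# An order-insensitive summary, similar in spirit to bag-of-words counts.
bag_a, bag_b = sum(sequence), sum(reordered)
```

The two final hidden states differ even though the order-insensitive sums are identical, which is the property the paragraph above relies on.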