Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/17bit0216/predicting-viral-news

Last synced: about 1 month ago

Host: GitHub
URL: https://github.com/17bit0216/predicting-viral-news
Owner: 17BIT0216
Created: 2020-04-25T05:05:27.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2020-04-27T05:57:09.000Z (almost 5 years ago)
Last Synced: 2024-11-17T21:40:57.848Z (3 months ago)
Topics: bayes-classifier, jupyter-notebook, logistic-regression, machine-learning, machine-learning-algorithms, scv
Language: Jupyter Notebook
Homepage: https://17bit0216.github.io/Predicting-Viral-News/
Size: 6.1 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# Predicting-Viral-News
Problem Statement: Crawl and collect data from news channel websites and predict whether a news item is going to go viral or not. Solution:

Data Sourcing: I first implemented a crawler using BeautifulSoup and Requests to crawl the websites and write the data to a CSV file; an example scraper is present in Fox-Scrapper. In the same way, I collected news headlines and viral news separately from several news channels and stored them in two separate CSV files: News.csv for the headlines and Viral_news.csv for the Viral News section.
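
The crawl-and-write step described above could look roughly like the sketch below. It is not the code from Fox-Scrapper: the sample markup, the `h2`/`title` selector, and the `headline`/`link` column names are all assumptions made for illustration, and real news sites will need their own selectors. The `fetch` helper shows where Requests would come in, but it is not called here so the example runs without network access.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: real news sites use their own tags and class names.
SAMPLE_HTML = """
<html><body>
  <h2 class="title"><a href="/story-1">First headline</a></h2>
  <h2 class="title"><a href="/story-2">Second headline</a></h2>
</body></html>
"""


def fetch(url):
    """Download a page (needs network access and the requests package)."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_headlines(html):
    """Return a list of (headline, link) tuples found in the page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for heading in soup.find_all("h2", class_="title"):
        link = heading.find("a")
        if link is not None:
            rows.append((link.get_text(strip=True), link.get("href", "")))
    return rows


def write_rows(rows, handle):
    """Write the scraped rows as CSV, starting with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["headline", "link"])  # assumed column names
    writer.writerows(rows)


rows = parse_headlines(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for open("News.csv", "w", newline="")
write_rows(rows, buffer)
print(buffer.getvalue())
```

Keeping parsing separate from downloading makes it possible to test the selectors against saved pages before crawling a live site.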

Data Cleaning: Cleaned the data using MS Excel.

Actual Prediction: Both files have four columns, plus one extra label column that records whether the item is viral or not. As all the fields contained only text, I first tried to get the number of clicks on each headline within 30 minutes, but was not able to obtain that data. Left with text fields only (which made this a classification problem), I decided to use TF-IDF (Term Frequency-Inverse Document Frequency) to convert the text into numerical data that can be fed to the ML algorithms.
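
To make the TF-IDF idea concrete, here is a minimal pure-Python sketch using the textbook weighting tf × log(N / df). The sample headlines are made up, and the notebook itself may rely on a library implementation instead; scikit-learn's `TfidfVectorizer`, for instance, applies a smoothed IDF and L2 normalisation by default, so its numbers would differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up headlines, for illustration only.
headlines = [
    "cat video goes viral",
    "council approves budget",
    "viral cat wins award",
]
weights = tf_idf(headlines)
for headline, row in zip(headlines, weights):
    print(headline, {term: round(value, 3) for term, value in row.items()})
```

Terms that occur in fewer headlines (such as "goes") receive a higher weight than terms shared across headlines (such as "cat"), which is what lets a classifier focus on the distinctive words.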

Algorithms Used: Logistic Regression, Support Vector Machine, GaussianNB.
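
As a rough sketch of how these three algorithms can be trained on TF-IDF features with scikit-learn: the headlines and labels below are invented toy data, the hyperparameters are arbitrary choices, and none of it is taken from the project's notebook. A real evaluation would also hold out a test split instead of predicting on a single hand-written headline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Invented toy data: 1 = viral, 0 = not viral.
headlines = [
    "cat rides skateboard through city",
    "council approves annual budget",
    "dog learns to play piano",
    "committee schedules budget hearing",
    "baby laughs at dancing cat",
    "city publishes annual road report",
]
labels = [1, 0, 1, 0, 1, 0]

vectorizer = TfidfVectorizer()
# GaussianNB does not accept sparse matrices, so convert to a dense array.
X = vectorizer.fit_transform(headlines).toarray()

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="linear"),
    "gaussian_nb": GaussianNB(),
}

new_headline = ["cat plays piano in city"]
X_new = vectorizer.transform(new_headline).toarray()

predictions = {}
for name, model in models.items():
    model.fit(X, labels)
    predictions[name] = int(model.predict(X_new)[0])

print(predictions)
```

The same fitted vectorizer must be reused for new headlines so that their columns line up with the vocabulary the models were trained on.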

Thanks Team @ Bipolar factory