Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/17bit0216/predicting-viral-news
- Host: GitHub
- URL: https://github.com/17bit0216/predicting-viral-news
- Owner: 17BIT0216
- Created: 2020-04-25T05:05:27.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-04-27T05:57:09.000Z (over 4 years ago)
- Last Synced: 2024-05-20T21:35:39.770Z (6 months ago)
- Topics: bayes-classifier, jupyter-notebook, logistic-regression, machine-learning, machine-learning-algorithms, scv
- Language: Jupyter Notebook
- Homepage: https://17bit0216.github.io/Predicting-Viral-News/
- Size: 6.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Predicting-Viral-News
Problem Statement: Crawl and collect data from news channel websites and predict whether a news item is going to be viral or not.
Solution:
Data-Outsourcing: I first implemented a crawler using BeautifulSoup and Requests to crawl the websites and write the data to a CSV file; an example scraper is present in Fox-Scrapper. In the same way, I collected regular news headlines and viral news separately from several news channels and stored them in two separate CSV files: News.csv holds the headlines, and Viral_news.csv holds the items from the Viral News section.
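
The crawling step can be sketched roughly as follows. This is only an illustration, not the code in Fox-Scrapper: the `h2 a` selector, the `headline`/`label` column names, the helper names, and the sample HTML are all assumptions made for the example. A real page would need its own selector.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_headlines(html, selector="h2 a"):
    """Return the text of every element matching the (assumed) CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True) for tag in soup.select(selector)]


def write_headlines(headlines, label, out):
    """Write one CSV row per headline with a hypothetical 0/1 viral label."""
    writer = csv.writer(out)
    writer.writerow(["headline", "label"])
    for headline in headlines:
        writer.writerow([headline, label])


# Stand-in for a downloaded page. In practice the HTML would come from
# something like requests.get(url, timeout=10).text
sample_html = """
<html><body>
  <h2><a href="/a">Storm closes coastal highway</a></h2>
  <h2><a href="/b">Local bakery wins national award</a></h2>
</body></html>
"""

headlines = extract_headlines(sample_html)

# An in-memory buffer keeps the sketch side-effect free; pass an open file
# (for example open("News.csv", "w", newline="")) to write to disk instead.
buffer = io.StringIO()
write_headlines(headlines, label=0, out=buffer)
csv_text = buffer.getvalue()
```

Separating parsing from writing means the same `extract_headlines` helper could be pointed at both the regular and the viral sections, with only the label changing.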
Data-Cleaning: Cleaned the data using MS Excel.
Actual Prediction: Both files have four columns, plus one extra label column that records whether the item is viral or not. All the fields contain only text. I first tried to get the number of clicks on each headline within 30 minutes, but was not able to obtain that data. Left with only text fields, the task became a text classification problem, so I decided to use TF-IDF (Term Frequency-Inverse Document Frequency) to convert the text into numerical data that can be fed to the ML algorithms.
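
To make the TF-IDF step concrete, here is a minimal pure-Python sketch of the textbook formula (term frequency times the log of inverse document frequency). The `tf_idf` function and the sample headlines are invented for illustration. Libraries such as scikit-learn use a smoothed, normalized variant, so their numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return (vocabulary, matrix) with one row of TF-IDF weights per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        matrix.append(
            [(counts[term] / total) * math.log(n_docs / df[term]) for term in vocab]
        )
    return vocab, matrix


# Hypothetical headlines, not taken from News.csv or Viral_news.csv.
headlines = [
    "cat video goes viral",
    "council approves budget",
    "cat rescued by council",
]
vocab, matrix = tf_idf(headlines)
```

Terms that occur in many headlines get a small weight, and a term that occurs in every headline gets a weight of zero, which is why TF-IDF tends to emphasize the distinctive words in each headline.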
Algorithms Used: Logistic Regression, Support Vector Machine, GaussianNB.
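
A rough sketch of how these three algorithms could be fitted on TF-IDF features with scikit-learn is shown below. The toy headlines, the labels, the linear SVM kernel, and `max_iter=1000` are assumptions for the example and are not taken from the notebook. Note that `GaussianNB` needs a dense array, so the sparse TF-IDF matrix is converted with `.toarray()` for that model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Hypothetical toy data: 1 = viral, 0 = not viral.
texts = [
    "cat video goes viral overnight",
    "celebrity dance challenge sweeps the internet",
    "puppy rescue clip shared by millions",
    "council approves annual budget",
    "road works scheduled for next week",
    "quarterly report shows steady growth",
]
labels = [1, 1, 1, 0, 0, 0]

# Convert the text to a sparse TF-IDF matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVC": SVC(kernel="linear"),
    "GaussianNB": GaussianNB(),
}

predictions = {}
for name, model in models.items():
    # GaussianNB does not accept sparse input, so densify for that model only.
    features = X.toarray() if name == "GaussianNB" else X
    model.fit(features, labels)
    predictions[name] = model.predict(features)
```

Predicting on the training data here only demonstrates the API. A meaningful comparison of the three models would need a held-out test split of the real data.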
Thanks Team @ Bipolar factory