https://github.com/mehrab-kalantari/news-popularity-prediction
News popularity prediction dataset analysis and modeling
- Host: GitHub
- URL: https://github.com/mehrab-kalantari/news-popularity-prediction
- Owner: Mehrab-Kalantari
- Created: 2023-07-26T16:47:30.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2023-07-26T17:30:12.000Z (almost 2 years ago)
- Last Synced: 2025-01-16T09:42:26.679Z (4 months ago)
- Topics: data-preprocessing, data-understanding, feature-engineering, feature-extraction, feature-selection, hypothesis-testing, machine-learning, regression-models, supervised-learning
- Language: Jupyter Notebook
- Homepage:
- Size: 2.59 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# News Popularity Prediction
[Dataset on Kaggle](https://www.kaggle.com/datasets/thehapyone/uci-online-news-popularity-data-set)

## Contents
### Data understanding and EDA
* Histogram plot
* Data queries
* Box plot
* Correlation matrix

### Hypothesis tests for a better understanding
* Pearson correlation test
* Spearman correlation test
* Kendall-tau correlation test
* T test
* Z test

### Data preprocessing and feature selection
* Missing data values
* Categorical to numerical
  * OHE
* Outlier detection
  * K-sigma method
* Feature scaling
  * Standard scaling
  * Min-max normalization
  * Robust scaling
* Feature selection
  * Forward selection
  * Backward selection
* Feature extraction
  * PCA

### Modeling (Regression)
* Linear regression
* Polynomial regression
* Ridge regression
* Lasso regression

### Evaluation
* R2 score
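
For reference, the R2 score (coefficient of determination) can be sketched in plain Python as below. This is only an illustration and is not taken from the notebook. The helper name `r2` and the sample values are hypothetical. scikit-learn's `sklearn.metrics.r2_score` computes the same quantity.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean target.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot


# Made-up target values and predictions, for illustration only.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
score = r2(y_true, y_pred)
print(f"R2 = {score:.3f}")
```

A score of 1 means the predictions match the targets exactly. A score of 0 means the model does no better than always predicting the mean, and negative values mean it does worse.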