https://github.com/massimilianovisintainer/fake-news-prediction-model

Fake News Prediction Model
machine-learning nltk numpy pandas python3 sklearn

## Fake News Prediction with Logistic Regression

### Introduction

This repository contains the code for a Fake News prediction system using Logistic Regression. The code is based on a Jupyter notebook originally generated by Colab.

### Functionality

This code performs the following tasks:

* **Import Libraries:** Imports the required libraries (pandas, numpy, nltk, scikit-learn) for data manipulation, text processing, and machine learning.
* **Data Preprocessing:**
    * Loads the training data (`train.csv`) into a pandas DataFrame.
    * Handles missing values by replacing them with empty strings.
    * Combines the author name and title into a single "content" column.
    * Separates the features (content) from the target label (fake/real).
    * Removes stopwords (common words like "the" and "and") and applies stemming to reduce the remaining words to their root form.
    * Converts the text into numerical features using a TF-IDF vectorizer.
* **Train-Test Split:** Splits the data into training and testing sets for model evaluation.
* **Model Training:** Trains a Logistic Regression model on the training data.
* **Evaluation:**
    * Evaluates the model's accuracy on both the training and the testing data.
* **Prediction:**
    * Makes a prediction on a single piece of text the model was not trained on (an example taken from the testing set).
    * Classifies the news as Real or Fake based on the prediction.
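
The steps above can be sketched end to end as follows. This is a minimal illustration, not the notebook's actual code: the toy rows stand in for `train.csv`, the column names `author`, `title`, and `label` and the convention that `1` means fake are assumptions, and scikit-learn's built-in English stopword list is swapped in for a downloaded stopword corpus so the sketch runs offline.

```python
import re

import pandas as pd
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy rows standing in for train.csv (assumed columns and labels).
df = pd.DataFrame({
    "author": ["Jane Roe", None, "John Doe", "Anonymous", "Jane Roe", "Sam Poe",
               "Anonymous", "John Doe", None, "Sam Poe", "Anonymous", "Jane Roe"],
    "title": [
        "Senate passes annual budget bill",
        "Aliens secretly control the weather",
        "City council approves new transit plan",
        "Miracle pill cures every known disease",
        "Court rules on water rights dispute",
        "University publishes climate survey results",
        "Secret moon base discovered by tourists",
        "Central bank holds interest rates steady",
        "Celebrity clone army exposed in leaked memo",
        "Hospital opens new pediatric wing",
        "Government hides proof of flat earth",
        "Shocking trick makes you rich overnight",
    ],
    "label": [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1],  # assumed: 1 = fake, 0 = real
})

# Replace missing values with empty strings, then merge author and title.
df = df.fillna("")
df["content"] = df["author"] + " " + df["title"]

stemmer = PorterStemmer()


def stem_text(text):
    """Keep letters only, lowercase, drop stopwords, and stem what remains."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in ENGLISH_STOP_WORDS)


df["content"] = df["content"].apply(stem_text)

# Separate features and labels, then turn the text into TF-IDF features.
y = df["label"].values
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df["content"].values)

# Hold out part of the data for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=2
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Accuracy on both splits, as described in the Evaluation step.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")

# Classify one held-out example.
prediction = model.predict(X_test[0:1])[0]
verdict = "Fake" if prediction == 1 else "Real"
print(f"The news is {verdict}")
```

With only a dozen rows the accuracy figures are not meaningful; the sketch is meant to show the order of the steps. Fitting the vectorizer on the full dataset before splitting follows the order of the list above, though fitting it on the training split alone avoids leaking test vocabulary into the features.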

### Running the Code

This code is intended to be run in a Jupyter Notebook environment. You can follow these steps:

1. Download the code and data files.
2. Open the `Fake_News_Prediction.ipynb` file in a Jupyter Notebook environment.
3. Run the code cells sequentially.

### Dependencies

* Python 3.x
* pandas
* numpy
* nltk
* scikit-learn
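
Before opening the notebook, a quick check like the one below can confirm that the listed packages are installed. It is a small sketch using only the standard library; the helper name `installed_versions` is made up for illustration, and the package names are the usual PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI (scikit-learn is imported as sklearn).
REQUIRED = ["pandas", "numpy", "nltk", "scikit-learn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED).items():
    print(f"{name}: {ver or 'not installed'}")
```

If the notebook relies on NLTK's stopword corpus, that data is fetched separately from the package itself, typically once with `nltk.download('stopwords')`.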
