
https://github.com/kelvintechnical/logistic-regression-for-binary-classification-



Logistic Regression for Binary Classification

Project Overview


This project demonstrates how to implement and visualize a Logistic Regression model for binary classification using synthetic data. It covers generating and visualizing two distinct data categories, training a logistic regression model, evaluating its accuracy, and visualizing the decision boundary. Logistic Regression is a fundamental algorithm in machine learning, often used as an introduction to binary classification tasks.

Code Walkthrough

1. Importing Libraries


# Importing necessary libraries

import numpy as np # For handling arrays and math operations
import matplotlib.pyplot as plt # For plotting graphs
from sklearn.model_selection import train_test_split # For splitting data into train and test sets
from sklearn.linear_model import LogisticRegression # For logistic regression model
from sklearn.metrics import accuracy_score # For checking model accuracy

Explanation: We begin by importing the necessary libraries:




  • numpy: To handle arrays and perform mathematical operations.


  • matplotlib.pyplot: For plotting data and visualizations.


  • sklearn.model_selection.train_test_split: To split the data into training and test sets.


  • sklearn.linear_model.LogisticRegression: To create a logistic regression model.


  • sklearn.metrics.accuracy_score: To measure the accuracy of our model on test data.

2. Data Generation


# Data Generation

np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))

Explanation: We generate synthetic data for two categories:



  • np.random.seed(0) fixes the random seed so that the same data is generated on every run.


  • data_size defines the number of data points in each category.


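  • category_0 and category_1: Two clusters of 100 two-dimensional points each, drawn from normal distributions with a standard deviation of 0.5 and means of 2 and 4 respectively in both features. The means are far enough apart that the two groups are well separated.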


  • X stacks both categories into one dataset of shape (200, 2) with np.vstack, and y holds the matching labels: 100 zeros for category_0 followed by 100 ones for category_1.
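As a quick sanity check on the generated data, the sketch below rebuilds the same arrays and prints their shapes, per-category means, and label counts. It only uses the names from the snippet above; the checks themselves are an addition for illustration, not part of the original project.

```python
import numpy as np

# Rebuild the synthetic data exactly as in the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))

# Shapes: 200 samples with 2 features, and one label per sample
print("X shape:", X.shape)
print("y shape:", y.shape)

# Sample means should sit close to the requested centers (2 and 4)
print("Category 0 mean per feature:", category_0.mean(axis=0))
print("Category 1 mean per feature:", category_1.mean(axis=0))

# The classes are balanced: 100 points each
print("Label counts:", np.bincount(y.astype(int)))
```

Checking shapes and class balance before training is a cheap way to catch stacking mistakes, such as using np.hstack where np.vstack was intended.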

3. Data Visualization


# Data Visualization

plt.figure(figsize=(8, 6))
plt.scatter(category_0[:, 0], category_0[:, 1], color='blue', label='Category 0')
plt.scatter(category_1[:, 0], category_1[:, 1], color='red', label='Category 1')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Data Visualization')
plt.legend()
plt.show()

Explanation: We visualize the generated data to observe the two distinct categories:




  • plt.scatter plots each category with a different color.

  • Labels and legends are added for clarity.


  • plt.show() displays the plot, helping us understand how data points are distributed in each category.

4. Model Training


# Model Training

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)

Explanation: We split the data into training and test sets and train the logistic regression model:




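  • train_test_split shuffles the dataset and divides it into 70% training data (140 samples) and 30% test data (60 samples). Setting random_state=0 makes the split reproducible.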


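  • LogisticRegression() creates the model with scikit-learn's default settings (L2 regularization with C=1.0), and model.fit() learns one weight per feature plus an intercept from the training data.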
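To make the fitted model less of a black box, here is a minimal sketch of what it computes for a binary problem with the default settings: a linear score passed through the sigmoid function. The names z, p_manual, and p_sklearn are illustrative and do not appear in the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Same data and split as in the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# Linear score: z = w1 * feature_1 + w2 * feature_2 + b
z = X_test @ model.coef_.ravel() + model.intercept_[0]

# The sigmoid maps the score to a probability of belonging to category 1
p_manual = 1.0 / (1.0 + np.exp(-z))

# Compare with the probabilities scikit-learn reports for class 1
p_sklearn = model.predict_proba(X_test)[:, 1]
print("Weights:", model.coef_.ravel(), "Intercept:", model.intercept_[0])
print("Manual sigmoid matches predict_proba:", np.allclose(p_manual, p_sklearn))
```

A point is assigned to category 1 when this probability exceeds 0.5, which is the same as the linear score z being positive.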

5. Prediction and Accuracy Evaluation


# Prediction and Accuracy Evaluation

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)

Explanation: We make predictions and evaluate the model’s accuracy:




  • model.predict(X_test) returns a predicted class label (0 or 1) for each sample in the test data.


  • accuracy_score compares the predictions to the true labels and returns the fraction of test samples classified correctly (a value between 0 and 1), which we then print.
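The accuracy figure can be reproduced by hand, and a confusion matrix shows which kind of mistakes (if any) the model makes. The sketch below assumes the same data and split as above; manual_accuracy and cm are illustrative names, and confusion_matrix is an extra scikit-learn import not used in the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Same data, split, and model as in the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)

# Accuracy is the fraction of predictions that equal the true label
manual_accuracy = np.mean(y_pred == y_test)
print("accuracy_score: ", accuracy_score(y_test, y_pred))
print("manual accuracy:", manual_accuracy)

# Rows are true classes, columns are predicted classes;
# correct predictions lie on the diagonal
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix:")
print(cm)
```

On a balanced dataset like this one, accuracy is a reasonable summary. With imbalanced classes the confusion matrix is usually more informative.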

6. Decision Boundary Visualization


# Decision Boundary Visualization

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.RdYlBu)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', s=50, cmap=plt.cm.RdYlBu)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Decision Boundary')
plt.show()

Explanation: We visualize the decision boundary of the logistic regression model:




  • meshgrid creates a grid of points spaced 0.1 apart that covers the feature space, extended by a margin of 1 beyond the data on every side.

  • We flatten the grid with ravel, pair the coordinates with np.c_, predict the category for each grid point, and reshape the predictions to match the grid.


  • contourf fills each region with the color of the category the model predicts there. The decision boundary is the line where the two colored regions meet.

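  • The scatter plot overlays the original data points on the predicted regions for comparison. Note that the RdYlBu colormap draws category 0 in red and category 1 in blue, the reverse of the colors used in the data visualization step.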
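Because the model is linear, the boundary can also be written down directly from the learned weights instead of being read off a grid. The following sketch assumes the weight on the second feature is non-zero (true for this data, where both features separate the classes). The names w1, w2, b, x1_line, and x2_line are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Same data, split, and model as in the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# The boundary is where the linear score is zero:
#   w1 * x1 + w2 * x2 + b = 0   =>   x2 = -(w1 * x1 + b) / w2
w1, w2 = model.coef_[0]
b = model.intercept_[0]

x1_line = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 100)
x2_line = -(w1 * x1_line + b) / w2

# Every point on this line should get a predicted probability of 0.5
boundary_probs = model.predict_proba(np.c_[x1_line, x2_line])[:, 1]
print("Boundary: x2 = %.3f * x1 + %.3f" % (-w1 / w2, -b / w2))
print("All boundary probabilities are 0.5:", np.allclose(boundary_probs, 0.5))
```

To draw this line on top of the filled regions, one option is to call plt.plot(x1_line, x2_line, 'k--') before plt.show() in the plotting code above.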
