https://github.com/kelvintechnical/logistic-regression-for-binary-classification-

Host: GitHub
URL: https://github.com/kelvintechnical/logistic-regression-for-binary-classification-
Owner: kelvintechnical
Created: 2024-11-13T20:54:04.000Z (2 months ago)
Default Branch: main
Last Pushed: 2024-11-13T23:00:19.000Z (2 months ago)
Last Synced: 2024-11-13T23:30:24.105Z (2 months ago)
Language: Python
Size: 0 Bytes
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Logistic Regression for Binary Classification


Project Overview

This project demonstrates how to implement and visualize a Logistic Regression model for binary classification using synthetic data. It covers generating and visualizing two distinct data categories, training a logistic regression model, evaluating its accuracy, and visualizing the decision boundary. Logistic Regression is a fundamental algorithm in machine learning, often used as an introduction to binary classification tasks.


Code Walkthrough


1. Importing Libraries

# Importing necessary libraries
import numpy as np  # For handling arrays and math operations
import matplotlib.pyplot as plt  # For plotting graphs
from sklearn.model_selection import train_test_split  # For splitting data into train and test sets
from sklearn.linear_model import LogisticRegression  # For logistic regression model
from sklearn.metrics import accuracy_score  # For checking model accuracy



Explanation: We begin by importing the necessary libraries:

- numpy: To handle arrays and perform mathematical operations.
- matplotlib.pyplot: For plotting data and visualizations.
- sklearn.model_selection.train_test_split: To split the data into training and test sets.
- sklearn.linear_model.LogisticRegression: To create a logistic regression model.
- sklearn.metrics.accuracy_score: To measure the accuracy of our model on test data.



2. Data Generation

# Data Generation
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))



Explanation: We generate synthetic data for two categories:

- np.random.seed(0) fixes the random seed so the same data is generated on every run.
- data_size defines the number of data points in each category (100 here, 200 in total).
- category_0 and category_1: Two clusters of 2D points drawn from normal distributions with means 2 and 4 and a standard deviation of 0.5, creating two distinct groups.
- X stacks both categories into one dataset of shape (200, 2), and y holds the matching labels (0 for the first 100 rows, 1 for the rest).
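As a quick sanity check on the generation step, the following sketch repeats the same calls and inspects the result. The shape and mean checks are additions for illustration and are not part of the original project.

```python
import numpy as np

# Same generation steps as in the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))

# X stacks the two clusters row-wise; y holds one label per row
print(X.shape)  # (200, 2)
print(y.shape)  # (200,)

# Per-class sample means should sit close to the requested centers (2 and 4)
mean_0 = X[y == 0].mean(axis=0)
mean_1 = X[y == 1].mean(axis=0)
print("Category 0 mean:", mean_0)
print("Category 1 mean:", mean_1)
```

Because the cluster centers are 2 units apart in each feature and the standard deviation is only 0.5, the two groups overlap very little, which is why a linear classifier is a reasonable fit here.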



3. Data Visualization

# Data Visualization
plt.figure(figsize=(8, 6))
plt.scatter(category_0[:, 0], category_0[:, 1], color='blue', label='Category 0')
plt.scatter(category_1[:, 0], category_1[:, 1], color='red', label='Category 1')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Data Visualization')
plt.legend()
plt.show()



Explanation: We visualize the generated data to observe the two distinct categories:

- plt.scatter plots each category with a different color.
- Labels and legends are added for clarity.
- plt.show() displays the plot, helping us understand how data points are distributed in each category.



4. Model Training

# Model Training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)



Explanation: We split the data into training and test sets and train the logistic regression model:

- train_test_split divides the dataset into 70% training and 30% testing data (test_size=0.3); random_state=0 makes the split reproducible.
- LogisticRegression() creates the model with scikit-learn's default settings, and model.fit() trains it on the training data.
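To make the result of model.fit() more concrete, here is a minimal sketch that reads the learned weights and recomputes the class-1 probability by hand with the logistic (sigmoid) function. The names w, b, z and manual_proba are illustrative and not from the original project; the comparison assumes scikit-learn's default binary setup, where predict_proba applies the sigmoid to the linear score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Recreate the data and the split from the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

# Learned parameters: one weight per feature plus an intercept
w = model.coef_[0]
b = model.intercept_[0]
print("Weights:", w, "Intercept:", b)

# Logistic regression models P(y=1 | x) = 1 / (1 + exp(-(w . x + b)))
z = X_test @ w + b
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is the probability of class 1
sklearn_proba = model.predict_proba(X_test)[:, 1]
print("Manual sigmoid matches predict_proba:", np.allclose(manual_proba, sklearn_proba))
```

A point is assigned to category 1 when this probability exceeds 0.5, which is the same as the linear score z being positive.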



5. Prediction and Accuracy Evaluation

# Prediction and Accuracy Evaluation
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)



Explanation: We make predictions and evaluate the model's accuracy:

- model.predict(X_test) generates predictions on the test data.
- accuracy_score compares predictions to actual values, and we print the accuracy.
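Accuracy is simply the fraction of test points whose predicted label equals the true label. The sketch below recomputes it by hand and also prints a confusion matrix, which breaks the count down by class. The confusion matrix is an addition for illustration and is not used in the original project.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Recreate the data, split, and model from the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)

# Accuracy by hand: share of predictions that equal the true label
manual_accuracy = np.mean(y_pred == y_test)
library_accuracy = accuracy_score(y_test, y_pred)
print("Manual accuracy: ", manual_accuracy)
print("Library accuracy:", library_accuracy)

# Rows are true classes, columns are predicted classes;
# the diagonal counts the correct predictions
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal of the confusion matrix divided by its total equals the accuracy, so the two views are consistent; the matrix additionally shows which class any mistakes come from.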



6. Decision Boundary Visualization

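# Decision Boundary Visualization
# Grid bounds: the data range padded by 1 on each side
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
# Grid of points with step 0.1 covering the feature space
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
# Predict a class for every grid point, then reshape back to the grid
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# Filled contours show the predicted class regions
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.RdYlBu)
# Overlay the original data points, colored by their true label
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', s=50, cmap=plt.cm.RdYlBu)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Decision Boundary')
plt.show()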



Explanation: We visualize the decision boundary of the logistic regression model:

- meshgrid creates a grid of points (step 0.1) covering the feature space, padded by 1 unit beyond the data range.
- We predict the category for each grid point and reshape the predictions to match the grid.
- contourf fills each region with the color of its predicted class; the decision boundary is the line where the two colors meet.
- The scatter plot overlays the original data points on these regions for comparison.
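For a two-feature logistic regression, the boundary can also be written as an explicit straight line: it is the set of points where w0 * x1 + w1 * x2 + b = 0. The sketch below derives that line from the fitted coefficients and checks that the model assigns a probability of 0.5 along it. This is an illustrative addition that assumes the second weight w1 is nonzero; the names x1_line and x2_line are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Recreate the data, split, and model from the walkthrough
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

w0, w1 = model.coef_[0]
b = model.intercept_[0]

# Solve w0 * x1 + w1 * x2 + b = 0 for x2 (assumes w1 != 0)
x1_line = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 50)
x2_line = -(w0 * x1_line + b) / w1

# On the boundary the model is undecided: P(y=1) should be 0.5
boundary_points = np.c_[x1_line, x2_line]
proba_on_line = model.predict_proba(boundary_points)[:, 1]
print("P(y=1) is 0.5 along the line:", np.allclose(proba_on_line, 0.5))

# To overlay the line on the contour plot, one could call:
# plt.plot(x1_line, x2_line, 'k--')
```

Drawing the line this way avoids the stair-step edge that a coarse prediction grid produces, at the cost of only working for linear models with two features.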



Follow Me

Stay connected with my latest projects and insights:

- Bluesky: kelvintechnical.bsky.social
- X (formerly Twitter): kelvintechnical
- LinkedIn: Kelvin R. Tobias