Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kelvintechnical/logistic-regression-for-binary-classification-
Last synced: 27 days ago
- Host: GitHub
- URL: https://github.com/kelvintechnical/logistic-regression-for-binary-classification-
- Owner: kelvintechnical
- Created: 2024-11-13T20:54:04.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-13T23:00:19.000Z (2 months ago)
- Last Synced: 2024-11-13T23:30:24.105Z (2 months ago)
- Language: Python
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Logistic Regression for Binary Classification
Project Overview
This project demonstrates how to implement and visualize a Logistic Regression model for binary classification using synthetic data. It covers generating and visualizing two distinct data categories, training a logistic regression model, evaluating its accuracy, and visualizing the decision boundary. Logistic Regression is a fundamental algorithm in machine learning, often used as an introduction to binary classification tasks.
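For reference, the model fitted by LogisticRegression() can be summarized with the standard formulation below. The penalty term reflects scikit-learn's documented defaults (an L2 penalty with C = 1.0 and, for the default lbfgs solver, no penalty on the intercept), which this project uses implicitly by calling LogisticRegression() with no arguments. Treat it as a textbook summary, not as the library's exact internal objective.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
p(\mathbf{x}) = P(y = 1 \mid \mathbf{x}) = \sigma\left(\mathbf{w}^\top \mathbf{x} + b\right)

\hat{y} =
\begin{cases}
  1 & \text{if } \mathbf{w}^\top \mathbf{x} + b > 0 \quad (\text{equivalently } p(\mathbf{x}) > 0.5) \\
  0 & \text{otherwise}
\end{cases}

\min_{\mathbf{w},\, b} \; \frac{1}{2}\lVert \mathbf{w} \rVert_2^2
  + C \sum_{i=1}^{n} \Big[ -y_i \log p(\mathbf{x}_i) - (1 - y_i) \log\big(1 - p(\mathbf{x}_i)\big) \Big],
  \qquad y_i \in \{0, 1\}
```

Because the predicted class flips where the linear score w^T x + b equals zero, the decision boundary for two features is a straight line.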
Code Walkthrough
1. Importing Libraries
# Importing necessary libraries
import numpy as np # For handling arrays and math operations
import matplotlib.pyplot as plt # For plotting graphs
from sklearn.model_selection import train_test_split # For splitting data into train and test sets
from sklearn.linear_model import LogisticRegression # For logistic regression model
from sklearn.metrics import accuracy_score # For checking model accuracy
Explanation: We begin by importing the necessary libraries:
- numpy: To handle arrays and perform mathematical operations.
- matplotlib.pyplot: For plotting data and visualizations.
- sklearn.model_selection.train_test_split: To split the data into training and test sets.
- sklearn.linear_model.LogisticRegression: To create a logistic regression model.
- sklearn.metrics.accuracy_score: To measure the accuracy of our model on test data.
2. Data Generation
# Data Generation
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
Explanation: We generate synthetic data for two categories:
- We set the random seed to ensure reproducibility, so every run produces the same data.
- data_size defines the number of data points in each category (100, for 200 points in total).
- category_0 and category_1: Two clusters of 2-D points drawn from normal distributions centered at (2, 2) and (4, 4), each with a standard deviation of 0.5, creating two distinct groups.
- X stacks both categories into one (200, 2) feature matrix, and y creates the matching labels (0 for category_0, 1 for category_1).
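A quick sanity check can confirm that the generated data matches the description above. The following is a minimal sketch (not part of the original project) that regenerates the data with the same seed and prints the array shape, the label counts, and the per-class means, which should sit near the chosen centers.

```python
import numpy as np

# Regenerate the synthetic data exactly as in step 2 (same seed and parameters)
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))

# Shape check: 200 rows (100 per category), 2 features each
print("X shape:", X.shape)  # (200, 2)

# Label check: y is float, so cast to int before counting each class
print("Label counts:", np.bincount(y.astype(int)))  # [100 100]

# Per-class feature means should be close to the centers (2, 2) and (4, 4)
print("Category 0 mean:", X[y == 0].mean(axis=0))
print("Category 1 mean:", X[y == 1].mean(axis=0))
```

With 100 samples per cluster and a standard deviation of 0.5, the sample means typically land within about 0.1 of the true centers.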
3. Data Visualization
# Data Visualization
plt.figure(figsize=(8, 6))
plt.scatter(category_0[:, 0], category_0[:, 1], color='blue', label='Category 0')
plt.scatter(category_1[:, 0], category_1[:, 1], color='red', label='Category 1')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Data Visualization')
plt.legend()
plt.show()
Explanation: We visualize the generated data to observe the two distinct categories:
- plt.scatter plots each category with a different color.
- Labels and legends are added for clarity.
- plt.show() displays the plot, helping us understand how data points are distributed in each category.
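plt.show() opens a window only when an interactive backend is available; in a headless environment such as a remote server or CI job it may display nothing. A minimal sketch of an alternative, assuming that saving a PNG file is acceptable, switches to matplotlib's non-interactive Agg backend and uses the object-oriented Figure/Axes interface. The output file name is hypothetical.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, no display window required
import matplotlib.pyplot as plt
import numpy as np

# Regenerate the two clusters from step 2 (same seed and parameters)
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))

# Object-oriented equivalent of the step 3 scatter plot
fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(category_0[:, 0], category_0[:, 1], color="blue", label="Category 0")
ax.scatter(category_1[:, 0], category_1[:, 1], color="red", label="Category 1")
ax.set_xlabel("Feature 1")
ax.set_ylabel("Feature 2")
ax.set_title("Data Visualization")
ax.legend()

# Save to an image file instead of opening a window (hypothetical file name)
output_path = os.path.join(tempfile.gettempdir(), "data_visualization.png")
fig.savefig(output_path, dpi=100)
plt.close(fig)  # release the figure's memory
print("Saved plot to", output_path)
```

Keeping a reference to fig and ax also makes it easier to add more elements to the same plot later without relying on pyplot's implicit "current figure".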
4. Model Training
# Model Training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)
Explanation: We split the data into training and test sets and train the logistic regression model:
- train_test_split shuffles the dataset and divides it into 70% training (140 points) and 30% testing (60 points); random_state=0 makes the split reproducible.
- LogisticRegression() creates the model with scikit-learn's default settings, and model.fit() trains it on the training data.
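To see what fit() actually learns, the sketch below (an illustrative addition, not from the original project) rebuilds the same data and model, reads the learned weights from scikit-learn's coef_ and intercept_ attributes, and recomputes the class-1 probability by hand as the sigmoid of the linear score. The helper name sigmoid is our own, and the comparison with predict_proba assumes the default binary setup used here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Rebuild the data, split, and model exactly as in steps 2 and 4
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression()
model.fit(X_train, y_train)

print("Train/test sizes:", len(X_train), len(X_test))  # 140 60

# Learned parameters: one weight per feature plus an intercept
w = model.coef_[0]        # shape (2,)
b = model.intercept_[0]   # scalar

def sigmoid(z):
    """Logistic function mapping any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Recompute P(y = 1 | x) by hand: sigmoid of the linear score w.x + b
manual_proba = sigmoid(X_test @ w + b)
sklearn_proba = model.predict_proba(X_test)[:, 1]  # column 1 corresponds to class 1.0
print("Weights:", w, "Intercept:", b)
print("Matches predict_proba:", np.allclose(manual_proba, sklearn_proba))  # True
```

Since the probability depends on the features only through w.x + b, the classifier is linear, which is why the decision boundary visualized in step 6 is a straight line.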
5. Prediction and Accuracy Evaluation
# Prediction and Accuracy Evaluation
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Model Accuracy:", accuracy)
Explanation: We make predictions and evaluate the model’s accuracy:
- model.predict(X_test) generates predictions on the test data.
- accuracy_score compares the predictions to the actual labels and returns the fraction that match, which we then print.
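Accuracy alone does not show which class the errors fall into. As an optional extension (not in the original project), the sketch below recomputes accuracy as the fraction of matching labels and prints a confusion matrix using scikit-learn's confusion_matrix, assuming the same data, split, and default model as above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Rebuild the data, split, and model exactly as in steps 2 and 4
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Accuracy is the fraction of test points whose predicted label equals the true label
manual_accuracy = np.mean(y_pred == y_test)
print("accuracy_score:", accuracy_score(y_test, y_pred))
print("Manual accuracy:", manual_accuracy)

# Confusion matrix: rows are true classes (0, 1), columns are predicted classes (0, 1)
cm = confusion_matrix(y_test, y_pred)
print(cm)
tn, fp, fn, tp = cm.ravel()
print("False positives:", fp, "| False negatives:", fn)
```

The diagonal entries (tn and tp) are the correct predictions, so their sum divided by the 60 test points equals the accuracy.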
6. Decision Boundary Visualization
# Decision Boundary Visualization
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.1), np.arange(y_min, y_max, 0.1))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3, cmap=plt.cm.RdYlBu)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', s=50, cmap=plt.cm.RdYlBu)
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Decision Boundary')
plt.show()
Explanation: We visualize the decision boundary of the logistic regression model:
- np.meshgrid creates a grid of points, spaced 0.1 apart, covering the feature space with a margin of 1 unit beyond the data on each side.
- We predict the category for each grid point and reshape the predictions to match the grid.
- plt.contourf fills each region with the color of its predicted class, so the decision boundary is the line where the color changes.
- The scatter plot overlays the original data points, colored by their true labels, for comparison.
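Because logistic regression is a linear classifier, the boundary that contourf reveals in step 6 can also be computed directly. The sketch below, an illustrative addition assuming the same data and default model as above, solves w1*x1 + w2*x2 + b = 0 for x2 and checks that points on that line receive a class-1 probability of about 0.5. It assumes w2 is nonzero (the boundary is not vertical), which holds here because both features separate the clusters.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Rebuild the data, split, and model exactly as in steps 2 and 4
np.random.seed(0)
data_size = 100
category_0 = np.random.normal(2, 0.5, (data_size, 2))
category_1 = np.random.normal(4, 0.5, (data_size, 2))
X = np.vstack((category_0, category_1))
y = np.hstack((np.zeros(data_size), np.ones(data_size)))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Learned parameters of the linear score w1*x1 + w2*x2 + b
w1, w2 = model.coef_[0]
b = model.intercept_[0]

# The boundary is where the score is 0 (probability 0.5); solve for x2.
# Assumes w2 != 0, i.e. the boundary is not a vertical line.
x1_line = np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 50)
x2_line = -(w1 * x1_line + b) / w2

# Every point on the computed line should get a class-1 probability of about 0.5
boundary_points = np.column_stack((x1_line, x2_line))
proba_on_line = model.predict_proba(boundary_points)[:, 1]
print("Boundary: x2 = %.3f * x1 + %.3f" % (-w1 / w2, -b / w2))
print("Probability range on the line:", proba_on_line.min(), proba_on_line.max())
```

To overlay this line on the step 6 plot, one option is to compute x1_line and x2_line there and call plt.plot(x1_line, x2_line, 'k--') before plt.show().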
Follow Me
Stay connected with my latest projects and insights:
- Bluesky: kelvintechnical.bsky.social
- X (formerly Twitter): kelvintechnical
- LinkedIn: Kelvin R. Tobias