Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/vishrut-b/ml-project-with-pytorch-breast-cancer-classification

An exploration of machine learning techniques applied to classify breast cancer as malignant or benign.
https://github.com/vishrut-b/ml-project-with-pytorch-breast-cancer-classification

breast-cancer-classification machine-learning python pytorch scikit-learn

Last synced: about 13 hours ago
JSON representation

An exploration of machine learning techniques applied to classify breast cancer as malignant or benign.

Awesome Lists containing this project

README

        

# Breast Cancer Classification Using Scikit-Learn and PyTorch

This project is an exploration of machine learning techniques applied to classify breast cancer as malignant or benign. Leveraging both **scikit-learn** and **PyTorch**, the project demonstrates the full machine learning pipeline, from data preprocessing to model evaluation.

## Table of Contents
- [Introduction](#introduction)
- [Project Overview](#project-overview)
- [Dataset](#dataset)
- [Methodology](#methodology)
- [Implementation Details](#implementation-details)
- [Results](#results)
- [References](#references)

---

## Introduction
Breast cancer is one of the most common cancers globally, and early and accurate detection is crucial for effective treatment. This project aims to build a reliable classification model to distinguish between malignant and benign cases using advanced machine learning techniques.

---

## Project Overview
The project includes:
1. Data collection and preprocessing using **scikit-learn**.
2. Implementation of a fully connected neural network using **PyTorch**.
3. Evaluation of the model's performance on the test dataset.
4. Visualization of results and analysis of performance metrics.

---

## Dataset
The **Breast Cancer Wisconsin (Diagnostic) Dataset** from UCI Machine Learning Repository is used. Key details include:
- **Instances**: 569
- **Attributes**: 30 numerical features describing cell nuclei characteristics
- **Classes**:
- `0` - Malignant
- `1` - Benign
- **Class Distribution**:
- Malignant: 212
- Benign: 357

Features include measurements such as mean radius, texture, perimeter, area, and smoothness, among others. For a full description, refer to the dataset's [documentation](https://goo.gl/U2Uwz2).

---

## Methodology
1. **Data Preprocessing**:
- Splitting the dataset into training and test sets.
- Standardizing features to have a mean of 0 and standard deviation of 1 using `StandardScaler` from scikit-learn.
- Converting data into PyTorch tensors for compatibility with the neural network.

2. **Model Architecture**:
- A neural network with one hidden layer of 64 neurons.
- Activation functions:
- ReLU for non-linearity in the hidden layer.
- Sigmoid for binary classification output.
- Loss function: Binary Cross-Entropy Loss.
- Optimizer: Adam optimizer for efficient gradient descent.

3. **Training**:
- The model was trained for 100 epochs with a learning rate of 0.01.
- Periodic evaluation of loss and accuracy during training.

4. **Evaluation**:
- Assessing the model's performance on both training and test sets.
- Calculating accuracy and visualizing results.

---

## Implementation Details
### Key Libraries
- **Scikit-learn**: For data preprocessing and splitting.
- **PyTorch**: For neural network construction and training.
- **Matplotlib**: For visualization.

### Neural Network Code Snippet
```python
class NeuralNet(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(NeuralNet, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size)
self.relu = nn.ReLU()
self.fc2 = nn.Linear(hidden_size, output_size)
self.sigmoid = nn.Sigmoid()

def forward(self, x):
out = self.fc1(x)
out = self.relu(out)
out = self.fc2(out)
out = self.sigmoid(out)
return out
```

---

## Results
### Training Metrics:
- **Final Training Accuracy**: 99.78%
- **Final Loss**: 0.0107

### Test Metrics:
- **Accuracy on Test Data**: 99.34%

### Highlights:
- Excellent model performance with minimal overfitting.
- Effective handling of imbalanced class distribution.

---

---

## References
- [Breast Cancer Wisconsin Dataset Documentation](https://goo.gl/U2Uwz2)
- PyTorch Documentation: [https://pytorch.org/docs/](https://pytorch.org/docs/)
- Scikit-learn Documentation: [https://scikit-learn.org/](https://scikit-learn.org/)

---