https://github.com/harris-giki/cancerdetectionmodel_ml

Simple logistic regression and neural network machine learning models that predict whether a breast tumor is malignant or benign, based on input features extracted from a breast cancer dataset.

cancer-detection development keras keras-tensorflow logistic-regression machine-learning neural-network scikit-learn streamlit tensorflow

Host: GitHub
URL: https://github.com/harris-giki/cancerdetectionmodel_ml
Owner: Harris-giki
Created: 2024-11-11T07:14:06.000Z (8 months ago)
Default Branch: main
Last Pushed: 2025-01-06T16:42:57.000Z (6 months ago)
Last Synced: 2025-02-03T20:20:27.570Z (5 months ago)
Topics: cancer-detection, development, keras, keras-tensorflow, logistic-regression, machine-learning, neural-network, scikit-learn, streamlit, tensorflow
Language: Jupyter Notebook
Homepage: https://breastcancermodel-neuralnetwork.streamlit.app/
Size: 238 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

Cancer Detection Model

A machine learning model designed to assist in the early detection of cancer, using both a traditional logistic regression model and a more advanced neural network. This project covers the end-to-end pipeline, from data preprocessing to model evaluation, with flexibility to run on Google Colab or a local environment.

- Project Overview
- Dataset
- Installation and Setup
- Methodology
  - Data Preprocessing
  - Feature Selection
  - Logistic Regression
  - Neural Network
- Evaluation Metrics
- Running the Project
- Project Structure
- Future Directions
- References

Project Overview

This project aims to build efficient and accurate models for cancer detection based on a structured dataset. It includes two approaches:

Logistic Regression: A traditional machine learning model used as a baseline for the task.

Neural Network: A more advanced deep learning approach using a neural network to predict cancer malignancy.

Both approaches are designed to identify patterns associated with cancer diagnoses and can serve as a foundation for future work on similar medical diagnostic applications.

Neural Network Architecture

The deep learning approach leverages a neural network for cancer detection. Below is the architecture of the neural network used for training the model.

![Neural Network Architecture](NeuralNetworkArchitecture.png)

The neural network consists of several layers:
- **Input Layer**: Accepts features from the dataset.
- **Hidden Layer**: Dense layer for feature extraction and learning complex patterns.
- **Output Layer**: Produces the final prediction (Cancerous Tumour or Non-Cancerous Tumour).
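
To make the three layers concrete, here is a minimal, dependency-free sketch of a single forward pass through such a network. It assumes ReLU in the hidden layer and a sigmoid output, as described in the Neural Network section, and treats "malignant" as the positive class. The weights, layer sizes, and the `forward` helper are hypothetical values chosen for illustration; they are not taken from the trained model.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive values, zeroes out negatives."""
    return max(0.0, x)


def sigmoid(z):
    """Squash a real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, hidden_weights, hidden_biases, output_weights, output_bias):
    """Return the predicted probability of the positive class for one sample."""
    # Hidden layer: one weighted sum per hidden unit, followed by ReLU.
    hidden = [
        relu(sum(w * x for w, x in zip(weights, features)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Output layer: a single weighted sum, followed by a sigmoid.
    logit = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(logit)


# Hypothetical weights for a network with 3 inputs and 2 hidden units.
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [1.2, -0.7]
output_bias = 0.05

sample = [1.0, 2.0, 3.0]
probability = forward(sample, hidden_weights, hidden_biases, output_weights, output_bias)
# Assumption: probabilities of 0.5 or more are read as "malignant".
label = "malignant" if probability >= 0.5 else "benign"
print(f"p = {probability:.3f} -> {label}")
```

In a Keras model the same computation is carried out by stacked dense layers over whole batches; the sketch only shows what happens to one sample.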

Dataset

The dataset for this project is available as data.csv. It includes a range of features commonly associated with cancer diagnoses. Make sure to download this file from the repository or use the link provided below to get a copy if you're running the project locally.

Data Source

File: data.csv

Format: CSV

Attributes: (list any key features, if known)

Note: For users on Google Colab, the dataset is automatically loaded from the repository. For local setup, see the installation instructions below.

Installation and Setup

Running on Google Colab

Open the Google Colab Notebook provided in the repository.

Mount Google Drive if needed and upload data.csv.

Run the cells sequentially to execute the full pipeline for both models.

Running Locally

Clone the repository:

```
git clone https://github.com/your-username/cancer-detection.git
```

Navigate to the project directory:
```
cd cancer-detection
```

Install required dependencies:
```
pip install -r requirements.txt
```

Download the dataset as data.csv and place it in the root directory of the project.

Run the main notebook or script:

```
jupyter notebook cancer_detection.ipynb
# or, to run the script version:
python cancer_detection.py
```

Methodology

Data Preprocessing

Proper data preprocessing is essential for effective model training. The steps taken include:

Data Cleaning: Handling missing values, removing duplicate entries, and ensuring data consistency.

Normalization: Scaling features to a standard range for improved model convergence.
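
The two steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the notebook's actual code: missing values are assumed to appear as `None`, rows containing them are dropped rather than imputed, and "normalization" is taken to mean z-score standardization (zero mean, unit variance). The helpers `clean_rows` and `standardize` and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def clean_rows(rows):
    """Drop rows with missing values, then drop exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row):
            continue
        key = tuple(row)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(list(row))
    return cleaned


def standardize(rows):
    """Scale each column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


raw = [
    [14.2, 20.1],
    [14.2, 20.1],   # duplicate of the first row
    [None, 18.4],   # missing value
    [10.0, 15.0],
    [18.4, 25.2],
]
cleaned = clean_rows(raw)
scaled = standardize(cleaned)
print(len(raw), "rows in,", len(cleaned), "rows kept")
```

scikit-learn's `StandardScaler` applies the same z-score formula, using the population standard deviation. In practice the means and standard deviations should be computed on the training split only and then reused for the test split.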

Feature Selection

Feature selection can improve model accuracy and reduce overfitting by keeping only the most informative predictors.

Correlation Analysis: Features are selected based on their correlations, dropping highly correlated ones to reduce multicollinearity.

Dimensionality Reduction: Reduced the dataset’s dimensionality to improve computational efficiency.
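
One simple way to act on a correlation analysis is to walk through the features and keep each one only if it is not strongly correlated with a feature already kept. The sketch below assumes a Pearson correlation and a cutoff of 0.9; the cutoff, the `drop_correlated` helper, and the feature names and values are hypothetical, and the README does not state which method the notebook actually uses.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if it is not highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns; "perimeter" is almost a multiple of "radius".
columns = {
    "radius":    [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter": [63.0, 75.5, 88.0, 100.4, 113.2],
    "texture":   [20.0, 14.0, 22.0, 15.0, 19.0],
}
selected = drop_correlated(columns)
print(selected)
```

The result depends on the order in which features are visited, since the first feature of a correlated pair is the one that survives.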

Logistic Regression

The logistic regression model is implemented as a baseline model for cancer detection. It is a traditional machine learning method that works well for binary classification tasks, where the outcome is either malignant or benign.

Model: Logistic Regression is used to identify patterns and predict the likelihood of cancer malignancy.

Training: The model is trained on the preprocessed data using standard optimization techniques.
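
For readers who want to see what training a logistic regression involves, here is a minimal sketch that fits the model by batch gradient descent on the log loss. It is a from-scratch illustration, not the project's code: the learning rate, epoch count, toy data, and helper names are all assumptions. A library implementation such as scikit-learn's `LogisticRegression` fits the same kind of model, but with a different solver and with regularization enabled by default.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, target in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = p - target  # derivative of the log loss w.r.t. the logit
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict_proba(features, weights, bias):
    """Predicted probability of the positive class for one sample."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


# Toy standardized data with one feature; assumed labels: 1 = malignant, 0 = benign.
X = [[-1.5], [-1.0], [-0.5], [0.5], [1.0], [1.5]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(X, y)
predictions = [int(predict_proba(row, weights, bias) >= 0.5) for row in X]
print(predictions)
```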

Neural Network

The neural network implementation is a more advanced approach using TensorFlow/Keras. It offers the flexibility to handle complex relationships in the data and can outperform traditional models in certain scenarios.

Model: A simple neural network with one hidden layer is implemented using Keras.

Activation Function: ReLU is used in the hidden layer, and a sigmoid activation is used for the output layer.

Training: The model is trained on standardized data, using an optimizer such as Adam and binary cross-entropy as the loss function.
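
Binary cross-entropy is the quantity the optimizer tries to minimize. The following sketch computes it by hand for two hypothetical sets of predicted probabilities, to show how heavily confident mistakes are penalized. The function name, the clipping constant `eps`, and the sample values are illustrative assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for target, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(target * math.log(p) + (1 - target) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.9, 0.1]
confident_and_wrong = [0.1, 0.9, 0.1, 0.9]

low_loss = binary_cross_entropy(labels, confident_and_right)
high_loss = binary_cross_entropy(labels, confident_and_wrong)
print(f"{low_loss:.4f} vs {high_loss:.4f}")
```

The loss for the confident and correct predictions is -ln(0.9), roughly 0.105, while the confident and wrong predictions cost -ln(0.1), roughly 2.303 per sample.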

Evaluation Metrics

We used the following metrics to evaluate model performance:

Accuracy: Measures overall correct predictions.

Precision, Recall, and F1-Score: Important when classes are imbalanced, since they separate false positives from false negatives.

ROC-AUC Score: Assesses model performance across all classification thresholds.
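
The metrics above can be computed from predicted scores as follows. This is a small, self-contained sketch with made-up labels and scores; it assumes 1 marks the positive (malignant) class and a decision threshold of 0.5. ROC-AUC is computed here through its pairwise interpretation: the share of positive/negative pairs in which the positive sample receives the higher score.

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, true negatives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities for eight samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [int(s >= 0.5) for s in y_score]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)
print(metrics, auc)
```

In a screening setting, recall on the malignant class is usually the number to watch, because a false negative means a missed tumor.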

Running the Project

This project can be run in two main environments:

Google Colab: Open the notebook in Colab, upload data.csv and run the cells.

Local Setup: Clone the repo, install dependencies, and execute the notebook or script.

Example Usage

```
from cancer_detection import CancerDetection

# Initialize and run model training for both Logistic Regression and Neural Network
logistic_model = CancerDetection(model_type='logistic_regression')
logistic_model.train_and_evaluate()

neural_network_model = CancerDetection(model_type='neural_network')
neural_network_model.train_and_evaluate()
```

Project Structure

```
cancer-detection/
│
├── data/                        # Data files (data.csv)
├── notebooks/                   # Jupyter notebooks for experimentation and model development
│   └── cancer_detection.ipynb   # Main notebook file
│
├── scripts/                     # Python scripts for modularity
│   ├── preprocess.py            # Preprocessing functions
│   ├── model_training.py        # Model training functions
│   ├── logistic_regression.py   # Logistic regression model code
│   └── neural_network.py        # Neural network model code
│
├── README.md                    # Project README
└── requirements.txt             # Python dependencies
```

Future Directions

This project provides a foundational model for cancer detection, but several potential enhancements are possible:

Integrate Deep Learning Models: Experiment with more complex deep learning models such as Convolutional Neural Networks (CNNs) for image-based cancer datasets.

Add Cross-Validation: Further ensure model robustness by adding cross-validation techniques.

Explore Transfer Learning: Use pre-trained models for cancer detection with more advanced techniques.

Incorporate Real-World Testing: Test with a larger, more diverse dataset to improve generalizability.
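
As a starting point for the cross-validation item above, the sketch below shows how k-fold splitting partitions sample indices so that every sample is used for testing exactly once. It is a plain illustration with a hypothetical helper name; it uses contiguous folds without shuffling, whereas a real experiment would normally shuffle first and, for imbalanced classes, stratify by label.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print(f"train={train} test={test}")
```

Each model would be trained once per fold on the training indices and scored on the test indices, with the scores averaged at the end.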
