https://github.com/harris-giki/cancerdetectionmodel_ml


Cancer Detection Model

A machine learning project designed to assist in the early detection of cancer, using both a traditional logistic regression model and a more advanced neural network. It covers the end-to-end pipeline, from data preprocessing to model evaluation, and can be run on Google Colab or in a local environment.

Table of Contents



  1. Project Overview

  2. Dataset

  3. Installation and Setup

  4. Methodology


  5. Running the Project

  6. Project Structure

  7. Future Directions

  8. References

Project Overview


This project aims to build efficient and accurate models for cancer detection based on a structured dataset. It includes two approaches:



  • Logistic Regression: A traditional machine learning model used as a baseline for the task.


  • Neural Network: A deep learning approach that predicts whether a tumour is malignant.


Both approaches are designed to identify patterns associated with cancer diagnoses and can serve as a foundation for future work on similar medical diagnostic applications.

Neural Network Architecture


The deep learning approach uses a small feed-forward neural network for cancer detection. The diagram below shows the architecture used for training the model.

![Neural Network Architecture](NeuralNetworkArchitecture.png)

The neural network consists of three layers:
- **Input Layer**: Accepts the features from the dataset.
- **Hidden Layer**: A single dense layer that learns non-linear combinations of the input features.
- **Output Layer**: Produces the final prediction (cancerous or non-cancerous tumour).

Dataset


The dataset for this project is provided as data.csv. It contains a range of features commonly associated with cancer diagnoses. If you are running the project locally, download this file from the repository first.

Data Source




  • File: data.csv


  • Format: CSV


  • Attributes: (list any key features, if known)


Note: For users on Google Colab, the dataset is automatically loaded from the repository. For local setup, see the installation instructions below.

Installation and Setup

Running on Google Colab



  1. Open the Google Colab Notebook provided in the repository.

  2. Mount Google Drive if needed and upload data.csv.

  3. Run the cells sequentially to execute the full pipeline for both models.

Running Locally



  1. Clone the repository:
    git clone https://github.com/your-username/cancer-detection.git


  2. Navigate to the project directory:
    cd cancer-detection


  3. Install required dependencies:
    pip install -r requirements.txt


  4. Download the dataset as data.csv and place it in the root directory of the project.

  5. Run the main notebook or script:
    jupyter notebook cancer_detection.ipynb

    or
    python cancer_detection.py


Methodology

Data Preprocessing


Proper data preprocessing is essential for effective model training. The steps taken include:




  • Data Cleaning: Handling missing values, removing duplicate entries, and ensuring data consistency.


  • Normalization: Scaling features to a standard range for improved model convergence.
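
The two steps above can be sketched without any dependencies. This is a minimal illustration, not the repository's code: the helper names `clean_rows` and `standardize`, the use of `None` to mark a missing value, and the choice of median imputation are all assumptions. Scaling is done as a z-score, matching the "standardized data" mentioned in the Neural Network section.

```python
from statistics import mean, median, pstdev

def clean_rows(rows):
    """Drop exact duplicate rows, then fill missing values (None) with the column median."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(row)
        if key not in seen:              # keep only the first occurrence of each row
            seen.add(key)
            unique.append(list(row))
    for j in range(len(unique[0])):
        observed = [r[j] for r in unique if r[j] is not None]
        fill = median(observed)          # assumed imputation strategy
        for r in unique:
            if r[j] is None:
                r[j] = fill
    return unique

def standardize(rows):
    """Scale every column to zero mean and unit variance (z-score)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sigmas = [pstdev(c) or 1.0 for c in cols]   # constant columns are left unscaled
    return [[(v - m) / s for v, m, s in zip(r, mus, sigmas)] for r in rows]

# Toy rows for illustration only; not taken from data.csv.
raw = [
    [1.0, 10.0],
    [1.0, 10.0],   # duplicate of the first row
    [3.0, None],   # missing value in the second column
    [5.0, 30.0],
]
cleaned = clean_rows(raw)
scaled = standardize(cleaned)
```

In a real pipeline the means and standard deviations would normally be computed on the training split only and then reused for the test split, so that no information leaks from the test data.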

Feature Selection


Feature selection can improve model accuracy and reduce overfitting by keeping only the most informative predictors.




  • Correlation Analysis: Dropped features that are highly correlated with one another to reduce multicollinearity.


  • Dimensionality Reduction: Reduced the dataset’s dimensionality to improve computational efficiency.
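
As a rough sketch of the correlation-based step, the snippet below keeps a feature only if its absolute Pearson correlation with every feature already kept stays under a threshold. The function names, the 0.9 threshold, and the feature names and values are hypothetical; the README does not state which threshold or features the project actually uses.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Return the names of features whose |r| with every kept feature is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature names and values, for illustration only.
features = {
    "radius":    [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [2.1, 4.0, 6.2, 8.1, 9.9],   # almost exactly 2 * radius
    "texture":   [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(features)   # "perimeter" is dropped as redundant
```

Because features are visited in insertion order, the first member of each correlated group is the one that survives; a different ordering can keep a different feature.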

Logistic Regression


Logistic regression is implemented as the baseline model for cancer detection. It is a traditional machine learning method that works well for binary classification tasks, where the outcome here is either malignant or benign.




  • Model: Logistic regression estimates the probability that a tumour is malignant from a weighted sum of the input features.


  • Training: The model is trained on the preprocessed data using standard optimization techniques.
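
To make the baseline concrete, here is a small dependency-free sketch of logistic regression trained by batch gradient descent on the mean log loss. It is an illustration under assumptions, not the project's implementation: the function names, learning rate, epoch count, and toy data are all made up, and a library implementation such as scikit-learn's `LogisticRegression` is the more usual choice in practice.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the linear output is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Toy, linearly separable data (1 = malignant, 0 = benign); not from data.csv.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
labels = [int(p >= 0.5) for p in predict_proba(X, w, b)]
```

The 0.5 cut-off used to turn probabilities into labels is a convention; for a screening task a lower threshold may be preferable, trading precision for recall.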

Neural Network


The neural network is implemented with TensorFlow/Keras. It can model non-linear relationships between features, which allows it to outperform linear models such as logistic regression when those relationships matter.




  • Model: A simple neural network with one hidden layer is implemented using Keras.


  • Activation Function: ReLU is used in the hidden layer, and a sigmoid activation is used for the output layer.


  • Training: The model is trained on standardized data, using an optimizer such as Adam and binary cross-entropy as the loss function.
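
The computation described above (one ReLU hidden layer, a sigmoid output, and binary cross-entropy loss) can be written out in plain Python to show what the network evaluates for a single sample. This is a sketch only: the project itself uses Keras, and the layer sizes and weight values below are hypothetical, chosen so the arithmetic is easy to follow. Training (for example with Adam) is not shown.

```python
from math import exp, log

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def forward(x, W1, b1, w2, b2):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * log(p) + (1 - t) * log(1.0 - p))
    return total / len(y_true)

# Hypothetical weights: 2 input features, 2 hidden units, 1 output unit.
W1 = [[1.0, -1.0], [0.5, 0.5]]   # one row of weights per hidden unit
b1 = [0.0, 0.0]
w2 = [1.0, -1.0]
b2 = 0.0

# Hidden activations are relu(1.0) = 1.0 and relu(1.5) = 1.5,
# so the output is sigmoid(1.0 - 1.5) = sigmoid(-0.5).
prob = forward([2.0, 1.0], W1, b1, w2, b2)
loss = binary_cross_entropy([1], [prob])
```

The sigmoid output is read as the predicted probability of the positive class, which is why binary cross-entropy is the natural loss to pair with it.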

Evaluation Metrics


We used the following metrics to evaluate model performance:




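  • Accuracy: The fraction of all predictions that are correct.


  • Precision, Recall, and F1-Score: More informative than accuracy when classes are imbalanced. Precision is the fraction of predicted positives that are truly positive, recall is the fraction of actual positives that are detected, and F1 is their harmonic mean.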


  • ROC-AUC Score: Assesses model performance across all classification thresholds.
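
For reference, the metrics listed above can be computed from scratch as follows. The function names and the toy labels and scores are illustrative assumptions; ROC-AUC is computed here through its pairwise interpretation, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [int(s >= 0.5) for s in scores]   # threshold at 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall and F1 depend on the chosen threshold, whereas ROC-AUC is computed from the raw scores and therefore summarizes ranking quality across all thresholds.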

Running the Project


This project can be run in two main environments:




  1. Google Colab: Open the notebook in Colab, upload data.csv and run the cells.


  2. Local Setup: Clone the repo, install dependencies, and execute the notebook or script.

Example Usage


    from cancer_detection import CancerDetection

    # Train and evaluate the logistic regression baseline
    logistic_model = CancerDetection(model_type='logistic_regression')
    logistic_model.train_and_evaluate()

    # Train and evaluate the neural network
    neural_network_model = CancerDetection(model_type='neural_network')
    neural_network_model.train_and_evaluate()

Project Structure

cancer-detection/
├── data/                          # Data files (data.csv)
├── notebooks/                     # Jupyter notebooks for experimentation and model development
│   └── cancer_detection.ipynb     # Main notebook file
├── scripts/                       # Python scripts for modularity
│   ├── preprocess.py              # Preprocessing functions
│   ├── model_training.py          # Model training functions
│   ├── logistic_regression.py     # Logistic regression model code
│   └── neural_network.py          # Neural network model code
├── README.md                      # Project README
├── requirements.txt               # Python dependencies
└── LICENSE                        # License for the project

Future Directions


This project provides a foundational model for cancer detection, but several potential enhancements are possible:




  • Integrate Deep Learning Models: Experiment with more complex deep learning models such as Convolutional Neural Networks (CNNs) for image-based cancer datasets.


  • Add Cross-Validation: Use k-fold cross-validation to obtain a more reliable estimate of model performance than a single train/test split.


  • Explore Transfer Learning: Fine-tune pre-trained models instead of training from scratch, which is most applicable to image-based datasets.


  • Incorporate Real-World Testing: Test with a larger, more diverse dataset to improve generalizability.
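
As a starting point for the cross-validation item above, the following sketch splits sample indices into k folds so that every sample is used for testing exactly once. The function name `k_fold_indices` and its parameters are hypothetical; it is not part of the repository, and it does not stratify by class.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]   # k folds of near-equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, k=5))
for train_idx, test_idx in splits:
    # Train on train_idx, evaluate on test_idx, then average the scores over folds.
    pass
```

For imbalanced medical datasets a stratified variant, which preserves the class ratio in every fold, is usually the better choice.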