https://github.com/harris-giki/cancerdetectionmodel_ml
hello world
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/harris-giki/cancerdetectionmodel_ml
- Owner: Harris-giki
- Created: 2024-11-11T07:14:06.000Z (2 months ago)
- Default Branch: main
- Last Pushed: 2024-11-11T07:33:41.000Z (2 months ago)
- Last Synced: 2024-11-11T08:25:02.912Z (2 months ago)
- Language: Jupyter Notebook
- Size: 125 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Cancer Detection Model
A machine learning model designed to assist in the early detection of cancer, using both a traditional logistic regression model and a more advanced neural network. This project covers the end-to-end pipeline, from data preprocessing to model evaluation, with flexibility to run on Google Colab or a local environment.
Table of Contents
- Project Overview
- Dataset
- Installation and Setup
- Methodology
- Running the Project
- Project Structure
- Future Directions
- References
Project Overview
This project aims to build efficient and accurate models for cancer detection based on a structured dataset. It includes two approaches:
- Logistic Regression: A traditional machine learning model used as a baseline for the task.
- Neural Network: A more advanced deep learning approach using a neural network to predict cancer malignancy.
Both approaches are designed to identify patterns associated with cancer diagnoses and can serve as a foundation for future work on similar medical diagnostic applications.
Neural Network Architecture
The deep learning approach leverages a neural network for cancer detection. Below is the architecture of the neural network used for training the model.
![Neural Network Architecture](NeuralNetworkArchitecture.png)
The neural network consists of several layers:
- **Input Layer**: Accepts features from the dataset.
- **Hidden Layer**: Dense layer for feature extraction and learning complex patterns.
- **Output Layer**: Produces the final prediction (Cancerous Tumour or Non-Cancerous Tumour).
Dataset
The dataset for this project is available as data.csv. It includes a range of features commonly associated with cancer diagnoses. If you're running the project locally, download this file from the repository.
Data Source
- File: data.csv
- Format: CSV
- Attributes: (list any key features, if known)
Note: For users on Google Colab, the dataset is automatically loaded from the repository. For local setup, see the installation instructions below.
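As a minimal sketch of how a CSV like data.csv could be read into features and labels, the snippet below uses only the standard library. The column names (`diagnosis`, `radius_mean`, `texture_mean`) and the helper `load_dataset` are hypothetical, since the README does not list the dataset's attributes; an in-memory sample stands in for the real file.

```python
import csv
import io

def load_dataset(csv_file, label_column):
    """Read a CSV into (feature_names, X, y), skipping rows with bad feature values."""
    reader = csv.DictReader(csv_file)
    feature_names = [name for name in reader.fieldnames if name != label_column]
    X, y = [], []
    for row in reader:
        try:
            features = [float(row[name]) for name in feature_names]
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric feature values
        X.append(features)
        y.append(row[label_column])
    return feature_names, X, y

# Tiny in-memory stand-in for data.csv; the column names are assumptions.
sample = io.StringIO(
    "diagnosis,radius_mean,texture_mean\n"
    "M,17.99,10.38\n"
    "B,13.54,14.36\n"
    "B,,15.70\n"
)
feature_names, X, y = load_dataset(sample, label_column="diagnosis")
```

To read the real file, pass an open file handle instead of the `io.StringIO` sample, for example `open("data.csv", newline="")`.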
Installation and Setup
Running on Google Colab
- Open the Google Colab Notebook provided in the repository.
- Mount Google Drive if needed and upload data.csv.
- Run the cells sequentially to execute the full pipeline for both models.
Running Locally
- Clone the repository:
git clone https://github.com/harris-giki/cancerdetectionmodel_ml.git
- Navigate to the project directory:
cd cancerdetectionmodel_ml
- Install required dependencies:
pip install -r requirements.txt
- Download the dataset as data.csv and place it in the root directory of the project.
- Run the main notebook or script:
jupyter notebook cancer_detection.ipynb
or
python cancer_detection.py
Methodology
Data Preprocessing
Proper data preprocessing is essential for effective model training. The steps taken include:
- Data Cleaning: Handling missing values, removing duplicate entries, and ensuring data consistency.
- Normalization: Scaling features to a standard range for improved model convergence.
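The two steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the notebook's actual code: it treats "normalization" as z-score standardization (the Neural Network section mentions standardized data), and the helper names `drop_duplicates` and `standardize` are hypothetical.

```python
from statistics import mean, pstdev

def drop_duplicates(rows):
    """Remove exact duplicate rows while preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def standardize(rows):
    """Scale each column to zero mean and unit variance (z-score)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead to avoid an error.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Toy data with one duplicated row.
rows = [[1.0, 10.0], [2.0, 20.0], [2.0, 20.0], [3.0, 30.0]]
clean = drop_duplicates(rows)
scaled = standardize(clean)
```

In practice the means and standard deviations are usually computed on the training split only and then reused for the test split, so that no information from the test data leaks into training.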
Feature Selection
Feature selection can improve model accuracy and reduce overfitting by keeping only the most informative predictors.
- Correlation Analysis: Selected features based on correlation to remove multicollinearity.
- Dimensionality Reduction: Reduced the dataset’s dimensionality to improve computational efficiency.
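One common way to do correlation-based selection is to drop any feature that is highly correlated with a feature already kept. The sketch below assumes that approach; the 0.9 threshold, the feature names, and the helper `drop_correlated` are illustrative choices, not values taken from the project.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical features: "perimeter" is nearly a multiple of "radius".
columns = {
    "radius": [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [2.1, 4.0, 6.2, 8.1, 9.9],
    "texture": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(columns)
```

The greedy pass depends on column order: the first feature of a correlated pair is the one that survives.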
Logistic Regression
The logistic regression model is implemented as a baseline model for cancer detection. It is a traditional machine learning method that works well for binary classification tasks, where the outcome is either malignant or benign.
- Model: Logistic Regression is used to identify patterns and predict the likelihood of cancer malignancy.
- Training: The model is trained on the preprocessed data using standard optimization techniques.
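To make the baseline concrete, here is a minimal logistic regression trained with batch gradient descent, written with the standard library only. It is a sketch under assumptions: the README does not say which optimizer or library the project uses for this model, and the learning rate, epoch count, and toy data are arbitrary.

```python
from math import exp

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, features):
    """Estimated probability that the sample belongs to class 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, target in zip(X, y):
            error = predict_proba(weights, bias, features) - target
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - lr * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n_samples
    return weights, bias

# Toy one-feature problem: label 1 ("malignant") for positive values.
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
weights, bias = train_logistic_regression(X, y)
```

A sample is then classified as malignant when `predict_proba` returns at least 0.5, or at whatever threshold suits the application.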
Neural Network
The neural network implementation is a more advanced approach using TensorFlow/Keras. It offers the flexibility to handle complex relationships in the data and can outperform traditional models in certain scenarios.
- Model: A simple neural network with one hidden layer is implemented using Keras.
- Activation Function: ReLU is used in the hidden layer, and a sigmoid activation is used for the output layer.
- Training: The model is trained on standardized data, using an optimizer such as Adam and binary cross-entropy as the loss function.
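The project itself builds this network with Keras. The standard-library sketch below is not that code; it only illustrates the arithmetic of one forward pass through a ReLU hidden layer and a sigmoid output, followed by the binary cross-entropy loss. The layer sizes and weight values are arbitrary, and the helpers `dense`, `forward`, and `binary_cross_entropy` are hypothetical.

```python
from math import exp, log

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(features, params):
    """Input -> hidden layer (ReLU) -> single output unit (sigmoid)."""
    hidden = dense(features, params["W1"], params["b1"], relu)
    (probability,) = dense(hidden, params["W2"], params["b2"], sigmoid)
    return probability

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Loss for one sample; probabilities are clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * log(p) + (1 - y_true) * log(1.0 - p))

# Arbitrary illustrative parameters: 2 inputs, 2 hidden units, 1 output.
params = {
    "W1": [[0.5, -0.25], [-1.0, 0.75]],
    "b1": [0.0, 0.1],
    "W2": [[1.0, -1.0]],
    "b2": [0.0],
}
prob = forward([1.0, 2.0], params)
loss = binary_cross_entropy(1, prob)
```

Training repeats this forward pass over the data and lets the optimizer adjust `W1`, `b1`, `W2`, and `b2` to lower the average loss.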
Evaluation Metrics
We used the following metrics to evaluate model performance:
- Accuracy: Measures overall correct predictions.
- Precision, Recall, and F1-Score: Essential for handling class imbalance and understanding true/false positive rates.
- ROC-AUC Score: Assesses model performance across all classification thresholds.
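The metrics above follow directly from the confusion-matrix counts and from how positives rank against negatives. The sketch below computes them with the standard library; the labels and scores are made-up toy values, and the 0.5 decision threshold is an assumption.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Toy labels and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise ROC-AUC shown here is quadratic in the number of samples, which is fine for illustration; library implementations use a sorted, rank-based computation instead.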
Running the Project
This project can be run in two main environments:
- Google Colab: Open the notebook in Colab, upload data.csv, and run the cells.
- Local Setup: Clone the repo, install dependencies, and execute the notebook or script.
Example Usage
from cancer_detection import CancerDetection
# Initialize and run model training for both Logistic Regression and Neural Network
logistic_model = CancerDetection(model_type='logistic_regression')
logistic_model.train_and_evaluate()
neural_network_model = CancerDetection(model_type='neural_network')
neural_network_model.train_and_evaluate()
Project Structure
cancer-detection/
│
├── data/ # Data files (data.csv)
├── notebooks/ # Jupyter notebooks for experimentation and model development
│ ├── cancer_detection.ipynb # Main notebook file
│
├── scripts/ # Python scripts for modularity
│ ├── preprocess.py # Preprocessing functions
│ ├── model_training.py # Model training functions
│ ├── logistic_regression.py # Logistic regression model code
│ ├── neural_network.py # Neural network model code
│
├── README.md # Project README
├── requirements.txt # Python dependencies
└── LICENSE # License for the project
Future Directions
This project provides a foundational model for cancer detection, but several potential enhancements are possible:
- Integrate Deep Learning Models: Experiment with more complex deep learning models such as Convolutional Neural Networks (CNNs) for image-based cancer datasets.
- Add Cross-Validation: Further ensure model robustness by adding cross-validation techniques.
- Explore Transfer Learning: Use pre-trained models for cancer detection with more advanced techniques.
- Incorporate Real-World Testing: Test with a larger, more diverse dataset to improve generalizability.
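As one possible starting point for the cross-validation item above, the sketch below generates k-fold train/test index splits with the standard library. The helper `k_fold_indices`, the fold count, and the seed are illustrative assumptions; the project does not currently include this.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(10, k=5))
```

Each model would then be trained on `train` and scored on `test` for every split, with the per-fold scores averaged. For an imbalanced dataset, a stratified split that preserves the class ratio in each fold is usually preferable.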