https://github.com/kientech/breast-cancer-classification-using-machine-learning
The goal is to leverage data-driven models to assist in the early detection and diagnosis of breast cancer, potentially aiding healthcare professionals in making informed decisions
https://github.com/kientech/breast-cancer-classification-using-machine-learning
data-science machine-learning numpy pandas python3 scikitlearn-machine-learning seaborn
Last synced: 10 months ago
JSON representation
The goal is to leverage data-driven models to assist in the early detection and diagnosis of breast cancer, potentially aiding healthcare professionals in making informed decisions
- Host: GitHub
- URL: https://github.com/kientech/breast-cancer-classification-using-machine-learning
- Owner: kientech
- Created: 2024-08-25T06:05:18.000Z (over 1 year ago)
- Default Branch: master
- Last Pushed: 2024-08-26T03:32:25.000Z (over 1 year ago)
- Last Synced: 2025-01-26T19:44:06.723Z (12 months ago)
- Topics: data-science, machine-learning, numpy, pandas, python3, scikitlearn-machine-learning, seaborn
- Language: Jupyter Notebook
- Homepage:
- Size: 234 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# Breast Cancer Classification Using Machine Learning
This repository contains a machine learning project focused on classifying breast cancer as either malignant or benign using various machine learning algorithms. The goal is to leverage data-driven models to assist in the early detection and diagnosis of breast cancer, potentially aiding healthcare professionals in making informed decisions.
## Overview
Breast cancer is one of the most common cancers among women worldwide. Early detection and accurate diagnosis are crucial for effective treatment. In this project, we use machine learning techniques to classify breast cancer tumors based on features extracted from breast tissue samples.
## Project Workflow
1. **Data Collection**:
- The dataset used is the [Breast Cancer Wisconsin (Diagnostic) Dataset](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)), which contains features computed from digitized images of fine needle aspirate (FNA) of breast masses.
- The dataset includes features such as radius, texture, perimeter, area, smoothness, compactness, and more.
2. **Data Preprocessing**:
- Handling missing data, if any.
- Feature scaling to ensure that all features contribute equally to the model.
- Splitting the data into training and testing sets.
3. **Exploratory Data Analysis (EDA)**:
- Visualizing the distribution of features.
- Understanding the correlation between different features.
- Identifying any patterns or insights that could help in the classification task.
4. **Model Selection**:
- Implementing and comparing multiple machine learning models, including:
- Logistic Regression
- Support Vector Machine (SVM)
- Random Forest
- k-Nearest Neighbors (k-NN)
- Gradient Boosting
- Using cross-validation to select the best-performing model.
5. **Model Evaluation**:
- Evaluating the models using metrics such as accuracy, precision, recall, F1-score, and AUC-ROC.
- Analyzing confusion matrices to understand the performance of the models.
- Tuning hyperparameters to optimize model performance.
6. **Deployment (Optional)**:
- Deploying the best model using a web interface or API, allowing users to input data and receive predictions.
## Getting Started
### Prerequisites
Ensure you have Python installed, along with the following libraries:
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
You can install the required libraries using:
```bash
pip install -r requirements.txt
```
### Running the Project
1. **Clone the Repository**:
```bash
git clone https://github.com/kientech/Breast-Cancer-Classification-Using-Machine-Learning
cd Breast-Cancer-Classification-Using-Machine-Learning
```
2. **Explore the Data**:
- The data is stored in the `data/` directory.
- Use the Jupyter notebooks or scripts in the `notebooks/` or `scripts/` directory to explore and preprocess the data.
3. **Train the Models**:
- Run the `model_training.ipynb` notebook to train and evaluate the models.
- The best model will be saved in the `models/` directory.
4. **Make Predictions**:
- Use the trained model to make predictions on new data.
## Results
- The best-performing model achieved an accuracy of **XX%**, with a precision of **XX%**, recall of **XX%**, and an AUC-ROC score of **XX**.
- The model is effective at distinguishing between malignant and benign tumors, providing a valuable tool for medical diagnosis.
## Contributions
Contributions are welcome! If you have any suggestions, improvements, or new features to add, feel free to open a pull request or an issue.