https://github.com/3rd-son/knn-breast-cancer-prediction-model-
A breast cancer prediction model using KNN with an accuracy of 96%
- Host: GitHub
- URL: https://github.com/3rd-son/knn-breast-cancer-prediction-model-
- Owner: 3rd-Son
- Created: 2023-06-17T13:09:51.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-06-17T13:28:33.000Z (over 1 year ago)
- Last Synced: 2023-08-17T16:57:56.411Z (over 1 year ago)
- Topics: jupyter-notebook, knn-classifier, matplotlib, numpy, pandas, python, scikit-learn, scipy, search
- Language: Jupyter Notebook
- Homepage:
- Size: 132 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Breast Cancer Diagnosis Model
Overview
This machine learning model predicts whether a breast mass is malignant or benign based on features computed from a digitized image of a fine needle aspirate (FNA) of the mass.
Dataset
The dataset used for training and evaluation is the "Breast Cancer Wisconsin (Diagnostic)" dataset available from the UCI Machine Learning Repository.
Dataset URL: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29
The dataset contains the following attributes:
- ID number
- Diagnosis (M = malignant, B = benign)
- Ten real-valued features computed for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension.
- The mean, standard error, and "worst" value (the mean of the three largest values) of each of these features were computed for each image, giving 30 features in total.
The dataset contains a total of 569 instances, with 357 benign and 212 malignant samples.
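The attribute layout and class balance described above can be checked with a few lines of arithmetic. This is a minimal sketch: the column names are hypothetical labels built from the ten measurements and three statistics (the original data file may name or order its columns differently), and the baseline is the accuracy a classifier would get by always predicting the larger class.

```python
# Ten base measurements listed in the dataset description.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]
# Three summary statistics computed per image for each measurement.
STATISTICS = ["mean", "se", "worst"]

# Hypothetical column names (assumption): all means first, then standard
# errors, then "worst" values.
feature_names = [f"{base}_{stat}" for stat in STATISTICS for base in BASE_FEATURES]

# Class counts as stated in the README.
N_BENIGN, N_MALIGNANT = 357, 212
n_total = N_BENIGN + N_MALIGNANT

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(N_BENIGN, N_MALIGNANT) / n_total

print(len(feature_names))            # number of feature columns
print(n_total)                       # number of instances
print(f"{majority_baseline:.3f}")    # majority-class baseline accuracy
```

The baseline works out to roughly 62.7%, which is the figure any reported accuracy should be compared against.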
Model Architecture
The model uses the k-nearest neighbors (KNN) algorithm for classification. KNN is a non-parametric method that classifies a new instance by its similarity to the training instances. The model computes the distance from the new instance to every training instance in the feature space, selects the k closest ones, and assigns the majority class label among those neighbors.
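The voting rule can be illustrated with a small from-scratch sketch. This is not the repository's implementation (which relies on scikit-learn); the function name `knn_predict`, the Euclidean distance metric, and the toy two-feature data are assumptions made for illustration only.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training instance.
    distances = [
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    ]
    # Keep the k closest instances.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority class label among those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-feature data, made up for illustration (not taken from the dataset).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(train_X, train_y, (1.1, 1.0), k=3))  # near the "B" cluster
print(knn_predict(train_X, train_y, (5.1, 5.0), k=3))  # near the "M" cluster
```

Because distances drive the vote, features on very different scales (area versus smoothness, for example) can dominate the result unless they are rescaled first.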
Evaluation
The model was trained and evaluated using standard machine learning practices, including train-test split, hyperparameter tuning, and cross-validation. The evaluation metrics used for assessing the model performance include accuracy, precision, recall, and F1-score.
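As a reminder of how these four metrics relate, the sketch below computes them from true and predicted labels in plain Python. The helper name `binary_metrics`, the choice of malignant ("M") as the positive class, and the eight toy labels are assumptions for illustration; they are not results from the repository.

```python
def binary_metrics(y_true, y_pred, positive="M"):
    """Return accuracy, precision, recall, and F1 for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, made up for illustration: one missed malignant case, one false alarm.
y_true = ["M", "M", "M", "B", "B", "B", "B", "B"]
y_pred = ["M", "M", "B", "B", "B", "B", "B", "M"]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a diagnostic setting, recall on the malignant class is often the metric to watch, since a missed malignant case is usually costlier than a false alarm.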
During evaluation, the model achieved an accuracy of 96% on the test set.
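A workflow of this kind could look roughly like the following sketch. It is not the notebook's actual code: it assumes scikit-learn's bundled copy of the dataset (`load_breast_cancer`), an 80/20 stratified split, feature standardization, and a small grid over `n_neighbors`, none of which the README specifies, so the numbers it prints may differ from the reported figure.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# scikit-learn's bundled copy of the dataset: 569 rows, 30 features.
# In this copy the target is encoded 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a stratified test set (split ratio and seed are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling is an assumption; it keeps large-valued features such as area
# from dominating the distance computation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Tune k with 5-fold cross-validation on the training set only.
search = GridSearchCV(
    pipeline,
    param_grid={"knn__n_neighbors": [3, 5, 7, 9, 11]},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)

print("best k:", search.best_params_["knn__n_neighbors"])
print(f"test accuracy: {test_accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

Putting the scaler inside the pipeline means it is refitted on each cross-validation fold, so no information from the validation folds or the test set leaks into preprocessing.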
Usage
To use the trained model for prediction, follow these steps:
- Ensure that you have Python and the required dependencies installed.
- Load the trained model into your Python environment.
- Provide input data with the ten real-valued features for the cell nucleus, in the same order that was used during training.
- Call the appropriate method to obtain the predicted diagnosis (malignant or benign).
Example code:
import pickle
import numpy as np
# Load the trained model. KNeighborsClassifier has no load_model method, so this
# assumes the fitted model was saved with pickle (scikit-learn must be installed).
with open('trained_model.pkl', 'rb') as f:
    model = pickle.load(f)
# Provide input data: replace these placeholder names with the measured values
input_data = np.array([[radius, texture, perimeter, area, smoothness, compactness, concavity, concave_points, symmetry, fractal_dimension]])
# Make predictions
predictions = model.predict(input_data)
print(predictions)
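The example expects a file named trained_model.pkl to exist already. One way such a file might be produced is sketched below, under several assumptions that the README does not confirm: the model is fitted on the ten "mean" features only (the first ten columns of scikit-learn's bundled copy of the dataset), uses k = 5, and is serialized with the standard-library `pickle` module. The temporary-directory path is also just for illustration.

```python
import os
import pickle
import tempfile

from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier

# Bundled copy of the dataset; its first ten columns are the "mean" features
# (mean radius, mean texture, ..., mean fractal dimension).
X, y = load_breast_cancer(return_X_y=True)
X_mean = X[:, :10]

# Fit a KNN classifier on those ten features (assumption: k = 5).
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_mean, y)

# Save the fitted model to disk with pickle.
model_path = os.path.join(tempfile.gettempdir(), "trained_model.pkl")
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back and predict for one row; 0 = malignant, 1 = benign in this copy.
with open(model_path, "rb") as f:
    restored = pickle.load(f)

print(restored.predict(X_mean[:1]))
```

Pickle files should only be loaded from trusted sources, and a model is best unpickled with the same scikit-learn version that saved it.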