
https://github.com/3rd-son/knn-breast-cancer-prediction-model-

A breast cancer prediction model using KNN with an accuracy of 96%

jupyter-notebook knn-classifier matplotlib numpy pandas python scikit-learn scipy search


Breast Cancer Diagnosis Model

Overview


This machine learning model predicts whether a breast mass is malignant or benign based on features of its cell nuclei, computed from a digitized image of a fine needle aspirate (FNA) of the mass.

Dataset


The dataset used for training and evaluation is the "Breast Cancer Wisconsin (Diagnostic)" dataset available from the UCI Machine Learning Repository.


Dataset URL: https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29


The dataset contains the following attributes:



  1. ID number

  2. Diagnosis (M = malignant, B = benign)

  3. Ten real-valued features computed for each cell nucleus: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension.

  4. The mean, standard error, and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features.


The dataset contains a total of 569 instances, with 357 benign and 212 malignant samples.
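The same dataset ships with scikit-learn as load_breast_cancer, so the counts above can be checked without downloading anything. This is a minimal sketch that assumes scikit-learn is installed. In this copy the ID column is omitted, and the target is encoded as 0 = malignant and 1 = benign.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

# scikit-learn bundles a copy of the UCI Breast Cancer Wisconsin (Diagnostic) data.
# X holds the 30 numeric features, y the diagnosis (0 = malignant, 1 = benign).
data = load_breast_cancer()
X, y = data.data, data.target

# Count samples per class, keyed by the human-readable class name.
counts = {str(name): int(n) for name, n in zip(data.target_names, np.bincount(y))}

# The 30 columns are the mean, standard error ("... error") and worst value
# of each of the ten nucleus features.
mean_cols = [n for n in data.feature_names if n.startswith("mean ")]
se_cols = [n for n in data.feature_names if n.endswith(" error")]
worst_cols = [n for n in data.feature_names if n.startswith("worst ")]

print(X.shape)                                        # (569, 30)
print(counts)                                         # {'malignant': 212, 'benign': 357}
print(len(mean_cols), len(se_cols), len(worst_cols))  # 10 10 10
```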

Model Architecture


The model uses the k-nearest neighbors (KNN) algorithm for classification. KNN is a non-parametric method that classifies a new instance by its similarity to the training instances: it computes the distance from the new instance to every training instance in the feature space, selects the k closest ones, and assigns the majority class label among those neighbors.
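To make the procedure concrete, here is a minimal from-scratch sketch using NumPy. It assumes Euclidean distance and a plain majority vote, which match the defaults of scikit-learn's KNeighborsClassifier (Minkowski metric with p=2, uniform weights). The function name knn_predict and the toy data are hypothetical and are not taken from the repository.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    """Classify each row of X_query by majority vote among its k nearest
    training rows, using Euclidean distance."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    y_train = np.asarray(y_train)
    predictions = []
    for x in X_query:
        # Distance from the query point to every training instance
        distances = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k smallest distances, closest first
        nearest = np.argsort(distances)[:k]
        # Majority label among the k neighbors; on a tie, Counter keeps the
        # label it saw first, i.e. the one belonging to the closer neighbor
        label, _ = Counter(y_train[nearest].tolist()).most_common(1)[0]
        predictions.append(label)
    return np.array(predictions)


# Toy example: two well-separated clusters labeled B and M
X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = ["B", "B", "B", "M", "M", "M"]
preds = knn_predict(X_train, y_train, [[0.5, 0.5], [5.5, 5.5]], k=3)
print(preds)  # ['B' 'M']
```

Because the vote depends on raw distances, features with large numeric ranges (such as area) can dominate the others unless the features are scaled first.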

Evaluation


The model was trained and evaluated using standard machine learning practices, including a train-test split, hyperparameter tuning, and cross-validation. Performance was assessed with accuracy, precision, recall, and F1-score.


During evaluation, the model achieved an accuracy of 96% on the test set.
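The README does not state the split ratio, the values of k that were tried, or whether the features were scaled. The sketch below shows one plausible way to combine a train-test split, cross-validated tuning of k, and the listed metrics with scikit-learn. The 80/20 stratified split, the odd k values from 1 to 21, 5-fold cross-validation, and StandardScaler are all assumptions, and the resulting accuracy will not necessarily match the reported 96%.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the data as a test set; stratify to keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then classify with KNN (the scaling step is an assumption).
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Tune k with 5-fold cross-validation on the training set only.
search = GridSearchCV(
    pipeline,
    param_grid={"kneighborsclassifier__n_neighbors": list(range(1, 22, 2))},
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the refitted best model once on the held-out test set.
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print("best k:", search.best_params_["kneighborsclassifier__n_neighbors"])
print(f"test accuracy: {test_accuracy:.3f}")
# Precision, recall and F1-score per class (0 = malignant, 1 = benign)
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```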

Usage


To use the trained model for prediction, follow these steps:



  1. Ensure that you have Python and the required dependencies installed.

  2. Load the trained model into your Python environment.

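  3. Provide input data containing the same features, in the same order, that the model was trained on. The dataset supplies 30 values per sample: the mean, standard error, and worst value of each of the ten cell-nucleus features.

  4. Call the model's predict() method to obtain the predicted diagnosis (malignant or benign).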


Example code:


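import pickle

import numpy as np

# Load the trained model. KNeighborsClassifier has no load_model() method;
# this assumes 'trained_model.pkl' holds a fitted scikit-learn estimator
# that was saved with pickle.dump, so it is restored with pickle.load.
with open('trained_model.pkl', 'rb') as f:
    model = pickle.load(f)

# Provide input data: a 2-D array with one row per sample. The columns must be
# the same features, in the same order, that the model was trained on.
# feature_values is a placeholder for the list of measurements you supply.
input_data = np.array([feature_values])

# Make predictions (one predicted label per input row)
predictions = model.predict(input_data)

print(predictions)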
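The usage example expects a file named trained_model.pkl, but the README does not show how that file was produced. The sketch below is one way to create a compatible file. It assumes the model is fitted on all 30 features from scikit-learn's copy of the dataset and wrapped in a pipeline with StandardScaler, so the saved object accepts raw feature values; both choices are assumptions, not the author's documented setup. It then reloads the file with pickle.load, as the usage example does.

```python
import pickle

from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Bundle scaling and KNN so the saved object expects raw feature values.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)

# Save the fitted pipeline under the file name used in the usage example.
with open("trained_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Reload it the same way the usage example does.
with open("trained_model.pkl", "rb") as f:
    loaded = pickle.load(f)

# Predict on one sample that has all 30 features (shape (1, 30)).
sample = X[:1]
prediction = loaded.predict(sample)
print(prediction)  # 0 = malignant, 1 = benign in this copy of the data
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.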