Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/praatibhsurana/breast-cancer-prediction-svm
An SVM classifier written in Python with scikit-learn that classifies a patient's tumor as malignant or benign.
- Host: GitHub
- URL: https://github.com/praatibhsurana/breast-cancer-prediction-svm
- Owner: praatibhsurana
- Created: 2020-10-26T10:50:23.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2020-10-28T07:57:10.000Z (over 4 years ago)
- Last Synced: 2024-01-15T01:19:16.771Z (about 1 year ago)
- Topics: kaggle-dataset, linear-classifier, machine-learning-algorithms, python, scikit-learn, svm-classifier
- Language: Python
- Homepage:
- Size: 168 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
## Brief
The project uses the Breast Cancer Wisconsin (Diagnostic) dataset, compiled for research and available from the [UCI ML Repository](https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Diagnostic%29) and on [Kaggle](https://www.kaggle.com/uciml/breast-cancer-wisconsin-data).

### Attribute Information
1) ID number
2) Diagnosis (M = malignant, B = benign)
3-32) Ten real-valued features are computed for each cell nucleus:
a) radius (mean of distances from center to points on the perimeter)
b) texture (standard deviation of gray-scale values)
c) perimeter
d) area
e) smoothness (local variation in radius lengths)
f) compactness (perimeter^2 / area - 1.0)
g) concavity (severity of concave portions of the contour)
h) concave points (number of concave portions of the contour)
i) symmetry
j) fractal dimension ("coastline approximation" - 1)

The mean, standard error and "worst" or largest (mean of the three largest values) of these features were computed for each image, resulting in 30 features. For instance, field 3 is Mean Radius, field 13 is Radius SE, and field 23 is Worst Radius.

All feature values are recoded with four significant digits.
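The field numbering follows a fixed layout of ten features times three statistics. The pure-Python sketch below reproduces the examples above; the helpers `field_name` and `field_number` are hypothetical and not part of the project. Subtracting 3 from a field number gives the column index in a 30-column feature matrix, assuming the ID and diagnosis columns have been dropped. scikit-learn's bundled copy of the data (`load_breast_cancer`) should follow the same mean, error, worst column order.

```python
# The ten per-nucleus features, in the order listed above (a-j).
FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal dimension",
]
# The three statistics computed per image, in field order.
STATISTICS = ["mean", "SE", "worst"]

FIRST_FEATURE_FIELD = 3  # fields 1 and 2 are the ID number and diagnosis
LAST_FEATURE_FIELD = FIRST_FEATURE_FIELD + len(STATISTICS) * len(FEATURES) - 1  # 32


def field_name(field: int) -> str:
    """Return a readable name for a 1-based feature field number (3-32)."""
    if not FIRST_FEATURE_FIELD <= field <= LAST_FEATURE_FIELD:
        raise ValueError(f"field {field} is not a feature field (3-32)")
    # Each statistic occupies a block of ten consecutive fields.
    stat, feature = divmod(field - FIRST_FEATURE_FIELD, len(FEATURES))
    return f"{FEATURES[feature]} ({STATISTICS[stat]})"


def field_number(statistic: str, feature: str) -> int:
    """Inverse of field_name, e.g. ("worst", "radius") -> 23."""
    return (FIRST_FEATURE_FIELD
            + STATISTICS.index(statistic) * len(FEATURES)
            + FEATURES.index(feature))


if __name__ == "__main__":
    for f in (3, 13, 23):
        print(f, field_name(f))
```

Running the script prints the three examples from the text: field 3 maps to radius (mean), field 13 to radius (SE), and field 23 to radius (worst).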
Missing attribute values: none

Class distribution: 357 benign, 212 malignant

### Correlation Heatmap of the various parameters after basic EDA
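The README does not show how the heatmap was produced. The minimal sketch below computes the Pearson correlation matrix with NumPy and lists strongly correlated feature pairs. It assumes scikit-learn's bundled copy of the same UCI dataset (`load_breast_cancer`) rather than the Kaggle CSV. The helper name `top_correlated_pairs` and the 0.9 threshold are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer


def top_correlated_pairs(threshold: float = 0.9):
    """Return the 30x30 correlation matrix and the feature pairs with |r| >= threshold."""
    data = load_breast_cancer()
    X, names = data.data, data.feature_names
    # rowvar=False: each column (feature) is a variable, each row a sample.
    corr = np.corrcoef(X, rowvar=False)
    pairs = []
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only, skipping the diagonal
            if abs(corr[i, j]) >= threshold:
                pairs.append((str(names[i]), str(names[j]), float(corr[i, j])))
    # Strongest correlations first.
    pairs.sort(key=lambda p: abs(p[2]), reverse=True)
    return corr, pairs


if __name__ == "__main__":
    corr, pairs = top_correlated_pairs()
    for a, b, r in pairs[:5]:
        print(f"{a:>25} ~ {b:<25} r = {r:.3f}")
```

The returned matrix can be drawn with, for example, seaborn's `heatmap` or matplotlib's `imshow`. Radius, perimeter and area are geometrically related, so those features are expected to show near-perfect correlations. Highly correlated groups like this are one reason to prune features before modelling.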
![Correlation Heatmap](https://github.com/praatibhsurana/Breast-Cancer-Prediction-SVM/blob/master/corr_heatmap.png?raw=true)

### Model
An SVM classifier was used. After preprocessing and EDA, the 26 features that most affected the prediction were selected. Tuning the regularization parameter C and using an RBF kernel gave better results than a linear kernel.
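The repository's exact pipeline is not shown here. The minimal sketch below follows the setup described above, with these assumptions:

- It uses scikit-learn's bundled copy of the dataset (`load_breast_cancer`).
- Features are standardized before selection.
- `SelectKBest` with the ANOVA F-test stands in for the unspecified feature-selection step.
- `build_model` and `compare_kernels` are hypothetical helper names.

Its cross-validated scores will not necessarily match the ones reported in this README.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(kernel: str = "rbf", C: float = 1.0, k: int = 26):
    """Scale, keep the k highest-scoring features, then fit an SVM.

    SelectKBest with the ANOVA F-test is an assumed stand-in; the README
    does not say how its 26 features were chosen.
    """
    return make_pipeline(
        StandardScaler(),                        # SVMs are sensitive to feature scale
        SelectKBest(score_func=f_classif, k=k),  # univariate feature selection
        SVC(kernel=kernel, C=C),
    )


def compare_kernels(C: float = 1.0, cv: int = 5) -> dict:
    """Mean cross-validated accuracy for a linear and an RBF kernel."""
    X, y = load_breast_cancer(return_X_y=True)
    return {
        kernel: cross_val_score(build_model(kernel, C), X, y, cv=cv).mean()
        for kernel in ("linear", "rbf")
    }


if __name__ == "__main__":
    print(compare_kernels())
```

Feature selection sits inside the pipeline so that it is refit on each training fold. This keeps information from the validation fold out of the selected feature set.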
The scores obtained were as follows:
1) Accuracy = 0.93
2) Precision = 0.95
3) Recall = 0.74
4) F1-Score = 0.83

The scores could be improved with further analysis, by experimenting with other kernels, and by tuning the C and gamma parameters.
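One way to follow that suggestion is a cross-validated grid search over C and gamma, followed by computing the four reported metrics on a held-out split. The sketch below makes these assumptions:

- It uses scikit-learn's bundled dataset.
- It treats malignant as the positive class. The README does not say which class its precision and recall refer to.
- It uses an arbitrary log-spaced grid.
- `tune_and_score` is a hypothetical helper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def tune_and_score(random_state: int = 0) -> dict:
    X, y = load_breast_cancer(return_X_y=True)
    # In scikit-learn's copy, target 0 = malignant and 1 = benign; relabel
    # so malignant is the positive class for precision, recall and F1.
    y = (y == 0).astype(int)
    # A stratified split keeps the benign/malignant ratio in both halves.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)

    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    # Grid over C and gamma; make_pipeline names the SVC step "svc",
    # so its parameters are addressed as "svc__<param>".
    param_grid = {
        "svc__C": [0.1, 1, 10, 100],
        "svc__gamma": ["scale", 0.001, 0.01, 0.1],
    }
    search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)

    # Evaluate the refit best model once on the untouched test split.
    y_pred = search.predict(X_test)
    return {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }


if __name__ == "__main__":
    print(tune_and_score())
```

Scoring the search on F1 rather than accuracy is one reasonable choice when missing malignant cases (low recall) matters. The reported numbers are also internally consistent: F1 = 2PR / (P + R) = 2 × 0.95 × 0.74 / (0.95 + 0.74) ≈ 0.83.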