https://github.com/AFAgarap/wisconsin-breast-cancer

[ICMLSC 2018] On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset
binary-classification classification linear-regression logistic-regression machine-learning machine-learning-algorithms multilayer-perceptron nearest-neighbours-classifier recurrent-neural-network scikit-learn softmax-regression supervised-learning support-vector-machine tensorflow

Host: GitHub
URL: https://github.com/AFAgarap/wisconsin-breast-cancer
Owner: AFAgarap
License: apache-2.0
Archived: true
Created: 2017-09-11T15:38:48.000Z (almost 8 years ago)
Default Branch: master
Last Pushed: 2023-03-24T23:54:18.000Z (over 2 years ago)
Last Synced: 2024-08-04T10:01:23.749Z (11 months ago)
Topics: binary-classification, classification, linear-regression, logistic-regression, machine-learning, machine-learning-algorithms, multilayer-perceptron, nearest-neighbours-classifier, recurrent-neural-network, scikit-learn, softmax-regression, supervised-learning, support-vector-machine, tensorflow
Language: Python
Homepage: https://arxiv.org/pdf/1711.07831.pdf
Size: 7.57 MB
Stars: 55
Watchers: 6
Forks: 26
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-ai-cancer - AFAgarap/wisconsin-breast-cancer - Codebase for *On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset [ICMLSC 2018 / arXiv 1711.07831]* (Code / Repositories)

README

On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset
===

![](https://img.shields.io/badge/DOI-cs.LG%2F1711.07831-blue.svg)

[![DOI](https://zenodo.org/badge/103154598.svg)](https://zenodo.org/badge/latestdoi/103154598)

![](https://img.shields.io/badge/license-Apache--2.0-blue.svg)

[![PyPI](https://img.shields.io/pypi/pyversions/Django.svg)]()

*Note*: This repository is retired and will not be ported to use TF2. However, you may use this as a reference in doing so.

*This paper was presented at the 2nd International Conference on Machine Learning and Soft Computing (ICMLSC) in Phu Quoc Island, Vietnam, on February 2-4, 2018.*

The full paper on this project may be read at [arXiv.org](http://arxiv.org/abs/1711.07831).

## Abstract

This paper presents a comparison of six machine learning (ML) algorithms:
GRU-SVM [4], Linear Regression, Multilayer Perceptron (MLP),
Nearest Neighbor (NN) search, Softmax Regression, and Support Vector Machine (SVM) on the Wisconsin Diagnostic Breast
Cancer (WDBC) dataset [22]
by measuring their classification test accuracy and their sensitivity and specificity values. The dataset consists
of features which were computed from digitized images of FNA tests on a breast mass [22]. For the implementation of
the ML algorithms, the dataset was partitioned as follows: 70% for the training phase and 30% for the
testing phase. The hyper-parameters used for all the classifiers were manually assigned. Results show that all the
presented ML algorithms performed well (all exceeded 90% test accuracy) on the classification task. The MLP algorithm
stands out among the implemented algorithms with a test accuracy of ~99.04%. Lastly, the results are comparable
with the findings of the related studies [18, 23].
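The 70/30 partition described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function name `split_indices`, the seed, and the choice to round the test share up are all assumptions. With the 569 samples in WDBC, rounding up gives 171 test samples, which agrees with the data point count listed for the nearest neighbor runs in Table 1.

```python
import math
import random


def split_indices(n_samples, test_fraction=0.3, seed=0):
    """Shuffle sample indices and split them into train and test lists.

    Hypothetical helper for illustration; the seed and the rounding
    rule are assumptions, not taken from the repository.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round the test share up, so 30% of 569 samples becomes 171.
    n_test = math.ceil(test_fraction * n_samples)
    return indices[n_test:], indices[:n_test]


# WDBC contains 569 samples.
train_idx, test_idx = split_indices(569)
print(len(train_idx), len(test_idx))  # 398 171
```

Shuffling indices rather than the feature rows keeps the features and labels aligned when both are later selected with the same index lists.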

## Citation

To cite the paper, kindly use the following BibTeX entry:

```
@inproceedings{Agarap:2018:BCD:3184066.3184080,
 author = {Agarap, Abien Fred M.},
 title = {On Breast Cancer Detection: An Application of Machine Learning Algorithms on the Wisconsin Diagnostic Dataset},
 booktitle = {Proceedings of the 2Nd International Conference on Machine Learning and Soft Computing},
 series = {ICMLSC '18},
 year = {2018},
 isbn = {978-1-4503-6336-5},
 location = {Phu Quoc Island, Viet Nam},
 pages = {5--9},
 numpages = {5},
 url = {http://doi.acm.org/10.1145/3184066.3184080},
 doi = {10.1145/3184066.3184080},
 acmid = {3184080},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {artificial intelligence, artificial neural networks, classification, linear regression, machine learning, multilayer perceptron, nearest neighbors, softmax regression, supervised learning, support vector machine, wisconsin diagnostic breast cancer dataset},
}
```

To cite the repository/software, kindly use the following BibTeX entry:

```
@misc{abien_fred_agarap_2017_1098533,
  author       = {Abien Fred Agarap},
  title        = {AFAgarap/wisconsin-breast-cancer: v0.1.0-alpha},
  month        = dec,
  year         = 2017,
  doi          = {10.5281/zenodo.1098533},
  url          = {https://doi.org/10.5281/zenodo.1098533}
}
```

## Machine Learning (ML) Algorithms

* GRU-SVM

* Linear Regression

* Multilayer Perceptron

* Nearest Neighbor

* Softmax Regression

* L2-SVM

## Results

All experiments in this study were conducted on a laptop computer with an Intel Core(TM) i5-6300HQ CPU @ 2.30GHz x 4,
16GB of DDR3 RAM, and an NVIDIA GeForce GTX 960M GPU with 4GB of GDDR5 memory.

![](results/training_accuracy.png)

**Figure 1. Training accuracy of the machine learning algorithms on breast cancer detection using WDBC.**

Figure 1 shows the training accuracy of the ML algorithms: (1) GRU-SVM finished its training in 2 minutes and 54
seconds with an average training accuracy of 90.6857639%, (2) Linear Regression finished its training in 35 seconds
with an average training accuracy of 92.8906257%, (3) MLP finished its training in 28 seconds with an average training
accuracy of 96.9286785%, (4) Softmax Regression finished its training in 25 seconds with an average training accuracy
of 97.366573%, and (5) L2-SVM finished its training in 14 seconds with an average training accuracy of 97.734375%.
No training accuracy was recorded for Nearest Neighbor search because it requires no training: the norm
equations (L1 and L2) are applied directly to the dataset to determine the “nearest neighbor” of a given data
point p_{i} ∈ p.
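The nearest neighbor procedure described above, which needs no training, can be sketched as follows. This is a minimal illustration in plain Python, not the repository's TensorFlow implementation: the function names and the toy data are hypothetical.

```python
def l1_distance(a, b):
    """L1 (Manhattan) norm of the difference between two points."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    """L2 (Euclidean) norm of the difference between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_neighbor_predict(train_x, train_y, query, distance):
    """Return the label of the stored point closest to `query`."""
    best = min(range(len(train_x)), key=lambda i: distance(train_x[i], query))
    return train_y[best]


# Toy data (hypothetical): two points of class 0 and one of class 1.
train_x = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
train_y = [0, 0, 1]

print(nearest_neighbor_predict(train_x, train_y, [4.0, 4.5], l1_distance))  # 1
print(nearest_neighbor_predict(train_x, train_y, [0.4, 0.2], l2_distance))  # 0
```

Each prediction compares the query against every stored point, so one pass over the data replaces the training loop. This matches the single epoch listed for L1-NN and L2-NN in Table 1.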




**Table 1. Summary of experiment results on the machine learning algorithms.**

|Parameter|GRU-SVM|Linear Regression|MLP|L1-NN|L2-NN|Softmax Regression|L2-SVM|
|---------|-------|-----------------|---|-----|-----|------------------|------|
|Accuracy|93.75%|96.09375%|99.038449585420729%|93.567252%|94.736844%|97.65625%|96.09375%|
|Data points|384000|384000|512896|171|171|384000|384000|
|Epochs|3000|3000|3000|1|1|3000|3000|
|FPR|16.666667%|10.204082%|1.267042%|6.25%|9.375%|5.769231%|6.382979%|
|FNR|0%|0%|0.786157%|6.542056%|2.803738%|0%|2.469136%|
|TPR|100%|100%|99.213843%|93.457944%|97.196262%|100%|97.530864%|
|TNR|83.333333%|89.795918%|98.732958%|93.75%|90.625%|94.230769%|93.617021%|

Table 1 summarizes the results of the experiment on the ML algorithms. The parameters recorded were test accuracy,
number of data points (`epochs * dataset_size`), epochs, false positive rate (FPR), false negative rate (FNR), true
positive rate (TPR), and true negative rate (TNR). All code implementations of the algorithms were written using Python
with TensorFlow as the machine intelligence library.
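The rates in Table 1 follow from the four confusion-matrix counts. Below is a minimal sketch of that arithmetic; the function name `classification_rates` is hypothetical. The source does not state the counts used in the example (100 true positives, 7 false negatives, 60 true negatives, 4 false positives). They are inferred here because they sum to the 171 test points and reproduce the percentages in the L1-NN column.

```python
def classification_rates(tp, fn, tn, fp):
    """Compute accuracy and the four rates from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "tpr": tp / (tp + fn),  # sensitivity
        "fnr": fn / (tp + fn),  # 1 - TPR
        "tnr": tn / (tn + fp),  # specificity
        "fpr": fp / (tn + fp),  # 1 - TNR
    }


# Inferred counts (an assumption, see above) for the L1-NN column.
rates = classification_rates(tp=100, fn=7, tn=60, fp=4)
for name, value in rates.items():
    print(f"{name}: {value:.6%}")
```

Since TPR + FNR and TNR + FPR each sum to 100%, any two complementary columns of the table can be cross-checked against one another.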

## License

```buildoutcfg
Copyright 2017 Abien Fred Agarap

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
