# Breast-cancer-prediction-ML-Python

![GitHub followers](https://img.shields.io/github/followers/Rishit-dagli?label=Follow&style=social)
[![Twitter](https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Fgithub.com%2FRishit-dagli%2Fpopup_box)](https://twitter.com/intent/tweet?text=Wow:&url=https://github.com/Rishit-dagli/Breast-cancer-prediction-ML-Python)
![Twitter Follow](https://img.shields.io/twitter/follow/rishit_dagli?label=Follow&style=social)

Make predictions for breast cancer, malignant or benign using the Breast Cancer data set

Dataset: Breast Cancer Wisconsin (Original) Data Set

This code demonstrates logistic regression on the dataset and uses gradient descent to minimize the binary cross-entropy (BCE) loss.

## Dataset description

![](/pictures/breast%20cancer%20description.PNG)


  1. Sample code number: id number

  2. Clump Thickness: 1 - 10

  3. Uniformity of Cell Size: 1 - 10

  4. Uniformity of Cell Shape: 1 - 10

  5. Marginal Adhesion: 1 - 10

  6. Single Epithelial Cell Size: 1 - 10

  7. Bare Nuclei: 1 - 10

  8. Bland Chromatin: 1 - 10

  9. Normal Nucleoli: 1 - 10

  10. Mitoses: 1 - 10

  11. Class: (2 for benign, 4 for malignant)
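The attribute list above can be turned into a cleaned table along the following lines. This is a minimal sketch, not the project's own preprocessing code: the column identifiers, the `load_and_clean` helper and the three sample rows are illustrative, and it assumes the raw file has no header row and marks missing values with `?`.

```python
import io

import pandas as pd

# Identifiers chosen for this sketch; they follow the attribute list above.
COLUMNS = [
    "id", "clump_thickness", "cell_size_uniformity", "cell_shape_uniformity",
    "marginal_adhesion", "single_epithelial_cell_size", "bare_nuclei",
    "bland_chromatin", "normal_nucleoli", "mitoses", "class",
]

# Illustrative rows in the assumed file layout; the last one has a missing value.
SAMPLE_CSV = """\
1000025,5,1,1,1,2,1,3,1,1,2
1017122,8,10,10,8,7,10,9,7,1,4
1057013,8,4,5,1,2,?,7,3,1,4
"""


def load_and_clean(source):
    """Read the raw file, drop incomplete rows and encode the class as 0/1."""
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    df = df.dropna()                              # remove rows with missing values
    df = df.drop(columns="id")                    # the sample id is not a feature
    df["class"] = df["class"].map({2: 0, 4: 1})   # 2 = benign -> 0, 4 = malignant -> 1
    return df.astype(int)


cleaned = load_and_clean(io.StringIO(SAMPLE_CSV))
print(cleaned)
```

Passing a file path instead of the `io.StringIO` object would read the real data file in the same way.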


## Libraries required

  1. numpy

         pip install numpy

  2. pandas

         pip install pandas

  3. random (part of the Python standard library, so no installation is needed)

  4. seaborn

         pip install seaborn



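## Logistic regression algorithm

![](/pictures/logistic_regression.gif)

  - Use the sigmoid activation function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$, so the prediction is $\hat{y} = \sigma(\theta^{T} x)$.

  - Recall the gradient descent formula for linear regression, where mean squared error was used. Mean squared error cannot be used here because combined with the sigmoid it is no longer convex, so replace it with some other error $E$.

  - Gradient descent for logistic regression: $\theta := \theta - \alpha \dfrac{\partial E}{\partial \theta}$

  - Conditions for $E$:

    1. Convex, or as convex as possible

    2. Should be a function of $\theta$

    3. Should be differentiable

  - So use entropy: $-\sum_{i} p_i \log p_i$

  - Entropy involves a single distribution, but here there are two, $\hat{y}$ and $y$, so use cross entropy instead: $-\sum_{i} y_i \log \hat{y}_i$

  - So add the two cross entropies $CE_1 = -y \log \hat{y}$ and $CE_2 = -(1 - y) \log (1 - \hat{y})$.
    We get binary cross entropy: $BCE = -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]$

  - So now our formula becomes $\theta := \theta - \alpha \dfrac{\partial BCE}{\partial \theta}$

  - Using the chain rule we obtain $\dfrac{\partial BCE}{\partial \theta} = (\hat{y} - y)\, x$

  - Now apply gradient descent with this formula, averaged over the $m$ training samples: $\theta := \theta - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right) x^{(i)}$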
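The chain-rule result above can be checked numerically. The sketch below compares the closed-form BCE gradient with a central finite-difference estimate on random data. The function names, the random data and the step size `eps` are choices made for this illustration and do not come from the project code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce(X, y, theta):
    """Mean binary cross entropy for features X, labels y and weights theta."""
    pred = sigmoid(X @ theta)
    return -np.mean(y * np.log(pred) + (1 - y) * np.log(1 - pred))


def bce_gradient(X, y, theta):
    """Closed-form gradient: the mean over samples of (y_hat - y) * x."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)


def numerical_gradient(X, y, theta, eps=1e-6):
    """Central finite differences, one coordinate of theta at a time."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (bce(X, y, theta + step) - bce(X, y, theta - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = (rng.random(50) < 0.5).astype(float)
theta = rng.normal(size=3)

max_diff = np.max(np.abs(bce_gradient(X, y, theta) - numerical_gradient(X, y, theta)))
print("largest difference between the two gradients:", max_diff)
```

If the derivation is right, the two gradients agree up to the finite-difference error, so the printed difference should be tiny.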

## Code


  1. Data preprocessing

     Load the data and remove empty values. As we are using logistic regression, replace the class labels 2 and 4 with 0 and 1.

  2. Create pairwise graphs for the features.

         import seaborn as sns

         sns.pairplot(df)

  3. Do principal component analysis for simplified learning.

  4. Convert the data to a matrix and prepend a column of ones (the bias term) to the complete data matrix. Also make a zero matrix for the initial theta.

         import numpy as np

         full_data = np.matrix(full_data)
         x0 = np.ones((full_data.shape[0], 1))
         data = np.concatenate((x0, full_data), axis=1)
         print(data.shape)

         # One weight per column except the label column
         theta = np.zeros((1, data.shape[1] - 1))
         print(theta.shape)
         print(theta)

  5. Create the train-test split.

         test_size = 0.2
         n_test = int(test_size * len(full_data))

         # The last column holds the label, the rest are features
         X_train = data[:-n_test, :-1]
         Y_train = data[:-n_test, -1]
         X_test = data[-n_test:, :-1]
         Y_test = data[-n_test:, -1]

  6. Define the sigmoid function as described above, and the BCE.

         def sigmoid(Z):
             return 1 / (1 + np.exp(-Z))

         def BCE(X, y, theta):
             pred = sigmoid(np.dot(X, theta.T))
             mcost = -np.array(y) * np.array(np.log(pred)) - np.array(1 - y) * np.array(np.log(1 - pred))
             return mcost.mean()

  7. Define the gradient descent algorithm, the learning rate and the number of epochs. Also test gradient descent with 1 iteration.

         alpha = 0.01   # learning rate (assumed value, not given in this README)
         epoch = 10000  # number of epochs (assumed value, not given in this README)

         def grad_descent(X, y, theta, alpha):
             h = sigmoid(X.dot(theta.T))
             loss = h - y
             dj = (loss.T).dot(X)
             theta -= (alpha / len(X)) * dj
             return theta

         cost = BCE(X_train, Y_train, theta)
         print("cost before: ", cost)

         theta = grad_descent(X_train, Y_train, theta, alpha)

         cost = BCE(X_train, Y_train, theta)
         print("cost after: ", cost)

  8. Define logistic regression with gradient descent.

         def logistic_reg(epoch, X, y, theta, alpha):
             for ep in range(epoch):
                 # update theta
                 theta = grad_descent(X, y, theta, alpha)

                 # report the loss every 1000 epochs
                 if (ep + 1) % 1000 == 0:
                     loss = BCE(X, y, theta)
                     print("Cost function ", loss)
             return theta

         theta = logistic_reg(epoch, X_train, Y_train, theta, alpha)

  9. Finally, test the code by comparing the loss on the training and test sets.

         print(BCE(X_train, Y_train, theta))
         print(BCE(X_test, Y_test, theta))
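The last step reports only the BCE loss. A classification accuracy is often easier to interpret, and the sketch below shows one way it could be computed. The `predict` and `accuracy` helpers, the 0.5 threshold and the toy data are assumptions of this example and are not part of the project code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(X, theta, threshold=0.5):
    """Return 0/1 labels; the 0.5 cut-off is an assumed default."""
    X = np.asarray(X, dtype=float)
    theta = np.asarray(theta, dtype=float).reshape(-1)  # accepts shape (1, n) or (n,)
    return (sigmoid(X @ theta) >= threshold).astype(int)


def accuracy(X, y, theta):
    """Fraction of samples whose predicted label matches y."""
    y = np.asarray(y).reshape(-1)
    return float(np.mean(predict(X, theta) == y))


# Toy data: the first column is the bias term, as in the steps above.
X_demo = np.array([[1.0, -2.0], [1.0, 3.0], [1.0, 0.5], [1.0, -0.5]])
y_demo = np.array([0, 1, 1, 1])
theta_demo = np.array([[0.0, 1.0]])  # shape (1, n), like theta above

print(accuracy(X_demo, y_demo, theta_demo))  # 3 of 4 correct -> 0.75
```

With the variables from the steps above, `accuracy(X_test, Y_test, theta)` would give the share of correctly classified test samples.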




Now we are done with the code 😀
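Step 3 mentions principal component analysis without showing code. The sketch below is one possible way to do it with NumPy's singular value decomposition. The `pca` helper, the choice of three components and the random demo data are assumptions of this example, since the README does not say how the project performs PCA.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its first n_components principal directions."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S ** 2 / (len(X) - 1)
    ratio = explained_variance[:n_components] / explained_variance.sum()
    return centered @ components.T, components, ratio


# Random stand-in for the nine features, each in the range 1 to 10.
rng = np.random.default_rng(0)
X_demo = rng.integers(1, 11, size=(100, 9)).astype(float)

projected, components, ratio = pca(X_demo, n_components=3)
print(projected.shape)
print("explained variance ratio:", ratio)
```

The explained variance ratio indicates how much of the spread in the data each kept component captures, which helps in choosing how many components to keep.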

## The Algorithm as a web service

### Python 3+

    import urllib.request
    import urllib.error
    import json

    data = {
        "Inputs": {
            "input1": [
                {
                    '1': "4",
                    '2': "7",
                    '3': "3",
                    '5': "5",
                    '1000025': "1002945",
                    '1 (2)': "4",
                    '1 (3)': "5",
                    '1 (4)': "10",
                    '1 (5)': "2",
                    '1 (6)': "1",
                    '2 (2)': "2",
                }
            ],
        },
        "GlobalParameters": {
        }
    }

    body = str.encode(json.dumps(data))

    url = 'https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger'
    api_key = 'abc123'  # Replace this with the API key for the web service
    headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + api_key)}

    req = urllib.request.Request(url, body, headers)

    try:
        response = urllib.request.urlopen(req)

        result = response.read()
        print(result)
    except urllib.error.HTTPError as error:
        print("The request failed with status code: " + str(error.code))

        # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
        print(error.info())
        print(json.loads(error.read().decode("utf8", 'ignore')))
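Hard-coding the API key in a script makes it easy to leak. A possible alternative is to read it from an environment variable when building the request. In the sketch below, the `build_request_parts` helper and the `AZUREML_API_KEY` variable name are hypothetical and not part of the web service or the project. The payload layout simply mirrors the sample request above.

```python
import json
import os


def build_request_parts(inputs, api_key=None):
    """Return (body, headers) for one scoring request.

    `inputs` is a dict of column name -> value, as in the sample above.
    The key falls back to the AZUREML_API_KEY environment variable,
    which is a name made up for this sketch.
    """
    if api_key is None:
        api_key = os.environ.get("AZUREML_API_KEY")
    if not api_key:
        raise ValueError("no API key: pass api_key or set AZUREML_API_KEY")

    payload = {"Inputs": {"input1": [inputs]}, "GlobalParameters": {}}
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    return body, headers


body, headers = build_request_parts({"1": "4", "2": "7"}, api_key="abc123")
print(headers["Authorization"])
```

The returned `body` and `headers` can be passed to `urllib.request.Request(url, body, headers)` in the same way as in the sample.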

### Python 2

    import urllib2
    import json

    data = {
        "Inputs": {
            "input1": [
                {
                    '1': "4",
                    '2': "7",
                    '3': "3",
                    '5': "5",
                    '1000025': "1002945",
                    '1 (2)': "4",
                    '1 (3)': "5",
                    '1 (4)': "10",
                    '1 (5)': "2",
                    '1 (6)': "1",
                    '2 (2)': "2",
                }
            ],
        },
        "GlobalParameters": {
        }
    }

    body = str.encode(json.dumps(data))

    url = 'https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger'
    api_key = 'abc123'  # Replace this with the API key for the web service
    headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + api_key)}

    req = urllib2.Request(url, body, headers)

    try:
        response = urllib2.urlopen(req)

        result = response.read()
        print(result)
    except urllib2.HTTPError, error:
        print("The request failed with status code: " + str(error.code))

        # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
        print(error.info())
        print(json.loads(error.read()))

### R

    library("RCurl")
    library("rjson")

    # Accept SSL certificates issued by public Certificate Authorities
    options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

    h = basicTextGatherer()
    hdr = basicHeaderGatherer()

    req = list(
        Inputs = list(
            "input1" = list(
                list(
                    '1' = "4",
                    '2' = "7",
                    '3' = "3",
                    '5' = "5",
                    '1000025' = "1002945",
                    '1 (2)' = "4",
                    '1 (3)' = "5",
                    '1 (4)' = "10",
                    '1 (5)' = "2",
                    '1 (6)' = "1",
                    '2 (2)' = "2"
                )
            )
        ),
        GlobalParameters = setNames(fromJSON('{}'), character(0))
    )

    body = enc2utf8(toJSON(req))
    api_key = "abc123" # Replace this with the API key for the web service
    authz_hdr = paste('Bearer', api_key, sep=' ')

    h$reset()
    curlPerform(url = "https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger",
        httpheader = c('Content-Type' = "application/json", 'Authorization' = authz_hdr),
        postfields = body,
        writefunction = h$update,
        headerfunction = hdr$update,
        verbose = TRUE
    )

    headers = hdr$value()
    httpStatus = headers["status"]
    if (httpStatus >= 400)
    {
        print(paste("The request failed with status code:", httpStatus, sep=" "))

        # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
        print(headers)
    }

    print("Result:")
    result = h$value()
    print(fromJSON(result))

### C#

    // This code requires the NuGet package Microsoft.AspNet.WebApi.Client to be installed.
    // Instructions for doing this in Visual Studio:
    // Tools -> NuGet Package Manager -> Package Manager Console
    // Install-Package Microsoft.AspNet.WebApi.Client

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net.Http;
    using System.Net.Http.Formatting;
    using System.Net.Http.Headers;
    using System.Text;
    using System.Threading.Tasks;

    namespace CallRequestResponseService
    {
        class Program
        {
            static void Main(string[] args)
            {
                InvokeRequestResponseService().Wait();
            }

            static async Task InvokeRequestResponseService()
            {
                using (var client = new HttpClient())
                {
                    var scoreRequest = new
                    {
                        Inputs = new Dictionary<string, List<Dictionary<string, string>>>() {
                            {
                                "input1",
                                new List<Dictionary<string, string>>() {
                                    new Dictionary<string, string>() {
                                        { "1", "4" },
                                        { "2", "7" },
                                        { "3", "3" },
                                        { "5", "5" },
                                        { "1000025", "1002945" },
                                        { "1 (2)", "4" },
                                        { "1 (3)", "5" },
                                        { "1 (4)", "10" },
                                        { "1 (5)", "2" },
                                        { "1 (6)", "1" },
                                        { "2 (2)", "2" },
                                    }
                                }
                            },
                        },
                        GlobalParameters = new Dictionary<string, string>() {
                        }
                    };

                    const string apiKey = "abc123"; // Replace this with the API key for the web service
                    client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
                    client.BaseAddress = new Uri("https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger");

                    // WARNING: The 'await' statement below can result in a deadlock
                    // if you are calling this code from the UI thread of an ASP.Net application.
                    // One way to address this would be to call ConfigureAwait(false)
                    // so that the execution does not attempt to resume on the original context.
                    // For instance, replace code such as:
                    //     result = await DoSomeTask()
                    // with the following:
                    //     result = await DoSomeTask().ConfigureAwait(false)

                    HttpResponseMessage response = await client.PostAsJsonAsync("", scoreRequest);

                    if (response.IsSuccessStatusCode)
                    {
                        string result = await response.Content.ReadAsStringAsync();
                        Console.WriteLine("Result: {0}", result);
                    }
                    else
                    {
                        Console.WriteLine(string.Format("The request failed with status code: {0}", response.StatusCode));

                        // Print the headers - they include the request ID and the timestamp,
                        // which are useful for debugging the failure
                        Console.WriteLine(response.Headers.ToString());

                        string responseContent = await response.Content.ReadAsStringAsync();
                        Console.WriteLine(responseContent);
                    }
                }
            }
        }
    }

## More about the project
1. My Medium article on this project: [here](https://medium.com/@rishit.dagli/create-logistic-regression-algorithm-from-scratch-and-apply-it-on-data-set-3f16ca5dbdb9)
2. My research paper on this project: [here](https://iarjset.com/papers/machine-learning-as-a-decision-aid-for-breast-cancer-diagnosis/)
3. Another must-read paper on the same topic: [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC55130/)

## Other algorithms for the same project by me
1. Multiclass Neural Networks
2. Random Forest classifier

[Project](https://gallery.azure.ai/Experiment/Breast-cancer-dataset)

## About me
Rishit Dagli

[Website](https://rishitdagli.ml)

[LinkedIn](https://www.linkedin.com/in/rishit-dagli-440113165/)