https://github.com/rishit-dagli/breast-cancer-prediction-ml-python
Make predictions for breast cancer, malignant or benign using the Breast Cancer data set
- Host: GitHub
- URL: https://github.com/rishit-dagli/breast-cancer-prediction-ml-python
- Owner: Rishit-dagli
- License: mit
- Created: 2019-08-02T12:48:57.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-02-18T08:24:26.000Z (over 6 years ago)
- Last Synced: 2025-04-08T03:51:43.200Z (about 1 year ago)
- Topics: breast-cancer-classification, breast-cancer-prediction, breast-cancer-wisconsin, logistic-regression, machine-learning, python-3
- Language: Jupyter Notebook
- Size: 8.07 MB
- Stars: 5
- Watchers: 2
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Breast-cancer-prediction-ML-Python


Predict whether a breast tumor is malignant or benign using the Breast Cancer Wisconsin (Original) Data Set.

This code demonstrates logistic regression on the dataset and uses gradient descent to lower the binary cross entropy (BCE).
## Dataset description

- Sample code number: id number
- Clump Thickness: 1 - 10
- Uniformity of Cell Size: 1 - 10
- Uniformity of Cell Shape: 1 - 10
- Marginal Adhesion: 1 - 10
- Single Epithelial Cell Size: 1 - 10
- Bare Nuclei: 1 - 10
- Bland Chromatin: 1 - 10
- Normal Nucleoli: 1 - 10
- Mitoses: 1 - 10
- Class: (2 for benign, 4 for malignant)
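As an illustration of how a file with the columns listed above might be loaded and cleaned, here is a minimal sketch. It assumes the raw file is comma-separated, has no header row, and marks missing values with `?`. The column names, the `load_clean` helper, the decision to drop the id column, and the three inline rows are all made up for the example and are not taken from the notebook.

```python
import io
import pandas as pd

# Illustrative column names; the raw file is assumed to have no header row.
COLUMNS = [
    "id", "clump_thickness", "cell_size_uniformity", "cell_shape_uniformity",
    "marginal_adhesion", "epithelial_cell_size", "bare_nuclei",
    "bland_chromatin", "normal_nucleoli", "mitoses", "class",
]

# Made-up rows in the assumed layout; pass a real file path instead in practice.
SAMPLE = io.StringIO(
    "1,5,1,1,1,2,1,3,1,1,2\n"
    "2,8,10,10,8,7,10,9,7,1,4\n"
    "3,6,8,8,1,3,?,3,7,1,2\n"
)

def load_clean(source):
    # Treat "?" as a missing value so that incomplete rows can be dropped.
    df = pd.read_csv(source, header=None, names=COLUMNS, na_values="?")
    # Dropping the id column is an assumption: it carries no predictive signal.
    df = df.dropna().drop(columns="id")
    # Map the class labels 2 (benign) and 4 (malignant) to 0 and 1.
    df["class"] = df["class"].map({2: 0, 4: 1})
    return df.astype(float)

df = load_clean(SAMPLE)
print(df.shape)
```

The third sample row contains a `?`, so it is dropped and two rows with nine features plus the label remain.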
## Libraries required

- numpy: `pip install numpy`
- pandas: `pip install pandas`
- random: part of the Python standard library, so no installation is needed
- seaborn: `pip install seaborn`
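## Logistic regression algorithm

- Use the sigmoid activation function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
- Recall the gradient descent formula for linear regression, where mean squared error was used. Mean squared error cannot be used here, because combined with the sigmoid it is no longer convex, so replace it with some error $E$.
  - Gradient descent: $\theta_j := \theta_j - \alpha \dfrac{\partial E}{\partial \theta_j}$
  - Logistic regression: $\hat{y} = \sigma(\theta^T x)$
- Conditions for $E$:
  - Convex, or as convex as possible
  - Should be a function of $\theta$
  - Should be differentiable
- So use entropy: $H = -\sum p \log p$
- Entropy alone cannot involve both $\hat{y}$ and $y$, so use cross entropy: $CE = -\sum y \log \hat{y}$
- So add two cross entropies, $CE_1 = -y \log \hat{y}$ and $CE_2 = -(1 - y) \log(1 - \hat{y})$. We get the binary cross entropy: $BCE = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$
- So now our formula becomes: $\theta_j := \theta_j - \alpha \dfrac{\partial BCE}{\partial \theta_j}$
- Using the chain rule we obtain: $\theta_j := \theta_j - \dfrac{\alpha}{m} \sum_{i=1}^{m} (\hat{y}^{(i)} - y^{(i)}) x_j^{(i)}$
- Now apply gradient descent with this formula.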
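The chain-rule step can be checked numerically. The sketch below assumes the gradient of the mean BCE is $\frac{1}{m} X^T (\sigma(X\theta) - y)$ and compares it with a central finite-difference estimate on random data. The function names, the random data, and the step size `eps` are illustrative choices, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(X, y, theta):
    # Mean binary cross entropy of the predictions sigmoid(X @ theta).
    p = sigmoid(X @ theta)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_grad(X, y, theta):
    # Chain-rule result: (1/m) * X^T (sigmoid(X theta) - y)
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=3)

# Central finite differences give an independent estimate of each partial derivative.
eps = 1e-6
numeric = np.array([
    (bce(X, y, theta + eps * e) - bce(X, y, theta - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
max_diff = np.max(np.abs(numeric - bce_grad(X, y, theta)))
print(max_diff)
```

A very small `max_diff` indicates that the analytic gradient and the numerical estimate agree.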
## Code
- Data preprocessing: load the data and remove empty values. Because we are using logistic regression, replace the class labels 2 and 4 with 0 and 1.
- Create pairwise graphs for the features.

  ```python
  import seaborn as sns
  sns.pairplot(df)  # df is the cleaned DataFrame
  ```
- Do principal component analysis for simplified learning.
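The README does not show how the principal component analysis is done. One possible approach, sketched here with NumPy's singular value decomposition, is to center the features and project them onto the leading right-singular vectors. The `pca` helper, the choice of two components, and the random stand-in data are assumptions for illustration only.

```python
import numpy as np

def pca(X, n_components):
    # Center each feature, then project onto the top right-singular vectors.
    X_centered = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(X_centered, full_matrices=False)
    # Fraction of the total variance captured by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return X_centered @ vt[:n_components].T, explained[:n_components]

rng = np.random.default_rng(0)
# Random stand-in for the nine features, each on a 1-10 scale.
X = rng.integers(1, 11, size=(50, 9)).astype(float)
Z, ratio = pca(X, n_components=2)
print(Z.shape, ratio)
```

The explained-variance ratios give a rough idea of how much information is kept when the features are reduced.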
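- Convert the data to a matrix and prepend a column of ones (the bias term) to the complete data matrix. Also create a zero matrix for the initial theta.

  ```python
  import numpy as np

  # full_data holds the feature columns followed by the label column
  full_data = np.matrix(full_data)
  # Column of ones so that the first entry of theta acts as the bias term
  x0 = np.ones((full_data.shape[0], 1))
  data = np.concatenate((x0, full_data), axis=1)
  print(data.shape)
  # One weight per column of X (bias + features); the label column is excluded
  theta = np.zeros((1, data.shape[1] - 1))
  print(theta.shape)
  print(theta)
  ```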
- Create the train-test split.

  ```python
  test_size = 0.2
  n_test = int(test_size * len(full_data))
  # The last 20% of the rows form the test set; the last column is the label
  X_train = data[:-n_test, :-1]
  Y_train = data[:-n_test, -1]
  X_test = data[-n_test:, :-1]
  Y_test = data[-n_test:, -1]
  ```
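The split takes the last 20% of the rows in file order. If the rows happen to be ordered in some way, shuffling before splitting may give a more representative test set. A minimal sketch follows, where the `shuffled_split` helper, the seed, and the toy array are hypothetical. It uses plain NumPy arrays, so the labels come back one-dimensional, unlike the `np.matrix` slices.

```python
import numpy as np

def shuffled_split(data, test_size=0.2, seed=0):
    # Shuffle the row indices, then carve off the first n_test rows as the test set.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_test = int(test_size * len(data))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train, test = data[train_idx], data[test_idx]
    # The last column is the label; everything before it is a feature.
    return train[:, :-1], train[:, -1], test[:, :-1], test[:, -1]

# Toy array: 10 rows, 4 feature columns plus 1 label column.
data = np.arange(50, dtype=float).reshape(10, 5)
X_train, Y_train, X_test, Y_test = shuffled_split(data)
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
```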
- Define the sigmoid function described above and the BCE.

  ```python
  def sigmoid(Z):
      return 1 / (1 + np.exp(-Z))

  def BCE(X, y, theta):
      pred = sigmoid(np.dot(X, theta.T))
      # Element-wise BCE terms: -y*log(pred) - (1-y)*log(1-pred)
      mcost = -np.array(y) * np.array(np.log(pred)) - np.array(1 - y) * np.array(np.log(1 - pred))
      return mcost.mean()
  ```
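One practical caveat with this loss: when a prediction saturates to exactly 0 or 1 in floating point, the logarithm becomes infinite. A common guard is to clip the predictions slightly away from both ends. The sketch below is one way to do that. The `bce_clipped` name and the `eps` value are assumptions, and it uses plain NumPy arrays with a one-dimensional theta instead of `np.matrix`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce_clipped(X, y, theta, eps=1e-12):
    # Keep predictions inside [eps, 1 - eps] so that log() never receives 0.
    pred = np.clip(sigmoid(X @ theta), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(pred) + (1 - y) * np.log(1 - pred)))

# Two rows that are confidently misclassified, so the sigmoid saturates.
X = np.array([[1.0, 50.0], [1.0, -50.0]])
y = np.array([0.0, 1.0])
theta = np.array([0.0, 1.0])
loss = bce_clipped(X, y, theta)
print(loss)
```

With clipping the loss stays large but finite, so gradient descent can keep running.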
- Define the gradient descent step, set the learning rate and the number of epochs, and test gradient descent with a single iteration.

  ```python
  def grad_descent(X, y, theta, alpha):
      h = sigmoid(X.dot(theta.T))
      loss = h - y
      dj = (loss.T).dot(X)
      # Note: this updates theta in place
      theta -= (alpha / len(X)) * dj
      return theta

  # Assumed values; the original snippet does not show them
  alpha = 0.01
  epoch = 10000

  cost = BCE(X_train, Y_train, theta)
  print("cost before: ", cost)
  theta = grad_descent(X_train, Y_train, theta, alpha)
  cost = BCE(X_train, Y_train, theta)
  print("cost after: ", cost)
  ```
- Define logistic regression with gradient descent.

  ```python
  def logistic_reg(epoch, X, y, theta, alpha):
      for ep in range(epoch):
          # update theta
          theta = grad_descent(X, y, theta, alpha)
          # report the loss every 1000 epochs
          if (ep + 1) % 1000 == 0:
              loss = BCE(X, y, theta)
              print("Cost function ", loss)
      return theta

  theta = logistic_reg(epoch, X_train, Y_train, theta, alpha)
  ```
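To see the whole training loop in isolation, here is a self-contained sketch on synthetic data. It follows the same update rule but uses plain NumPy arrays. The `train` helper, the learning rate, the epoch count, and the noisy synthetic labels are assumptions chosen only so that the example runs quickly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(X, y, theta):
    p = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def train(X, y, alpha=0.1, epochs=2000):
    # Start from zero weights and record the loss after every update.
    theta = np.zeros(X.shape[1])
    history = []
    for _ in range(epochs):
        theta = theta - alpha / len(y) * (X.T @ (sigmoid(X @ theta) - y))
        history.append(bce(X, y, theta))
    return theta, history

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
# Noisy labels, so the classes overlap and the weights stay bounded.
noise = rng.normal(scale=0.5, size=200)
y = (features[:, 0] + features[:, 1] + noise > 0).astype(float)
# Prepend the column of ones for the bias term.
X = np.hstack([np.ones((200, 1)), features])

theta, history = train(X, y)
print(history[0], history[-1])
```

The second printed value should be lower than the first, showing that the loss falls as training proceeds.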
- Finally, test the code by comparing the loss on the training and test sets.

  ```python
  print(BCE(X_train, Y_train, theta))
  print(BCE(X_test, Y_test, theta))
  ```

Now we are done with the code 😀
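Beyond the loss values, it can be useful to report classification accuracy. The sketch below turns predicted probabilities into labels and compares them with the targets. The `predict` and `accuracy` helpers, the 0.5 threshold, and the toy inputs are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, theta, threshold=0.5):
    # Label a row 1 (malignant) when its predicted probability reaches the threshold.
    return (sigmoid(X @ theta) >= threshold).astype(int)

def accuracy(X, y, theta):
    # Fraction of rows whose predicted label matches the target.
    return float(np.mean(predict(X, theta) == y))

# Toy inputs: a bias column plus one feature, with a hand-picked theta.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5], [1.0, -3.0]])
y = np.array([1, 0, 0, 0])
theta = np.array([0.0, 1.0])
acc = accuracy(X, y, theta)
print(acc)  # 0.75: three of the four toy rows are classified correctly
```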
## The Algorithm as a web service
### Python 3+
```python
import json
import urllib.error
import urllib.request

data = {
    "Inputs": {
        "input1": [
            {
                '1': "4",
                '2': "7",
                '3': "3",
                '5': "5",
                '1000025': "1002945",
                '1 (2)': "4",
                '1 (3)': "5",
                '1 (4)': "10",
                '1 (5)': "2",
                '1 (6)': "1",
                '2 (2)': "2",
            }
        ],
    },
    "GlobalParameters": {
    }
}

body = str.encode(json.dumps(data))

url = 'https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger'
api_key = 'abc123'  # Replace this with the API key for the web service
headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + api_key)}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)
    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read().decode("utf8", 'ignore')))
```
### Python 2

```python
import urllib2
import json

data = {
    "Inputs": {
        "input1": [
            {
                '1': "4",
                '2': "7",
                '3': "3",
                '5': "5",
                '1000025': "1002945",
                '1 (2)': "4",
                '1 (3)': "5",
                '1 (4)': "10",
                '1 (5)': "2",
                '1 (6)': "1",
                '2 (2)': "2",
            }
        ],
    },
    "GlobalParameters": {
    }
}

body = str.encode(json.dumps(data))

url = 'https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger'
api_key = 'abc123'  # Replace this with the API key for the web service
headers = {'Content-Type': 'application/json', 'Authorization': ('Bearer ' + api_key)}

req = urllib2.Request(url, body, headers)

try:
    response = urllib2.urlopen(req)
    result = response.read()
    print(result)
except urllib2.HTTPError, error:
    print("The request failed with status code: " + str(error.code))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(json.loads(error.read()))
```
### R
```r
library("RCurl")
library("rjson")

# Accept SSL certificates issued by public Certificate Authorities
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))

h = basicTextGatherer()
hdr = basicHeaderGatherer()

req = list(
    Inputs = list(
        "input1" = list(
            list(
                '1' = "4",
                '2' = "7",
                '3' = "3",
                '5' = "5",
                '1000025' = "1002945",
                '1 (2)' = "4",
                '1 (3)' = "5",
                '1 (4)' = "10",
                '1 (5)' = "2",
                '1 (6)' = "1",
                '2 (2)' = "2"
            )
        )
    ),
    GlobalParameters = setNames(fromJSON('{}'), character(0))
)

body = enc2utf8(toJSON(req))
api_key = "abc123" # Replace this with the API key for the web service
authz_hdr = paste('Bearer', api_key, sep=' ')

h$reset()
curlPerform(url = "https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger",
    httpheader=c('Content-Type' = "application/json", 'Authorization' = authz_hdr),
    postfields=body,
    writefunction = h$update,
    headerfunction = hdr$update,
    verbose = TRUE
)

headers = hdr$value()
httpStatus = headers["status"]
if (httpStatus >= 400)
{
    print(paste("The request failed with status code:", httpStatus, sep=" "))
    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(headers)
}

print("Result:")
result = h$value()
print(fromJSON(result))
```
### C#
```csharp
// This code requires the Nuget package Microsoft.AspNet.WebApi.Client to be installed.
// Instructions for doing this in Visual Studio:
// Tools -> Nuget Package Manager -> Package Manager Console
// Install-Package Microsoft.AspNet.WebApi.Client

using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Formatting;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

namespace CallRequestResponseService
{
    class Program
    {
        static void Main(string[] args)
        {
            InvokeRequestResponseService().Wait();
        }

        static async Task InvokeRequestResponseService()
        {
            using (var client = new HttpClient())
            {
                var scoreRequest = new
                {
                    Inputs = new Dictionary<string, List<Dictionary<string, string>>>() {
                        {
                            "input1",
                            new List<Dictionary<string, string>>() {
                                new Dictionary<string, string>() {
                                    { "1", "4" },
                                    { "2", "7" },
                                    { "3", "3" },
                                    { "5", "5" },
                                    { "1000025", "1002945" },
                                    { "1 (2)", "4" },
                                    { "1 (3)", "5" },
                                    { "1 (4)", "10" },
                                    { "1 (5)", "2" },
                                    { "1 (6)", "1" },
                                    { "2 (2)", "2" },
                                }
                            }
                        },
                    },
                    GlobalParameters = new Dictionary<string, string>() {
                    }
                };

                const string apiKey = "abc123"; // Replace this with the API key for the web service
                client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);
                client.BaseAddress = new Uri("https://ussouthcentral.services.azureml.net/workspaces/f764effe004044e1b1c56ce46a5a8050/services/689b12141b8b4d9886aa420832a2f406/execute?api-version=2.0&format=swagger");

                // WARNING: The 'await' statement below can result in a deadlock
                // if you are calling this code from the UI thread of an ASP.Net application.
                // One way to address this would be to call ConfigureAwait(false)
                // so that the execution does not attempt to resume on the original context.
                // For instance, replace code such as:
                //      result = await DoSomeTask()
                // with the following:
                //      result = await DoSomeTask().ConfigureAwait(false)

                HttpResponseMessage response = await client.PostAsJsonAsync("", scoreRequest);

                if (response.IsSuccessStatusCode)
                {
                    string result = await response.Content.ReadAsStringAsync();
                    Console.WriteLine("Result: {0}", result);
                }
                else
                {
                    Console.WriteLine(string.Format("The request failed with status code: {0}", response.StatusCode));

                    // Print the headers - they include the request ID and the timestamp,
                    // which are useful for debugging the failure
                    Console.WriteLine(response.Headers.ToString());

                    string responseContent = await response.Content.ReadAsStringAsync();
                    Console.WriteLine(responseContent);
                }
            }
        }
    }
}
```
## More about the project
1. My Medium article on the same topic - [here](https://medium.com/@rishit.dagli/create-logistic-regression-algorithm-from-scratch-and-apply-it-on-data-set-3f16ca5dbdb9)
2. My research paper on this - [here](https://iarjset.com/papers/machine-learning-as-a-decision-aid-for-breast-cancer-diagnosis/)
3. Another must-read paper on the same topic - [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC55130/)
## Other algorithms for the same project by me
1. Multiclass Neural Networks
2. Random Forest classifier
[Project](https://gallery.azure.ai/Experiment/Breast-cancer-dataset)
## About me
Rishit Dagli
[Website](rishitdagli.ml)
[LinkedIn](https://www.linkedin.com/in/rishit-dagli-440113165/)