https://github.com/anjasfedo/fccmachinelearning
Repository of resources for the freeCodeCamp Machine Learning course.
- Host: GitHub
- URL: https://github.com/anjasfedo/fccmachinelearning
- Owner: Anjasfedo
- Created: 2023-11-26T05:37:18.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-19T16:04:06.000Z (about 2 years ago)
- Last Synced: 2025-01-17T03:28:30.549Z (12 months ago)
- Topics: freecodecamp, learning-by-doing, learning-resources, machine-learning, python
- Language: Jupyter Notebook
- Size: 6.96 MB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# FCCMachineLearning
# Supervised learning (classification/MAGIC)
## 1. Dataset

We use the MAGIC Gamma Telescope dataset from https://archive.ics.uci.edu/dataset/159/magic+gamma+telescope.

First, import the dependencies:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler
```
Now upload the dataset to Google Colab. The file has no header row, so define the column names and preview the data:

```python
cols = ["fLength", "fWidth", "fSize", "fConc", "fConc1", "fAsym",
        "fM3Long", "fM3Trans", "fAlpha", "fDist", "class"]
df = pd.read_csv("magic04.data", names=cols)
df.head()
```
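If you are following along in Colab and have not uploaded `magic04.data` yet, one way to get it into the runtime is the upload widget (a sketch; `google.colab.files` only exists inside Colab):

```python
# Colab-only: open a file picker and upload magic04.data into the runtime
from google.colab import files
uploaded = files.upload()
```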
Then convert the `class` column to an integer label, 1 where the value is "g" (gamma) and 0 otherwise (hadron):

```python
df["class"] = (df["class"] == "g").astype(int)
```
Next, plot a histogram of every feature, split by class:

```python
for label in cols[:-1]:
    plt.hist(df[df["class"] == 1][label], color='blue', label='gamma', alpha=0.5, density=True)
    plt.hist(df[df["class"] == 0][label], color='red', label='hadron', alpha=0.5, density=True)
    plt.title(label)
    plt.ylabel("Probability")
    plt.xlabel(label)
    plt.legend()
    plt.show()
```
`density=True` normalizes each histogram so the two classes are visually comparable even though they have different sample counts; for example, if class 0 has 100 samples and class 1 has 50, the bars show relative frequencies rather than raw counts.
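To see the actual imbalance behind that normalization, count the rows per class (a quick check, not part of the original notebook):

```python
# 1 = gamma, 0 = hadron
print(df["class"].value_counts())
```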
## 2. Train, validation, test dataset
Shuffle the dataframe and split it into 60% train, 20% validation, and 20% test by destructuring the result of `np.split`:

```python
train, valid, test = np.split(df.sample(frac=1), [int(0.6 * len(df)), int(0.8 * len(df))])
```
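As a sanity check (my addition, not from the course), the two cut points at 60% and 80% should leave roughly a 60/20/20 split:

```python
# Each piece's share of the full dataframe should be about 0.6 / 0.2 / 0.2
for name, part in [("train", train), ("valid", valid), ("test", test)]:
    print(name, len(part), round(len(part) / len(df), 2))
```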
Then create a new function named `scale_dataset` that takes a dataframe. Inside it, take every column except the last as the features `X`, and the last column as the label `y`:

```python
X = dataframe[dataframe.columns[:-1]].values
y = dataframe[dataframe.columns[-1]].values
```

Then create a `StandardScaler` and use it to fit and transform `X`:

```python
scaler = StandardScaler()
X = scaler.fit_transform(X)
```

Next, pack everything into a single 2D NumPy array with `np.hstack`, using `np.reshape` to turn `y` into a column first:

```python
data = np.hstack((X, np.reshape(y, (-1, 1))))
```

Then return the data, `X`, and `y`:

```python
return data, X, y
```
That is the first version of the function. Next, count how many training samples belong to each class:

```python
print(len(train[train["class"] == 1]))
print(len(train[train["class"] == 0]))
```

We will see the classes are not equal, so we can oversample: randomly duplicate samples of the minority class until the counts match, using `RandomOverSampler` from `imblearn.over_sampling`:

```python
from imblearn.over_sampling import RandomOverSampler
```
Then add a new parameter named `oversample` to `scale_dataset`, with `False` as the default:

```python
def scale_dataset(dataframe, oversample=False):
```

and add a conditional: if `oversample` is set, create a `RandomOverSampler` named `ros` and resample `X` and `y` with `fit_resample`:

```python
if oversample:
    ros = RandomOverSampler()
    X, y = ros.fit_resample(X, y)
```
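Assembled from the snippets above, the full function looks like this:

```python
def scale_dataset(dataframe, oversample=False):
    # Features are every column but the last; the label is the last column
    X = dataframe[dataframe.columns[:-1]].values
    y = dataframe[dataframe.columns[-1]].values

    # Standardize features to zero mean and unit variance
    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    # Optionally duplicate minority-class samples until the classes are equal
    if oversample:
        ros = RandomOverSampler()
        X, y = ros.fit_resample(X, y)

    # Pack X and y back into one 2D array for convenience
    data = np.hstack((X, np.reshape(y, (-1, 1))))

    return data, X, y
```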
In a new code block, destructure `train`, `X_train`, and `y_train` from `scale_dataset`, passing the training split with `oversample=True`, and do the same for `valid` and `test` with `oversample=False`:

```python
train, X_train, y_train = scale_dataset(train, oversample=True)
valid, X_valid, y_valid = scale_dataset(valid, oversample=False)
test, X_test, y_test = scale_dataset(test, oversample=False)
```
## 3. Model Classification
### 1. K-Nearest Neighbors

To use it, import the classifier from scikit-learn:

```python
from sklearn.neighbors import KNeighborsClassifier
```

Create `knn_model` with the number of neighbors `k` to use, then train it on the training data with `fit`:

```python
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)
```

To predict on the test data, make a new variable named `y_pred` holding `knn_model.predict` applied to `X_test`:

```python
y_pred = knn_model.predict(X_test)
```

To see the classification report, import it from scikit-learn and pass it `y_test` and `y_pred`:

```python
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
```
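The choice of `k` matters; a quick sketch (my addition, not from the course) compares a few values on the validation set before settling on `n_neighbors=5`:

```python
from sklearn.metrics import accuracy_score

# Compare validation accuracy for several neighborhood sizes
for k in [1, 3, 5, 9, 15]:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, round(accuracy_score(y_valid, knn.predict(X_valid)), 3))
```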
### 2. Naive Bayes

We also import Naive Bayes from scikit-learn:

```python
from sklearn.naive_bayes import GaussianNB
```

Create it as `nb_model` and fit it on the training data:

```python
nb_model = GaussianNB()
nb_model = nb_model.fit(X_train, y_train)
```

Then make a prediction named `y_pred` using `nb_model`, and again print the classification report with `y_test` and `y_pred`:

```python
y_pred = nb_model.predict(X_test)
print(classification_report(y_test, y_pred))
```
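Gaussian Naive Bayes assumes each feature is normally distributed within each class; the fitted model exposes the per-class means and variances it learned (attribute names as in recent scikit-learn versions):

```python
print(nb_model.theta_)  # per-class feature means, shape (n_classes, n_features)
print(nb_model.var_)    # per-class feature variances (called sigma_ in older releases)
```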
### 3. Logistic Regression

First, import the logistic regression model from scikit-learn:

```python
from sklearn.linear_model import LogisticRegression
```

Create it as `lg_model` and fit it with `X_train` and `y_train`:

```python
lg_model = LogisticRegression()
lg_model.fit(X_train, y_train)
```

Then predict with `lg_model` and see the report:

```python
y_pred = lg_model.predict(X_test)
print(classification_report(y_test, y_pred))
```
### 4. SVM

As before, import the SVC (support vector classification) model from scikit-learn:

```python
from sklearn.svm import SVC
```

Create a variable named `svm_model` and fit it with our training data:

```python
svm_model = SVC()
svm_model.fit(X_train, y_train)
```

Then make `y_pred` by predicting on `X_test` with `svm_model`, and show the report:

```python
y_pred = svm_model.predict(X_test)
print(classification_report(y_test, y_pred))
```
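`SVC` defaults to an RBF kernel; a linear kernel is a natural baseline to compare against (a sketch, my addition; training time varies with kernel and data size):

```python
# Same API, different kernel
linear_svm = SVC(kernel="linear")
linear_svm.fit(X_train, y_train)
print(classification_report(y_test, linear_svm.predict(X_test)))
```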
### 5. Neural Network

Unlike the four models before, the neural network is built with TensorFlow. Import it as `tf`:

```python
import tensorflow as tf
```

Make a variable named `nn_model` holding a `tf.keras.Sequential` built from a list of layers: a Dense layer with 32 nodes, ReLU activation, and an input shape of 10 (one per feature); a second Dense layer with 32 nodes and ReLU (no input shape needed); and a final Dense layer with 1 node and sigmoid activation for the binary output:

```python
nn_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
```

Compile it with the Adam optimizer from TensorFlow at a learning rate of 0.001, binary cross-entropy loss, and accuracy as a metric to track:

```python
nn_model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
                 loss="binary_crossentropy", metrics=["accuracy"])
```
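Optionally (not in the original notebook), print the architecture to check layer shapes and parameter counts:

```python
nn_model.summary()
```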
Next, define two plotting helper functions, `plot_loss` and `plot_accuracy` (adapted from the TensorFlow tutorials); place this code block right after the imports:

```python
def plot_loss(history):
    plt.plot(history.history["loss"], label="loss")
    plt.plot(history.history["val_loss"], label="val_loss")
    plt.xlabel("Epoch")
    plt.ylabel("Binary crossentropy")
    plt.legend()
    plt.grid(True)
    plt.show()

def plot_accuracy(history):
    plt.plot(history.history["accuracy"], label="accuracy")
    plt.plot(history.history["val_accuracy"], label="val_accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.legend()
    plt.grid(True)
    plt.show()
```
Next, create a variable named `history` holding the result of `nn_model.fit` with `X_train`, `y_train`, 100 epochs, a batch size of 32, a `validation_split` of 0.2 (TensorFlow carves its own validation data out of the training set), and `verbose=0` to silence per-epoch output:

```python
history = nn_model.fit(X_train, y_train, epochs=100, batch_size=32,
                       validation_split=0.2, verbose=0)
```

Once training finishes, plot the curves by passing `history` to the helpers:

```python
plot_loss(history)
plot_accuracy(history)
```
To optimize the model further we can change the number of nodes in the layers; next we will vary the node count automatically, along with the learning rate, epochs, batch size, and so on.

Add a Dropout layer after each layer except the last; during training it randomly zeroes a fraction of the nodes, which helps prevent overfitting. `Dropout` takes the drop rate as an argument (0.2 here as a placeholder; it becomes a parameter below):

```python
nn_model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation="sigmoid")
])
```
Then wrap `nn_model` in a function named `train_model` with the arguments `X_train`, `y_train`, `num_nodes`, `dropout_prob`, `lr`, `batch_size`, and `epochs`:

```python
def train_model(X_train, y_train, num_nodes, dropout_prob, lr, batch_size, epochs):
```

Replace the fixed node count with `num_nodes`:

```python
tf.keras.layers.Dense(num_nodes, activation="relu", input_shape=(10,)),
```

Pass `dropout_prob` to each Dropout layer:

```python
tf.keras.layers.Dropout(dropout_prob),
```

Use `lr` as the optimizer's learning rate:

```python
nn_model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                 loss="binary_crossentropy", metrics=["accuracy"])
```

Move the `history` assignment into this function, wiring `epochs` and `batch_size` through to `fit`:

```python
history = nn_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                       validation_split=0.2, verbose=0)
```

Then return the model and the history:

```python
return nn_model, history
```
This is the full function:

```python
def train_model(X_train, y_train, num_nodes, dropout_prob, lr, batch_size, epochs):
    nn_model = tf.keras.Sequential([
        tf.keras.layers.Dense(num_nodes, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dropout(dropout_prob),
        tf.keras.layers.Dense(num_nodes, activation="relu"),
        tf.keras.layers.Dropout(dropout_prob),
        tf.keras.layers.Dense(1, activation="sigmoid")
    ])
    nn_model.compile(optimizer=tf.keras.optimizers.Adam(lr),
                     loss="binary_crossentropy", metrics=["accuracy"])
    history = nn_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                           validation_split=0.2, verbose=0)
    return nn_model, history
```
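For example, a single run with mid-range settings from the grid below (illustrative values):

```python
# One training run; the loop below tries many such combinations
model, history = train_model(X_train, y_train, num_nodes=32, dropout_prob=0.2,
                             lr=0.005, batch_size=64, epochs=100)
plot_loss(history)
plot_accuracy(history)
```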
Now try different values with nested loops, a simple grid search. Fix the epochs at 100:

```python
epochs = 100
```

Then create `for` loops over 16, 32, and 64 for `num_nodes`; 0 and 0.2 for `dropout_prob`; 0.01, 0.005, and 0.001 for `lr`; and 32, 64, and 128 for `batch_size`:

```python
for num_nodes in [16, 32, 64]:
    for dropout_prob in [0, 0.2]:
        for lr in [0.01, 0.005, 0.001]:
            for batch_size in [32, 64, 128]:
```

Inside the innermost loop, destructure `model` and `history` from `train_model`, passing all of the arguments:

```python
model, history = train_model(X_train, y_train, num_nodes, dropout_prob, lr, batch_size, epochs)
```
Next, combine `plot_loss` and `plot_accuracy` into one function. Rename `plot_loss` to `plot_history`, split the figure into two axes, `ax1` for the loss and `ax2` for the accuracy, and switch from `xlabel`/`ylabel` to the axes methods `set_xlabel`/`set_ylabel`:

```python
def plot_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    ax1.plot(history.history['loss'], label='loss')
    ax1.plot(history.history['val_loss'], label='val_loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Binary crossentropy')
    ax1.legend()
    ax1.grid(True)

    ax2.plot(history.history['accuracy'], label='accuracy')
    ax2.plot(history.history['val_accuracy'], label='val_accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.legend()
    ax2.grid(True)

    plt.show()
```

Use this function to plot the curves:

```python
plot_history(history)
```
Back in the loop, print the hyperparameters of the current run so we can see which combination is training:

```python
print(f"{num_nodes} nodes, dropout {dropout_prob}, lr {lr}, batch size {batch_size}")
```

Then create a variable named `val_loss` holding the loss from `model.evaluate` on `X_valid` and `y_valid` (`evaluate` returns `[loss, accuracy]`, so take index 0):

```python
val_loss = model.evaluate(X_valid, y_valid)[0]
```

Above the loops, initialize `least_val_loss` to positive infinity and `least_loss_model` to `None`:

```python
least_val_loss = float('inf')
least_loss_model = None
```

After computing `val_loss`, add a conditional: if `val_loss` is less than `least_val_loss`, update `least_val_loss` and keep the current `model` as `least_loss_model`:

```python
if val_loss < least_val_loss:
    least_val_loss = val_loss
    least_loss_model = model
```
This is the full code of the loop:

```python
least_val_loss = float('inf')
least_loss_model = None
epochs = 100

for num_nodes in [16, 32, 64]:
    for dropout_prob in [0, 0.2]:
        for lr in [0.01, 0.005, 0.001]:
            for batch_size in [32, 64, 128]:
                print(f"{num_nodes} nodes, dropout {dropout_prob}, lr {lr}, batch size {batch_size}")
                model, history = train_model(X_train, y_train, num_nodes, dropout_prob, lr, batch_size, epochs)
                plot_history(history)
                val_loss = model.evaluate(X_valid, y_valid)[0]
                if val_loss < least_val_loss:
                    least_val_loss = val_loss
                    least_loss_model = model
```
We could also replace `validation_split` with `validation_data` to validate against our own pre-made validation set instead of a slice of the training data; for now we will keep `validation_split`.
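If you do switch, note that Keras expects `validation_data` as a `(features, labels)` tuple, so pass `X_valid` and `y_valid` rather than the packed `valid` array (a sketch, assuming both are visible where `fit` is called):

```python
# Inside train_model, replacing validation_split=0.2
history = nn_model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                       validation_data=(X_valid, y_valid), verbose=0)
```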
Lastly, predict on the test set with `least_loss_model`:

```python
y_pred = least_loss_model.predict(X_test)
```

The network outputs probabilities, so cast them to class labels: mark predictions greater than 0.5 as 1, convert to int, and reshape to one dimension with -1:

```python
y_pred = (y_pred > 0.5).astype(int).reshape(-1,)
```

Then check the report by passing `y_test` and `y_pred`:

```python
print(classification_report(y_test, y_pred))
```
## 4. Model Regression