https://github.com/theemperorofdaiviet/correctness

Classification evaluation metrics
Host: GitHub
URL: https://github.com/theemperorofdaiviet/correctness
Owner: theEmperorofDaiViet
Created: 2022-11-25T16:53:57.000Z (over 3 years ago)
Default Branch: master
Last Pushed: 2023-01-25T12:13:51.000Z (over 3 years ago)
Last Synced: 2025-02-08T12:28:30.459Z (over 1 year ago)
Topics: classification-algorithm, machine-learning, python
Language: Python
Homepage:
Size: 17.6 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

          

Table of Contents

* About The Project
  * Built With
* Getting Started
  * Prerequisites
  * Installation
* API Documentation
* Usage
* Contact

# About The Project

This is part of my assignment for the Introduction to Data Science course at university. In this part, I wrote my own module for classification evaluation metrics, modeled on `sklearn.metrics`.

## Built With

* [![Numpy][Numpy-shield]][Numpy-url]

* [![Pandas][Pandas-shield]][Pandas-url]



# Getting Started

## Prerequisites

To use this module, your system needs to have:

* numpy

  ```sh
  pip install numpy
  ```

* pandas

  ```sh
  pip install pandas
  ```

## Installation

You can install this module by cloning this repository into your current working directory:

```sh
git clone https://github.com/theEmperorofDaiViet/correctness.git
```



# API Documentation

Classification is a type of supervised machine learning problem where the goal is to predict, for one or more observations, the category or class they belong to.

An important element of any machine learning workflow is the evaluation of the performance of the model. This is the process where we use the trained model to make predictions on previously unseen, labelled data. In the case of classification, we then evaluate how many of these predictions the model got right.

In real-world classification problems, it is usually impossible for a model to be 100% correct. When evaluating a model, it is therefore useful to know not only how often the model was wrong, but also in which way it was wrong.

In this module, I provide seven different performance metrics and techniques you can use to evaluate a classifier.

## 1. correctness.confusion_matrix

It is a matrix that compares the number of predictions for each class that are correct and those that are incorrect.


In a confusion matrix, there are 4 numbers to pay attention to:

* True Positive: The number of positive observations the model correctly predicted as positive.

* False Positive: The number of negative observations the model incorrectly predicted as positive.

* True Negative: The number of negative observations the model correctly predicted as negative.

* False Negative: The number of positive observations the model incorrectly predicted as negative.


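Other references may use a different convention for the confusion matrix. In correctness's convention, each row represents the instances in a predicted class, while each column represents the instances in an actual class, as follows:

|                          | Actual class: positive | Actual class: negative |
| ------------------------ | ---------------------- | ---------------------- |
| Predicted class: positive | TP                     | FP                     |
| Predicted class: negative | FN                     | TN                     |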



  
`confusion_matrix(y_true, y_pred)`

Compute confusion matrix to evaluate the accuracy of a classification.

Parameters:

* `y_true` (array-like of shape (n_samples)): Ground truth (correct) target values.
* `y_pred` (array-like of shape (n_samples)): Estimated targets as returned by a classifier.

Returns:

* `C` (DataFrame of shape (n_classes, n_classes)): Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
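
The row/column convention above can be illustrated with a minimal sketch. This is not the module's own implementation: `confusion_matrix_sketch` is a hypothetical name, and the sketch assumes `pandas.crosstab` is an acceptable way to count (predicted, actual) label pairs.

```python
import pandas as pd


def confusion_matrix_sketch(y_true, y_pred):
    """Count label pairs; rows are predicted classes, columns are actual classes."""
    labels = sorted(set(y_true) | set(y_pred))
    # crosstab counts how often each (row label, column label) pair occurs
    table = pd.crosstab(
        pd.Series(y_pred, name="predicted"),
        pd.Series(y_true, name="actual"),
    )
    # make sure every label appears on both axes, even if it was never predicted
    return table.reindex(index=labels, columns=labels, fill_value=0)


y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
cm = confusion_matrix_sketch(y_true, y_pred)
print(cm)
```

Reading along a row shows everything the classifier labelled as that class; reading down a column shows what happened to the samples that truly belong to it.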

    

  



## 2. correctness.accuracy

The overall accuracy of a model is simply the number of correct predictions divided by the total number of predictions. An accuracy score lies between 0 and 1, where a value of 1 indicates a perfect model.




  
`accuracy(cm)`

Accuracy classification score.

Parameters:

* `cm` (DataFrame of shape (n_classes, n_classes)): Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.

Returns:

* `score` (float): The fraction of correctly classified samples.
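
As a rough illustration (a sketch under the assumption that the confusion matrix is a square pandas DataFrame; `accuracy_sketch` is a hypothetical name, not the module's function), the correct predictions sit on the diagonal, so accuracy is the diagonal sum divided by the sum of all entries:

```python
import numpy as np
import pandas as pd


def accuracy_sketch(cm):
    """Fraction of samples on the diagonal of the confusion matrix."""
    counts = cm.to_numpy()
    # the diagonal holds the samples whose predicted class equals the actual class
    return np.trace(counts) / counts.sum()


# confusion matrix from Example 1 of this README (rows: predicted, columns: actual)
cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])
score = accuracy_sketch(cm)
print(score)
```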

    

  



## 3. correctness.precision

Precision measures how good the model is at correctly identifying the positive class. In other words, out of all predictions for the positive class, how many were actually correct?




    precision = TP / (TP + FP)



Optimising a model for this metric alone minimises false positives. That may be desirable in a task such as fraud detection, but it is less useful for a task such as diagnosing cancer, because precision says little about the positive observations that are missed.




  
`precision(cm, average='binary', pos_label=0)`

Compute the precision.

Parameters:

* `cm` (DataFrame of shape (n_classes, n_classes)): Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
* `average` ({'micro', 'macro', 'weighted', 'binary'} or None, default='binary'): This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
  * `'binary'`: Only report results for the class specified by `pos_label`.
  * `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
  * `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
  * `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
* `pos_label` (int, default=0): The class to report if `average='binary'`. If `average != 'binary'`, this will be ignored.

Returns:

* `precision` (float if `average` is not None, otherwise array of float of shape (n_unique_labels)): Precision of the positive class in binary classification or weighted average of the precision of each class for the multiclass task.
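
The averaging options can be sketched as follows. This is an illustration, not the module's code: `precision_sketch` is a hypothetical name, `pos_label` is assumed to be a positional class index, and the sketch does not guard against a class that is never predicted (which would divide by zero).

```python
import numpy as np
import pandas as pd


def precision_sketch(cm, average="binary", pos_label=0):
    """Precision from a confusion matrix with rows = predicted, columns = actual."""
    counts = cm.to_numpy().astype(float)
    tp = np.diag(counts)
    predicted = counts.sum(axis=1)  # row sums: TP + FP for each class
    support = counts.sum(axis=0)    # column sums: actual instances of each class
    per_class = tp / predicted
    if average is None:
        return per_class
    if average == "binary":
        return per_class[pos_label]
    if average == "micro":
        return tp.sum() / predicted.sum()
    if average == "macro":
        return per_class.mean()
    if average == "weighted":
        return np.average(per_class, weights=support)
    raise ValueError(f"unknown average: {average!r}")


# confusion matrix from Example 1 of this README
cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])
for mode in ("binary", "micro", "macro", "weighted"):
    print(mode, precision_sketch(cm, average=mode))
```

With this matrix the four modes give roughly 0.75, 0.85, 0.8333 and 0.8583, which agrees with the precision column of the report in Example 1.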

    

  



## 4. correctness.recall

Recall tells us how good the model is at correctly predicting all the positive observations in the dataset.




    recall = TP / (TP + FN)



Because recall does not include information about false positives, it is more useful in a task such as diagnosing cancer, where missing a positive observation is costly.




  
`recall(cm, average='binary', pos_label=0)`

Compute the recall.

Parameters:

* `cm` (DataFrame of shape (n_classes, n_classes)): Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
* `average` ({'micro', 'macro', 'weighted', 'binary'} or None, default='binary'): This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
  * `'binary'`: Only report results for the class specified by `pos_label`.
  * `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
  * `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
  * `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
* `pos_label` (int, default=0): The class to report if `average='binary'`. If `average != 'binary'`, this will be ignored.

Returns:

* `recall` (float if `average` is not None, otherwise array of float of shape (n_unique_labels)): Recall of the positive class in binary classification or weighted average of the recall of each class for the multiclass task.
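
Under the same row/column convention, recall divides the diagonal by the column sums rather than the row sums. The following is a minimal sketch with a hypothetical helper name (`recall_per_class`), not the module's implementation:

```python
import numpy as np
import pandas as pd


def recall_per_class(cm):
    """Recall for each class: TP divided by the actual instances (column sums)."""
    counts = cm.to_numpy().astype(float)
    return np.diag(counts) / counts.sum(axis=0)


# confusion matrix from Example 1 of this README
cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])
per_class = recall_per_class(cm)
support = cm.to_numpy().sum(axis=0)

macro = per_class.mean()                            # unweighted mean over classes
weighted = np.average(per_class, weights=support)   # mean weighted by support
print(per_class, macro, weighted)
```

Note that support-weighted recall reduces to the total number of true positives divided by the number of samples, so it coincides with accuracy (0.85 here).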

    

  



## 5. correctness.f1_score

 The F1 score is the harmonic mean of precision and recall.




    F1 = 2 x precision x recall / (precision + recall)



The F1 score lies between 0 and 1. An F1 score of 1.0 indicates perfect precision and recall, while an F1 score of 0 means that either the precision or the recall is 0.




  
`f1_score(cm, average='binary', pos_label=0)`

Compute the F1 score, also known as balanced F-score or F-measure.

Parameters:

* `cm` (DataFrame of shape (n_classes, n_classes)): Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
* `average` ({'micro', 'macro', 'weighted', 'binary'} or None, default='binary'): This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
  * `'binary'`: Only report results for the class specified by `pos_label`.
  * `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
  * `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
  * `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
* `pos_label` (int, default=0): The class to report if `average='binary'`. If `average != 'binary'`, this will be ignored.

Returns:

* `f1_score` (float if `average` is not None, otherwise array of float of shape (n_unique_labels)): F1 score of the positive class in binary classification or weighted average of the F1 scores of each class for the multiclass task.
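
The harmonic-mean formula can be sketched directly from per-class precision and recall. The helper name `f1_from` is hypothetical, and treating F1 as 0 when both precision and recall are 0 is an assumption of this sketch rather than documented behaviour of the module.

```python
import numpy as np


def f1_from(precision, recall):
    """Element-wise harmonic mean of precision and recall."""
    precision = np.asarray(precision, dtype=float)
    recall = np.asarray(recall, dtype=float)
    denom = precision + recall
    # assumption: where precision + recall is 0, report F1 as 0 instead of dividing by zero
    return np.divide(2 * precision * recall, denom,
                     out=np.zeros_like(denom), where=denom > 0)


# per-class precision and recall taken from Example 1 (cat, dog)
per_class = f1_from([6 / 8, 11 / 12], [6 / 7, 11 / 13])
print(per_class)

# the harmonic mean is pulled toward the smaller of the two values
lopsided = f1_from([1.0], [0.1])
print(lopsided)
```

The second call shows why F1 is stricter than an arithmetic mean: perfect precision with a recall of 0.1 yields an F1 of about 0.18, not 0.55.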

    

  



## 6. correctness.support

Support is the number of actual occurrences of the class in the specified dataset.


Imbalanced support in the training data may indicate structural weaknesses in the reported scores of the classifier and could indicate the need for stratified sampling or rebalancing.




  
`support(cm, average='binary', pos_label=0)`

Compute the support.

Parameters:

* `cm` (DataFrame of shape (n_classes, n_classes)): Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
* `average` ({'micro', 'macro', 'weighted', 'binary'} or None, default='binary'): This parameter determines which value is returned:
  * `'binary'`: Return support of the class specified by `pos_label`.
  * any other value: Return n_samples of the specified dataset.
* `pos_label` (int, default=0): The class to report if `average='binary'`. If `average != 'binary'`, this will be ignored.

Returns:

* `support` (int): Support of the specified class or the total number of samples of the dataset.
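
Because columns hold the actual classes, support is a column sum. A minimal sketch, assuming `pos_label` is a positional class index and using the hypothetical name `support_sketch`:

```python
import pandas as pd


def support_sketch(cm, average="binary", pos_label=0):
    """Actual occurrences of one class, or the total sample count otherwise."""
    per_class = cm.sum(axis=0)  # column sums: actual instances of each class
    if average == "binary":
        return int(per_class.iloc[pos_label])
    return int(per_class.sum())


# confusion matrix from Example 1 of this README
cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])
print(support_sketch(cm))                   # support of class 0 (cat)
print(support_sketch(cm, pos_label=1))      # support of class 1 (dog)
print(support_sketch(cm, average="macro"))  # total number of samples
```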

    

  



## 7. correctness.report



  
`report(cm)`

Build a text report showing all the classification metrics above.

Parameters:

* `cm` (DataFrame of shape (n_classes, n_classes)): Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.

Returns:

* None: The function is called for its side effect of printing the report.
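
To show how the per-class part of such a report fits together, here is a hedged sketch. Unlike `report`, this hypothetical `report_sketch` returns the table instead of only printing it, which makes the values easier to inspect; it covers the per-class rows only and does not guard against empty rows or columns.

```python
import numpy as np
import pandas as pd


def report_sketch(cm):
    """Per-class precision, recall, F1 and support as a DataFrame."""
    counts = cm.to_numpy().astype(float)
    tp = np.diag(counts)
    precision = tp / counts.sum(axis=1)  # rows are predicted classes
    recall = tp / counts.sum(axis=0)     # columns are actual classes
    f1 = 2 * precision * recall / (precision + recall)
    return pd.DataFrame({
        "precision": precision,
        "recall": recall,
        "f1-score": f1,
        "support": counts.sum(axis=0).astype(int),
    })


# confusion matrix from Example 1 of this README
cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])
table = report_sketch(cm)
print("CLASSIFICATION REPORT:")
print(table)
```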

    

  



# Usage

Let me illustrate how to use this module to evaluate a classification model.


## Example 1:

Inside the module, I have already provided a [test case](https://github.com/theEmperorofDaiViet/correctness/blob/master/correctness.py#L160). Since it is placed in the `__main__` block, you can run it yourself by executing the file as a script. I reproduce it here:

### Actual values and Predicted values

```python
>>> y_target = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'dog',
... 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
>>> y_predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog',
... 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
```

### Compute confusion matrix

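```python
>>> # this import is only needed when running outside the module's own __main__ block
>>> from correctness import confusion_matrix, report
>>> cm = confusion_matrix(y_target, y_predicted)
>>> print(cm)
       cat  dog
cat      6    2
dog      1   11
```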

### Return classification report

```python
>>> report(cm)
CLASSIFICATION REPORT:
   precision    recall  f1-score  support
0   0.750000  0.857143      0.80        7
1   0.916667  0.846154      0.88       13
          precision    recall  f1-score  support
macro      0.833333  0.851648     0.840       20
micro      0.850000  0.850000     0.850       20
weighted   0.858333  0.850000     0.852       20
accuracy    0.85
```



## Example 2:

Besides the simple test case above, I will also provide a more realistic example by building a classification model and then evaluating it.

More specifically, I will build a Gaussian Naive Bayes model to classify the dry bean dataset from [Kaggle](https://www.kaggle.com/datasets/muratkokludataset/dry-bean-dataset).

### Import libraries, modules and load data

```python
>>> from Naive_Bayes import Gaussian_Naive_Bayes
>>> import correctness
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> df = pd.read_excel('Dry_Bean_Dataset.xlsx')
>>> df.shape
(13611, 17)
```

The `Naive_Bayes` module I import is another module I built from scratch, which implements Naive Bayes algorithms. Naive Bayes is a supervised learning method based on applying Bayes' theorem with strong (naive) feature independence assumptions. You can check it out here.


### Preprocess and split data

```python
>>> data = df.drop(['ConvexArea','EquivDiameter','AspectRation','Eccentricity','Class','Area','Perimeter',
... 'ShapeFactor2','ShapeFactor3','ShapeFactor1','ShapeFactor4'],axis = 1)
>>> target = df['Class']
>>> X = np.array(data)
>>> y = np.array(target)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

### Perform classification using this module and evaluate the model performance

```python
>>> nb = Gaussian_Naive_Bayes()
>>> nb.fit(X_train, y_train)
>>> y_pred = nb.predict(X_test)
>>> cm = correctness.confusion_matrix(y_test, y_pred)
>>> scratch = correctness.accuracy(cm)
>>> correctness.report(cm)
CLASSIFICATION REPORT:
   precision    recall  f1-score  support
0   0.853846  0.840909  0.847328      264
1   0.989796  1.000000  0.994872       97
2   0.914773  0.901961  0.908322      357
3   0.914986  0.895628  0.905203      709
4   0.935829  0.951087  0.943396      368
5   0.947368  0.951923  0.949640      416
6   0.834915  0.859375  0.846968      512
          precision    recall  f1-score  support
macro      0.913073  0.914412  0.913676     2723
micro      0.904150  0.904150  0.904150     2723
weighted   0.904404  0.904150  0.904196     2723
accuracy    0.90415
```



# Contact

You can contact me via:

* [![GitHub][GitHub-shield]][GitHub-url]

* [![LinkedIn][LinkedIn-shield]][LinkedIn-url]

* ![Gmail][Gmail-shield]: Khiet.To.05012001@gmail.com

* [![Facebook][Facebook-shield]][Facebook-url]

* [![Twitter][Twitter-shield]][Twitter-url]






[Numpy-shield]: https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white

[Numpy-url]: https://numpy.org

[Pandas-shield]: https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white

[Pandas-url]: https://pandas.pydata.org

[GitHub-shield]: https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white

[GitHub-url]: https://github.com/theEmperorofDaiViet

[LinkedIn-shield]: https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white

[LinkedIn-url]: https://www.linkedin.com/in/khiet-to/

[Gmail-shield]: https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white

[Facebook-shield]: https://img.shields.io/badge/Facebook-%231877F2.svg?style=for-the-badge&logo=Facebook&logoColor=white

[Facebook-url]: https://www.facebook.com/Khiet.To.Official/

[Twitter-shield]: https://img.shields.io/badge/Twitter-%231DA1F2.svg?style=for-the-badge&logo=Twitter&logoColor=white

[Twitter-url]: https://twitter.com/KhietTo
