https://github.com/theemperorofdaiviet/correctness

Classification evaluation metrics

Table of Contents

1. About The Project
2. Getting Started
3. API Documentation
4. Usage
5. Contact

# About The Project

This is part of an assignment for my Introduction to Data Science course at university. In this part, I wrote my own module for classification evaluation metrics, modelled on sklearn.metrics.

## Built With

* [![Numpy][Numpy-shield]][Numpy-url]
* [![Pandas][Pandas-shield]][Pandas-url]


# Getting Started

## Prerequisites
To use this module, your system needs to have:
* numpy
```sh
pip install numpy
```
* pandas
```sh
pip install pandas
```

## Installation
You can install this module by cloning this repository into your current working directory:
```sh
git clone https://github.com/theEmperorofDaiViet/correctness.git
```


# API Documentation
Classification is a type of supervised machine learning problem where the goal is to predict, for one or more observations, the category or class they belong to.

An important element of any machine learning workflow is the evaluation of the performance of the model. This is the process where we use the trained model to make predictions on previously unseen, labelled data. In the case of classification, we then evaluate how many of these predictions the model got right.

In real-world classification problems, it is usually impossible for a model to be 100% correct. When evaluating a model, it is therefore useful to know not only how often the model was wrong, but also in which way it was wrong.

In this module, I provide seven different performance metrics and techniques you can use to evaluate a classifier.

## 1. correctness.confusion_matrix

A confusion matrix compares, for each class, the number of correct predictions with the number of incorrect ones.

In a binary confusion matrix, there are 4 numbers to pay attention to:

* True Positive (TP): The number of positive observations the model correctly predicted as positive.
* False Positive (FP): The number of negative observations the model incorrectly predicted as positive.
* True Negative (TN): The number of negative observations the model correctly predicted as negative.
* False Negative (FN): The number of positive observations the model incorrectly predicted as negative.

Other references may use a different convention for the confusion matrix. In correctness's convention, each row represents the instances in a predicted class, while each column represents the instances in an actual class, as follows:



| | Actual positive | Actual negative |
|---|---|---|
| **Predicted positive** | TP | FP |
| **Predicted negative** | FN | TN |


`confusion_matrix(y_true, y_pred)`

    Compute confusion matrix to evaluate the accuracy of a classification.


    Parameters

    y_true: array-like of shape (n_samples)

    Ground truth (correct) target values.


    y_pred: array-like of shape (n_samples)

    Estimated targets as returned by a classifier.





    Returns

    C: DataFrame of shape (n_classes, n_classes)

    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
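
As a rough illustration of the convention described above (rows are predicted classes, columns are actual classes), the same layout can be reproduced with plain pandas. This is only a sketch using `pandas.crosstab`, not the module's own implementation; the name `cm_sketch` is made up for this example, and the labels are the ones from Example 1 in the Usage section.

```python
import pandas as pd

# Labels copied from Example 1 in the Usage section.
y_true = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog',
          'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
y_pred = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat',
          'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']

# Rows are indexed by the predicted label, columns by the actual label.
# cm_sketch is a hypothetical name, not part of the correctness module.
cm_sketch = pd.crosstab(pd.Series(y_pred, name='predicted'),
                        pd.Series(y_true, name='actual'))
print(cm_sketch)
```

Note that `sklearn.metrics.confusion_matrix` uses the opposite orientation (rows are actual classes, columns are predicted classes), so its result is the transpose of this layout. Also, `crosstab` only creates rows and columns for labels it actually sees, so a class that is never predicted would leave the table non-square.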




    ## 2. correctness.accuracy

The overall accuracy of a model is simply the number of correct predictions divided by the total number of predictions. An accuracy score lies between 0 and 1, where 1 indicates a perfect model.


`accuracy(cm)`

    Accuracy classification score.


    Parameters

    cm: DataFrame of shape (n_classes, n_classes)

    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.





    Returns

    score: float

    The fraction of correctly classified samples.
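
To make the definition concrete: the correct predictions sit on the diagonal of the confusion matrix, so accuracy is the diagonal sum divided by the sum of all cells. The snippet below is a minimal sketch with NumPy on the 2x2 matrix from Example 1 in the Usage section; `cm_values` and `accuracy_sketch` are illustrative names, not part of the module.

```python
import numpy as np

# Confusion matrix from Example 1 (rows: predicted, columns: actual).
cm_values = np.array([[6, 2],
                      [1, 11]])

# Correct predictions are on the diagonal; every cell is one prediction.
accuracy_sketch = np.trace(cm_values) / cm_values.sum()
print(accuracy_sketch)
```

With 17 correct predictions out of 20, this gives 0.85.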




    ## 3. correctness.precision

Precision measures how good the model is at correctly identifying the positive class. In other words: out of all predictions for the positive class, how many were actually correct?


    precision = TP / (TP + FP)

Optimising a model for this metric alone minimises false positives. This might be desirable for a task such as fraud detection, but would be less useful for diagnosing cancer, as it tells us little about the positive observations that are missed.


`precision(cm, average='binary', pos_label=0)`

    Compute the precision.


    Parameters

    cm: DataFrame of shape (n_classes, n_classes)

    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.


average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

* `'binary'`: Only report results for the class specified by `pos_label`.
* `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
* `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
* `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.



    pos_label: int, default=0

    The class to report if average='binary'. If average != 'binary', this will be ignored.





    Returns

    precision: float (if average is not None) or array of float of shape (n_unique_labels)

    Precision of the positive class in binary classification or weighted average of the precision of each class for the multiclass task.
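
The averaging modes are easier to compare on actual numbers. The following is a sketch with NumPy only, assuming the row-as-predicted convention described earlier; it does not call the module itself, and all variable names are made up for the illustration. The matrix is the one from Example 1 in the Usage section.

```python
import numpy as np

# Confusion matrix from Example 1 (rows: predicted, columns: actual).
cm_values = np.array([[6, 2],
                      [1, 11]], dtype=float)

tp = np.diag(cm_values)                    # true positives per class
predicted_totals = cm_values.sum(axis=1)   # row sums: TP + FP per class
class_support = cm_values.sum(axis=0)      # column sums: actual instances per class

per_class = tp / predicted_totals                            # average=None
macro = per_class.mean()                                     # unweighted mean
micro = tp.sum() / predicted_totals.sum()                    # global counts
weighted = np.average(per_class, weights=class_support)      # weighted by support

print(per_class, macro, micro, weighted)
```

These values agree with the precision column of the report shown in Example 1 (0.75 and about 0.9167 per class, about 0.8333 macro, 0.85 micro, about 0.8583 weighted).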




    ## 4. correctness.recall

Recall tells us how good the model is at correctly predicting all the positive observations in the dataset.


    recall = TP / (TP + FN)

It carries no information about false positives, but it penalises missed positive observations, so it would be more useful for a task such as diagnosing cancer.


`recall(cm, average='binary', pos_label=0)`

    Compute the recall.


    Parameters

    cm: DataFrame of shape (n_classes, n_classes)

    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.


average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

* `'binary'`: Only report results for the class specified by `pos_label`.
* `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
* `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
* `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.



    pos_label: int, default=0

    The class to report if average='binary'. If average != 'binary', this will be ignored.





    Returns

    recall: float (if average is not None) or array of float of shape (n_unique_labels)

    Recall of the positive class in binary classification or weighted average of the recall of each class for the multiclass task.
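
Under the row-as-predicted convention, recall divides the diagonal by the column sums instead of the row sums. Here is a small NumPy sketch on the matrix from Example 1 in the Usage section; it is an illustration under that assumption rather than the module's code, and the variable names are hypothetical.

```python
import numpy as np

# Confusion matrix from Example 1 (rows: predicted, columns: actual).
cm_values = np.array([[6, 2],
                      [1, 11]], dtype=float)

tp = np.diag(cm_values)                  # true positives per class
actual_totals = cm_values.sum(axis=0)    # column sums: TP + FN per class

recall_per_class = tp / actual_totals
recall_macro = recall_per_class.mean()
recall_weighted = np.average(recall_per_class, weights=actual_totals)

print(recall_per_class, recall_macro, recall_weighted)
```

Weighting each class's recall by its support cancels the denominators, so the weighted recall always equals the overall accuracy (0.85 here), which matches the report in Example 1.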




    ## 5. correctness.f1_score

    The F1 score is the harmonic mean of precision and recall.


F1 = 2 × precision × recall / (precision + recall)

The F1 score lies between 0 and 1. An F1 score of 1.0 indicates perfect precision and recall; an F1 score of 0 means that either the precision or the recall is 0.


`f1_score(cm, average='binary', pos_label=0)`

    Compute the F1 score, also known as balanced F-score or F-measure.


    Parameters

    cm: DataFrame of shape (n_classes, n_classes)

    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.


average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'

This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:

* `'binary'`: Only report results for the class specified by `pos_label`.
* `'micro'`: Calculate metrics globally by counting the total true positives, false negatives and false positives.
* `'macro'`: Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
* `'weighted'`: Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.



    pos_label: int, default=0

    The class to report if average='binary'. If average != 'binary', this will be ignored.





    Returns

    f1_score: float (if average is not None) or array of float of shape (n_unique_labels)

    F1 score of the positive class in binary classification or weighted average of the F1 scores of each class for the multiclass task.
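
As a worked instance of the harmonic-mean formula, the sketch below computes the per-class F1 scores for Example 1 in the Usage section from the precision and recall values. `f1_from` is a hypothetical helper written for this illustration, and returning 0.0 when both inputs are zero is an assumption of the sketch, not documented behaviour of the module.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Assumed convention for the undefined 0/0 case.
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall of each class in Example 1.
f1_cat = f1_from(6 / 8, 6 / 7)      # precision 0.75, recall ~0.857
f1_dog = f1_from(11 / 12, 11 / 13)  # precision ~0.917, recall ~0.846

print(round(f1_cat, 2), round(f1_dog, 2))
```

The results, 0.80 and 0.88, match the f1-score column of the report in Example 1.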




    ## 6. correctness.support

    Support is the number of actual occurrences of the class in the specified dataset.

    Imbalanced support in the training data may indicate structural weaknesses in the reported scores of the classifier and could indicate the need for stratified sampling or rebalancing.


`support(cm, average='binary', pos_label=0)`

    Compute the support.


    Parameters

    cm: DataFrame of shape (n_classes, n_classes)

    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.


average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'

This parameter determines which value is returned:

* `'binary'`: Return the support of the class specified by `pos_label`.
* any other value: Return `n_samples` of the specified dataset.



    pos_label: int, default=0

    The class to report if average='binary'. If average != 'binary', this will be ignored.





    Returns

    support: int

    Support of the specified class or the total number of samples of the dataset.
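
Because columns hold the actual classes in this module's convention, the support of a class is simply its column sum, and the total number of samples is the sum of all cells. A minimal NumPy sketch on the matrix from Example 1 in the Usage section (variable names are illustrative only):

```python
import numpy as np

# Confusion matrix from Example 1 (rows: predicted, columns: actual).
cm_values = np.array([[6, 2],
                      [1, 11]])

support_per_class = cm_values.sum(axis=0)  # actual occurrences of each class
n_samples = cm_values.sum()                # total number of samples

print(support_per_class, n_samples)
```

This yields 7 and 13 for the two classes and 20 in total, as in the support column of the report in Example 1.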




    ## 7. correctness.report


`report(cm)`

    Build a text report showing all the classification metrics above.


    Parameters

    cm: DataFrame of shape (n_classes, n_classes)

    Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.





    Returns

None

This function is called for its side effect: it prints the report instead of returning it.




    # Usage

    Let me illustrate how to use this module to evaluate a classification model.

    ## Example 1:
Inside the module, I already provided a [test case](https://github.com/theEmperorofDaiViet/correctness/blob/master/correctness.py#L160) for it. Since I placed it in the `__main__` block, you can test it yourself by running the file as a script. I will reproduce it here:

    ### Actual values and Predicted values
    ```python
    >>> y_target = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'dog',
    ... 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
    >>> y_predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog',
    ... 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
    ```
    ### Compute confusion matrix
    ```python
    >>> from correctness import confusion_matrix, report
    >>> cm = confusion_matrix(y_target, y_predicted)
    >>> print(cm)
         cat  dog
    cat    6    2
    dog    1   11
    ```
    ### Return classification report
    ```python
    >>> report(cm)
    CLASSIFICATION REPORT:
       precision    recall  f1-score  support
    0   0.750000  0.857143      0.80        7
    1   0.916667  0.846154      0.88       13

              precision    recall  f1-score  support
    macro      0.833333  0.851648     0.840       20
    micro      0.850000  0.850000     0.850       20
    weighted   0.858333  0.850000     0.852       20
    accuracy 0.85
    ```


    ## Example 2:
Besides the simple test case above, I will also provide a more realistic example by building a classification model and then evaluating it.

    More specifically, I will build a Gaussian Naive Bayes model to classify the dry bean dataset from [Kaggle](https://www.kaggle.com/datasets/muratkokludataset/dry-bean-dataset).

    ### Import libraries, modules and load data
    ```python
    >>> from Naive_Bayes import Gaussian_Naive_Bayes
    >>> import correctness
    >>> import pandas as pd
    >>> import numpy as np
    >>> from sklearn.model_selection import train_test_split

    >>> df = pd.read_excel('Dry_Bean_Dataset.xlsx')
    >>> df.shape
    (13611, 17)
    ```

The Naive_Bayes module I import is another module I built from scratch. It implements Naive Bayes algorithms, a family of supervised learning methods based on applying Bayes' theorem with strong (naive) feature independence assumptions. You can check it out here.

    ### Preprocess and split data
    ```python
    >>> data = df.drop(['ConvexArea','EquivDiameter','AspectRation','Eccentricity','Class','Area','Perimeter',
    ... 'ShapeFactor2','ShapeFactor3','ShapeFactor1','ShapeFactor4'],axis = 1)
    >>> target = df['Class']

    >>> X = np.array(data)
    >>> y = np.array(target)

    >>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    ```

    ### Perform classification using this module and evaluate the model performance
    ```python
    >>> nb = Gaussian_Naive_Bayes()
    >>> nb.fit(X_train, y_train)
    >>> y_pred = nb.predict(X_test)

    >>> cm = correctness.confusion_matrix(y_test, y_pred)
    >>> scratch = correctness.accuracy(cm)
    >>> correctness.report(cm)
    CLASSIFICATION REPORT:
       precision    recall  f1-score  support
    0   0.853846  0.840909  0.847328      264
    1   0.989796  1.000000  0.994872       97
    2   0.914773  0.901961  0.908322      357
    3   0.914986  0.895628  0.905203      709
    4   0.935829  0.951087  0.943396      368
    5   0.947368  0.951923  0.949640      416
    6   0.834915  0.859375  0.846968      512

              precision    recall  f1-score  support
    macro      0.913073  0.914412  0.913676     2723
    micro      0.904150  0.904150  0.904150     2723
    weighted   0.904404  0.904150  0.904196     2723
    accuracy 0.90415
    ```


    # Contact
    You can contact me via:
    * [![GitHub][GitHub-shield]][GitHub-url]
    * [![LinkedIn][LinkedIn-shield]][LinkedIn-url]
    * ![Gmail][Gmail-shield]: Khiet.To.05012001@gmail.com
    * [![Facebook][Facebook-shield]][Facebook-url]
    * [![Twitter][Twitter-shield]][Twitter-url]




    [Numpy-shield]: https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white
    [Numpy-url]: https://numpy.org
    [Pandas-shield]: https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white
    [Pandas-url]: https://pandas.pydata.org

    [GitHub-shield]: https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white
    [GitHub-url]: https://github.com/theEmperorofDaiViet
    [LinkedIn-shield]: https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white
    [LinkedIn-url]: https://www.linkedin.com/in/khiet-to/
    [Gmail-shield]: https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white
    [Facebook-shield]: https://img.shields.io/badge/Facebook-%231877F2.svg?style=for-the-badge&logo=Facebook&logoColor=white
    [Facebook-url]: https://www.facebook.com/Khiet.To.Official/
    [Twitter-shield]: https://img.shields.io/badge/Twitter-%231DA1F2.svg?style=for-the-badge&logo=Twitter&logoColor=white
    [Twitter-url]: https://twitter.com/KhietTo
