https://github.com/theemperorofdaiviet/correctness
Classification evaluation metrics
classification-algorithm machine-learning python
- Host: GitHub
- URL: https://github.com/theemperorofdaiviet/correctness
- Owner: theEmperorofDaiViet
- Created: 2022-11-25T16:53:57.000Z (over 3 years ago)
- Default Branch: master
- Last Pushed: 2023-01-25T12:13:51.000Z (over 3 years ago)
- Last Synced: 2025-02-08T12:28:30.459Z (over 1 year ago)
- Topics: classification-algorithm, machine-learning, python
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# About The Project
This project is part of an assignment for my Introduction to Data Science course at university. In it, I wrote my own module for classification evaluation metrics, modelled on sklearn.metrics.
## Built With
* [![Numpy][Numpy-shield]][Numpy-url]
* [![Pandas][Pandas-shield]][Pandas-url]
# Getting Started
## Prerequisites
To use this module, your system needs to have:
* numpy
```sh
pip install numpy
```
* pandas
```sh
pip install pandas
```
## Installation
You can install this module by cloning this repository into your current working directory:
```sh
git clone https://github.com/theEmperorofDaiViet/correctness.git
```
# API Documentation
Classification is a type of supervised machine learning problem where the goal is to predict, for one or more observations, the category or class they belong to.
An important element of any machine learning workflow is the evaluation of the performance of the model. This is the process where we use the trained model to make predictions on previously unseen, labelled data. In the case of classification, we then evaluate how many of these predictions the model got right.
In real-world classification problems, it is usually impossible for a model to be 100% correct. When evaluating a model, it is therefore useful to know not only how often the model was wrong, but also in which way it was wrong.
In this module, I provide seven different performance metrics and techniques you can use to evaluate a classifier.
## 1. correctness.confusion_matrix
A confusion matrix is a table that counts, for each class, how many predictions were correct and how many were incorrect.
In a confusion matrix, there are 4 numbers to pay attention to:
True Positive: The number of positive observations the model correctly predicted as positive.
False Positive: The number of negative observations the model incorrectly predicted as positive.
True Negative: The number of negative observations the model correctly predicted as negative.
False Negative: The number of positive observations the model incorrectly predicted as negative.
Other references may use a different convention for the confusion matrix. In correctness's convention, each row represents the instances in a predicted class, while each column represents the instances in an actual class, as follows:

| | Actual positive | Actual negative |
| --- | --- | --- |
| **Predicted positive** | TP | FP |
| **Predicted negative** | FN | TN |

confusion_matrix(y_true, y_pred)[source]
Compute confusion matrix to evaluate the accuracy of a classification.
Parameters
y_true: array-like of shape (n_samples)
Ground truth (correct) target values.
y_pred: array-like of shape (n_samples)
Estimated targets as returned by a classifier.
Returns
C: DataFrame of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
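The row/column convention can be illustrated with a minimal sketch built on `pandas.crosstab`. This is not the module's actual implementation; the helper name `sketch_confusion_matrix` is hypothetical and only shows how predicted labels end up on the rows and actual labels on the columns.

```python
import pandas as pd


def sketch_confusion_matrix(y_true, y_pred):
    """Hypothetical helper: rows are predicted classes, columns are actual classes."""
    labels = sorted(set(y_true) | set(y_pred))
    table = pd.crosstab(
        pd.Series(y_pred, name="predicted"),
        pd.Series(y_true, name="actual"),
    )
    # Keep the matrix square even if a class is never predicted or never occurs.
    return table.reindex(index=labels, columns=labels, fill_value=0)


y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "cat", "dog", "cat"]
cm = sketch_confusion_matrix(y_true, y_pred)

# One dog was predicted as cat: row "cat" (predicted), column "dog" (actual).
print(cm.loc["cat", "dog"])  # 1
```

Reading a cell is therefore always `cm.loc[predicted_label, actual_label]` under this convention.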
## 2. correctness.accuracy
The overall accuracy of a model is simply the number of correct predictions divided by the total number of predictions. The accuracy score lies between 0 and 1, where 1 indicates a perfect model.
accuracy(cm)[source]
Accuracy classification score.
Parameters
cm: DataFrame of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
Returns
score: float
The fraction of correctly classified samples.
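As a rough illustration (not the module's own code), accuracy can be read off a confusion matrix as the sum of the diagonal divided by the sum of all cells. The counts below are the ones from Example 1 in the Usage section; the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

# Rows = predicted class, columns = actual class (counts from Example 1).
cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])

counts = cm.to_numpy()
correct = np.trace(counts)   # diagonal cells: predicted label == actual label
total = counts.sum()         # every prediction made
accuracy_sketch = correct / total

print(accuracy_sketch)  # 0.85
```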
## 3. correctness.precision
Precision measures how good the model is at correctly identifying the positive class. In other words, out of all predictions for the positive class, how many were actually correct?
precision = TP / (TP + FP)
Optimising a model on this metric alone minimises the false positives. This might be desirable for a task such as fraud detection, but would be less useful for diagnosing cancer, as it tells us little about the positive observations that are missed.
precision(cm, average='binary', pos_label = 0)[source]
Compute the precision.
Parameters
cm: DataFrame of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
'binary': Only report results for the class specified by pos_label.
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
pos_label: int, default=0
The class to report if average='binary'. If average != 'binary', this will be ignored.
Returns
precision: float (if average is not None) or array of float of shape (n_unique_labels)
Precision of the positive class in binary classification or weighted average of the precision of each class for the multiclass task.
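The averaging modes can be made concrete with a small NumPy sketch. It is an illustration under the row = predicted, column = actual convention, not the module's implementation, and it reuses the counts from Example 1 in the Usage section.

```python
import numpy as np

# Rows = predicted class, columns = actual class (counts from Example 1).
cm = np.array([[6, 2],
               [1, 11]])

tp = np.diag(cm)                    # true positives per class
predicted_totals = cm.sum(axis=1)   # TP + FP per class (row sums)
actual_totals = cm.sum(axis=0)      # support per class (column sums)

per_class = tp / predicted_totals                        # what average=None describes
macro = per_class.mean()                                 # unweighted mean over classes
micro = tp.sum() / predicted_totals.sum()                # pooled counts over all classes
weighted = np.average(per_class, weights=actual_totals)  # mean weighted by support

print(f"{macro:.6f} {micro:.6f} {weighted:.6f}")  # 0.833333 0.850000 0.858333
```

The three printed values agree with the precision column of the averaged rows in the Example 1 report.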
## 4. correctness.recall
Recall tells us how good the model is at correctly predicting all the positive observations in the dataset.
recall = TP / (TP + FN)
Recall ignores false positives, so it is the more useful metric for a task such as cancer diagnosis, where missing a positive observation is costly.
recall(cm, average='binary', pos_label = 0)[source]
Compute the recall.
Parameters
cm: DataFrame of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
'binary': Only report results for the class specified by pos_label.
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
pos_label: int, default=0
The class to report if average='binary'. If average != 'binary', this will be ignored.
Returns
recall: float (if average is not None) or array of float of shape (n_unique_labels)
Recall of the positive class in binary classification or weighted average of the recall of each class for the multiclass task.
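Under the row = predicted, column = actual convention, per-class recall divides the diagonal by the column sums. The sketch below is an illustration rather than the module's code, using the Example 1 counts. It also shows that, for single-label data, support-weighted recall reduces to plain accuracy.

```python
import numpy as np

# Rows = predicted class, columns = actual class (counts from Example 1).
cm = np.array([[6, 2],
               [1, 11]])

tp = np.diag(cm)
support = cm.sum(axis=0)            # TP + FN per class (column sums)

recall_per_class = tp / support
macro = recall_per_class.mean()
weighted = np.average(recall_per_class, weights=support)
accuracy = tp.sum() / cm.sum()

# Weighting each class's TP/support by its support cancels the denominators,
# leaving sum(TP) / n_samples, which is the accuracy.
print(f"{macro:.6f} {weighted:.2f} {accuracy:.2f}")  # 0.851648 0.85 0.85
```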
## 5. correctness.f1_score
The F1 score is the harmonic mean of precision and recall.
F1 = 2 × precision × recall / (precision + recall)
The F1 score lies between 0 and 1. An F1 score of 1.0 indicates perfect precision and recall, while an F1 score of 0 means that either the precision or the recall is 0.
f1_score(cm, average='binary', pos_label = 0)[source]
Compute the F1 score, also known as balanced F-score or F-measure.
Parameters
cm: DataFrame of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'
This parameter is required for multiclass/multilabel targets. If None, the scores for each class are returned. Otherwise, this determines the type of averaging performed on the data:
'binary': Only report results for the class specified by pos_label.
'micro': Calculate metrics globally by counting the total true positives, false negatives and false positives.
'macro': Calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
'weighted': Calculate metrics for each label, and find their average weighted by support (the number of true instances for each label). This alters 'macro' to account for label imbalance; it can result in an F-score that is not between precision and recall.
pos_label: int, default=0
The class to report if average='binary'. If average != 'binary', this will be ignored.
Returns
f1_score: float (if average is not None) or array of float of shape (n_unique_labels)
F1 score of the positive class in binary classification or weighted average of the F1 scores of each class for the multiclass task.
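The harmonic mean behaves differently from an ordinary average: it is pulled towards the smaller of the two values. The helper below, `f1_from`, is a hypothetical name used only for this sketch, and returning 0.0 when both inputs are 0 is an assumed convention, not necessarily what the module does.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (hypothetical helper).

    Assumption: when both inputs are 0 the score is defined as 0.0
    instead of raising a ZeroDivisionError.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Class 'cat' in Example 1: precision 6/8, recall 6/7.
print(f"{f1_from(6 / 8, 6 / 7):.2f}")  # 0.80

# Unbalanced inputs: the arithmetic mean would be 0.50, the F1 score is far lower.
print(f"{f1_from(0.9, 0.1):.2f}")      # 0.18

# If either input is 0, the F1 score is 0.
print(f1_from(1.0, 0.0))               # 0.0
```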
## 6. correctness.support
Support is the number of actual occurrences of the class in the specified dataset.
Imbalanced support in the training data may indicate structural weaknesses in the reported scores of the classifier and could indicate the need for stratified sampling or rebalancing.
support(cm, average = 'binary', pos_label=0)[source]
Compute the support.
Parameters
cm: DataFrame of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
average: {'micro', 'macro', 'weighted', 'binary'} or None, default='binary'
This parameter determines which value is returned:
'binary': Return the support of the class specified by pos_label.
Any other value: Return n_samples of the specified dataset.
pos_label: int, default=0
The class to report if average='binary'. If average != 'binary', this will be ignored.
Returns
support: int
Support of the specified class or the total number of samples of the dataset.
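Because columns hold the actual classes in this module's convention, support is simply a column sum. The following is a small sketch with the Example 1 counts, not the module's own implementation.

```python
import pandas as pd

# Rows = predicted class, columns = actual class (counts from Example 1).
cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])

support_per_class = cm.sum(axis=0)        # actual occurrences of each class
n_samples = int(support_per_class.sum())  # total number of samples

print(int(support_per_class["cat"]), int(support_per_class["dog"]), n_samples)  # 7 13 20
```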
## 7. correctness.report
report(cm)[source]
Build a text report showing all the classification metrics above.
Parameters
cm: DataFrame of shape (n_classes, n_classes)
Confusion matrix whose i-th row and j-th column entry indicates the number of samples with predicted label being i-th class and true label being j-th class.
Returns
None
This function only prints the report as a side effect.
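If the per-class numbers are needed as data instead of printed text, a table like the first half of the report can be assembled by hand. The function `sketch_report` below is hypothetical and is not part of the module. It assumes the row = predicted, column = actual convention and that every class is predicted at least once, so no division by zero occurs.

```python
import numpy as np
import pandas as pd


def sketch_report(cm):
    """Hypothetical helper: per-class metrics returned as a DataFrame."""
    counts = cm.to_numpy()
    tp = np.diag(counts)
    precision = tp / counts.sum(axis=1)   # row sums = predicted totals
    recall = tp / counts.sum(axis=0)      # column sums = actual totals
    f1 = 2 * precision * recall / (precision + recall)
    return pd.DataFrame(
        {
            "precision": precision,
            "recall": recall,
            "f1-score": f1,
            "support": counts.sum(axis=0),
        },
        index=cm.index,
    )


cm = pd.DataFrame([[6, 2], [1, 11]], index=["cat", "dog"], columns=["cat", "dog"])
table = sketch_report(cm)
print(table.round(6))
```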
# Usage
Let me illustrate how to use this module to evaluate a classification model.
## Example 1:
Inside the module, I have already provided a [test case](https://github.com/theEmperorofDaiViet/correctness/blob/master/correctness.py#L160) for it. Since I placed it in the `__main__` block, you can run it yourself by executing the file as a script. I reproduce it here:
### Actual values and Predicted values
```python
>>> y_target = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'dog',
... 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
>>> y_predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog',
... 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
```
### Compute confusion matrix
```python
>>> cm = confusion_matrix(y_target, y_predicted)
>>> print(cm)
     cat  dog
cat    6    2
dog    1   11
```
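As a sanity check, the four cells can be recounted directly from the two lists in plain Python, without the module. Treating 'cat' as the positive class is an assumption made for this sketch only.

```python
y_target = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'dog',
            'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
y_predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog',
               'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']

positive = 'cat'  # assumption: 'cat' plays the role of the positive class
pairs = list(zip(y_predicted, y_target))

tp = sum(p == positive and t == positive for p, t in pairs)  # predicted cat, actually cat
fp = sum(p == positive and t != positive for p, t in pairs)  # predicted cat, actually dog
fn = sum(p != positive and t == positive for p, t in pairs)  # predicted dog, actually cat
tn = sum(p != positive and t != positive for p, t in pairs)  # predicted dog, actually dog

print(tp, fp, fn, tn)  # 6 2 1 11
```

The first row of the matrix holds TP and FP, and the second row holds FN and TN, matching the printed output.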
### Return classification report
```python
>>> report(cm)
CLASSIFICATION REPORT:
precision recall f1-score support
0 0.750000 0.857143 0.80 7
1 0.916667 0.846154 0.88 13
precision recall f1-score support
macro 0.833333 0.851648 0.840 20
micro 0.850000 0.850000 0.850 20
weighted 0.858333 0.850000 0.852 20
accuracy 0.85
```
## Example 2:
Besides the simple test case above, I will also provide a more realistic example by building a classification model and then evaluating it.
More specifically, I will build a Gaussian Naive Bayes model to classify the dry bean dataset from [Kaggle](https://www.kaggle.com/datasets/muratkokludataset/dry-bean-dataset).
### Import libraries, modules and load data
```python
>>> from Naive_Bayes import Gaussian_Naive_Bayes
>>> import correctness
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> df = pd.read_excel('Dry_Bean_Dataset.xlsx')
>>> df.shape
(13611, 17)
```
The Naive_Bayes module I import is another module of mine, built from scratch, that implements Naive Bayes algorithms. Naive Bayes is a supervised learning method based on applying Bayes’ theorem with strong (naive) feature independence assumptions. You can check it out here.
### Preprocess and split data
```python
>>> data = df.drop(['ConvexArea','EquivDiameter','AspectRation','Eccentricity','Class','Area','Perimeter',
... 'ShapeFactor2','ShapeFactor3','ShapeFactor1','ShapeFactor4'],axis = 1)
>>> target = df['Class']
>>> X = np.array(data)
>>> y = np.array(target)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```
### Perform classification using this module and evaluate the model performance
```python
>>> nb = Gaussian_Naive_Bayes()
>>> nb.fit(X_train, y_train)
>>> y_pred = nb.predict(X_test)
>>> cm = correctness.confusion_matrix(y_test, y_pred)
>>> scratch = correctness.accuracy(cm)
>>> correctness.report(cm)
CLASSIFICATION REPORT:
precision recall f1-score support
0 0.853846 0.840909 0.847328 264
1 0.989796 1.000000 0.994872 97
2 0.914773 0.901961 0.908322 357
3 0.914986 0.895628 0.905203 709
4 0.935829 0.951087 0.943396 368
5 0.947368 0.951923 0.949640 416
6 0.834915 0.859375 0.846968 512
precision recall f1-score support
macro 0.913073 0.914412 0.913676 2723
micro 0.904150 0.904150 0.904150 2723
weighted 0.904404 0.904150 0.904196 2723
accuracy 0.90415
```
# Contact
You can contact me via:
* [![GitHub][GitHub-shield]][GitHub-url]
* [![LinkedIn][LinkedIn-shield]][LinkedIn-url]
* ![Gmail][Gmail-shield]: Khiet.To.05012001@gmail.com
* [![Facebook][Facebook-shield]][Facebook-url]
* [![Twitter][Twitter-shield]][Twitter-url]
[Numpy-shield]: https://img.shields.io/badge/numpy-%23013243.svg?style=for-the-badge&logo=numpy&logoColor=white
[Numpy-url]: https://numpy.org
[Pandas-shield]: https://img.shields.io/badge/pandas-%23150458.svg?style=for-the-badge&logo=pandas&logoColor=white
[Pandas-url]: https://pandas.pydata.org
[GitHub-shield]: https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white
[GitHub-url]: https://github.com/theEmperorofDaiViet
[LinkedIn-shield]: https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white
[LinkedIn-url]: https://www.linkedin.com/in/khiet-to/
[Gmail-shield]: https://img.shields.io/badge/Gmail-D14836?style=for-the-badge&logo=gmail&logoColor=white
[Facebook-shield]: https://img.shields.io/badge/Facebook-%231877F2.svg?style=for-the-badge&logo=Facebook&logoColor=white
[Facebook-url]: https://www.facebook.com/Khiet.To.Official/
[Twitter-shield]: https://img.shields.io/badge/Twitter-%231DA1F2.svg?style=for-the-badge&logo=Twitter&logoColor=white
[Twitter-url]: https://twitter.com/KhietTo
### Style Sheets
GitHub's markdown processor cannot render `<style>` sheets, so you may see it lying here:
<style>
table, th, td {
border: 1px solid black;
border-collapse: collapse;
}
.api {
align: left;
vertical-align: top;
width: 12%
}
</style>
For the best reading experience, open this file in another viewer, e.g. Visual Studio Code's Open Preview mode (Ctrl+Shift+V).