An open API service indexing awesome lists of open source software.

https://github.com/statsim/confusionmatrix

Generate Confusion Matrix and Evaluation Metrics Online
https://github.com/statsim/confusionmatrix

classification confusion-matrix machine-learning

Last synced: 6 months ago
JSON representation

Generate Confusion Matrix and Evaluation Metrics Online

Awesome Lists containing this project

README

          

# Generate Confusion Matrix and Evaluation Metrics Online

*A confusion matrix is a useful tool for evaluating the performance of classification models. It provides a breakdown of predicted versus actual outcomes, allowing for a deeper understanding of model performance beyond just accuracy.*

---

## **Why Accuracy is Not Enough for Binary Classification?**

Accuracy is the proportion of correctly classified instances among all instances:


```latex
\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{Total}}
```

Where $$\text{TP}$$ is the number of true positives, $$\text{TN}$$ is the number of true negatives, and $$\text{Total}$$ is the total number of instances.

### **Limitations of Accuracy as a Metric**
- **Class Imbalance.** In datasets with imbalanced classes (e.g., 95% negatives and 5% positives), a model that always predicts the majority class (negative) can achieve high accuracy but fails to capture the minority class.
- **No Insight into Error Types.** Accuracy does not distinguish between types of errors (e.g., false positives vs. false negatives), which can have vastly different implications in real-world scenarios.

---

## **Confusion Matrix Terminology**

| | **Predicted Positive** | **Predicted Negative** |
|----------------|-------------------------|-------------------------|
| **Actual Positive** | True Positive (TP): Correctly identified positive cases | False Negative (FN): Missed positive cases |
| **Actual Negative** | False Positive (FP): Incorrectly predicted positive cases | True Negative (TN): Correctly identified negative cases |

### **Key Terms**:
- **TP (True Positive)**: Correctly predicted positive instances.
- **TN (True Negative)**: Correctly predicted negative instances.
- **FP (False Positive)**: Negative instances incorrectly predicted as positive.
- **FN (False Negative)**: Positive instances incorrectly predicted as negative.

---

## **Metrics and Their Meaning**

### **1. Sensitivity (True Positive Rate, TPR)**


```latex
\text{TPR} = \frac{\text{TP}}{\text{TP} + \text{FN}}
```

- **Definition**: Proportion of actual positives correctly identified.
- **Importance**: Measures the ability to detect positive cases (useful in medical diagnosis, fraud detection).

---

### **2. Specificity (SPC)**


```latex
\text{SPC} = \frac{\text{TN}}{\text{FP} + \text{TN}}
```

- **Definition**: Proportion of actual negatives correctly identified.
- **Importance**: Measures the ability to avoid false alarms (useful in spam filters, anomaly detection).

---

### **3. Precision (Positive Predictive Value, PPV)**


```latex
\text{PPV} = \frac{\text{TP}}{\text{TP} + \text{FP}}
```

- **Definition**: Proportion of positive predictions that are correct.
- **Importance**: Indicates reliability of positive predictions.

---

### **4. Negative Predictive Value (NPV)**


```latex
\text{NPV} = \frac{\text{TN}}{\text{TN} + \text{FN}}
```

- **Definition**: Proportion of negative predictions that are correct.
- **Importance**: Indicates reliability of negative predictions.

---

### **5. False Positive Rate (FPR)**


```latex
\text{FPR} = \frac{\text{FP}}{\text{FP} + \text{TN}}
```

- **Definition**: Proportion of actual negatives incorrectly predicted as positive.
- **Importance**: Highlights the rate of false alarms.

---

### **6. False Discovery Rate (FDR)**


```latex
\text{FDR} = \frac{\text{FP}}{\text{FP} + \text{TP}}
```

- **Definition**: Proportion of positive predictions that are incorrect.
- **Importance**: Complements precision in evaluating prediction quality.

---

### **7. False Negative Rate (FNR)**


```latex
\text{FNR} = \frac{\text{FN}}{\text{FN} + \text{TP}}
```

- **Definition**: Proportion of actual positives incorrectly predicted as negative.
- **Importance**: Highlights the rate of missed positive cases.

---

### **8. F1 Score**


```latex
\text{F1} = \frac{2 \cdot \text{TP}}{2 \cdot \text{TP} + \text{FP} + \text{FN}}
```

- **Definition**: Harmonic mean of precision and recall.
- **Importance**: Balances precision and recall, especially useful in imbalanced datasets.

---

### **9. Accuracy**


```latex
\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{P} + \text{N}}
```

- **Definition**: Proportion of correct predictions.
- **Importance**: Provides a general sense of model performance but is less reliable in imbalanced datasets.

---

### **10. Matthews Correlation Coefficient (MCC)**


```latex
\text{MCC} = \frac{\text{TP} \cdot \text{TN} - \text{FP} \cdot \text{FN}}{\sqrt{(\text{TP} + \text{FP})(\text{TP} + \text{FN})(\text{TN} + \text{FP})(\text{TN} + \text{FN})}}
```

- **Definition**: A balanced measure that accounts for TP, TN, FP, and FN.
- **Importance**: Considered a robust metric for imbalanced datasets. Ranges from -1 (inverse prediction) to +1 (perfect prediction), with 0 indicating random performance.

---

## **Key Takeaways**

1. **Class Imbalance Requires Caution**: Accuracy alone can be misleading when classes are imbalanced.
2. **Use Multiple Metrics**: Evaluate sensitivity, specificity, precision, and other metrics to understand the trade-offs in your model.
3. **MCC for Imbalanced Datasets**: Use Matthews Correlation Coefficient for a single comprehensive measure.
4. **Domain-Specific Importance**: Choose metrics based on the problem domain (e.g., sensitivity for medical tests, precision for legal applications).

---

By understanding and applying these metrics, you can better assess and improve your model's performance, ensuring it meets the requirements of the task at hand.