https://github.com/ev2900/data_science_notes

Handy reference notes for common data sciences topics

accuracy classification data-science f1-score precision recall


Host: GitHub
URL: https://github.com/ev2900/data_science_notes
Owner: ev2900
Created: 2023-06-07T14:10:07.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-07-04T14:18:28.000Z (about 3 years ago)
Last Synced: 2025-06-09T04:40:56.590Z (about 1 year ago)
Topics: accuracy, classification, data-science, f1-score, precision, recall
Homepage:
Size: 11.7 KB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Key Concepts in Classification Model(s)

This README documents key concepts in data science, especially as they pertain to classification models.

## Precision, Accuracy & Recall

Calculating precision, accuracy and recall requires testing a model on a labeled data set and counting the number of true positives, true negatives, false positives and false negatives.

Each of these outcomes is defined below:

* **True positive:** an outcome where the model correctly predicts the positive class (a correct prediction)

* **True negative:** an outcome where the model correctly predicts the negative class (a correct prediction)

* **False positive:** an outcome where the model incorrectly predicts the positive class (an incorrect prediction)

* **False negative:** an outcome where the model incorrectly predicts the negative class (an incorrect prediction)
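As a minimal sketch of how these four counts come out of a labeled test set, the Python below compares actual and predicted labels pair by pair. The function name `confusion_counts`, the use of `True` to mean "it rained", and the six days of data are all hypothetical, chosen only for illustration.

```python
def confusion_counts(y_true, y_pred, positive=True):
    """Return (tp, tn, fp, fn) for a binary classifier.

    y_true:   actual labels from the labeled test set
    y_pred:   the model's predicted labels, in the same order
    positive: the label treated as the positive class (here True = "it rained")
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted rain and it rained
        elif predicted != positive and actual != positive:
            tn += 1  # predicted no rain and it did not rain
        elif predicted == positive:
            fp += 1  # predicted rain but it did not rain
        else:
            fn += 1  # predicted no rain but it rained
    return tp, tn, fp, fn


# Hypothetical six days of labeled weather data
actual = [True, True, False, False, True, False]
predicted = [True, False, True, False, True, False]
print(confusion_counts(actual, predicted))  # (2, 2, 1, 1)
```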


### Precision

* $\text{Precision} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false positives}}$

* The best possible precision a model can achieve is 1

* The worst possible precision a model can achieve is 0

* If you picture a target with a bullseye, precision is how tightly the shots at the target are grouped

* Precision can also be thought of as the proportion of positive identifications that were actually correct

* *Example:* if a classification model predicts whether it will rain and has a precision of 0.5 (i.e. 50%), then when the model predicts rain, it is correct 50% of the time

* Bear in mind that precision only looks at positive predictions. Its formula does not include true negatives or false negatives
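The precision formula translates directly into code. This is a minimal sketch: the function name and the choice to return 0.0 when the model makes no positive predictions (where the formula would divide by zero) are assumptions, not part of these notes. True and false negatives are not even parameters, which mirrors the point above.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): share of positive predictions that were correct."""
    predicted_positives = tp + fp
    if predicted_positives == 0:
        # The model never predicted the positive class, so the ratio is undefined.
        # Returning 0.0 is a convention assumed for this sketch.
        return 0.0
    return tp / predicted_positives


# Hypothetical: the model predicted rain on 10 days and it rained on 5 of them
print(precision(tp=5, fp=5))  # 0.5, matching the 50% example
```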

  

### Accuracy

* $\text{Accuracy} = \frac{\text{number of true positives} + \text{number of true negatives}}{\text{number of true positives} + \text{number of true negatives} + \text{number of false positives} + \text{number of false negatives}}$

* or, equivalently, $\frac{\text{number of correct predictions}}{\text{total number of predictions}}$

* The best possible accuracy a model can achieve is 1

* The worst possible accuracy a model can achieve is 0

* If you picture a target with a bullseye - accuracy is how close the shots at the target are to the center of the bullseye

* Accuracy can also be thought of as the proportion of predictions the model got right

* *Example:* if a classification model predicts whether it will rain and has an accuracy of 0.5 (i.e. 50%), then across all of its predictions, whether rain or no rain, it is correct 50% of the time

* Bear in mind that accuracy can be misleading on a skewed (unbalanced) data set, i.e. one that contains a disproportionate amount of a single class. Using our will-it-rain example, if a data set has 10 days of weather history and it did not rain on 9 of them, the data set is skewed toward the negative class. A model that always predicts "no rain" would score an accuracy of 0.9 on it while never identifying the one rainy day. This is why recall is often used instead of accuracy
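To make the skew example concrete, the sketch below scores a hypothetical model that always predicts "no rain" on the 10-day history described above. The counts follow from that setup: the model never predicts rain, so there are no true or false positives; the 9 dry days are true negatives and the single rainy day is a false negative. The function name `accuracy` is illustrative.

```python
def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), i.e. correct / total predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Skewed 10-day example scored against an "always predict no rain" model
tp, tn, fp, fn = 0, 9, 0, 1
print(accuracy(tp, tn, fp, fn))  # 0.9
print(tp)  # 0: the one rainy day was never predicted
```

An accuracy of 0.9 looks strong, yet the model is useless for spotting rain; its recall, 0 / (0 + 1) = 0, exposes that.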

### Recall

* $\text{Recall} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false negatives}}$

* The best possible recall a model can achieve is 1

* The worst possible recall a model can achieve is 0

* Recall can be thought of as the proportion of actual positives that the model correctly classified as positive

* Using our will-it-rain example, recall is the proportion of days it actually rained on which the model predicted rain
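A minimal sketch of the recall formula follows, again with a hypothetical function name and an assumed convention of returning 0.0 when the data contains no actual positives (where the formula would divide by zero). The day counts are made up for illustration.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): share of actual positives the model caught."""
    actual_positives = tp + fn
    if actual_positives == 0:
        # No positives in the data; returning 0.0 is a convention assumed here.
        return 0.0
    return tp / actual_positives


# Hypothetical: it actually rained on 8 days; the model predicted rain on 6 of them
print(recall(tp=6, fn=2))  # 0.75
# An "always no rain" model on the skewed 10-day example: 0 of 1 rainy day caught
print(recall(tp=0, fn=1))  # 0.0
```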

### Which metric is most important?

Using our will-it-rain classification model example:

* Optimize for accuracy when you want to correctly predict both when it will rain and when it will not, and the data set is reasonably balanced

* Optimize for precision when you want predictions of rain to be correct as often as possible

* Optimize for recall when you want the model to catch as many of the actual rainy days as possible

### *Bonus* F1 score

* $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

* The best possible F1 score a model can achieve is 1

* The worst possible F1 score a model can achieve is 0

* Combines precision and recall into a single metric by calculating the harmonic mean of precision and recall
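The sketch below computes the F1 score from precision and recall and compares it with a plain arithmetic average. The function name `f1`, the precision/recall values, and the convention of returning 0.0 when both inputs are 0 are assumptions for illustration.

```python
def f1(precision, recall):
    """F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # both are 0; treated as an F1 of 0 in this sketch
    return 2 * precision * recall / (precision + recall)


# Hypothetical models, compared with the plain (arithmetic) average
for p, r in [(0.5, 1.0), (1.0, 0.0)]:
    print(p, r, round(f1(p, r), 3), (p + r) / 2)
# 0.5 1.0 0.667 0.75
# 1.0 0.0 0.0 0.5
```

The harmonic mean is pulled toward the smaller of the two values, so a model cannot earn a high F1 score by maximizing only precision or only recall.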
