https://github.com/razielar/datascience_cheetsheet
Personal DataScience Cheet Sheet
- Host: GitHub
- URL: https://github.com/razielar/datascience_cheetsheet
- Owner: razielar
- Created: 2022-09-18T19:55:32.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2022-09-30T18:29:32.000Z (over 2 years ago)
- Last Synced: 2025-01-11T14:48:38.581Z (4 months ago)
- Topics: cheetsheet, data-science
- Homepage:
- Size: 174 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Data Science Cheatsheet
1. [Probability](#prob)
2. [Statistics](#stats)
3. [Machine Learning](#ml)

### 1.1) Conditional Probability
### 1.2) Counting
#### 1.2.1) Permutation
``` python
from itertools import permutations

a = [1, 2, 3]
perm = permutations(a)  # all orderings of the three items
print(list(perm))
# [(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]
```
#### 1.2.2) Combination
``` python
from itertools import combinations

a = [1, 2, 3, 4]
comb = combinations(a, 2)  # all unordered pairs
print(list(comb))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```
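The sizes of the two listings above can be checked against the closed-form counting formulas: $n!/(n-r)!$ ordered selections and $n!/(r!\,(n-r)!)$ unordered selections of $r$ items out of $n$. The following is a minimal sketch, assuming Python 3.8+ for `math.perm` and `math.comb`; the variable names are illustrative only.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

a = [1, 2, 3, 4]
n, r = len(a), 2

# Ordered arrangements of all n items: n!
n_perm_all = len(list(permutations(a)))

# Ordered selections of r items: n! / (n - r)!
n_perm_r = len(list(permutations(a, r)))

# Unordered selections of r items: n! / (r! * (n - r)!)
n_comb_r = len(list(combinations(a, r)))

print(n_perm_all, factorial(n))  # 24 24
print(n_perm_r, perm(n, r))      # 12 12
print(n_comb_r, comb(n, r))      # 6 6
```

Each combination of size $r$ corresponds to $r!$ permutations, which is why the permutation count is the combination count multiplied by $r!$.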
### 1.3) Probability Distributions
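As a numerical companion to the tables in this section, the sketch below evaluates each listed distribution using only the Python standard library. The helper names (`binomial_pmf`, `poisson_pmf`, `uniform_cdf`, `exponential_cdf`) are hypothetical and not part of the original cheat sheet; `statistics.NormalDist` requires Python 3.8+.

```python
from math import comb, exp, factorial
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(exactly k events in a fixed interval, average rate lam)."""
    return lam**k * exp(-lam) / factorial(k)


def uniform_cdf(x, a, b):
    """P(X <= x) for X uniform on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def exponential_cdf(t, lam):
    """P(waiting time <= t) when events arrive at average rate lam."""
    return 1 - exp(-lam * t) if t > 0 else 0.0


print(binomial_pmf(5, 10, 0.5))   # 5 heads in 10 fair flips: 252/1024
print(poisson_pmf(0, 2.0))        # no events when 2 are expected: e**-2
print(uniform_cdf(0.25, 0, 1))    # 0.25
print(exponential_cdf(1.0, 2.0))  # 1 - e**-2

# Probability mass within one standard deviation of the mean (about 0.68)
std_normal = NormalDist(mu=0, sigma=1)
within_one_sd = std_normal.cdf(1) - std_normal.cdf(-1)
print(within_one_sd)
```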
#### 1.3.1) Discrete Probability Distributions
| Num | Distribution | Definition | Usage |
|---:|:-------------|------------:|---------:|
| 1 | Binomial distribution | Probability of *k* successes in *n* independent trials with a fixed success probability | Coin flips (number of heads in *n* flips) |
| 2 | Poisson distribution | Probability of a given number of events occurring within a fixed interval, given an average rate $\lambda$ | Number of visits to a website in a certain period of time |

#### 1.3.2) Continuous Probability Distributions
| Num | Distribution | Definition | Usage |
|---:|:-------------|------------:|---------:|
| 1 | Uniform distribution | Constant probability density for *X* between *a* and *b* | Sampling and hypothesis testing |
| 2 | Exponential distribution | Continuous counterpart of the Poisson: the waiting time between events that occur at a constant average rate | The time until a credit default occurs |
| 3 | Normal distribution | Bell-shaped probability density over values of *X*, determined by a mean and a standard deviation | The Central Limit Theorem |

### 1.4) Markov Chains
### 2.1) Random Variables
### 2.2) Central Limit Theorem
### 2.3) Hypothesis Testing
#### 2.3.1) General Information
#### 2.3.2) Type I and Type II Errors
#### 2.3.3) *p-values* & Confidence Intervals
#### 2.3.4) Test Statistics
### 2.4) MLE & MAP
Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation are two ways of estimating model parameters from data. The difference between them is that MAP includes a prior over the parameters, while MLE uses the likelihood alone. MLE can therefore be seen as a special case of MAP with a uniform prior.
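In symbols, MLE picks $\hat\theta = \arg\max_\theta\, p(D \mid \theta)$, while MAP picks $\hat\theta = \arg\max_\theta\, p(D \mid \theta)\, p(\theta)$. A minimal sketch for a coin with unknown heads probability follows, assuming a Beta($\alpha$, $\beta$) prior with $\alpha, \beta \ge 1$; the function names and the example counts are hypothetical, chosen only for illustration.

```python
def mle_bernoulli(heads, flips):
    """MLE of the heads probability: the observed frequency."""
    return heads / flips


def map_bernoulli(heads, flips, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior (mode of the posterior).

    Assumes alpha >= 1 and beta >= 1 so that the posterior mode is interior.
    """
    return (heads + alpha - 1) / (flips + alpha + beta - 2)


heads, flips = 7, 10  # illustrative data: 7 heads in 10 flips

print(mle_bernoulli(heads, flips))        # 0.7
print(map_bernoulli(heads, flips, 1, 1))  # 0.7, Beta(1, 1) is the uniform prior
print(map_bernoulli(heads, flips, 5, 5))  # 11/18, pulled toward the prior mode 0.5
```

With the uniform Beta(1, 1) prior the two estimates coincide, which is the special case described above; a more informative prior pulls the MAP estimate away from the observed frequency.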
### 3.1) Linear Algebra
#### 3.1.1) Eigenvalues and Eigenvectors
### 3.2) Model Evaluation and Selection
#### 3.2.1) Bias-Variance Trade-off
### 3.3) Model Training
#### 3.3.1) Hyperparameter Tuning
### 3.4) Linear Regression
Linear regression assumptions:
| Num | Assumption | Description |
|----------|----------|:-------------:|
| 1 | Linearity | The relationship between the features and the target variable is linear |
| 2 | Homoscedasticity | The variance of the residuals is constant |
| 3 | Independence | All observations are independent of each other |
| 4 | Normality | The residuals (equivalently, *Y* conditional on the features) are assumed to be normally distributed |
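Some of the assumptions in the table can be probed from the residuals of a fitted model. The sketch below is a rough illustration on synthetic data, not a formal diagnostic: it fits a one-feature ordinary least squares line with the standard library only and compares the residual variance in the lower and upper halves of the feature range as a crude homoscedasticity check. The data-generating values (intercept 1.0, slope 2.0, noise standard deviation 0.5) and all variable names are assumptions made for the example.

```python
import random
from statistics import fmean, pvariance

random.seed(0)  # reproducible synthetic data

# Synthetic data that satisfies the assumptions: linear signal + constant-variance noise
x = [i / 10 for i in range(200)]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5) for xi in x]

# Closed-form ordinary least squares for a single feature
x_bar, y_bar = fmean(x), fmean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residuals: observed minus fitted values
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# With an intercept term, OLS residuals average to zero by construction
mean_residual = fmean(residuals)

# Crude homoscedasticity check: residual variance for small x versus large x
half = len(x) // 2
variance_ratio = pvariance(residuals[half:]) / pvariance(residuals[:half])

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"mean residual={mean_residual:.2e}, variance ratio={variance_ratio:.2f}")
```

A variance ratio far from 1 would hint at heteroscedasticity, and a visible trend in the residuals against `x` would hint at non-linearity; independence and normality call for other diagnostics, such as inspecting residual autocorrelation or a Q-Q plot.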