https://github.com/genesisblock3301/probability_statistics_and_machine_learning
This repo is for learning ML-related concepts and tools.
- Host: GitHub
- URL: https://github.com/genesisblock3301/probability_statistics_and_machine_learning
- Owner: GenesisBlock3301
- License: mit
- Created: 2023-01-30T10:03:14.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-05-07T09:38:46.000Z (about 2 years ago)
- Last Synced: 2025-01-08T18:45:33.247Z (over 1 year ago)
- Language: Jupyter Notebook
- Homepage:
- Size: 4.42 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## Learning Roadmap of Probability and Statistics
### Statistics roadmap for ML:
Probability theory: crucial for modeling uncertainty in ML.
- Probability
- Random variables
- Probability distributions
- Conditional probability
Descriptive statistics:
- Measures of central tendency (mean, median, mode)
- Measures of dispersion (variance, standard deviation)
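
As a minimal sketch of the measures listed above, Python's standard `statistics` module can compute each of them. The `scores` sample is hypothetical and chosen only so the results are easy to check by hand.

```python
import statistics

# Hypothetical sample, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Measures of dispersion (population versions: divide by n, not n - 1)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print(mean, median, mode, variance, std_dev)
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` (which divide by n - 1) are usually the more appropriate choice.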
Inferential statistics: essential for making inferences and drawing conclusions from data samples.
- Hypothesis testing
- Confidence intervals
- P-values
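
The three ideas above can be sketched together with the standard library. This example assumes a normal (z-based) approximation to keep it dependency-free; with a sample this small, a t-distribution would normally be preferred. The `sample` values and the null value `mu0` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Hypothetical measurements, invented for illustration.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(sample)
sample_mean = statistics.mean(sample)
# Standard error of the mean, using the sample standard deviation (n - 1).
standard_error = statistics.stdev(sample) / math.sqrt(n)

# 95% confidence interval for the mean under the normal approximation.
z = NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

# Two-sided z-test of H0: population mean equals mu0 (assumed value).
mu0 = 4.8
z_stat = (sample_mean - mu0) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f}), p-value: {p_value:.4f}")
```

A small p-value here says the observed mean would be unlikely if the true mean were `mu0`; it does not give the probability that the hypothesis is true.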
Regression analysis:
- Linear regression and its variants are widely used in ML for modeling relationships between variables and making predictions.
Probability distributions: useful for understanding the behavior of data and checking modeling assumptions.
- Gaussian (normal) distribution
- Binomial distribution
- Poisson distribution
Sampling techniques:
- Understanding different sampling techniques, such as random sampling and stratified sampling, is important for collecting representative training and test datasets.
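
To illustrate why stratified sampling matters for train/test splits, here is a minimal sketch using only the standard library. The function name `stratified_split`, its parameters, and the imbalanced `labels` list are all hypothetical; libraries such as scikit-learn provide production-ready equivalents.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) that preserve label proportions."""
    rng = random.Random(seed)

    # Group example indices by their class label (one group per stratum).
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)  # random sampling within each stratum
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)

# Hypothetical imbalanced dataset: 80 negatives and 20 positives.
labels = [0] * 80 + [1] * 20
train_idx, test_idx = stratified_split(labels)
test_positive_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), test_positive_rate)
```

Because each class is sampled separately, the test set keeps the same 20% positive rate as the full dataset, which plain random sampling does not guarantee.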
Statistical hypothesis testing: crucial for evaluating ML models.
- Performing hypothesis tests and interpreting the results
- Making decisions based on statistical significance
Statistical modeling: helpful for parameter estimation and building probabilistic models.
- Maximum likelihood estimation (MLE)
- Bayesian inference
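
The two estimation approaches above can be compared on a simple coin-flip model. This is a sketch under stated assumptions: the `flips` data is hypothetical, and the Beta(2, 2) prior is an arbitrary weak prior chosen for illustration.

```python
import math

# Hypothetical coin-flip data: 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
heads = sum(flips)
tails = len(flips) - heads

def log_likelihood(p):
    """Bernoulli log-likelihood of the observed flips for heads-probability p."""
    return heads * math.log(p) + tails * math.log(1 - p)

# MLE: the closed-form answer for a Bernoulli model is the sample proportion.
p_mle = heads / len(flips)

# Sanity check: a grid search over p finds the same maximizer.
grid = [i / 100 for i in range(1, 100)]
p_grid_best = max(grid, key=log_likelihood)

# Bayesian inference: a Beta(a, b) prior is conjugate to the Bernoulli
# likelihood, so the posterior is Beta(a + heads, b + tails).
prior_a, prior_b = 2, 2  # assumed weak prior centered on 0.5
post_a, post_b = prior_a + heads, prior_b + tails
p_posterior_mean = post_a / (post_a + post_b)

print(p_mle, p_grid_best, round(p_posterior_mean, 4))
```

The posterior mean sits between the prior mean (0.5) and the MLE, and moves toward the MLE as more data arrives.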
Experimental design:
- Understanding principles of experimental design, such as randomization, control groups, and factorial designs, helps in conducting rigorous experiments and A/B testing in ML.
Multivariate statistics: tools for dimensionality reduction, feature selection, and pattern recognition in high-dimensional datasets.
- Principal component analysis (PCA)
- Factor analysis
- Cluster analysis
**Exploratory data analysis**
1. **Scatter plot.**
2. **Pair Plot.**
3. **Histogram**
4. **Cumulative Distribution**
5. **Mean and Standard Deviation**
6. **Median, Percentile, Quantile**
7. MAD, Box Plot and Violin Plot
-------
8. EDA on Cancer Dataset
9. Gaussian or Normal distribution
10. Skewness and Kurtosis
11. Sampling Distribution, Standard Normal Variate (z) and Standardization
12. Quantile-Quantile (Q-Q) Plot
13. Chebyshev's inequality
14. Uniform Distribution
15. Bernoulli vs. Binomial vs. Normal vs. Pareto Distribution
16. Box-Cox Transformation
17. Covariance Statistics
18. Pearson Correlation
19. Spearman Rank Correlation Coefficient
20. Correlation vs. Causation and Confidence Interval
21. Confidence Interval with underlying or Gaussian Distribution.
22. Hypothesis Testing and P-value Statistics
23. t-test vs. Chi-Square Test vs. ANOVA Test
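
As a small illustration of items 18 and 19, the sketch below contrasts Pearson and Spearman correlation on a monotonic but nonlinear relationship. The helper names (`pearson`, `ranks`, `spearman`) and the data are hypothetical, and the ranking helper assumes no tied values for simplicity.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), r_spearman)
```

Spearman reports a perfect monotonic association here, while Pearson falls below 1 because the relationship is not a straight line.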