https://github.com/ricardorobledo/ml_statistical_methods


matplotlib numpy pandas python3

# Statistics Notebook

This notebook is based on the book *Statistical Methods for Machine Learning*, available at [machinelearningmastery.com](https://machinelearningmastery.com/). It covers both fundamental and advanced topics.

## Key Topics Covered

- **Gaussian distribution and summary statistics:** population vs sample, central tendency, variance
- **Simple data visualization:** Matplotlib basics, line plots, bar charts, histograms, box plots, scatter plots
- **Random numbers:** pseudorandom number generation, seeding, controlling randomness
- **Law of large numbers and Central Limit Theorem:** theory, examples, and implications for ML
- **Statistical hypothesis testing:** test interpretation, error types, degrees of freedom
- **Statistical distributions:** Gaussian, Student’s t, Chi-squared
- **Critical values and their use in tests**
- **Covariance and correlation:** Pearson’s correlation, test datasets
- **Significance tests:** parametric tests (t-test, ANOVA, repeated-measures ANOVA)
- **Effect size and statistical power:** importance, calculation, power analysis
- **Resampling methods:** statistical sampling, bootstrap, cross-validation
- **Estimation statistics:** problems with hypothesis testing, interval estimation, meta-analysis
- **Tolerance and confidence intervals:** calculation and interpretation
- **Prediction intervals:** calculation and worked examples
- **Nonparametric methods:** rank data, ranking, rank correlation (Spearman, Kendall), significance tests (Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Friedman)
- **Independence tests:** contingency tables, Pearson’s Chi-squared test
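The summary-statistics and random-number topics can be illustrated with a minimal sketch. The notebook's stack is NumPy, pandas and Matplotlib. This example instead uses only the Python standard library (`random`, `statistics`) so it runs anywhere. The helper names `gaussian_sample` and `summarize` and the parameters μ = 50, σ = 5 are illustrative assumptions, not taken from the notebook.

```python
import random
import statistics

def gaussian_sample(n, mu=50.0, sigma=5.0, seed=1):
    """Draw n Gaussian values from a dedicated, seeded pseudorandom generator."""
    rng = random.Random(seed)  # seeding makes the pseudorandom sequence reproducible
    return [rng.gauss(mu, sigma) for _ in range(n)]

def summarize(data):
    """Central tendency and spread of a sample."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # sample variance / stdev divide by n - 1 (Bessel's correction)
        "variance": statistics.variance(data),
        "stdev": statistics.stdev(data),
    }

if __name__ == "__main__":
    data = gaussian_sample(10_000)
    for name, value in summarize(data).items():
        print(f"{name:>8}: {value:.3f}")
```

Because the generator is seeded, re-running the script reproduces the same sample. `statistics.variance` and `statistics.stdev` are the sample (n - 1) estimators, while `statistics.pvariance` and `statistics.pstdev` give the population versions.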
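A quick simulation makes the Central Limit Theorem concrete. Means of samples drawn from a non-Gaussian Uniform(0, 1) distribution cluster around the population mean 0.5, with a spread close to σ/√n. This is a standard-library sketch with illustrative sample sizes (n = 50, 2,000 trials) and hypothetical function names; it is not code from the notebook.

```python
import math
import random
import statistics

def sample_means(n, trials, seed=1):
    """Return the means of `trials` independent samples of size n drawn from Uniform(0, 1)."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

def clt_check(n=50, trials=2000, seed=1):
    """Compare the observed spread of sample means with the CLT prediction sigma / sqrt(n)."""
    means = sample_means(n, trials, seed)
    predicted_se = math.sqrt(1 / 12) / math.sqrt(n)  # Uniform(0, 1) has variance 1/12
    observed_se = statistics.stdev(means)
    # If the sampling distribution is roughly Gaussian, about 95% of means lie within 1.96 SE of 0.5
    within = sum(abs(m - 0.5) <= 1.96 * predicted_se for m in means) / trials
    return statistics.fmean(means), observed_se, predicted_se, within

if __name__ == "__main__":
    mean_of_means, observed, predicted, within = clt_check()
    print(f"mean of means {mean_of_means:.4f}, observed SE {observed:.4f}, "
          f"predicted SE {predicted:.4f}, share within 1.96 SE {within:.3f}")
```

Increasing `n` shrinks the spread of the means by a factor of √n. This is also the law of large numbers at work: sample means converge on the population mean.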
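Pearson's correlation and Spearman's rank correlation differ only in what they are computed on. Spearman's ρ is Pearson's r applied to the ranks of the data, with tied values given their average rank. The hand-rolled sketch below shows that relationship using the standard library and hypothetical helper names. In practice a library routine such as SciPy's `pearsonr` / `spearmanr` would normally be used instead.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # The 1/(n - 1) factors cancel, so plain sums are enough
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r on the ranked data."""
    return pearson(rank(x), rank(y))
```

For example, for positive, increasing `x`, `spearman(x, [v ** 2 for v in x])` returns 1.0 because the relationship is monotonic. `pearson` on the same data falls below 1 because the relationship is not linear.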
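Resampling and interval estimation meet in the percentile bootstrap. The procedure resamples the data with replacement many times, recomputes the statistic each time, and reads an approximate confidence interval off the empirical quantiles. The sketch below assumes 1,000 resamples and the simple percentile method; other variants, such as BCa, exist. `bootstrap_ci` is a hypothetical name, not a function from the notebook.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.fmean, n_boot=1000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for `stat` at level 1 - alpha."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, same size as the original sample, and recompute the statistic
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Read the interval bounds off the empirical quantiles (nearest rank)
    lower = estimates[round(alpha / 2 * (n_boot - 1))]
    upper = estimates[round((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper

if __name__ == "__main__":
    sample = list(range(1, 101))  # illustrative data
    low, high = bootstrap_ci(sample)
    print(f"95% bootstrap CI for the mean: ({low:.2f}, {high:.2f})")
```

Passing `stat=statistics.median` gives an interval for the median with no extra derivation. Not needing a closed-form standard error is the main appeal of the bootstrap.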
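For the independence test, Pearson's chi-squared statistic compares each observed cell count with the count expected if rows and columns were independent, E_ij = (row total_i × column total_j) / N. The statistic has (r - 1)(c - 1) degrees of freedom. The sketch below computes it with the standard library. To avoid a dependency, it only gives a p-value for the 1-degree-of-freedom (2 × 2) case, using the fact that a χ²(1) variable is a squared standard normal. The table values and helper names are made up for illustration.

```python
import math

def chi2_independence(table):
    """Pearson's chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected under the null hypothesis of independence
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi2_sf_df1(stat):
    """Upper-tail p-value; valid only for 1 degree of freedom (a 2 x 2 table)."""
    return math.erfc(math.sqrt(stat / 2))

if __name__ == "__main__":
    observed = [[10, 20],
                [20, 10]]  # illustrative counts, not real data
    stat, dof = chi2_independence(observed)
    p = chi2_sf_df1(stat) if dof == 1 else float("nan")
    # 3.841 is the chi-squared critical value for alpha = 0.05 with 1 degree of freedom
    print(f"chi2 = {stat:.3f}, dof = {dof}, p = {p:.4f}, reject H0 at 0.05: {stat > 3.841}")
```

This is the uncorrected statistic. SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its value for the same table will be smaller.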

This notebook provides clear tutorials, practical examples, and worked problems to build a solid understanding of statistics essential for data scientists and ML practitioners.