https://github.com/ricardorobledo/ml_statistical_methods
- Host: GitHub
- URL: https://github.com/ricardorobledo/ml_statistical_methods
- Owner: RicardoRobledo
- Created: 2025-07-17T23:42:23.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2025-07-17T23:52:36.000Z (7 months ago)
- Last Synced: 2025-08-08T06:43:20.829Z (6 months ago)
- Topics: matplotlib, numpy, pandas, python3
- Language: Jupyter Notebook
- Homepage:
- Size: 657 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Statistics Notebook
This notebook is based on the book *Statistical Methods for Machine Learning*, available at [machinelearningmastery.com](https://machinelearningmastery.com/). It covers both fundamental and advanced topics in statistics.
## Key Topics Covered
- **Gaussian distribution and summary statistics:** population vs sample, central tendency, variance
- **Simple data visualization:** Matplotlib basics, line plots, bar charts, histograms, box plots, scatter plots
- **Random numbers:** pseudorandom number generation, seeding, controlling randomness
- **Law of large numbers and Central Limit Theorem:** theory, examples, and implications in ML
- **Statistical hypothesis testing:** test interpretation, error types, degrees of freedom
- **Statistical distributions:** Gaussian, Student’s t, Chi-squared
- **Critical values and their use in tests**
- **Covariance and correlation:** Pearson’s correlation, test datasets
- **Significance tests:** parametric tests (Student’s t-test, ANOVA, repeated-measures ANOVA)
- **Effect size and statistical power:** importance, calculation, power analysis
- **Resampling methods:** statistical sampling, bootstrap, cross-validation
- **Estimation statistics:** problems with hypothesis testing, interval estimation, meta-analysis
- **Tolerance and confidence intervals:** calculation and interpretation
- **Prediction intervals:** calculation and worked examples
- **Nonparametric methods:** rank data, ranking, rank correlation (Spearman, Kendall), significance tests (Mann-Whitney U, Wilcoxon, Kruskal-Wallis, Friedman)
- **Independence tests:** contingency tables, Pearson’s Chi-squared test
This notebook provides clear tutorials, practical examples, and worked problems to build a solid understanding of the statistics that data scientists and ML practitioners rely on.
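
To give a flavor of the topics listed above, here is a minimal sketch that combines three of them: seeding a pseudorandom generator, computing summary statistics for a Gaussian sample, and estimating a bootstrap confidence interval for the mean. It is not taken from the notebook. The function name `bootstrap_mean_ci`, the synthetic data, and the choice of the standard library instead of NumPy are all assumptions made so the example runs without extra dependencies.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=1000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = random.Random(seed)  # seeded generator, so repeated calls give the same interval
    n = len(data)
    # Resample with replacement, record the mean of each resample, then sort the means.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read the interval bounds off the empirical percentiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic Gaussian sample (mean 50, standard deviation 5); not data from the notebook.
rng = random.Random(1)
sample = [rng.gauss(50, 5) for _ in range(200)]

sample_mean = statistics.fmean(sample)
sample_std = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
ci_low, ci_high = bootstrap_mean_ci(sample)

print(f"mean={sample_mean:.2f} std={sample_std:.2f}")
print(f"95% bootstrap CI for the mean: [{ci_low:.2f}, {ci_high:.2f}]")
```

Because both generators are seeded, rerunning the script reproduces the same sample and the same interval. Changing either seed shows how much the estimate varies from run to run.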