{"id":51121587,"url":"https://github.com/imosudi/model_training","last_synced_at":"2026-06-25T03:01:16.349Z","repository":{"id":352290247,"uuid":"1214516856","full_name":"imosudi/model_training","owner":"imosudi","description":"Breast Cancer Diagnosis: Logistic Regression, Random Forest, k-NN and Decision Tree classifiers models with feature importance analysis - Includes data exploration, train/test splitting, feature scaling, cross-validation, and model evaluation metrics with confusion matrices and decision boundary visualisation","archived":false,"fork":false,"pushed_at":"2026-04-18T20:37:22.000Z","size":131492,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-04-18T21:29:44.577Z","etag":null,"topics":["classification","data-science","decision-tree","educational","feature-importance","k-nearest-neighbors","linear-regression","machine-learning","model-evaluation","python3","random-forest","scikit-learn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/imosudi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-04-18T17:25:46.000Z","updated_at":"2026-04-18T20:37:25.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/imosudi/model_training","commit_stats":null,"previous_names":["imosudi/model_training"],"tags_count":null,"template":false,"template_f
ull_name":null,"purl":"pkg:github/imosudi/model_training","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imosudi%2Fmodel_training","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imosudi%2Fmodel_training/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imosudi%2Fmodel_training/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imosudi%2Fmodel_training/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/imosudi","download_url":"https://codeload.github.com/imosudi/model_training/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/imosudi%2Fmodel_training/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34757355,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-25T02:00:05.521Z","response_time":101,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","data-science","decision-tree","educational","feature-importance","k-nearest-neighbors","linear-regression","machine-learning","model-evaluation","python3","random-forest","scikit-learn"],"created_at":"2026-06-25T03:01:15.235Z","updated_at":"2026-06-25T03:01:16.301Z","avatar_url":"https://github.com/imosudi.png","language":"Python","funding_links":[],"categories":[],"sub_ca
tegories":[],"readme":"# Basic AI/ML Model Training\n\nEducational machine learning project covering classical ML and TensorFlow classification workflows, evaluation, visualisation, and model serialisation.\n\n## Overview\n\nThis repository contains hands-on classification examples built with scikit-learn and TensorFlow. It covers data preprocessing, model training, cross-validation, reporting, visualisation, and model export.\n\nBreast Cancer Diagnosis now compares Logistic Regression, Random Forest, k-NN, Decision Tree, and a TensorFlow neural network on the Breast Cancer Wisconsin dataset. The workflow includes data exploration, train/test splitting, feature scaling, cross-validation, classification reports, ROC-AUC, confusion matrices, learning curves, feature importance analysis, and training-vs-validation plots.\n\n[![data-science](https://img.shields.io/badge/-data--science-informational?style=flat)](#) [![machine-learning](https://img.shields.io/badge/-machine--learning-blue?style=flat)](#) [![tensorflow](https://img.shields.io/badge/-TensorFlow-FF6F00?style=flat\u0026logo=tensorflow\u0026logoColor=white)](#) [![keras](https://img.shields.io/badge/-Keras-D00000?style=flat\u0026logo=keras\u0026logoColor=white)](#) [![scikit-learn](https://img.shields.io/badge/-scikit--learn-F7931E?style=flat\u0026logo=scikit-learn\u0026logoColor=white)](#) [![python3](https://img.shields.io/badge/-python3-3776AB?style=flat\u0026logo=python\u0026logoColor=white)](#) [![pandas](https://img.shields.io/badge/-pandas-150458?style=flat\u0026logo=pandas\u0026logoColor=white)](#) [![numpy](https://img.shields.io/badge/-NumPy-013243?style=flat\u0026logo=numpy\u0026logoColor=white)](#) [![matplotlib](https://img.shields.io/badge/-Matplotlib-11557C?style=flat)](#) [![seaborn](https://img.shields.io/badge/-Seaborn-4C72B0?style=flat)](#) [![classification](https://img.shields.io/badge/-classification-red?style=flat)](#) 
[![model-evaluation](https://img.shields.io/badge/-model--evaluation-teal?style=flat)](#) [![cross-validation](https://img.shields.io/badge/-cross--validation-0A9396?style=flat)](#) [![roc-auc](https://img.shields.io/badge/-ROC--AUC-7B2CBF?style=flat)](#) [![feature-importance](https://img.shields.io/badge/-feature--importance-blueviolet?style=flat)](#) [![random-forest](https://img.shields.io/badge/-random--forest-brightgreen?style=flat)](#) [![linear-regression](https://img.shields.io/badge/-linear--regression-orange?style=flat)](#) [![decision-tree](https://img.shields.io/badge/-decision--tree-yellow?style=flat)](#) [![k-nearest-neighbors](https://img.shields.io/badge/-k--nearest--neighbors-green?style=flat)](#) [![educational](https://img.shields.io/badge/-educational-purple?style=flat)](#)\n\n## Projects\n\n### 1. Breast Cancer Diagnosis (`cancer/`)\n**Dataset:** Breast Cancer Wisconsin (569 samples, 30 features)\n\n**Files:**\n- `serialise_models.py` - Main model serialisation script\n- `data_load.py` - Data loading and preprocessing utilities\n- `trainings.py` - Training functions and pipelines\n- `validations.py` - Model validation and cross-validation\n- `visualisations.py` - Plotting and visualisation functions\n- `reports.py` - Report generation and metrics calculation\n- `outputs/` - Directory for generated plots and model files\n\n**Models:**\n- Logistic Regression\n- Random Forest\n- k-Nearest Neighbors (k-NN)\n- Decision Tree\n- TensorFlow dense neural network\n\n**Features:**\n- Full dataset exploration and statistical summary\n- Train/test splitting with stratification\n- Feature scaling for Logistic Regression, k-NN, and TensorFlow\n- TensorFlow training with model summary, epoch logs, validation tracking, and early stopping\n- Cross-validation for all models, including manual TensorFlow CV\n- Learning curves for all models\n- Comprehensive evaluation metrics:\n  - Accuracy\n  - Classification reports\n  - Confusion matrices\n  - ROC-AUC\n- 
Feature importance analysis:\n  - Random Forest and Decision Tree: built-in importances\n  - Logistic Regression: absolute coefficients\n  - k-NN and TensorFlow: permutation importance\n- Unified training-history plots for train vs validation loss and accuracy\n- Model serialisation:\n  - scikit-learn models saved as `.pkl`\n  - TensorFlow model saved as `.keras`\n\n**Generated outputs include:**\n- `training_validation_curves.png`\n- Per-model learning curves\n- Per-model confusion matrices\n- Per-model feature importance plots\n- Serialised model artifacts in `cancer/outputs/models/`\n\n### 2. Single Model Training (`one/train_iris.py`)\nTraining pipeline for individual machine learning models.\n\n### 3. Multi-Model Comparison (`three/`)\nAdvanced model comparison and evaluation framework.\n\n---\n\n## Project Structure\n\n```\nmodel_training/\n├── cancer/                    # Breast cancer classification project\n│   ├── serialise_models.py   # Model serialisation script\n│   ├── data_load.py          # Data loading utilities\n│   ├── trainings.py          # Training functions\n│   ├── validations.py        # Validation methods\n│   ├── visualisations.py     # Plotting functions\n│   ├── reports.py            # Report generation\n│   └── outputs/              # Generated files and plots\n├── one/                      # Single model training\n│   └── train_iris.py\n├── three/                    # Multi-model comparison\n├── requirements.txt          # dependency\n├── README.md                 # This file\n└── LICENSE                   # Project license\n```\n\n---\n\n## Core Concepts Covered\n\n- **Data Exploration:** Shape, class distribution, summary statistics, pairplot visualisation\n- **Train/Test Splitting:** Stratified splits to preserve class proportions\n- **Feature Scaling:** StandardScaler for distance-based and neural-network models\n- **Cross-Validation:** k-fold CV for robust model evaluation\n- **Model Comparison:** Side-by-side evaluation of 
multiple algorithms\n- **Deep Learning Basics:** Dense neural networks with TensorFlow/Keras\n- **Evaluation Metrics:**\n  - Accuracy\n  - Confusion matrices\n  - Classification reports (precision, recall, F1-score, support)\n  - ROC-AUC score\n- **Feature Importance:** Understanding which features drive predictions\n- **Visualisation:** Training-validation curves, confusion matrices, learning curves, feature importance plots\n- **Serialisation:** Exporting scikit-learn and TensorFlow models for reuse\n\n---\n\n## Requirements\n\n```bash\ngit clone git@github.com:imosudi/model_training.git\n```\n\n```bash\ncd model_training\n```\n\n```bash\npython3 -m venv venv\n```\n\n```bash\nsource venv/bin/activate\n```\n\n\n```bash\npip install -r requirements.txt\n```\n\n## Usage\n\nRun the Breast Cancer diagnosis example:\n```bash\npython cancer/serialise_models.py\n```\n\nThis command trains the models, generates reports and visualisations, and writes serialised artifacts to `cancer/outputs/models/`.\n\nRun the Iris classification example:\n```bash\npython one/train_iris.py\n```\n\n---\n\n## Educational Value\n\nThese scripts are designed as learning resources for:\n- Understanding how different classifiers work\n- Learning a proper ML workflow (explore → split → scale → train → evaluate)\n- Interpreting model outputs and evaluation metrics\n- Comparing algorithm performance\n- Extracting actionable insights from feature importance\n\n---\n\n\n## Notes\n\n- All random states are fixed (42) for reproducibility\n- Stratified splitting preserves class proportions across the train and test sets\n- Feature scaling is crucial for distance-based models and the TensorFlow model\n- Cross-validation provides more robust performance estimates than a single train/test split\n- Confusion matrices reveal which classes are confused with each other\n- Feature importance helps explain model decisions\n- TensorFlow falls back to the CPU if CUDA drivers are not available\n\n\n## License\n\nThis project is licensed under the **BSD 3-Clause License** - see the 
[LICENSE](./LICENSE) file for details.\n\n```\nBSD 3-Clause License\n\nCopyright (c) 2026, Mosudi Isiaka, IoT and Smart Systems, FH Technikum Wien\nAll rights reserved.\n```\n\n---\n\n## Author\n\n**Mosudi Isiaka O.**  \n📧 [mosudi.isiaka@gmail.com](mailto:mosudi.isiaka@gmail.com)  | [FH Technikum Wien email](mailto:io24m006@technikum-wien.at)  \n🌐 [https://mioemi.com](https://mioemi.com)   \n💻 [https://github.com/imosudi](https://github.com/imosudi)\n\n---","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimosudi%2Fmodel_training","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimosudi%2Fmodel_training","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimosudi%2Fmodel_training/lists"}