{"id":50857945,"url":"https://github.com/leonidasdev/federated-light-skin-cancer-classification","last_synced_at":"2026-06-14T19:32:11.239Z","repository":{"id":330565693,"uuid":"1123180928","full_name":"leonidasdev/federated-light-skin-cancer-classification","owner":"leonidasdev","description":"Federated learning system for skin cancer classification using lightweight vision transformer DSCATNet. Trains across multiple dermoscopic image datasets HAM10000, ISIC 2018/2019/2020, PAD UFES 20","archived":false,"fork":false,"pushed_at":"2026-05-27T15:27:50.000Z","size":22980,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-05-27T17:15:16.339Z","etag":null,"topics":["deep-learning","federated-learning","skin-cancer"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leonidasdev.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-26T10:43:52.000Z","updated_at":"2026-05-27T15:28:56.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/leonidasdev/federated-light-skin-cancer-classification","commit_stats":null,"previous_names":["leonidasdev/federated-light-skin-cancer-classification"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/leonidasdev/federated-light-skin-cancer-classification","repository_ur
l":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonidasdev%2Ffederated-light-skin-cancer-classification","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonidasdev%2Ffederated-light-skin-cancer-classification/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonidasdev%2Ffederated-light-skin-cancer-classification/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonidasdev%2Ffederated-light-skin-cancer-classification/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/leonidasdev","download_url":"https://codeload.github.com/leonidasdev/federated-light-skin-cancer-classification/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leonidasdev%2Ffederated-light-skin-cancer-classification/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34335688,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-14T02:00:07.365Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","federated-learning","skin-cancer"],"created_at":"2026-06-14T19:32:10.550Z","updated_at":"2026-06-14T19:32:11.233Z","avatar_url":"https://github.com/leonidasdev.png","language":"Jupyter 
Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Federated Learning for Skin Cancer Classification with DSCATNet\n\n[![CI](https://github.com/leonidasdev/federated-light-skin-cancer-classification/actions/workflows/ci.yml/badge.svg)](https://github.com/leonidasdev/federated-light-skin-cancer-classification/actions/workflows/ci.yml)\n[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)\n[![PyTorch 2.7+](https://img.shields.io/badge/pytorch-2.7+-ee4c2c.svg)](https://pytorch.org/)\n[![Flower 1.25+](https://img.shields.io/badge/flower-1.25+-green.svg)](https://flower.dev/)\n[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)\n\n---\n\n## Table of Contents\n\n1. [Overview](#overview)\n2. [Research Contribution](#research-contribution)\n3. [Project Structure](#project-structure)\n4. [Model Architecture](#model-architecture)\n5. [Installation](#installation)\n6. [Dataset Setup](#dataset-setup)\n7. [Configuration System](#configuration-system)\n8. [Training Pipeline](#training-pipeline)\n9. [Checkpoints \u0026 Resume Training](#checkpoints--resume-training)\n10. [Model Evaluation](#model-evaluation)\n11. [CLI Reference](#cli-reference)\n12. [Experiment Outputs](#experiment-outputs)\n13. [Notebooks](#notebooks)\n14. [Testing](#testing)\n15. [Troubleshooting](#troubleshooting)\n16. [Documentation](#documentation)\n17. [Citation](#citation)\n18. 
[License](#license)\n\n---\n\n## Overview\n\nThis project evaluates the **Dual-Scale Cross-Attention Vision Transformer (DSCATNet)** in a **Federated Learning** setting for dermoscopic skin lesion classification.\n\n**This is a thesis project** investigating whether lightweight Vision Transformers can maintain their classification accuracy under federated learning constraints, specifically with non-IID (non-Independent and Identically Distributed) data across multiple simulated hospitals/institutions.\n\n### Key Features\n\n- **DSCATNet Implementation**: Lightweight ViT with dual-scale cross-attention (~29.4M parameters, paper variant)\n- **Federated Learning**: Flower-based FL simulation with FedAvg aggregation\n- **Multiple Non-IID Modes**: Natural (dataset-based), Dirichlet, label skew, quantity skew\n- **5 Dermoscopy Datasets**: HAM10000, ISIC 2018/2019/2020, PAD-UFES-20\n- **Comprehensive Evaluation**: Accuracy, F1, AUC-ROC, confusion matrices, per-class metrics\n- **Checkpoint Management**: Resume training, best model tracking, automatic cleanup\n\n---\n\n## Research Contribution\n\n| Aspect | Description |\n|--------|-------------|\n| **Novel Evaluation** | First adaptation and evaluation of DSCATNet in federated learning |\n| **Real-World Non-IID** | Each FL client holds a different dermoscopy dataset (natural heterogeneity) |\n| **Comprehensive Comparison** | Centralized vs. IID-FL vs. 
Non-IID-FL performance analysis |\n| **Lightweight Focus** | Benchmarking against literature on efficient FL models |\n\n---\n\n## Project Structure\n\n```\nfederated-light-skin-cancer-classification/\n│\n├── configs/                          # YAML configuration files\n│   ├── dscatnet_federated_ham10000_non_iid.yaml  # Main FL experiment config\n│   ├── dscatnet_centralized_original.yaml        # Centralized baseline config\n│   ├── dscatnet_federated_padufes20_non_iid.yaml # Alternative FL config\n│   ├── fl_config.yaml                      # FL framework defaults\n│   ├── model_config.yaml                   # DSCATNet architecture settings\n│   └── experiment_config.yaml              # Comparison experiment settings\n│\n├── data/                             # Datasets (download required)\n│   ├── HAM10000/\n│   ├── ISIC2018/\n│   ├── ISIC2019/\n│   ├── ISIC2020/\n│   └── PAD-UFES-20/\n│\n├── outputs/                          # Training outputs (auto-generated)\n│   └── \u003cexperiment_name\u003e/\n│       ├── checkpoints/\n│       │   ├── best_model.pt\n│       │   ├── best_checkpoint.pt\n│       │   └── checkpoint_{epoch/round}_N.pt\n│       ├── config.json\n│       ├── results.json\n│       ├── metrics/\n│       │   └── \u003cexperiment_name\u003e_metrics.csv\n│       └── experiment.log\n│\n├── src/                              # Source code\n│   ├── __init__.py\n│   ├── models/                       # DSCATNet implementation\n│   │   ├── dscatnet.py               # Main model class\n│   │   ├── cross_attention.py        # Cross-scale attention module\n│   │   └── patch_embedding.py        # Dual-scale patch embedding\n│   ├── federated/                    # FL components\n│   │   ├── client.py                 # Flower NumPyClient\n│   │   ├── server.py                 # FL server utilities\n│   │   ├── simulation.py             # FL simulator (FedAvg)\n│   │   └── strategy.py               # Aggregation strategies\n│   ├── centralized/                  # 
Baseline training\n│   │   └── centralized.py            # Centralized trainer\n│   ├── data/                         # Data handling\n│   │   ├── datasets.py               # Dataset classes (HAM10000, ISIC, PAD-UFES-20)\n│   │   ├── preprocessing.py          # Transforms \u0026 augmentation\n│   │   ├── splits.py                 # IID/Non-IID splitting utilities\n│   │   ├── download.py               # ISIC API downloader\n│   │   └── verify.py                 # Dataset verification\n│   ├── evaluation/                   # Evaluation utilities\n│   │   ├── metrics.py                # Classification metrics\n│   │   └── visualization.py          # Plotting functions\n│   └── utils/                        # Helpers\n│       ├── checkpoints.py            # Checkpoint management\n│       ├── config_schema.py          # YAML config validation\n│       ├── helpers.py                # Seed, device, formatting\n│       └── logging_utils.py          # Logging configuration\n│\n├── notebooks/                        # Jupyter notebooks\n│   ├── 01_dataset_exploration.ipynb\n│   ├── 02_model_evaluation.ipynb\n│   └── 03_fl_vs_centralized_comparison.ipynb\n│\n├── tests/                            # Unit tests\n│   ├── conftest.py                   # Shared fixtures and markers\n│   ├── test_centralized.py           # Centralized training tests\n│   ├── test_checkpoints.py           # Checkpoint save/load tests\n│   ├── test_cli.py                   # CLI argument parsing tests\n│   ├── test_client.py                # FL client tests\n│   ├── test_config_loading.py        # Config loading/validation tests\n│   ├── test_config_schema.py         # Config schema validation tests\n│   ├── test_datasets.py              # Dataset registry tests\n│   ├── test_download.py              # Download functionality tests\n│   ├── test_evaluation.py            # Evaluation metrics tests\n│   ├── test_helpers.py               # Helper utility tests\n│   ├── test_integration.py           # 
End-to-end integration tests\n│   ├── test_logging_utils.py         # Logging \u0026 metrics tracker tests\n│   ├── test_model_evaluator.py       # Model evaluator tests\n│   ├── test_models.py                # DSCATNet architecture tests\n│   ├── test_preprocessing.py         # Preprocessing pipeline tests\n│   ├── test_simulation.py            # FL simulation tests\n│   ├── test_splits.py                # Data splitting tests\n│   ├── test_strategy.py              # FedAvg strategy tests\n│   ├── test_verify.py                # Dataset verification tests\n│   └── test_visualization.py         # Visualization tests\n│\n├── docs/                             # Documentation\n│   ├── architecture.md               # System architecture\n│   ├── benchmark-comparison.md       # FL vs centralized fairness audit\n│   ├── CLAUDE.md                     # AI assistant context\n│   ├── config-options-guide.md       # Configuration reference\n│   └── README.md                     # Documentation index\n│\n├── run_experiment.py                 # Main entry point\n├── run_download.py                   # Dataset downloader\n├── run_tests.py                      # Test runner\n├── CONTRIBUTING.md                   # Contribution guidelines\n├── requirements.txt                  # Python dependencies\n├── pyproject.toml                    # Project configuration\n└── README.md\n```\n\n---\n\n## Model Architecture\n\n### DSCATNet (Dual-Scale Cross-Attention Vision Transformer)\n\nDSCATNet is a lightweight Vision Transformer designed specifically for dermoscopic image classification. 
It captures both fine-grained local features and global contextual information through dual-scale processing.\n\n```\nInput Image (224×224×3)\n         │\n         ▼\n┌─────────────────────────────────┐\n│   Dual-Scale Patch Embedding    │\n│  ┌───────────┬───────────┐      │\n│  │ Fine 8×8  │Coarse 16×16│     │\n│  │784 patches│196 patches │     │\n│  └───────────┴───────────┘      │\n└─────────────────────────────────┘\n         │\n         ▼\n┌─────────────────────────────────┐\n│  Cross-Scale Attention Blocks   │\n│  (6 blocks, 12 heads, dim=384)  │\n│  Fine ←→ Coarse attention       │\n└─────────────────────────────────┘\n         │\n         ▼\n┌─────────────────────────────────┐\n│     Feature Fusion (concat)     │\n└─────────────────────────────────┘\n         │\n         ▼\n┌─────────────────────────────────┐\n│   CLS Token Extraction          │\n│   + Classification Head         │\n│   → 7 classes (softmax)         │\n└─────────────────────────────────┘\n```\n\n### Model Variants\n\n| Variant | Embed Dim | Depth | Heads | Parameters | Use Case |\n|---------|-----------|-------|-------|------------|----------|\n| `tiny`  | 192       | 4     | 3     | ~5M        | Resource-constrained FL clients |\n| `small` | 384       | 6     | 6     | ~29.4M     | Balanced performance |\n| `paper` | 384       | 6     | 12    | ~29.4M     | **Default** - paper-faithful (Yadav et al.) |\n| `base`  | 384       | 8     | 6     | ~39M       | Maximum accuracy |\n\n---\n\n## Installation\n\n### 1. Clone Repository\n\n```bash\ngit clone https://github.com/leonidasdev/federated-light-skin-cancer-classification.git\ncd federated-light-skin-cancer-classification\n```\n\n### 2. Create Virtual Environment\n\n```bash\n# Create venv\npython -m venv .venv\n\n# Activate (Windows PowerShell)\n.\\.venv\\Scripts\\Activate.ps1\n\n# Activate (Linux/Mac)\nsource .venv/bin/activate\n```\n\n### 3. Install Dependencies\n\n```bash\npip install -r requirements.txt\n```\n\n### 4. 
Verify Installation\n\n```bash\npython -c \"import torch; import flwr; print(f'PyTorch: {torch.__version__}'); print(f'CUDA: {torch.cuda.is_available()}')\"\n```\n\n## Quickstart: run notebooks and an example experiment\n\nAfter creating and activating the virtual environment and installing dependencies, you can open the notebooks or execute an experiment from the CLI.\n\n1. Activate venv and install dependencies:\n\n```powershell\n# Windows (PowerShell)\n.\\.venv\\Scripts\\Activate.ps1\npip install -r requirements.txt\n\n# macOS / Linux\nsource .venv/bin/activate\npip install -r requirements.txt\n```\n\n2. Run Jupyter Lab/Notebook and open the notebooks in `notebooks/`:\n\n```bash\njupyter lab\n```\n\n3. Or run an example experiment from the CLI (uses a `configs/` YAML; `--mode` is required, see the CLI Reference):\n\n```bash\npython run_experiment.py --mode centralized --config configs/dscatnet_centralized_ham10000.yaml\n```\n\n### Analysis scripts\n\nThis repository includes analysis utilities to extract convergence metrics and generate comparison plots from training results.\n\n- `scripts/analysis/extract_logs.py`: Searches recursively under `--outputs-dir` for `results.json` files from experiments, extracts training/validation accuracy curves, and generates:\n  - **Convergence plots by dataset** (recommended for thesis): centralized vs federated IID vs federated Non-IID for each dataset (HAM10000, All Datasets, PAD-UFES-20)\n  - **Convergence plots by learning type**: overview of all experiments, centralized-only, and federated-only\n  - **Summary CSV**: best/final validation accuracy, test accuracy, and training time per experiment\n  \n  Usage: `python scripts/analysis/extract_logs.py --outputs-dir outputs/ --out-dir outputs/analysis`. 
See `scripts/analysis/README.md` for full details.\n\n### System Requirements\n\n| Resource | Minimum | Recommended |\n|----------|---------|-------------|\n| Python   | 3.10+   | 3.10+       |\n| RAM      | 8GB     | 16GB+       |\n| GPU VRAM | 4GB     | 8GB+        |\n| Disk     | 30GB    | 50GB+       |\n| CUDA     | 11.8+   | 12.0+       |\n\n---\n\n## Dataset Setup\n\n### Supported Datasets\n\n| Dataset | Images | Classes | Source | FL Client |\n|---------|--------|---------|--------|-----------|\n| HAM10000 | 10,015 | 7 | Kaggle | Client 1 |\n| ISIC 2018 | ~10,015 | 7 | ISIC Archive | Client 2 |\n| ISIC 2019 | ~25,331 | 8+UNK | ISIC Archive | Client 3 |\n| ISIC 2020 | ~33,126 | 2 (binary) | ISIC Archive | Client 4 |\n| PAD-UFES-20 | 2,298 | 6 | Mendeley | Client 5 |\n\n### Unified 7-Class Mapping\n\nAll datasets are mapped to a unified 7-class schema:\n\n| Class | Abbreviation | Description |\n|-------|--------------|-------------|\n| 0 | AK/AKIEC | Actinic Keratosis |\n| 1 | BCC | Basal Cell Carcinoma |\n| 2 | BKL | Benign Keratosis |\n| 3 | DF | Dermatofibroma |\n| 4 | MEL | Melanoma |\n| 5 | NV | Melanocytic Nevus |\n| 6 | VASC | Vascular Lesion |\n\n### Recommended: Manual Download\n\n**For significantly faster download speeds, we strongly recommend downloading datasets manually via your web browser** rather than using the API downloader. 
Browser downloads are typically 10-50x faster than API-based downloads.\n\n#### Download Links\n\n| Dataset | Download Link | Size |\n|---------|---------------|------|\n| **HAM10000** | [Kaggle](https://www.kaggle.com/datasets/kmader/skin-cancer-mnist-ham10000) | ~2.5GB |\n| **ISIC 2018** | [ISIC Archive](https://challenge.isic-archive.com/data/#2018) | ~2.5GB |\n| **ISIC 2019** | [ISIC Archive](https://challenge.isic-archive.com/data/#2019) | ~9GB |\n| **ISIC 2020** | [ISIC Archive](https://challenge.isic-archive.com/data/#2020) | ~25GB |\n| **PAD-UFES-20** | [Mendeley](https://data.mendeley.com/datasets/zr7vgbcyr2/1) | ~1.2GB |\n\n#### Manual Setup Steps\n\n1. **Download** each dataset from the links above\n2. **Extract** the archives\n3. **Organize** into the following structure:\n\n```\ndata/\n├── HAM10000/\n│   ├── HAM10000_metadata.csv\n│   ├── HAM10000_images_part_1/\n│   │   └── *.jpg\n│   └── HAM10000_images_part_2/\n│       └── *.jpg\n│\n├── ISIC2018/\n│   ├── ISIC2018_Task3_Training_GroundTruth.csv\n│   └── ISIC2018_Task3_Training_Input/\n│       └── *.jpg\n│\n├── ISIC2019/\n│   ├── ISIC_2019_Training_GroundTruth.csv\n│   └── ISIC_2019_Training_Input/\n│       └── *.jpg\n│\n├── ISIC2020/\n│   ├── train.csv\n│   └── train/\n│       └── *.jpg\n│\n└── PAD-UFES-20/\n    ├── metadata.csv\n    ├── imgs_part_1/\n    ├── imgs_part_2/\n    └── imgs_part_3/\n        └── *.png\n```\n\n4. **Verify** the installation:\n\n```bash\npython run_download.py --verify\n```\n\n#### Alternative: API Download (Slower)\n\nIf you prefer automated downloading:\n\n```bash\n# Download all datasets (may take several hours)\npython run_download.py --download-all --workers 16\n\n# Download specific dataset\npython run_download.py --download ISIC2019\n```\n\n---\n\n## Configuration System\n\nAll experiments are configured via **YAML files** in the `configs/` directory. 
This provides reproducibility and easy parameter tuning.\n\n### Main Configuration Files\n\n| File | Purpose |\n|------|---------|\n| `dscatnet_federated_ham10000_non_iid.yaml` | Primary FL experiment config (non-IID) |\n| `dscatnet_centralized_original.yaml` | Centralized baseline config |\n| `model_config.yaml` | DSCATNet architecture settings |\n| `fl_config.yaml` | FL framework defaults |\n\n### Configuration Structure\n\n```yaml\n# Example: dscatnet_federated_ham10000_non_iid.yaml\n\nfederated:\n  experiment:\n    name: dscatnet_federated_isic2019\n    description: \"FL benchmark on ISIC2019\"\n\n  # Data\n  data_root: ./data\n  output_dir: ./outputs\n  datasets:\n    - ISIC2019\n\n  # Model\n  model:\n    variant: paper        # tiny, small, paper, base\n    image_size: 224\n    num_classes: 7\n\n  # Training\n  training:\n    batch_size: 4\n    lr: 0.001\n    local_epochs: 1\n    num_rounds: 25\n\n  # Federation\n  federation:\n    num_clients: 4  # Adjust based on number of datasets used\n    data_partition_type: dirichlet    # natural, dirichlet, label_skew, quantity_skew, iid\n    dirichlet_alpha: 0.5      # Lower = more non-IID\n\n  # Augmentation\n  augmentation:\n    level: medium             # light, medium, heavy\n```\n\n### Non-IID Distribution Types\n\n| Type | Description | When to Use |\n|------|-------------|-------------|\n| `natural` | Each dataset = 1 client | Simulating real hospitals |\n| `dirichlet` | Dirichlet-based label skew | Controlled heterogeneity studies |\n| `label_skew` | Artificial label imbalance | Extreme non-IID testing |\n| `quantity_skew` | Different sample counts | Unbalanced client scenarios |\n\n---\n\n## Training Pipeline\n\n### Pipeline Overview\n\n```\n┌─────────────────────────────────────────────────────────────────┐\n│                     TRAINING PIPELINE                           │\n├─────────────────────────────────────────────────────────────────┤\n│                                                               
  │\n│  1. CONFIGURATION                                               │\n│     └── Load YAML config → SimulationConfig/CentralizedConfig   │\n│                                                                 │\n│  2. DATA SETUP                                                  │\n│     ├── Load datasets (HAM10000, ISIC, PAD-UFES-20)            │\n│     ├── Apply transforms (resize, normalize, augment)          │\n│     └── Create train/val splits (stratified)                   │\n│                                                                 │\n│  3. MODEL INITIALIZATION                                        │\n│     └── Create DSCATNet(variant, num_classes, pretrained)      │\n│                                                                 │\n│  4. TRAINING LOOP                                               │\n│     ├── Centralized: Standard epoch-based training             │\n│     └── Federated:                                             │\n│         ├── Distribute model to clients                        │\n│         ├── Local training (local_epochs)                      │\n│         ├── Aggregate weights (FedAvg)                         │\n│         └── Repeat for num_rounds                              │\n│                                                                 │\n│  5. CHECKPOINTING                                               │\n│     ├── Save best_model.pt (best val accuracy)                 │\n│     └── Save periodic checkpoints                              │\n│                                                                 │\n│  6. 
EVALUATION                                                  │\n│     └── Compute metrics on validation/test set                 │\n│                                                                 │\n└─────────────────────────────────────────────────────────────────┘\n```\n\n### Running Experiments\n\n#### Federated Learning (Recommended)\n\n```bash\n# Using config file (recommended)\npython run_experiment.py --mode federated --config configs/dscatnet_federated_ham10000_non_iid.yaml\n\n# Override specific settings\npython run_experiment.py --mode federated \\\n    --config configs/dscatnet_federated_ham10000_non_iid.yaml \\\n    --rounds 50 \\\n    --batch-size 16 \\\n    --model-variant paper\n```\n\n#### Centralized Training (Baseline)\n\n```bash\n# Using config file\npython run_experiment.py --mode centralized --config configs/dscatnet_centralized_original.yaml\n\n# With overrides\npython run_experiment.py --mode centralized \\\n    --config configs/dscatnet_centralized_original.yaml \\\n    --epochs 50 \\\n    --augmentation medium\n```\n\n#### Comparison Experiment\n\n```bash\npython run_experiment.py --mode comparison --config configs/experiment_config.yaml\n```\n\n#### Standalone Model Evaluation\n\n```bash\n# Evaluate a trained checkpoint on specific datasets\npython run_experiment.py --mode evaluate \\\n    --checkpoint outputs/federated_20260126_005720/checkpoints/best_model.pt \\\n    --datasets HAM10000 ISIC2019\n\n# Save evaluation results to file\npython run_experiment.py --mode evaluate \\\n    --checkpoint outputs/experiment/checkpoints/best_model.pt \\\n    --output-dir ./evaluation_results\n```\n\n---\n\n## Checkpoints \u0026 Resume Training\n\n### Checkpoint Structure\n\n**File Types**:\n- `best_model.pt` — Model weights only; use for inference and fast evaluation.\n- `best_checkpoint.pt` — Full training state (model, optimizer, scheduler, scaler, epoch, and metrics); use for resuming training or reproducing the exact training run.\n\nCheckpoints 
are saved in `outputs/\u003cexperiment_name\u003e/checkpoints/`:\n\n```\ncheckpoints/\n├── best_model.pt             # Best model weights only (for inference)\n├── best_checkpoint.pt        # Full checkpoint with training state (for resumption)\n├── checkpoint_epoch_10.pt    # Periodic checkpoint (centralized)\n├── checkpoint_round_5.pt     # Periodic checkpoint (federated)\n└── checkpoint_round_10.pt\n```\n\n### Checkpoint Contents\n\n**Centralized checkpoints** contain full training state for perfect resumption:\n\n```python\n{\n    \"epoch\": 10,                          # Current epoch number\n    \"model_state_dict\": {...},            # Model weights\n    \"optimizer_state_dict\": {...},        # Optimizer state (momentum, etc.)\n    \"scheduler_state_dict\": {...},        # LR scheduler position\n    \"scaler_state_dict\": {...},           # AMP scaler state (if enabled)\n    \"metrics\": {\n        \"val_accuracy\": 0.85,\n        \"val_loss\": 0.42,\n        ...\n    },\n    \"config\": {...},                      # Training configuration\n    \"history\": {...},                     # Full training history\n    \"best_val_accuracy\": 0.85,\n    \"best_epoch\": 10,\n    \"epochs_without_improvement\": 0,\n}\n```\n\n**Federated checkpoints** contain:\n\n```python\n{\n    \"round\": 10,                          # Current round number\n    \"model_state_dict\": {...},            # Global model weights\n    \"metrics\": {...},                     # Round metrics\n    \"config\": {...},                      # Simulation configuration\n    \"history\": {...},                     # Full training history\n    \"best_val_accuracy\": 0.78,\n    \"best_round\": 8,\n    \"rounds_without_improvement\": 2,\n}\n```\n\n### Resume Training from Checkpoint\n\n**Resume Centralized Training:**\n\n```bash\n# Resume from best checkpoint (continues training)\npython run_experiment.py --mode centralized \\\n    --resume 
outputs/centralized_20260125_120000/checkpoints/best_checkpoint.pt \\\n    --epochs 150\n\n# Resume with config file + checkpoint\npython run_experiment.py --mode centralized \\\n    --config configs/dscatnet_centralized_original.yaml \\\n    --resume outputs/experiment/checkpoints/checkpoint_epoch_50.pt \\\n    --epochs 100\n```\n\n**Resume Federated Training:**\n\n```bash\n# Resume FL from round 25 checkpoint, continue to round 50\npython run_experiment.py --mode federated \\\n    --resume outputs/federated_20260126_005720/checkpoints/checkpoint_round_25.pt \\\n    --rounds 50\n\n# Resume with config + new experiment name\npython run_experiment.py --mode federated \\\n    --config configs/dscatnet_federated_ham10000_non_iid.yaml \\\n    --resume outputs/federated_20260126_005720/checkpoints/checkpoint_round_10.pt \\\n    --rounds 30 \\\n    --experiment-name federated_continued\n```\n\n### Loading Checkpoints in Code\n\n```python\nimport torch\nfrom src.models.dscatnet import create_dscatnet\n\n# Create model\nmodel = create_dscatnet(variant=\"paper\", num_classes=7)\n\n# Load checkpoint\ncheckpoint = torch.load(\"outputs/experiment/checkpoints/best_model.pt\")\nmodel.load_state_dict(checkpoint[\"model_state_dict\"])\n\n# Check training progress\nprint(f\"Loaded from epoch/round: {checkpoint.get('epoch') or checkpoint.get('round')}\")\nprint(f\"Best accuracy: {checkpoint.get('best_val_accuracy', checkpoint.get('val_accuracy')):.4f}\")\n```\n\n---\n\n## Model Evaluation\n\n### Evaluation Metrics\n\nThe evaluation system computes comprehensive metrics:\n\n| Metric | Description |\n|--------|-------------|\n| **Accuracy** | Overall correct predictions |\n| **Balanced Accuracy** | Mean per-class accuracy (handles imbalance) |\n| **Precision (macro)** | Average precision across classes |\n| **Recall (macro)** | Average recall across classes |\n| **F1-Score (macro/weighted)** | Harmonic mean of precision \u0026 recall |\n| **AUC-ROC** | Area under ROC curve 
(one-vs-rest) |\n| **Confusion Matrix** | Per-class prediction breakdown |\n| **Per-Class Metrics** | Sensitivity/specificity per class |\n\n### Running Evaluation\n\n#### Evaluate a Trained Model\n\n```python\nfrom src.models.dscatnet import create_dscatnet\nfrom src.evaluation.metrics import ModelEvaluator\nfrom src.data.datasets import ISIC2019Dataset\nfrom src.data.preprocessing import get_val_transforms\nfrom torch.utils.data import DataLoader\nimport torch\n\n# Setup\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n\n# Load model\nmodel = create_dscatnet(variant=\"paper\", num_classes=7)\ncheckpoint = torch.load(\"outputs/experiment/checkpoints/best_model.pt\")\nmodel.load_state_dict(checkpoint[\"model_state_dict\"])\nmodel.to(device)\n\n# Prepare test data\ntransform = get_val_transforms(img_size=224)\ntest_dataset = ISIC2019Dataset(\n    root_dir=\"data/ISIC2019/ISIC_2019_Training_Input\",\n    csv_path=\"data/ISIC2019/ISIC_2019_Training_GroundTruth.csv\",\n    transform=transform\n)\ntest_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)\n\n# Evaluate\nevaluator = ModelEvaluator(model, device, num_classes=7)\nresults = evaluator.evaluate(test_loader)\n\n# Print report\nevaluator.print_report(results)\n\n# Access specific metrics\nprint(f\"Accuracy: {results.accuracy:.4f}\")\nprint(f\"F1 (macro): {results.f1_macro:.4f}\")\nprint(f\"AUC-ROC: {results.auc_macro:.4f}\")\n```\n\n### Evaluation After Training\n\nEvaluation is automatically performed at the end of each experiment. 
Results are saved in:\n\n```\noutputs/\u003cexperiment_name\u003e/\n├── results.json              # Final metrics + training history\n├── config.json               # Experiment configuration\n├── metrics/                  # Real-time CSV metrics\n│   └── \u003cname\u003e_metrics.csv\n└── checkpoints/\n    └── best_model.pt         # Best model weights\n```\n\n### Metrics JSON Structure\n\n```json\n{\n    \"accuracy\": 0.8542,\n    \"balanced_accuracy\": 0.7891,\n    \"precision_macro\": 0.8123,\n    \"recall_macro\": 0.7891,\n    \"f1_macro\": 0.7956,\n    \"f1_weighted\": 0.8412,\n    \"auc_macro\": 0.9234,\n    \"per_class_metrics\": {\n        \"AK\": {\"accuracy\": 0.82, \"precision\": 0.79, \"recall\": 0.75, \"support\": 312},\n        \"BCC\": {\"accuracy\": 0.88, \"precision\": 0.85, \"recall\": 0.82, \"support\": 514}\n    }\n}\n```\n\n---\n\n## CLI Reference\n\n### `run_experiment.py` (Main Entry Point)\n\n```bash\npython run_experiment.py --mode \u003cMODE\u003e [OPTIONS]\n```\n\n#### Mode Selection\n\n| Argument | Type | Description |\n|----------|------|-------------|\n| `--mode` | required | `centralized`, `federated`, `comparison`, or `evaluate` |\n| `--config` | path | YAML configuration file (CLI args override config values) |\n\n#### Common Arguments\n\n| Argument | Type | Description |\n|----------|------|-------------|\n| `--data-root` | path | Root directory for datasets (default: `./data`) |\n| `--output-dir` | path | Output directory (default: `./outputs`) |\n| `--experiment-name` | string | Custom experiment name |\n| `--batch-size` | int | Batch size for training/evaluation |\n| `--lr` | float | Learning rate |\n| `--datasets` | list | Specific datasets: `HAM10000 ISIC2018 ISIC2019 ISIC2020 PAD-UFES-20` |\n\n#### Model Configuration\n\n| Argument | Type | Description |\n|----------|------|-------------|\n| `--model-variant` | string | DSCATNet variant: `tiny` (~5M), `small` (~29.4M), `paper` (~29.4M, default), `base` (~39M) |\n| 
`--num-classes` | int | Number of output classes (default: 7) |\n| `--image-size` | int | Input image size (default: 224) |\n\n#### Training Hyperparameters\n\n| Argument | Type | Description |\n|----------|------|-------------|\n| `--weight-decay` | float | Weight decay for optimizer (default: 0.0) |\n| `--augmentation` | string | Data augmentation level: `none`, `light`, `medium`, `heavy` |\n| `--early-stopping` | int | Early stopping patience (epochs/rounds without improvement) |\n| `--checkpoint-interval` | int | Save checkpoint every N epochs/rounds |\n| `--num-workers` | int | Number of data loader workers |\n\n#### Centralized-Specific Arguments\n\n| Argument | Type | Description |\n|----------|------|-------------|\n| `--epochs` | int | Number of training epochs |\n| `--warmup-epochs` | int | Number of warmup epochs for LR scheduler |\n| `--scheduler` | string | LR scheduler type: `cosine`, `plateau` |\n| `--val-split` | float | Validation split ratio (default: 0.15) |\n| `--no-amp` | flag | Disable automatic mixed precision (AMP) |\n\n#### Federated-Specific Arguments\n\n| Argument | Type | Description |\n|----------|------|-------------|\n| `--rounds` | int | Number of FL communication rounds |\n| `--clients` | int | Number of FL clients |\n| `--local-epochs` | int | Local epochs per round |\n| `--data-partition-type` | string | `natural`, `dirichlet`, `label_skew`, `quantity_skew`, `iid` |\n| `--dirichlet-alpha` | float | Dirichlet alpha (lower = more non-IID) |\n| `--participation` | float | Client participation rate per round (0.0-1.0) |\n\n#### Checkpoint \u0026 Resume Arguments\n\n| Argument | Type | Description |\n|----------|------|-------------|\n| `--resume` | path | Checkpoint path to resume training from (centralized or federated) |\n| `--checkpoint` | path | Checkpoint path for evaluation mode (`--mode evaluate`) |\n\n### `run_download.py` (Dataset Management)\n\n```bash\npython run_download.py [OPTIONS]\n```\n\n| Argument | Description 
|\n|----------|-------------|\n| `--verify` | Verify existing dataset installation |\n| `--instructions` | Print manual download instructions |\n| `--setup` | Interactive setup wizard |\n| `--download \u003cDATASET\u003e` | Download specific dataset |\n| `--download-all` | Download all datasets |\n| `--workers N` | Parallel download workers (default: 8) |\n| `--force` | Force re-download existing files |\n\n---\n\n## Experiment Outputs\n\n### Output Directory Structure\n\n```\noutputs/\n└── \u003cexperiment_name\u003e/\n    ├── checkpoints/\n    │   ├── best_model.pt             # Best weights only (inference)\n    │   ├── best_checkpoint.pt        # Full state (resumption)\n    │   ├── checkpoint_epoch_10.pt    # Periodic (centralized)\n    │   └── checkpoint_round_5.pt    # Periodic (federated)\n    ├── config.json                   # Experiment configuration\n    ├── results.json                  # Final metrics + training history\n    ├── metrics/                      # Real-time CSV metrics\n    │   └── \u003cname\u003e_metrics.csv\n    └── experiment.log                # Full training log\n```\n\n### Training History (in results.json)\n\nThe `results.json` file written at experiment completion includes the full training history:\n\n```json\n{\n    \"best_val_accuracy\": 0.85,\n    \"best_epoch\": 42,\n    \"total_time_seconds\": 3600.0,\n    \"history\": {\n        \"epochs\": [1, 2, 3],\n        \"train_loss\": [2.1, 1.8, 1.5],\n        \"val_loss\": [2.0, 1.7, 1.4],\n        \"val_accuracy\": [0.35, 0.52, 0.61],\n        \"learning_rate\": [0.001, 0.001, 0.0009]\n    },\n    \"environment\": {\n        \"python_version\": \"3.13.3\",\n        \"pytorch_version\": \"2.7.0\",\n        \"cuda_available\": true\n    }\n}\n```\n\n---\n\n## Notebooks\n\nInteractive Jupyter notebooks for exploration, evaluation, and analysis are provided in the `notebooks/` directory.\n\n| Notebook | Description |\n|----------|-------------|\n| 
[01_dataset_exploration.ipynb](notebooks/01_dataset_exploration.ipynb) | Dataset verification, class distribution analysis, image statistics, non-IID visualization, preprocessing pipeline testing, and sample visualization. Outputs exploratory figures and dataset summaries to `outputs/evaluation_dataset_exploration/`. |\n| [02_model_evaluation.ipynb](notebooks/02_model_evaluation.ipynb) | Comprehensive model evaluation including performance metrics, confusion matrices, per-class analysis, ROC curves, confidence distribution analysis, and artifact export. Exports `results_latest.json` (and timestamped JSON) with per-sample predictions and metrics used by Notebook 03. |\n| [03_fl_vs_centralized_comparison.ipynb](notebooks/03_fl_vs_centralized_comparison.ipynb) | Head-to-head comparison between centralized and federated (IID and non-IID) training approaches with paired statistical testing (McNemar exact test, Bonferroni correction, paired bootstrap gap CI, communication-cost analysis). Outputs saved to `outputs/evaluation_comparison_dscatnet_all_datasets/`. |\n\n### Notebook Details\n\n#### Notebook 01: Dataset Exploration\n\nVerifies dataset integrity and visualizes class distributions, heterogeneity metrics, and sample images. No model training required. 
Use this before running experiments to understand data characteristics, especially when comparing IID vs non-IID modes.\n\n#### Notebook 02: Model Evaluation \u0026 Export\n\nNotebook 02 evaluates trained models and exports comprehensive artifacts:\n\n**Configuration**: Select the experiment and dataset in the configuration cell, then run all cells sequentially.\n\n**Key Exports** (saved to `outputs/evaluation_\u003cexperiment\u003e/` for each dataset):\n\n| File | Description |\n|------|-------------|\n| `results_latest.json` | Current evaluation snapshot with metrics and per-sample predictions |\n| `results_\u003ctimestamp\u003e.json` | Timestamped archive of evaluation results |\n| `metrics_summary.csv` | Summary metrics (accuracy, F1, AUC, etc.) |\n| `per_class_metrics.csv` | Per-class performance breakdown |\n| `confusion_matrix.csv` | Confusion matrix in tabular form |\n| `kpi_dashboard.png`, `confusion_matrix.png`, `per_class_metrics.png`, `roc_curves.png`, `confidence_analysis.png` | Visualizations |\n\n#### Notebook 03: FL vs Centralized Comparison\n\nCompares centralized baselines against federated experiments under both IID and non-IID conditions. Requires `results_latest.json` from Notebook 02 runs for each experiment. 
Performs paired statistical testing to determine significance of accuracy gaps and computes communication costs.\n\n**Supported Experiment Modalities**:\n- `configs/dscatnet_centralized_*.yaml` — centralized training baseline\n- `configs/dscatnet_federated_*_iid.yaml` — federated with near-IID data (large Dirichlet alpha)\n- `configs/dscatnet_federated_*_non_iid.yaml` — federated with non-IID data (Dirichlet alpha 0.1–0.5)\n\n**Results JSON Structure**:\n\n```json\n{\n    \"evaluation_timestamp\": \"2026-05-07T23:06:35...\",\n    \"dataset\": \"HAM10000\",\n    \"model_variant\": \"centralized\",\n    \"num_samples\": 10015,\n    \"metrics\": {\n        \"accuracy\": 0.8542,\n        \"balanced_accuracy\": 0.7891,\n        \"f1_macro\": 0.7956,\n        \"auc_macro\": 0.9234\n    },\n    \"per_class_metrics\": { ... },\n    \"per_class_auc\": { ... },\n    \"confusion_matrix\": [...],\n    \"confidence_stats\": { ... },\n    \"labels\": [5, 1, 3, ...],                    // Ground truth per sample\n    \"predictions\": [5, 1, 3, ...],              // Model predictions per sample\n    \"sample_ids\": [\"path/to/img1.jpg\", ...],   // Unique identifier per sample\n    \"sample_predictions\": [                     // Detailed per-sample results\n        {\n            \"sample_index\": 0,\n            \"sample_id\": \"HAM10000_000000\",\n            \"y_true\": 5,\n            \"y_pred\": 5,\n            \"correct\": true,\n            \"confidence\": 0.9876\n        },\n        ...\n    ]\n}\n```\n\n**Purpose of Per-Sample Data**: The `labels`, `predictions`, `sample_ids`, and `sample_predictions` enable exact paired statistical tests in Notebook 03 (e.g., McNemar test for centralized vs FL) without requiring recomputation.\n\n### Running Notebooks\n\n```bash\n# Start Jupyter Lab\njupyter lab notebooks/\n\n# Or start Jupyter Notebook\njupyter notebook notebooks/\n```\n\n\u003e **Note**: Ensure the virtual environment is activated and datasets are downloaded before 
running notebooks. Notebook 02 requires a `results` object from model evaluation; Notebook 03 requires evaluation artifacts from Notebook 02.\n\n---\n\n## Testing\n\nThe project includes comprehensive unit tests for all major components.\n\n### Test Modules\n\n| Module | Description |\n|--------|-------------|\n| `test_centralized.py` | Tests for centralized training configuration and trainer |\n| `test_checkpoints.py` | Tests for checkpoint saving, loading, and management |\n| `test_cli.py` | Tests for CLI argument parsing and validation |\n| `test_client.py` | Tests for Flower FL client |\n| `test_config_loading.py` | Tests for YAML config loading and schema validation |\n| `test_config_schema.py` | Tests for configuration schema validation |\n| `test_datasets.py` | Tests for dataset registry and loading functions |\n| `test_download.py` | Tests for download functionality |\n| `test_evaluation.py` | Tests for evaluation metrics and visualization functions |\n| `test_helpers.py` | Tests for seed, device, formatting, and other utilities |\n| `test_integration.py` | End-to-end integration tests (marked `@slow`) |\n| `test_logging_utils.py` | Tests for MetricsTracker, CSV logging, and resume safety |\n| `test_model_evaluator.py` | Tests for ModelEvaluator integration |\n| `test_models.py` | Tests for DSCATNet model architecture |\n| `test_preprocessing.py` | Tests for image transforms, augmentation levels, and normalization |\n| `test_simulation.py` | Tests for FL simulation, FedAvg aggregation, and client management |\n| `test_splits.py` | Tests for IID/Non-IID data splitting utilities |\n| `test_strategy.py` | Tests for DSCATNetFedAvg custom strategy |\n| `test_verify.py` | Tests for dataset verification utilities |\n| `test_visualization.py` | Tests for plotting and visualization functions |\n\n### Running Tests\n\n```bash\n# Run all tests\npython run_tests.py\n\n# Run all tests with pytest (verbose)\npytest tests/ -v\n\n# Run specific test module\npytest 
tests/test_simulation.py -v\n\n# Run specific test class\npytest tests/test_simulation.py::TestFLSimulator -v\n\n# Run with coverage report\npytest --cov=src tests/\n\n# Run with coverage and HTML report\npytest --cov=src --cov-report=html tests/\n```\n\n### Test Results\n\nExpected output:\n\n```\n======================== test session starts ========================\ncollected 467 items / 10 deselected / 457 selected\n\ntests/test_centralized.py ........................              [  5%]\ntests/test_checkpoints.py ..................                    [  9%]\ntests/test_cli.py .......................                       [ 14%]\ntests/test_client.py ............                               [ 17%]\ntests/test_config_loading.py ..........                         [ 19%]\ntests/test_config_schema.py ....................................[ 27%]\ntests/test_datasets.py .....................                    [ 32%]\ntests/test_download.py ......................                   [ 36%]\ntests/test_evaluation.py .......                                [ 38%]\ntests/test_helpers.py ......................                    [ 43%]\ntests/test_logging_utils.py ...........................         [ 49%]\ntests/test_model_evaluator.py .............                     [ 52%]\ntests/test_models.py ..................                         [ 56%]\ntests/test_preprocessing.py ......                              [ 57%]\ntests/test_simulation.py .....................                  [ 62%]\ntests/test_splits.py ........                                   [ 64%]\ntests/test_strategy.py ...............                          [ 67%]\ntests/test_verify.py ..........................                 [ 73%]\ntests/test_visualization.py ........................            
[ 78%]\n\n================= 457 passed, 10 deselected in ~100s =================\n```\n\nTest coverage: **≥80%** across all source modules.\n\n\u003e **Note**: Integration tests are deselected by default (marked `@pytest.mark.slow`). Run them with `pytest -m slow tests/`.\n\n---\n\n## Troubleshooting\n\n### CUDA Issues on Windows\n\n```powershell\n# Reinstall PyTorch with CUDA support\npip uninstall -y torch torchvision torchaudio\npip cache purge\npip install --index-url https://download.pytorch.org/whl/cu118 torch torchvision torchaudio\n```\n\n### Out of Memory (OOM)\n\n1. **Reduce batch size** in config: `batch_size: 4`\n2. **Reduce num_workers**: `num_workers: 2`\n3. **Use smaller model variant**: `variant: tiny`\n\n### Dataset Not Found\n\n```bash\n# Verify dataset structure\npython run_download.py --verify\n\n# Check expected paths\npython run_download.py --instructions\n```\n\n---\n\n## Documentation\n\nAdditional documentation is available in the `docs/` directory:\n\n| Document | Description |\n|----------|-------------|\n| [docs/README.md](docs/README.md) | Documentation index and navigation |\n| [docs/config-options-guide.md](docs/config-options-guide.md) | Complete configuration reference |\n| [docs/architecture.md](docs/architecture.md) | System architecture and module documentation |\n| [docs/benchmark-comparison.md](docs/benchmark-comparison.md) | Federated vs centralized benchmark fairness audit |\n| [CONTRIBUTING.md](CONTRIBUTING.md) | Contributing guidelines and code style |\n\nFor AI assistants (Claude, GPT, etc.), see [docs/CLAUDE.md](docs/CLAUDE.md) for comprehensive codebase context.\n\n---\n\n## Citation\n\nIf you use this code in your research, please cite:\n\n```bibtex\n@thesis{chen2026dscatnet_fl,\n    title={Federated Learning for Skin Cancer Classification using Lightweight Vision Transformers},\n    author={Chen, Leonardo},\n    year={2026},\n    school={Universidad Politécnica de Madrid}\n}\n```\n\n**DSCATNet 
Reference:**\n\n```bibtex\n@article{dscatnet2024,\n    title={DSCATNet: Dual-Scale Cross-Attention Vision Transformer for Skin Cancer Classification},\n    journal={PLOS ONE},\n    year={2024}\n}\n```\n\n---\n\n## License\n\nThis project is licensed under the Apache 2.0 License - see [LICENSE](LICENSE) for details.\n\n---\n\n## Acknowledgments\n\n- DSCATNet authors for the original architecture\n- Flower team for the FL framework\n- ISIC Archive for the dermoscopy datasets\n- Universidad Politécnica de Madrid\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleonidasdev%2Ffederated-light-skin-cancer-classification","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleonidasdev%2Ffederated-light-skin-cancer-classification","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleonidasdev%2Ffederated-light-skin-cancer-classification/lists"}