https://github.com/code-with-zeeshan/skin-lesion-classification
🏥 AI-powered skin lesion classifier (Benign vs Malignant) using EfficientNetB3 + Grad-CAM. AUC: 0.935 | Recall: 94.7%
- Host: GitHub
- URL: https://github.com/code-with-zeeshan/skin-lesion-classification
- Owner: code-with-zeeshan
- Created: 2026-03-17T22:19:00.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-23T12:15:31.000Z (3 months ago)
- Last Synced: 2026-03-23T12:32:07.271Z (3 months ago)
- Topics: deep-learning, efficientnet, grad-cam, gradio, image-classification, medical-ai, python, skin-cancer, tensorflow, transfer-learning
- Language: Jupyter Notebook
- Homepage: https://huggingface.co/spaces/code-with-zeeshan/skin-lesion-classifier
- Size: 28.4 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🏥 Skin Lesion Classification: Benign vs Malignant
## 📋 Overview
A deep learning-based screening tool that classifies skin lesion images
as **benign** or **malignant** using EfficientNetB3 with transfer learning,
threshold optimization, test-time augmentation (TTA), and Grad-CAM explainability.
| Metric | Score |
|---|---|
| **AUC-ROC** | 0.935 |
| **Cancer Detection (Recall)** | 94.7% |
| **Accuracy** | 83.8% |
| **F1 Score** | 0.842 |
| **MCC** | 0.698 |
| **Missed Cancers** | Only 16/300 (5.3%) |
> ⚕️ **Disclaimer:** For educational/screening purposes only.
> Not a substitute for professional medical diagnosis.
---
## 🏗️ Architecture
```
Input Image (300×300×3)
│
▼
EfficientNetB3 (Pre-trained ImageNet)
│ Phase 1: Frozen base
│ Phase 2: Fine-tuned last 50 layers
▼
GlobalAveragePooling2D
│
BatchNormalization
│
Dense(512, ReLU) → Dropout(0.5)
Dense(256, ReLU) → Dropout(0.4)
Dense(128, ReLU) → Dropout(0.3)
│
Dense(1, Sigmoid) → Benign (0) / Malignant (1)
│
▼
Threshold Optimization (0.38) + Test-Time Augmentation (10 rounds)
```
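As a rough size check on the classification head in the diagram, the sketch below counts its parameters. It assumes the pooled EfficientNetB3 feature vector has 1536 channels (the width of the backbone's final convolution in the Keras implementation) and that every Dense layer has a bias. `dense_params` and `batchnorm_params` are illustrative helpers, not code from this repository.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * n_features


FEATURES = 1536                   # assumed width after GlobalAveragePooling2D
HEAD_WIDTHS = [512, 256, 128, 1]  # Dense layers from the diagram (Dropout adds none)

head_total = batchnorm_params(FEATURES)
n_in = FEATURES
for n_out in HEAD_WIDTHS:
    head_total += dense_params(n_in, n_out)
    n_in = n_out

print(f"Head parameters: {head_total:,}")
```

Under these assumptions the head holds just under one million parameters, most of them in the first Dense layer.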
---
## 📁 Project Structure
```
skin-lesion-classification/
├── README.md ← Project documentation
├── requirements.txt ← Dependencies
├── final_config.json ← Model config
├── .gitignore ← Excludes .keras files
├── notebooks/
│ └── skin_lesion_classification_v2.ipynb ← Clean Colab notebook
├── app/
│ └── gradio_app.py ← Standalone Gradio app
├── assets/
│ ├── banner.png ← Project banner
│ ├── training_history.png ← Training curves
│ ├── confusion_matrix.png ← CM plot
│ ├── roc_curve.png ← ROC curve
│ ├── score_distribution.png ← Score distribution
│ ├── complete_evaluation.png ← All metrics visual
│ ├── gradcam_samples.png ← Grad-CAM examples
│ ├── error_analysis.png ← Wrong predictions
│ ├── class_distribution.png ← Dataset EDA
│ ├── final_evaluation.png ← Final metrics
│ └── sample_images.png ← Sample images
├── docs/
│ ├── model_evaluation_report.md ← Full eval report
│ └── deployment_guide.md ← Deployment instructions
└── samples/
    ├── benign/
    │   ├── image1.jpg
    │   └── image2.jpg
    └── malignant/
        ├── image1.jpg
        └── image2.jpg
```
---
## 📥 Model Downloads
Models are hosted on **HuggingFace** (too large for GitHub):
| File | Size | Link |
|---|---|---|
| `model_b3.keras` | ~96.4 MB | [HuggingFace Space](https://huggingface.co/spaces/code-with-zeeshan/skin-lesion-classifier/blob/main/model_b3.keras) |
| `model_b0.keras` | ~33.4 MB | [HuggingFace Space](https://huggingface.co/spaces/code-with-zeeshan/skin-lesion-classifier/blob/main/model_b0.keras) |
| `final_config.json` | 1 KB | Included in this repo |
### Automatic Download
The Gradio app **automatically downloads** models from HuggingFace on first run:
```bash
python app/gradio_app.py
```
### Manual Download
```python
from huggingface_hub import hf_hub_download

# Fetch the B3 model file from the HuggingFace Space (cached locally after the first call)
model_path = hf_hub_download(
    repo_id="code-with-zeeshan/skin-lesion-classifier",
    filename="model_b3.keras",
    repo_type="space",
)
```
---
## ⚙️ Two Prediction Modes
| Mode | AUC | Recall | Accuracy | Speed |
|---|---|---|---|---|
| ⚡ **Fast Mode** | 0.911 | 90.0% | 81.7% | ~2 sec |
| 🎯 **Best Mode (TTA)** | 0.935 | 94.7% | 83.8% | ~20 sec |
---
## 🔧 Key Technical Decisions
### 1. Why EfficientNetB3?
- **Compound scaling** optimizes depth, width, and resolution simultaneously
- 81.6% ImageNet top-1 accuracy with only 12.3M parameters
- Better accuracy per parameter than ResNet50 (25.6M params, 76.1% top-1 accuracy)
- Built-in preprocessing eliminates manual rescaling errors
### 2. Why NO `rescale=1./255`?
The Keras EfficientNet models include **internal preprocessing layers** and
expect raw pixel values in [0, 255]. Adding `rescale=1./255` to the data
generator scales the input twice, so the model receives values squeezed into
a tiny range, learns nothing, and predicts a single class for every image.
This was our first major bug fix.
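The effect of scaling twice can be seen numerically. The sketch below uses a simplified stand-in for the model's built-in preprocessing (divide by 255, then normalize with the commonly used ImageNet channel statistics). The exact layers inside Keras EfficientNet may differ, so `internal_preprocess` is an assumption made for illustration only.

```python
import numpy as np

# ImageNet channel statistics on a 0-1 scale (widely used values, assumed here).
MEAN = np.array([0.485, 0.456, 0.406])
STD = np.array([0.229, 0.224, 0.225])


def internal_preprocess(pixels: np.ndarray) -> np.ndarray:
    """Simplified stand-in for the model's built-in rescale + normalize step."""
    return (pixels / 255.0 - MEAN) / STD


def channel_spread(x: np.ndarray) -> np.ndarray:
    """Range of values per colour channel."""
    return x.max(axis=(0, 1)) - x.min(axis=(0, 1))


rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 4, 3)).astype(np.float32)  # fake image in [0, 255]

correct = internal_preprocess(raw)         # raw pixels, as the model expects
double = internal_preprocess(raw / 255.0)  # pixels already divided by 255

print("correct input spread per channel:", channel_spread(correct))
print("double-scaled spread per channel:", channel_spread(double))
```

With the extra division, every pixel collapses to nearly the same value per channel, which leaves the network almost nothing to distinguish one image from another.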
### 3. Why Two-Phase Training?
- **Phase 1 (Frozen):** Trains only the custom head while preserving ImageNet features
- **Phase 2 (Fine-tune):** Unfreezes last 50 layers with 10× smaller learning rate
to adapt base features to skin lesion domain without catastrophic forgetting
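The freeze/unfreeze logic of the two phases can be sketched without any framework. The helpers below work on any objects with a `trainable` attribute; with Keras they would be applied to `base_model.layers`. The layer count and learning rates are hypothetical placeholders, not values from the notebook.

```python
from types import SimpleNamespace


def freeze_all(layers) -> None:
    """Phase 1: freeze every backbone layer so only the new head trains."""
    for layer in layers:
        layer.trainable = False


def unfreeze_last(layers, n: int) -> None:
    """Phase 2: make only the last n backbone layers trainable."""
    cutoff = max(len(layers) - n, 0)
    for index, layer in enumerate(layers):
        layer.trainable = index >= cutoff


# Stand-in objects; with Keras this list would be `base_model.layers`.
backbone = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(200)]

PHASE1_LR = 1e-3            # hypothetical value, not taken from the notebook
PHASE2_LR = PHASE1_LR / 10  # "10x smaller" learning rate for fine-tuning

freeze_all(backbone)            # phase 1: train the head only
unfreeze_last(backbone, n=50)   # phase 2: fine-tune the last 50 layers
print(sum(layer.trainable for layer in backbone), "backbone layers trainable")
```

In Keras, the model has to be compiled again after changing `trainable` flags for the change to take effect, which is also the natural point to switch to the smaller learning rate.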
### 4. Why Threshold Optimization?
The default 0.5 threshold gave 82% recall. In medical screening, missing a
cancer (false negative) is more dangerous than a false alarm. The optimized
threshold (0.38) targets ≥90% recall, accepting more false alarms in exchange for safety.
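One simple way to pick such a threshold is to scan candidate values from strict to lenient and keep the first one that reaches the target recall. The sketch below shows that idea on made-up validation scores. `pick_threshold` is a hypothetical helper, and the notebook may use a different search.

```python
import numpy as np


def pick_threshold(y_true, scores, target_recall=0.90):
    """Return the highest threshold whose recall on the positives meets the target."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    positives = scores[y_true == 1]
    # Try candidate thresholds from strict (high) to lenient (low).
    for threshold in np.sort(np.unique(scores))[::-1]:
        recall = np.mean(positives >= threshold)
        if recall >= target_recall:
            return float(threshold), float(recall)
    raise ValueError("no threshold reaches the target recall")


# Toy validation scores (made up for illustration, not the project's data).
y_val = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
p_val = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.55, 0.45, 0.40, 0.20,
         0.50, 0.35, 0.30, 0.10, 0.05]

threshold, recall = pick_threshold(y_val, p_val, target_recall=0.90)
print(f"threshold={threshold:.2f}, recall={recall:.2f}")
```

The threshold should be chosen on validation data rather than on the test set, so that the reported test metrics stay unbiased.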
### 5. Why Test-Time Augmentation?
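TTA averages predictions over 10 augmented versions of each image.
This raised AUC from 0.911 to 0.935 (+0.024) with zero retraining, the single
biggest accuracy boost that needed no retraining. The cost is speed: ~20 sec
per prediction instead of ~2 sec.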
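A minimal sketch of the averaging step follows. The specific augmentations are an assumption (random flips and 90-degree rotations), since the ones used in the notebook are not listed here. `predict_fn` stands for any callable that maps a batch of shape `(1, H, W, 3)` to a single score, for example a thin wrapper around `model.predict`. `dummy_model` is a placeholder so the example runs without a trained model.

```python
import numpy as np


def tta_predict(predict_fn, image, rounds=10, seed=0):
    """Average predict_fn over randomly flipped/rotated copies of one image."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(rounds):
        view = image
        if rng.random() < 0.5:
            view = np.flip(view, axis=1)  # horizontal flip
        if rng.random() < 0.5:
            view = np.flip(view, axis=0)  # vertical flip
        view = np.rot90(view, k=int(rng.integers(0, 4)))  # 0, 90, 180 or 270 degrees
        scores.append(float(predict_fn(view[np.newaxis, ...])))
    return float(np.mean(scores))


def dummy_model(batch):
    """Placeholder scorer: mean brightness of the top-left quadrant, scaled to 0-1."""
    h, w = batch.shape[1] // 2, batch.shape[2] // 2
    return batch[0, :h, :w].mean() / 255.0


image = np.zeros((300, 300, 3), dtype=np.float32)
image[:150, :150] = 255.0  # bright top-left corner

print("single view:", float(dummy_model(image[np.newaxis, ...])))
print("TTA average:", tta_predict(dummy_model, image))
```

Because every round is a separate forward pass, inference time grows roughly linearly with `rounds`, which matches the gap between Fast Mode and Best Mode.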
---
## 📊 Results
### Evolution of the Model
| Stage | AUC | Recall | Missed Cancers | Key Change |
|---|---|---|---|---|
| Broken V1 | 0.349 | 0.0% | 300/300 | Bug: double preprocessing |
| Fixed V1 (B0) | 0.908 | 85.7% | 43/300 | Removed rescale, optimized threshold |
| V2 (B3) | 0.911 | 90.0% | 30/300 | Upgraded to EfficientNetB3 |
| **V2 (B3+TTA)** | **0.935** | **94.7%** | **16/300** | Added test-time augmentation |
### Final Metrics (B3 + TTA on Unseen Test Data)
| Metric | Score | Grade |
|---|---|---|
| AUC-ROC | 0.935 | 🟢 Good |
| Accuracy | 83.8% | 🟡 Okay |
| Balanced Accuracy | 84.7% | 🟡 Okay |
| Sensitivity (Recall) | 94.7% | 🏆 Excellent |
| Specificity | 74.7% | 🟡 Okay |
| Precision (Malignant) | 75.7% | 🟡 Okay |
| F1 Score (Malignant) | 0.842 | 🟡 Okay |
| MCC | 0.698 | 🟢 Good |
| NPV | 94.4% | 🏆 Excellent |
| **Overall Verdict** | | **🟢 Good — Reliable for Screening** |
### Confusion Matrix
```
                    Predicted Benign    Predicted Malignant
Actual Benign             269                   91
Actual Malignant           16                  284
```
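The headline metrics can be recomputed directly from the four counts in the confusion matrix. The snippet below is a plain-Python check using the standard definitions, with malignant as the positive class.

```python
import math

# Counts from the confusion matrix above (positive class = malignant).
tn, fp = 269, 91
fn, tp = 16, 284

sensitivity = tp / (tp + fn)   # recall on malignant cases
specificity = tn / (tn + fp)
precision = tp / (tp + fp)
npv = tn / (tn + fn)           # negative predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)
balanced_accuracy = (sensitivity + specificity) / 2
f1 = 2 * precision * sensitivity / (precision + sensitivity)
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

metrics = {
    "Sensitivity": sensitivity, "Specificity": specificity,
    "Precision": precision, "NPV": npv, "Accuracy": accuracy,
    "Balanced accuracy": balanced_accuracy, "F1": f1, "MCC": mcc,
}
for name, value in metrics.items():
    print(f"{name:<18}{value:.4f}")
```

The computed values agree with the Final Metrics table to within about 0.001.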
---
## 🔥 Features
- 📷 **Image Upload:** Upload any skin lesion image
- 👤 **Patient Info:** Name, age, gender for personalized report
- 🔬 **Prediction:** Benign vs Malignant with confidence score
- 🔥 **Grad-CAM:** Visual explanation of model focus areas
- 🛡️ **Precautions:** Automated medical precaution report
- ⚡ **Two Modes:** Fast (single prediction) or Best (TTA)
---
## 🔥 Grad-CAM Visualization
Grad-CAM shows the model focuses on lesion borders, color variations,
and texture irregularities — consistent with dermatological diagnostic criteria.
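The core Grad-CAM computation is small once the feature maps and their gradients are available. The NumPy sketch below covers only that final step. Extracting the activations of the last convolutional block (the `top_activation` layer mentioned under Lessons Learned) and the gradients of the malignant score with respect to them is framework-specific and not shown. The random tensors are stand-ins for those values.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine feature maps (H, W, K) and their gradients (H, W, K) into a heatmap."""
    weights = gradients.mean(axis=(0, 1))                      # one weight per channel
    cam = np.maximum((activations * weights).sum(axis=-1), 0)  # weighted sum, then ReLU
    peak = cam.max()
    return cam / peak if peak > 0 else cam                     # scale to [0, 1]


# Toy tensors standing in for real feature maps and gradients.
rng = np.random.default_rng(0)
acts = rng.random((10, 10, 8))
grads = rng.normal(size=(10, 10, 8))

heatmap = grad_cam(acts, grads)
print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```

For display, the low-resolution heatmap is typically resized to the input image size and overlaid as a colour map.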
---
## 🚀 Quick Start
### Prerequisites
```bash
pip install -r requirements.txt
```
### Run Gradio App (Downloads model automatically)
```bash
python app/gradio_app.py
```
### Run Inference (Code)
```python
import json

import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

model = load_model('model_b3.keras')
with open('final_config.json') as f:
    config = json.load(f)  # holds the optimized decision threshold

# Resize to the model's 300x300 input; keep raw [0, 255] pixels (no /255!)
img = image.load_img('path/to/lesion.jpg', target_size=(300, 300))
img_array = np.expand_dims(image.img_to_array(img), axis=0)

pred = float(model.predict(img_array)[0][0])  # sigmoid score for "malignant"
label = "MALIGNANT" if pred >= config['threshold'] else "BENIGN"
print(f"{label} (score: {pred:.4f})")
```
---
## 🔄 How to Reproduce
1. **Clone the repository:**
```bash
git clone https://github.com/code-with-zeeshan/skin-lesion-classification.git
cd skin-lesion-classification
```
2. **Install dependencies:**
```bash
pip install -r requirements.txt
```
3. **Option A — Run Gradio App (uses pre-trained model):**
```bash
python app/gradio_app.py
```
4. **Option B — Retrain from scratch:**
- Open `notebooks/skin_lesion_classification_v2.ipynb` in Google Colab
- Upload the Skin Lesions Classification dataset to Google Drive
- Run all cells in order
- Training takes ~30-45 minutes on Colab GPU
5. **Option C — Open in Colab directly:**
[Open in Colab](https://colab.research.google.com/github/code-with-zeeshan/skin-lesion-classification/blob/main/notebooks/skin_lesion_classification_v2.ipynb)
---
## 📈 Lessons Learned
### Bugs Encountered & Fixed
| Bug | Impact | Fix |
|---|---|---|
| `rescale=1./255` with EfficientNet | Model predicted all benign (AUC: 0.35) | Removed rescaling — EfficientNet has built-in preprocessing |
| `layer.output_shape` in Grad-CAM | AttributeError in TF 2.16+ | Hardcoded `top_activation` layer name |
| Default threshold (0.5) | 82% recall → missed many cancers | Threshold optimization (0.38) → 90.0% recall, 94.7% with TTA |
| Gradio 6.0 breaking changes | `theme` and `show_copy_button` errors | Removed deprecated parameters |
### What Worked Best
1. **Transfer learning** > training from scratch (small dataset)
2. **Threshold optimization** gave the biggest practical improvement
3. **TTA** added +0.024 AUC (0.911 → 0.935) without any retraining
4. **Class weights** helped with mild imbalance
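For reference, class weights for a mildly imbalanced binary problem are often computed with the "balanced" heuristic, where each class is weighted by `n_samples / (n_classes * n_class_samples)`. The sketch below uses hypothetical training counts, and the notebook may derive its weights differently.

```python
def balanced_class_weights(counts: dict) -> dict:
    """Weight each class by n_samples / (n_classes * n_class_samples)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


# Hypothetical training counts (0 = benign, 1 = malignant), for illustration only.
train_counts = {0: 1200, 1: 1000}

weights = balanced_class_weights(train_counts)
print(weights)
```

A dictionary of this shape can be passed to Keras through the `class_weight` argument of `model.fit`, so that errors on the rarer class contribute more to the loss.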
### What Didn't Help Much
1. **EfficientNetB3 vs B0:** Only a marginal AUC improvement (0.908 → 0.911)
2. **Ensemble (B0+B3):** Didn't beat B3+TTA alone
3. Both approaches hit the same ~0.93 AUC ceiling, which suggests **dataset size is the bottleneck**
---
## 🛠️ Tools & Acknowledgments
- **AI Assistance:** [Claude AI](https://claude.ai) (Anthropic) was used as
a productivity tool for:
- Code generation and debugging
- Model architecture suggestions
- Performance analysis and optimization recommendations
- Documentation structure guidance
All experimental decisions, hyperparameter choices, result interpretation,
and project direction were made by the author. Claude served as an
accelerator — similar to using Stack Overflow or documentation — not as
the decision maker.
- **Framework:** TensorFlow 2.x / Keras
- **Pre-trained Model:** EfficientNetB3 (ImageNet weights)
- **Deployment:** Gradio
- **Environment:** Google Colab (GPU runtime)
---
## Docs
- [Deployment](docs/deployment_guide.md)
- [Evaluation](docs/model_evaluation_report.md)
---
## Demo Video
- [Demo](assets/Skin_Lesion_Classification_demo.gif)
---
## 🔮 Future Improvements
- [ ] Add more training data (ISIC Archive: 25,000+ images)
- [ ] Try advanced augmentation (albumentations: CLAHE, elastic transform)
- [ ] Implement 5-fold cross-validation for more robust evaluation
- [x] Deploy permanently on HuggingFace Spaces ([deployed](https://huggingface.co/spaces/code-with-zeeshan/skin-lesion-classifier))
- [ ] Add multi-class classification (melanoma, BCC, SCC, etc.)
---
## 📄 License
This project is for educational purposes only. Not intended for clinical use.
---
## 📬 Contact
MOHAMMAD ZEESHAN — [LinkedIn](https://www.linkedin.com/in/mohammad-zeeshan-37637a1a5)