An open API service indexing awesome lists of open source software.

https://github.com/eudalabs/healthwithsevgi

Browser-based ML education tool for healthcare professionals — 7-step wizard covering 20 medical specialties with explainability, bias detection & EU AI Act compliance
https://github.com/eudalabs/healthwithsevgi

bias-detection education eu-ai-act explainability fastapi healthcare machine-learning medical-ai react shap

Last synced: 2 months ago
JSON representation

Browser-based ML education tool for healthcare professionals — 7-step wizard covering 20 medical specialties with explainability, bias detection & EU AI Act compliance

Awesome Lists containing this project

README

          


FastAPI
React
scikit-learn
TypeScript
Docker
Python
SHAP
License: MIT

# HealthWithSevgi

**An interactive, browser-based machine learning education tool for healthcare professionals.**

> **SENG 430 - Software Quality Assurance**
> Cankaya University - Spring 2025-2026
> Instructor: Dr. Sevgi Koyuncu Tunç

HealthWithSevgi guides clinicians through a complete ML pipeline in **7 steps** — from selecting a medical specialty to training a model, interpreting predictions with SHAP, and auditing fairness — all with **zero coding required**.


Live Demo  | 
Jira Board  | 
Figma Designs  | 
Setup Guide

---

## Table of Contents

- [Overview](#overview)
- [The 7-Step Pipeline](#the-7-step-pipeline)
- [Supported Specialties](#supported-specialties)
- [ML Models](#ml-models)
- [Tech Stack](#tech-stack)
- [Architecture](#architecture)
- [Project Structure](#project-structure)
- [Getting Started](#getting-started)
- [API Reference](#api-reference)
- [Testing](#testing)
- [Deployment](#deployment)
- [Branch Strategy](#branch-strategy)
- [Team](#team)
- [License](#license)

---

## Overview

Healthcare professionals increasingly encounter AI/ML in clinical settings but rarely get hands-on experience with how these systems work. HealthWithSevgi bridges that gap by providing an intuitive, wizard-style interface that walks users through every stage of the machine learning lifecycle using real clinical datasets.

**Key capabilities:**

- **20 medical specialties** with real-world clinical datasets (Cardiology, Oncology, Nephrology, Neurology, ICU/Sepsis, Dermatology, and more)
- **8 ML classifiers** with interactive hyperparameter tuning via sliders
- **SHAP-based explainability** — global feature importance and single-patient waterfall explanations
- **Fairness auditing** — subgroup performance analysis across demographics with bias detection
- **EU AI Act compliance checklist** with downloadable PDF certificate
- **No server-side data storage** — all session data is held in-memory and evicted automatically

---

## The 7-Step Pipeline

| Step | Name | What Happens |
|:----:|------|-------------|
| **1** | **Clinical Context** | Introduces the medical problem the AI will address. Displays the clinical question, why it matters, and the 7-step roadmap. |
| **2** | **Data Exploration** | Upload a CSV file (up to 50 MB) or load a built-in clinical dataset. Inspect column statistics, missing values, and class distribution. Confirm the target variable. |
| **3** | **Data Preparation** | Configure preprocessing: train/test split ratio, missing value strategy (median/mode/drop), normalization (z-score/min-max), SMOTE for class imbalance, and outlier handling (IQR/z-score clipping). |
| **4** | **Model & Parameters** | Choose from 8 ML models. Adjust hyperparameters with intuitive sliders. Optionally enable hyperparameter tuning (RandomizedSearchCV) and feature selection (VarianceThreshold + SelectKBest). |
| **5** | **Results & Evaluation** | View accuracy, sensitivity, specificity, precision, F1, AUC-ROC, and MCC. Explore interactive ROC curves, precision-recall curves, and confusion matrices. Detect overfitting via cross-validation comparison. |
| **6** | **Explainability** | Global feature importance ranking with clinical name mapping. Single-patient SHAP waterfall charts with plain-language summaries (e.g., _"High glucose increases diabetes risk by 0.23"_). |
| **7** | **Ethics & Bias** | Subgroup fairness audit (by age, gender, ethnicity). Bias warnings for performance gaps >10%. EU AI Act compliance checklist. Real-world case studies of AI bias in healthcare. Downloadable PDF compliance certificate. |

---

## Supported Specialties

| # | Specialty | Prediction Task | Dataset | Samples |
|---|-----------|-----------------|---------|--------:|
| 1 | Cardiology | 30-day heart failure mortality | Heart Failure Clinical Records | ~300 |
| 2 | Radiology | Pneumonia detection (chest X-ray metadata) | NIH Chest X-ray | 100K+ |
| 3 | Nephrology | Chronic kidney disease detection | UCI CKD | 400 |
| 4 | Oncology - Breast | Malignant vs. benign biopsy | Wisconsin Breast Cancer | 569 |
| 5 | Neurology - Parkinson's | Parkinson's from voice biomarkers | UCI Parkinson's | 195 |
| 6 | Endocrinology - Diabetes | Diabetes onset within 5 years | Pima Indians | 768 |
| 7 | Hepatology - Liver | Liver disease detection | Indian Liver Patient | 583 |
| 8 | Cardiology - Stroke | Stroke risk prediction | Kaggle Stroke Prediction | 5,110 |
| 9 | Mental Health | Depression severity (PHQ-9) | Kaggle Depression | ~1,000 |
| 10 | Pulmonology - COPD | COPD exacerbation risk | PhysioNet + Kaggle | ~1,000 |
| 11 | Haematology - Anaemia | Anaemia type classification | Kaggle Anaemia | ~400 |
| 12 | Dermatology | Benign vs. malignant skin lesion | HAM10000 metadata | ~10K |
| 13 | Ophthalmology | Diabetic retinopathy detection | UCI Diabetic Retinopathy | 1,151 |
| 14 | Orthopaedics - Spine | Disc herniation / spondylolisthesis | UCI Vertebral Column | 310 |
| 15 | ICU / Sepsis | Sepsis onset within 6 hours | PhysioNet Sepsis | ~40K |
| 16 | Obstetrics - Fetal Health | Fetal health classification (CTG) | UCI Fetal Health | 2,126 |
| 17 | Cardiology - Arrhythmia | Arrhythmia detection (ECG) | UCI Arrhythmia | 452 |
| 18 | Oncology - Cervical | Cervical cancer risk | UCI Cervical Cancer | 858 |
| 19 | Thyroid / Endocrinology | Thyroid function classification | UCI Thyroid | 9,172 |
| 20 | Pharmacy - Readmission | Hospital readmission risk | UCI Diabetes 130-US | 101,766 |

---

## ML Models

| Model | Category | Key Hyperparameters |
|-------|----------|---------------------|
| **K-Nearest Neighbors** | Instance-based | k (1-25), distance metric |
| **Support Vector Machine** | Boundary-based | C (0.01-100), kernel (linear/rbf/poly) |
| **Decision Tree** | Tree-based | max_depth (1-20), criterion (gini/entropy) |
| **Random Forest** | Ensemble | n_estimators (10-500), max_depth |
| **Logistic Regression** | Linear | C (0.001-100), solver (lbfgs/saga) |
| **Naive Bayes** | Probabilistic | var_smoothing (1e-12 to 1e-3) |
| **XGBoost** | Gradient Boosting | n_estimators, max_depth, learning_rate |
| **LightGBM** | Gradient Boosting | n_estimators, max_depth, learning_rate |

All models are trained with balanced class weights where supported. Optional hyperparameter tuning uses RandomizedSearchCV (20 iterations, 3-fold CV). Feature selection combines VarianceThreshold with SelectKBest (mutual information).

---

## Tech Stack

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **Frontend** | React 18, TypeScript, Vite | Single-page wizard application |
| **UI Components** | Recharts, Lucide Icons, react-dropzone | Charts, icons, file uploads |
| **State Management** | TanStack React Query | Server state caching and synchronization |
| **Backend** | FastAPI, Python 3.12 | REST API with auto-generated OpenAPI docs |
| **ML Engine** | scikit-learn, XGBoost, LightGBM | Model training, evaluation, cross-validation |
| **Explainability** | SHAP | TreeExplainer (tree models), KernelExplainer (linear), permutation importance |
| **Data Processing** | pandas, numpy, imbalanced-learn | Data cleaning, normalization, SMOTE |
| **PDF Generation** | ReportLab | Compliance certificate export |
| **Containerization** | Docker (multi-stage) | Production deployment |
| **Hosting** | HuggingFace Spaces | Live demo environment |
| **Package Manager** | pnpm (frontend), pip (backend) | Dependency management |

---

## Architecture

📐 **[Full Architecture Diagrams (Google Drive)](https://drive.google.com/drive/folders/1AkMxaaPLizfPlfJDjkP7YISSiOEiL9tV?usp=sharing)** — C4 model diagrams (System Context, Container, Component, Code levels), toolchain diagrams, and data flow sequences.

```
+---------------------+
| Browser (React) |
| Wizard UI (SPA) |
+----------+----------+
|
HTTP/REST (JSON)
|
+----------v----------+
| FastAPI Backend |
+----------+----------+
|
+----------------------+----------------------+
| | | |
+--------v---+ +------v-----+ +-----v------+ +-----v--------+
| DataService| | MLService | |ExplainSvc | | EthicsService|
| | | | | | | |
| - Explore | | - Train | | - SHAP | | - Subgroup |
| - Prepare | | - Evaluate | | - Waterfall| | - Bias detect|
| - SMOTE | | - Compare | | - Clinical | | - EU AI Act |
+-----+------+ +------+-----+ +------+-----+ +------+-------+
| | | |
v v v v
+-----------+ +------------+ +------------+ +-----------+
| In-Memory | | In-Memory | | SHAP | | ReportLab |
| Sessions | | Models | | Library | | PDF Gen |
| (LRU 50) | | (LRU 100+)| | | | |
+-----------+ +------------+ +------------+ +-----------+
```

**Data flow:** Upload CSV -> Explore columns -> Preprocess (split, normalize, SMOTE) -> Train model -> Evaluate metrics -> SHAP explanations -> Fairness audit -> PDF certificate

---

## Project Structure

```
HealthWithSevgi/
|
+-- frontend/ # React 18 + Vite + TypeScript
| +-- src/
| | +-- pages/ # Step 1-7 wizard pages
| | | +-- Step1ClinicalContext.tsx
| | | +-- Step2DataExploration.tsx
| | | +-- Step3DataPreparation.tsx
| | | +-- Step4ModelParameters.tsx
| | | +-- Step5Results.tsx
| | | +-- Step6Explainability.tsx
| | | +-- Step7Ethics.tsx
| | +-- components/ # Reusable UI components
| | | +-- NavBar.tsx # Specialty switcher, glossary
| | | +-- WizardProgress.tsx # Step progress tracker
| | | +-- SpecialtySelector.tsx # 20-specialty grid
| | | +-- ColumnMapperModal.tsx # Target column confirmation
| | | +-- ErrorModal.tsx # Error display modal
| | | +-- charts/ # Visualization components
| | | +-- ConfusionMatrixChart.tsx # 2x2 confusion matrix
| | | +-- KNNScatterCanvas.tsx # KNN decision boundary
| | | +-- PRCurveChart.tsx # Precision-Recall curve
| | | +-- ROCCurveChart.tsx # ROC curve with AUC badge
| | +-- api/ # API client layer
| | | +-- client.ts # Axios instance + interceptors
| | | +-- specialties.ts # Specialty endpoints
| | | +-- data.ts # Explore + Prepare endpoints
| | | +-- ml.ts # Train + Compare endpoints
| | | +-- explain.ts # Explainability + Ethics + Certificate
| | +-- types/index.ts # Shared TypeScript interfaces
| | +-- styles/globals.css # Global CSS + theme variables
| | +-- App.tsx # Main wizard state manager
| | +-- main.tsx # Application entry point
| +-- package.json
| +-- vite.config.ts
|
+-- backend/ # FastAPI REST API + ML engine
| +-- app/
| | +-- main.py # FastAPI setup, CORS, routers
| | +-- routers/
| | | +-- data_router.py # /specialties, /explore, /prepare
| | | +-- ml_router.py # /train, /compare, /models
| | | +-- explain_router.py # /explain/*, /ethics, /certificate
| | +-- services/
| | | +-- data_service.py # Dataset loading, exploration, preprocessing
| | | +-- ml_service.py # Model building, training, evaluation
| | | +-- explain_service.py # SHAP explanations, clinical mapping
| | | +-- ethics_service.py # Fairness audit, bias detection
| | | +-- certificate_service.py # PDF certificate generation
| | | +-- specialty_registry.py # 20 specialty definitions + datasets
| | +-- models/
| | | +-- schemas.py # Data exploration/preparation DTOs
| | | +-- ml_schemas.py # Training/evaluation DTOs
| | | +-- explain_schemas.py # Explainability/ethics DTOs
| | +-- utils/ # Utility modules
| +-- data_cache/ # Cached clinical CSV datasets
| +-- datasets/ # Additional dataset storage
| +-- tests/ # pytest test suite (178 tests)
| | +-- conftest.py # Shared fixtures
| | +-- test_step1_clinical_context.py
| | +-- test_step2_data_exploration.py
| | +-- test_step3_data_preparation.py
| | +-- test_step6_explainability.py
| | +-- test_step7_ethics.py
| | +-- test_certificate.py
| +-- pytest.ini
| +-- requirements.txt
|
+-- hf-space/ # HuggingFace Spaces deployment
| +-- main_hf.py # Combined API + SPA entrypoint
| +-- Dockerfile # HF-specific Docker build
| +-- README.md # HF Space metadata
|
+-- docs/ # Documentation & design specs
| +-- ML_Tool_User_Guide.md # Course user manual
| +-- Sprint_1_Assignment.md # Sprint 1 requirements
| +-- Clinical_Specialties_Dataset_Collection.pdf
| +-- diagrams/ # C4 architecture + toolchain PDFs
| +-- drawio/ # Editable draw.io source files
| +-- mermaid/ # C4 architecture (Mermaid source)
| +-- iso42001/ # ISO 42001 AI governance report
| +-- seng430-sprints/ # Sprint requirements from instructor
| +-- qa/ # QA test reports (PDF)
| +-- reports/ # Progress reports + screenshots
|
+-- jira/ # Jira backlog documentation
| +-- JIRA.md # Product backlog report
| +-- SPRINT_1_TASK_BOARD.md # Sprint 1 task breakdown
|
+-- local/ # Local-only extensions
| +-- model-arena/ # Model Arena comparison feature
| +-- arena/ # Backend (router, service, schemas)
| +-- frontend/ # Frontend (ArenaPage, charts, hooks)
|
+-- .github/
| +-- pull_request_template.md # PR template linked to Jira
| +-- workflows/deploy-hf.yml # Auto-deploy to HuggingFace on release
|
+-- Dockerfile # Multi-stage build (Node + Python)
+-- docker-compose.yml # Local development orchestration
+-- .dockerignore
+-- .gitignore
+-- CLAUDE.md # AI coding assistant context
+-- SETUP.md # Local development setup guide
+-- README.md
```

---

## Live Demo & Docker

### 🌐 Live Demo

The application is deployed on HuggingFace Spaces — no installation required:

**➡️ [0xbatuhan4-healthwithsevgi.hf.space](https://0xbatuhan4-healthwithsevgi.hf.space/)**

### 🐳 Docker (single command)

Pull and run the pre-built container image from GitHub Container Registry:

```bash
docker run -p 7860:7860 ghcr.io/eudalabs/healthwithsevgi:latest
```

Open **http://localhost:7860** — that's it.

Alternatively, build from source:

```bash
git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker build -t healthwithsevgi .
docker run -p 7860:7860 healthwithsevgi
```

### Docker Compose (one-command start)

```bash
git clone https://github.com/EudaLabs/HealthWithSevgi.git
cd HealthWithSevgi
docker compose up -d
```

`docker-compose.yml` pulls the pre-built `ghcr.io/eudalabs/healthwithsevgi:latest` image when available and falls back to a local multi-stage build (Node → Vite → Python). Either way, the full stack — React SPA **and** FastAPI — is served from a single container on **http://localhost:7860**.

**Measured startup (pre-built image, warm Docker daemon):** ~**8 seconds** from `docker compose up -d` to HTTP 200 on `/api/specialties` — well inside the Sprint 5 30-second target (see `docs/reports/Sprint5_Docker_Running.png`).

**First-time local build:** ~3–6 minutes (installs pnpm + pip dependencies). Force a rebuild with `docker compose up --build`.

Container name is `healthwithsevgi`; the compose file also wires a healthcheck that probes `/api/specialties` every 10s.

To stop: `docker compose down`.

---

## Quick Start

### Prerequisites (for local development)

| Tool | Version | Required For |
|------|---------|-------------|
| Python | >= 3.10 | Backend |
| Node.js | >= 18 | Frontend |
| Git | latest | Version control |

### Local Development

**Backend:**

```bash
cd backend

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows

# Install dependencies
pip install -r requirements.txt

# Start the API server
uvicorn app.main:app --reload --port 8001
```

API docs available at: **http://localhost:8001/docs** (Swagger UI)

**Frontend** (in a separate terminal):

```bash
cd frontend

# Install dependencies
pnpm install

# Start the dev server
pnpm dev
```

App available at: **http://localhost:5173** (proxies `/api` requests to port 8001)

### Environment Variables

Create a `.env` file in the project root:

```env
# Backend
BACKEND_PORT=8001
DEBUG=true

# Frontend (Vite uses VITE_ prefix)
VITE_API_URL=http://localhost:8001
```

---

## API Reference

All endpoints are prefixed with `/api`. Full interactive documentation is available at `/docs` when the backend is running.

### Specialties

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/specialties` | List all 20 specialties |
| `GET` | `/api/specialties/{id}` | Get specialty details (description, features, clinical context) |

### Data

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/explore` | Upload CSV or load built-in dataset; returns column stats + class distribution |
| `POST` | `/api/prepare` | Preprocess data (split, normalize, SMOTE); returns `session_id` |

### ML Training

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/train` | Train a model; returns `model_id` + evaluation metrics |
| `POST` | `/api/compare/{model_id}` | Add model to comparison table |
| `GET` | `/api/compare/{session_id}` | Get all compared models for a session |
| `DELETE` | `/api/compare/{session_id}` | Clear comparison table |
| `GET` | `/api/models/{model_id}` | Get model metadata |

### Explainability

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/explain/global/{model_id}` | Global feature importance (top 10 features + clinical names) |
| `GET` | `/api/explain/patient/{model_id}/{index}` | Single-patient SHAP waterfall explanation |

### Ethics & Certificate

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/api/ethics/{model_id}` | Subgroup fairness audit + bias warnings + checklist |
| `POST` | `/api/ethics/checklist` | Update EU AI Act checklist item |
| `POST` | `/api/generate-certificate` | Generate and download PDF compliance certificate |

> Full endpoint reference (request/response schemas, error codes, typical flow) lives on the wiki: **[API](../../wiki/API)**.

### Health

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Status check (`{status: "ok"}`) |
| `GET` | `/health` | Health probe (`{status: "healthy"}`) |

---

## Testing

The project includes a comprehensive pytest suite covering all 7 steps of the pipeline — **191 tests** across 7 test files.

```bash
cd backend

# Run all tests
pytest -v

# Run a specific test file
pytest -v tests/test_step1_clinical_context.py

# Run only slow tests (domain context validation)
pytest -v -m slow
```

**Test coverage:**

| Test File | Covers | Key Assertions |
|-----------|--------|----------------|
| `test_step1_clinical_context.py` | Specialty registry | All 20 specialties present, required fields non-empty, clinical context > 50 chars, 404 handling |
| `test_step2_data_exploration.py` | Data exploration | CSV upload validation, missing value detection, class distribution, imbalance warnings |
| `test_step3_data_preparation.py` | Preprocessing | Missing strategies (median/mode/drop), normalization, train/test split, SMOTE, data leakage prevention |
| `test_step4_arena_latency.py` | Model Arena | Training latency, cross-model comparison, session consistency |
| `test_step6_explainability.py` | SHAP explanations | Global importance, patient explanation, What-If analysis, sample patient selection |
| `test_step7_ethics.py` | Fairness audit | Ethics endpoint, case study severity, checklist toggle, bias detection thresholds |
| `test_certificate.py` | PDF generation | Certificate content type, PDF magic bytes, checklist state persistence |

**Total: 191 tests — all passing.**

---

## Deployment

### HuggingFace Spaces

The production deployment runs on HuggingFace Spaces as a Docker container. The multi-stage Dockerfile:

1. **Stage 1** — Builds the React frontend with pnpm
2. **Stage 2** — Installs Python dependencies
3. **Stage 3** — Combines both into a slim Python 3.12 runtime serving the SPA + API on port 7860

`hf-space/main_hf.py` serves both the FastAPI backend and the static React build from a single process.

**Live demo:** [0xbatuhan4-healthwithsevgi.hf.space](https://0xbatuhan4-healthwithsevgi.hf.space/)

---

## Branch Strategy

| Branch | Purpose |
|--------|---------|
| `main` | Production-ready, protected |
| `develop` | Integration branch for sprint work |
| `feature/US-XXX` | One branch per user story |

**Rules:**
- All changes go through Pull Requests (use the [PR template](.github/pull_request_template.md))
- PRs require at least 1 approval
- `main` and `develop` are protected — no direct pushes
- PR titles follow: `feat/fix/docs(US-XXX): description`

---

## Team

| Role | Name | Student ID |
|------|------|:----------:|
| Product Owner + Developer | Efe Çelik | 202128016 |
| UX Designer | Burak Aydoğmuş | 202128028 |
| Lead Developer + Scrum Master | Batuhan Bayazıt | 202228008 |
| Developer | Berat Mert Gökkaya | 202228019 |
| QA / Documentation Lead | Berfin Duru Alkan | 202228005 |

---

## Links

- **Live Demo:** [0xbatuhan4-healthwithsevgi.hf.space](https://0xbatuhan4-healthwithsevgi.hf.space/)
- **Jira Board:** [Jira](https://berfindurualkan.atlassian.net/jira/software/projects/SCRUM/boards/1/backlog)
- **Figma Designs:** [Figma](https://www.figma.com/design/1K1Dw8PC6P98NZAa30DzII/430-HealthWithSevgi?node-id=0-1)
- **GitHub Wiki:** [Wiki](../../wiki)
- **API Docs:** `http://localhost:8001/docs` (when running locally)

---

## License

Released under the [**MIT License**](LICENSE) — you are free to use, copy, modify, and distribute this software with attribution.

Developed as part of the **SENG 430 Software Quality Assurance** course at Cankaya University by the EudaLabs team.