https://github.com/vamsi-krishna-2005/dl-db-vae-

debiasing deep-learning fairness-ml keras pca tensorflow variational-autoencoder


Host: GitHub
URL: https://github.com/vamsi-krishna-2005/dl-db-vae-
Owner: vamsi-krishna-2005
Created: 2025-06-19T09:37:52.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-06-19T10:37:16.000Z (4 months ago)
Last Synced: 2025-06-19T10:39:31.763Z (4 months ago)
Topics: debiasing, deep-learning, fairness-ml, keras, pca, tensorflow, variational-autoencoder
Language: Jupyter Notebook
Homepage:
Size: 831 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# 🎯 Debiased Face Classification using DB-VAE

This project explores how Variational Autoencoders (VAEs) can help mitigate dataset bias in face classification. Motivated by real-world fairness concerns, we build a **Debiasing Variational Autoencoder (DB-VAE)** that adaptively oversamples under-represented face examples during training.

---

## 📦 Dataset

- **Positive (faces)**: Subset of CelebA
- **Negative (non-faces)**: Subset of ImageNet / CIFAR-10
- **Bias labels**: Face brightness, used as a proxy for lighter / darker skin tones

---

## ⚙️ Working Principle

### ✅ 1. **Standard CNN Classifier**

- Trains on positive + negative examples
- Learns to classify face vs. not face
- May be biased toward over-represented groups (e.g. lighter skin tones)
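
A minimal sketch of what such a baseline could look like in Keras. The function name `build_face_classifier`, the 64×64 RGB input size, and all layer sizes are assumptions for illustration and are not taken from the notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_face_classifier(input_shape=(64, 64, 3)):
    """Small CNN for binary face vs. not-face classification (illustrative sizes)."""
    model = tf.keras.Sequential([
        layers.Input(shape=input_shape),
        # Strided convolutions downsample the image while extracting features
        layers.Conv2D(16, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2D(32, 5, strides=2, padding="same", activation="relu"),
        layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        # Single sigmoid unit: probability that the input is a face
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Because every training image is drawn uniformly, groups that dominate the dataset also dominate the gradient updates, which is where the bias described above can come from.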

### ✅ 2. **Variational Autoencoder (VAE)**

- Trained only on face images
- Learns a **latent representation** (features) of faces in an unsupervised way
- Outputs: `μ` (mean vector), `σ` (std dev vector), reconstructed image
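
The role of `μ` and `σ` can be sketched in plain NumPy: a latent vector is sampled with the reparameterization trick, and the loss combines reconstruction error with a KL term that pulls the latent distribution toward a standard normal. The function names, the mean-squared reconstruction error, and the `kl_weight` parameter are assumptions here; many implementations also predict `log σ` instead of `σ` for numerical stability.

```python
import numpy as np


def sample_latent(mu, sigma, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


def kl_divergence(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, I)) per example, summed over latent dimensions."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - 2.0 * np.log(sigma), axis=1)


def vae_loss(x, x_recon, mu, sigma, kl_weight=1.0):
    """Per-example loss: mean squared reconstruction error plus weighted KL term."""
    recon = np.mean((x - x_recon) ** 2, axis=tuple(range(1, x.ndim)))
    return recon + kl_weight * kl_divergence(mu, sigma)
```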

### ✅ 3. **PCA + Latent Space Visualization**

- PCA used to project latent vectors to 2D
- Samples colored by brightness → reveals clusters of light/dark faces
- Shows imbalance: some areas are sparse (under-represented)
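
As a rough sketch of this step, the projection can be written with NumPy's SVD; scikit-learn's `PCA(n_components=2).fit_transform` should give the same coordinates up to the sign of each component. The helper names `pca_2d` and `mean_brightness`, and the use of mean pixel intensity as the brightness measure, are assumptions.

```python
import numpy as np


def pca_2d(latents):
    """Project latent vectors of shape (N, D) onto their top two principal components."""
    centered = latents - latents.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def mean_brightness(images):
    """Mean pixel intensity per image; images has shape (N, H, W, C)."""
    return images.reshape(len(images), -1).mean(axis=1)
```

With `coords = pca_2d(mu)` and `b = mean_brightness(faces)`, a call such as `plt.scatter(coords[:, 0], coords[:, 1], c=b)` would produce the colored scatter plot described above.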

### ✅ 4. **Adaptive Sampling Strategy**

- Density of samples in latent space is calculated
- Sampling probability for training set ∝ `1 / (density + ε)`
- **Rare faces** (e.g. darker-skinned ones) are sampled **more often**
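
One way to turn the `1 / (density + ε)` rule into code is sketched below. The README does not say how density is estimated, so this assumes a per-dimension histogram whose bin densities are multiplied together; the function name, `bins`, and `eps` defaults are likewise illustrative.

```python
import numpy as np


def sampling_probabilities(latents, bins=10, eps=1e-3):
    """Return one sampling probability per row of latents (shape (N, D)).

    Density is approximated as the product of per-dimension histogram
    densities, so samples in sparse latent regions get larger weights.
    """
    n, d = latents.shape
    density = np.ones(n)
    for j in range(d):
        hist, edges = np.histogram(latents[:, j], bins=bins, density=True)
        # Interior edges map each value to its bin index in [0, bins - 1]
        idx = np.clip(np.digitize(latents[:, j], edges[1:-1]), 0, bins - 1)
        density *= hist[idx]
    weights = 1.0 / (density + eps)
    return weights / weights.sum()
```

The `eps` term keeps the weights finite for samples that fall into nearly empty bins and controls how aggressively rare samples are boosted: a larger `eps` moves the distribution back toward uniform.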

### ✅ 5. **DB-VAE Classifier**

- CNN is retrained on batches drawn adaptively according to latent-space rarity
- Performance improves on rare groups
- Bias is reduced and fairness improves without requiring skin tone labels
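
Drawing batches by rarity can be sketched as a small generator over index arrays. The name `adaptive_batches` and its parameters are hypothetical, and `probs` is assumed to be a per-sample probability vector that sums to 1.

```python
import numpy as np


def adaptive_batches(probs, batch_size, num_batches, seed=0):
    """Yield index arrays drawn with replacement according to probs."""
    rng = np.random.default_rng(seed)
    for _ in range(num_batches):
        # Samples with higher probability (rarer faces) appear more often
        yield rng.choice(len(probs), size=batch_size, replace=True, p=probs)
```

Each yielded array `idx` could then feed a training step such as `model.train_on_batch(x[idx], y[idx])`, where `model`, `x`, and `y` stand for the classifier and training arrays.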

---

## 🔬 Results

| Model | Validation Accuracy | Notes |
| ----------------- | ------------------- | ----------------------------------------- |
| Standard CNN | 53% | Biased toward lighter faces |
| DB-VAE Classifier | 99% | Fairer + performs better on rare features |

- ✅ Adaptive sampling visualized in the PCA projection
- ✅ DB-VAE improves generalization across skin tone subgroups
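
A claim about subgroups is easiest to check with per-group accuracy. The sketch below splits the evaluation set at a brightness threshold; the function name, the two group labels, and the choice of threshold are assumptions rather than the notebook's evaluation code.

```python
import numpy as np


def subgroup_accuracy(y_true, y_pred, brightness, threshold):
    """Accuracy computed separately for darker and lighter images."""
    groups = {
        "darker": brightness < threshold,
        "lighter": brightness >= threshold,
    }
    return {
        name: float(np.mean(y_pred[mask] == y_true[mask]))
        for name, mask in groups.items()
        if mask.any()  # skip empty groups to avoid averaging nothing
    }
```

Comparing the two values for the standard CNN and for the DB-VAE classifier shows whether a gain in overall accuracy also narrows the gap between groups.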

---

## 🛠️ Tools Used

- TensorFlow / Keras
- NumPy, matplotlib, seaborn
- PCA (scikit-learn)
- CelebA dataset

## 📚 Learnings

- VAEs can uncover latent structure in unlabeled data
- Sampling based on latent rarity can correct dataset imbalance
- DB-VAE shows improved fairness without needing skin tone labels

---
