https://github.com/vamsi-krishna-2005/dl-db-vae-
- Host: GitHub
- URL: https://github.com/vamsi-krishna-2005/dl-db-vae-
- Owner: vamsi-krishna-2005
- Created: 2025-06-19T09:37:52.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2025-06-19T10:37:16.000Z (4 months ago)
- Last Synced: 2025-06-19T10:39:31.763Z (4 months ago)
- Topics: debiasing, deep-learning, fairness-ml, keras, pca, tensorflow, variational-autoencoder
- Language: Jupyter Notebook
- Homepage:
- Size: 831 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# 🎯 Debiased Face Classification using DB-VAE
This project explores how Variational Autoencoders (VAEs) can help mitigate dataset bias in face classification. Motivated by real-world fairness concerns, we build a **Debiasing Variational Autoencoder (DB-VAE)** that adaptively oversamples underrepresented faces during training.
---
## 📦 Dataset
- **Positive (faces)**: Subset of CelebA
- **Negative (non-faces)**: Subset of ImageNet / CIFAR-10
- **Bias labels**: Face brightness as a proxy for skin tone (lighter / darker), used to analyze bias rather than to train the DB-VAE

---
## ⚙️ Working Principle
### ✅ 1. **Standard CNN Classifier**
- Trains on positive + negative examples
- Learns to classify face vs. not face
- May be biased toward over-represented groups (e.g. lighter skin tones)

### ✅ 2. **Variational Autoencoder (VAE)**
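A minimal NumPy sketch of what makes this autoencoder *variational*, assuming the encoder outputs a mean vector `μ` and a standard-deviation vector `σ` per face as described in this step: the reparameterization trick `z = μ + σ·ε`, and the KL-divergence term that a VAE adds to its reconstruction loss. The function names `sample_latent` and `kl_divergence` are hypothetical; the project's notebook uses TensorFlow/Keras and may parameterize `σ` differently (for example as a log-variance).

```python
import numpy as np

def sample_latent(mu, sigma, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    Drawing the noise separately keeps z differentiable w.r.t. mu and sigma.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

def kl_divergence(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions.

    Returns one value per sample (shape: [batch]).
    """
    var = np.square(sigma)
    return 0.5 * np.sum(var + np.square(mu) - 1.0 - np.log(var), axis=-1)

# Usage: a batch of 4 faces with a hypothetical 8-dimensional latent space.
rng = np.random.default_rng(0)
mu = rng.normal(size=(4, 8))
sigma = np.exp(0.5 * rng.normal(size=(4, 8)))  # standard deviations must be positive
z = sample_latent(mu, sigma, rng)
kl = kl_divergence(mu, sigma)
```

The KL term is zero only when `μ = 0` and `σ = 1`, so it pulls each face's latent distribution toward a standard normal while the reconstruction loss keeps the latents informative.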
- Trained only on face images
- Learns a **latent representation** (features) of faces in an unsupervised way
- Outputs: `μ` (mean vector), `σ` (std dev vector), reconstructed image

### ✅ 3. **PCA + Latent Space Visualization**
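A sketch of the 2D projection in this step, written with plain NumPy (center the data, then take the SVD); scikit-learn's `PCA(n_components=2).fit_transform` should give the same coordinates up to the sign of each component. The helper `pca_2d` and the arrays `latent_means` and `brightness` are hypothetical placeholders filled with random data, not the project's encodings.

```python
import numpy as np

def pca_2d(latents):
    """Project latent vectors onto their top-2 principal components.

    Rows of vt are the principal axes, ordered by decreasing variance.
    """
    centered = latents - latents.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T            # shape: [n_samples, 2]
    explained = s**2 / np.sum(s**2)         # fraction of variance per component
    return coords, explained[:2]

# Placeholder data: 500 faces in a hypothetical 16-dim latent space,
# plus a brightness score per face that is used only to color the plot.
rng = np.random.default_rng(1)
latent_means = rng.normal(size=(500, 16))
brightness = rng.uniform(size=500)

coords, explained = pca_2d(latent_means)
# Visualize with matplotlib, e.g.:
# plt.scatter(coords[:, 0], coords[:, 1], c=brightness, cmap="viridis")
```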
- PCA used to project latent vectors to 2D
- Samples colored by brightness → reveals clusters of light/dark faces
- Shows imbalance: some areas are sparse (under-represented)

### ✅ 4. **Adaptive Sampling Strategy**
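The README gives the rule `p ∝ 1 / (density + ε)` without naming the density estimator. This sketch assumes one histogram per latent dimension, with dimensions treated as independent, so each face's weight is the product of its per-dimension `1 / (density + ε)` terms. The function `adaptive_sampling_probs` and its `bins` and `eps` defaults are hypothetical choices, not the notebook's code.

```python
import numpy as np

def adaptive_sampling_probs(latent_means, bins=10, eps=1e-3):
    """Per-face sampling probability, proportional to 1 / (density + eps).

    Assumption: density is estimated with one histogram per latent dimension,
    and dimensions are treated as independent, so the per-dimension
    1 / (density + eps) weights are multiplied (summed in log space).
    """
    n, d = latent_means.shape
    log_weights = np.zeros(n)
    for j in range(d):
        values = latent_means[:, j]
        counts, edges = np.histogram(values, bins=bins)
        density = counts / n  # fraction of faces in each bin
        # Digitizing against the interior edges yields bin indices 0..bins-1,
        # matching the bins np.histogram used (the last bin includes the max).
        idx = np.digitize(values, edges[1:-1])
        log_weights -= np.log(density[idx] + eps)  # log of 1 / (density + eps)
    # Subtract the max before exponentiating to avoid overflow, then normalize.
    weights = np.exp(log_weights - log_weights.max())
    return weights / weights.sum()

# Placeholder latents: a large "common" cluster and a small "rare" one.
rng = np.random.default_rng(2)
common = rng.normal(loc=0.0, scale=1.0, size=(950, 4))
rare = rng.normal(loc=4.0, scale=0.5, size=(50, 4))
probs = adaptive_sampling_probs(np.vstack([common, rare]))
print(probs[:950].mean(), probs[950:].mean())  # rare faces get a higher mean probability
```

Per-dimension histograms are cheap and scale linearly with the number of latent dimensions; a kernel density estimate over the full latent vector would also fit the description, at a higher cost.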
- The density of face samples in the latent space is estimated
- Each face's sampling probability is ∝ `1 / (density + ε)`; the small constant ε keeps the weights finite where the density is near zero
- **Rare faces** (e.g. darker-skinned ones) are sampled **more often**

### ✅ 5. **DB-VAE Classifier**
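A minimal sketch of how a training batch could be drawn in this step, assuming per-face probabilities like those from the adaptive sampling step and a 50/50 split between faces and non-faces (the split and the helper name `draw_batch` are assumptions). Faces are drawn with replacement according to their probabilities; non-faces are drawn uniformly.

```python
import numpy as np

def draw_batch(face_probs, n_nonfaces, batch_size, rng):
    """Return (face_indices, nonface_indices) for one training batch.

    Faces are sampled with replacement according to face_probs, so rare
    faces appear more often; non-faces are sampled uniformly.
    """
    half = batch_size // 2
    face_idx = rng.choice(len(face_probs), size=half, replace=True, p=face_probs)
    nonface_idx = rng.choice(n_nonfaces, size=batch_size - half, replace=True)
    return face_idx, nonface_idx

rng = np.random.default_rng(3)
# Hypothetical probabilities: face 0 is "rare" and therefore drawn most often.
face_probs = np.array([0.7, 0.1, 0.1, 0.1])
faces, nonfaces = draw_batch(face_probs, n_nonfaces=100, batch_size=32, rng=rng)
```

The returned indices would then select the images fed to the CNN at each training step.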
- The CNN is retrained on batches drawn adaptively according to rarity
- Performance improves on rare groups
- Bias is reduced and fairness improves without using skin tone labels

---
## 🔬 Results
| Model | Validation Accuracy | Notes |
| ----------------- | ------------------- | ----------------------------------------- |
| Standard CNN | 53% | Biased toward lighter faces |
| DB-VAE Classifier | 99%                 | Fairer + performs better on rare features |

- ✅ Adaptive sampling visualized in PCA
- ✅ DB-VAE improves generalization across skin tone subgroups

---
## 🛠️ Tools Used
- TensorFlow / Keras
- NumPy, matplotlib, seaborn
- PCA (scikit-learn)
- CelebA dataset

## 📚 Learnings
- VAEs can uncover latent structure in unlabeled data
- Sampling based on latent rarity can correct dataset imbalance
- DB-VAE shows improved fairness without needing skin tone labels

---