https://github.com/priyankagnana/deepshield

DeepShield is a real-time deepfake face detection system built with EfficientNet-B0 and transfer learning, achieving a 92% F1 score on a 140K-image face dataset.

# 🛡️ DeepShield: Real-Time Deepfake Detection System

A fully offline, explainable deepfake detection system built on **EfficientNet-B0** with Grad-CAM visual explanations, a polished Streamlit UI, and a FastAPI backend for real-time inference.

---

## 📌 Overview

Deepfake technology uses generative AI to create highly realistic synthetic faces in images and videos. While powerful, it poses serious risks: misinformation, identity fraud, impersonation, and reputational harm.

**DeepShield** addresses this with a privacy-first, fully offline detection pipeline that:

- Classifies images and videos as ✅ **Real** or 🚨 **Fake**
- Provides **confidence scores** and **P(Real) / P(Fake)** probabilities
- Explains decisions visually using **Grad-CAM heatmaps**
- Runs entirely on your machine: **no cloud calls, no data leaves your device**

---

## 🏗️ System Architecture

```
┌─────────────────────┐
│   Input (Image /    │
│   Video / Webcam)   │
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│   Face Detection    │  ← OpenCV Haar Cascade (face_detector.py)
│  & Frame Sampling   │  ← Frame extractor (frame_extractor.py)
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│ Image Preprocessing │  ← Resize 224×224, ImageNet normalize
└──────────┬──────────┘
           ↓
┌──────────────────────────────────────┐
│          DeepfakeCNN Model           │
│                                      │
│  EfficientNet-B0 (Spatial Branch)    │  → 1280-dim features
│                +                     │
│  FrequencyBranch (FFT Spectrum)      │  → 128-dim features [opt-in]
│                                      │
│  Fused → Linear head → Binary logit  │
└──────────┬───────────────────────────┘
           ↓
┌─────────────────────┐
│   Classification    │  Real / Fake + confidence score
└──────────┬──────────┘
           ↓
┌─────────────────────┐
│   Grad-CAM Module   │  Visual heatmap over suspicious regions
└─────────────────────┘
```
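
The preprocessing stage in the diagram (resize to 224×224, then ImageNet normalization) might look roughly like the sketch below. It is a NumPy-only illustration, not the project's code: `preprocess_face` is a hypothetical name, the nearest-neighbour resize is a stand-in for whatever resize the project actually uses, and the mean/std values are the standard ImageNet statistics used with torchvision's pretrained models.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_face(image_rgb: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn an H x W x 3 uint8 RGB crop into a normalized 3 x size x size float32 array."""
    h, w, _ = image_rgb.shape
    # Nearest-neighbour resize via index lookup (stand-in for cv2/PIL resizing).
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = image_rgb[rows[:, None], cols[None, :]]
    # Scale to [0, 1], then normalize each channel with the ImageNet statistics.
    scaled = resized.astype(np.float32) / 255.0
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Channels-first layout, as PyTorch models expect.
    return np.transpose(normalized, (2, 0, 1))


if __name__ == "__main__":
    crop = np.full((300, 200, 3), 255, dtype=np.uint8)  # dummy all-white crop
    tensor = preprocess_face(crop)
    print(tensor.shape, tensor.dtype)
```

The result still needs a leading batch dimension before it can be fed to a model.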

---

## ✨ Key Features

| Feature | Details |
|---|---|
| **EfficientNet-B0 backbone** | ImageNet-pretrained, two-phase fine-tuning |
| **Frequency-domain analysis** | Optional FFT branch detects GAN grid artefacts |
| **Face detection** | OpenCV Haar cascade; crops to the face before inference |
| **Grad-CAM explanations** | Heatmap overlay showing which regions drove the decision |
| **Full video analysis** | Samples N frames evenly, aggregates with majority vote + timeline chart |
| **Live webcam** | `streamlit-webrtc` in the UI + CLI realtime script |
| **FastAPI backend** | REST + WebSocket endpoints for image, video, and frame streaming |
| **Fully offline** | No internet connection required for inference |
| **MPS / CUDA / CPU** | Auto-detects Apple Silicon, NVIDIA GPU, or CPU |
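
To make the Grad-CAM row concrete, here is the core combination step: average the gradients of the target logit over each feature map to get per-channel weights, take the weighted sum of the activations, and apply ReLU. This is a sketch that assumes the activations and gradients have already been captured (the project tree attributes this to forward and backward hooks); `gradcam_heatmap` is a hypothetical helper, not a function from the repository.

```python
import numpy as np


def gradcam_heatmap(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Combine C x H x W activations and gradients into an H x W heatmap in [0, 1]."""
    # Channel weights: global-average-pool the gradients over the spatial dimensions.
    weights = gradients.mean(axis=(1, 2))  # shape (C,)
    # Weighted sum of the activation maps, then ReLU to keep positive evidence only.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Scale to [0, 1] for a colormap overlay; an all-zero map stays all-zero.
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The heatmap has the spatial size of the hooked layer, so it would be upsampled to the input resolution before being overlaid on the face crop.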

---

## 📂 Project Structure

```
DeepShield/
│
├── api/                            ← FastAPI backend
│   ├── main.py                     ← App entry point, model loaded at startup
│   ├── schemas.py                  ← Pydantic response models
│   └── routes/
│       ├── predict.py              ← POST /predict/image, POST /predict/video
│       └── stream.py               ← WS /ws/webcam (real-time frame inference)
│
├── model/
│   ├── cnn_model.py                ← DeepfakeCNN (EfficientNet-B0 + optional FrequencyBranch)
│   ├── frequency_branch.py         ← FFT-based spectral feature extractor
│   └── loss.py
│
├── inference/
│   ├── predict.py                  ← load_model, predict, predict_image, predict_video, predict_with_gradcam
│   └── realtime_inference.py       ← CLI webcam / video loop with frame skipping
│
├── training/
│   ├── train.py                    ← Two-phase EfficientNet fine-tuning
│   ├── evaluate.py                 ← Test-set evaluation with tqdm progress
│   ├── dataset.py                  ← DataLoader, balanced subset sampling
│   ├── metrics.py                  ← Accuracy, precision, recall, F1, confusion matrix
│   └── early_stopping.py
│
├── preprocessing/
│   ├── face_detector.py            ← detect_and_crop_face() using OpenCV Haar cascade
│   ├── frame_extractor.py          ← Extract 30 frames/video with multiprocessing
│   ├── dataset_split.py            ← Sort raw videos → real/ fake/ using metadata.json
│   ├── split_train_val_test.py     ← 70/15/15 split grouped by video ID
│   └── augmentations.py
│
├── explainability/
│   ├── gradcam.py                  ← Grad-CAM with forward + backward hooks
│   └── heatmap_utils.py            ← Heatmap colormap overlay
│
├── saved_models/
│   └── best_model.pth              ← Best checkpoint saved during training
│
├── app.py                          ← Streamlit UI (Image / Video / Webcam tabs)
├── requirements.txt
└── README.md
```
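
The tree mentions a 70/15/15 split grouped by video ID. Grouping matters because frames from one video are near-duplicates: if they land in both train and test, test scores are inflated. Below is a minimal sketch of the idea, assuming frame files are named `<video_id>_<frame>.jpg` (a hypothetical naming scheme; the real `split_train_val_test.py` may work differently). Ratios are integer percentages so the bucket sizes do not depend on floating-point rounding.

```python
import random
from collections import defaultdict


def split_by_video(frame_names, percents=(70, 15, 15), seed=42):
    """Assign whole videos (not individual frames) to train/val/test."""
    groups = defaultdict(list)
    for name in frame_names:
        video_id = name.rsplit("_", 1)[0]  # assumed "<video_id>_<frame>.jpg"
        groups[video_id].append(name)

    video_ids = sorted(groups)  # sort first so the seeded shuffle is reproducible
    random.Random(seed).shuffle(video_ids)

    n_train = len(video_ids) * percents[0] // 100
    n_val = len(video_ids) * percents[1] // 100
    buckets = {
        "train": video_ids[:n_train],
        "val": video_ids[n_train:n_train + n_val],
        "test": video_ids[n_train + n_val:],  # remainder goes to test
    }
    # Expand video IDs back into their frame lists.
    return {split: [f for vid in ids for f in groups[vid]] for split, ids in buckets.items()}
```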

---

## 🛠️ Tech Stack

| Category | Tools |
|---|---|
| **Deep Learning** | PyTorch, TorchVision |
| **Model** | EfficientNet-B0 (ImageNet pretrained) |
| **Computer Vision** | OpenCV |
| **Frequency Analysis** | PyTorch FFT (`torch.fft.fft2`, fftshift) |
| **Explainability** | Grad-CAM (backward hooks) |
| **Frontend** | Streamlit, streamlit-webrtc, Plotly |
| **Backend API** | FastAPI, Uvicorn, WebSockets |
| **Data / Metrics** | NumPy, Pandas, Scikit-learn |
| **Training utilities** | tqdm, early stopping |

---

## 📊 Model Performance

Trained on the **140k Real vs Fake Faces** dataset (Kaggle):

| Metric | Score |
|---|---|
| Accuracy | ~91–92% |
| Precision | not reported |
| Recall | not reported |
| F1 Score | not reported |

> Run `python -m training.evaluate` after training to get exact numbers on your test split.

---

## 📥 Dataset Setup

This project uses the **140k Real vs Fake Faces** dataset from Kaggle.

**Download link:** [https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces](https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces)

After downloading, place it at the project root:

```
DeepShield/
└── 140k-faces/
    └── real_vs_fake/
        └── real-vs-fake/
            ├── train/
            │   ├── real/
            │   └── fake/
            ├── valid/
            │   ├── real/
            │   └── fake/
            └── test/
                ├── real/
                └── fake/
```

> The `140k-faces/` folder is in `.gitignore` and must be placed manually on each machine.

---

## 🚀 Full Setup & Workflow

### Prerequisites

#### 1. System libraries (macOS; install before creating the venv)

```bash
brew install xz cmake libomp
```

#### 2. Python version (3.10+ recommended)

```bash
pyenv install 3.12.2
pyenv local 3.12.2
```

#### 3. Virtual environment

```bash
python3 -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
```

#### 4. Install dependencies

```bash
pip install -r requirements.txt
```

---

### Step 1: Train the Model

```bash
python -m training.train
```

Trains DeepfakeCNN using two-phase EfficientNet fine-tuning for up to 50 epochs. Best checkpoint is saved to `saved_models/best_model.pth` whenever validation accuracy improves.

**Training phases:**
- **Phase 1** (epochs 1–5): Backbone frozen; only the classifier head trains, at lr=1e-4
- **Phase 2** (epoch 6+): The last two EfficientNet blocks are unfrozen and training continues at lr=1e-5

Sample output:
```
Epoch 1/50 [Ph1] train_loss=0.512 val_loss=0.431 val_acc=0.8120
...
Epoch 20/50 [Ph2] train_loss=0.214 val_loss=0.198 val_acc=0.9167
```

---

### Step 2: Evaluate on the Test Set

```bash
python -m training.evaluate
```

Loads `saved_models/best_model.pth` and reports Accuracy, Precision, Recall, F1, and Confusion Matrix on the held-out test set. Includes a tqdm progress bar.

Sample output:
```
Evaluating: 100%|████████████| 625/625 [05:23<00:00]

Test set evaluation
----------------------------------------
Accuracy: 0.9167
Precision: 0.9210
Recall: 0.9140
F1: 0.9175
```

---

### Step 3: Launch the Streamlit App

```bash
streamlit run app.py
```

Opens the full UI at `http://localhost:8501`. Three tabs:

#### 📷 Image Tab
- Upload any face image (JPG/PNG)
- Shows verdict card with confidence %, P(Real), P(Fake)
- Enable **Grad-CAM** in sidebar to see which facial regions influenced the decision
- Plotly donut chart shows Real/Fake probability split

#### 🎬 Video Tab
- Upload a video (MP4/AVI/MOV)
- Choose how many frames to analyze (4–32)
- Summary metrics: frames analyzed, avg P(Real), real/fake frame counts
- Interactive **P(Real) timeline chart** (per-frame line chart with 0.5 threshold)
- **Frame distribution histogram** showing score spread
- Collapsible per-frame detail table
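
The video flow described above (sample N frames evenly, score each, aggregate by majority vote) can be sketched in plain Python. Both function names are hypothetical, and two details are assumptions rather than documented behaviour: a frame counts as Real when P(Real) is at or above the 0.5 threshold, and a tied vote resolves to Fake.

```python
def sample_frame_indices(total_frames: int, num_samples: int) -> list[int]:
    """Pick num_samples indices spread evenly across a video of total_frames frames."""
    num_samples = min(num_samples, total_frames)
    # Take the midpoint of each of num_samples equal-length segments.
    return [int((i + 0.5) * total_frames / num_samples) for i in range(num_samples)]


def aggregate_video(prob_real_per_frame: list[float], threshold: float = 0.5) -> dict:
    """Majority vote over per-frame P(Real) scores, plus summary metrics."""
    if not prob_real_per_frame:
        raise ValueError("no frames were scored")
    real_votes = sum(p >= threshold for p in prob_real_per_frame)
    fake_votes = len(prob_real_per_frame) - real_votes
    return {
        "label": "Real" if real_votes > fake_votes else "Fake",  # ties go to Fake (assumption)
        "frames_analyzed": len(prob_real_per_frame),
        "avg_prob_real": sum(prob_real_per_frame) / len(prob_real_per_frame),
        "real_frames": real_votes,
        "fake_frames": fake_votes,
    }
```

Majority voting is robust to a few badly scored frames; averaging P(Real) instead would let a handful of extreme scores dominate the verdict.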

#### 📹 Webcam Tab
- Live webcam feed via `streamlit-webrtc`
- Inference every 3rd frame to keep stream smooth
- Bottom banner shows Real/Fake label + confidence
- Top bar shows P(Real) as a fill indicator
- Falls back gracefully if `streamlit-webrtc` is not installed

**Sidebar options:**
- Toggle Grad-CAM overlay
- Score interpretation table (what P(Real) ranges mean)

---

### Step 4: Run the FastAPI Backend

```bash
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
```

Model is loaded **once at startup** and reused for all requests.

Interactive API docs: `http://localhost:8000/docs`

| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Check if model is loaded and which device is in use |
| `/predict/image` | POST | Upload image → `{label, confidence, prob_real}` |
| `/predict/video` | POST | Upload video → aggregated + per-frame results |
| `/ws/webcam` | WebSocket | Send JPEG bytes → receive JSON predictions in real-time |

Example request (image):
```bash
curl -X POST http://localhost:8000/predict/image \
-F "file=@face.jpg"
```

Example response:
```json
{
"label": "Fake",
"confidence": 0.9312,
"prob_real": 0.0688
}
```
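
In the example response, `confidence` and `prob_real` sum to 1, which is consistent with confidence being the probability of the predicted class. The sketch below shows how a single binary logit could be mapped to that response shape. `to_response` is a hypothetical helper, and it assumes the logit scores the Real class, so that sigmoid(logit) = P(Real); the project may use the opposite convention.

```python
import math


def to_response(logit: float) -> dict:
    """Map one raw model logit to a {label, confidence, prob_real} dict."""
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if logit >= 0:
        prob_real = 1.0 / (1.0 + math.exp(-logit))
    else:
        e = math.exp(logit)
        prob_real = e / (1.0 + e)
    prob_fake = 1.0 - prob_real
    return {
        "label": "Real" if prob_real >= 0.5 else "Fake",
        # Confidence is the probability assigned to the predicted class.
        "confidence": round(max(prob_real, prob_fake), 4),
        "prob_real": round(prob_real, 4),
    }
```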

---

### Step 5: CLI Real-Time Inference (Webcam or Video)

```bash
# Webcam
python -m inference.realtime_inference

# Video file
python -m inference.realtime_inference --video path/to/video.mp4

# With Grad-CAM overlay
python -m inference.realtime_inference --video path/to/video.mp4 --gradcam
```

Press **Q** to quit. Inference runs every 3rd frame for smooth display.
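
Running inference on every 3rd frame while still drawing every frame can be sketched as a small generator. `annotate_stream` and its `predict` argument are hypothetical names; the real script presumably wraps an OpenCV capture loop around the same idea.

```python
def annotate_stream(frames, predict, every: int = 3):
    """Yield (frame, prediction), calling `predict` only on every `every`-th frame."""
    last_prediction = None
    for index, frame in enumerate(frames):
        if index % every == 0:
            last_prediction = predict(frame)  # the expensive model call
        # Skipped frames reuse the most recent prediction so the overlay never blanks.
        yield frame, last_prediction
```

With `every=3` the model runs on one third of the frames, while every frame is still yielded with the most recent label attached.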

---

### Quick Reference

```bash
# ── Environment ──────────────────────────────────────────
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# ── Train ────────────────────────────────────────────────
python -m training.train

# ── Evaluate ─────────────────────────────────────────────
python -m training.evaluate

# ── Streamlit UI ─────────────────────────────────────────
streamlit run app.py

# ── FastAPI backend ──────────────────────────────────────
uvicorn api.main:app --reload --port 8000

# ── CLI webcam / video ───────────────────────────────────
python -m inference.realtime_inference [--video path/to/video.mp4] [--gradcam]
```

---

## 🔬 Model Architecture Details

### DeepfakeCNN

```
EfficientNet-B0 (pretrained on ImageNet)
  └── features[0..8]  (MBConv blocks)
  └── classifier
        ├── Dropout(0.4)
        └── Linear(1280 → 1)   # Default mode

Optional: use_frequency=True
  EfficientNet features (1280-dim)
  + FrequencyBranch (128-dim)
  → Linear(1408 → 256) → ReLU → Dropout(0.4) → Linear(256 → 1)
```

### FrequencyBranch

Detects spectral artefacts characteristic of GAN-generated images:

1. `torch.fft.fft2`: 2D Fast Fourier Transform
2. `fftshift`: centres low-frequency content for spatially-consistent conv filters
3. `log1p`: compresses extreme dynamic range of FFT magnitudes
4. Two Conv2D + BatchNorm + MaxPool blocks
5. Fully connected → 128-dim feature vector
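
Steps 1 to 3 above form the fixed, non-learned front end of the branch. The sketch below reproduces them with NumPy as a stand-in for the `torch.fft` calls, so it runs without PyTorch. `log_spectrum` is a hypothetical name, and the learned conv and fully connected layers (steps 4 and 5) are left out.

```python
import numpy as np


def log_spectrum(gray: np.ndarray) -> np.ndarray:
    """Centred log-magnitude spectrum of an H x W grayscale image."""
    spectrum = np.fft.fft2(gray)          # step 1: 2D FFT
    centred = np.fft.fftshift(spectrum)   # step 2: move the zero-frequency term to the centre
    return np.log1p(np.abs(centred))      # step 3: compress the dynamic range


if __name__ == "__main__":
    flat = np.ones((8, 8))                # a constant image has only a DC component
    spec = log_spectrum(flat)
    print(np.unravel_index(spec.argmax(), spec.shape))
```

For a constant image, all energy sits in the zero-frequency term, which `fftshift` moves to the centre of the array; periodic artefacts such as upsampling grids would instead show up as peaks away from the centre.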

### Two-Phase Training

| Phase | Epochs | LR | Backbone |
|---|---|---|---|
| Phase 1 (warm-up) | 1–5 | 1e-4 | Fully frozen |
| Phase 2 (fine-tune) | 6+ | 1e-5 | Last 2 blocks unfrozen |
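
The schedule in the table can be written as a small lookup that a training loop would consult at the start of each epoch. This is an illustrative sketch, not the project's `train.py`; `training_phase` and the keys of the returned dict are hypothetical.

```python
def training_phase(epoch: int, warmup_epochs: int = 5) -> dict:
    """Return the settings for a 1-based epoch under the two-phase schedule."""
    if epoch < 1:
        raise ValueError("epochs are counted from 1")
    if epoch <= warmup_epochs:
        # Phase 1: backbone frozen, only the classifier head is trained.
        return {"phase": 1, "lr": 1e-4, "unfrozen_backbone_blocks": 0}
    # Phase 2: the last two backbone blocks are unfrozen and the learning rate drops.
    return {"phase": 2, "lr": 1e-5, "unfrozen_backbone_blocks": 2}
```

In PyTorch, freezing a block means setting `requires_grad = False` on its parameters. Warming up the head first keeps its initially random gradients from disturbing the pretrained backbone weights.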

---

## 📊 Evaluation Metrics

| Metric | Description |
|---|---|
| Accuracy | Overall correct classifications |
| Precision | Of predicted fakes, how many were actually fake |
| Recall | Of actual fakes, how many were caught |
| F1 Score | Harmonic mean of precision and recall |
| Confusion Matrix | True/False Positive/Negative breakdown |
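
The definitions in the table treat Fake as the positive class. A dependency-free sketch of how these metrics follow from the confusion counts is shown below. `binary_metrics` is a hypothetical helper, and the label encoding (1 = fake, 0 = real) is an assumption; the project lists Scikit-learn in its stack, which provides equivalent functions.

```python
def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict:
    """Accuracy, precision, recall and F1 with 1 = fake (positive) and 0 = real."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # fakes caught
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # reals passed
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # reals flagged as fake
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # fakes missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": [[tn, fp], [fn, tp]],  # rows: actual, columns: predicted
    }
```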

---

## 🌍 Applications

- Social media content verification
- News authenticity validation
- Digital identity protection
- Cybercrime and fraud detection
- Media forensics and journalism

---

## 🔮 Future Enhancements

- Temporal modeling with 3D CNN or Vision Transformer across video frames
- Audio-visual consistency check (voice + face sync)
- Browser extension for in-page detection
- Mobile deployment (CoreML / TFLite)
- Confidence calibration and uncertainty estimation

---

## 📜 License

This project is released under the MIT License.