https://github.com/priyankagnana/deepshield
DeepShield is a real-time deepfake face detection system built using EfficientNet-B0 and transfer learning, achieving a 92% F1-score on the 140k Real vs Fake Faces dataset.
- Host: GitHub
- URL: https://github.com/priyankagnana/deepshield
- Owner: priyankagnana
- Created: 2026-02-28T12:28:28.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-03-03T18:24:50.000Z (4 months ago)
- Last Synced: 2026-03-04T10:23:47.471Z (4 months ago)
- Topics: artificial-intelligence, binary-classification, cnn, computer-vision, cybersecurity, deepfake-detection, efficientnet-b0, face-classification, image-classification, machine-learning, pytorch, transfer-learning
- Language: Python
- Homepage: https://deepshield-ctjjsxczkzzsqfoftct3ra.streamlit.app/
- Size: 916 KB
- Stars: 0
- Watchers: 0
- Forks: 2
- Open Issues: 0
# DeepShield: Real-Time Deepfake Detection System
A fully offline, explainable deepfake detection system built on **EfficientNet-B0** with Grad-CAM visual explanations, a polished Streamlit UI, and a FastAPI backend for real-time inference.
---
## Overview
Deepfake technology uses generative AI to create highly realistic synthetic faces in images and videos. While powerful, it poses serious risks: misinformation, identity fraud, impersonation, and reputational harm.
**DeepShield** addresses this with a privacy-first, fully offline detection pipeline that:
- Classifies images and videos as **Real** or **Fake**
- Provides **confidence scores** and **P(Real) / P(Fake)** probabilities
- Explains decisions visually using **Grad-CAM heatmaps**
- Runs entirely on your machine: **no cloud calls, no data leaves your device**
---
## System Architecture
```
+------------------------+
|  Input (Image /        |
|  Video / Webcam)       |
+-----------+------------+
            |
            v
+------------------------+
|  Face Detection        |  <- OpenCV Haar cascade (face_detector.py)
|  & Frame Sampling      |  <- Frame extractor (frame_extractor.py)
+-----------+------------+
            |
            v
+------------------------+
|  Image Preprocessing   |  <- Resize 224×224, ImageNet normalize
+-----------+------------+
            |
            v
+------------------------------------------+
|            DeepfakeCNN Model             |
|                                          |
|  EfficientNet-B0 (Spatial Branch)        |  -> 1280-dim features
|                  +                       |
|  FrequencyBranch (FFT Spectrum)          |  -> 128-dim features [opt-in]
|                                          |
|  Fused -> Linear head -> Binary logit    |
+--------------------+---------------------+
                     |
                     v
+------------------------+
|  Classification        |  Real / Fake + confidence score
+-----------+------------+
            |
            v
+------------------------+
|  Grad-CAM Module       |  Visual heatmap over suspicious regions
+------------------------+
```
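The final Grad-CAM stage can be illustrated without PyTorch. The sketch below assumes the target layer's activations and the gradients of the output logit have already been captured (the project's `explainability/gradcam.py` does this with forward and backward hooks) and are passed in as nested lists of shape `[K][H][W]`. The function name `grad_cam` is hypothetical, and upsampling the map to 224×224 before overlaying it on the face is left out.

```python
def grad_cam(activations, gradients):
    """Minimal Grad-CAM on nested lists of shape [K][H][W] (no PyTorch).

    Returns an H x W heatmap min-max normalised to [0, 1].
    """
    k = len(activations)
    h, w = len(activations[0]), len(activations[0][0])

    # 1. Channel weights: global-average-pool each gradient map.
    weights = [sum(sum(row) for row in gradients[c]) / (h * w) for c in range(k)]

    # 2. Weighted sum of activation maps, then ReLU to keep positive evidence only.
    cam = [
        [max(0.0, sum(weights[c] * activations[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]

    # 3. Min-max normalise so the map can be colour-mapped as a heatmap.
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi - lo < 1e-12:
        return [[0.0] * w for _ in range(h)]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 0.0]],        # map 0 fires top-left
            [[0.0, 0.0], [0.0, 1.0]]]        # map 1 fires bottom-right
    grads = [[[1.0, 1.0], [1.0, 1.0]],       # map 0 pushes the logit up
             [[-1.0, -1.0], [-1.0, -1.0]]]   # map 1 pushes it down
    print(grad_cam(acts, grads))  # [[1.0, 0.0], [0.0, 0.0]]
```

Only the map with a positive average gradient survives the ReLU, which is why the heatmap highlights regions that pushed the decision toward the predicted class.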
---
## Key Features
| Feature | Details |
|---|---|
| **EfficientNet-B0 backbone** | ImageNet-pretrained, two-phase fine-tuning |
| **Frequency-domain analysis** | Optional FFT branch detects GAN grid artefacts |
| **Face detection** | OpenCV Haar cascade; crops to the face before inference |
| **Grad-CAM explanations** | Heatmap overlay showing which regions drove the decision |
| **Full video analysis** | Samples N frames evenly, aggregates with majority vote + timeline chart |
| **Live webcam** | `streamlit-webrtc` in the UI + CLI realtime script |
| **FastAPI backend** | REST + WebSocket endpoints for image, video, and frame streaming |
| **Fully offline** | No internet connection required for inference |
| **MPS / CUDA / CPU** | Auto-detects Apple Silicon, NVIDIA GPU, or CPU |
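The video aggregation listed above (evenly sampled frames plus a majority vote) might look roughly like this. It is a minimal sketch: `sample_frame_indices` and `aggregate` are hypothetical names, taking the centre of each equal-width segment is one of several reasonable sampling choices, and resolving ties as Fake is an assumption rather than documented behaviour.

```python
def sample_frame_indices(total_frames, n):
    """Return up to n frame indices spread evenly over [0, total_frames)."""
    if total_frames <= 0 or n <= 0:
        return []
    n = min(n, total_frames)
    step = total_frames / n
    # Centre of each of n equal-width segments of the video.
    return [int(step * i + step / 2) for i in range(n)]


def aggregate(prob_reals, threshold=0.5):
    """Majority vote over per-frame P(Real) scores (ties resolve to Fake)."""
    if not prob_reals:
        raise ValueError("no frames were scored")
    real_votes = sum(1 for p in prob_reals if p >= threshold)
    fake_votes = len(prob_reals) - real_votes
    return {
        "label": "Real" if real_votes > fake_votes else "Fake",
        "real_frames": real_votes,
        "fake_frames": fake_votes,
        "avg_prob_real": sum(prob_reals) / len(prob_reals),
    }


if __name__ == "__main__":
    print(sample_frame_indices(300, 4))  # [37, 112, 187, 262]
    print(aggregate([0.91, 0.12, 0.77]))
```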
---
## Project Structure
```
DeepShield/
│
├── api/                          # FastAPI backend
│   ├── main.py                   # App entry point, model loaded at startup
│   ├── schemas.py                # Pydantic response models
│   └── routes/
│       ├── predict.py            # POST /predict/image, POST /predict/video
│       └── stream.py             # WS /ws/webcam (real-time frame inference)
│
├── model/
│   ├── cnn_model.py              # DeepfakeCNN (EfficientNet-B0 + optional FrequencyBranch)
│   ├── frequency_branch.py       # FFT-based spectral feature extractor
│   └── loss.py
│
├── inference/
│   ├── predict.py                # load_model, predict, predict_image, predict_video, predict_with_gradcam
│   └── realtime_inference.py     # CLI webcam / video loop with frame skipping
│
├── training/
│   ├── train.py                  # Two-phase EfficientNet fine-tuning
│   ├── evaluate.py               # Test-set evaluation with tqdm progress
│   ├── dataset.py                # DataLoader, balanced subset sampling
│   ├── metrics.py                # Accuracy, precision, recall, F1, confusion matrix
│   └── early_stopping.py
│
├── preprocessing/
│   ├── face_detector.py          # detect_and_crop_face() using OpenCV Haar cascade
│   ├── frame_extractor.py        # Extract 30 frames/video with multiprocessing
│   ├── dataset_split.py          # Sort raw videos into real/ and fake/ using metadata.json
│   ├── split_train_val_test.py   # 70/15/15 split grouped by video ID
│   └── augmentations.py
│
├── explainability/
│   ├── gradcam.py                # Grad-CAM with forward + backward hooks
│   └── heatmap_utils.py          # Heatmap colormap overlay
│
├── saved_models/
│   └── best_model.pth            # Best checkpoint saved during training
│
├── app.py                        # Streamlit UI (Image / Video / Webcam tabs)
├── requirements.txt
└── README.md
```
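`split_train_val_test.py` splits 70/15/15 grouped by video ID, so frames from one video never land in two splits (otherwise near-duplicate faces from the same clip would leak into validation and test). A minimal stdlib sketch of that idea follows; the `video_id_of` callback, the seed, and the rounding rule are assumptions, not the project's code.

```python
import random
from collections import defaultdict


def split_by_video(frame_paths, video_id_of, ratios=(0.70, 0.15, 0.15), seed=42):
    """Split frames roughly 70/15/15 so that every video lands in exactly one split."""
    groups = defaultdict(list)
    for path in frame_paths:
        groups[video_id_of(path)].append(path)

    # Sort before shuffling so the result depends only on the seed.
    video_ids = sorted(groups)
    random.Random(seed).shuffle(video_ids)

    n = len(video_ids)
    n_train = round(n * ratios[0])
    n_val = round(n * ratios[1])
    buckets = {
        "train": video_ids[:n_train],
        "val": video_ids[n_train:n_train + n_val],
        "test": video_ids[n_train + n_val:],  # the remainder goes to test
    }
    return {name: [p for vid in ids for p in groups[vid]]
            for name, ids in buckets.items()}


if __name__ == "__main__":
    # Hypothetical naming scheme: "<video_id>_frame<k>.jpg"
    paths = [f"vid{i:02d}_frame{k}.jpg" for i in range(20) for k in range(3)]
    splits = split_by_video(paths, lambda p: p.split("_")[0])
    print({name: len(frames) for name, frames in splits.items()})
    # {'train': 42, 'val': 9, 'test': 9}
```

Splitting at the video level means the split ratios apply to videos, not frames, so frame counts per split only match 70/15/15 when videos contribute similar numbers of frames.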
---
## Tech Stack
| Category | Tools |
|---|---|
| **Deep Learning** | PyTorch, TorchVision |
| **Model** | EfficientNet-B0 (ImageNet pretrained) |
| **Computer Vision** | OpenCV |
| **Frequency Analysis** | PyTorch FFT (`torch.fft.fft2`, fftshift) |
| **Explainability** | Grad-CAM (backward hooks) |
| **Frontend** | Streamlit, streamlit-webrtc, Plotly |
| **Backend API** | FastAPI, Uvicorn, WebSockets |
| **Data / Metrics** | NumPy, Pandas, Scikit-learn |
| **Training utilities** | tqdm, early stopping |
---
## Model Performance
Trained on the **140k Real vs Fake Faces** dataset (Kaggle):
| Metric | Score |
|---|---|
| Accuracy | ~91-92% |
| Precision | Not reported |
| Recall | Not reported |
| F1 Score | Not reported |
> Run `python -m training.evaluate` after training to get exact numbers on your test split.
---
## Dataset Setup
This project uses the **140k Real vs Fake Faces** dataset from Kaggle.
**Download link:** [https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces](https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces)
After downloading, place it at the project root:
```
DeepShield/
└── 140k-faces/
    └── real_vs_fake/
        └── real-vs-fake/
            ├── train/
            │   ├── real/
            │   └── fake/
            ├── valid/
            │   ├── real/
            │   └── fake/
            └── test/
                ├── real/
                └── fake/
```
> The `140k-faces/` folder is in `.gitignore` and must be placed manually on each machine.
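Before training, a quick check that the folders are where the scripts expect them can save a failed run. The helper below is hypothetical (not part of the repository) and assumes images are `.jpg`/`.jpeg`/`.png` files sitting directly inside each `real/` and `fake/` folder.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images(root):
    """Return {(split, label): n_images} for a train/valid/test x real/fake tree."""
    root = Path(root)
    counts = {}
    for split in ("train", "valid", "test"):
        for label in ("real", "fake"):
            folder = root / split / label
            if not folder.is_dir():
                raise FileNotFoundError(f"missing folder: {folder}")
            counts[(split, label)] = sum(
                1 for f in folder.iterdir() if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


if __name__ == "__main__":
    root = Path("140k-faces/real_vs_fake/real-vs-fake")
    for (split, label), n in count_images(root).items():
        print(f"{split:>5}/{label:<4} {n:>6} images")
```

A `FileNotFoundError` here usually means the archive was extracted one level too shallow or too deep.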
---
## Full Setup & Workflow
### Prerequisites
#### 1. System libraries (macOS: install before creating the venv)
```bash
brew install xz cmake libomp
```
#### 2. Python version (3.10+ recommended)
```bash
pyenv install 3.12.2
pyenv local 3.12.2
```
#### 3. Virtual environment
```bash
python3 -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
```
#### 4. Install dependencies
```bash
pip install -r requirements.txt
```
---
### Step 1: Train the Model
```bash
python -m training.train
```
Trains DeepfakeCNN with two-phase EfficientNet fine-tuning for up to 50 epochs. The best checkpoint is saved to `saved_models/best_model.pth` whenever validation accuracy improves.
**Training phases:**
- **Phase 1** (epochs 1-5): backbone frozen; only the classifier head trains, at lr=1e-4
- **Phase 2** (epoch 6+): the last two EfficientNet blocks are unfrozen and train together with the head at lr=1e-5
Sample output:
```
Epoch 1/50 [Ph1] train_loss=0.512 val_loss=0.431 val_acc=0.8120
...
Epoch 20/50 [Ph2] train_loss=0.214 val_loss=0.198 val_acc=0.9167
```
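The checkpointing rule above (save whenever validation accuracy improves), combined with the `early_stopping.py` utility in the project tree, can be sketched as a small tracker. The class name, its `step` interface, and the `patience=5` default are illustrative assumptions; the repository's implementation may differ, for example by monitoring validation loss instead.

```python
class EarlyStopping:
    """Track the best validation accuracy and signal when to stop.

    `patience` (epochs without improvement) and `min_delta` are assumed
    values for illustration, not taken from the project.
    """

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc):
        """Return (improved, should_stop) for this epoch's validation accuracy."""
        if val_acc > self.best + self.min_delta:
            self.best = val_acc
            self.bad_epochs = 0
            return True, False  # the caller would save best_model.pth here
        self.bad_epochs += 1
        return False, self.bad_epochs >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=2)
    for epoch, acc in enumerate([0.81, 0.85, 0.84, 0.84, 0.90], start=1):
        improved, stop = stopper.step(acc)
        print(epoch, acc, "save" if improved else "-", "stop" if stop else "")
        if stop:
            break
```

With `patience=2`, the demo saves at epochs 1 and 2 and stops after epoch 4, never seeing the 0.90 at epoch 5; a larger patience trades training time for a chance at later improvements.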
---
### Step 2: Evaluate on the Test Set
```bash
python -m training.evaluate
```
Loads `saved_models/best_model.pth` and reports Accuracy, Precision, Recall, F1, and Confusion Matrix on the held-out test set. Includes a tqdm progress bar.
Sample output:
```
Evaluating: 100%|████████████| 625/625 [05:23<00:00]
Test set evaluation
----------------------------------------
Accuracy: 0.9167
Precision: 0.9210
Recall: 0.9140
F1: 0.9175
```
---
### Step 3: Launch the Streamlit App
```bash
streamlit run app.py
```
Opens the full UI at `http://localhost:8501`. Three tabs:
#### Image Tab
- Upload any face image (JPG/PNG)
- Shows verdict card with confidence %, P(Real), P(Fake)
- Enable **Grad-CAM** in sidebar to see which facial regions influenced the decision
- Plotly donut chart shows Real/Fake probability split
#### Video Tab
- Upload a video (MP4/AVI/MOV)
- Choose how many frames to analyze (4-32)
- Summary metrics: frames analyzed, avg P(Real), real/fake frame counts
- Interactive **P(Real) timeline chart** (per-frame line chart with 0.5 threshold)
- **Frame distribution histogram** showing score spread
- Collapsible per-frame detail table
#### Webcam Tab
- Live webcam feed via `streamlit-webrtc`
- Runs inference on every 3rd frame to keep the stream smooth
- Bottom banner shows Real/Fake label + confidence
- Top bar shows P(Real) as a fill indicator
- Falls back gracefully if `streamlit-webrtc` is not installed
**Sidebar options:**
- Toggle Grad-CAM overlay
- Score interpretation table (what P(Real) ranges mean)
---
### Step 4: Run the FastAPI Backend
```bash
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
```
The model is loaded **once at startup** and reused for all requests.
Interactive API docs: `http://localhost:8000/docs`
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Check if model is loaded and which device is in use |
| `/predict/image` | POST | Upload an image → returns `{label, confidence, prob_real}` |
| `/predict/video` | POST | Upload a video → returns aggregated + per-frame results |
| `/ws/webcam` | WebSocket | Send JPEG bytes → receive JSON predictions in real time |
Example request (image):
```bash
curl -X POST http://localhost:8000/predict/image \
-F "file=@face.jpg"
```
Example response:
```json
{
"label": "Fake",
"confidence": 0.9312,
"prob_real": 0.0688
}
```
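The example response is consistent with `confidence` being the probability of the predicted class: `1 - 0.0688 = 0.9312`. Under that assumption (plus a 0.5 threshold and 4-decimal rounding, neither of which is documented), the response body could be derived from P(Real) as below; `make_prediction` is a hypothetical helper, not the API's actual code.

```python
import json


def make_prediction(prob_real, threshold=0.5):
    """Turn P(Real) into the {label, confidence, prob_real} response shape.

    Assumption: confidence = probability of the predicted class,
    i.e. max(P(Real), P(Fake)), rounded to 4 decimals.
    """
    if not 0.0 <= prob_real <= 1.0:
        raise ValueError("prob_real must be in [0, 1]")
    label = "Real" if prob_real >= threshold else "Fake"
    confidence = prob_real if label == "Real" else 1.0 - prob_real
    return {
        "label": label,
        "confidence": round(confidence, 4),
        "prob_real": round(prob_real, 4),
    }


if __name__ == "__main__":
    print(json.dumps(make_prediction(0.0688)))
    # {"label": "Fake", "confidence": 0.9312, "prob_real": 0.0688}
```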
---
### Step 5: CLI Real-Time Inference (Webcam or Video)
```bash
# Webcam
python -m inference.realtime_inference
# Video file
python -m inference.realtime_inference --video path/to/video.mp4
# With Grad-CAM overlay
python -m inference.realtime_inference --video path/to/video.mp4 --gradcam
```
Press **Q** to quit. Inference runs every 3rd frame for smooth display.
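Both the Streamlit webcam tab and this CLI run the model on every 3rd frame. A minimal sketch of that frame-skipping pattern, in which the frames in between reuse the latest prediction, is below; `infer` stands in for whatever model call the project makes, and the generator interface is an assumption.

```python
def run_with_frame_skipping(frames, infer, every_n=3):
    """Yield (frame, result) for every frame, calling `infer` only on every n-th.

    In-between frames reuse the most recent result, so the display keeps
    up with the camera while the model runs at ~1/every_n of the frame rate.
    """
    last = None
    for i, frame in enumerate(frames):
        if i % every_n == 0:
            last = infer(frame)
        yield frame, last


if __name__ == "__main__":
    calls = []

    def fake_model(frame):  # hypothetical stand-in for the real model call
        calls.append(frame)
        return "Fake" if frame % 2 else "Real"

    labels = [label for _, label in run_with_frame_skipping(range(7), fake_model)]
    print(calls)   # [0, 3, 6]
    print(labels)  # ['Real', 'Real', 'Real', 'Fake', 'Fake', 'Fake', 'Real']
```

In a real loop, `frames` would come from an OpenCV capture and `infer` would run face detection plus the model; reusing the last result also keeps the on-screen label from flickering between inference frames.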
---
### Quick Reference
```bash
# --- Environment ---------------------------------------
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# --- Train ---------------------------------------------
python -m training.train
# --- Evaluate ------------------------------------------
python -m training.evaluate
# --- Streamlit UI --------------------------------------
streamlit run app.py
# --- FastAPI backend -----------------------------------
uvicorn api.main:app --reload --port 8000
# --- CLI webcam / video --------------------------------
python -m inference.realtime_inference [--video ] [--gradcam]
```
---
## Model Architecture Details
### DeepfakeCNN
```
EfficientNet-B0 (pretrained on ImageNet)
├── features[0..8]   (MBConv blocks)
└── classifier
    ├── Dropout(0.4)
    └── Linear(1280 → 1)      # Default mode

Optional: use_frequency=True
  EfficientNet features (1280-dim)
  + FrequencyBranch (128-dim)
  → Linear(1408 → 256) → ReLU → Dropout(0.4) → Linear(256 → 1)
```
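The fused head's input width and size follow directly from the dimensions above. Assuming every `Linear` layer has a bias (PyTorch's default) and noting that `ReLU` and `Dropout` have no parameters, the head sizes work out as follows (FrequencyBranch's own convolutional parameters are not counted):

```math
\begin{aligned}
d_{\text{fused}} &= 1280 + 128 = 1408 \\
\#\text{params}\big(\text{Linear}(1408 \to 256)\big) &= 1408 \cdot 256 + 256 = 360{,}704 \\
\#\text{params}\big(\text{Linear}(256 \to 1)\big) &= 256 \cdot 1 + 1 = 257 \\
\#\text{params}\big(\text{fused head}\big) &= 360{,}704 + 257 = 360{,}961 \\
\#\text{params}\big(\text{Linear}(1280 \to 1),\ \text{default head}\big) &= 1280 + 1 = 1{,}281
\end{aligned}
```

So enabling the frequency branch grows the trainable head from about 1.3K to about 361K parameters, which matters most in Phase 1, when the head is the only part that trains.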
### FrequencyBranch
Detects spectral artefacts characteristic of GAN-generated images:
1. `torch.fft.fft2`: 2D Fast Fourier Transform
2. `fftshift`: moves the zero-frequency component to the centre, so conv filters see a spatially consistent spectrum layout
3. `log1p`: compresses the extreme dynamic range of FFT magnitudes
4. Two Conv2D + BatchNorm + MaxPool blocks
5. Fully connected layer → 128-dim feature vector
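Steps 1-3 can be traced on a tiny single-channel grid. This sketch uses a naive pure-Python 2D DFT, which produces the same values as `torch.fft.fft2` but is far slower, so it is for illustration only; the helper names are hypothetical and the conv/FC layers (steps 4-5) are omitted.

```python
import cmath
import math


def dft2(img):
    """Naive 2D DFT of an H x W grid (same values as an FFT, O((H*W)^2) time)."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def fftshift(grid):
    """Roll both axes by n // 2 so the zero-frequency term moves to the centre."""
    h, w = len(grid), len(grid[0])
    return [[grid[(i - h // 2) % h][(j - w // 2) % w] for j in range(w)]
            for i in range(h)]


def log_spectrum(img):
    """Steps 1-3: DFT, shift, then log1p of the magnitude."""
    return [[math.log1p(abs(c)) for c in row] for row in fftshift(dft2(img))]


if __name__ == "__main__":
    flat = [[1.0] * 4 for _ in range(4)]  # constant 4x4 "image"
    for row in log_spectrum(flat):
        print(" ".join(f"{v:5.2f}" for v in row))
```

For a constant image all energy sits in the DC term (value 16), which `fftshift` moves to position (2, 2), giving `log1p(16) ≈ 2.83` there and roughly zero elsewhere. Periodic upsampling artefacts would instead tend to show up as regularly spaced off-centre peaks, which is the kind of pattern the conv layers in steps 4-5 can learn to pick up.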
### Two-Phase Training
| Phase | Epochs | LR | Backbone |
|---|---|---|---|
| Phase 1 (warm-up) | 1-5 | 1e-4 | Fully frozen |
| Phase 2 (fine-tune) | 6+ | 1e-5 | Last 2 blocks unfrozen |
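One way to express this schedule is a per-parameter rule applied over `model.named_parameters()`. The sketch assumes torchvision's `efficientnet_b0` naming (`features.<idx>...` for the backbone, `classifier...` for the head) and reads "last 2 blocks" as `features[7]` (the last MBConv stage) and `features[8]` (the final 1×1 conv); the project may wrap the backbone under a different prefix or unfreeze different indices.

```python
def trainable(param_name, epoch, warmup_epochs=5, unfrozen_blocks=(7, 8)):
    """Decide whether an EfficientNet-B0 parameter trains at a given epoch.

    Assumptions: torchvision-style names ("features.<idx>...", "classifier...")
    and "last 2 blocks" meaning features[7] and features[8].
    """
    if param_name.startswith("classifier."):
        return True  # the head trains in both phases
    if epoch <= warmup_epochs:
        return False  # Phase 1: backbone fully frozen
    parts = param_name.split(".")
    return parts[0] == "features" and int(parts[1]) in unfrozen_blocks


def learning_rate(epoch, warmup_epochs=5):
    """Phase 1 uses lr=1e-4; Phase 2 uses lr=1e-5."""
    return 1e-4 if epoch <= warmup_epochs else 1e-5


# In a PyTorch training loop this would be applied roughly as:
#   for name, p in model.named_parameters():
#       p.requires_grad = trainable(name, epoch)
if __name__ == "__main__":
    for name in ("features.3.0.block.0.0.weight", "features.7.0.block.0.0.weight",
                 "classifier.1.weight"):
        print(name, trainable(name, epoch=1), trainable(name, epoch=6))
```

At the phase switch the learning rate changes too, so a natural choice is to recreate the optimizer over the parameters that now require gradients.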
---
## Evaluation Metrics
| Metric | Description |
|---|---|
| Accuracy | Overall correct classifications |
| Precision | Of predicted fakes, how many were actually fake |
| Recall | Of actual fakes, how many were caught |
| F1 Score | Harmonic mean of precision and recall |
| Confusion Matrix | True/False Positive/Negative breakdown |
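These definitions treat Fake as the positive class. The plain-Python version below makes them concrete; it is an illustration only (the tech stack lists scikit-learn, whose `precision_recall_fscore_support` and `confusion_matrix` compute the same quantities), and the `binary_metrics` name and dict layout are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="Fake"):
    """Accuracy, precision, recall, F1 and confusion counts for string labels.

    `positive` is the class counted as a detection; "Fake" matches the
    precision/recall definitions in the table above.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
    }


if __name__ == "__main__":
    truth = ["Fake", "Fake", "Fake", "Real", "Real"]
    preds = ["Fake", "Fake", "Real", "Fake", "Real"]
    print(binary_metrics(truth, preds))
```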
---
## Applications
- Social media content verification
- News authenticity validation
- Digital identity protection
- Cybercrime and fraud detection
- Media forensics and journalism
---
## Future Enhancements
- Temporal modeling with 3D CNN or Vision Transformer across video frames
- Audio-visual consistency check (voice + face sync)
- Browser extension for in-page detection
- Mobile deployment (CoreML / TFLite)
- Confidence calibration and uncertainty estimation
---
## License
This project is released under the MIT License.