https://github.com/addytrunks/ivp_ntire_2026
Robust AI-Generated image Detection in the Wild
- Host: GitHub
- URL: https://github.com/addytrunks/ivp_ntire_2026
- Owner: addytrunks
- Created: 2026-04-12T08:15:11.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2026-04-12T08:28:57.000Z (3 months ago)
- Last Synced: 2026-04-12T10:19:07.413Z (3 months ago)
- Language: Jupyter Notebook
- Size: 17.7 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Robust AI-Generated Image Detection in the Wild
### NTIRE 2026 Challenge — Team *Three Guys*
> **75th place** in the final testing phase (94 active participants)
> **100th place** in the validation phase (193 participants)
---
## Overview
This repository contains our submission for the [NTIRE 2026: Robust AI-Generated Image Detection in the Wild](https://cvlai.net/ntire/2026/) challenge. The task is to distinguish AI-generated images from real photographs under heavy "in-the-wild" degradations such as JPEG compression, blur, noise, and resolution downscaling.
Our approach fuses **spatial RGB features** with **frequency-domain (FFT) features** through a modified **EfficientNet-B5** backbone that accepts a 4-channel input tensor `(3×RGB + 1×FFT)`.
---
## Method
### Architecture
- **Backbone:** EfficientNet-B5 with a modified 4-channel input stem
- **Frequency Branch:** On-the-fly GPU-accelerated 2D FFT — grayscale luminosity map → FFT shift → log-magnitude spectrum → per-image min-max normalization → single additional channel
- **Stem Initialization:** First 3 channels copied from ImageNet pretrained weights; 4th (FFT) channel initialized as the mean of the RGB weights, preserving Squeeze-and-Excitation attention
```
Input Image (456×456×3)
        │
        ├──── RGB Branch (3ch) ──────────────────────┐
        │                                            │
        └──── FFT Branch: log|FFTshift(G)| (1ch) ────┤
                                                     ▼
                                       4-Channel Tensor (456×456×4)
                                                     │
                                       Modified EfficientNet-B5 Stem
                                                     │
                                   SE Blocks (spatial + freq attention)
                                                     │
                                        Binary Classification Head
                                                     │
                                              P(AI-Generated)
```
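
The frequency channel and the stem initialization described above can be sketched roughly as follows. This is a CPU-only NumPy stand-in, not the repository's GPU code: the function names, the BT.601 grayscale weights, the `eps` guard, and the `(48, 3, 3, 3)` stem shape are all assumptions made for illustration, and random arrays stand in for a real image and for the ImageNet-pretrained weights.

```python
import numpy as np


def fft_channel(rgb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Return an (H, W) log-magnitude spectrum in [0, 1] for an (H, W, 3) image."""
    # Grayscale luminosity map (BT.601 weights are an assumption, not from the README).
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # 2D FFT, then shift the zero-frequency component to the centre.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # Log-magnitude; log1p stays finite where the magnitude is zero.
    log_mag = np.log1p(np.abs(spectrum))
    # Per-image min-max normalization.
    lo, hi = log_mag.min(), log_mag.max()
    return (log_mag - lo) / (hi - lo + eps)


def to_four_channels(rgb: np.ndarray) -> np.ndarray:
    """Stack the FFT channel behind RGB: (H, W, 3) -> (H, W, 4)."""
    return np.concatenate([rgb, fft_channel(rgb)[..., None]], axis=-1)


def expand_stem_weights(w_rgb: np.ndarray) -> np.ndarray:
    """(out, 3, kH, kW) -> (out, 4, kH, kW); the 4th kernel is the mean of the RGB kernels."""
    w_fft = w_rgb.mean(axis=1, keepdims=True)
    return np.concatenate([w_rgb, w_fft], axis=1)


rng = np.random.default_rng(0)
image = rng.random((456, 456, 3), dtype=np.float32)    # placeholder image in [0, 1)
x = to_four_channels(image)
w4 = expand_stem_weights(rng.standard_normal((48, 3, 3, 3)))  # placeholder "pretrained" stem
print(x.shape, w4.shape)  # (456, 456, 4) (48, 4, 3, 3)
```

Averaging the RGB kernels gives the new channel a starting response on the same scale as the pretrained ones, so the first convolution's output statistics do not shift sharply when fine-tuning begins.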
### Augmentation Strategy
With **60% probability**, a stochastic distortion pipeline applies **1–3 random transforms** to an image; otherwise the image is left clean. Severity is sampled from a Gaussian centered on mild levels. The 7 distortion groups are:
| Group | Transforms |
|---|---|
| Blur | Gaussian, Lens |
| Noise | White, Impulse |
| Compression | JPEG (quality 4–43) |
| Color | Jitter, Saturation |
| Brightness | Gamma, Exposure |
| Spatial | Jitter, Crop |
| Tonal | Quantization |
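
The sampling logic behind the table can be sketched as below. This only draws a plan of which transforms to apply; it does not implement the image operations. The transform names, the 1 to 5 severity scale, the `sigma` value, the folded-Gaussian reading of "centered at mild levels", and the choice of at most one transform per group are all assumptions for illustration.

```python
import random

# Hypothetical transform names, one list per distortion group in the table.
DISTORTION_GROUPS = {
    "blur": ["gaussian_blur", "lens_blur"],
    "noise": ["white_noise", "impulse_noise"],
    "compression": ["jpeg"],
    "color": ["color_jitter", "saturation"],
    "brightness": ["gamma", "exposure"],
    "spatial": ["spatial_jitter", "crop"],
    "tonal": ["quantization"],
}


def sample_distortions(rng: random.Random, p: float = 0.6,
                       max_severity: int = 5, sigma: float = 1.5):
    """Return a list of (transform, severity) pairs; an empty list means 'leave clean'."""
    if rng.random() >= p:
        return []
    k = rng.randint(1, 3)  # 1 to 3 transforms, inclusive
    groups = rng.sample(sorted(DISTORTION_GROUPS), k)  # distinct groups (assumption)
    plan = []
    for group in groups:
        name = rng.choice(DISTORTION_GROUPS[group])
        # Folded Gaussian: level 1 (mildest) is most likely, higher levels are rarer.
        severity = min(max_severity, 1 + int(abs(rng.gauss(0.0, sigma))))
        plan.append((name, severity))
    return plan


plan = sample_distortions(random.Random(0))
```

Passing an explicit `random.Random` instance keeps the plan reproducible per worker, which makes augmented runs easier to compare.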
---
## Results
### Ablation Study (Validation Set)
| Model Variant | Clean AUC | Robust AUC |
|---|---|---|
| EfficientNet-B0 | 0.9290 | 0.8402 |
| EfficientNet-B4 | 0.9410 | 0.9212 |
| Swin Transformer | 0.9709 | 0.8965 |
| EffNet-B5 (PID + FFT + Contrastive Loss) | 0.9506 | 0.8701 |
| **EffNet-B5 (RGB + FFT) — Final** | **0.9768** | **0.9341** |
### Leaderboard
| Phase | Clean AUC | Robust AUC | Clean Hard AUC | Robust Hard AUC | Rank |
|---|---|---|---|---|---|
| Validation | 0.9768 | 0.9341 | 0.9220 | 0.8324 | 100 / 193 |
| Testing | 0.8084 | 0.7099 | — | — | 75 / 94 |
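
For readers unfamiliar with the metric, the sketch below computes ROC AUC from its pairwise definition, assuming the leaderboard columns are standard ROC AUC over per-image scores (the README does not spell out the challenge's exact evaluation code). The `roc_auc` helper and the toy labels and scores are made up for illustration.

```python
def roc_auc(labels, scores):
    """P(score of a random positive > score of a random negative); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = AI-generated, 0 = real.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.65]
auc = roc_auc(labels, scores)  # 7 of 9 positive/negative pairs are ranked correctly
```

The double loop is quadratic and only meant to make the definition concrete; a rank-based implementation is the practical choice for thousands of images.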
---
## Training Configuration
| Parameter | Value |
|---|---|
| Backbone | EfficientNet-B5 (4-channel) |
| Input Resolution | 456 × 456 |
| Batch Size | 4 (× 8 gradient accumulation = 32 effective) |
| Epochs | 4 |
| Optimizer | AdamW |
| Learning Rate | 1×10⁻⁴ |
| Weight Decay | 1×10⁻² |
| Scheduler | OneCycleLR |
| Loss | BCEWithLogitsLoss |
| Distortion Probability | 0.6 |
| Hardware | Tesla P100 (16 GB VRAM) |
| Precision | AMP (mixed precision) |
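
The batch and schedule arithmetic implied by this table can be worked through as follows. PyTorch's `OneCycleLR` needs the total number of scheduler steps up front (via `total_steps`, or `epochs` with `steps_per_epoch`), so if the scheduler is stepped once per optimizer update, that count should be in optimizer steps rather than micro-batches. The 277,000 image count is a rounded stand-in for "~277K", and the sketch assumes no dropped last batch and that a partial final accumulation window still triggers an update; how the repository actually steps its scheduler is not stated.

```python
import math


def optimizer_steps(num_images: int, micro_batch: int, accum_steps: int, epochs: int) -> int:
    """Count optimizer updates over a whole run with gradient accumulation."""
    micro_batches_per_epoch = math.ceil(num_images / micro_batch)
    steps_per_epoch = math.ceil(micro_batches_per_epoch / accum_steps)
    return steps_per_epoch * epochs


effective_batch = 4 * 8  # micro-batch size x accumulation steps
total_steps = optimizer_steps(num_images=277_000, micro_batch=4, accum_steps=8, epochs=4)
print(effective_batch, total_steps)  # 32 34628
```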
---
## Setup
```bash
git clone https://github.com/addytrunks/IVP_NTIRE_2026.git
cd IVP_NTIRE_2026
pip install -r requirements.txt
```
### Training
```bash
python train.py --data_dir /path/to/dataset --epochs 4 --lr 1e-4
```
### Inference
```bash
python inference.py --checkpoint /path/to/best_checkpoint.pth --input_dir /path/to/images --output submission.csv
```
---
## Dataset
The NTIRE 2026 dataset contains ~277K training images, a mix of real photographs and AI-generated images produced by 20 different models. Access requires registration through the official challenge page.
| Split | Images | Real:Fake | Distortions |
|---|---|---|---|
| Train | ~277K | ~1:1.77 | 12 |
| Validation | 10K | 1:1 | 19 |
| Validation Hard | 2.5K | 1:1 | 19 |
| Test (Public) | 2.5K | 1:1 | 22 |
| Test (Private) | 2.5K | ~1:1 | 24 |
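
As a quick worked example of the training-split imbalance, the ratio in the table translates into approximate class counts as shown below. The `split_counts` helper is hypothetical, and 277,000 is a rounded stand-in for "~277K", so the results are estimates rather than official figures.

```python
def split_counts(total: int, real_part: float, fake_part: float):
    """Split a total into (real, fake) counts given a real:fake ratio."""
    real = round(total * real_part / (real_part + fake_part))
    return real, total - real


real, fake = split_counts(277_000, 1.0, 1.77)
fake_fraction = fake / (real + fake)
print(real, fake, round(fake_fraction, 3))  # 100000 177000 0.639
```

Roughly 64% of the training images are therefore AI-generated, while the validation and test splits are balanced.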
---
## References
- Tan & Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," *ICML 2019*
- Ross Wightman, [PyTorch Image Models (timm)](https://github.com/huggingface/pytorch-image-models)
- Buslaev et al., "Albumentations: Fast and Flexible Image Augmentations," *Information 2020*
---
## Team
**Three Guys** — Department of Artificial Intelligence and Data Science, Shiv Nadar University Chennai
| Name | Role |
|---|---|
| Sharan K | Team Lead |
| Adhithya Srivatsan | Member |
| Ankith V | Member |
CodaBench username: `sharank`