https://github.com/addytrunks/ivp_ntire_2026

# Robust AI-Generated Image Detection in the Wild
### NTIRE 2026 Challenge — Team *Three Guys*

> **75th place** in the final testing phase (94 active participants)
> **100th place** in the validation phase (193 participants)

---

## Overview

This repository contains our submission for the [NTIRE 2026: Robust AI-Generated Image Detection in the Wild](https://cvlai.net/ntire/2026/) challenge. The task is to distinguish AI-generated images from real photographs under heavy "in-the-wild" degradations such as JPEG compression, blur, noise, and resolution downscaling.

Our approach fuses **spatial RGB features** with **frequency-domain (FFT) features** through a modified **EfficientNet-B5** backbone that accepts a 4-channel input tensor `(3×RGB + 1×FFT)`.

---

## Method

### Architecture

- **Backbone:** EfficientNet-B5 with a modified 4-channel input stem
- **Frequency Branch:** On-the-fly GPU-accelerated 2D FFT — grayscale luminosity map → FFT shift → log-magnitude spectrum → per-image min-max normalization → single additional channel
- **Stem Initialization:** First 3 channels copied from ImageNet pretrained weights; 4th (FFT) channel initialized as the mean of the RGB weights, preserving Squeeze-and-Excitation attention
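
The frequency branch described above can be sketched in a few lines. This is a minimal CPU illustration in NumPy, not the repository's GPU code: the function name `fft_channel`, the Rec. 601 luminance weights, the use of `log1p` for the log-magnitude, and the `eps` guard are all assumptions made for the example.

```python
import numpy as np


def fft_channel(rgb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Return an (H, W, 4) array: the RGB input plus a normalized log-magnitude spectrum.

    `rgb` is a float array of shape (H, W, 3) with values in [0, 1].
    """
    # Grayscale luminance map (Rec. 601 weights are an assumption).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # 2D FFT, then shift the zero-frequency (DC) term to the centre of the map.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # Log-magnitude compresses the large dynamic range of the spectrum.
    log_mag = np.log1p(np.abs(spectrum))
    # Per-image min-max normalization to [0, 1].
    lo, hi = log_mag.min(), log_mag.max()
    fft_map = (log_mag - lo) / (hi - lo + eps)
    # Stack the spectrum as a fourth channel next to RGB.
    return np.concatenate([rgb, fft_map[..., None]], axis=-1)
```

Because the normalization is per image, the extra channel stays in the same numeric range as the RGB channels regardless of image content.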

```
Input Image (456×456×3)
│
├──── RGB Branch (3ch) ──────────────────────┐
│                                            │
└──── FFT Branch: log|FFTshift(G)| (1ch) ────┤
                                             │
                                             ▼
                               4-Channel Tensor (456×456×4)
                                             │
                                             ▼
                               Modified EfficientNet-B5 Stem
                                             │
                                             ▼
                           SE Blocks (spatial + freq attention)
                                             │
                                             ▼
                                Binary Classification Head
                                             │
                                             ▼
                                      P(AI-Generated)
```
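
The stem initialization can be illustrated on a plain weight array. The sketch below uses NumPy and a hypothetical helper `expand_stem_weights`; in practice the same operation would be applied to the pretrained convolution weights of the backbone, and the array shape used in the usage lines is illustrative only.

```python
import numpy as np


def expand_stem_weights(w_rgb: np.ndarray) -> np.ndarray:
    """Expand conv weights from (out, 3, kH, kW) to (out, 4, kH, kW)."""
    if w_rgb.ndim != 4 or w_rgb.shape[1] != 3:
        raise ValueError("expected weights of shape (out, 3, kH, kW)")
    # The 4th (FFT) input channel starts as the mean of the three RGB filters.
    w_fft = w_rgb.mean(axis=1, keepdims=True)
    # The first 3 input channels keep their pretrained values unchanged.
    return np.concatenate([w_rgb, w_fft], axis=1)


# Random stand-in for pretrained stem weights (shape chosen for illustration).
pretrained = np.random.default_rng(0).standard_normal((48, 3, 3, 3))
stem_4ch = expand_stem_weights(pretrained)
```

Starting the new channel from the RGB mean, rather than from zeros or random values, keeps the scale of its filters in line with the pretrained ones.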

### Augmentation Strategy

A stochastic distortion pipeline applies **1–3 random transforms per image at 60% probability**, with severity sampled from a Gaussian centered at mild levels. The 7 distortion groups are:

| Group | Transforms |
|---|---|
| Blur | Gaussian, Lens |
| Noise | White, Impulse |
| Compression | JPEG (quality 4–43) |
| Color | Jitter, Saturation |
| Brightness | Gamma, Exposure |
| Spatial | Jitter, Crop |
| Tonal | Quantization |
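
A sampling step like the one described above might look as follows. This is a sketch under stated assumptions, not the repository's pipeline: the 0.6 probability is treated as a per-image gate, the 1–3 transforms are drawn from distinct groups, and severity is an integer level from 1 to 5 drawn from a Gaussian with mean 1. The transform names and the function `sample_distortions` are hypothetical.

```python
import random

# The 7 distortion groups from the table; transform names are placeholders.
DISTORTION_GROUPS = {
    "blur": ["gaussian_blur", "lens_blur"],
    "noise": ["white_noise", "impulse_noise"],
    "compression": ["jpeg"],
    "color": ["color_jitter", "saturation"],
    "brightness": ["gamma", "exposure"],
    "spatial": ["spatial_jitter", "crop"],
    "tonal": ["quantization"],
}


def sample_distortions(rng, p=0.6, max_level=5, mean=1.0, std=1.0):
    """Return a list of (transform_name, severity) pairs, possibly empty."""
    if rng.random() >= p:
        return []  # image is left undistorted with probability 1 - p
    n = rng.randint(1, 3)  # 1 to 3 transforms, inclusive
    # Draw from distinct groups so one image does not get two blurs (assumption).
    groups = rng.sample(sorted(DISTORTION_GROUPS), n)
    plan = []
    for group in groups:
        name = rng.choice(DISTORTION_GROUPS[group])
        # Gaussian centred at a mild level, clipped to the valid range.
        level = min(max(round(rng.gauss(mean, std)), 1), max_level)
        plan.append((name, level))
    return plan
```

Calling `sample_distortions(random.Random(0))` returns a plan for one image; applying the named transforms is left to the augmentation library in use.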

---

## Results

### Ablation Study (Validation Set)

| Model Variant | Clean AUC | Robust AUC |
|---|---|---|
| EfficientNet-B0 | 0.9290 | 0.8402 |
| EfficientNet-B4 | 0.9410 | 0.9212 |
| Swin Transformer | 0.9709 | 0.8965 |
| EffNet-B5 (PID + FFT + Contrastive Loss) | 0.9506 | 0.8701 |
| **EffNet-B5 (RGB + FFT) — Final** | **0.9768** | **0.9341** |

### Leaderboard

| Phase | Clean AUC | Robust AUC | Clean Hard AUC | Robust Hard AUC | Rank |
|---|---|---|---|---|---|
| Validation | 0.9768 | 0.9341 | 0.9220 | 0.8324 | 100 / 193 |
| Testing | 0.8084 | 0.7099 | — | — | 75 / 94 |
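
For readers unfamiliar with the metric in these tables, ROC AUC can be read as the probability that a randomly chosen AI-generated image receives a higher score than a randomly chosen real one. The sketch below computes it by direct pairwise comparison; it is an illustration of the standard definition, not the challenge's evaluation script, and the function name `roc_auc` is an assumption.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score_pos > score_neg), counting ties as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of samples, so it suits small checks; a rank-based implementation is preferable for full validation sets.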

---

## Training Configuration

| Parameter | Value |
|---|---|
| Backbone | EfficientNet-B5 (4-channel) |
| Input Resolution | 456 × 456 |
| Batch Size | 4 (× 8 gradient accumulation = 32 effective) |
| Epochs | 4 |
| Optimizer | AdamW |
| Learning Rate | 1×10⁻⁴ |
| Weight Decay | 1×10⁻² |
| Scheduler | OneCycleLR |
| Loss | BCEWithLogitsLoss |
| Distortion Probability | 0.6 |
| Hardware | Tesla P100 (16 GB VRAM) |
| Precision | AMP (mixed precision) |
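
The batch size, accumulation, and epoch settings together fix how many optimizer steps a run takes, which is the number a scheduler such as `OneCycleLR` needs as `total_steps` when it is stepped once per optimizer update. The arithmetic below is a sketch: the helper `schedule_steps` is hypothetical, and it assumes the full training split is used and that a trailing partial accumulation window still triggers an update.

```python
import math


def schedule_steps(num_images, micro_batch=4, accum_steps=8, epochs=4):
    """Derive step counts from the training configuration."""
    # Forward/backward passes per epoch, one per micro-batch.
    micro_batches = math.ceil(num_images / micro_batch)
    # One optimizer update every `accum_steps` micro-batches.
    optimizer_steps = math.ceil(micro_batches / accum_steps)
    return {
        "effective_batch": micro_batch * accum_steps,
        "micro_batches_per_epoch": micro_batches,
        "optimizer_steps_per_epoch": optimizer_steps,
        "total_optimizer_steps": optimizer_steps * epochs,
    }


# Approximate training-set size taken from the Dataset section.
steps = schedule_steps(277_000)
```

With roughly 277K images this comes to about 34.6K optimizer updates over 4 epochs, each covering an effective batch of 32.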

---

## Setup

```bash
git clone https://github.com/addytrunks/IVP_NTIRE_2026.git
cd IVP_NTIRE_2026
pip install -r requirements.txt
```

### Training

```bash
python train.py --data_dir /path/to/dataset --epochs 4 --lr 1e-4
```

### Inference

```bash
python inference.py --checkpoint /path/to/best_checkpoint.pth --input_dir /path/to/images --output submission.csv
```

---

## Dataset

The NTIRE 2026 dataset contains ~277K training images, a mix of real photographs and AI-generated images, with the generated images coming from 20 different models. Access requires registration through the official challenge page.

| Split | Images | Real:Fake | Distortions |
|---|---|---|---|
| Train | ~277K | ~1:1.77 | 12 |
| Validation | 10K | 1:1 | 19 |
| Validation Hard | 2.5K | 1:1 | 19 |
| Test (Public) | 2.5K | 1:1 | 22 |
| Test (Private) | 2.5K | ~1:1 | 24 |
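
The training split is imbalanced, so it can help to turn the stated ratio into approximate class counts. The sketch below does that arithmetic and also derives the value one would pass as `pos_weight` to `BCEWithLogitsLoss` if the loss were reweighted. The README does not say that reweighting was used; it is shown only as an option, and the helper `split_by_ratio` is hypothetical.

```python
def split_by_ratio(total, real_parts=1.0, fake_parts=1.77):
    """Split `total` images into (real, fake) counts for a real:fake ratio."""
    real = round(total * real_parts / (real_parts + fake_parts))
    return real, total - real


# Approximate counts for the ~277K training images at ~1:1.77.
real_count, fake_count = split_by_ratio(277_000)

# With "AI-generated" as the positive class, a balancing weight is
# negatives / positives (optional; not stated as used in this project).
pos_weight = real_count / fake_count
```

The validation and public test splits are balanced at 1:1, so any reweighting would apply to training only.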

---

## References

- Tan & Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," *ICML 2019*
- Ross Wightman, [PyTorch Image Models (timm)](https://github.com/huggingface/pytorch-image-models)
- Buslaev et al., "Albumentations: Fast and Flexible Image Augmentations," *Information 2020*

---

## Team

**Three Guys** — Department of Artificial Intelligence and Data Science, Shiv Nadar University Chennai

| Name | Role |
|---|---|
| Sharan K | Team Lead |
| Adhithya Srivatsan | Member |
| Ankith V | Member |

CodaBench username: `sharank`