https://github.com/addytrunks/ivp_ntire_2026

Robust AI-Generated image Detection in the Wild

Host: GitHub
URL: https://github.com/addytrunks/ivp_ntire_2026
Owner: addytrunks
Created: 2026-04-12T08:15:11.000Z (3 months ago)
Default Branch: main
Last Pushed: 2026-04-12T08:28:57.000Z (3 months ago)
Last Synced: 2026-04-12T10:19:07.413Z (3 months ago)
Language: Jupyter Notebook
Size: 17.7 MB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Robust AI-Generated Image Detection in the Wild

### NTIRE 2026 Challenge — Team *Three Guys*

> **75th place** in the final testing phase (94 active participants)  

> **100th place** in the validation phase (193 participants)

---

## Overview

This repository contains our submission for the [NTIRE 2026: Robust AI-Generated Image Detection in the Wild](https://cvlai.net/ntire/2026/) challenge. The task is to distinguish AI-generated images from real photographs under heavy "in-the-wild" degradations such as JPEG compression, blur, noise, and resolution downscaling.

Our approach fuses **spatial RGB features** with **frequency-domain (FFT) features** through a modified **EfficientNet-B5** backbone that accepts a 4-channel input tensor `(3×RGB + 1×FFT)`.

---

## Method

### Architecture

- **Backbone:** EfficientNet-B5 with a modified 4-channel input stem

- **Frequency Branch:** On-the-fly GPU-accelerated 2D FFT — grayscale luminosity map → FFT shift → log-magnitude spectrum → per-image min-max normalization → single additional channel

- **Stem Initialization:** The first 3 input channels are copied from the ImageNet-pretrained stem weights; the 4th (FFT) channel is initialized as the mean of the RGB weights. Only the stem convolution changes, so the pretrained Squeeze-and-Excitation blocks downstream are left intact.
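The frequency branch can be sketched in a few lines. This is a minimal NumPy illustration rather than the repository's code: the function names, the BT.601 luminosity weights, the use of `log1p` for the log-magnitude, and the `eps` guard are all assumptions. The same steps should map onto the corresponding `torch.fft` calls for the GPU version.

```python
import numpy as np


def fft_channel(rgb: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """rgb: float array of shape (H, W, 3). Returns an (H, W) map in [0, 1]."""
    # Grayscale luminosity map (BT.601 weights are an assumption).
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # 2D FFT, then shift the zero-frequency component to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # Log-magnitude spectrum; log1p avoids log(0) (assumed variant).
    log_mag = np.log1p(np.abs(spectrum))
    # Per-image min-max normalization.
    lo, hi = log_mag.min(), log_mag.max()
    return (log_mag - lo) / (hi - lo + eps)


def to_four_channels(rgb: np.ndarray) -> np.ndarray:
    """Stack the FFT map behind the RGB channels: (H, W, 3) -> (H, W, 4)."""
    return np.concatenate([rgb, fft_channel(rgb)[..., None]], axis=-1)


rng = np.random.default_rng(0)
image = rng.random((456, 456, 3))  # stand-in for a real input image
x = to_four_channels(image)
print(x.shape)  # (456, 456, 4)
```

Because the normalization is per image, the extra channel always spans roughly [0, 1] regardless of the input's overall brightness.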

```
Input Image (456×456×3)
       │
       ├──── RGB Branch (3ch) ──────────────────────┐
       │                                             │
       └──── FFT Branch: log|FFTshift(G)| (1ch) ────┤
                                                     ▼
                                          4-Channel Tensor (456×456×4)
                                                     │
                                          Modified EfficientNet-B5 Stem
                                                     │
                                          SE Blocks (spatial + freq attention)
                                                     │
                                          Binary Classification Head
                                                     │
                                          P(AI-Generated)
```
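The stem initialization amounts to widening the first convolution's weight tensor from 3 to 4 input channels. The sketch below shows that operation on a plain NumPy array; `expand_stem_weights` is a hypothetical helper, the random array stands in for pretrained weights, and the `(48, 3, 3, 3)` shape is only illustrative.

```python
import numpy as np


def expand_stem_weights(w_rgb: np.ndarray) -> np.ndarray:
    """Widen conv weights from (out, 3, kH, kW) to (out, 4, kH, kW)."""
    assert w_rgb.shape[1] == 3, "expected a 3-channel (RGB) stem"
    # The new FFT channel starts as the mean of the three RGB filters.
    w_fft = w_rgb.mean(axis=1, keepdims=True)  # (out, 1, kH, kW)
    # RGB filters are kept unchanged in the first three input slots.
    return np.concatenate([w_rgb, w_fft], axis=1)


rng = np.random.default_rng(0)
pretrained = rng.standard_normal((48, 3, 3, 3))  # placeholder weights
stem = expand_stem_weights(pretrained)
print(stem.shape)  # (48, 4, 3, 3)
```

In a PyTorch model the same tensor operation would be applied to the stem convolution's `weight` before training; which attribute holds that layer depends on the model library.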

### Augmentation Strategy

A stochastic distortion pipeline distorts each image **with 60% probability, applying 1–3 random transforms** when it does. The severity of each transform is sampled from a Gaussian centered at mild levels. The 7 distortion groups are:

| Group | Transforms |
|---|---|
| Blur | Gaussian, Lens |
| Noise | White, Impulse |
| Compression | JPEG (quality 4–43) |
| Color | Jitter, Saturation |
| Brightness | Gamma, Exposure |
| Spatial | Jitter, Crop |
| Tonal | Quantization |
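The sampling logic of such a pipeline might look like the sketch below. It only plans which distortions to apply and does not touch pixels. Several details are assumptions not stated here: the transform identifiers, drawing at most one transform per group, the 1–5 severity scale, and the Gaussian's mean and standard deviation.

```python
import random

# Group -> transform names (hypothetical identifiers for the table above).
GROUPS = {
    "blur": ["gaussian_blur", "lens_blur"],
    "noise": ["white_noise", "impulse_noise"],
    "compression": ["jpeg"],
    "color": ["color_jitter", "saturation"],
    "brightness": ["gamma", "exposure"],
    "spatial": ["spatial_jitter", "crop"],
    "tonal": ["quantization"],
}


def sample_distortions(rng, p=0.6, mean=2.0, std=1.0, max_level=5):
    """Return a list of (transform, severity) pairs; empty means no distortion."""
    if rng.random() >= p:
        return []  # image passes through clean
    k = rng.randint(1, 3)  # 1 to 3 transforms, inclusive
    plan = []
    for group in rng.sample(sorted(GROUPS), k):  # distinct groups (assumed)
        name = rng.choice(GROUPS[group])
        # Severity from a Gaussian centered on a mild level, clamped to range.
        level = min(max(round(rng.gauss(mean, std)), 1), max_level)
        plan.append((name, level))
    return plan


rng = random.Random(0)
plans = [sample_distortions(rng) for _ in range(10_000)]
distorted = sum(1 for plan in plans if plan) / len(plans)
print(f"fraction distorted: {distorted:.2f}")
```

Each planned pair would then be dispatched to the actual image transform, for example an Albumentations or custom implementation.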

---

## Results

### Ablation Study (Validation Set)

| Model Variant | Clean AUC | Robust AUC |
|---|---|---|
| EfficientNet-B0 | 0.9290 | 0.8402 |
| EfficientNet-B4 | 0.9410 | 0.9212 |
| Swin Transformer | 0.9709 | 0.8965 |
| EffNet-B5 (PID + FFT + Contrastive Loss) | 0.9506 | 0.8701 |
| **EffNet-B5 (RGB + FFT), Final** | **0.9768** | **0.9341** |

### Leaderboard

| Phase | Clean AUC | Robust AUC | Clean Hard AUC | Robust Hard AUC | Rank |
|---|---|---|---|---|---|
| Validation | 0.9768 | 0.9341 | 0.9220 | 0.8324 | 100 / 193 |
| Testing | 0.8084 | 0.7099 | n/a | n/a | 75 / 94 |
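For readers unfamiliar with the metric, the sketch below assumes that AUC here means the area under the ROC curve: the probability that a randomly chosen AI-generated image receives a higher score than a randomly chosen real one, with ties counting as half. The function and the toy scores are illustrative; the challenge's official scoring code is not reproduced.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: labels are 1 (AI-generated) or 0 (real)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # correctly ranked pair
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs are ranked correctly
```

The quadratic loop is fine for a toy example; a rank-based implementation such as scikit-learn's `roc_auc_score` is the practical choice for full validation sets.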

---

## Training Configuration

| Parameter | Value |
|---|---|
| Backbone | EfficientNet-B5 (4-channel) |
| Input Resolution | 456 × 456 |
| Batch Size | 4 (× 8 gradient accumulation = 32 effective) |
| Epochs | 4 |
| Optimizer | AdamW |
| Learning Rate | 1×10⁻⁴ |
| Weight Decay | 1×10⁻² |
| Scheduler | OneCycleLR |
| Loss | BCEWithLogitsLoss |
| Distortion Probability | 0.6 |
| Hardware | Tesla P100 (16 GB VRAM) |
| Precision | AMP (mixed precision) |
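The "4 × 8 = 32 effective" batch size relies on gradient accumulation: gradients from 8 micro-batches of 4 are averaged before a single optimizer step. The toy example below checks that equivalence on a one-parameter linear model with a mean-squared-error loss. The model and data are made up for illustration and are unrelated to the actual training script.

```python
import random


def grad_mse(w, batch):
    """d/dw of mean((w * x - y)^2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


MICRO_BATCH = 4
ACCUM_STEPS = 8  # 4 x 8 = 32 effective batch size

rng = random.Random(0)
data = [(rng.uniform(-1, 1), rng.uniform(-1, 1))
        for _ in range(MICRO_BATCH * ACCUM_STEPS)]
w = 0.5

# Gradient from one pass over the full batch of 32.
full_grad = grad_mse(w, data)

# Same gradient built from 8 micro-batches, each scaled by 1 / ACCUM_STEPS.
accum_grad = 0.0
for step in range(ACCUM_STEPS):
    micro = data[step * MICRO_BATCH:(step + 1) * MICRO_BATCH]
    accum_grad += grad_mse(w, micro) / ACCUM_STEPS
# A real loop would call the optimizer (and scheduler) once here.

print(abs(full_grad - accum_grad) < 1e-12)  # True
```

The equivalence is exact only for losses that average over samples and for layers without batch-dependent statistics; batch normalization, for instance, still sees micro-batches of 4.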

---

## Setup

```bash
git clone https://github.com/addytrunks/IVP_NTIRE_2026.git
cd IVP_NTIRE_2026
pip install -r requirements.txt
```

### Training

```bash
python train.py --data_dir /path/to/dataset --epochs 4 --lr 1e-4
```

### Inference

```bash
python inference.py --checkpoint /path/to/best_checkpoint.pth --input_dir /path/to/images --output submission.csv
```

---

## Dataset

The NTIRE 2026 training set contains ~277K images, with the AI-generated images produced by 20 different models. Access requires registration through the official challenge page.

| Split | Images | Real:Fake | Distortions |
|---|---|---|---|
| Train | ~277K | ~1:1.77 | 12 |
| Validation | 10K | 1:1 | 19 |
| Validation Hard | 2.5K | 1:1 | 19 |
| Test (Public) | 2.5K | 1:1 | 22 |
| Test (Private) | 2.5K | ~1:1 | 24 |
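As a quick worked example, the training ratio can be turned into approximate class counts. Both the total and the ratio are rounded figures from the table, so the result is only a rough estimate; `split_counts` is a hypothetical helper.

```python
def split_counts(total, real_parts, fake_parts):
    """Split a total into (real, fake) counts for a real:fake ratio."""
    real = round(total * real_parts / (real_parts + fake_parts))
    return real, total - real


real, fake = split_counts(277_000, 1.0, 1.77)
print(real, fake)  # 100000 177000
```

Roughly 64% of the training images are therefore AI-generated, whereas the validation and test splits are balanced.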

---

## References

- Tan & Le, "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks," *ICML 2019*

- Ross Wightman, [PyTorch Image Models (timm)](https://github.com/huggingface/pytorch-image-models)

- Buslaev et al., "Albumentations: Fast and Flexible Image Augmentations," *Information 2020*

---

## Team

**Three Guys** — Department of Artificial Intelligence and Data Science, Shiv Nadar University Chennai

| Name | Role |
|---|---|
| Sharan K | Team Lead |
| Adhithya Srivatsan | Member |
| Ankith V | Member |

CodaBench username: `sharank`
