https://github.com/plugyawn/no-imagegen

Recursive Denoising Neural Operators for superfast image-generation.

efficient image-generation image-processing neural neural-operator transformer

Host: GitHub
URL: https://github.com/plugyawn/no-imagegen
Owner: plugyawn
Created: 2025-06-28T03:44:52.000Z (12 months ago)
Default Branch: main
Last Pushed: 2025-06-28T04:00:06.000Z (12 months ago)
Last Synced: 2025-06-28T04:29:36.295Z (12 months ago)
Topics: efficient, image-generation, image-processing, neural, neural-operator, transformer
Language: Python
Homepage:
Size: 0 Bytes
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# Scale-Recursive Wavelet Neural Operator for Image Generation

## Overview

This repository contains an *autoregressive coarse-to-fine* generator that
predicts an image one scale at a time using a **weight-tied Wavelet Neural
Operator (W-NO)**. The generator starts from a $4\times4$ noise canvas and
repeatedly doubles the resolution (4 → 8 → 16 → 32 → 64) until a
$64\times64$ RGB image is reached.

## Motivation

Image generators often store the entire spatial activation stack during a
single forward pass (e.g. UNet diffusion), pushing memory beyond commodity
GPUs. A *recursive* formulation keeps only one scale in memory at a time,
$\mathcal{O}(H\,W)$ → $\mathcal{O}(H\,W/4)$ at 64×64, while re-using the same
operator at every level.

Fourier Neural Operators (FNOs) are compelling for global, translation-periodic
physics, but raw FFTs introduce ringing and checkerboard artifacts in natural
images. We therefore replace the Fourier transform with a **1-level Haar
Discrete Wavelet Transform (DWT)**, which preserves locality while still
offering multi-resolution coupling.

## Algorithmic Core

### Notation

* $x_r \in [0,1]^{3\times r\times r}$: the canvas at the current resolution $r$.
* $\epsilon_r \sim \mathcal{N}(0,\sigma_r^2)$: Gaussian noise injected at
  every level.
* $\mathcal{T}$: the weight-tied Wavelet Neural Operator.

At each scale we perform

$x_r^{\uparrow 2} = \text{NearestUpsample}(x_r)$

$\hat{x}_{2r} = \mathcal{T}\bigl(\;x_r^{\uparrow 2},\;\epsilon_r,\;\text{coords}\;\bigr)$

$x_{2r} = \mathrm{clip}(x_r^{\uparrow 2}+\hat{x}_{2r},\, 0,1).$
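The three update equations above can be sketched as a short loop. This is a minimal NumPy illustration, not the repository's code: the names `nearest_upsample`, `coord_grid`, `generate`, and `zero_operator` are hypothetical, the initial canvas is assumed to be uniform noise in $[0,1]$, the injected noise is assumed to have the shape of the upsampled canvas, and the trained operator is replaced by a stand-in that returns a zero residual.

```python
import numpy as np

def nearest_upsample(x):
    """Double the spatial size of a (C, H, W) array by repeating each pixel."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def coord_grid(r):
    """Normalised (2, r, r) coordinate channels in [0, 1] (assumed encoding)."""
    axis = np.linspace(0.0, 1.0, r)
    yy, xx = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([yy, xx])

def generate(operator, sigmas, start=4, target=64, seed=0):
    """Run the scale recursion from a start x start canvas up to target x target."""
    rng = np.random.default_rng(seed)
    # Assumption: the noise canvas is uniform in [0, 1].
    x = rng.uniform(0.0, 1.0, size=(3, start, start))
    r, level = start, 0
    while r < target:
        up = nearest_upsample(x)                          # upsampled canvas
        # Assumption: the noise matches the upsampled canvas shape.
        eps = rng.normal(0.0, sigmas[level], size=up.shape)
        residual = operator(up, eps, coord_grid(2 * r))   # same operator at every level
        x = np.clip(up + residual, 0.0, 1.0)              # residual update, then clip
        r, level = 2 * r, level + 1
    return x

def zero_operator(up, eps, coords):
    """Stand-in for the trained operator: predicts a zero residual."""
    return np.zeros_like(up)

image = generate(zero_operator, sigmas=[0.1, 0.1, 0.1, 0.1])
print(image.shape)
```

Going from 4×4 to 64×64 takes four doublings, so `sigmas` holds one noise scale per transition.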

### Wavelet Neural Operator

1. Forward **Haar DWT** splits the feature map into one low-pass block $L$ and
   three high-pass blocks $\{H^{(o)}\}_{o\in\{\text{LH},\text{HL},\text{HH}\}}$.
2. A $1\times1$ convolution is applied to $L$; another $1\times1$ convolution
   is applied jointly to the concatenated $H^{(o)}$.
3. Inverse DWT reconstructs the spatial residual at the same resolution.

This $\mathcal{T}$ is embedded in a residual stack and shared across all
coarse→fine transitions.
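To make the three steps concrete, here is a rough NumPy sketch of a single wavelet layer. It is an illustration under assumptions rather than the repository's implementation: the function names are hypothetical, the Haar transform uses the orthonormal 2×2 block form, the assignment of the LH/HL labels to the two mixed bands is a convention chosen here, and biases, nonlinearities, and the surrounding residual stack are omitted. A $1\times1$ convolution is written as a per-pixel channel-mixing matrix.

```python
import numpy as np

def haar_dwt(x):
    """1-level orthonormal Haar DWT of a (C, H, W) array with even H and W."""
    a, b = x[:, 0::2, 0::2], x[:, 0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = x[:, 1::2, 0::2], x[:, 1::2, 1::2]   # bottom-left, bottom-right
    low = (a + b + c + d) / 2.0
    lh = (a - b + c - d) / 2.0                  # band labels are a convention here
    hl = (a + b - c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return low, (lh, hl, hh)

def haar_idwt(low, highs):
    """Inverse of haar_dwt: rebuild the (C, 2h, 2w) array from four sub-bands."""
    lh, hl, hh = highs
    channels, h, w = low.shape
    x = np.empty((channels, 2 * h, 2 * w), dtype=low.dtype)
    x[:, 0::2, 0::2] = (low + lh + hl + hh) / 2.0
    x[:, 0::2, 1::2] = (low - lh + hl - hh) / 2.0
    x[:, 1::2, 0::2] = (low + lh - hl - hh) / 2.0
    x[:, 1::2, 1::2] = (low - lh - hl + hh) / 2.0
    return x

def wavelet_layer(x, w_low, w_high):
    """DWT, channel mixing per band group, inverse DWT.

    w_low:  (C, C) matrix acting on the low-pass block.
    w_high: (3C, 3C) matrix acting on the concatenated high-pass blocks.
    """
    low, highs = haar_dwt(x)
    low = np.einsum("oc,chw->ohw", w_low, low)        # 1x1 conv on L
    high = np.concatenate(highs, axis=0)              # stack LH, HL, HH on channels
    high = np.einsum("oc,chw->ohw", w_high, high)     # joint 1x1 conv on H bands
    return haar_idwt(low, np.split(high, 3, axis=0))

x = np.random.default_rng(0).normal(size=(4, 8, 8))
y = wavelet_layer(x, np.eye(4), np.eye(12))
print(y.shape, np.allclose(x, y))
```

With identity mixing matrices the layer reduces to a DWT followed by its inverse, which gives a quick sanity check that the output keeps the input resolution.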

## Training Objective

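For each mini-batch, the loss is a weighted sum over all four scales:

$\mathcal{L}=\sum_{\ell=0}^{3}\;w_{\ell}\Bigl[\;\|x_{gt}^{(\ell)}-x_{pred}^{(\ell)}\|_1 +$
$\lambda_{v}\,\|\phi(x_{gt}^{(\ell)})-\phi(x_{pred}^{(\ell)})\|_1 +$
$\lambda_{s}\,\bigl(1-\text{SSIM}(x_{gt}^{(\ell)},x_{pred}^{(\ell)})\bigr) + \lambda_{g}\,\|\nabla x_{gt}^{(\ell)}-\nabla x_{pred}^{(\ell)}\|_1\Bigr]$

plus a hinge **PatchGAN** loss (weight $\lambda_{\text{adv}}$) acting only on
the finest scale. The four bracketed terms are a pixel-wise $\ell_1$ loss, an
$\ell_1$ loss in the feature space of $\phi$, an SSIM term, and a
gradient-difference term.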
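The per-scale sum can be sketched as follows. This is a hedged NumPy illustration only: `l1`, `grad_l1`, and `multiscale_loss` are hypothetical names, norms are mean-reduced (an assumption), image gradients are approximated by forward differences (an assumption), and the feature map `phi` and the `ssim` function are left as optional callables because the text above does not specify them. The adversarial term is not included.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference (assumed reduction for the L1 norms)."""
    return np.abs(a - b).mean()

def grad_l1(a, b):
    """L1 distance between forward-difference gradients of (C, H, W) arrays."""
    dy = np.diff(a, axis=-2) - np.diff(b, axis=-2)
    dx = np.diff(a, axis=-1) - np.diff(b, axis=-1)
    return np.abs(dy).mean() + np.abs(dx).mean()

def multiscale_loss(gts, preds, weights, lam_v=0.0, lam_s=0.0, lam_g=0.1,
                    phi=None, ssim=None):
    """Weighted sum over scales of pixel, feature, SSIM, and gradient terms."""
    total = 0.0
    for gt, pred, w in zip(gts, preds, weights):
        term = l1(gt, pred) + lam_g * grad_l1(gt, pred)
        if phi is not None:                      # feature-space term, if provided
            term += lam_v * l1(phi(gt), phi(pred))
        if ssim is not None:                     # structural term, if provided
            term += lam_s * (1.0 - ssim(gt, pred))
        total += w * term
    return total

# One ground-truth / prediction pair per scale: 8, 16, 32, 64.
gts = [np.zeros((3, r, r)) for r in (8, 16, 32, 64)]
preds = [np.ones((3, r, r)) for r in (8, 16, 32, 64)]
loss = multiscale_loss(gts, preds, weights=[0.25, 0.25, 0.25, 0.25])
print(loss)
```

In the toy call the prediction is off by a constant, so the gradient term vanishes and only the pixel term contributes.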

## Memory Footprint

Only the tensors of the *current* resolution are kept in GPU memory: roughly
`batch × channels × H × W` elements per tensor. With a batch size of 128 and a
model width of 192, the peak VRAM is < 4 GB for 64×64 training.
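As a rough back-of-the-envelope check, the size of one activation tensor at the finest scale can be computed directly. The helper name `activation_bytes` is hypothetical, and the calculation assumes float32 storage (4 bytes per element) and that the width of 192 is the channel count; it covers a single tensor only, not gradients, optimizer state, or the discriminator.

```python
def activation_bytes(batch, channels, height, width, bytes_per_element=4):
    """Bytes needed for one dense (batch, channels, height, width) tensor."""
    return batch * channels * height * width * bytes_per_element

# Assumption: float32 activations, 192 channels, finest scale 64 x 64.
size_mib = activation_bytes(128, 192, 64, 64) / 2**20
print(f"{size_mib:.0f} MiB per tensor at 64x64")

# The same tensor at the previous scale is four times smaller.
print(f"{activation_bytes(128, 192, 32, 32) / 2**20:.0f} MiB per tensor at 32x32")
```

Under these assumptions a single tensor takes 384 MiB at 64×64, so the quoted 4 GB budget leaves room for roughly ten such tensors.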

## Results (CIFAR-10, 64×64)

| Epoch | PSNR (dB) | FID ↓ | Comments                 |
|------:|----------:|------:|--------------------------|
|    10 | 23.8      | -     | Coarse colour blobs      |
|    20 | **25.9**  | -     | Edges sharpen; see above |
|    30 | 26.5      | -     | Texture appears          |

*Evaluation uses 10k validation images, no classifier guidance.*

## How to Run

```bash
pip install -r requirements.txt
python main.py            # trains 25 epochs and writes samples / curves
```

Intermediate predictions for every scale can be dumped by uncommenting the
lines in `generate_samples()`. Load pretrained checkpoints from the `ckpt`
directory.
