Blue noise for diffusion models

SIGGRAPH (Conference Proceedings), 2024

1MPI Informatics, 2VIA Center, 3University of Cambridge, 4Google, 5Google DeepMind

Teaser: In conventional diffusion-based generative modeling, data is corrupted by adding Gaussian (random) noise (top row). We explore the alternative approach of using correlated Gaussian noise. Different correlations can be used, such as blue noise (second row) or time-varying correlations (third row). The bottom row shows the corresponding noise masks of the time-varying example at different time steps. The rightmost two columns compare images generated (from the same initial noise) by Heitz et al. [2023] (using only Gaussian noise) and Ours (using time-varying noise). Our generated images are more natural-looking and detailed, with fewer artifacts.

Fast-forward video




Presentation video

Abstract

Most existing diffusion models use Gaussian noise for training and sampling across all time steps, which may not optimally account for the frequency content reconstructed by the denoising network. Despite the diverse applications of correlated noise in computer graphics, its potential for improving the training process has been underexplored. In this paper, we introduce a novel and general class of diffusion models that takes correlated noise within and across images into account. More specifically, we propose a time-varying noise model to incorporate correlated noise into the training process, as well as a method for the fast generation of correlated noise masks. Our model is built upon deterministic diffusion models and utilizes blue noise to improve generation quality compared to using Gaussian white (random) noise only. Furthermore, our framework allows introducing correlation across images within a single mini-batch to improve gradient flow. We perform both qualitative and quantitative evaluations on a variety of datasets, achieving improvements over existing deterministic diffusion models on different tasks in terms of the FID metric.

What is blue noise?

Blue noise refers to a noise distribution with no energy in the low-frequency region of its power spectrum. We propose a time-varying noise that smoothly blends Gaussian (white) noise with Gaussian blue noise across time steps, as shown below (from left to right).

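The blend can be sketched as a convex combination of a white-noise field and a blue-noise field, renormalized so the per-pixel variance stays at one. This is a minimal sketch under our own assumptions: `white` and `blue` are independent, unit-variance Gaussian fields, and `gamma` is whatever schedule maps time steps to [0, 1] (not the paper's exact schedule).

```python
import numpy as np

def time_varying_noise(white, blue, gamma):
    """Blend white and blue Gaussian noise with weight gamma in [0, 1].

    gamma = 0 gives pure white noise, gamma = 1 pure blue noise.
    Assumes `white` and `blue` are independent with unit variance,
    so dividing by the blend's standard deviation renormalizes it.
    """
    mix = (1.0 - gamma) * white + gamma * blue
    return mix / np.sqrt((1.0 - gamma) ** 2 + gamma ** 2)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
b = rng.standard_normal((64, 64))  # stand-in for a blue-noise field
mid = time_varying_noise(w, b, 0.5)
```

At the two endpoints the function returns the white or blue field unchanged; in between it keeps the noise magnitude fixed while the correlation structure shifts.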

Why blue noise?

Using Gaussian blue noise in the denoising diffusion process better preserves fine details and the integrity of the content, as long as the noise magnitude stays below 100%.


Our method

First, we need to generate Gaussian blue noise on the fly. We propose a fast method that produces Gaussian blue noise masks by multiplying white noise with a precomputed matrix L, obtained offline from blue noise. To obtain higher-resolution Gaussian blue noise masks, we tile different realizations of 64x64 masks.

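Schematically, the on-the-fly generation is one matrix-vector product plus tiling. In this sketch, `L` is a placeholder: the paper precomputes it offline from blue noise, whereas here it is the identity matrix (yielding plain white noise) so the example runs standalone. The 16x16 size is also only for illustration; the paper works with 64x64 masks.

```python
import numpy as np

N = 16                 # illustration only; the paper uses 64x64 masks
L = np.eye(N * N)      # placeholder for the offline-precomputed factor

def blue_noise_mask(L, n, rng):
    """Map an i.i.d. Gaussian vector through L to imprint the correlation.

    With the real precomputed L, the reshaped result is a Gaussian
    blue noise mask; with the identity placeholder it stays white.
    """
    white = rng.standard_normal(n * n)
    return (L @ white).reshape(n, n)

def tiled_mask(L, n, rng, tiles=2):
    """Tile independent realizations to reach a larger resolution."""
    rows = [np.concatenate([blue_noise_mask(L, n, rng) for _ in range(tiles)], axis=1)
            for _ in range(tiles)]
    return np.concatenate(rows, axis=0)
```

Because each tile is an independent realization, the tiling avoids visible repetition while keeping the per-tile spectrum.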

Then, we propose a novel and general diffusion model using time-varying noise. More specifically, we interpolate between Gaussian (white) noise and Gaussian blue noise across time steps to improve the training of diffusion models.

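A single training step of such a deterministic (IADB-style) model could look as follows. This is a hedged sketch, not the paper's implementation: `denoiser` is a stand-in for the network, the linear `gamma = alpha` schedule is an assumption, and the regression target follows the IADB convention of predicting the noise-minus-data direction.

```python
import numpy as np

def training_step(denoiser, x0, white, blue, rng):
    """One IADB-style step with time-varying noise (illustrative sketch)."""
    alpha = rng.uniform()                          # diffusion time in [0, 1]
    gamma = alpha                                  # assumed schedule, not the paper's
    eps = (1 - gamma) * white + gamma * blue       # blend white and blue noise
    eps /= np.sqrt((1 - gamma) ** 2 + gamma ** 2)  # keep unit variance
    x_alpha = (1 - alpha) * x0 + alpha * eps       # noisy training sample
    target = eps - x0                              # IADB-style regression target
    pred = denoiser(x_alpha, alpha)
    return np.mean((pred - target) ** 2)           # L2 training loss
```

The only change relative to a plain deterministic diffusion step is the construction of `eps`: early time steps see mostly white noise and later ones an increasing blue-noise component (or vice versa, depending on the schedule).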

Here is a video example showing our "time-varying" denoising process, where the model is trained on LSUN-Church (64x64).

Besides using noise correlated across pixels, our framework can also employ a rectified mapping to correlate data samples within a single mini-batch.
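Mechanically, cross-image correlation amounts to mixing the per-image noise fields along the batch axis. The sketch below only illustrates that mechanism: the mixing matrix `C` here is an arbitrary diagonally dominant example, not the rectified mapping from the paper.

```python
import numpy as np

def correlate_batch(noise, C):
    """Mix per-image noise fields along the batch axis.

    noise: (B, H, W) independent noise fields, one per image.
    C:     (B, B) mixing matrix; C = I leaves the batch uncorrelated.
    """
    return np.einsum('ij,jhw->ihw', C, noise)

B = 4
# illustrative mixing matrix (an assumption, not the paper's mapping):
# mostly identity, with a small uniform coupling across the batch
C = 0.8 * np.eye(B) + 0.05 * np.ones((B, B))
```

With `C` close to the identity, each image keeps its own noise but gains a small shared component, which is the kind of cross-sample coupling that can smooth the gradient signal within a mini-batch.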


Results

In addition to the results in the main paper, we show here videos of image sequences generated by DDIM, IADB, and Ours, trained on different datasets.

↓ Comparisons of generated image sequences trained on Cat (64x64)

DDIM

IADB

Ours

↓ Comparisons of generated image sequences trained on Church (64x64)

DDIM

IADB

Ours

↓ Comparisons of generated image sequences trained on CelebA (64x64)

DDIM

IADB

Ours

↓ Comparisons of generated image sequences trained on Cat (128x128)

DDIM

IADB

Ours

↓ Comparisons of generated image sequences trained on CelebA (128x128)

DDIM

IADB

Ours

↓ Quantitative comparisons between IHDM, DDPM, DDIM, IADB and Ours on the above datasets.


Interactive visualization of intermediate steps


↓ Blue noise effect visible in Ours at intermediate time steps (Cat 64x64)

[Interactive viewer] Init noise → Generated image (Ours)

[Interactive viewer] Init noise → Generated image (IADB)


↓ Blue noise effect visible in Ours at intermediate time steps (Church 64x64)

[Interactive viewer] Init noise → Generated image (Ours)

[Interactive viewer] Init noise → Generated image (IADB)


↓ Blue noise effect visible in Ours at intermediate time steps (CelebA 64x64)

[Interactive viewer] Init noise → Generated image (Ours)

[Interactive viewer] Init noise → Generated image (IADB)


↓ Blue noise effect visible in Ours toward the final time steps (Cat 128x128)

[Interactive viewer] Init noise → Generated image (Ours)

[Interactive viewer] Init noise → Generated image (IADB)


↓ Blue noise effect visible in Ours toward the final time steps (CelebA 128x128)

[Interactive viewer] Init noise → Generated image (Ours)

[Interactive viewer] Init noise → Generated image (IADB)