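In Denoising Diffusion Probabilistic Models (DDPMs), each denoising step moves the latent closer to a cleaner version. This step can be expressed as (Eq. 12 of the DDIM paper):

$$
x_{t-1} = \sqrt{\alpha_{t-1}} \left( \frac{x_t - \sqrt{1 - \alpha_t}\, \epsilon_\theta(x_t)}{\sqrt{\alpha_t}} \right) + \sqrt{1 - \alpha_{t-1} - \sigma_t^2}\; \epsilon_\theta(x_t) + \sigma_t \epsilon_t
$$

The three terms are the predicted x_0, the direction pointing to x_t, and the random noise. In the code, `a_t`, `a_prev`, `e_t`, and `sigma_t` stand for α_t, α_{t-1}, ε_θ(x_t), and σ_t.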

And the corresponding code:

def ddim_step(self, sample, noise_pred, indices):
    ...
    # current prediction for x_0
    pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
    # direction pointing to x_t
    dir_xt = (1. - a_prev - sigma_t**2).sqrt() * e_t

    noise = sigma_t * noise_like(x.shape, device)

    x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise

What would happen if I remove the random noise part?

x_prev = a_prev.sqrt() * pred_x0 + dir_xt

If we force the random noise to be zero, it’s like turning off the randomness in the sampler. Sampling becomes a special case: DDIM (Denoising Diffusion Implicit Models). Strictly speaking, DDIM sets sigma_t itself to zero, which also drops the sigma_t**2 term inside dir_xt. To be more precise, the video is generated with DDIM sampling, but the base model (the one that does the heavy lifting) was trained as a DDPM.
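To make the role of the noise term concrete, here is a minimal scalar sketch of the same update. It is not the FIFO-Diffusion code: the function names `sigma_for_eta` and `ddim_step_scalar` and the `eta` parameter are hypothetical, and the noise scale is assumed to follow the DDIM paper's Eq. 16, where `eta = 0` gives the deterministic sampler and `eta = 1` gives DDPM-like stochastic sampling.

```python
import math
import random


def sigma_for_eta(a_t, a_prev, eta):
    """Noise scale assumed from DDIM Eq. 16 (eta=0: deterministic)."""
    return eta * math.sqrt((1.0 - a_prev) / (1.0 - a_t)) * math.sqrt(1.0 - a_t / a_prev)


def ddim_step_scalar(x_t, e_t, a_t, a_prev, eta, rng):
    """One reverse step on a single scalar latent (illustrative only)."""
    sigma_t = sigma_for_eta(a_t, a_prev, eta)
    # current prediction for x_0
    pred_x0 = (x_t - math.sqrt(1.0 - a_t) * e_t) / math.sqrt(a_t)
    # direction pointing to x_t
    dir_xt = math.sqrt(1.0 - a_prev - sigma_t ** 2) * e_t
    # random noise term; it vanishes when sigma_t == 0
    noise = sigma_t * rng.gauss(0.0, 1.0)
    return math.sqrt(a_prev) * pred_x0 + dir_xt + noise


# Same inputs, different random seeds.
det_a = ddim_step_scalar(0.8, 0.3, 0.5, 0.6, eta=0.0, rng=random.Random(0))
det_b = ddim_step_scalar(0.8, 0.3, 0.5, 0.6, eta=0.0, rng=random.Random(1))
sto_a = ddim_step_scalar(0.8, 0.3, 0.5, 0.6, eta=1.0, rng=random.Random(0))
sto_b = ddim_step_scalar(0.8, 0.3, 0.5, 0.6, eta=1.0, rng=random.Random(1))

print("eta=0:", det_a, det_b)  # identical regardless of seed
print("eta=1:", sto_a, sto_b)  # differ because fresh noise is injected
```

With `eta = 0` the seed has no effect on the result, which is the behavior obtained by deleting the noise term; with `eta = 1` every step injects fresh Gaussian noise.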

Now, let’s reveal the results:

FIFO-Diffusion Remove Random Noise
"a person swimming in ocean, high quality, 4K resolution."


FIFO-Diffusion Remove Random Noise
"a person walking in the snowstorm, high quality, 4K resolution."


FIFO-Diffusion Remove Random Noise
"a person washing the dishes, high quality, 4K resolution."


FIFO-Diffusion Remove Random Noise
"alley"

In this case, even with the same 64 denoising steps, the implicit model struggles to generate fine details like waves, snowflakes, water droplets, and wall tiles.

It’s pretty fascinating how much impact that random noise term has on these elements!
