" MicromOne: Diffusion Models Explained Line by Line



1. Setup and Noise Schedule

n_steps = 512  # Total number of diffusion steps (T)
beta = linspace(start, end, n_steps)
  • n_steps (T): total number of steps in the diffusion process

  • beta: controls how much noise is added at each step

This is a linear noise schedule: the per-step noise variance increases linearly from start to end, so later steps add progressively more noise.

Derived Variables

alpha = 1. - beta
alpha_bar = cumprod(alpha, axis=0)
  • alpha: amount of signal preserved at each step

  • alpha_bar: cumulative product → how much of the original image remains after t steps

sqrt_alpha_bar = sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = sqrt(1. - alpha_bar)

These are used in the reparameterization trick:

[
x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon
]
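The schedule and the reparameterization trick can be sketched end to end. The beta endpoints (1e-4 and 0.02) and the q_sample name are assumptions for illustration, not taken from the original code:

```python
import torch

n_steps = 512
# Assumed endpoints; common DDPM defaults are beta in [1e-4, 0.02].
beta = torch.linspace(1e-4, 0.02, n_steps)

alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)      # fraction of x_0 surviving after t steps

sqrt_alpha_bar = torch.sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = torch.sqrt(1.0 - alpha_bar)

def q_sample(x0, t, noise):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    # Jumps straight from x_0 to x_t in one shot, no step-by-step loop needed.
    return sqrt_alpha_bar[t] * x0 + sqrt_one_minus_alpha_bar[t] * noise

# The signal fraction shrinks toward 0 as t grows.
print(float(sqrt_alpha_bar[0]), float(sqrt_alpha_bar[-1]))
```

With these endpoints the signal fraction starts near 1 and decays to almost nothing by the last step, which is why the final x_T is close to pure Gaussian noise.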

Model and Optimizer

model = UNet()
optimizer = Adam(model.parameters(), lr=0.001)
  • UNet: predicts the noise given a noisy image and timestep

  • Adam: optimizer used to train the model

2. Training (Forward Process)

for batch, _ in dataloader:

Loop over real images.

Batch size

bs = batch.shape[0]

Sampling random timesteps

t = torch.randint(0, T, (bs,)).long()

Very important:

  • Each image uses a different timestep

  • The model learns to handle all noise levels

Generate noise

noise = torch.randn_like(batch, device=device)
  • Gaussian noise (ε)

Add noise to images

x_noisy = (
    sqrt_alpha_bar[t].view(bs, 1, 1, 1) * batch +
    sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) * noise
)

This is the forward diffusion step.

  • Early timesteps → image is mostly clean

  • Late timesteps → image becomes almost pure noise
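The broadcasting trick with .view(bs, 1, 1, 1) is easy to miss: each image in the batch has its own timestep, so its own pair of scalar coefficients, and .view reshapes them so they broadcast over the whole image. A minimal sketch (beta endpoints and the random stand-in batch are assumptions):

```python
import torch

torch.manual_seed(0)

n_steps = 512
beta = torch.linspace(1e-4, 0.02, n_steps)   # assumed endpoints
alpha_bar = torch.cumprod(1.0 - beta, dim=0)
sqrt_alpha_bar = alpha_bar.sqrt()
sqrt_one_minus_alpha_bar = (1.0 - alpha_bar).sqrt()

bs = 4
batch = torch.randn(bs, 3, 8, 8)             # stand-in for real images
t = torch.randint(0, n_steps, (bs,))          # one timestep per image
noise = torch.randn_like(batch)

# sqrt_alpha_bar[t] has shape (bs,); .view(bs, 1, 1, 1) makes it
# broadcast over the (channel, height, width) dimensions of each image.
x_noisy = (
    sqrt_alpha_bar[t].view(bs, 1, 1, 1) * batch
    + sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) * noise
)
print(x_noisy.shape)
```

Without the .view, PyTorch would try to broadcast a (bs,) tensor against the last dimension of the image and fail.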

Predict the noise

noise_pred = model(x_noisy, t)

The model learns:

“Given a noisy image and timestep, what noise was added?”

Loss function

loss = F.mse_loss(noise, noise_pred)

We minimize:

[
\| \epsilon - \epsilon_\theta(x_t, t) \|^2
]

Optimization

optimizer.zero_grad()
loss.backward()
optimizer.step()

Reset gradients, backpropagate, and update the model weights. optimizer.zero_grad() is needed every iteration because PyTorch accumulates gradients by default; without it, each batch's gradients would pile on top of the previous ones.
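Putting the whole training step together, here is a runnable sketch. TinyNoisePredictor is a hypothetical stand-in for the UNet (a real one would condition on t via timestep embeddings), and the beta endpoints are assumed:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for the UNet; a real model would use t, this one ignores it.
class TinyNoisePredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x, t):
        return self.conv(x)

n_steps = 512
beta = torch.linspace(1e-4, 0.02, n_steps)   # assumed endpoints
alpha_bar = torch.cumprod(1.0 - beta, dim=0)

model = TinyNoisePredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

batch = torch.randn(8, 3, 16, 16)             # fake image batch
bs = batch.shape[0]
t = torch.randint(0, n_steps, (bs,))
noise = torch.randn_like(batch)
x_noisy = (
    alpha_bar[t].sqrt().view(bs, 1, 1, 1) * batch
    + (1.0 - alpha_bar[t]).sqrt().view(bs, 1, 1, 1) * noise
)

optimizer.zero_grad()                          # reset stale gradients each iteration
loss = F.mse_loss(model(x_noisy, t), noise)    # predict the added noise
loss.backward()
optimizer.step()
print(float(loss))
```

One such step per batch, repeated over many epochs, is the entire training procedure.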

3. Image Generation (Reverse Process / Inference)

Now we generate new images from pure noise.

Precomputations

sqrt_one_minus_alpha_bar = sqrt(1. - alpha_bar)
alpha_bar_t_minus_1 = F.pad(alpha_bar[:-1], (1, 0), value=1.0)
  • Shifted version of alpha_bar

  • Needed for reverse formulas

posterior_variance = (
    beta * (1.0 - alpha_bar_t_minus_1) / (1.0 - alpha_bar)
)

This corresponds to:

[
\sigma_t^2 = \beta_t \, \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t}
]

This controls how much randomness is added back at each generation step.
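A quick way to see why no noise is added at the final step: with the padding above, alpha_bar_{t-1} at t = 0 is defined as 1, so the posterior variance at t = 0 is exactly zero. A sketch (beta endpoints assumed):

```python
import torch
import torch.nn.functional as F

n_steps = 512
beta = torch.linspace(1e-4, 0.02, n_steps)   # assumed endpoints
alpha_bar = torch.cumprod(1.0 - beta, dim=0)

# Shift alpha_bar right by one step; the t = 0 slot is padded with 1.0.
alpha_bar_t_minus_1 = F.pad(alpha_bar[:-1], (1, 0), value=1.0)
posterior_variance = beta * (1.0 - alpha_bar_t_minus_1) / (1.0 - alpha_bar)

# sigma_0^2 = beta_0 * (1 - 1) / (1 - alpha_bar_0) = 0
print(float(posterior_variance[0]))
```

So the last denoising step is fully deterministic by construction, which matches the `if ts > 0` guard in the sampling loop below.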

Initialization

bs = 8
x = randn((bs, 3, IMG_SIZE, IMG_SIZE))

Start from pure noise.

Reverse loop

for ts in reversed(range(T)):

Go backward in time, from t = T − 1 down to 0.

Add noise (except final step)

noise = randn_like(x) if ts > 0 else 0

Important detail:

  • No noise at the final step

  • Otherwise the image would degrade

Time tensor

t = full((bs,), ts).long()

Denoising step

x = (
    sqrt_one_over_alpha[t].view(bs, 1, 1, 1) *
    (
        x - beta[t].view(bs, 1, 1, 1) /
        sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) *
        model(x, t)
    )
    + sqrt(posterior_variance[t].view(bs, 1, 1, 1)) * noise
)

What happens here?

Each step:

  1. The model predicts the noise

  2. That noise is removed

  3. A small amount of controlled noise is added back

Missing definition (important)

The denoising step above also uses sqrt_one_over_alpha, which must be precomputed alongside the other schedule variables:

sqrt_one_over_alpha = sqrt(1.0 / alpha)
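The full reverse loop can be sketched end to end. The model below is a dummy that predicts zeros, so this only exercises the sampler's arithmetic, not real image generation; the beta endpoints and tiny image size are assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

T = 512
IMG_SIZE = 8                                   # tiny, just for the sketch
beta = torch.linspace(1e-4, 0.02, T)           # assumed endpoints
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)
sqrt_one_minus_alpha_bar = (1.0 - alpha_bar).sqrt()
sqrt_one_over_alpha = (1.0 / alpha).sqrt()
alpha_bar_t_minus_1 = F.pad(alpha_bar[:-1], (1, 0), value=1.0)
posterior_variance = beta * (1.0 - alpha_bar_t_minus_1) / (1.0 - alpha_bar)

# Dummy stand-in for the trained UNet: always predicts zero noise.
def model(x, t):
    return torch.zeros_like(x)

bs = 2
x = torch.randn(bs, 3, IMG_SIZE, IMG_SIZE)     # start from pure noise
for ts in reversed(range(T)):
    # No fresh noise on the final step (posterior variance is 0 there anyway).
    noise = torch.randn_like(x) if ts > 0 else torch.zeros_like(x)
    t = torch.full((bs,), ts).long()
    x = (
        sqrt_one_over_alpha[t].view(bs, 1, 1, 1)
        * (x - beta[t].view(bs, 1, 1, 1)
             / sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1)
             * model(x, t))
        + posterior_variance[t].view(bs, 1, 1, 1).sqrt() * noise
    )
print(x.shape)
```

Swapping the dummy model for a trained UNet turns this scaffold into an actual DDPM sampler.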

Final Output

generated_image = torch.clamp(x, -1, 1)

Clamp values to a valid image range.
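If the training images were scaled to [-1, 1] (a common convention, assumed here since the source does not show the preprocessing), the clamped output can be mapped back to [0, 1] before saving:

```python
import torch

torch.manual_seed(0)

x = torch.randn(8, 3, 32, 32) * 2              # stand-in for the final sample
generated_image = torch.clamp(x, -1, 1)        # clip to the training range

# Map [-1, 1] back to [0, 1] for image-saving utilities.
as_unit_range = (generated_image + 1) / 2
print(float(as_unit_range.min()), float(as_unit_range.max()))
```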

Intuition Recap

Training

  • Add noise to images

  • Train model to predict that noise

Generation

  • Start from noise

  • Remove noise step by step

  • Obtain a clean image