1. Setup and Noise Schedule
n_steps = 512 # Total number of diffusion steps (T)
beta = torch.linspace(start, end, n_steps)  # linear schedule; DDPM commonly uses start=1e-4, end=0.02
n_steps (T): total number of steps in the diffusion process
beta: controls how much noise is added at each step
This is a linear noise schedule
Derived Variables
alpha = 1. - beta
alpha_bar = torch.cumprod(alpha, dim=0)
alpha: amount of signal preserved at each step
alpha_bar: cumulative product → how much of the original image remains after t steps
sqrt_alpha_bar = torch.sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = torch.sqrt(1. - alpha_bar)
These are used in the reparameterization trick:
[
x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon
]
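The schedule plus this closed-form jump from x_0 to x_t can be sketched and sanity-checked end to end (a minimal sketch; the start/end values 1e-4 and 0.02 are assumed here, the common DDPM defaults):

```python
import torch

n_steps = 512
beta = torch.linspace(1e-4, 0.02, n_steps)  # assumed start/end values
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)
sqrt_alpha_bar = torch.sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = torch.sqrt(1.0 - alpha_bar)

# Jump straight from x_0 to x_t with the reparameterization trick
x0 = torch.randn(10_000)   # toy unit-variance data, standing in for images
t = 100
eps = torch.randn_like(x0)
x_t = sqrt_alpha_bar[t] * x0 + sqrt_one_minus_alpha_bar[t] * eps

# Since Var(x0) = 1: Var(x_t) = alpha_bar_t + (1 - alpha_bar_t) = 1,
# so the marginal variance stays constant across timesteps.
```

Note that alpha_bar decreases monotonically toward 0, which is why late timesteps are almost pure noise.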
Model and Optimizer
model = UNet()
optimizer = Adam(model.parameters(), lr=0.001)
UNet: predicts the noise given a noisy image and timestep
Adam: optimizer used to train the model
2. TRAINING (Forward Process)
for batch, _ in dataloader:
Loop over real images.
Batch size
bs = batch.shape[0]
Sampling random timesteps
t = torch.randint(0, n_steps, (bs,)).long()
Very important:
Each image uses a different timestep
The model learns to handle all noise levels
Generate noise
noise = torch.randn_like(batch, device=device)
Gaussian noise (ε)
Add noise to images
x_noisy = (
sqrt_alpha_bar[t].view(bs, 1, 1, 1) * batch +
sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) * noise
)
This is the forward diffusion step.
Early timesteps → image is mostly clean
Late timesteps → image becomes almost pure noise
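The indexing trick above — one scalar coefficient per image, broadcast over channels and pixels via `.view(bs, 1, 1, 1)` — can be sketched in isolation (schedule values 1e-4 to 0.02 are assumed):

```python
import torch

n_steps = 512
beta = torch.linspace(1e-4, 0.02, n_steps)  # assumed start/end values
alpha_bar = torch.cumprod(1.0 - beta, dim=0)
sqrt_alpha_bar = torch.sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = torch.sqrt(1.0 - alpha_bar)

bs = 4
batch = torch.randn(bs, 3, 32, 32)           # stand-in for real images
t = torch.randint(0, n_steps, (bs,)).long()  # a different timestep per image
noise = torch.randn_like(batch)

# Indexing with t picks one scalar per image: shape (bs,) -> (bs, 1, 1, 1),
# which then broadcasts across the (3, 32, 32) image dimensions
coef = sqrt_alpha_bar[t].view(bs, 1, 1, 1)
x_noisy = coef * batch + sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) * noise
```

Without the `.view`, the `(bs,)`-shaped coefficients would not broadcast against the 4-D batch.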
Predict the noise
noise_pred = model(x_noisy, t)
The model learns:
“Given a noisy image and timestep, what noise was added?”
Loss function
loss = F.mse_loss(noise, noise_pred)
We minimize:
[
\|\epsilon - \epsilon_\theta(x_t, t)\|^2
]
Optimization
optimizer.zero_grad()  # reset gradients from the previous batch
loss.backward()
optimizer.step()
Update model weights.
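Assembled into one runnable training step (a minimal sketch: `TinyNet` is a hypothetical stand-in for the real UNet, and a random tensor replaces the dataloader batch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_steps = 512
beta = torch.linspace(1e-4, 0.02, n_steps)  # assumed start/end values
alpha_bar = torch.cumprod(1.0 - beta, dim=0)
sqrt_alpha_bar = torch.sqrt(alpha_bar)
sqrt_one_minus_alpha_bar = torch.sqrt(1.0 - alpha_bar)

class TinyNet(nn.Module):
    """Hypothetical stand-in for the UNet: maps (x_noisy, t) -> predicted noise."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
    def forward(self, x, t):
        return self.conv(x)  # a real UNet would also condition on t

model = TinyNet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

batch = torch.randn(8, 3, 32, 32)  # dummy data in place of a dataloader
bs = batch.shape[0]
t = torch.randint(0, n_steps, (bs,)).long()
noise = torch.randn_like(batch)
x_noisy = (
    sqrt_alpha_bar[t].view(bs, 1, 1, 1) * batch +
    sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) * noise
)

noise_pred = model(x_noisy, t)
loss = F.mse_loss(noise_pred, noise)

optimizer.zero_grad()  # clear stale gradients before backward
loss.backward()
optimizer.step()
```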
3. IMAGE GENERATION (Reverse Process / Inference)
Now we generate new images from pure noise.
Precomputations
sqrt_one_minus_alpha_bar = torch.sqrt(1. - alpha_bar)
alpha_bar_t_minus_1 = F.pad(alpha_bar[:-1], (1, 0), value=1.0)
Shifted version of alpha_bar, needed for the reverse formulas
posterior_variance = (
beta * (1.0 - alpha_bar_t_minus_1) / (1.0 - alpha_bar)
)
This corresponds to:
[
\sigma_t^2
]
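These precomputations can be checked numerically; in particular, padding with 1.0 makes the posterior variance exactly zero at t = 0, since the predecessor of alpha_bar_0 is defined as 1 (a sketch with assumed schedule values 1e-4 to 0.02):

```python
import torch
import torch.nn.functional as F

n_steps = 512
beta = torch.linspace(1e-4, 0.02, n_steps)  # assumed start/end values
alpha_bar = torch.cumprod(1.0 - beta, dim=0)

# Prepend 1.0 so index t holds alpha_bar_{t-1}
alpha_bar_t_minus_1 = F.pad(alpha_bar[:-1], (1, 0), value=1.0)
posterior_variance = beta * (1.0 - alpha_bar_t_minus_1) / (1.0 - alpha_bar)

# At t = 0 the numerator is beta_0 * (1 - 1) = 0, so sigma_0^2 = 0:
# no extra randomness is injected at the very first step.
```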
Controls how much randomness is added during generation
Initialization
bs = 8
x = torch.randn((bs, 3, IMG_SIZE, IMG_SIZE))
Start from pure noise.
Reverse loop
for ts in reversed(range(n_steps)):
Go backward in time (T → 0)
Add noise (except final step)
noise = torch.randn_like(x) if ts > 0 else torch.zeros_like(x)
Important detail:
No noise at the final step
Otherwise the image would degrade
Time tensor
t = torch.full((bs,), ts).long()
Denoising step
x = (
sqrt_one_over_alpha[t].view(bs, 1, 1, 1) *
(
x - beta[t].view(bs, 1, 1, 1) /
sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) *
model(x, t)
)
+ torch.sqrt(posterior_variance[t].view(bs, 1, 1, 1)) * noise
)
What happens here?
Each step:
The model predicts the noise
That noise is removed
A small amount of controlled noise is added back
Required definition (must come before the denoising loop):
sqrt_one_over_alpha = torch.sqrt(1.0 / alpha)
Final Output
generated_image = torch.clamp(x, -1, 1)
Clamp values to a valid image range.
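The whole reverse process can be sketched end to end (assumptions: a small untrained `nn.Conv2d` stands in for the trained UNet, and `IMG_SIZE`/`n_steps` are shrunk to keep the sketch fast; real samples require the trained network):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

IMG_SIZE, n_steps = 16, 50  # assumed small values for speed
beta = torch.linspace(1e-4, 0.02, n_steps)
alpha = 1.0 - beta
alpha_bar = torch.cumprod(alpha, dim=0)
sqrt_one_minus_alpha_bar = torch.sqrt(1.0 - alpha_bar)
sqrt_one_over_alpha = torch.sqrt(1.0 / alpha)
alpha_bar_t_minus_1 = F.pad(alpha_bar[:-1], (1, 0), value=1.0)
posterior_variance = beta * (1.0 - alpha_bar_t_minus_1) / (1.0 - alpha_bar)

model = nn.Conv2d(3, 3, 3, padding=1)  # untrained stand-in for the UNet

bs = 2
x = torch.randn(bs, 3, IMG_SIZE, IMG_SIZE)  # start from pure noise
with torch.no_grad():
    for ts in reversed(range(n_steps)):
        noise = torch.randn_like(x) if ts > 0 else torch.zeros_like(x)
        t = torch.full((bs,), ts).long()
        x = (
            sqrt_one_over_alpha[t].view(bs, 1, 1, 1) *
            (x - beta[t].view(bs, 1, 1, 1) /
             sqrt_one_minus_alpha_bar[t].view(bs, 1, 1, 1) *
             model(x))  # a real UNet would take (x, t)
            + torch.sqrt(posterior_variance[t].view(bs, 1, 1, 1)) * noise
        )

generated_image = torch.clamp(x, -1, 1)
```

With an untrained model the output is still noise, but the shapes and the loop mechanics match the steps described above.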
Intuition Recap
Training
Add noise to images
Train model to predict that noise
Generation
Start from noise
Remove noise step by step
Obtain a clean image.