Thank you very much for your excellent work on Universal Guidance for Diffusion Models!
I have been attempting to reproduce your forward guidance pipeline from scratch using the 🤗 HuggingFace `diffusers` library. The general logic follows your original design (a minimal sketch of my implementation is included after the list):
- Computing \hat{z}_0 using the denoising UNet;
- Evaluating the task-specific loss (in my case, segmentation loss, in line with your implementation);
- Computing \nabla_{z_t} \mathrm{loss}(c, f(\hat{z}_0));
- Applying that gradient to update the predicted noise.
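For concreteness, here is roughly what my guided step looks like. This is only a sketch of my own reproduction, not your code; names such as `seg_model`, `seg_loss`, `target_mask`, and `strength` are placeholders I introduced myself:

```python
import torch

def guided_noise_pred(unet, vae, scheduler, latents, t, text_emb,
                      seg_model, seg_loss, target_mask, strength=1.0):
    # `t` is an integer timestep; `latents` is z_t in the VAE latent space.
    latents = latents.detach().requires_grad_(True)

    # 1) Predict the noise with the denoising UNet
    eps = unet(latents, t, encoder_hidden_states=text_emb).sample

    # 2) Estimate \hat{z}_0 from z_t and the predicted noise
    alpha_bar = scheduler.alphas_cumprod[t]
    z0_hat = (latents - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()

    # 3) Decode \hat{z}_0 and evaluate the task-specific (segmentation) loss
    image = vae.decode(z0_hat / vae.config.scaling_factor).sample
    loss = seg_loss(seg_model(image), target_mask)

    # 4) \nabla_{z_t} loss(c, f(\hat{z}_0))
    grad = torch.autograd.grad(loss, latents)[0]

    # 5) Apply the gradient to the predicted noise (forward guidance)
    return eps + strength * grad
```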
However, at seemingly random timesteps, I occasionally encounter NaN values in the gradient.
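For reference, this is the plain check I run on the guidance gradient inside my sampling loop (my own debugging code, continuing the sketch above):

```python
# Flag non-finite gradients before they propagate into the next denoising step.
if torch.isnan(grad).any() or torch.isinf(grad).any():
    print(f"Non-finite gradient at timestep {int(t)}, loss = {loss.item():.4f}")
```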
May I ask:
- Have you (or others) observed similar numerical instability or NaN issues in your own experiments?
- If so, what techniques did you use to mitigate it?
- Or is it possible that there is something wrong in my reproduction setup?
Any insight you could share would be very helpful for debugging and improving the stability of my implementation.