Hello, I am confused about how this paper uses forward guidance with Stable Diffusion. Do we still need classifier-free guidance (CFG)? And how is the loss function defined, for example, in style transfer?
In the paper, the loss is the cosine similarity of CLIP embeddings, so it does not depend on c, only on f(z_0). To align the output with the text prompt, we would still need standard CFG. Is my understanding correct?
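
To make the question concrete, here is a minimal Python sketch of how I understand the two terms fitting together. It is not code from the paper: the function names (`cfg_noise`, `cosine_loss`, `forward_guided_noise`) are hypothetical, writing the style loss as one minus the cosine similarity is my assumption about the sign and offset, and the gradient of the loss with respect to z_t is passed in rather than computed.

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, scale):
    # Standard classifier-free guidance: move from the unconditional
    # noise prediction toward the text-conditional one.
    return eps_uncond + scale * (eps_cond - eps_uncond)

def cosine_loss(emb_a, emb_b):
    # Assumed style loss: 1 - cosine similarity of two embedding vectors
    # (e.g. CLIP embeddings of the style image and of the predicted z_0).
    emb_a = emb_a / np.linalg.norm(emb_a)
    emb_b = emb_b / np.linalg.norm(emb_b)
    return 1.0 - float(emb_a @ emb_b)

def forward_guided_noise(eps, loss_grad, strength):
    # Forward guidance as I read it: add the scaled gradient of the
    # guidance loss (taken w.r.t. z_t) to the noise prediction.
    return eps + strength * loss_grad

# Toy usage: CFG first handles the text prompt, then the style
# gradient is added on top of the CFG output.
eps_u = np.zeros(4)
eps_c = np.ones(4)
eps_cfg = cfg_noise(eps_u, eps_c, scale=7.5)
grad = np.full(4, 0.1)  # placeholder gradient, not a real one
eps_final = forward_guided_noise(eps_cfg, grad, strength=2.0)
```

If this reading is right, the text prompt enters only through `cfg_noise` and the style image only through the loss gradient.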
