
Add an alternative Karras et al. stochastic scheduler for VE models #160


Merged

anton-l merged 6 commits into main from stochastic-scheduler on Aug 9, 2022

Conversation

anton-l
Member

@anton-l anton-l commented Aug 8, 2022

The required number of inference steps went down from 2000 to 50 with comparable quality 🎉

The algorithm is slightly modified to rely on pre- and post-processing steps integrated into the VE UNet's forward pass (centering and scaling by sigma), so it's specific to VE only.
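(For readers skimming the diff, the overall sampling loop looks roughly like the sketch below, based on Algorithm 2 of the Karras et al. paper. The model, scheduler, and sigma schedule are placeholders, and the exact signatures and return types are assumptions, not the final API of this PR.)

```python
import torch

# Schematic denoising loop (Algorithm 2 in Karras et al. 2022); `model` and
# `scheduler` are assumed to be set up elsewhere, signatures are illustrative.
shape = (1, 3, 256, 256)                       # batch of one 256x256 RGB image
generator = torch.manual_seed(0)
sigmas = scheduler.schedule                    # decreasing noise levels, sigma_max ... sigma_min
sample = torch.randn(shape, generator=generator) * sigmas[0]  # pure noise at sigma_max

for sigma, sigma_prev in zip(sigmas[:-1], sigmas[1:]):
    # 1. "churn": temporarily increase the noise level sigma -> sigma_hat
    sample_hat, sigma_hat = scheduler.add_noise_to_input(sample, sigma, generator=generator)

    # 2. first model evaluation, then an Euler step from sigma_hat down to sigma_prev
    model_output = model(sample_hat, sigma_hat)
    output = scheduler.step(model_output, sigma_hat, sigma_prev, sample_hat)

    # 3. second model evaluation and a 2nd-order correction (skipped when sigma_prev == 0)
    if sigma_prev != 0:
        model_output = model(output["prev_sample"], sigma_prev)
        output = scheduler.step_correct(
            model_output, sigma_hat, sigma_prev,
            sample_hat, output["prev_sample"], output["derivative"],
        )
    sample = output["prev_sample"]
```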

@anton-l anton-l requested a review from patil-suraj August 8, 2022 15:59
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 8, 2022

The documentation is not available anymore as the PR was closed or merged.

Contributor

@patil-suraj patil-suraj left a comment

Great work @anton-l! It looks good to me, just left some nits.

A more general comment: this scheduler goes a bit against our design of a single step function. But given that it requires two model evaluations, and the second eval depends on the output of the first, we can't really have a single step function. We also need to do some bookkeeping outside the scheduler, like storing sigma, sigma_prev, etc. (Looking at the code, I think we can probably avoid it.) cc @patrickvonplaten

Thanks a lot for working on this!


# 1. Select temporarily increased noise level sigma_hat
# 2. Add new noise to move from sample_i to sample_hat
sample_hat, sigma_hat = self.scheduler.get_model_inputs(sample, sigma, generator=generator)
Contributor

get_model_inputs doesn't seem to describe the function well. Maybe let's call it add_noise_to_input or churn_input.

Contributor

I would just call it get_sigmas(...), but I understand where get_model_inputs is coming from (making the API as generic as possible across all schedulers, which is also a good argument).

Here I would be in favor of being a bit more specific: get_sigmas(...) is something most continuous schedulers have, and it could become a common API across continuous schedulers. So I think it makes sense here to favor intuitive, readable code over easy-to-use.

Also, it scares me a bit to see that sample is an input to the function; it gives the impression that it's used to compute the sigmas. Can we instead maybe just pass shape and device?

Member Author

Note that sample is also used in the computation: sample_hat = sample + ((sigma_hat**2 - sigma**2)**0.5 * eps)

Contributor

get_sigmas sounds good to me, but it's still not clear enough, since the function adds noise to the current sigma rather than returning a completely new one.

> Also it scares me a bit to see that sample is an input to the function, it gives the impression that it's used to compute the sigmas. Can we instead maybe just pass shape and device?

Good point! In that case we would need to compute sample_hat in the pipeline. The function would then also have to return eps, which is needed to compute sample_hat. I'm still in favor of add_noise_to_input, as it makes clear that the function adds noise to the input rather than using sample to compute sigma. That's also how the paper describes this part.

Member Author

Opting for add_noise_to_input for now :)

Contributor

@patrickvonplaten patrickvonplaten Aug 9, 2022

ah sorry @anton-l, you're right, I missed that part. Then I think we should more or less leave it as is, and maybe rename it to compute_sigmas(...) to make sure the reader knows that the sigmas depend on the sample and are computed (not taken from a predefined list).

Feel free to go ahead as you want though @anton-l
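(For concreteness, a minimal sketch of what the chosen add_noise_to_input could look like, assembled from the sample_hat formula quoted above and the s_ parameters in the diff below. The attribute names and the gamma cap are assumptions drawn from the paper, not necessarily this PR's final code.)

```python
import torch

def add_noise_to_input(self, sample, sigma, generator=None):
    """Explicit Langevin-like "churn": raise the noise level from sigma to
    sigma_hat by adding fresh noise eps, following Algorithm 2 of the paper."""
    if self.s_min <= sigma <= self.s_max:
        # gamma controls how much extra noise is injected; the paper caps it
        # at sqrt(2) - 1 so sigma_hat never more than doubles the variance
        gamma = min(self.s_churn / self.num_inference_steps, 2**0.5 - 1)
    else:
        gamma = 0

    sigma_hat = sigma + gamma * sigma  # temporarily increased noise level
    eps = self.s_noise * torch.randn(sample.shape, generator=generator).to(sample.device)
    sample_hat = sample + ((sigma_hat**2 - sigma**2) ** 0.5) * eps
    return sample_hat, sigma_hat
```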

Comment on lines +40 to +42
s_churn=80,
s_min=0.05,
s_max=50,
Contributor

(nit) maybe rename the s_ parameters to scale_, for example scale_churn.

Contributor

+1

Member Author

@anton-l anton-l Aug 9, 2022

the S likely refers to "stochastic" in the paper (page 7, "Practical considerations" https://arxiv.org/pdf/2206.00364.pdf), so I'm not sure this would make it more explicit. I'll add detailed docstrings for them instead.

Contributor

Ok for me

Contributor

@patil-suraj patil-suraj Aug 9, 2022

Looking at the function, these values are used to scale the stochastic noise, so scale_ would be good IMO.

Also, these arguments are a bit too low-level to expose in __call__, so maybe keep them in the scheduler. Having the scheduler self-contained would be better IMO. Or do you think there is some advantage to having them in the pipeline?

Member Author

@patil-suraj on second thought, these parameters are best kept inside the scheduler, since they have to be tuned for each specific model. BTW, take a look at the docstrings so far to see if they would fit scale_

@@ -920,3 +922,19 @@ def test_ddpm_ddim_equality_batched(self):

# the values aren't exactly equal, but the images look the same visually
assert np.abs(ddpm_images - ddim_images).max() < 1e-1

@slow
def test_karras_ve_pipeline(self):
Contributor

Very cool!!! (Did you also try it with the 1024 one?)

Member Author

@anton-l anton-l Aug 9, 2022

Yes, the results are pretty ok :)
(Although they could be made better with a bit of grid search to find the optimal s_churn, s_noise, etc., since the paper was dealing with much smaller models and I just guessed the params.)
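(For reference, end-to-end usage after this PR could look like the sketch below. The pipeline class name, the return format, and the checkpoint id are illustrative assumptions, not names fixed by this PR.)

```python
import torch
from diffusers import KarrasVePipeline  # class name assumed for illustration

# hypothetical VE checkpoint id, shown only to sketch the call pattern
pipe = KarrasVePipeline.from_pretrained("google/ncsnpp-celebahq-256")
generator = torch.manual_seed(0)

# 50 inference steps instead of the 2000 the VE SDE sampler needed
image = pipe(num_inference_steps=50, generator=generator)["sample"][0]
image.save("karras_ve_sample.png")
```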

@patrickvonplaten
Contributor

> Great work @anton-l! It looks good to me, just left some nits.
>
> A more general comment: this scheduler goes a bit against our design of a single step function. But given that it requires two model evaluations, and the second eval depends on the output of the first, we can't really have a single step function. We also need to do some bookkeeping outside the scheduler, like storing sigma, sigma_prev, etc. (Looking at the code, I think we can probably avoid it.) cc @patrickvonplaten
>
> Thanks a lot for working on this!

That's indeed a very important question and I don't really know the best answer here.

Note that we could make it work by making step stateful (like we did in the pndm scheduler), but then it maybe becomes a bit too magical.

Also, correct is a pretty good name here IMO, because it "corrects" the initial prediction with a (second) derivative computation.

So @anton-l and @patil-suraj feel free to go ahead with whatever you think is best here!
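(For context on why correct fits, a sketch of the two evaluations under discussion, following Algorithm 2 of the paper. The function shapes here are illustrative, not this PR's exact API.)

```python
import torch

def step(pred_original: torch.Tensor, sample_hat: torch.Tensor,
         sigma_hat: float, sigma_prev: float):
    # first model evaluation: an Euler step from sigma_hat down to sigma_prev
    derivative = (sample_hat - pred_original) / sigma_hat
    sample_prev = sample_hat + (sigma_prev - sigma_hat) * derivative
    return sample_prev, derivative

def step_correct(pred_original_prev: torch.Tensor, sample_hat: torch.Tensor,
                 sample_prev: torch.Tensor, derivative: torch.Tensor,
                 sigma_hat: float, sigma_prev: float) -> torch.Tensor:
    # second model evaluation "corrects" the Euler step by averaging the two
    # slopes (Heun's 2nd-order method), hence the name
    derivative_corr = (sample_prev - pred_original_prev) / sigma_prev
    return sample_hat + (sigma_prev - sigma_hat) * 0.5 * (derivative + derivative_corr)
```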

Contributor

@patrickvonplaten patrickvonplaten left a comment

Amazing work Anton! This is by far the best continuous scheduler we have now, and it makes the VE models usable (maybe even on CPU!)

Also, it's great that the code is so clean and includes links to the paper!

Left some remarks, mostly nits. Good to go for me!

Let's make sure to advertise this well with some nice examples (maybe linking the NVIDIA authors as well)

Comment on lines 54 to 61
s_noise (`float`): the amount of additional noise to counteract loss of detail during sampling.
A reasonable range is [1.000, 1.011].
s_churn (`float`): the parameter controlling the overall amount of stochasticity.
A reasonable range is [0, 100].
s_min (`float`): the start of the sigma range where we add noise (enable stochasticity)
A reasonable range is [0, 10].
s_max (`float`): the end of the sigma range where we add noise
A reasonable range is [0.2, 80].
Contributor

This makes it much clearer now! I agree with your comment that not all of these can be renamed to scale_. Some suggestions:

Suggested change
s_noise (`float`): the amount of additional noise to counteract loss of detail during sampling.
A reasonable range is [1.000, 1.011].
s_churn (`float`): the parameter controlling the overall amount of stochasticity.
A reasonable range is [0, 100].
s_min (`float`): the start of the sigma range where we add noise (enable stochasticity)
A reasonable range is [0, 10].
s_max (`float`): the end of the sigma range where we add noise
A reasonable range is [0.2, 80].
scale_noise (`float`): the amount of additional noise to counteract loss of detail during sampling.
A reasonable range is [1.000, 1.011].
s_churn (`float`): the parameter controlling the overall amount of stochasticity.
A reasonable range is [0, 100].
sigma_min_stochastic (`float`): the start value of the sigma range where we add noise (enable stochasticity).
A reasonable range is [0, 10].
sigma_max_stochastic (`float`): the end value of the sigma range where we add noise
A reasonable range is [0.2, 80]. We add noise to sigma range `[sigma_min_stochastic, sigma_max_stochastic]`

Member Author

The thing is that after renaming, it's harder to refer to their paper counterparts :(
Let's keep them as is for now; maybe we'll figure something out while implementing the SD scheduler
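(As a concrete illustration of the trade-off, keeping the paper's names lets a configured scheduler read directly against the paper's "Practical considerations" section. The class name is assumed for illustration; the values match the diff above.)

```python
from diffusers import KarrasVeScheduler  # class name assumed for illustration

# s_churn/s_min/s_max match the diff above; s_noise is picked from the
# documented [1.000, 1.011] range
scheduler = KarrasVeScheduler(s_churn=80, s_min=0.05, s_max=50, s_noise=1.007)
```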

@anton-l anton-l merged commit dd10da7 into main Aug 9, 2022
@anton-l anton-l deleted the stochastic-scheduler branch August 17, 2022 12:35
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023
…uggingface#160)

* karras + VE, not flexible yet

* Fix inputs incompatibility with the original unet

* Roll back sigma scaling

* Apply suggestions from code review

* Old comment

* Fix doc