Broken group offloading using block_level #12319

@vladmandic

Description

Describe the bug

PR #12283 introduced group offloading at the pipeline level,
but it is currently broken when using offload_type="block_level": the VAE module is not handled correctly,
and the result is a failure with an input vs. weight device mismatch.
With offload_type="leaf_level" it seems to work.

Reproduction

import torch
import diffusers

model = '/mnt/models/stable-diffusion/mine/tempest-by-vlad-0.1.safetensors'
cache_dir = '/mnt/models/Diffusers'
pipe = diffusers.StableDiffusionXLPipeline.from_single_file(
    model,
    torch_dtype=torch.bfloat16,
    cache_dir=cache_dir,
)
pipe.enable_group_offload(
    onload_device=torch.device('cuda:0'),
    offload_device=torch.device('cpu'),
    offload_type="block_level",
    num_blocks_per_group=1,
    use_stream=True
)
image = pipe(
    prompt='A beautiful painting of a futuristic cityscape at sunset',
    width=1024,
    height=1024,
    num_inference_steps=10,
).images[0]
image.save('/tmp/output.png')

Logs

Traceback (most recent call last):
...
File "diffusers/models/autoencoders/autoencoder_kl.py", line 294, in _decode
> image = self.vae.decode(latents, return_dict=False)[0]
...
RuntimeError: Input type (CUDABFloat16Type) and weight type (CPUBFloat16Type) should be the same

System Info

python==3.12, diffusers==5e181eddfe7e44c1444a2511b0d8e21d177850a0

Who can help?

@sayakpaul @yiyixuxu @DN6 @asomoza @a-r-r-o-w

Labels

    bug