Chroma Follow Up #11725
Conversation
@AmericanPresidentJimmyCarter do these changes help with the quality issue you're seeing? cc: @Ednaordinary
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
After installing this, it's better now. I used this config:

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": dtype,
        "llm_int8_skip_modules": ["distilled_guidance_layer"],
    },
    components_to_quantize=["transformer", "text_encoder"],
)
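For context, a minimal sketch of how a config like this is typically wired into a pipeline load via `quantization_config`. The checkpoint id is a placeholder, `dtype` stands in for whatever compute dtype is being used, and the import path for `PipelineQuantizationConfig` may vary by diffusers version:

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

dtype = torch.bfloat16  # assumed compute dtype referenced by the config above

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": dtype,
        "llm_int8_skip_modules": ["distilled_guidance_layer"],
    },
    components_to_quantize=["transformer", "text_encoder"],
)

# "<chroma-checkpoint>" is a placeholder for a diffusers-format Chroma repo id.
pipe = DiffusionPipeline.from_pretrained(
    "<chroma-checkpoint>",
    quantization_config=pipeline_quant_config,
    torch_dtype=dtype,
).to("cuda")
```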
Thanks for this! Apologies I couldn't have been more helpful; I was busy all day. I did start on my own version, which I'll compare to this one if I can get it to work well.
@@ -256,6 +256,8 @@ def encode_prompt(
         num_images_per_prompt: int = 1,
         prompt_embeds: Optional[torch.FloatTensor] = None,
         negative_prompt_embeds: Optional[torch.FloatTensor] = None,
+        prompt_attention_mask: Optional[torch.Tensor] = None,
+        negative_prompt_attention_mask: Optional[torch.Tensor] = None,
         do_classifier_free_guidance: bool = True,
         max_sequence_length: int = 512,
         lora_scale: Optional[float] = None,
We should include negative_prompt_embeds (and prompt_attention_mask, negative_prompt_attention_mask) in the docs. I missed it in the pipeline PR.
Also, now that prompt_embeds and negative_prompt_embeds consistently stay the same size, we should batch cond and uncond (also simplifying attention_mask + negative_attention_mask into just attention_mask). Update: made a PR for this in #11729.
Thanks!
Do we already have an IP-Adapter for Chroma?
        self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor * 2)
        self.default_sample_size = 128

    def _get_t5_prompt_embeds(
Copied from?
Here we also create a custom attention mask for Chroma that unmasks a single pad token, so the implementation differs from the existing methods.
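For illustration, a hedged sketch of the kind of masking described here (not the pipeline's exact implementation; the function name is hypothetical, and `tokenizer`, `prompt`, and `max_sequence_length` are assumed from the surrounding encode_prompt context):

```python
import torch

def build_chroma_style_mask(tokenizer, prompt, max_sequence_length=512):
    """Tokenize with fixed-length padding, then unmask one pad token after the prompt."""
    text_inputs = tokenizer(
        prompt,
        padding="max_length",
        max_length=max_sequence_length,
        truncation=True,
        return_tensors="pt",
    )
    attention_mask = text_inputs.attention_mask.clone()  # 1 = real token, 0 = padding

    # Index of the first pad position = number of real tokens; clamp so it stays in range.
    seq_lengths = attention_mask.sum(dim=1)
    pad_index = seq_lengths.clamp(max=attention_mask.shape[1] - 1)
    attention_mask[torch.arange(attention_mask.shape[0]), pad_index] = 1
    return text_inputs.input_ids, attention_mask
```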
        return image_latents

    def encode_prompt(
Copied from?
Returns attention masks, so I think it would be different, no?
Also has negative_prompt and negative_prompt_embeds, which Flux doesn't have.
        image_embeds = image_embeds.repeat_interleave(num_images_per_prompt, dim=0)
        return image_embeds

    def prepare_ip_adapter_image_embeds(
Oh, we already have IP-Adapter?
So the FluxIPAdapter does work with Chroma (you can load the adapter and run inference). It's just that the quality is not very good, because the popular ones are trained for Flux Dev, while Chroma is based on Schnell weights.
I'm cool to remove it, but the consensus here was to leave it in case an IP-Adapter for Chroma is trained.
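For reference, a hedged sketch of loading the existing Flux IP-Adapter weights into a Chroma pipeline, assuming Chroma inherits the Flux IP-Adapter loading interface as described above. The checkpoint id is a placeholder, and the quality caveats from the comment above apply since these weights were trained for Flux Dev:

```python
import torch
from diffusers import ChromaPipeline

# "<chroma-checkpoint>" is a placeholder for a diffusers-format Chroma repo id.
pipe = ChromaPipeline.from_pretrained("<chroma-checkpoint>", torch_dtype=torch.bfloat16).to("cuda")

# Load the XLabs Flux IP-Adapter weights and its CLIP image encoder.
pipe.load_ip_adapter(
    "XLabs-AI/flux-ip-adapter",
    weight_name="ip_adapter.safetensors",
    image_encoder_pretrained_model_name_or_path="openai/clip-vit-large-patch14",
)
pipe.set_ip_adapter_scale(0.6)
```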
f"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:" | ||
f" {negative_prompt_embeds}. Please make sure to only forward one of the two." | ||
) | ||
if prompt_attention_mask is not None and negative_prompt_attention_mask is None: |
I think if negative_prompt is None, negative_prompt_attention_mask can be None too, no?
            raise ValueError(
                "Cannot provide `negative_prompt_attention_mask` without also providing `prompt_attention_mask`"
            )
I think we have to pass the mask if we pass the embeddings directly, no?

if prompt_embeds is not None and prompt_attention_mask is None:
    raise ValueError(...)
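Pulling the two suggestions above together, a hypothetical version of the check inside input validation (sketch only, not the final pipeline code):

```python
# Require a mask whenever embeds are passed directly; only require the negative mask
# when negative embeds are actually in play.
if prompt_embeds is not None and prompt_attention_mask is None:
    raise ValueError("`prompt_attention_mask` is required when `prompt_embeds` is passed directly.")
if negative_prompt_embeds is not None and negative_prompt_attention_mask is None:
    raise ValueError(
        "`negative_prompt_attention_mask` is required when `negative_prompt_embeds` is passed directly."
    )
```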
With this PR in combination with #11729, I now need to pad prompt_embeds and negative_prompt_embeds to the same size when I pass them in manually. Is this intentional and consistent with other pipelines? It was not needed before these changes. The following error occurs now, whereas it does not on the main branch.
This is my workaround:

import torch.nn.functional as F

from diffusers import ChromaPipeline


def get_weighted_prompt_embeddings_chroma(
    pipe: ChromaPipeline,
    prompt: str = "",
    negative_prompt: str = "",
    device=None,
):
    # ... do some stuff ...
    # before the two PRs, this would work:
    # return prompt_embeds, negative_prompt_embeds
    # now this is needed:
    return _pad_prompt_embeds_to_same_size(prompt_embeds, negative_prompt_embeds)


def _pad_prompt_embeds_to_same_size(prompt_embeds_a, prompt_embeds_b):
    # Zero-pad the shorter sequence along dim 1 so both embeds end up the same length.
    size_a = prompt_embeds_a.size(1)
    size_b = prompt_embeds_b.size(1)
    if size_a < size_b:
        pad_size = size_b - size_a
        prompt_embeds_a = F.pad(prompt_embeds_a, (0, 0, 0, pad_size))
    elif size_b < size_a:
        pad_size = size_a - size_b
        prompt_embeds_b = F.pad(prompt_embeds_b, (0, 0, 0, pad_size))
    return prompt_embeds_a, prompt_embeds_b
My question was more about the user having to pad the embeds manually instead of the pipeline doing it automatically. I was not questioning the need for padding itself. I am almost sure this doesn't have to be done by the user when using (at least some) other pipelines.
If one doesn't pass in the attention mask when going from prompts to embeds, then yes, it is automatic. However, I expect SDnext to actually pass the attention mask in, due to e.g. weighted parts of prompts.
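To make that expectation concrete, a hypothetical call passing the masks along with externally produced embeds (parameter names assumed from the encode_prompt signature in the diff above; whether the pipeline call accepts them directly depends on the final API):

```python
# Assumes prompt_embeds / negative_prompt_embeds and their attention masks were
# produced externally, e.g. by a prompt-weighting helper like the one shown earlier.
image = pipe(
    prompt_embeds=prompt_embeds,
    prompt_attention_mask=prompt_attention_mask,
    negative_prompt_embeds=negative_prompt_embeds,
    negative_prompt_attention_mask=negative_prompt_attention_mask,
    num_inference_steps=28,
).images[0]
image.save("chroma.png")
```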
What does this PR do?
Follow up to #11698
This PR fixes #11724.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.