Question about model architecture of HiDream-E1. #19

Congratulations on your great work!
I have some questions about the architecture of HiDream-E1 and would be grateful for any insights.
1. Why are the latent image and the condition image concatenated along the width dimension before patchify, instead of along the token dimension (height dimension) after patchify?
2. The position embedding seems to treat the latent and the condition image as a single union image. Why not use separate position embeddings for the latent and the condition image (e.g., as in OminiControl)?
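To make the two questions concrete, here is a toy sketch contrasting the options. It is not HiDream-E1's code. It assumes square non-overlapping patches, a latent width divisible by the patch size, and 2-axis (row, col) position ids. The sizes `H`, `W`, `P`, the helper names, and the leading 0/1 index in the "separate" scheme are all hypothetical, and that scheme is only one possible variant, not necessarily OminiControl's exact one.

```python
# Toy comparison (hypothetical sizes and helpers, not HiDream-E1's code):
# width-concat-then-patchify vs. patchify-then-token-concat.
H, W, P = 4, 4, 2  # latent height/width (in cells) and patch size

def make_grid(tag, h, w):
    """A toy 'image': each cell records (source tag, row, col)."""
    return [[(tag, r, c) for c in range(w)] for r in range(h)]

def patchify(grid, p):
    """Split a 2-D grid into non-overlapping p x p patches, row-major.
    Returns tokens as (patch_row, patch_col, cells)."""
    h, w = len(grid), len(grid[0])
    tokens = []
    for pr in range(h // p):
        for pc in range(w // p):
            cells = [grid[pr * p + i][pc * p + j] for i in range(p) for j in range(p)]
            tokens.append((pr, pc, cells))
    return tokens

def concat_width(a, b):
    """Place grid b to the right of grid a (rows must match)."""
    return [ra + rb for ra, rb in zip(a, b)]

def source(token):
    """Tag of the image a token came from ('x' = latent, 'c' = condition)."""
    return token[2][0][0]

target = make_grid("x", H, W)
cond = make_grid("c", H, W)

# Option A: concatenate along width, then patchify the union image.
union_tokens = patchify(concat_width(target, cond), P)
# Union-image position ids: condition columns are offset by W // P.
union_pos = [(pr, pc) for pr, pc, _ in union_tokens]

# Option B: patchify each image, then concatenate along the token axis.
tok_x, tok_c = patchify(target, P), patchify(cond, P)
seq_tokens = tok_x + tok_c
# Hypothetical separate scheme: both images reuse the same (row, col)
# grid and are distinguished by a leading image index 0 / 1.
sep_pos = [(0, pr, pc) for pr, pc, _ in tok_x] + [(1, pr, pc) for pr, pc, _ in tok_c]

# Because W % P == 0, no patch straddles the seam between the two images.
assert all(len({cell[0] for cell in t[2]}) == 1 for t in union_tokens)

print([source(t) for t in union_tokens])  # ['x', 'x', 'c', 'c', 'x', 'x', 'c', 'c']
print([source(t) for t in seq_tokens])    # ['x', 'x', 'x', 'x', 'c', 'c', 'c', 'c']
print(union_pos[2:4], sep_pos[4])         # [(0, 2), (0, 3)] (1, 0, 0)
```

In this toy setting both options yield the same set of patch tokens; only their order and their position ids differ. Since self-attention without ordering-dependent layers is permutation-equivariant, the practical difference would come down mainly to the position-id scheme, which is what question 2 asks about.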

