Question about model architecture of HiDream-E1. #19

@CyberPegasus

Congratulations on your great work!
I have some questions about the architecture of HiDream-E1 and would be grateful if you could provide some insights.

  1. Why are the latent image and the condition image concatenated along the width dimension before patchifying, rather than along the token dimension (height dimension) after patchifying?
  2. The position embedding seems to treat the latents and the condition image as a single union image. Why not use separate position embeddings (e.g., as in OminiControl) for the latents and the condition image?
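
To make the two options concrete, here is a minimal sketch of the token layouts being compared. It is not the actual HiDream-E1 code: the sizes, the function names, and the choice of giving the condition image its own (row, col) grid in the second variant are all assumptions for illustration only.

```python
# Hypothetical sizes: a 4x4 latent and a 4x4 condition image, patch size 2.
H, W, P = 4, 4, 2
h, w = H // P, W // P  # patch grid per image: 2 x 2


def width_concat_before_patchify():
    """Treat [latent | condition] as one H x 2W image, then patchify row-major."""
    tokens = []
    for r in range(h):
        for c in range(2 * w):
            source = "latent" if c < w else "cond"
            # The position id indexes into the union image grid.
            tokens.append((source, (r, c)))
    return tokens


def token_concat_after_patchify():
    """Patchify each image separately, then append condition tokens after latent tokens."""
    tokens = []
    for source in ("latent", "cond"):
        for r in range(h):
            for c in range(w):
                # Assumption: each image keeps its own (row, col) grid.
                tokens.append((source, (r, c)))
    return tokens


union = width_concat_before_patchify()
separate = token_concat_after_patchify()

print([src for src, _ in union])
print([src for src, _ in separate])
print([pos for src, pos in union if src == "cond"])
print([pos for src, pos in separate if src == "cond"])
```

In this sketch both variants produce the same number of tokens. They differ in the token order (interleaved per row versus latent tokens first) and in the position ids assigned to the condition tokens (columns shifted by the latent width versus a separate grid), which is the difference the two questions are about.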
