Hi.
I have been trying to use the library at scale (up to 1024 GPUs) on Frontier (MI250X). Would it be possible to add support for the CDNA 2 architecture as well? In particular, the GroupGEMM kernel currently uses FP8, which is not supported on MI250X.