Forcing to_copy to insert ICast Layer reduces perf (~10%) on Unet.
It's not necessary to insert a Cast Layer if the dtype doesn't change, e.g., from DataType.HALF to DataType.HALF:
Forced Cast ITensor [NORMALIZATION]-[aten_ops.native_group_norm.default]-[model.1.submodule.1.submodule.conv.unit0.adn.N/native_group_norm_4]_output from DataType.HALF to DataType.HALF - [aten_ops.torch.ops.aten.clone.default]-[model.1.submodule.1.submodule.conv.unit0.adn.D/clone_4], type: LayerType.CAST, inputs: 1, outputs: 1
Currently, all copy-related ops insert a Cast Layer, and TensorRT doesn't remove them for us during optimization. We need to think carefully about when inserting a Cast Layer is actually required.
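One possible shape for the check is sketched below. It is self-contained and does not use the real TensorRT or Torch-TensorRT APIs: `DataType`, `Tensor`, `Network`, `cast_if_needed`, and the `require_layer` flag are all hypothetical stand-ins chosen for illustration. The idea is only that a cast is skipped when the source and target dtypes match, unless the caller explicitly says a layer is required.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DataType(Enum):
    # Stand-in for the dtype enum seen in the log; not the TensorRT class.
    FLOAT = "float"
    HALF = "half"


@dataclass
class Tensor:
    # Hypothetical minimal tensor: just a name and a dtype.
    name: str
    dtype: DataType


@dataclass
class Network:
    # Hypothetical network that records the names of inserted cast layers.
    cast_layers: List[str] = field(default_factory=list)

    def add_cast(self, tensor: Tensor, dtype: DataType, name: str) -> Tensor:
        self.cast_layers.append(name)
        return Tensor(f"{name}_output", dtype)


def cast_if_needed(net, tensor, target, name, require_layer=False):
    """Insert a cast only if the dtype changes or the caller demands a layer."""
    if tensor.dtype == target and not require_layer:
        return tensor  # HALF -> HALF: reuse the input, add nothing
    return net.add_cast(tensor, target, name)


net = Network()
x = Tensor("native_group_norm_4_output", DataType.HALF)

same = cast_if_needed(net, x, DataType.HALF, "clone_4")       # skipped
widened = cast_if_needed(net, x, DataType.FLOAT, "to_copy_0")  # inserted

print(len(net.cast_layers))  # 1
```

The `require_layer` flag is where the open question from this issue would live: whichever cases genuinely need a layer even without a dtype change would set it, and every other copy-related op would fall through to the no-op path.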