[PoC] simplify simple tensor fallback heuristic #7340
Draft
Since we had some internal discussions about the heuristic before and it came up again in #7331 (comment), this PR is an attempt to simplify it while adhering to the original goals. Let's start with a little bit of context:
When transforms v2 was conceived, one major design goal was to make it BC to v1. Part of that is that we need to treat simple torch.Tensor's as images and not require users to wrap them into a datapoints.Image or similar. To achieve that, the functional API internally just dispatches to the *_image_tensor kernel, e.g. vision/torchvision/transforms/v2/functional/_geometry.py, lines 78 to 79 in 01ef0a6.
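To make the dispatch concrete, here is a minimal sketch using plain-Python stand-ins. The classes Tensor, Image, and Mask and the functions below are hypothetical placeholders, not the torchvision types or kernels; the point is only that a simple tensor takes the same path as an explicit image.

```python
# Stand-ins only: these are NOT the torchvision classes or kernels.
class Tensor:
    """Stand-in for a simple torch.Tensor holding nested lists."""

    def __init__(self, data):
        self.data = data


class Image(Tensor):
    """Stand-in for datapoints.Image."""


class Mask(Tensor):
    """Stand-in for datapoints.Mask."""


def _flip_rows(data):
    # Reverse every row, i.e. flip along the last dimension.
    return [row[::-1] for row in data]


def horizontal_flip_image_kernel(inpt):
    # Hypothetical image kernel.
    return _flip_rows(inpt.data)


def horizontal_flip_mask_kernel(inpt):
    # Hypothetical mask kernel.
    return _flip_rows(inpt.data)


def horizontal_flip(inpt):
    """Dispatch on the input type and report which kernel was used."""
    is_simple_tensor = type(inpt) is Tensor
    if is_simple_tensor or isinstance(inpt, Image):
        # Simple tensors are treated exactly like images.
        return "image_kernel", horizontal_flip_image_kernel(inpt)
    if isinstance(inpt, Mask):
        return "mask_kernel", horizontal_flip_mask_kernel(inpt)
    raise TypeError(f"Unsupported input type: {type(inpt).__name__}")


kernel_for_simple, flipped = horizontal_flip(Tensor([[1, 2, 3]]))
kernel_for_mask, _ = horizontal_flip(Mask([[0, 1]]))
```

In this sketch the simple tensor and an explicit Image both end up in the image kernel, while a Mask is routed to its own kernel.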
By not adding any logic other than allowing simple tensors to be transformed, the transforms inherited this behavior. However, this proved detrimental for two reasons:
- With datapoints.Label and datapoints.OneHotLabel in the prototype area for now, we wanted to represent them as simple tensors.
- Datasets such as CelebA return simple tensors as part of their annotations.

To support these use cases, the initial idea was to introduce a no-op datapoint ():
This could be easily filtered out by the transforms. However, this again had two issues:
To overcome this, #7170 added a heuristic that currently behaves as follows:
vision/gallery/plot_transforms_v2.py, lines 92 to 95 in 01ef0a6
This solves the issues above. However, it goes beyond the original goal of keeping BC: v1 does not support joint transformations, and thus allowing simple tensors to act as images in a joint context is not needed for BC. This is the part that makes the current heuristic more complicated than it has to be, since it brings considerations such as input order into the picture.
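To illustrate how order enters the picture, the following sketch assumes the heuristic treats the first simple tensor as the image whenever no explicit image is present and passes all other simple tensors through. That rule is an assumption made for illustration; the classes are plain-Python stand-ins and select_images is a hypothetical helper, not torchvision API.

```python
# Stand-ins only: these are NOT the torchvision classes.
class Tensor:
    """Stand-in for a simple torch.Tensor."""


class Image(Tensor):
    """Stand-in for datapoints.Image."""


class Mask(Tensor):
    """Stand-in for datapoints.Mask."""


def select_images(flat_inputs):
    """Return the indices that would be transformed as images.

    Assumed rule (for illustration only):
    - explicit images are always transformed as images;
    - if there is no explicit image, the first simple tensor is
      treated as the image;
    - all remaining simple tensors are passed through.
    """
    has_explicit_image = any(isinstance(x, Image) for x in flat_inputs)
    selected = []
    seen_simple_tensor = False
    for index, item in enumerate(flat_inputs):
        if isinstance(item, Image):
            selected.append(index)
        elif type(item) is Tensor:
            if not has_explicit_image and not seen_simple_tensor:
                selected.append(index)
            seen_simple_tensor = True
    return selected


# Two simple tensors: only the first one is picked, whatever it holds.
two_simple = select_images([Tensor(), Tensor()])

# An explicit image switches the fallback off for simple tensors.
explicit_first = select_images([Image(), Tensor()])
explicit_last = select_images([Tensor(), Image()])
```

Under this assumed rule, a sample of two simple tensors passed as (image, label) and as (label, image) selects index 0 both times, so in the second ordering the label would be transformed instead of the image.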
The heuristic this PR proposes takes a more pragmatic approach to BC:
The only thing we are losing by going for this simplification is the ability to intentionally use simple tensors as images in a joint setting. IMHO, it isn't a big ask of users to wrap into a datapoints.Image there, since they will have to wrap into datapoints.Mask's and datapoints.BoundingBox'es anyway in a joint context.