[prototype] Minor speed and nit optimizations on Transform Classes #6837


Merged (17 commits) on Oct 27, 2022
4 changes: 2 additions & 2 deletions torchvision/prototype/transforms/_geometry.py
@@ -522,7 +522,7 @@ def _get_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:
torch.randint(height - bound_height, height, size=(1,)).item(),
]
startpoints = [[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]]
- endpoints = [topleft, topright, botright, botleft]
+ endpoints: List[List[int]] = [topleft, topright, botright, botleft]  # type: ignore[list-item]
perspective_coeffs = _get_perspective_coeffs(startpoints, endpoints)
return dict(perspective_coeffs=perspective_coeffs)

@@ -622,7 +622,7 @@ def _get_params(self, flat_inputs: List[Any]) -> Dict[str, Any]:

while True:
# sample an option
- idx = torch.randint(low=0, high=len(self.options), size=(1,)).item()
+ idx: int = torch.randint(low=0, high=len(self.options), size=(1,)).item()  # type: ignore[assignment]
min_jaccard_overlap = self.options[idx]
if min_jaccard_overlap >= 1.0: # a value larger than 1 encodes the leave as-is option
return dict()
8 changes: 4 additions & 4 deletions torchvision/prototype/transforms/_type_conversion.py
@@ -1,4 +1,4 @@
- from typing import Any, cast, Dict, Optional, Union
+ from typing import Any, Dict, Optional, Union

import numpy as np
import PIL.Image
@@ -13,7 +13,7 @@ class DecodeImage(Transform):
_transformed_types = (features.EncodedImage,)

def _transform(self, inpt: torch.Tensor, params: Dict[str, Any]) -> features.Image:
- return cast(features.Image, F.decode_image_with_pil(inpt))
+ return F.decode_image_with_pil(inpt)  # type: ignore[no-any-return]

Collaborator commented:

This has to be here, because it seems

@torch.jit.unused
def decode_image_with_pil(encoded_image: torch.Tensor) -> features.Image:

doesn't "forward" the type annotations 🙄


@datumbox (Contributor, Author) commented on Oct 26, 2022:

In all other places we decided to silence with ignore rather than cast; do we really need the cast here?


Collaborator replied:

No no, I was just explaining why we need the ignore, for a future me who is staring confused at the blame, wondering why we introduced it in the first place.
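
A minimal, self-contained sketch of the pattern discussed above (not the torchvision code): untyped_decorator is a hypothetical stand-in for a decorator whose stub does not preserve the wrapped function's return annotation, which is what the thread describes for @torch.jit.unused. mypy therefore sees the call as returning Any and reports no-any-return, the error silenced in this diff.

    from typing import Any, cast


    def untyped_decorator(fn: Any) -> Any:  # hypothetical stand-in
        return fn


    @untyped_decorator
    def decode(data: bytes) -> str:  # mypy now sees `decode` as Any
        return data.decode("utf-8")


    def with_cast(data: bytes) -> str:
        # Option 1: cast the Any result back to the declared type.
        return cast(str, decode(data))


    def with_ignore(data: bytes) -> str:
        # Option 2, the one used in this diff: silence the mypy error in place.
        return decode(data)  # type: ignore[no-any-return]

Both versions type-check; the diff standardizes on the ignore comment, which also lets the cast import be dropped from _type_conversion.py.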



class LabelToOneHot(Transform):
@@ -27,7 +27,7 @@ def _transform(self, inpt: features.Label, params: Dict[str, Any]) -> features.O
num_categories = self.num_categories
if num_categories == -1 and inpt.categories is not None:
num_categories = len(inpt.categories)
- output = one_hot(inpt, num_classes=num_categories)
+ output = one_hot(inpt.as_subclass(torch.Tensor), num_classes=num_categories)
return features.OneHotLabel(output, categories=inpt.categories)

def extra_repr(self) -> str:
@@ -50,7 +50,7 @@ class ToImageTensor(Transform):
def _transform(
self, inpt: Union[torch.Tensor, PIL.Image.Image, np.ndarray], params: Dict[str, Any]
) -> features.Image:
- return cast(features.Image, F.to_image_tensor(inpt))
+ return F.to_image_tensor(inpt)  # type: ignore[no-any-return]


class ToImagePIL(Transform):
2 changes: 1 addition & 1 deletion torchvision/prototype/transforms/_utils.py
@@ -97,7 +97,7 @@ def _check_padding_mode_arg(padding_mode: Literal["constant", "edge", "reflect",


def query_bounding_box(flat_inputs: List[Any]) -> features.BoundingBox:
- bounding_boxes = {inpt for inpt in flat_inputs if isinstance(inpt, features.BoundingBox)}
+ bounding_boxes = [inpt for inpt in flat_inputs if isinstance(inpt, features.BoundingBox)]

@datumbox (Contributor, Author) commented:

I think we can use a list here instead of a set.


Collaborator replied:

Is this a perf optimization? Otherwise I would prefer the set, since:

  1. it makes it clearer that we are looking for duplicates, and
  2. it is aligned with the other query_* functions below.

Functionally, the only difference is that with a set, passing the same bounding box twice in the same sample would not be caught. This indeed seems like a user error, but if we raise there, we should probably check for duplicates everywhere. I'm not sure we want to single out bounding boxes here.


@datumbox (Contributor, Author) replied:

It's both perf and code correctness. What we want here is to ensure there is only one bbox in the input. If there were two, even if they were the exact same copy, our transforms that rely on query_bounding_box would end up working incorrectly: effectively they would modify the first copy but not the second. This is a different use case from the rest of the query_* methods, where we can handle multiple entries but just want to ensure the same size across them.


Collaborator replied:

> It's both perf and code correctness.

Are lists actually faster than sets? I'm honestly curious, since I have never benchmarked this 😇

> If there were two, even if they were the exact same copy, our transforms that rely on query_bounding_box would end up working incorrectly: effectively they would modify the first copy but not the second.

Agreed.

> This is a different use case from the rest of the query_* methods, where we can handle multiple entries but just want to ensure the same size across them.

You are right; I was not clear enough. What I meant was: we are not checking for duplicate images, videos, masks, etc. anywhere. Either we should do that (it seems like the right thing to do, but it probably has perf implications) or we shouldn't single out bounding boxes here. No strong opinion though; a single mitigation is probably better than none. But we should still discuss whether we need to check this for everything.


@datumbox (Contributor, Author) replied:

As discussed offline, query_chw and query_spatial_size must use sets because what we are interested in is enforcing that all images have a single size. Multiple images/videos are allowed in those cases; we just enforce that the size is the same everywhere.

On the other hand, query_bounding_box is used to extract the single bbox. The transforms that use this util handle only one bbox, which is why we need a list. Note that this is identical to _flatten_and_extract_image_or_video from AA, where we also use a list.

So this is primarily a bug fix that restores the right semantics of the function, with secondary positive performance implications.

if not bounding_boxes:
raise TypeError("No bounding box was found in the sample")
elif len(bounding_boxes) > 1:
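
To make the list-vs-set difference from the thread above concrete, here is a minimal, runnable sketch of the function's semantics; BoundingBox is a plain stand-in class for features.BoundingBox, and the duplicate-error message is illustrative since the real one is cut off in the diff above.

    from typing import Any, List


    class BoundingBox:  # stand-in for features.BoundingBox
        pass


    def query_bounding_box(flat_inputs: List[Any]) -> BoundingBox:
        # A list keeps every occurrence, so the same box passed twice is detected.
        bounding_boxes = [inpt for inpt in flat_inputs if isinstance(inpt, BoundingBox)]
        if not bounding_boxes:
            raise TypeError("No bounding box was found in the sample")
        elif len(bounding_boxes) > 1:
            raise TypeError("Only a single bounding box is supported")  # illustrative message
        return bounding_boxes[0]


    box = BoundingBox()
    query_bounding_box([box, "not a box"])  # returns box
    # query_bounding_box([box, box]) would raise, because the list keeps both
    # occurrences; a set comprehension would collapse {box, box} into a single
    # element and let the duplicate slip through the length check.

The perf benefit mentioned above is secondary: the list comprehension simply skips the hashing and deduplication work a set would do.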