refactor prototype.transforms.RandomCrop
#6640
Conversation
@pmeier Nice work! I agree that the behaviour you describe in the 3rd case looks more like a bug/omission in the original implementation than intentional behaviour. I have also never seen anyone in practice use the corner-case transform you describe in the 3rd case. Could you provide some measurements on the speed difference between the two implementations?

@fmassa / @vfdev-5 Do you have an opinion on whether the above is intended behaviour or just an omission/bug in the corner case of the previous implementation? Another approach we could take, if we think the previous behaviour was intentional, is to apply the speed optimization only when possible (for example when we use …)
There are only two cases where we could hit more than one pad:

1. `padding` is set and `pad_if_needed=True` still triggers, or
2. `pad_if_needed=True` and the input is smaller than the crop size in both dimensions.

Since 1. is quite unrealistic, I focused on 2.
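For illustration, the two configurations could look like the sketch below; the crop size, padding value, and input size are made up for this example, and it assumes the prototype `RandomCrop` keeps the stable constructor signature:

```python
from torchvision.prototype import transforms

# 1. fixed padding combined with pad_if_needed: the old code may pad once for
#    `padding` and again if the padded image is still smaller than the crop
case_1 = transforms.RandomCrop(480, padding=4, pad_if_needed=True)

# 2. pad_if_needed with an input smaller than the crop size in both dimensions:
#    the old code pads the width and the height in separate calls
case_2 = transforms.RandomCrop(480, pad_if_needed=True)  # e.g. applied to a 256x256 image
```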
Run this snippet on the `main` branch as well as on this PR's branch (adjusting `description` accordingly):

```python
import itertools
import pickle

import torch
from torch.utils import benchmark

from torchvision.prototype import transforms

description = "main"  # "main", "PR"

measurements = []
for input_size, output_size_factor in itertools.product(
    [64, 256, 1024],
    [1.0, 1.5, 2.0],
):
    image = torch.randint(0, 256, (3, input_size, input_size), dtype=torch.uint8)
    output_size = int(input_size * output_size_factor)
    transform = transforms.RandomCrop(output_size, pad_if_needed=True)

    timer = benchmark.Timer(
        stmt="transform(image)",
        globals=dict(
            transform=transform,
            image=image,
        ),
        label="RandomCrop with padding needed on both sides",
        sub_label=f"{input_size:4d} -> {output_size}",
        description=description,
    )
    measurements.append(timer.blocked_autorange(min_run_time=5))

with open(f"{description}.measurements", "wb") as fh:
    pickle.dump(measurements, fh)
```

Afterwards, run this snippet for the results:

```python
import pathlib
import pickle

from torch.utils import benchmark

measurements = []
for file in pathlib.Path(".").glob("*.measurements"):
    with open(file, "rb") as fh:
        measurements.extend(pickle.load(fh))

comparison = benchmark.Compare(measurements)
comparison.trim_significant_figures()
comparison.print()
```
In the tested scenario, this PR gives us roughly a 30%–50% perf boost. The real impact of this is hard to say. Only the segmentation and video classification references are using it (`references/segmentation/presets.py`, line 15 at commit `7046e56`).
In the case of the segmentation references, however, we discussed offline using it directly there, since it uses a more reasonable padding strategy. This gives us the opportunity to actually compute an estimated boost in a real setting:
```python
import torch
from torchvision import datasets

# https://github.com/pytorch/vision/blob/7046e56fe4370e94339b3e8b6fd011e285294a3a/references/segmentation/train.py#L34
base_size = 520
crop_size = 480

dataset = datasets.CocoDetection(
    "/home/philip/datasets/coco/train2017",
    "/home/philip/datasets/coco/annotations/instances_train2017.json",
)
print(f"Dataset has {len(dataset)} samples")

input_sizes = torch.tensor([(image.height, image.width) for image, _ in dataset])
input_smaller_size = torch.min(input_sizes, dim=-1, keepdim=True).values

torch.manual_seed(0)
random_resize_factor = (
    torch.distributions.Uniform(0.5, 2.0).sample(input_smaller_size.shape) * base_size / input_smaller_size
)
random_resized_input_sizes = (input_sizes * random_resize_factor).int()

needs_pad = random_resized_input_sizes < crop_size

needs_one_pad = needs_pad.any(dim=-1)
print(f"{int(torch.sum(needs_one_pad)) / len(dataset):.1%} of the images need one padding")

needs_two_pads = needs_pad.all(dim=-1)
print(f"{int(torch.sum(needs_two_pads)) / len(dataset):.1%} of the images need two paddings")

input_shapes_two_pads = random_resized_input_sizes[needs_two_pads.unsqueeze(1).repeat(1, 2)].reshape(-1, 2)
print(f"Of those the median shape is {tuple(torch.median(input_shapes_two_pads.float(), dim=0).values.int().tolist())}")
```
Re-running our benchmarks from above with inputs of that median shape, we are looking at a ~50% decrease in ~10% of the cases, or a total decrease of roughly 5% (0.5 × 0.10 = 0.05). In absolute terms, though, we are only saving about 2 seconds per epoch. To conclude: although this PR actually speeds up the transform, in a real-world scenario the patch makes no significant difference. It is a nice code cleanup though.
This was a bug in behavior, I believe, and it would only happen in very few cases, so I would be OK with breaking BC here.
lgtm, thanks @pmeier !
LGTM, thanks!
Summary:
* refactor RandomCrop
* mypy
* fix test
* use padding directly rather than private attribute
* only compute type specific fill if padding is needed
* [DRAFT] don't use the diff trick
* fix error message
* remove height and width diff
* reinstate separate diff checking
* introduce needs_crop flag

Reviewed By: datumbox

Differential Revision: D40138740

fbshipit-source-id: 2dac098db8270b2d377036db2eb34dd10c8df137

Co-authored-by: vfdev <[email protected]>
This PR eliminates two `F.pad` calls in `RandomCrop` by computing the full padding upfront. IMO, the implementation of `_get_params` is also clearer now, but you shouldn't trust that since I'm the author of the patch 😛

Since we are only touching the padding here, we can compare the old and new transform while ignoring the cropping altogether. Plus, since the only randomness happens in the crop coordinate selection, we don't need to account for random behavior either.
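To illustrate the general idea of "computing the full padding upfront", here is a rough sketch, not the actual `_get_params` code from this PR; the helper name is made up, the stable `torchvision.transforms.functional.pad` is used, and placing all extra padding on the right/bottom is an arbitrary choice for the example:

```python
import torch
from torchvision.transforms import functional as F

def pad_upfront(image, crop_size, padding=(0, 0, 0, 0), fill=0, padding_mode="constant"):
    # Hypothetical helper: merge the fixed `padding` with the extra padding
    # required by `pad_if_needed` and apply everything in a single F.pad call.
    left, top, right, bottom = padding
    height, width = image.shape[-2:]
    # size the image would have after the fixed padding
    padded_height = height + top + bottom
    padded_width = width + left + right
    # grow the padding until the crop fits (placement on right/bottom is arbitrary here)
    bottom += max(crop_size[0] - padded_height, 0)
    right += max(crop_size[1] - padded_width, 0)
    return F.pad(image, [left, top, right, bottom], fill=fill, padding_mode=padding_mode)

image = torch.zeros(3, 16, 16, dtype=torch.uint8)
padded = pad_upfront(image, crop_size=(32, 32), padding=(2, 2, 2, 2))
print(padded.shape)  # torch.Size([3, 32, 32])
```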
So everything works out perfectly? No 😞
Replacing multiple `F.pad` calls with a single one only works if the later calls don't depend on the earlier ones. In general, this only holds for `padding_mode="constant"` and `padding_mode="edge"`.

We will use this "1D" gradient image for simplification:
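The gradient image from the original post is not reproduced here; a stand-in with made-up values could look like this single-row tensor:

```python
import torch

# a "1D" gradient: one channel, one row, monotonically increasing values
gradient = torch.arange(8, dtype=torch.uint8).reshape(1, 1, 8)
print(gradient)
# tensor([[[0, 1, 2, 3, 4, 5, 6, 7]]], dtype=torch.uint8)
```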
Let's look at three examples for `padding_mode="reflect"`:

✔️ `padding` only

old:
new:

✔️ `pad_if_needed` only

old:
new:

✖️ `padding` and `pad_if_needed`

old:
new:
Since the old implementation pads twice, independently of each other, the second pad reflects the reflection created by the first pad. IMO, the new implementation is doing the right thing here and only reflects once.
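A small self-contained demonstration of this effect, using the stable `torchvision.transforms.functional.pad` on a made-up single-row gradient with made-up padding amounts:

```python
import torch
from torchvision.transforms import functional as F

# stand-in "1D" gradient image: one channel, one row
image = torch.arange(5, dtype=torch.uint8).reshape(1, 1, 5)

# old behavior: two independent reflect pads (e.g. fixed `padding` followed by `pad_if_needed`)
old = F.pad(F.pad(image, [1, 0], padding_mode="reflect"), [1, 0], padding_mode="reflect")

# new behavior: a single reflect pad with the combined amount
new = F.pad(image, [2, 0], padding_mode="reflect")

print(old)  # tensor([[[0, 1, 0, 1, 2, 3, 4, 3, 4]]]) -> the second pad reflects the first pad's reflection
print(new)  # tensor([[[2, 1, 0, 1, 2, 3, 4, 3, 2]]]) -> only the original image is reflected
```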
My guess is that this was just an oversight in the original stable implementation. Given that this is an obscure use case anyway, i.e. setting a fixed padding as well as using the dynamic one, and that it is not a problem at all for the dominant `padding_mode="constant"`, I would be OK with breaking BC here in favor of performance and a simplified implementation. One could also argue that this can be considered a bug fix, although I'm not keen on porting this change back.