refactor prototype.transforms.RandomCrop
#6640
Conversation
@pmeier Nice work! I agree that the behaviour you describe in the 3rd case looks more like a bug/omission in the original implementation than intentional behaviour. I have also never seen anyone in practice use the corner-case transform you describe in the 3rd case. Could you provide some measurements on the speed difference between the two implementations?

@fmassa / @vfdev-5 Do you have an opinion on whether the above is intended behaviour or just an omission/bug in the corner case of the previous implementation? Another approach we could take, if we think the previous behaviour was intentional, is to apply the speed optimization only when possible (for example when we use …)
There are only two cases where we could hit more than one pad:

1. `padding` is set and `pad_if_needed=True` still triggers, or
2. `pad_if_needed=True` and the input is smaller than the crop size in both dimensions.

Since 1. is quite unrealistic, I focused on 2.
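For illustration, the two configurations could look like the sketch below; the crop size, padding value, and input size are made up for this example, and it assumes the prototype `RandomCrop` keeps the stable constructor signature:

```python
from torchvision.prototype import transforms

# 1. fixed padding combined with pad_if_needed: the old code may pad once for
#    `padding` and again if the padded image is still smaller than the crop
case_1 = transforms.RandomCrop(480, padding=4, pad_if_needed=True)

# 2. pad_if_needed with an input smaller than the crop size in both dimensions:
#    the old code pads the width and the height in separate calls
case_2 = transforms.RandomCrop(480, pad_if_needed=True)  # e.g. applied to a 256x256 image
```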
Run this snippet on the `main` branch as well as on this PR's branch (adjusting `description` accordingly):

```python
import itertools
import pickle

import torch
from torch.utils import benchmark

from torchvision.prototype import transforms

description = "main"  # "main", "PR"

measurements = []
for input_size, output_size_factor in itertools.product(
    [64, 256, 1024],
    [1.0, 1.5, 2.0],
):
    image = torch.randint(0, 256, (3, input_size, input_size), dtype=torch.uint8)
    output_size = int(input_size * output_size_factor)
    transform = transforms.RandomCrop(output_size, pad_if_needed=True)

    timer = benchmark.Timer(
        stmt="transform(image)",
        globals=dict(
            transform=transform,
            image=image,
        ),
        label="RandomCrop with padding needed on both sides",
        sub_label=f"{input_size:4d} -> {output_size}",
        description=description,
    )
    measurements.append(timer.blocked_autorange(min_run_time=5))

with open(f"{description}.measurements", "wb") as fh:
    pickle.dump(measurements, fh)
```

Afterwards, run this snippet for the results:

```python
import pathlib
import pickle

from torch.utils import benchmark

measurements = []
for file in pathlib.Path(".").glob("*.measurements"):
    with open(file, "rb") as fh:
        measurements.extend(pickle.load(fh))

comparison = benchmark.Compare(measurements)
comparison.trim_significant_figures()
comparison.print()
```
In the tested scenario, this PR gives us roughly a 30%–50% perf boost. The real impact of this is hard to say. Only the segmentation and video classification references are using it (`references/segmentation/presets.py`, line 15 at commit `7046e56`).
In the case of the segmentation references, however, we discussed offline using it directly there, since it uses a more reasonable padding strategy. This gives us the opportunity to actually compute an estimated boost in a real setting:
```python
import torch
from torchvision import datasets

# https://github.com/pytorch/vision/blob/7046e56fe4370e94339b3e8b6fd011e285294a3a/references/segmentation/train.py#L34
base_size = 520
crop_size = 480

dataset = datasets.CocoDetection(
    "/home/philip/datasets/coco/train2017",
    "/home/philip/datasets/coco/annotations/instances_train2017.json",
)
print(f"Dataset has {len(dataset)} samples")

input_sizes = torch.tensor([(image.height, image.width) for image, _ in dataset])
input_smaller_size = torch.min(input_sizes, dim=-1, keepdim=True).values

torch.manual_seed(0)
random_resize_factor = (
    torch.distributions.Uniform(0.5, 2.0).sample(input_smaller_size.shape) * base_size / input_smaller_size
)
random_resized_input_sizes = (input_sizes * random_resize_factor).int()

needs_pad = random_resized_input_sizes < crop_size

needs_one_pad = needs_pad.any(dim=-1)
print(f"{int(torch.sum(needs_one_pad)) / len(dataset):.1%} of the images need one padding")

needs_two_pads = needs_pad.all(dim=-1)
print(f"{int(torch.sum(needs_two_pads)) / len(dataset):.1%} of the images need two paddings")

input_shapes_two_pads = random_resized_input_sizes[needs_two_pads.unsqueeze(1).repeat(1, 2)].reshape(-1, 2)
print(f"Of those the median shape is {tuple(torch.median(input_shapes_two_pads.float(), dim=0).values.int().tolist())}")
```
Re-running our benchmarks from above with inputs of that median shape, we are looking at a ~50% decrease in ~10% of the cases, or a total decrease of roughly 5% (0.5 × 0.10 = 0.05). In absolute terms, though, we are only saving about 2 seconds per epoch. To conclude: although this PR actually speeds up the transform, in a real-world scenario the patch makes no significant difference. It is a nice code cleanup though.
This was a bug in behavior, I believe, and it would only happen in very few cases, so I would be OK with breaking BC here.
lgtm, thanks @pmeier !
LGTM, thanks!
Summary:
* refactor RandomCrop
* mypy
* fix test
* use padding directly rather than private attribute
* only compute type specific fill if padding is needed
* [DRAFT] don't use the diff trick
* fix error message
* remove height and width diff
* reinstate separate diff checking
* introduce needs_crop flag

Reviewed By: datumbox

Differential Revision: D40138740

fbshipit-source-id: 2dac098db8270b2d377036db2eb34dd10c8df137

Co-authored-by: vfdev <[email protected]>
This PR eliminates two `F.pad` calls in `RandomCrop` by computing the full padding upfront. IMO, the implementation of `_get_params` is also clearer now, but you shouldn't trust that since I'm the author of the patch 😛

Since we are only touching the padding here, we can compare the old and new transform while ignoring the cropping altogether. Plus, since the only randomness happens in the crop coordinate selection, we don't need to account for random behavior either.
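To illustrate the general idea of "computing the full padding upfront", here is a rough sketch, not the actual `_get_params` code from this PR; the helper name is made up, the stable `torchvision.transforms.functional.pad` is used, and placing all extra padding on the right/bottom is an arbitrary choice for the example:

```python
import torch
from torchvision.transforms import functional as F

def pad_upfront(image, crop_size, padding=(0, 0, 0, 0), fill=0, padding_mode="constant"):
    # Hypothetical helper: merge the fixed `padding` with the extra padding
    # required by `pad_if_needed` and apply everything in a single F.pad call.
    left, top, right, bottom = padding
    height, width = image.shape[-2:]
    # size the image would have after the fixed padding
    padded_height = height + top + bottom
    padded_width = width + left + right
    # grow the padding until the crop fits (placement on right/bottom is arbitrary here)
    bottom += max(crop_size[0] - padded_height, 0)
    right += max(crop_size[1] - padded_width, 0)
    return F.pad(image, [left, top, right, bottom], fill=fill, padding_mode=padding_mode)

image = torch.zeros(3, 16, 16, dtype=torch.uint8)
padded = pad_upfront(image, crop_size=(32, 32), padding=(2, 2, 2, 2))
print(padded.shape)  # torch.Size([3, 32, 32])
```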
So everything works out perfectly? No 😞
Replacing multiple `F.pad` calls with a single one only works if the later calls don't depend on the earlier ones. In general, this only holds for `padding_mode="constant"` and `padding_mode="edge"`.

We will use this "1D" gradient image for simplification:
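The gradient image from the original post is not reproduced here; a stand-in with made-up values could look like this single-row tensor:

```python
import torch

# a "1D" gradient: one channel, one row, monotonically increasing values
gradient = torch.arange(8, dtype=torch.uint8).reshape(1, 1, 8)
print(gradient)
# tensor([[[0, 1, 2, 3, 4, 5, 6, 7]]], dtype=torch.uint8)
```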
Let's look at three examples for `padding_mode="reflect"`:

✔️ `padding` only

old:
new:

✔️ `pad_if_needed` only

old:
new:

✖️ `padding` and `pad_if_needed`

old:
new:
Since the old implementation pads twice, independently of each other, the second pad reflects the reflection created by the first pad. IMO, the new implementation is doing the right thing here and only reflects once.
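A small self-contained demonstration of this effect, using the stable `torchvision.transforms.functional.pad` on a made-up single-row gradient with made-up padding amounts:

```python
import torch
from torchvision.transforms import functional as F

# stand-in "1D" gradient image: one channel, one row
image = torch.arange(5, dtype=torch.uint8).reshape(1, 1, 5)

# old behavior: two independent reflect pads (e.g. fixed `padding` followed by `pad_if_needed`)
old = F.pad(F.pad(image, [1, 0], padding_mode="reflect"), [1, 0], padding_mode="reflect")

# new behavior: a single reflect pad with the combined amount
new = F.pad(image, [2, 0], padding_mode="reflect")

print(old)  # tensor([[[0, 1, 0, 1, 2, 3, 4, 3, 4]]]) -> the second pad reflects the first pad's reflection
print(new)  # tensor([[[2, 1, 0, 1, 2, 3, 4, 3, 2]]]) -> only the original image is reflected
```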
My guess is that this was just an oversight in the original stable implementation. Given that this is an obscure use case anyway, i.e. setting a fixed padding as well as using the dynamic one, and that it is not a problem at all for the dominant `padding_mode="constant"`, I would be OK with breaking BC here in favor of performance and a simplified implementation. One could also argue that this can be considered a bug fix, although I'm not keen on porting this change back.