
refactor prototype.transforms.RandomCrop #6640


Merged: 15 commits into pytorch:main on Oct 3, 2022

Conversation

@pmeier (Collaborator) commented on Sep 23, 2022

This PR eliminates two F.pad calls in RandomCrop by computing the full padding upfront. IMO, the implementation of _get_params is also clearer now, but you shouldn't trust that since I'm the author of the patch 😛
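
To illustrate the idea, here is a minimal sketch of how the fixed and the dynamic padding can be merged so that a single F.pad call suffices. The helper name and structure are mine for illustration, not the actual code of this PR:

def _compute_full_padding(image_size, crop_size, padding, pad_if_needed):
    # Hypothetical helper, not the actual implementation of this PR.
    height, width = image_size
    crop_height, crop_width = crop_size

    # Start from the user-supplied fixed padding, assuming it has already
    # been normalized to [left, top, right, bottom].
    left, top, right, bottom = padding if padding is not None else (0, 0, 0, 0)

    if pad_if_needed:
        # Take the fixed padding into account and pad both sides by the
        # remaining shortfall, mirroring the stable transform's behavior.
        shortfall_height = crop_height - (height + top + bottom)
        if shortfall_height > 0:
            top += shortfall_height
            bottom += shortfall_height

        shortfall_width = crop_width - (width + left + right)
        if shortfall_width > 0:
            left += shortfall_width
            right += shortfall_width

    # The result feeds a single F.pad call instead of two.
    return [left, top, right, bottom]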

Since we are only touching the padding here, we can compare the old and new transform while ignoring the cropping altogether. Plus, since the only randomness happens during the crop coordinate selection, we don't need to control the random state either.

import itertools
import unittest.mock

import torch

from torchvision.prototype import transforms


input_size = (32, 17)

for size, padding, pad_if_needed in itertools.product(
    [64, 16],
    [None, 5],
    [True, False],
):
    if any(size > dim for dim in input_size) and not pad_if_needed:
        continue

    # OldRandomCropWithoutCrop is assumed to be defined above: a copy of the
    # stable transforms.RandomCrop with the final crop removed, so that only
    # the padding behavior is compared.
    old_transform = OldRandomCropWithoutCrop(size, padding=padding, pad_if_needed=pad_if_needed)
    new_transform = transforms.RandomCrop(size, padding=padding, pad_if_needed=pad_if_needed)

    image = torch.rand(3, *input_size)

    old_output = old_transform(image)

    with unittest.mock.patch("torchvision.prototype.transforms._geometry.F.crop", new=lambda x, *_, **__: x):
        new_output = new_transform(image)

    torch.testing.assert_close(new_output, old_output, rtol=0, atol=0)

So everything works out perfectly? No 😞

Replacing multiple F.pad calls with a single one only works if the later ones don't depend on the earlier ones. In general, this only holds for padding_mode="constant" and padding_mode="edge".

We will use this "1D" gradient image for simplification:

[image: input]

Let's look at three examples for padding_mode="reflect":

✔️ padding only

old: [image: old_padding]

new: [image: new_padding]

✔️ pad_if_needed only

old: [image: old_pad_if_needed]

new: [image: new_pad_if_needed]

✖️ padding and pad_if_needed

old: [image: old_padding_pad_if_needed]

new: [image: new_padding_pad_if_needed]

Since the old implementation applies the two pads independently of each other, the second pad reflects the reflection created by the first one. IMO, the new implementation is doing the right thing here and only reflecting once.
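
To see this concretely, here is a small sketch reproducing the difference with plain torch.nn.functional.pad (the numbers are mine, not from the PR):

import torch
import torch.nn.functional as F_torch

# "1D" gradient of width 5; 1D reflection padding requires a (N, C, W) tensor.
x = torch.arange(5.0).reshape(1, 1, 5)

# Old behavior: two independent reflect pads of 2 each.
twice = F_torch.pad(F_torch.pad(x, (2, 2), mode="reflect"), (2, 2), mode="reflect")

# New behavior: a single reflect pad of 4.
once = F_torch.pad(x, (4, 4), mode="reflect")

print(twice.flatten().tolist())  # [0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 3.0, 4.0]
print(once.flatten().tolist())   # [4.0, 3.0, 2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0, 0.0]

With constant padding or edge padding (mode="replicate" in torch.nn.functional.pad) both orders produce identical results, which is why the single-pad rewrite is safe there.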

My guess is that this was just an oversight in the original stable implementation. Given that this is an obscure use case anyway, i.e. setting a fixed padding as well as using the dynamic one, and is not a problem at all for the most dominant padding_mode="constant", I would be ok with breaking BC here in favor of performance and a simplified implementation. One could also argue that this can be considered a bug fix, although I'm not keen on backporting this change.

@datumbox (Contributor) commented:

@pmeier Nice work! I agree that the behaviour you describe in the 3rd case looks more like a bug/omission in the original implementation than intentional behaviour. I have also never seen anyone in practice use the corner-case configuration you describe there. Could you provide some measurements of the speed difference between the 2 implementations?

@fmassa / @vfdev-5 Do you have an opinion on whether the above is intended behaviour or just an omission/bug in the corner case of the previous implementation?

Another approach we could take, if we think the previous behaviour was intentional, is to apply the speed optimization only when possible (for example with padding_mode="constant"), which probably accounts for 99.999% of real-world use-cases. This would have implications on code quality, though: we would end up with a very complex and difficult-to-maintain implementation. To avoid this we could split the Transform into 2 private Transform classes and internally use the optimal one based on the constructor configuration (that's not an idiom we use currently, but we can discuss the details).
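
For concreteness, a rough sketch of the dispatch idiom described above. All class names here are hypothetical; none of this is torchvision code:

class _RandomCropImpl:
    # Hypothetical base class; the actual pad-and-crop logic is elided.
    def __init__(self, size, padding=None, pad_if_needed=False, padding_mode="constant"):
        self.size = size
        self.padding = padding
        self.pad_if_needed = pad_if_needed
        self.padding_mode = padding_mode

    def __call__(self, image):
        raise NotImplementedError  # pad-and-crop logic elided in this sketch


class _RandomCropSinglePad(_RandomCropImpl):
    """Fast path: fixed and dynamic padding merged into one F.pad call."""


class _RandomCropDoublePad(_RandomCropImpl):
    """BC path: keeps the old two-pad behavior, e.g. for padding_mode="reflect"."""


class RandomCrop:
    def __init__(self, size, padding=None, pad_if_needed=False, padding_mode="constant"):
        # Select the optimal private implementation based on the constructor
        # configuration and delegate everything to it. A single pad is safe if
        # the mode composes (constant / edge) or at most one pad can happen.
        safe = padding_mode in ("constant", "edge") or not (padding and pad_if_needed)
        impl_cls = _RandomCropSinglePad if safe else _RandomCropDoublePad
        self._impl = impl_cls(size, padding=padding, pad_if_needed=pad_if_needed, padding_mode=padding_mode)

    def __call__(self, image):
        return self._impl(image)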

@datumbox added the Perf (for performance improvements) label on Sep 24, 2022
@pmeier (Collaborator, Author) commented on Sep 24, 2022

> Could you provide some measurements of the speed difference between the 2 implementations?

There are only two cases where we could hit more than one pad:

  1. Setting padding and pad_if_needed at the same time and having an input that triggers pad_if_needed in at least one dimension.
  2. Setting only pad_if_needed and having an input that triggers it in both dimensions.

Since 1. is quite unrealistic, I focused on 2. I also used padding_mode="constant", because that should account for most real-world usage.

Code

Run this snippet on the main branch as well as on the branch of this PR, changing only the description:

import pickle
import itertools

import torch
from torch.utils import benchmark

from torchvision.prototype import transforms

description = "main"  # "main", "PR"

measurements = []
for input_size, output_size_factor in itertools.product(
    [64, 256, 1024],
    [1.0, 1.5, 2.0],
):
    image = torch.randint(0, 256, (3, input_size, input_size), dtype=torch.uint8)

    output_size = int(input_size * output_size_factor)
    transform = transforms.RandomCrop(output_size, pad_if_needed=True)

    timer = benchmark.Timer(
        stmt="transform(image)",
        globals=dict(
            transform=transform,
            image=image,
        ),
        label="RandomCrop with padding needed on both sides",
        sub_label=f"{input_size:4d} -> {output_size}",
        description=description,
    )

    measurements.append(timer.blocked_autorange(min_run_time=5))

with open(f"{description}.measurements", "wb") as fh:
    pickle.dump(measurements, fh)

Afterwards run this snippet for the results:

import pathlib
import pickle

from torch.utils import benchmark

measurements = []
for file in pathlib.Path(".").glob("*.measurements"):
    with open(file, "rb") as fh:
        measurements.extend(pickle.load(fh))

comparison = benchmark.Compare(measurements)
comparison.trim_significant_figures()
comparison.print()

[ RandomCrop with padding needed on both sides ]
                    |   main  |   PR 
1 threads: --------------------------
        64 -> 64    |      9  |     9
        64 -> 96    |     66  |    49
        64 -> 128   |     88  |    65
       256 -> 256   |     10  |     9
       256 -> 384   |    389  |   256
       256 -> 512   |    716  |   523
      1024 -> 1024  |      9  |     9
      1024 -> 1536  |   5390  |  3600
      1024 -> 2048  |  10500  |  7860

Times are in microseconds (us).

In the tested scenario, this PR gives us roughly a 30%-50% perf boost. The real impact of this is harder to judge, since only the segmentation and video classification references use the transform:

T.RandomCrop(crop_size),

trans.extend([transforms.Normalize(mean=mean, std=std), transforms.RandomCrop(crop_size), ConvertBCHWtoCBHW()])

In case of the segmentation references, the RandomCrop used there actually differs from the one in our prototype namespace in the padding performed before the cropping. See #6433 (comment) for details.

However, we discussed offline that we want to use the prototype transform directly there, since it uses a more reasonable padding strategy. This gives us the opportunity to estimate the boost in a realistic setting.

Code

import torch
from torchvision import datasets

# https://github.com/pytorch/vision/blob/7046e56fe4370e94339b3e8b6fd011e285294a3a/references/segmentation/train.py#L34
base_size = 520
crop_size = 480

dataset = datasets.CocoDetection(
    "/home/philip/datasets/coco/train2017",
    "/home/philip/datasets/coco/annotations/instances_train2017.json",
)
print(f"Dataset has {len(dataset)} samples")

input_sizes = torch.tensor([(image.height, image.width) for image, _ in dataset])

input_smaller_size = torch.min(input_sizes, dim=-1, keepdim=True).values
torch.manual_seed(0)
random_resize_factor = (
    torch.distributions.Uniform(0.5, 2.0).sample(input_smaller_size.shape) * base_size / input_smaller_size
)

random_resized_input_sizes = (input_sizes * random_resize_factor).int()

needs_pad = random_resized_input_sizes < crop_size

needs_one_pad = needs_pad.any(dim=-1)
print(f"{int(torch.sum(needs_one_pad)) / len(dataset):.1%} of the images need one padding")

needs_two_pads = needs_pad.all(dim=-1)
print(f"{int(torch.sum(needs_two_pads)) / len(dataset):.1%} of the images need two paddings")

input_shapes_two_pads = random_resized_input_sizes[needs_two_pads.unsqueeze(1).repeat(1, 2)].reshape(-1, 2)
print(f"Of those the median shape is {tuple(torch.median(input_shapes_two_pads.float(), dim=0).values.int().tolist())}")

Dataset has 118287 samples
28.1% of the images need one padding
10.9% of the images need two paddings
Of those the median shape is (318, 399)

Re-running our benchmarks from above with inputs of shape (318, 399) and size=480 yields

[ RandomCrop with padding needed on both sides ]
                                |   main  |   PR
1 threads: -------------------------------------
      (318, 399) -> (480, 480)  |  500.2  |  336

Times are in microseconds (us).
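
For reference, the benchmark loop from above only needs its input and output sizes swapped for this measurement; a sketch:

import torch
from torch.utils import benchmark

from torchvision.prototype import transforms

# Median COCO shape that needs two pads, with the reference crop size.
image = torch.randint(0, 256, (3, 318, 399), dtype=torch.uint8)
transform = transforms.RandomCrop(480, pad_if_needed=True)

timer = benchmark.Timer(
    stmt="transform(image)",
    globals=dict(transform=transform, image=image),
    label="RandomCrop with padding needed on both sides",
    sub_label="(318, 399) -> (480, 480)",
    description="main",  # or "PR"
)
print(timer.blocked_autorange(min_run_time=5))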

So we are looking at a ~1.5x speedup, i.e. a ~33% time decrease, in ~11% of the cases, or roughly a 4% decrease of the padding time overall. In absolute terms: 10.9% of the 118287 images is ~12.9k images, each saving ~164 us, so we only save about 2 seconds per epoch.

To conclude: although this PR does speed up the transform, the patch makes no significant difference in a real-world scenario. It is a nice code cleanup though.

@vfdev-5 changed the title from refactor prototype.transforms,RandomCrop to refactor prototype.transforms.RandomCrop on Sep 26, 2022
@fmassa (Member) commented on Sep 26, 2022

@datumbox

> Do you have an opinion on whether the above is intended behaviour or just an omission/bug in the corner case of the previous implementation?

I believe this was a bug in behavior, and it would only surface in very few cases, so I would be ok with breaking BC here. In most cases where users apply T.Resize beforehand, they don't need this option anyway, as discussed in #462.

@pmeier requested a review from vfdev-5 on September 28, 2022 07:48
@vfdev-5 (Collaborator) left a review: lgtm, thanks @pmeier!

@datumbox (Contributor) left a review: LGTM, thanks!

@pmeier pmeier merged commit 1d7d92c into pytorch:main Oct 3, 2022
@pmeier pmeier deleted the random-crop-padding branch October 3, 2022 08:24
facebook-github-bot pushed a commit that referenced this pull request Oct 7, 2022
Summary:
* refactor RandomCrop

* mypy

* fix test

* use padding directly rather than private attribute

* only compute type specific fill if padding is needed

* [DRAFT] don't use the diff trick

* fix error message

* remove height and width diff

* reinstate separate diff checking

* introduce needs_crop flag

Reviewed By: datumbox

Differential Revision: D40138740

fbshipit-source-id: 2dac098db8270b2d377036db2eb34dd10c8df137

Co-authored-by: vfdev <[email protected]>