
try unflake CI #7627


Open · wants to merge 2 commits into base: main
18 changes: 14 additions & 4 deletions test/test_transforms_v2_consistency.py
@@ -354,24 +354,34 @@ def __init__(
v2_transforms.ElasticTransform,
legacy_transforms.ElasticTransform,
[
ArgsKwargs(),
ArgsKwargs(alpha=20.0),
ArgsKwargs(alpha=(15.3, 27.2)),
ArgsKwargs(sigma=3.0),
ArgsKwargs(sigma=(2.5, 3.9)),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.NEAREST),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
ArgsKwargs(interpolation=PIL.Image.NEAREST),
ArgsKwargs(interpolation=PIL.Image.BICUBIC),
ArgsKwargs(fill=1),
*extra_args_kwargs,
],
# ElasticTransform needs larger images to avoid the needed internal padding being larger than the actual image
make_images_kwargs=dict(DEFAULT_MAKE_IMAGES_KWARGS, sizes=[(163, 163), (72, 333), (313, 95)], dtypes=[dt]),
# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance
closeness_kwargs=ckw,
)
for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]
for dt, ckw, extra_args_kwargs in [
(torch.uint8, {"rtol": 1e-1, "atol": 1}, []),
Member

why set rtol to 1e-1 and not 0?

Collaborator Author

Just copied it over from what we had. Came from #6762 that @vfdev-5 authored. Maybe he remembers? Agree that 0 would be better here.

Collaborator @vfdev-5 (May 25, 2023)

I think the reason is explained in the above comment:

# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance

and also in the PR:

Gaussian blur eager vs jit tests are flaky for uint8 input. This may be due to the fact that we cast to float32, perform the operation in float32, and cast back to uint8. We generate conv kernels (created using exp op) in float32 and maybe accumulating precision errors.
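For illustration only (this snippet is not from the PR, and the two formulas are just one plausible way to write the kernel), here is a sketch of how algebraically equivalent float32 kernel computations using exp can differ in their last bits, which after rounding back to uint8 can show up as an off-by-one pixel:

```python
# Hedged sketch, not the torchvision implementation: two algebraically
# equivalent ways to build a 1-D Gaussian kernel with exp in float32.
import torch

sigma, ksize = 3.0, 7
x = torch.arange(ksize, dtype=torch.float32) - ksize // 2

k1 = torch.exp(-0.5 * (x / sigma) ** 2)
k1 = k1 / k1.sum()

k2 = torch.exp(-(x**2) / (2.0 * sigma**2))
k2 = k2 / k2.sum()

# Any difference is at the level of float32 rounding error, but a blurred value
# sitting near a .5 boundary (e.g. ~127.5) can then round to 127 with one kernel
# and 128 with the other, i.e. an off-by-one uint8 pixel.
print((k1 - k2).abs().max())
```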

Collaborator Author

Meaning, we are expecting differences > 1?
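As a side note that may help with the question above (a minimal sketch, not part of the PR): torch.testing.assert_close accepts a mismatch when |actual - expected| <= atol + rtol * |expected|, so with atol=1 and rtol=1e-1 a bright uint8 pixel may legally differ by much more than 1:

```python
# Minimal sketch of the tolerance rule |actual - expected| <= atol + rtol * |expected|.
import torch

expected = torch.tensor([200], dtype=torch.uint8)
actual = torch.tensor([221], dtype=torch.uint8)  # off by 21 = 1 + 0.1 * 200

torch.testing.assert_close(actual, expected, atol=1, rtol=1e-1)  # passes
# With rtol=0 the same comparison would fail, since 21 > atol=1:
# torch.testing.assert_close(actual, expected, atol=1, rtol=0)
```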

(
torch.float32,
{"rtol": 1e-2, "atol": 1e-3},
[
# These proved to be flaky on uint8 inputs so we only run them on float32
ArgsKwargs(),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
ArgsKwargs(interpolation=PIL.Image.BICUBIC),
Comment on lines +379 to +381
Collaborator Author

It seems that with default alpha and sigma, bilinear (default) and bicubic interpolation are flaky for uint8. For example

The errors are likely related to some uint8 issue since we have a difference of 255:

Tensor-likes are not close!

Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)

This is not an issue with the testing function though:

>>> zeros = torch.zeros(3, dtype=torch.uint8)
>>> ones = torch.ones_like(zeros)
>>> ones - zeros
tensor([1, 1, 1], dtype=torch.uint8)
>>> torch.testing.assert_close(ones, zeros, atol=1, rtol=0)
>>> zeros - ones
tensor([255, 255, 255], dtype=torch.uint8)
>>> torch.testing.assert_close(zeros, ones, atol=1, rtol=0)

Member

It's hard to decipher what's going on just looking at the logs.


test.test_transforms_v2_consistency.test_call_consistency[ElasticTransform-080]

Traceback (most recent call last):
  File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 672, in test_call_consistency
    check_call_consistency(
  File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 587, in check_call_consistency
    assert_close(
  File "/Users/runner/work/vision/vision/pytorch/vision/test/common_utils.py", line 347, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor image consistency check failed with: 

Tensor-likes are not close!

Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)

Is that with bicubic or bilinear? float or ints?

> bilinear (default) and bicubic interpolation are flaky for uint8

WDYM flaky?

For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning.

Collaborator Author

> Is that with bicubic or bilinear? float or ints?

Yes, it is hard, and no, I'm not happy with it. Please remember that this was never meant to stay beyond the initial release.

Here is the explanation:

The 080 in ElasticTransform-080 gives us these details:

  • 08 (first two digits): this is the parametrization with index 8, i.e. the ninth one. Counting through the list gives us bicubic:
    ArgsKwargs(),
    ArgsKwargs(alpha=20.0),
    ArgsKwargs(alpha=(15.3, 27.2)),
    ArgsKwargs(sigma=3.0),
    ArgsKwargs(sigma=(2.5, 3.9)),
    ArgsKwargs(interpolation=v2_transforms.InterpolationMode.NEAREST),
    ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
    ArgsKwargs(interpolation=PIL.Image.NEAREST),
    ArgsKwargs(interpolation=PIL.Image.BICUBIC),

    Similarly, 06 is also bicubic. 00 is bilinear, since that is the default.
  • 0 (last digit): this is the index pytest automatically assigns to avoid duplicate test names. It needs to do that since we have a hack in place to instantiate the uint8 and float32 tests separately but with the same ids (see the sketch below this list):
    for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]

    Thus, index 0 corresponds to uint8.
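To illustrate the duplicate-id suffix mentioned in the second bullet (a standalone sketch with a made-up test, not code from this repo):

```python
# When two parametrizations produce the same id, pytest disambiguates them by
# appending an index, which is where the trailing digit in "ElasticTransform-080"
# comes from.
import pytest


@pytest.mark.parametrize("dtype", ["uint8", "float32"], ids=["ElasticTransform", "ElasticTransform"])
def test_demo(dtype):
    assert dtype in ("uint8", "float32")


# Collected ids: test_demo[ElasticTransform0] and test_demo[ElasticTransform1]
```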

> WDYM flaky?

I probably don't understand. Not a snarky answer: the error pops up from time to time, but not consistently. Could you rephrase?

> For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning.

They are not new and have been popping up since at least late last year. This is why we have a comment like

# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance

Since we are talking about a single pixel, I don't think that is much of a concern.

Member

Basically I find it concerning that v1 differs significantly from v2 on certain interpolation modes. Those modes are controlled by torch core, not by a v1 vs v2 implementation difference (except for bilinear mode, which we changed recently, but according to your comment those problems pre-date that change).

],
),
]
],
ConsistencyConfig(
v2_transforms.GaussianBlur,
5 changes: 5 additions & 0 deletions test/transforms_v2_kernel_infos.py
@@ -635,6 +635,9 @@ def sample_inputs_affine_video():
sample_inputs_fn=sample_inputs_affine_bounding_box,
reference_fn=reference_affine_bounding_box,
reference_inputs_fn=reference_inputs_affine_bounding_box,
closeness_kwargs={
(("TestKernels", "test_against_reference"), torch.int64, "cpu"): {"atol": 1, "rtol": 0},
},
test_marks=[
xfail_jit_python_scalar_arg("shear"),
],
@@ -1946,13 +1949,15 @@ def sample_inputs_adjust_contrast_video():
**float32_vs_uint8_pixel_difference(2),
**cuda_vs_cpu_pixel_difference(),
(("TestKernels", "test_against_reference"), torch.uint8, "cpu"): pixel_difference_closeness_kwargs(1),
(("TestKernels", "test_scripted_vs_eager"), torch.uint8, "cuda"): pixel_difference_closeness_kwargs(1),
},
),
KernelInfo(
F.adjust_contrast_video,
sample_inputs_fn=sample_inputs_adjust_contrast_video,
closeness_kwargs={
**cuda_vs_cpu_pixel_difference(),
(("TestKernels", "test_scripted_vs_eager"), torch.uint8, "cuda"): pixel_difference_closeness_kwargs(1),
(("TestKernels", "test_against_reference"), torch.uint8, "cpu"): pixel_difference_closeness_kwargs(1),
},
),