try unflake CI #7627


Open: pmeier wants to merge 2 commits into main
Conversation

@pmeier (Collaborator) commented May 25, 2023

I went through the commits to main over the last few days and upped the tolerances on flaky tests.

cc @vfdev-5 @seemethere

@pytorch-bot (bot) commented May 25, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7627

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit f9c9f6e:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base 4125d3a:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Comment on lines +379 to +381
ArgsKwargs(),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
ArgsKwargs(interpolation=PIL.Image.BICUBIC),
pmeier (Collaborator, Author)

It seems that with the default alpha and sigma, bilinear (default) and bicubic interpolation are flaky for uint8. The errors are likely related to some uint8 issue, since we see a difference of 255, for example:

Tensor-likes are not close!

Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)

This is not an issue with the testing function though:

>>> zeros = torch.zeros(3, dtype=torch.uint8)
>>> ones = torch.ones_like(zeros)
>>> ones - zeros
tensor([1, 1, 1], dtype=torch.uint8)
>>> torch.testing.assert_close(ones, zeros, atol=1, rtol=0)
>>> zeros - ones
tensor([255, 255, 255], dtype=torch.uint8)
>>> torch.testing.assert_close(zeros, ones, atol=1, rtol=0)
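To double-check that the 255 above is a genuine pixel mismatch rather than a wraparound artifact of the comparison, we can upcast before subtracting. A minimal sketch with made-up pixel values (neither value is taken from the failing test):

import torch

# Hypothetical pixel values at the mismatching index; upcasting to a signed dtype
# before subtracting rules out uint8 wraparound in the check itself.
v1_pixel = torch.tensor(255, dtype=torch.uint8)  # assumed value from the v1 output
v2_pixel = torch.tensor(0, dtype=torch.uint8)    # assumed value from the v2 output

true_diff = (v1_pixel.to(torch.int16) - v2_pixel.to(torch.int16)).abs()
print(int(true_diff))  # 255 -> the pixel really is 0 in one output and 255 in the other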

Member

It's hard to decipher what's going on just looking at the logs.


test.test_transforms_v2_consistency.test_call_consistency[ElasticTransform-080]

Traceback (most recent call last):
  File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 672, in test_call_consistency
    check_call_consistency(
  File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 587, in check_call_consistency
    assert_close(
  File "/Users/runner/work/vision/vision/pytorch/vision/test/common_utils.py", line 347, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor image consistency check failed with: 

Tensor-likes are not close!

Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)

Is that with bicubic or bilinear? float or ints?

bilinear (default) and bicubic interpolation are flaky for uint8

WDYM flaky?

For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning.

pmeier (Collaborator, Author)

Is that with bicubic or bilinear? float or ints?

Yes, it is hard, and no, I'm not happy with it. Please remember that this was never meant to stay longer than the initial release.

Here is the explanation:

The 080 from ElasticTransform-080 gives us these details:

  • 08 (first two digits): this is the parametrization with index 8, i.e. the ninth one. Counting through the list gives us bicubic:
    ArgsKwargs(),
    ArgsKwargs(alpha=20.0),
    ArgsKwargs(alpha=(15.3, 27.2)),
    ArgsKwargs(sigma=3.0),
    ArgsKwargs(sigma=(2.5, 3.9)),
    ArgsKwargs(interpolation=v2_transforms.InterpolationMode.NEAREST),
    ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
    ArgsKwargs(interpolation=PIL.Image.NEAREST),
    ArgsKwargs(interpolation=PIL.Image.BICUBIC),

    Similarly, 06 is also bicubic. 00 is bilinear, since that is the default.
  • 0 (last digit) is the index pytest automatically assigns to avoid duplicate test names. It needs to do that since we have a hack in place to instantiate the uint8 and float32 tests separately, but with the same ids:
    for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]

    Thus, index 0 corresponds to uint8 (see the small decoding sketch below the list).
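For illustration, here is a hypothetical helper (not part of the test suite) that decodes an id according to the scheme described above:

# Hypothetical helper, assuming the id layout
# "<Transform>-<two-digit parametrization index><pytest deduplication index>".
def decode_consistency_test_id(test_id):
    name, digits = test_id.rsplit("-", 1)
    param_idx = int(digits[:-1])  # "08" -> ninth ArgsKwargs entry, i.e. PIL.Image.BICUBIC
    dedup_idx = int(digits[-1])   # "0"  -> first duplicate id, i.e. the uint8 variant
    return name, param_idx, dedup_idx

print(decode_consistency_test_id("ElasticTransform-080"))  # ('ElasticTransform', 8, 0)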

WDYM flaky?

I probably don't understand. Non-snarky answer: the error pops up from time to time, but not consistently. Could you rephrase?

For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning.

They are not new and have been popping up since at least late last year. This is why we have a comment like

# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance

Since we are talking about a single pixel, I don't think that is much of a concern.

Member

Basically, I find it concerning that v1 differs significantly from v2 on certain interpolation modes. Those modes are controlled by torch core, not by a v1 vs v2 implementation difference (except for bilinear mode, which we changed recently, but according to your comment those problems predate that change).

@pmeier pmeier marked this pull request as ready for review May 25, 2023 08:54
],
# ElasticTransform needs larger images to avoid the needed internal padding being larger than the actual image
make_images_kwargs=dict(DEFAULT_MAKE_IMAGES_KWARGS, sizes=[(163, 163), (72, 333), (313, 95)], dtypes=[dt]),
# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance
closeness_kwargs=ckw,
)
- for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]
+ for dt, ckw, extra_args_kwargs in [
+     (torch.uint8, {"rtol": 1e-1, "atol": 1}, []),
Member

why set rtol to 1e-1 and not 0?

pmeier (Collaborator, Author)

I just copied it over from what we had. It came from #6762, which @vfdev-5 authored. Maybe he remembers? I agree that 0 would be better here.

@vfdev-5 (Collaborator) commented May 25, 2023

I think the reason is explained in the above comment:

# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance

and also in the PR:

Gaussian blur eager vs jit tests are flaky for uint8 input. This may be due to the fact that we cast to float32, perform the operation in float32, and cast back to uint8. We generate conv kernels (created using the exp op) in float32 and may be accumulating precision errors.
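To make the precision argument concrete, here is a minimal sketch with made-up numbers (this is not the actual kernel computation): two float32 results that differ by only ~1e-4 can sit on opposite sides of a rounding boundary, so the cast back to uint8 turns the tiny discrepancy into a full step of 1, which is what the relaxed atol=1 is meant to absorb.

import torch

# Made-up values near a rounding boundary; not taken from the gaussian blur kernel.
old_result = torch.tensor([127.49995], dtype=torch.float32)
new_result = torch.tensor([127.50005], dtype=torch.float32)

print(old_result.round().to(torch.uint8))  # tensor([127], dtype=torch.uint8)
print(new_result.round().to(torch.uint8))  # tensor([128], dtype=torch.uint8)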

pmeier (Collaborator, Author)

Meaning, we are expecting differences > 1?
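For context, a quick sketch of what the current uint8 tolerances admit: torch.testing.assert_close accepts |actual - expected| <= atol + rtol * |expected|, so with rtol=1e-1 and atol=1 the allowed difference grows with the pixel value, whereas rtol=0 would cap it at a flat 1. The sample pixel values below are arbitrary.

import torch

# Arbitrary uint8-range pixel values, just to evaluate the tolerance formula.
expected = torch.tensor([0.0, 50.0, 200.0, 255.0])
print(1 + 1e-1 * expected.abs())  # tensor([ 1.0000,  6.0000, 21.0000, 26.5000])
print(1 + 0.0 * expected.abs())   # tensor([1., 1., 1., 1.])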

