try unflake CI #7627


Open: pmeier wants to merge 2 commits into main
Conversation

@pmeier (Collaborator) commented May 25, 2023

I went through the commits to main over the last few days and upped the tolerances on flaky tests.

cc @vfdev-5 @seemethere

@pytorch-bot (bot) commented May 25, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7627

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit f9c9f6e:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was present on the merge base 4125d3a:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Comment on lines +379 to +381
ArgsKwargs(),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
ArgsKwargs(interpolation=PIL.Image.BICUBIC),
pmeier (Collaborator, Author)

It seems that with the default alpha and sigma, bilinear (default) and bicubic interpolation are flaky for uint8. The errors are likely related to some uint8 issue, since we see a difference of 255, for example:

Tensor-likes are not close!

Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)

This is not an issue with the testing function though:

>>> zeros = torch.zeros(3, dtype=torch.uint8)
>>> ones = torch.ones_like(zeros)
>>> ones - zeros
tensor([1, 1, 1], dtype=torch.uint8)
>>> torch.testing.assert_close(ones, zeros, atol=1, rtol=0)
>>> zeros - ones
tensor([255, 255, 255], dtype=torch.uint8)
>>> torch.testing.assert_close(zeros, ones, atol=1, rtol=0)
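To double-check that the 255 above is a genuine pixel mismatch rather than a wraparound artifact of the comparison, we can upcast before subtracting. A minimal sketch with made-up pixel values (neither value is taken from the failing test):

import torch

# Hypothetical pixel values at the mismatching index; upcasting to a signed dtype
# before subtracting rules out uint8 wraparound in the check itself.
v1_pixel = torch.tensor(255, dtype=torch.uint8)  # assumed value from the v1 output
v2_pixel = torch.tensor(0, dtype=torch.uint8)    # assumed value from the v2 output

true_diff = (v1_pixel.to(torch.int16) - v2_pixel.to(torch.int16)).abs()
print(int(true_diff))  # 255 -> the pixel really is 0 in one output and 255 in the other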

Member

It's hard to decipher what's going on just looking at the logs.


test.test_transforms_v2_consistency.test_call_consistency[ElasticTransform-080]

Traceback (most recent call last):
  File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 672, in test_call_consistency
    check_call_consistency(
  File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 587, in check_call_consistency
    assert_close(
  File "/Users/runner/work/vision/vision/pytorch/vision/test/common_utils.py", line 347, in assert_close
    raise error_metas[0].to_error(msg)
AssertionError: Tensor image consistency check failed with: 

Tensor-likes are not close!

Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)

Is that with bicubic or bilinear? float or ints?

bilinear (default) and bicubic interpolation are flaky for uint8

WDYM flaky?

For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning.

pmeier (Collaborator, Author)

Is that with bicubic or bilinear? float or ints?

Yes, it is hard, and no, I'm not happy with it. Please remember that this was never meant to stay longer than the initial release.

Here is the explanation:

The 080 from ElasticTransform-080 gives us these details:

  • 08 (first two digits): this is the parametrization with index 8, i.e. the ninth one. Counting through the list gives us bicubic:
    ArgsKwargs(),
    ArgsKwargs(alpha=20.0),
    ArgsKwargs(alpha=(15.3, 27.2)),
    ArgsKwargs(sigma=3.0),
    ArgsKwargs(sigma=(2.5, 3.9)),
    ArgsKwargs(interpolation=v2_transforms.InterpolationMode.NEAREST),
    ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
    ArgsKwargs(interpolation=PIL.Image.NEAREST),
    ArgsKwargs(interpolation=PIL.Image.BICUBIC),

    Similarly, 06 is also bicubic. 00 is bilinear, since that is the default.
  • 0 (last digit) is the index pytest automatically assigns to avoid duplicate test names. It needs to do that since we have a hack in place to instantiate the uint8 and float32 tests separately, but with the same ids:
    for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]

    Thus, index 0 corresponds to uint8 (see the small decoding sketch below the list).
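For illustration, here is a hypothetical helper (not part of the test suite) that decodes an id according to the scheme described above:

# Hypothetical helper, assuming the id layout
# "<Transform>-<two-digit parametrization index><pytest deduplication index>".
def decode_consistency_test_id(test_id):
    name, digits = test_id.rsplit("-", 1)
    param_idx = int(digits[:-1])  # "08" -> ninth ArgsKwargs entry, i.e. PIL.Image.BICUBIC
    dedup_idx = int(digits[-1])   # "0"  -> first duplicate id, i.e. the uint8 variant
    return name, param_idx, dedup_idx

print(decode_consistency_test_id("ElasticTransform-080"))  # ('ElasticTransform', 8, 0)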

WDYM flaky?

I probably don't understand. Non-snarky answer: the error pops up from time to time, but not consistently. Could you rephrase?

For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning.

They are not new and have been popping up since at least late last year. This is why we have a comment like

# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance

Since we are talking about a single pixel, I don't think that is much of a concern.

Member

Basically, I find it concerning that v1 differs significantly from v2 on certain interpolation modes. Those modes are controlled by torch core, not by a v1 vs v2 implementation difference (except for bilinear mode, which we changed recently, but according to your comment those problems predate that change).

@pmeier pmeier marked this pull request as ready for review May 25, 2023 08:54
],
# ElasticTransform needs larger images to avoid the needed internal padding being larger than the actual image
make_images_kwargs=dict(DEFAULT_MAKE_IMAGES_KWARGS, sizes=[(163, 163), (72, 333), (313, 95)], dtypes=[dt]),
# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance
closeness_kwargs=ckw,
)
- for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]
+ for dt, ckw, extra_args_kwargs in [
+     (torch.uint8, {"rtol": 1e-1, "atol": 1}, []),
Member

why set rtol to 1e-1 and not 0?

pmeier (Collaborator, Author)

I just copied it over from what we had. It came from #6762, which @vfdev-5 authored. Maybe he remembers? I agree that 0 would be better here.

@vfdev-5 (Collaborator) commented May 25, 2023

I think the reason is explained in the above comment:

# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance

and also in the PR:

Gaussian blur eager vs jit tests are flaky for uint8 input. This may be due to the fact that we cast to float32, perform the operation in float32, and cast back to uint8. We generate conv kernels (created using the exp op) in float32 and may be accumulating precision errors.
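To make the precision argument concrete, here is a minimal sketch with made-up numbers (this is not the actual kernel computation): two float32 results that differ by only ~1e-4 can sit on opposite sides of a rounding boundary, so the cast back to uint8 turns the tiny discrepancy into a full step of 1, which is what the relaxed atol=1 is meant to absorb.

import torch

# Made-up values near a rounding boundary; not taken from the gaussian blur kernel.
old_result = torch.tensor([127.49995], dtype=torch.float32)
new_result = torch.tensor([127.50005], dtype=torch.float32)

print(old_result.round().to(torch.uint8))  # tensor([127], dtype=torch.uint8)
print(new_result.round().to(torch.uint8))  # tensor([128], dtype=torch.uint8)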

pmeier (Collaborator, Author)

Meaning, we are expecting differences > 1?
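For context, a quick sketch of what the current uint8 tolerances admit: torch.testing.assert_close accepts |actual - expected| <= atol + rtol * |expected|, so with rtol=1e-1 and atol=1 the allowed difference grows with the pixel value, whereas rtol=0 would cap it at a flat 1. The sample pixel values below are arbitrary.

import torch

# Arbitrary uint8-range pixel values, just to evaluate the tolerance formula.
expected = torch.tensor([0.0, 50.0, 200.0, 255.0])
print(1 + 1e-1 * expected.abs())  # tensor([ 1.0000,  6.0000, 21.0000, 26.5000])
print(1 + 0.0 * expected.abs())   # tensor([1., 1., 1., 1.])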

