try unflake CI #7627
base: main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7627
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure
As of commit f9c9f6e:
NEW FAILURE - The following job has failed:
BROKEN TRUNK - The following job failed but was present on the merge base 4125d3a:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ArgsKwargs(),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
ArgsKwargs(interpolation=PIL.Image.BICUBIC),
It seems that with default alpha and sigma, bilinear (default) and bicubic interpolation are flaky for uint8. For example:
- https://github.com/pytorch/vision/actions/runs/4992664184
- https://github.com/pytorch/vision/actions/runs/5043966208
The errors are likely related to some uint8 issue since we have a difference of 255:
Tensor-likes are not close!
Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)
This is not an issue with the testing function though:
>>> import torch
>>> zeros = torch.zeros(3, dtype=torch.uint8)
>>> ones = torch.ones_like(zeros)
>>> ones - zeros
tensor([1, 1, 1], dtype=torch.uint8)
>>> torch.testing.assert_close(ones, zeros, atol=1, rtol=0)
>>> zeros - ones
tensor([255, 255, 255], dtype=torch.uint8)
>>> torch.testing.assert_close(zeros, ones, atol=1, rtol=0)
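As a side note, here is a small sketch of my own (the names v1_pixel/v2_pixel are made up) showing why a 255 absolute difference together with an inf relative difference means one output has 0 and the other 255 at that pixel: the relative difference divides by the expected value, which must therefore be 0.
>>> v1_pixel = torch.tensor([255], dtype=torch.uint8)
>>> v2_pixel = torch.tensor([0], dtype=torch.uint8)
>>> # compute the differences the way the error message reports them
>>> abs_diff = (v1_pixel.to(torch.int64) - v2_pixel.to(torch.int64)).abs()
>>> abs_diff
tensor([255])
>>> abs_diff / v2_pixel.to(torch.float64)
tensor([inf], dtype=torch.float64)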
It's hard to decipher what's going on just looking at the logs.
test.test_transforms_v2_consistency.test_call_consistency[ElasticTransform-080]
Traceback (most recent call last):
File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 672, in test_call_consistency
check_call_consistency(
File "/Users/runner/work/vision/vision/pytorch/vision/test/test_transforms_v2_consistency.py", line 587, in check_call_consistency
assert_close(
File "/Users/runner/work/vision/vision/pytorch/vision/test/common_utils.py", line 347, in assert_close
raise error_metas[0].to_error(msg)
AssertionError: Tensor image consistency check failed with:
Tensor-likes are not close!
Mismatched elements: 1 / 318828 (0.0%)
Greatest absolute difference: 255 at index (2, 1, 81, 93) (up to 1 allowed)
Greatest relative difference: inf at index (2, 1, 81, 93) (up to 0.1 allowed)
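For reference, a hypothetical way (not from the PR) to re-run just this parametrization locally from the repo root, using pytest's Python entry point and the node id derived from the traceback above:
import pytest

# Hypothetical local reproduction: select the failing node id directly;
# "-v" just makes the output verbose.
pytest.main(["test/test_transforms_v2_consistency.py::test_call_consistency[ElasticTransform-080]", "-v"])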
Is that with bicubic or bilinear? float or ints?
bilinear (default) and bicubic interpolation are flaky for uint8
WDYM flaky?
For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning
Is that with bicubic or bilinear? float or ints?
Yes, it is hard, and no, I'm not happy with it. Please remember that this was never meant to stay longer than the initial release.
Here is the explanation:
The 080 from ElasticTransform-080 gives us these details:
- 08 (first two digits) is the parametrization with index 8, i.e. the ninth one. Counting through the list gives us bicubic:
vision/test/test_transforms_v2_consistency.py
Lines 357 to 365 in 4a51822
ArgsKwargs(),
ArgsKwargs(alpha=20.0),
ArgsKwargs(alpha=(15.3, 27.2)),
ArgsKwargs(sigma=3.0),
ArgsKwargs(sigma=(2.5, 3.9)),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.NEAREST),
ArgsKwargs(interpolation=v2_transforms.InterpolationMode.BICUBIC),
ArgsKwargs(interpolation=PIL.Image.NEAREST),
ArgsKwargs(interpolation=PIL.Image.BICUBIC),
Similarly, 06 is also bicubic. 00 is bilinear, since that is the default.
- 0 (last digit) is the index pytest automatically assigns to avoid duplicate test names. It needs to do that since we have a hack in place to instantiate the uint8 and float32 tests separately but with the same ids:
vision/test/test_transforms_v2_consistency.py
Line 374 in 4a51822
for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]
Thus, index 0 corresponds to uint8.
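To illustrate that last point, here is a made-up minimal example (not code from the repo): when two parametrizations end up with the same id, pytest de-duplicates them by appending an index, which is exactly where the trailing 0/1 digit comes from.
import pytest


# Hypothetical example: two parameter sets deliberately share the same id.
@pytest.mark.parametrize(
    "dtype",
    ["uint8", "float32"],
    ids=["ElasticTransform-08", "ElasticTransform-08"],
)
def test_demo(dtype):
    assert dtype in ("uint8", "float32")


# Collected as:
#   test_demo[ElasticTransform-080]  -> first entry (uint8)
#   test_demo[ElasticTransform-081]  -> second entry (float32)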
WDYM flaky?
I probably don't understand. Non-snarky answer: the error pops up from time to time, but not consistently. Could you rephrase?
For bicubic we didn't change anything from v1 to v2. If there are 0-255 differences (overflows?) from v1 to v2 with bilinear, that's potentially concerning
They are not new and have been popping up since at least late last year. This is why we have a comment like
vision/test/test_transforms_v2_consistency.py
Lines 370 to 371 in 4a51822
# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance
Since we are talking about a single pixel, I don't think that is much of a concern.
Basically I find it concerning that v1 differs significantly from v2 on certain interpolation modes. Those modes are controlled by torch core, not by a v1 vs v2 implementation difference (except for bilinear mode, which we changed recently, but according to your comment those problems pre-date that change).
],
# ElasticTransform needs larger images to avoid the needed internal padding being larger than the actual image
make_images_kwargs=dict(DEFAULT_MAKE_IMAGES_KWARGS, sizes=[(163, 163), (72, 333), (313, 95)], dtypes=[dt]),
# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance
closeness_kwargs=ckw,
)
for dt, ckw in [(torch.uint8, {"rtol": 1e-1, "atol": 1}), (torch.float32, {"rtol": 1e-2, "atol": 1e-3})]
for dt, ckw, extra_args_kwargs in [
(torch.uint8, {"rtol": 1e-1, "atol": 1}, []),
Why set rtol to 1e-1 and not 0?
I think the reason is explained in the above comment:
# We updated gaussian blur kernel generation with a faster and numerically more stable version
# This brings float32 accumulation visible in elastic transform -> we need to relax consistency tolerance
and also in the PR:
Gaussian blur eager vs JIT tests are flaky for uint8 input. This may be due to the fact that we cast to float32, perform the operation in float32, and cast back to uint8. We generate conv kernels (created using the exp op) in float32 and may be accumulating precision errors.
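As an illustration of that PR note, here is my own rough sketch (not the actual torchvision kernel code; the helper name is made up): a Gaussian kernel built with exp in float32 differs slightly from the same kernel built in float64, and once the blurred result is cast back to uint8, such tiny differences can occasionally flip a pixel by one rounding step.
import torch


def gaussian_kernel_1d(kernel_size, sigma, dtype):
    # hypothetical helper: 1-D Gaussian kernel via exp, normalized to sum to 1
    x = torch.linspace(-(kernel_size - 1) / 2, (kernel_size - 1) / 2, kernel_size, dtype=dtype)
    kernel = torch.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


k32 = gaussian_kernel_1d(9, 3.0, torch.float32)
k64 = gaussian_kernel_1d(9, 3.0, torch.float64)
# tiny but nonzero, on the order of float32 rounding error
print((k32.double() - k64).abs().max())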
Meaning, we are expecting differences > 1?
I went through the commits to main the last few days and upped tolerances on flaky tests.

cc @vfdev-5 @seemethere