Closed
Description
[Summary]
Winograd kernels by design trade numerical accuracy for performance.
However, for this very small and impractical case, selecting Winograd
kernels causes test_Conv2d_naive_groups_cuda_float16 to fail.
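To make the accuracy trade-off concrete, here is a minimal sketch of the textbook 1-D Winograd F(2,3) transform next to a direct 3-tap computation. It is not MIOpen's kernel. The helper names (`to_fp16`, `direct_1d`, `winograd_1d`) are hypothetical, and rounding to half precision after every operation is a deliberately pessimistic assumption, since real kernels may accumulate in higher precision.

```python
import struct
from fractions import Fraction


def to_fp16(x):
    """Round a number to the nearest IEEE half-precision value (stdlib only)."""
    return struct.unpack("e", struct.pack("e", float(x)))[0]


def identity(x):
    """No rounding; used for exact arithmetic."""
    return x


def direct_1d(d, g, rnd=identity):
    """Direct 3-tap correlation: two outputs from four inputs."""
    out = []
    for i in range(2):
        acc = rnd(d[i] * g[0])
        acc = rnd(acc + rnd(d[i + 1] * g[1]))
        acc = rnd(acc + rnd(d[i + 2] * g[2]))
        out.append(acc)
    return out


def winograd_1d(d, g, rnd=identity):
    """Textbook F(2,3): four multiplications instead of six."""
    # Filter transform.
    u = [
        g[0],
        rnd(rnd(rnd(g[0] + g[1]) + g[2]) / 2),
        rnd(rnd(rnd(g[0] - g[1]) + g[2]) / 2),
        g[2],
    ]
    # Input transform.
    v = [rnd(d[0] - d[2]), rnd(d[1] + d[2]), rnd(d[2] - d[1]), rnd(d[1] - d[3])]
    # Element-wise products.
    m = [rnd(a * b) for a, b in zip(v, u)]
    # Output transform.
    return [rnd(rnd(m[0] + m[1]) + m[2]), rnd(rnd(m[1] - m[2]) - m[3])]


# In exact arithmetic both paths agree.
d_exact = [Fraction(1, 3), Fraction(-2, 7), Fraction(5, 9), Fraction(4, 11)]
g_exact = [Fraction(1, 5), Fraction(-3, 4), Fraction(2, 3)]
assert winograd_1d(d_exact, g_exact) == direct_1d(d_exact, g_exact)

# With fp16 rounding after every step, the two paths may differ slightly.
d_half = [to_fp16(x) for x in d_exact]
g_half = [to_fp16(x) for x in g_exact]
print("direct  :", direct_1d(d_half, g_half, to_fp16))
print("winograd:", winograd_1d(d_half, g_half, to_fp16))
```

The two paths are algebraically identical, so any gap in the fp16 run comes only from the extra additions and subtractions in the transforms being rounded at different points.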
Question:
- test_Conv2d_naive_groups_cuda_float16 has the keyword naive in its name. Does it expect naive implementations to begin with?
- @Kirpich30000 should Winograd kernels have issues with such small cases, i.e. -H 6 -W 6 -k 2?
MIOpenDriver convfp16 -n 2 -c 2 -H 6 -W 6 -k 2 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -S 0
Forward Conv solutions available: 2
- id: 84 algo: 3, time: 10 ms, ws: 0, name: ConvBinWinogradRxSf2x3g1
- id: 107 algo: 5, time: 20 ms, ws: 1280, name: ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC
MIOpen Forward Conv. Algorithm: 3, Solution: 84/ConvBinWinogradRxSf2x3g1
GPU Kernel Time Forward Conv. Elapsed: 0.015378 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv3x3u1, 2, 2, 4, 4, 3, 3, 2, 2304, 360, 128, 0, 0, 0.015378
Forward Convolution Verifies OK on CPU reference (0.000340009)
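As a sanity check on the log above, the stats line can be recomputed from the driver flags. This is a sketch under assumptions: the flag-to-parameter mapping in the comments and the formulas for flopCnt, bytesRead and bytesWritten are inferred from the numbers, not taken from the MIOpenDriver source, and `conv_out_size` is a hypothetical helper.

```python
def conv_out_size(in_size, pad, dilation, kernel, stride):
    """Standard convolution output-size formula."""
    return (in_size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


# Values from the command line (assumed meaning of each flag in comments).
n, c, k = 2, 2, 2   # -n batch, -c input channels, -k output channels
H, W = 6, 6         # -H, -W input height and width
y, x = 3, 3         # -y, -x filter height and width
p, q = 0, 0         # -p, -q padding
u, v = 1, 1         # -u, -v stride
l, j = 1, 1         # -l, -j dilation
elem_bytes = 2      # fp16

ho = conv_out_size(H, p, l, y, u)
wo = conv_out_size(W, q, j, x, v)

# One multiply and one add per filter tap per output element (assumption).
flop_cnt = 2 * n * k * c * ho * wo * y * x
# Input tensor plus weights are read once; the output is written once (assumption).
bytes_read = (n * c * H * W + k * c * y * x) * elem_bytes
bytes_written = n * k * ho * wo * elem_bytes

print(ho, wo, flop_cnt, bytes_read, bytes_written)  # 4 4 2304 360 128
```

These match ho, wo, flopCnt, bytesRead and bytesWritten in the stats line, which suggests the driver run is exercising the problem size described above.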
[Observation and Steps to reproduce]:
To Reproduce:
PYTORCH_TEST_WITH_ROCM=1 python3 nn/test_convolution.py --use-pytest --verbose -k test_Conv2d_naive_groups_cuda_float16
Docker Images:
ROCM 5.6: rocm/pytorch:rocm5.6_ubuntu20.04_py3.8_pytorch_2.0.1
PyTorch Installed at /var/lib/jenkins/pytorch/test
ROCM 5.7: compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-5.7:86_ubuntu20.04_py3.9_pytorch_rocm5.7_internal_testing_55fbbdf
Original image: rocm/pytorch-private:86_ubuntu20.04_py3.9_pytorch_rocm5.7_internal_testing_55fbbdf
PyTorch Installed at /var/lib/jenkins/pytorch/test
NOTE: The tolerance has already been raised to 1e-1. To see the error, first run git revert e9b273df57b240f14ead07b5fda97bdf2be6673a
Expected Output:
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float16 PASSED
Actual Output:
Mismatched elements: 47 / 128 (36.7%)
Greatest absolute difference: 0.0009765625 at index (0, 2, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 0.0999755859375 at index (0, 0, 2, 0) (up to 0.001 allowed)
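For context on the message above, here is a minimal sketch of the closeness rule documented for torch.testing.assert_close, |actual - expected| <= atol + rtol * |expected|, written in plain Python. The absolute difference and the original tolerances come from the output above. The expected value 0.5 is hypothetical, and treating the raised tolerance as rtol=1e-1 is an assumption, since the note does not say which tolerance was changed.

```python
def allowed_error(expected, rtol, atol):
    """Largest absolute error accepted for a given expected value."""
    return atol + rtol * abs(expected)


def is_close(actual, expected, rtol, atol):
    """Element-wise closeness rule: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= allowed_error(expected, rtol, atol)


abs_diff = 0.0009765625       # greatest absolute difference from the log
assert abs_diff == 2 ** -10   # the fp16 spacing for values in [1, 2)

expected = 0.5                # hypothetical reference value
actual = expected + abs_diff

# Tolerances reported in the failure message.
strict = is_close(actual, expected, rtol=1e-3, atol=1e-5)
# Assumed relaxed setting: rtol raised to 1e-1, atol unchanged.
relaxed = is_close(actual, expected, rtol=1e-1, atol=1e-5)

print(strict, relaxed)  # False True
```

Under these assumptions, an error of this size on an element of modest magnitude fails the original check and passes the relaxed one.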