[PyTorch][Winograd] Selecting a Winograd kernel causes test_Conv2d_naive_groups_cuda_float16 to fail #2492

**[Summary]**

Winograd kernels are designed to trade numerical accuracy for performance.
However, for this very small and impractical configuration, selecting a `winograd` kernel causes `test_Conv2d_naive_groups_cuda_float16` to fail.
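
To illustrate the accuracy trade-off mentioned above, here is a minimal sketch that compares a direct 3-tap computation with the textbook 1D Winograd F(2,3) form. It is not MIOpen's kernel: the helper names (`fp16`, `exact`, `direct_f23`, `winograd_f23`) are hypothetical, and rounding every intermediate to fp16 is an assumption made only to make the effect visible. With exact rationals the two forms agree; with fp16 rounding the last bits may differ.

```python
import struct
from fractions import Fraction

def fp16(x):
    """Round to the nearest IEEE 754 half-precision value (stdlib only)."""
    return struct.unpack("<e", struct.pack("<e", float(x)))[0]

def exact(x):
    """No rounding; used to check the algebra with exact rationals."""
    return x

def direct_f23(d, g, rnd):
    """Two outputs of a 3-tap filter over 4 inputs, computed naively (6 multiplies)."""
    y0 = rnd(rnd(rnd(d[0] * g[0]) + rnd(d[1] * g[1])) + rnd(d[2] * g[2]))
    y1 = rnd(rnd(rnd(d[1] * g[0]) + rnd(d[2] * g[1])) + rnd(d[3] * g[2]))
    return [y0, y1]

def winograd_f23(d, g, rnd):
    """The same two outputs via Winograd F(2,3) (4 multiplies)."""
    # Filter transform.
    u0 = g[0]
    u1 = rnd(rnd(rnd(g[0] + g[1]) + g[2]) / 2)
    u2 = rnd(rnd(rnd(g[0] - g[1]) + g[2]) / 2)
    u3 = g[2]
    # Input transform.
    v0 = rnd(d[0] - d[2])
    v1 = rnd(d[1] + d[2])
    v2 = rnd(d[2] - d[1])
    v3 = rnd(d[1] - d[3])
    # Element-wise products, then output transform.
    m0, m1, m2, m3 = rnd(v0 * u0), rnd(v1 * u1), rnd(v2 * u2), rnd(v3 * u3)
    y0 = rnd(rnd(m0 + m1) + m2)
    y1 = rnd(rnd(m1 - m2) - m3)
    return [y0, y1]

# Arbitrary illustrative values (assumption), not taken from the failing test.
d_exact = [Fraction(1, 10), Fraction(7, 10), Fraction(-3, 10), Fraction(9, 10)]
g_exact = [Fraction(1, 2), Fraction(-1, 4), Fraction(3, 10)]
d_half = [fp16(v) for v in d_exact]
g_half = [fp16(v) for v in g_exact]

print("exact  :", direct_f23(d_exact, g_exact, exact), winograd_f23(d_exact, g_exact, exact))
print("fp16   :", direct_f23(d_half, g_half, fp16), winograd_f23(d_half, g_half, fp16))
```

The extra additions and subtractions in the transforms are where additional rounding can enter; a real kernel may accumulate in higher precision, so this sketch only shows the mechanism, not the magnitude seen in the test.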

Questions:

- The test name `test_Conv2d_naive_groups_cuda_float16` contains the keyword `naive`. Does the test expect a naive implementation to be used in the first place?
- @Kirpich30000 are `winograd` kernels expected to have accuracy issues with such cases, e.g. `-H 6 -W 6 -k 2`?

```
MIOpenDriver convfp16 -n 2 -c 2 -H 6 -W 6 -k 2 -y 3 -x 3 -p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1 -S 0
Forward Conv solutions available: 2
- id: 84 algo: 3, time: 10 ms, ws: 0, name: ConvBinWinogradRxSf2x3g1
- id: 107 algo: 5, time: 20 ms, ws: 1280, name: ConvAsmImplicitGemmGTCDynamicFwdXdlopsNHWC
MIOpen Forward Conv. Algorithm: 3, Solution: 84/ConvBinWinogradRxSf2x3g1
GPU Kernel Time Forward Conv. Elapsed: 0.015378 ms (average)
stats: name, n, c, ho, wo, x, y, k, flopCnt, bytesRead, bytesWritten, GFLOPs, GB/s, timeMs
stats: fwd-conv3x3u1, 2, 2, 4, 4, 3, 3, 2,  2304, 360, 128, 0, 0, 0.015378
Forward Convolution Verifies OK on CPU reference (0.000340009)
```
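
As a sanity check on the log above, the `stats:` line can be reproduced from the driver flags. This is a sketch under assumptions: the flag meanings (`-n` batch, `-c`/`-k` input/output channels, `-H`/`-W` input size, `-y`/`-x` filter size, `-p`/`-q` padding, `-u`/`-v` stride, `-l`/`-j` dilation) and the FLOP and byte formulas are inferred from the numbers in the log, not taken from the MIOpenDriver source.

```python
# Flag values copied from the MIOpenDriver command line above.
n, c, H, W, k = 2, 2, 6, 6, 2   # batch, in-channels, height, width, out-channels
y, x = 3, 3                     # filter height, width
p, q = 0, 0                     # padding (h, w)
u, v = 1, 1                     # stride (h, w)
l, j = 1, 1                     # dilation (h, w)
BYTES_PER_FP16 = 2

def out_size(size, pad, dilation, kernel, stride):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1

ho = out_size(H, p, l, y, u)
wo = out_size(W, q, j, x, v)

# Assumed formulas: one multiply and one add per filter tap; input and
# weights are read once, output is written once.
flop_cnt = 2 * n * k * c * ho * wo * y * x
bytes_read = (n * c * H * W + k * c * y * x) * BYTES_PER_FP16
bytes_written = n * k * ho * wo * BYTES_PER_FP16

print(ho, wo, flop_cnt, bytes_read, bytes_written)
```

Under these assumptions the values match the `ho`, `wo`, `flopCnt`, `bytesRead`, and `bytesWritten` columns of the log.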

**[Observation and Steps to reproduce]:**

To Reproduce:

`PYTORCH_TEST_WITH_ROCM=1 python3 nn/test_convolution.py --use-pytest --verbose -k test_Conv2d_naive_groups_cuda_float16`

Docker Images:
```
ROCM 5.6: rocm/pytorch:rocm5.6_ubuntu20.04_py3.8_pytorch_2.0.1
PyTorch Installed at /var/lib/jenkins/pytorch/test
ROCM 5.7:  compute-artifactory.amd.com:5000/rocm-plus-docker/framework/compute-rocm-rel-5.7:86_ubuntu20.04_py3.9_pytorch_rocm5.7_internal_testing_55fbbdf
Original image: rocm/pytorch-private:86_ubuntu20.04_py3.9_pytorch_rocm5.7_internal_testing_55fbbdf
PyTorch Installed at /var/lib/jenkins/pytorch/test
```
NOTE: the tolerance has already been raised to 1e-1. To see the error, first run `git revert e9b273df57b240f14ead07b5fda97bdf2be6673a`.

Expected Output:
```
nn/test_convolution.py::TestConvolutionNNDeviceTypeCUDA::test_Conv2d_naive_groups_cuda_float16 PASSED
```
Actual Output:
```
Mismatched elements: 47 / 128 (36.7%)
Greatest absolute difference: 0.0009765625 at index (0, 2, 2, 1) (up to 1e-05 allowed)
Greatest relative difference: 0.0999755859375 at index (0, 0, 2, 0) (up to 0.001 allowed)
```
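
The numbers in this message can be read against the closeness rule documented for `torch.testing.assert_close`, `|actual - expected| <= atol + rtol * |expected|`, using the `atol=1e-05` and `rtol=0.001` shown in the output. The sketch below is stdlib-only; `to_fp16` and `is_close` are hypothetical helper names, not PyTorch APIs, and the sample values passed to `is_close` are illustrative rather than taken from the failing tensors.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def is_close(actual: float, expected: float, rtol: float = 1e-3, atol: float = 1e-5) -> bool:
    """Closeness rule with the tolerances reported in the failure message."""
    return abs(actual - expected) <= atol + rtol * abs(expected)

FP16_SPACING_AT_ONE = 2.0 ** -10        # gap between adjacent fp16 values in [1, 2)
reported_abs_diff = 0.0009765625        # from the failure message
reported_rel_diff = 0.0999755859375     # from the failure message

# The reported absolute difference is exactly one fp16 step at magnitude 1.
print(reported_abs_diff == FP16_SPACING_AT_ONE)

# The reported relative difference is exactly 0.1 rounded to fp16.
print(to_fp16(0.1) == reported_rel_diff)

# A difference of 2**-10 passes near 1.0 but fails for a smaller expected value.
print(is_close(1.0 + 2.0 ** -10, 1.0))
print(is_close(0.5 + 2.0 ** -10, 0.5))

# Largest |expected| for which a 2**-10 difference still fails the check.
fail_below = (reported_abs_diff - 1e-5) / 1e-3
print(fail_below)
```

In other words, the mismatches are on the order of the last fp16 bits, and with `rtol=0.001` they are flagged mainly where the expected values are small in magnitude.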
