RNN Transducer Loss Autograd Test #1532

Merged (5 commits, Jun 4, 2021)

Conversation

@vincentqb (Contributor) commented May 26, 2021

This PR adds autograd (gradcheck) tests for the RNN transducer loss.

Follow-up items are noted in the comments below.

@vincentqb (Contributor, Author) commented May 26, 2021

Investigation with autograd test from carolineechen#2.
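The failing assertion is assert_grad (autograd_impl.py:47 in the tracebacks below), which wraps torch.autograd.gradcheck. A minimal sketch of that call follows; the function name and input names are placeholders, and only the gradcheck tolerances are taken from the traceback.

import torch
from torch.autograd import gradcheck

def assert_rnnt_grad(loss_module, logits, targets, logit_lengths, target_lengths):
    # gradcheck compares the analytical Jacobian produced by backward against
    # a finite-difference estimate and raises GradcheckError on a mismatch.
    logits.requires_grad_(True)
    inputs = (logits, targets, logit_lengths, target_lengths)
    assert gradcheck(loss_module, inputs,
                     eps=1e-3, atol=1e-2, rtol=1e-2, nondet_tol=0.0)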

============================================================================================================= short test summary info ==============================================================================================================
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: backward not multiplied by grad_output
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: backward not multiplied by grad_output
===================================================================================================== 8 failed, 18 passed, 8 warnings in 4.58s =====================================================================================================
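Two distinct failure modes show up in the summary: "Jacobian mismatch" (the gradient returned by backward disagrees with the finite-difference estimate) and "backward not multiplied by grad_output" (gradcheck feeds a zero grad_output and expects the input gradient to come back zero). The latter typically means backward returns the gradient cached during forward without scaling it by grad_output. Below is a toy sketch of the corrected pattern, not the torchaudio implementation: the gradient is computed in forward, cached, then rescaled by grad_output in backward.

import torch

class _ToyLoss(torch.autograd.Function):
    # Stand-in for a loss whose gradient is computed during forward
    # (as transducer-loss kernels typically do) and cached for backward.
    @staticmethod
    def forward(ctx, logits):
        grads = 2.0 * logits                      # analytical gradient of the cost below
        ctx.save_for_backward(grads)
        return (logits ** 2).sum(dim=(1, 2, 3))   # one cost per batch element

    @staticmethod
    def backward(ctx, grad_output):
        grads, = ctx.saved_tensors
        # The cached gradient must be scaled by grad_output (one scalar per
        # batch element); returning `grads` unscaled is exactly what makes
        # gradcheck report "backward not multiplied by grad_output".
        return grads * grad_output.view(-1, 1, 1, 1)

x = torch.randn(2, 3, 4, 5, dtype=torch.double, requires_grad=True)
torch.autograd.gradcheck(_ToyLoss.apply, (x,))    # passes once backward honors grad_output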
Details

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /private/home/vincentqb/miniconda/envs/torch-nightly/bin/python
cachedir: .pytest_cache
rootdir: /private/home/vincentqb/autograd/audio
plugins: hydra-core-1.0.6
collecting ... collected 26 items

autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 FAILED     [  3%]
autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 FAILED     [  7%]
autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 FAILED [ 11%]
autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1 FAILED [ 15%]
autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 FAILED    [ 19%]
autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 FAILED    [ 23%]
autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 FAILED [ 26%]
autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1 FAILED [ 30%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_basic_backward PASSED          [ 34%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp16 PASSED [ 38%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp32 PASSED [ 42%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp16 PASSED [ 46%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp32 PASSED [ 50%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_random_data_with_numpy_fp32 PASSED [ 53%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_rnnt_nonfused_log_softmax PASSED [ 57%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_basic_backward PASSED         [ 61%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp16 PASSED [ 65%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp32 PASSED [ 69%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp16 PASSED [ 73%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp32 PASSED [ 76%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_random_data_with_numpy_fp32 PASSED [ 80%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_rnnt_nonfused_log_softmax PASSED [ 84%]
torchscript_consistency_cpu_test.py::TestRNNTLoss::test_RNNTLoss PASSED  [ 88%]
torchscript_consistency_cpu_test.py::TestRNNTLoss::test_rnnt_loss PASSED [ 92%]
torchscript_consistency_cuda_test.py::TestRNNTLoss::test_RNNTLoss PASSED [ 96%]
torchscript_consistency_cuda_test.py::TestRNNTLoss::test_rnnt_loss PASSED [100%]

=================================== FAILURES ===================================
____________________ TestAutograd.test_RNNTLoss_gradcheck_0 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([4.2807, 3.9384], grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[-0.3646, -0.1560,  0.5206],
          [-0.1865,  0.1634,  0.0231],
          [-0.0825,  0.0413,  0.0413]],...sor([[1, 2],
        [1, 1]], dtype=torch.int32), tensor([4, 4], dtype=torch.int32), tensor([2, 2], dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], grad_fn=<RNNTLossFunction>>),), eps = 0.001
rtol = 0.01, atol = 0.01, check_grad_dtypes = False, nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.9532e+02,  2.5656e+02],
E                               [ 3.7360e+01,  1.0766e+01],
E                               [ 2.0285e+00,  1.2028e+00],
E                               [ 1.3320e+00,  3.9721e-01],
E                               [ 2.4867e-01,  1.9717e-01],
E                               [-3.6073e-01,  1.1039e-01],
E                               [-1.6189e-01,  6.2943e-02],
E                               [-6.7949e-02,  3.6478e-02],
E                               [-2.3365e-02,  2.1696e-02],
E                               [-1.9288e-01,  1.2875e-02],
E                               [-7.1764e-02,  7.3910e-03],
E                               [ 2.1267e-01,  4.0531e-03],
E                               [-2.2006e-01,  2.6226e-03],
E                               [ 2.4772e-01,  2.1458e-03],
E                               [-5.7220e-03,  9.5367e-04],
E                               [-1.9026e-01,  4.7684e-04],
E                               [ 1.2255e-01,  4.7684e-04],
E                               [ 9.7036e-02,  2.3842e-04],
E                               [-6.9141e-03, -4.7684e-04],
E                               [-8.7261e-02, -2.3842e-04],
E                               [ 1.1921e-01,  4.7684e-04],
E                               [-1.2374e-01, -2.3842e-04],
E                               [ 2.5749e-01,  0.0000e+00],
E                               [-9.5367e-02, -2.3842e-04],
E                               [-3.8958e-01,  0.0000e+00],
E                               [ 2.6107e-01, -2.3842e-04],
E                               [ 2.1553e-01, -2.3842e-04],
E                               [ 5.1022e-02,  2.3842e-04],
E                               [-7.4387e-02,  0.0000e+00],
E                               [ 5.3167e-02, -2.3842e-04],
E                               [ 1.4925e-01,  2.3842e-04],
E                               [ 1.2517e-01, -7.1526e-04],
E                               [-3.3593e-01,  7.1526e-04],
E                               [-7.9751e-01, -2.3842e-04],
E                               [ 4.7374e-01,  7.1526e-04],
E                               [ 4.5204e-01,  2.3842e-04],
E                               [ 3.1233e-02, -3.0470e-01],
E                               [ 9.5367e-04, -8.4639e-02],
E                               [ 4.7684e-04,  5.1975e-01],
E                               [-2.3842e-04, -1.6069e-01],
E                               [ 0.0000e+00,  4.3154e-02],
E                               [ 0.0000e+00,  1.5235e-01],
E                               [ 2.3842e-04, -8.1301e-02],
E                               [ 0.0000e+00,  4.6968e-02],
E                               [ 0.0000e+00,  3.1471e-02],
E                               [ 0.0000e+00, -1.1992e-01],
E                               [ 2.3842e-04, -7.4387e-02],
E                               [-4.7684e-04,  2.3460e-01],
E                               [ 9.5367e-04, -2.1267e-01],
E                               [-4.7684e-04,  8.5831e-03],
E                               [ 0.0000e+00,  2.2697e-01],
E                               [ 0.0000e+00, -2.0170e-01],
E                               [-2.3842e-04,  1.3256e-01],
E                               [-2.3842e-04,  9.8467e-02],
E                               [ 0.0000e+00, -6.4373e-03],
E                               [-4.7684e-04, -8.6546e-02],
E                               [-4.7684e-04,  1.1921e-01],
E                               [ 7.1526e-04, -1.2445e-01],
E                               [ 4.7684e-04, -8.0824e-02],
E                               [-4.7684e-04,  2.2840e-01],
E                               [ 2.3842e-04, -4.0674e-01],
E                               [ 0.0000e+00,  2.7966e-01],
E                               [ 9.5367e-04,  2.1434e-01],
E                               [ 0.0000e+00,  5.2452e-02],
E                               [-4.7684e-04, -7.4387e-02],
E                               [ 0.0000e+00,  5.2929e-02],
E                               [ 0.0000e+00,  1.4973e-01],
E                               [-2.3842e-04, -3.2377e-01],
E                               [-2.3842e-04,  2.1839e-01],
E                               [ 0.0000e+00, -8.7285e-01],
E                               [-4.7684e-04,  4.6802e-01],
E                               [-4.7684e-04,  4.5013e-01]])
E                       analytical:tensor([[-0.3646, -0.0000],
E                               [-0.1560, -0.0000],
E                               [ 0.5206,  0.0000],
E                               [-0.1865, -0.0000],
E                               [ 0.1634,  0.0000],
E                               [ 0.0231,  0.0000],
E                               [-0.0825, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.1482, -0.0000],
E                               [-0.0941, -0.0000],
E                               [ 0.2423,  0.0000],
E                               [-0.2277, -0.0000],
E                               [ 0.2358,  0.0000],
E                               [-0.0080, -0.0000],
E                               [-0.2303, -0.0000],
E                               [ 0.1151,  0.0000],
E                               [ 0.1151,  0.0000],
E                               [-0.0173, -0.0000],
E                               [-0.0991, -0.0000],
E                               [ 0.1164,  0.0000],
E                               [-0.1395, -0.0000],
E                               [ 0.2369,  0.0000],
E                               [-0.0974, -0.0000],
E                               [-0.4631, -0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.0827, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [-0.3220, -0.0000],
E                               [-0.8826, -0.0000],
E                               [ 0.4413,  0.0000],
E                               [ 0.4413,  0.0000],
E                               [-0.0000, -0.3645],
E                               [-0.0000, -0.1560],
E                               [ 0.0000,  0.5206],
E                               [-0.0000, -0.1865],
E                               [ 0.0000,  0.0231],
E                               [ 0.0000,  0.1634],
E                               [-0.0000, -0.0825],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.1482],
E                               [-0.0000, -0.0941],
E                               [ 0.0000,  0.2423],
E                               [-0.0000, -0.2277],
E                               [-0.0000, -0.0080],
E                               [ 0.0000,  0.2358],
E                               [-0.0000, -0.2303],
E                               [ 0.0000,  0.1151],
E                               [ 0.0000,  0.1151],
E                               [-0.0000, -0.0173],
E                               [-0.0000, -0.0991],
E                               [ 0.0000,  0.1164],
E                               [-0.0000, -0.1395],
E                               [-0.0000, -0.0974],
E                               [ 0.0000,  0.2369],
E                               [-0.0000, -0.4631],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.0827],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.3220],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.8826],
E                               [ 0.0000,  0.4412],
E                               [ 0.0000,  0.4411]])

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
----------------------------- Captured stderr call -----------------------------
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
____________________ TestAutograd.test_RNNTLoss_gradcheck_1 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss(), func_out = tensor([5.0957], grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[ 0.2438, -0.5317,  0.2438,  0.2438, -0.1996],
          [ 0.1468,  0.1468, -0.2588,  0.1468, -0.1816],
   ...quires_grad=True), tensor([[1, 2]], dtype=torch.int32), tensor([2], dtype=torch.int32), tensor([2], dtype=torch.int32))
outputs = (tensor([5.0957], grad_fn=<RNNTLossFunction>>),), eps = 0.001
rtol = 0.01, atol = 0.01, check_grad_dtypes = False, nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.5291e+02],
E                               [ 1.2874e+01],
E                               [ 7.4053e-01],
E                               [ 3.3712e-01],
E                               [-7.9679e-01],
E                               [ 7.1311e-01],
E                               [ 3.2616e-01],
E                               [ 2.5654e-01],
E                               [ 1.4234e-01],
E                               [-1.8954e-01],
E                               [ 1.0657e-01],
E                               [ 9.6798e-02],
E                               [ 9.0361e-02],
E                               [ 8.4639e-02],
E                               [-2.0695e-01],
E                               [ 1.1611e-01],
E                               [-3.5858e-01],
E                               [ 1.6069e-01],
E                               [ 7.1049e-02],
E                               [ 7.0333e-02],
E                               [ 1.4257e-01],
E                               [ 1.4997e-01],
E                               [-6.6042e-01],
E                               [ 2.4986e-01],
E                               [ 1.4782e-01],
E                               [ 2.5439e-01],
E                               [ 2.5010e-01],
E                               [ 2.5296e-01],
E                               [ 2.5249e-01],
E                               [-8.9550e-01]])
E                       analytical:tensor([[ 0.2438],
E                               [-0.5317],
E                               [ 0.2438],
E                               [ 0.2438],
E                               [-0.1996],
E                               [ 0.1468],
E                               [ 0.1468],
E                               [-0.2588],
E                               [ 0.1468],
E                               [-0.1816],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [-0.3046],
E                               [ 0.0760],
E                               [-0.3041],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [-0.5733],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [-0.9273]])

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
_________________ TestAutograd.test_np_transducer_gradcheck_0 __________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_np_transducer_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:81: in test_np_transducer_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = NumpyTransducerLoss()
func_out = tensor([4.2807, 3.9384], grad_fn=<_NumpyTransducerBackward>)
tupled_inputs = (tensor([[[[0.0654, 0.7875, 0.0816],
          [0.5297, 0.7507, 0.7541],
          [0.6098, 0.8681, 0.6225]],

       ...sor([4, 4], dtype=torch.int32), tensor([2, 2], dtype=torch.int32), tensor([[1, 2],
        [1, 1]], dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], grad_fn=<_NumpyTransducerBackward>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[-0.1869,  0.0000],
E                               [-0.0629,  0.0000],
E                               [ 0.2494,  0.0000],
E                               [-0.2034,  0.0000],
E                               [ 0.2024,  0.0000],
E                               [ 0.0010,  0.0000],
E                               [-0.1411,  0.0000],
E                               [ 0.0789,  0.0000],
E                               [ 0.0620,  0.0000],
E                               [-0.0114,  0.0000],
E                               [-0.0813,  0.0000],
E                               [ 0.0930,  0.0000],
E                               [-0.1540,  0.0000],
E                               [ 0.2296,  0.0000],
E                               [-0.0751,  0.0000],
E                               [-0.2465,  0.0000],
E                               [ 0.1464,  0.0000],
E                               [ 0.1004,  0.0000],
E                               [-0.0131,  0.0000],
E                               [-0.0618,  0.0000],
E                               [ 0.0744,  0.0000],
E                               [-0.0560,  0.0000],
E                               [ 0.2201,  0.0000],
E                               [-0.1640,  0.0000],
E                               [-0.4976,  0.0000],
E                               [ 0.2096,  0.0000],
E                               [ 0.2885,  0.0000],
E                               [ 0.0136,  0.0000],
E                               [-0.0303,  0.0000],
E                               [ 0.0167,  0.0000],
E                               [ 0.1137,  0.0000],
E                               [ 0.0629,  0.0000],
E                               [-0.1767,  0.0000],
E                               [-0.6671,  0.0000],
E                               [ 0.3676,  0.0000],
E                               [ 0.2995,  0.0000],
E                               [ 0.0000, -0.3562],
E                               [ 0.0000, -0.0554],
E                               [ 0.0000,  0.4117],
E                               [ 0.0000, -0.0969],
E                               [ 0.0000,  0.0294],
E                               [ 0.0000,  0.0675],
E                               [ 0.0000, -0.0634],
E                               [ 0.0000,  0.0278],
E                               [ 0.0000,  0.0359],
E                               [ 0.0000, -0.1546],
E                               [ 0.0000, -0.0737],
E                               [ 0.0000,  0.2285],
E                               [ 0.0000, -0.1669],
E                               [ 0.0000,  0.0000],
E                               [ 0.0000,  0.1669],
E                               [ 0.0000, -0.1724],
E                               [ 0.0000,  0.1055],
E                               [ 0.0000,  0.0670],
E                               [ 0.0000,  0.0240],
E                               [ 0.0000, -0.1181],
E                               [ 0.0000,  0.0942],
E                               [ 0.0000, -0.1047],
E                               [ 0.0000, -0.1090],
E                               [ 0.0000,  0.2136],
E                               [ 0.0000, -0.3699],
E                               [ 0.0000,  0.1799],
E                               [ 0.0000,  0.1895],
E                               [ 0.0000,  0.0259],
E                               [ 0.0000, -0.0793],
E                               [ 0.0000,  0.0539],
E                               [ 0.0000,  0.1224],
E                               [ 0.0000, -0.2387],
E                               [ 0.0000,  0.1165],
E                               [ 0.0000, -0.5988],
E                               [ 0.0000,  0.3023],
E                               [ 0.0000,  0.2966]])
E                       analytical:tensor([[-1.8684e-01, -1.8684e-01],
E                               [-6.2555e-02, -6.2555e-02],
E                               [ 2.4940e-01,  2.4940e-01],
E                               [-2.0338e-01, -2.0338e-01],
E                               [ 2.0240e-01,  2.0240e-01],
E                               [ 9.7747e-04,  9.7747e-04],
E                               [-1.4102e-01, -1.4102e-01],
E                               [ 7.9123e-02,  7.9123e-02],
E                               [ 6.1893e-02,  6.1893e-02],
E                               [-1.1552e-02, -1.1552e-02],
E                               [-8.1280e-02, -8.1280e-02],
E                               [ 9.2832e-02,  9.2832e-02],
E                               [-1.5426e-01, -1.5426e-01],
E                               [ 2.2943e-01,  2.2943e-01],
E                               [-7.5175e-02, -7.5175e-02],
E                               [-2.4659e-01, -2.4659e-01],
E                               [ 1.4640e-01,  1.4640e-01],
E                               [ 1.0019e-01,  1.0019e-01],
E                               [-1.2918e-02, -1.2918e-02],
E                               [-6.1593e-02, -6.1593e-02],
E                               [ 7.4512e-02,  7.4512e-02],
E                               [-5.5986e-02, -5.5986e-02],
E                               [ 2.1983e-01,  2.1983e-01],
E                               [-1.6385e-01, -1.6385e-01],
E                               [-4.9763e-01, -4.9763e-01],
E                               [ 2.0924e-01,  2.0924e-01],
E                               [ 2.8839e-01,  2.8839e-01],
E                               [ 1.3605e-02,  1.3605e-02],
E                               [-3.0220e-02, -3.0220e-02],
E                               [ 1.6615e-02,  1.6615e-02],
E                               [ 1.1392e-01,  1.1392e-01],
E                               [ 6.2781e-02,  6.2781e-02],
E                               [-1.7671e-01, -1.7671e-01],
E                               [-6.6708e-01, -6.6708e-01],
E                               [ 3.6766e-01,  3.6766e-01],
E                               [ 2.9942e-01,  2.9942e-01],
E                               [-3.5634e-01, -3.5634e-01],
E                               [-5.5347e-02, -5.5347e-02],
E                               [ 4.1169e-01,  4.1169e-01],
E                               [-9.6922e-02, -9.6922e-02],
E                               [ 2.9459e-02,  2.9459e-02],
E                               [ 6.7463e-02,  6.7463e-02],
E                               [-6.3518e-02, -6.3518e-02],
E                               [ 2.7654e-02,  2.7654e-02],
E                               [ 3.5863e-02,  3.5863e-02],
E                               [-1.5450e-01, -1.5450e-01],
E                               [-7.3942e-02, -7.3942e-02],
E                               [ 2.2844e-01,  2.2844e-01],
E                               [-1.6679e-01, -1.6679e-01],
E                               [-8.8003e-05, -8.8003e-05],
E                               [ 1.6688e-01,  1.6688e-01],
E                               [-1.7237e-01, -1.7237e-01],
E                               [ 1.0557e-01,  1.0557e-01],
E                               [ 6.6804e-02,  6.6804e-02],
E                               [ 2.3875e-02,  2.3875e-02],
E                               [-1.1826e-01, -1.1826e-01],
E                               [ 9.4381e-02,  9.4381e-02],
E                               [-1.0471e-01, -1.0471e-01],
E                               [-1.0893e-01, -1.0893e-01],
E                               [ 2.1364e-01,  2.1364e-01],
E                               [-3.6984e-01, -3.6984e-01],
E                               [ 1.8012e-01,  1.8012e-01],
E                               [ 1.8973e-01,  1.8973e-01],
E                               [ 2.5714e-02,  2.5714e-02],
E                               [-7.9462e-02, -7.9462e-02],
E                               [ 5.3748e-02,  5.3748e-02],
E                               [ 1.2233e-01,  1.2233e-01],
E                               [-2.3879e-01, -2.3879e-01],
E                               [ 1.1646e-01,  1.1646e-01],
E                               [-5.9869e-01, -5.9869e-01],
E                               [ 3.0220e-01,  3.0220e-01],
E                               [ 2.9648e-01,  2.9648e-01]])

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
_________________ TestAutograd.test_np_transducer_gradcheck_1 __________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_np_transducer_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:81: in test_np_transducer_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1265: in _gradcheck_helper
    _test_backward_mul_by_grad_output(outputs, tupled_inputs, check_sparse_nnz)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

outputs = (tensor([5.0957], grad_fn=<_NumpyTransducerBackward>),)
inputs = (tensor([[[[0.1000, 0.6000, 0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.6000, 0.1000, 0.1000],
          [0....quires_grad=True), tensor([2], dtype=torch.int32), tensor([2], dtype=torch.int32), tensor([[1, 2]], dtype=torch.int32))
check_sparse_nnz = False

    def _test_backward_mul_by_grad_output(outputs, inputs, check_sparse_nnz) -> bool:
        # Tests that backward is multiplied by grad_output
        diff_input_list: List[torch.Tensor] = list(_iter_tensors(inputs, True))
        if not diff_input_list:
            raise GradcheckError("no Tensors requiring grad found in input")
        grads_input = torch.autograd.grad(outputs, diff_input_list,
                                          [torch.zeros_like(o, memory_format=torch.legacy_contiguous_format) for o in outputs],
                                          allow_unused=True)
        for gi, di in zip(grads_input, diff_input_list):
            if gi is None:
                continue
            if isinstance(gi, torch.Tensor) and gi.layout != torch.strided:
                if gi.layout != di.layout:
                    raise GradcheckError('grad is incorrect layout (' + str(gi.layout) + ' is not ' + str(di.layout) + ')')
                if gi.layout == torch.sparse_coo:
                    if gi.sparse_dim() != di.sparse_dim():
                        raise GradcheckError('grad is sparse tensor, but has incorrect sparse_dim')
                    if gi.dense_dim() != di.dense_dim():
                        raise GradcheckError('grad is sparse tensor, but has incorrect dense_dim')
                gi = gi.to_dense()
                di = di.to_dense()
    
            if check_sparse_nnz:
                if not torch.allclose(gi, torch.zeros_like(gi)):
                    raise GradcheckError('backward not multiplied by grad_output')
            elif not gi.eq(0).all():
>               raise GradcheckError('backward not multiplied by grad_output')
E               torch.autograd.gradcheck.GradcheckError: backward not multiplied by grad_output

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:799: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
____________________ TestAutograd.test_RNNTLoss_gradcheck_0 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[-0.3646, -0.1560,  0.5206],
          [-0.1865,  0.1634,  0.0231],
          [-0.0825,  0.0413,  0.0413]],...e=torch.int32), tensor([4, 4], device='cuda:0', dtype=torch.int32), tensor([2, 2], device='cuda:0', dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<RNNTLossFunction>>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.9532e+02,  2.5656e+02],
E                               [ 3.7359e+01,  1.0767e+01],
E                               [ 2.0287e+00,  1.2023e+00],
E                               [ 1.3320e+00,  3.9697e-01],
E                               [ 2.4867e-01,  1.9670e-01],
E                               [-3.6025e-01,  1.1015e-01],
E                               [-1.6212e-01,  6.2942e-02],
E                               [-6.7711e-02,  3.7193e-02],
E                               [-2.3127e-02,  2.2173e-02],
E                               [-1.9336e-01,  1.1921e-02],
E                               [-7.1049e-02,  7.3910e-03],
E                               [ 2.1219e-01,  4.2915e-03],
E                               [-2.1982e-01,  2.3842e-03],
E                               [ 2.4748e-01,  1.9073e-03],
E                               [-6.1989e-03,  9.5367e-04],
E                               [-1.9050e-01,  0.0000e+00],
E                               [ 1.2302e-01,  4.7684e-04],
E                               [ 9.7275e-02,  7.1526e-04],
E                               [-7.6294e-03, -2.3842e-04],
E                               [-8.6784e-02,  2.3842e-04],
E                               [ 1.1897e-01,  0.0000e+00],
E                               [-1.2517e-01,  0.0000e+00],
E                               [ 2.5821e-01,  0.0000e+00],
E                               [-9.5606e-02, -2.3842e-04],
E                               [-3.8981e-01,  0.0000e+00],
E                               [ 2.6083e-01, -2.3842e-04],
E                               [ 2.1482e-01,  0.0000e+00],
E                               [ 5.1737e-02, -4.7684e-04],
E                               [-7.4387e-02,  2.3842e-04],
E                               [ 5.2452e-02,  0.0000e+00],
E                               [ 1.4949e-01,  2.3842e-04],
E                               [ 1.2517e-01, -2.3842e-04],
E                               [-3.3593e-01,  0.0000e+00],
E                               [-7.9775e-01,  0.0000e+00],
E                               [ 4.7398e-01,  2.3842e-04],
E                               [ 4.5300e-01,  0.0000e+00],
E                               [ 3.0279e-02, -3.0446e-01],
E                               [ 1.1921e-03, -8.5115e-02],
E                               [ 7.1526e-04,  5.1975e-01],
E                               [ 2.3842e-04, -1.6117e-01],
E                               [ 9.5367e-04,  4.2439e-02],
E                               [-4.7684e-04,  1.5283e-01],
E                               [ 0.0000e+00, -8.1539e-02],
E                               [ 0.0000e+00,  4.7445e-02],
E                               [ 2.3842e-04,  3.1710e-02],
E                               [ 0.0000e+00, -1.2016e-01],
E                               [-7.1526e-04, -7.4387e-02],
E                               [ 0.0000e+00,  2.3484e-01],
E                               [-4.7684e-04, -2.1338e-01],
E                               [ 0.0000e+00,  9.0599e-03],
E                               [ 7.1526e-04,  2.2674e-01],
E                               [ 0.0000e+00, -2.0170e-01],
E                               [ 0.0000e+00,  1.3256e-01],
E                               [ 0.0000e+00,  9.8944e-02],
E                               [-2.3842e-04, -6.4373e-03],
E                               [ 2.3842e-04, -8.6546e-02],
E                               [ 0.0000e+00,  1.1945e-01],
E                               [ 2.3842e-04, -1.2493e-01],
E                               [-2.3842e-04, -8.0347e-02],
E                               [-2.3842e-04,  2.2817e-01],
E                               [ 2.3842e-04, -4.0579e-01],
E                               [ 0.0000e+00,  2.7943e-01],
E                               [-2.3842e-04,  2.1434e-01],
E                               [ 0.0000e+00,  5.1975e-02],
E                               [-2.3842e-04, -7.3910e-02],
E                               [ 0.0000e+00,  5.3406e-02],
E                               [-2.3842e-04,  1.4853e-01],
E                               [ 0.0000e+00, -3.2330e-01],
E                               [-4.7684e-04,  2.1815e-01],
E                               [ 2.3842e-04, -8.7214e-01],
E                               [ 0.0000e+00,  4.6754e-01],
E                               [ 0.0000e+00,  4.4966e-01]], device='cuda:0')
E                       analytical:tensor([[-0.3646, -0.0000],
E                               [-0.1560, -0.0000],
E                               [ 0.5206,  0.0000],
E                               [-0.1865, -0.0000],
E                               [ 0.1634,  0.0000],
E                               [ 0.0231,  0.0000],
E                               [-0.0825, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.1482, -0.0000],
E                               [-0.0941, -0.0000],
E                               [ 0.2423,  0.0000],
E                               [-0.2277, -0.0000],
E                               [ 0.2358,  0.0000],
E                               [-0.0080, -0.0000],
E                               [-0.2303, -0.0000],
E                               [ 0.1151,  0.0000],
E                               [ 0.1151,  0.0000],
E                               [-0.0173, -0.0000],
E                               [-0.0991, -0.0000],
E                               [ 0.1164,  0.0000],
E                               [-0.1395, -0.0000],
E                               [ 0.2369,  0.0000],
E                               [-0.0974, -0.0000],
E                               [-0.4631, -0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.0827, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [-0.3220, -0.0000],
E                               [-0.8826, -0.0000],
E                               [ 0.4413,  0.0000],
E                               [ 0.4413,  0.0000],
E                               [-0.0000, -0.3645],
E                               [-0.0000, -0.1560],
E                               [ 0.0000,  0.5206],
E                               [-0.0000, -0.1865],
E                               [ 0.0000,  0.0231],
E                               [ 0.0000,  0.1634],
E                               [-0.0000, -0.0825],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.1482],
E                               [-0.0000, -0.0941],
E                               [ 0.0000,  0.2423],
E                               [-0.0000, -0.2277],
E                               [-0.0000, -0.0080],
E                               [ 0.0000,  0.2358],
E                               [-0.0000, -0.2303],
E                               [ 0.0000,  0.1151],
E                               [ 0.0000,  0.1151],
E                               [-0.0000, -0.0173],
E                               [-0.0000, -0.0991],
E                               [ 0.0000,  0.1164],
E                               [-0.0000, -0.1395],
E                               [-0.0000, -0.0974],
E                               [ 0.0000,  0.2369],
E                               [-0.0000, -0.4631],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.0827],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.3220],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.8826],
E                               [ 0.0000,  0.4412],
E                               [ 0.0000,  0.4411]], device='cuda:0')

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
____________________ TestAutograd.test_RNNTLoss_gradcheck_1 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([5.0957], device='cuda:0', grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[ 0.2438, -0.5317,  0.2438,  0.2438, -0.1996],
          [ 0.1468,  0.1468, -0.2588,  0.1468, -0.1816],
   ..., dtype=torch.int32), tensor([2], device='cuda:0', dtype=torch.int32), tensor([2], device='cuda:0', dtype=torch.int32))
outputs = (tensor([5.0957], device='cuda:0', grad_fn=<RNNTLossFunction>>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.5292e+02],
E                               [ 1.2874e+01],
E                               [ 7.3957e-01],
E                               [ 3.3808e-01],
E                               [-7.9703e-01],
E                               [ 7.1311e-01],
E                               [ 3.2520e-01],
E                               [ 2.5606e-01],
E                               [ 1.4329e-01],
E                               [-1.9002e-01],
E                               [ 1.0657e-01],
E                               [ 9.7036e-02],
E                               [ 9.1076e-02],
E                               [ 8.3685e-02],
E                               [-2.0671e-01],
E                               [ 1.1563e-01],
E                               [-3.5834e-01],
E                               [ 1.6141e-01],
E                               [ 7.1287e-02],
E                               [ 6.9857e-02],
E                               [ 1.4281e-01],
E                               [ 1.4949e-01],
E                               [-6.6090e-01],
E                               [ 2.5082e-01],
E                               [ 1.4830e-01],
E                               [ 2.5296e-01],
E                               [ 2.5010e-01],
E                               [ 2.5296e-01],
E                               [ 2.5177e-01],
E                               [-8.9502e-01]], device='cuda:0')
E                       analytical:tensor([[ 0.2438],
E                               [-0.5317],
E                               [ 0.2438],
E                               [ 0.2438],
E                               [-0.1996],
E                               [ 0.1468],
E                               [ 0.1468],
E                               [-0.2588],
E                               [ 0.1468],
E                               [-0.1816],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [-0.3046],
E                               [ 0.0760],
E                               [-0.3041],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [-0.5733],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [-0.9273]], device='cuda:0')

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
_________________ TestAutograd.test_np_transducer_gradcheck_0 __________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_np_transducer_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:81: in test_np_transducer_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = NumpyTransducerLoss()
func_out = tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<_NumpyTransducerBackward>)
tupled_inputs = (tensor([[[[0.0654, 0.7875, 0.0816],
          [0.5297, 0.7507, 0.7541],
          [0.6098, 0.8681, 0.6225]],

       ...nsor([2, 2], device='cuda:0', dtype=torch.int32), tensor([[1, 2],
        [1, 1]], device='cuda:0', dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<_NumpyTransducerBackward>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[-0.1869,  0.0000],
E                               [-0.0625,  0.0000],
E                               [ 0.2494,  0.0000],
E                               [-0.2034,  0.0000],
E                               [ 0.2024,  0.0000],
E                               [ 0.0010,  0.0000],
E                               [-0.1411,  0.0000],
E                               [ 0.0789,  0.0000],
E                               [ 0.0620,  0.0000],
E                               [-0.0114,  0.0000],
E                               [-0.0813,  0.0000],
E                               [ 0.0930,  0.0000],
E                               [-0.1540,  0.0000],
E                               [ 0.2296,  0.0000],
E                               [-0.0751,  0.0000],
E                               [-0.2465,  0.0000],
E                               [ 0.1464,  0.0000],
E                               [ 0.1004,  0.0000],
E                               [-0.0131,  0.0000],
E                               [-0.0618,  0.0000],
E                               [ 0.0744,  0.0000],
E                               [-0.0560,  0.0000],
E                               [ 0.2201,  0.0000],
E                               [-0.1640,  0.0000],
E                               [-0.4976,  0.0000],
E                               [ 0.2096,  0.0000],
E                               [ 0.2887,  0.0000],
E                               [ 0.0136,  0.0000],
E                               [-0.0303,  0.0000],
E                               [ 0.0167,  0.0000],
E                               [ 0.1137,  0.0000],
E                               [ 0.0627,  0.0000],
E                               [-0.1767,  0.0000],
E                               [-0.6671,  0.0000],
E                               [ 0.3676,  0.0000],
E                               [ 0.2995,  0.0000],
E                               [ 0.0000, -0.3563],
E                               [ 0.0000, -0.0554],
E                               [ 0.0000,  0.4117],
E                               [ 0.0000, -0.0969],
E                               [ 0.0000,  0.0294],
E                               [ 0.0000,  0.0675],
E                               [ 0.0000, -0.0634],
E                               [ 0.0000,  0.0278],
E                               [ 0.0000,  0.0359],
E                               [ 0.0000, -0.1546],
E                               [ 0.0000, -0.0737],
E                               [ 0.0000,  0.2285],
E                               [ 0.0000, -0.1669],
E                               [ 0.0000,  0.0000],
E                               [ 0.0000,  0.1669],
E                               [ 0.0000, -0.1724],
E                               [ 0.0000,  0.1055],
E                               [ 0.0000,  0.0670],
E                               [ 0.0000,  0.0240],
E                               [ 0.0000, -0.1181],
E                               [ 0.0000,  0.0942],
E                               [ 0.0000, -0.1047],
E                               [ 0.0000, -0.1090],
E                               [ 0.0000,  0.2136],
E                               [ 0.0000, -0.3699],
E                               [ 0.0000,  0.1799],
E                               [ 0.0000,  0.1895],
E                               [ 0.0000,  0.0259],
E                               [ 0.0000, -0.0793],
E                               [ 0.0000,  0.0539],
E                               [ 0.0000,  0.1224],
E                               [ 0.0000, -0.2387],
E                               [ 0.0000,  0.1165],
E                               [ 0.0000, -0.5988],
E                               [ 0.0000,  0.3023],
E                               [ 0.0000,  0.2966]], device='cuda:0')
E                       analytical:tensor([[-1.8684e-01, -1.8684e-01],
E                               [-6.2555e-02, -6.2555e-02],
E                               [ 2.4940e-01,  2.4940e-01],
E                               [-2.0338e-01, -2.0338e-01],
E                               [ 2.0240e-01,  2.0240e-01],
E                               [ 9.7747e-04,  9.7747e-04],
E                               [-1.4102e-01, -1.4102e-01],
E                               [ 7.9123e-02,  7.9123e-02],
E                               [ 6.1893e-02,  6.1893e-02],
E                               [-1.1552e-02, -1.1552e-02],
E                               [-8.1280e-02, -8.1280e-02],
E                               [ 9.2832e-02,  9.2832e-02],
E                               [-1.5426e-01, -1.5426e-01],
E                               [ 2.2943e-01,  2.2943e-01],
E                               [-7.5175e-02, -7.5175e-02],
E                               [-2.4659e-01, -2.4659e-01],
E                               [ 1.4640e-01,  1.4640e-01],
E                               [ 1.0019e-01,  1.0019e-01],
E                               [-1.2918e-02, -1.2918e-02],
E                               [-6.1593e-02, -6.1593e-02],
E                               [ 7.4512e-02,  7.4512e-02],
E                               [-5.5986e-02, -5.5986e-02],
E                               [ 2.1983e-01,  2.1983e-01],
E                               [-1.6385e-01, -1.6385e-01],
E                               [-4.9763e-01, -4.9763e-01],
E                               [ 2.0924e-01,  2.0924e-01],
E                               [ 2.8839e-01,  2.8839e-01],
E                               [ 1.3605e-02,  1.3605e-02],
E                               [-3.0220e-02, -3.0220e-02],
E                               [ 1.6615e-02,  1.6615e-02],
E                               [ 1.1392e-01,  1.1392e-01],
E                               [ 6.2781e-02,  6.2781e-02],
E                               [-1.7671e-01, -1.7671e-01],
E                               [-6.6708e-01, -6.6708e-01],
E                               [ 3.6766e-01,  3.6766e-01],
E                               [ 2.9942e-01,  2.9942e-01],
E                               [-3.5634e-01, -3.5634e-01],
E                               [-5.5347e-02, -5.5347e-02],
E                               [ 4.1169e-01,  4.1169e-01],
E                               [-9.6922e-02, -9.6922e-02],
E                               [ 2.9459e-02,  2.9459e-02],
E                               [ 6.7463e-02,  6.7463e-02],
E                               [-6.3518e-02, -6.3518e-02],
E                               [ 2.7654e-02,  2.7654e-02],
E                               [ 3.5863e-02,  3.5863e-02],
E                               [-1.5450e-01, -1.5450e-01],
E                               [-7.3942e-02, -7.3942e-02],
E                               [ 2.2844e-01,  2.2844e-01],
E                               [-1.6679e-01, -1.6679e-01],
E                               [-8.7988e-05, -8.7988e-05],
E                               [ 1.6688e-01,  1.6688e-01],
E                               [-1.7237e-01, -1.7237e-01],
E                               [ 1.0557e-01,  1.0557e-01],
E                               [ 6.6804e-02,  6.6804e-02],
E                               [ 2.3875e-02,  2.3875e-02],
E                               [-1.1826e-01, -1.1826e-01],
E                               [ 9.4381e-02,  9.4381e-02],
E                               [-1.0471e-01, -1.0471e-01],
E                               [-1.0893e-01, -1.0893e-01],
E                               [ 2.1364e-01,  2.1364e-01],
E                               [-3.6984e-01, -3.6984e-01],
E                               [ 1.8012e-01,  1.8012e-01],
E                               [ 1.8973e-01,  1.8973e-01],
E                               [ 2.5714e-02,  2.5714e-02],
E                               [-7.9462e-02, -7.9462e-02],
E                               [ 5.3748e-02,  5.3748e-02],
E                               [ 1.2233e-01,  1.2233e-01],
E                               [-2.3879e-01, -2.3879e-01],
E                               [ 1.1646e-01,  1.1646e-01],
E                               [-5.9869e-01, -5.9869e-01],
E                               [ 3.0220e-01,  3.0220e-01],
E                               [ 2.9648e-01,  2.9648e-01]], device='cuda:0')

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
_________________ TestAutograd.test_np_transducer_gradcheck_1 __________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_np_transducer_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:81: in test_np_transducer_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1265: in _gradcheck_helper
    _test_backward_mul_by_grad_output(outputs, tupled_inputs, check_sparse_nnz)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

outputs = (tensor([5.0957], device='cuda:0', grad_fn=<_NumpyTransducerBackward>),)
inputs = (tensor([[[[0.1000, 0.6000, 0.1000, 0.1000, 0.1000],
          [0.1000, 0.1000, 0.6000, 0.1000, 0.1000],
          [0....pe=torch.int32), tensor([2], device='cuda:0', dtype=torch.int32), tensor([[1, 2]], device='cuda:0', dtype=torch.int32))
check_sparse_nnz = False

    def _test_backward_mul_by_grad_output(outputs, inputs, check_sparse_nnz) -> bool:
        # Tests that backward is multiplied by grad_output
        diff_input_list: List[torch.Tensor] = list(_iter_tensors(inputs, True))
        if not diff_input_list:
            raise GradcheckError("no Tensors requiring grad found in input")
        grads_input = torch.autograd.grad(outputs, diff_input_list,
                                          [torch.zeros_like(o, memory_format=torch.legacy_contiguous_format) for o in outputs],
                                          allow_unused=True)
        for gi, di in zip(grads_input, diff_input_list):
            if gi is None:
                continue
            if isinstance(gi, torch.Tensor) and gi.layout != torch.strided:
                if gi.layout != di.layout:
                    raise GradcheckError('grad is incorrect layout (' + str(gi.layout) + ' is not ' + str(di.layout) + ')')
                if gi.layout == torch.sparse_coo:
                    if gi.sparse_dim() != di.sparse_dim():
                        raise GradcheckError('grad is sparse tensor, but has incorrect sparse_dim')
                    if gi.dense_dim() != di.dense_dim():
                        raise GradcheckError('grad is sparse tensor, but has incorrect dense_dim')
                gi = gi.to_dense()
                di = di.to_dense()
    
            if check_sparse_nnz:
                if not torch.allclose(gi, torch.zeros_like(gi)):
                    raise GradcheckError('backward not multiplied by grad_output')
            elif not gi.eq(0).all():
>               raise GradcheckError('backward not multiplied by grad_output')
E               torch.autograd.gradcheck.GradcheckError: backward not multiplied by grad_output

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:799: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
=============================== warnings summary ===============================
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1
  /private/home/vincentqb/miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:632: UserWarning: Input #0 requires gradient and is not a double precision floating point or complex. This check will likely fail if all the inputs are not of double precision floating point or complex. 
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch....
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch....
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 - t...
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1 - t...
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch...
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch...
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 - ...
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1 - ...
=================== 8 failed, 18 passed, 8 warnings in 4.95s ===================

@vincentqb vincentqb force-pushed the rnntautograd branch 2 times, most recently from 65eb141 to 69029b1 on May 26, 2021 19:44
@vincentqb
Contributor Author

vincentqb commented May 26, 2021

Patch:

diff --git a/test/torchaudio_unittest/rnnt/numpy_transducer.py b/test/torchaudio_unittest/rnnt/numpy_transducer.py
index a284bc1..b4896b1 100644
--- a/test/torchaudio_unittest/rnnt/numpy_transducer.py
+++ b/test/torchaudio_unittest/rnnt/numpy_transducer.py
@@ -34,6 +34,8 @@ class _NumpyTransducer(torch.autograd.Function):
 
     @staticmethod
     def backward(ctx, output_gradients):
+        output_gradients = output_gradients.view(-1, 1, 1, 1).to(ctx.grads)
+        ctx.grads.mul_(output_gradients).to(ctx.grads)
         return ctx.grads, None, None, None, None, None, None, None, None
 
     @staticmethod

Before:

============================================================================================================= short test summary info ==============================================================================================================
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: backward not multiplied by grad_output
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: backward not multiplied by grad_output
===================================================================================================== 8 failed, 18 passed, 8 warnings in 4.57s =====================================================================================================

After:

============================================================================================================= short test summary info ==============================================================================================================
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, alt...
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, al...
===================================================================================================== 6 failed, 20 passed, 8 warnings in 4.80s =====================================================================================================
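
With the patch applied, the np_transducer failures change from "backward not multiplied by grad_output" to "Backward is not reentrant": mul_ scales ctx.grads in place, and gradcheck deliberately runs backward twice with the same grad_output (see _check_analytical_jacobian_attributes in the full log below), so the second pass re-scales the already-scaled cache and returns different values. The trailing .to(ctx.grads) on the mul_ line is also a no-op whose result is discarded. A sketch of an out-of-place variant that leaves the cache untouched — not the change made in this PR, just the shape such a fix could take:

    @staticmethod
    def backward(ctx, output_gradients):
        # Broadcast the per-sequence grad_output over (T, U, D) without mutating
        # the cached gradients, so repeated backward calls with the same
        # grad_output return identical values (what the reentrancy check expects).
        output_gradients = output_gradients.view(-1, 1, 1, 1).to(ctx.grads)
        return ctx.grads * output_gradients, None, None, None, None, None, None, None, None
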
Details

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /private/home/vincentqb/miniconda/envs/torch-nightly/bin/python
cachedir: .pytest_cache
rootdir: /private/home/vincentqb/autograd/audio
plugins: hydra-core-1.0.6
collecting ... collected 26 items

autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 FAILED     [  3%]
autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 FAILED     [  7%]
autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 FAILED [ 11%]
autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1 PASSED [ 15%]
autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 FAILED    [ 19%]
autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 FAILED    [ 23%]
autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 FAILED [ 26%]
autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1 PASSED [ 30%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_basic_backward PASSED          [ 34%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp16 PASSED [ 38%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp32 PASSED [ 42%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp16 PASSED [ 46%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp32 PASSED [ 50%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_random_data_with_numpy_fp32 PASSED [ 53%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_rnnt_nonfused_log_softmax PASSED [ 57%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_basic_backward PASSED         [ 61%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp16 PASSED [ 65%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp32 PASSED [ 69%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp16 PASSED [ 73%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp32 PASSED [ 76%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_random_data_with_numpy_fp32 PASSED [ 80%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_rnnt_nonfused_log_softmax PASSED [ 84%]
torchscript_consistency_cpu_test.py::TestRNNTLoss::test_RNNTLoss PASSED  [ 88%]
torchscript_consistency_cpu_test.py::TestRNNTLoss::test_rnnt_loss PASSED [ 92%]
torchscript_consistency_cuda_test.py::TestRNNTLoss::test_RNNTLoss PASSED [ 96%]
torchscript_consistency_cuda_test.py::TestRNNTLoss::test_rnnt_loss PASSED [100%]

=================================== FAILURES ===================================
____________________ TestAutograd.test_RNNTLoss_gradcheck_0 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([4.2807, 3.9384], grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[-0.3646, -0.1560,  0.5206],
          [-0.1865,  0.1634,  0.0231],
          [-0.0825,  0.0413,  0.0413]],...sor([[1, 2],
        [1, 1]], dtype=torch.int32), tensor([4, 4], dtype=torch.int32), tensor([2, 2], dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], grad_fn=<RNNTLossFunction>>),), eps = 0.001
rtol = 0.01, atol = 0.01, check_grad_dtypes = False, nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.9532e+02,  2.5656e+02],
E                               [ 3.7360e+01,  1.0766e+01],
E                               [ 2.0285e+00,  1.2028e+00],
E                               [ 1.3320e+00,  3.9721e-01],
E                               [ 2.4867e-01,  1.9717e-01],
E                               [-3.6073e-01,  1.1039e-01],
E                               [-1.6189e-01,  6.2943e-02],
E                               [-6.7949e-02,  3.6478e-02],
E                               [-2.3365e-02,  2.1696e-02],
E                               [-1.9288e-01,  1.2875e-02],
E                               [-7.1764e-02,  7.3910e-03],
E                               [ 2.1267e-01,  4.0531e-03],
E                               [-2.2006e-01,  2.6226e-03],
E                               [ 2.4772e-01,  2.1458e-03],
E                               [-5.7220e-03,  9.5367e-04],
E                               [-1.9026e-01,  4.7684e-04],
E                               [ 1.2255e-01,  4.7684e-04],
E                               [ 9.7036e-02,  2.3842e-04],
E                               [-6.9141e-03, -4.7684e-04],
E                               [-8.7261e-02, -2.3842e-04],
E                               [ 1.1921e-01,  4.7684e-04],
E                               [-1.2374e-01, -2.3842e-04],
E                               [ 2.5749e-01,  0.0000e+00],
E                               [-9.5367e-02, -2.3842e-04],
E                               [-3.8958e-01,  0.0000e+00],
E                               [ 2.6107e-01, -2.3842e-04],
E                               [ 2.1553e-01, -2.3842e-04],
E                               [ 5.1022e-02,  2.3842e-04],
E                               [-7.4387e-02,  0.0000e+00],
E                               [ 5.3167e-02, -2.3842e-04],
E                               [ 1.4925e-01,  2.3842e-04],
E                               [ 1.2517e-01, -7.1526e-04],
E                               [-3.3593e-01,  7.1526e-04],
E                               [-7.9751e-01, -2.3842e-04],
E                               [ 4.7374e-01,  7.1526e-04],
E                               [ 4.5204e-01,  2.3842e-04],
E                               [ 3.1233e-02, -3.0470e-01],
E                               [ 9.5367e-04, -8.4639e-02],
E                               [ 4.7684e-04,  5.1975e-01],
E                               [-2.3842e-04, -1.6069e-01],
E                               [ 0.0000e+00,  4.3154e-02],
E                               [ 0.0000e+00,  1.5235e-01],
E                               [ 2.3842e-04, -8.1301e-02],
E                               [ 0.0000e+00,  4.6968e-02],
E                               [ 0.0000e+00,  3.1471e-02],
E                               [ 0.0000e+00, -1.1992e-01],
E                               [ 2.3842e-04, -7.4387e-02],
E                               [-4.7684e-04,  2.3460e-01],
E                               [ 9.5367e-04, -2.1267e-01],
E                               [-4.7684e-04,  8.5831e-03],
E                               [ 0.0000e+00,  2.2697e-01],
E                               [ 0.0000e+00, -2.0170e-01],
E                               [-2.3842e-04,  1.3256e-01],
E                               [-2.3842e-04,  9.8467e-02],
E                               [ 0.0000e+00, -6.4373e-03],
E                               [-4.7684e-04, -8.6546e-02],
E                               [-4.7684e-04,  1.1921e-01],
E                               [ 7.1526e-04, -1.2445e-01],
E                               [ 4.7684e-04, -8.0824e-02],
E                               [-4.7684e-04,  2.2840e-01],
E                               [ 2.3842e-04, -4.0674e-01],
E                               [ 0.0000e+00,  2.7966e-01],
E                               [ 9.5367e-04,  2.1434e-01],
E                               [ 0.0000e+00,  5.2452e-02],
E                               [-4.7684e-04, -7.4387e-02],
E                               [ 0.0000e+00,  5.2929e-02],
E                               [ 0.0000e+00,  1.4973e-01],
E                               [-2.3842e-04, -3.2377e-01],
E                               [-2.3842e-04,  2.1839e-01],
E                               [ 0.0000e+00, -8.7285e-01],
E                               [-4.7684e-04,  4.6802e-01],
E                               [-4.7684e-04,  4.5013e-01]])
E                       analytical:tensor([[-0.3646, -0.0000],
E                               [-0.1560, -0.0000],
E                               [ 0.5206,  0.0000],
E                               [-0.1865, -0.0000],
E                               [ 0.1634,  0.0000],
E                               [ 0.0231,  0.0000],
E                               [-0.0825, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.1482, -0.0000],
E                               [-0.0941, -0.0000],
E                               [ 0.2423,  0.0000],
E                               [-0.2277, -0.0000],
E                               [ 0.2358,  0.0000],
E                               [-0.0080, -0.0000],
E                               [-0.2303, -0.0000],
E                               [ 0.1151,  0.0000],
E                               [ 0.1151,  0.0000],
E                               [-0.0173, -0.0000],
E                               [-0.0991, -0.0000],
E                               [ 0.1164,  0.0000],
E                               [-0.1395, -0.0000],
E                               [ 0.2369,  0.0000],
E                               [-0.0974, -0.0000],
E                               [-0.4631, -0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.0827, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [-0.3220, -0.0000],
E                               [-0.8826, -0.0000],
E                               [ 0.4413,  0.0000],
E                               [ 0.4413,  0.0000],
E                               [-0.0000, -0.3645],
E                               [-0.0000, -0.1560],
E                               [ 0.0000,  0.5206],
E                               [-0.0000, -0.1865],
E                               [ 0.0000,  0.0231],
E                               [ 0.0000,  0.1634],
E                               [-0.0000, -0.0825],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.1482],
E                               [-0.0000, -0.0941],
E                               [ 0.0000,  0.2423],
E                               [-0.0000, -0.2277],
E                               [-0.0000, -0.0080],
E                               [ 0.0000,  0.2358],
E                               [-0.0000, -0.2303],
E                               [ 0.0000,  0.1151],
E                               [ 0.0000,  0.1151],
E                               [-0.0000, -0.0173],
E                               [-0.0000, -0.0991],
E                               [ 0.0000,  0.1164],
E                               [-0.0000, -0.1395],
E                               [-0.0000, -0.0974],
E                               [ 0.0000,  0.2369],
E                               [-0.0000, -0.4631],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.0827],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.3220],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.8826],
E                               [ 0.0000,  0.4412],
E                               [ 0.0000,  0.4411]])

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
----------------------------- Captured stderr call -----------------------------
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
____________________ TestAutograd.test_RNNTLoss_gradcheck_1 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss(), func_out = tensor([5.0957], grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[ 0.2438, -0.5317,  0.2438,  0.2438, -0.1996],
          [ 0.1468,  0.1468, -0.2588,  0.1468, -0.1816],
   ...quires_grad=True), tensor([[1, 2]], dtype=torch.int32), tensor([2], dtype=torch.int32), tensor([2], dtype=torch.int32))
outputs = (tensor([5.0957], grad_fn=<RNNTLossFunction>>),), eps = 0.001
rtol = 0.01, atol = 0.01, check_grad_dtypes = False, nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.5291e+02],
E                               [ 1.2874e+01],
E                               [ 7.4053e-01],
E                               [ 3.3712e-01],
E                               [-7.9679e-01],
E                               [ 7.1311e-01],
E                               [ 3.2616e-01],
E                               [ 2.5654e-01],
E                               [ 1.4234e-01],
E                               [-1.8954e-01],
E                               [ 1.0657e-01],
E                               [ 9.6798e-02],
E                               [ 9.0361e-02],
E                               [ 8.4639e-02],
E                               [-2.0695e-01],
E                               [ 1.1611e-01],
E                               [-3.5858e-01],
E                               [ 1.6069e-01],
E                               [ 7.1049e-02],
E                               [ 7.0333e-02],
E                               [ 1.4257e-01],
E                               [ 1.4997e-01],
E                               [-6.6042e-01],
E                               [ 2.4986e-01],
E                               [ 1.4782e-01],
E                               [ 2.5439e-01],
E                               [ 2.5010e-01],
E                               [ 2.5296e-01],
E                               [ 2.5249e-01],
E                               [-8.9550e-01]])
E                       analytical:tensor([[ 0.2438],
E                               [-0.5317],
E                               [ 0.2438],
E                               [ 0.2438],
E                               [-0.1996],
E                               [ 0.1468],
E                               [ 0.1468],
E                               [-0.2588],
E                               [ 0.1468],
E                               [-0.1816],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [-0.3046],
E                               [ 0.0760],
E                               [-0.3041],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [-0.5733],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [-0.9273]])

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
_________________ TestAutograd.test_np_transducer_gradcheck_0 __________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_np_transducer_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:81: in test_np_transducer_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:974: in _slow_gradcheck
    analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

inputs = (tensor([[[[0.0654, 0.7875, 0.0816],
          [0.5297, 0.7507, 0.7541],
          [0.6098, 0.8681, 0.6225]],

       ...sor([4, 4], dtype=torch.int32), tensor([2, 2], dtype=torch.int32), tensor([[1, 2],
        [1, 1]], dtype=torch.int32))
output = tensor([4.2807, 3.9384], grad_fn=<_NumpyTransducerBackward>)
nondet_tol = 0.0, check_grad_dtypes = False, fast_mode = False, v = None

    def _check_analytical_jacobian_attributes(inputs, output, nondet_tol, check_grad_dtypes,
                                              fast_mode=False, v=None) -> Tuple[torch.Tensor, ...]:
        # This is used by both fast and slow mode:
        #  - For slow mode, vjps[i][j] is the jth row the Jacobian wrt the ith
        #    input.
        #  - For fast mode, vjps[i][0] is a linear combination of the rows
        #    of the Jacobian wrt the ith input
        diff_input_list = list(_iter_tensors(inputs, True))
    
        def vjp_fn(grad_output):
            return torch.autograd.grad(output, diff_input_list, grad_output,
                                       retain_graph=True, allow_unused=True)
        # Compute everything twice to check for nondeterminism (which we call reentrancy)
        if fast_mode:
            vjps1 = _get_analytical_vjps_wrt_specific_output(vjp_fn, output.clone(), v)
            vjps2 = _get_analytical_vjps_wrt_specific_output(vjp_fn, output.clone(), v)
        else:
            vjps1 = _compute_analytical_jacobian_rows(vjp_fn, output.clone())
            vjps2 = _compute_analytical_jacobian_rows(vjp_fn, output.clone())
    
        output_numel = output.numel() if not fast_mode else 1
        jacobians1, types_ok, sizes_ok = _stack_and_check_tensors(vjps1, inputs, output_numel)
        jacobians2, _, _ = _stack_and_check_tensors(vjps2, inputs, output_numel)
        reentrant = _check_jacobians_equal(jacobians1, jacobians2, nondet_tol)
    
        if not types_ok and check_grad_dtypes:
            raise GradcheckError('Gradient has dtype mismatch')
        if not sizes_ok:
            raise GradcheckError('Analytical gradient has incorrect size')
        if not reentrant:
>           raise GradcheckError('Backward is not reentrant, i.e., running backward with '
                                 'same input and grad_output multiple times gives different values, '
                                 'although analytical gradient matches numerical gradient.'
                                 f'The tolerance for nondeterminism was {nondet_tol}.' +
                                 FAILED_NONDET_MSG)
E           torch.autograd.gradcheck.GradcheckError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient.The tolerance for nondeterminism was 0.0.
E           
E           NOTE: If your op relies on non-deterministic operations i.e., it is listed here:
E           https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html
E           this failure might be expected.
E           
E           If you are adding a new operator, please file an issue and then use one of the
E           workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
E           If the test
E           - manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
E             with `nondet_tol=<tol>` as a keyword argument.
E           - is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test
E             to have `gradcheck_nondet_tol=<tol>`.
E           - is a Module test (e.g., in common_nn.py), then modify the corresponding
E             module_test entry to have `gradcheck_nondet_tol=<tol>`

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:529: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
____________________ TestAutograd.test_RNNTLoss_gradcheck_0 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[-0.3646, -0.1560,  0.5206],
          [-0.1865,  0.1634,  0.0231],
          [-0.0825,  0.0413,  0.0413]],...e=torch.int32), tensor([4, 4], device='cuda:0', dtype=torch.int32), tensor([2, 2], device='cuda:0', dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<RNNTLossFunction>>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.9532e+02,  2.5656e+02],
E                               [ 3.7359e+01,  1.0767e+01],
E                               [ 2.0287e+00,  1.2023e+00],
E                               [ 1.3320e+00,  3.9697e-01],
E                               [ 2.4867e-01,  1.9670e-01],
E                               [-3.6025e-01,  1.1015e-01],
E                               [-1.6212e-01,  6.2942e-02],
E                               [-6.7711e-02,  3.7193e-02],
E                               [-2.3127e-02,  2.2173e-02],
E                               [-1.9336e-01,  1.1921e-02],
E                               [-7.1049e-02,  7.3910e-03],
E                               [ 2.1219e-01,  4.2915e-03],
E                               [-2.1982e-01,  2.3842e-03],
E                               [ 2.4748e-01,  1.9073e-03],
E                               [-6.1989e-03,  9.5367e-04],
E                               [-1.9050e-01,  0.0000e+00],
E                               [ 1.2302e-01,  4.7684e-04],
E                               [ 9.7275e-02,  7.1526e-04],
E                               [-7.6294e-03, -2.3842e-04],
E                               [-8.6784e-02,  2.3842e-04],
E                               [ 1.1897e-01,  0.0000e+00],
E                               [-1.2517e-01,  0.0000e+00],
E                               [ 2.5821e-01,  0.0000e+00],
E                               [-9.5606e-02, -2.3842e-04],
E                               [-3.8981e-01,  0.0000e+00],
E                               [ 2.6083e-01, -2.3842e-04],
E                               [ 2.1482e-01,  0.0000e+00],
E                               [ 5.1737e-02, -4.7684e-04],
E                               [-7.4387e-02,  2.3842e-04],
E                               [ 5.2452e-02,  0.0000e+00],
E                               [ 1.4949e-01,  2.3842e-04],
E                               [ 1.2517e-01, -2.3842e-04],
E                               [-3.3593e-01,  0.0000e+00],
E                               [-7.9775e-01,  0.0000e+00],
E                               [ 4.7398e-01,  2.3842e-04],
E                               [ 4.5300e-01,  0.0000e+00],
E                               [ 3.0279e-02, -3.0446e-01],
E                               [ 1.1921e-03, -8.5115e-02],
E                               [ 7.1526e-04,  5.1975e-01],
E                               [ 2.3842e-04, -1.6117e-01],
E                               [ 9.5367e-04,  4.2439e-02],
E                               [-4.7684e-04,  1.5283e-01],
E                               [ 0.0000e+00, -8.1539e-02],
E                               [ 0.0000e+00,  4.7445e-02],
E                               [ 2.3842e-04,  3.1710e-02],
E                               [ 0.0000e+00, -1.2016e-01],
E                               [-7.1526e-04, -7.4387e-02],
E                               [ 0.0000e+00,  2.3484e-01],
E                               [-4.7684e-04, -2.1338e-01],
E                               [ 0.0000e+00,  9.0599e-03],
E                               [ 7.1526e-04,  2.2674e-01],
E                               [ 0.0000e+00, -2.0170e-01],
E                               [ 0.0000e+00,  1.3256e-01],
E                               [ 0.0000e+00,  9.8944e-02],
E                               [-2.3842e-04, -6.4373e-03],
E                               [ 2.3842e-04, -8.6546e-02],
E                               [ 0.0000e+00,  1.1945e-01],
E                               [ 2.3842e-04, -1.2493e-01],
E                               [-2.3842e-04, -8.0347e-02],
E                               [-2.3842e-04,  2.2817e-01],
E                               [ 2.3842e-04, -4.0579e-01],
E                               [ 0.0000e+00,  2.7943e-01],
E                               [-2.3842e-04,  2.1434e-01],
E                               [ 0.0000e+00,  5.1975e-02],
E                               [-2.3842e-04, -7.3910e-02],
E                               [ 0.0000e+00,  5.3406e-02],
E                               [-2.3842e-04,  1.4853e-01],
E                               [ 0.0000e+00, -3.2330e-01],
E                               [-4.7684e-04,  2.1815e-01],
E                               [ 2.3842e-04, -8.7214e-01],
E                               [ 0.0000e+00,  4.6754e-01],
E                               [ 0.0000e+00,  4.4966e-01]], device='cuda:0')
E                       analytical:tensor([[-0.3646, -0.0000],
E                               [-0.1560, -0.0000],
E                               [ 0.5206,  0.0000],
E                               [-0.1865, -0.0000],
E                               [ 0.1634,  0.0000],
E                               [ 0.0231,  0.0000],
E                               [-0.0825, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.1482, -0.0000],
E                               [-0.0941, -0.0000],
E                               [ 0.2423,  0.0000],
E                               [-0.2277, -0.0000],
E                               [ 0.2358,  0.0000],
E                               [-0.0080, -0.0000],
E                               [-0.2303, -0.0000],
E                               [ 0.1151,  0.0000],
E                               [ 0.1151,  0.0000],
E                               [-0.0173, -0.0000],
E                               [-0.0991, -0.0000],
E                               [ 0.1164,  0.0000],
E                               [-0.1395, -0.0000],
E                               [ 0.2369,  0.0000],
E                               [-0.0974, -0.0000],
E                               [-0.4631, -0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.0827, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [-0.3220, -0.0000],
E                               [-0.8826, -0.0000],
E                               [ 0.4413,  0.0000],
E                               [ 0.4413,  0.0000],
E                               [-0.0000, -0.3645],
E                               [-0.0000, -0.1560],
E                               [ 0.0000,  0.5206],
E                               [-0.0000, -0.1865],
E                               [ 0.0000,  0.0231],
E                               [ 0.0000,  0.1634],
E                               [-0.0000, -0.0825],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.1482],
E                               [-0.0000, -0.0941],
E                               [ 0.0000,  0.2423],
E                               [-0.0000, -0.2277],
E                               [-0.0000, -0.0080],
E                               [ 0.0000,  0.2358],
E                               [-0.0000, -0.2303],
E                               [ 0.0000,  0.1151],
E                               [ 0.0000,  0.1151],
E                               [-0.0000, -0.0173],
E                               [-0.0000, -0.0991],
E                               [ 0.0000,  0.1164],
E                               [-0.0000, -0.1395],
E                               [-0.0000, -0.0974],
E                               [ 0.0000,  0.2369],
E                               [-0.0000, -0.4631],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.0827],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.3220],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.8826],
E                               [ 0.0000,  0.4412],
E                               [ 0.0000,  0.4411]], device='cuda:0')

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
____________________ TestAutograd.test_RNNTLoss_gradcheck_1 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:64: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([5.0957], device='cuda:0', grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[ 0.2438, -0.5317,  0.2438,  0.2438, -0.1996],
          [ 0.1468,  0.1468, -0.2588,  0.1468, -0.1816],
   ..., dtype=torch.int32), tensor([2], device='cuda:0', dtype=torch.int32), tensor([2], device='cuda:0', dtype=torch.int32))
outputs = (tensor([5.0957], device='cuda:0', grad_fn=<RNNTLossFunction>>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.5292e+02],
E                               [ 1.2874e+01],
E                               [ 7.3957e-01],
E                               [ 3.3808e-01],
E                               [-7.9703e-01],
E                               [ 7.1311e-01],
E                               [ 3.2520e-01],
E                               [ 2.5606e-01],
E                               [ 1.4329e-01],
E                               [-1.9002e-01],
E                               [ 1.0657e-01],
E                               [ 9.7036e-02],
E                               [ 9.1076e-02],
E                               [ 8.3685e-02],
E                               [-2.0671e-01],
E                               [ 1.1563e-01],
E                               [-3.5834e-01],
E                               [ 1.6141e-01],
E                               [ 7.1287e-02],
E                               [ 6.9857e-02],
E                               [ 1.4281e-01],
E                               [ 1.4949e-01],
E                               [-6.6090e-01],
E                               [ 2.5082e-01],
E                               [ 1.4830e-01],
E                               [ 2.5296e-01],
E                               [ 2.5010e-01],
E                               [ 2.5296e-01],
E                               [ 2.5177e-01],
E                               [-8.9502e-01]], device='cuda:0')
E                       analytical:tensor([[ 0.2438],
E                               [-0.5317],
E                               [ 0.2438],
E                               [ 0.2438],
E                               [-0.1996],
E                               [ 0.1468],
E                               [ 0.1468],
E                               [-0.2588],
E                               [ 0.1468],
E                               [-0.1816],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [-0.3046],
E                               [ 0.0760],
E                               [-0.3041],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [-0.5733],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [-0.9273]], device='cuda:0')

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
_________________ TestAutograd.test_np_transducer_gradcheck_0 __________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_np_transducer_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:81: in test_np_transducer_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:47: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:974: in _slow_gradcheck
    analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

inputs = (tensor([[[[0.0654, 0.7875, 0.0816],
          [0.5297, 0.7507, 0.7541],
          [0.6098, 0.8681, 0.6225]],

       ...nsor([2, 2], device='cuda:0', dtype=torch.int32), tensor([[1, 2],
        [1, 1]], device='cuda:0', dtype=torch.int32))
output = tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<_NumpyTransducerBackward>)
nondet_tol = 0.0, check_grad_dtypes = False, fast_mode = False, v = None

    def _check_analytical_jacobian_attributes(inputs, output, nondet_tol, check_grad_dtypes,
                                              fast_mode=False, v=None) -> Tuple[torch.Tensor, ...]:
        # This is used by both fast and slow mode:
        #  - For slow mode, vjps[i][j] is the jth row the Jacobian wrt the ith
        #    input.
        #  - For fast mode, vjps[i][0] is a linear combination of the rows
        #    of the Jacobian wrt the ith input
        diff_input_list = list(_iter_tensors(inputs, True))
    
        def vjp_fn(grad_output):
            return torch.autograd.grad(output, diff_input_list, grad_output,
                                       retain_graph=True, allow_unused=True)
        # Compute everything twice to check for nondeterminism (which we call reentrancy)
        if fast_mode:
            vjps1 = _get_analytical_vjps_wrt_specific_output(vjp_fn, output.clone(), v)
            vjps2 = _get_analytical_vjps_wrt_specific_output(vjp_fn, output.clone(), v)
        else:
            vjps1 = _compute_analytical_jacobian_rows(vjp_fn, output.clone())
            vjps2 = _compute_analytical_jacobian_rows(vjp_fn, output.clone())
    
        output_numel = output.numel() if not fast_mode else 1
        jacobians1, types_ok, sizes_ok = _stack_and_check_tensors(vjps1, inputs, output_numel)
        jacobians2, _, _ = _stack_and_check_tensors(vjps2, inputs, output_numel)
        reentrant = _check_jacobians_equal(jacobians1, jacobians2, nondet_tol)
    
        if not types_ok and check_grad_dtypes:
            raise GradcheckError('Gradient has dtype mismatch')
        if not sizes_ok:
            raise GradcheckError('Analytical gradient has incorrect size')
        if not reentrant:
>           raise GradcheckError('Backward is not reentrant, i.e., running backward with '
                                 'same input and grad_output multiple times gives different values, '
                                 'although analytical gradient matches numerical gradient.'
                                 f'The tolerance for nondeterminism was {nondet_tol}.' +
                                 FAILED_NONDET_MSG)
E           torch.autograd.gradcheck.GradcheckError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient.The tolerance for nondeterminism was 0.0.
E           
E           NOTE: If your op relies on non-deterministic operations i.e., it is listed here:
E           https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html
E           this failure might be expected.
E           
E           If you are adding a new operator, please file an issue and then use one of the
E           workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
E           If the test
E           - manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
E             with `nondet_tol=<tol>` as a keyword argument.
E           - is OpInfo-based (e.g., in test_ops.py), then modify the OpInfo for the test
E             to have `gradcheck_nondet_tol=<tol>`.
E           - is a Module test (e.g., in common_nn.py), then modify the corresponding
E             module_test entry to have `gradcheck_nondet_tol=<tol>`

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:529: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
=============================== warnings summary ===============================
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1
  /private/home/vincentqb/miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:632: UserWarning: Input #0 requires gradient and is not a double precision floating point or complex. This check will likely fail if all the inputs are not of double precision floating point or complex. 
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch....
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch....
FAILED autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 - t...
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch...
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch...
FAILED autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 - ...
=================== 6 failed, 20 passed, 8 warnings in 4.56s ===================

@vincentqb
Copy link
Contributor Author

vincentqb commented May 26, 2021

Patch that gives the same result as the comment above:

diff --git a/test/torchaudio_unittest/rnnt/autograd_impl.py b/test/torchaudio_unittest/rnnt/autograd_impl.py
index aa642bc..123c752 100644
--- a/test/torchaudio_unittest/rnnt/autograd_impl.py
+++ b/test/torchaudio_unittest/rnnt/autograd_impl.py
@@ -16,6 +16,9 @@ from .utils import (
 )
 from .numpy_transducer import NumpyTransducerLoss
 
+import random
+import numpy as np
+
 
 class Autograd(TestBaseMixin):
     @staticmethod
@@ -44,7 +47,7 @@ class Autograd(TestBaseMixin):
         #         if enable_all_grad:
         #             i.requires_grad = True
         #     inputs_.append(i)
-        assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
+        assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=1e-02)
 
     @parameterized.expand([
         # (get_B1_T10_U3_D4_data, ),
@@ -77,5 +80,9 @@ class Autograd(TestBaseMixin):
             data["targets"],
         )
         loss = NumpyTransducerLoss(blank=data["blank"])
+
+        torch.use_deterministic_algorithms(True)
+        random.seed(0)
+        np.random.seed(0)
         
         self.assert_grad(loss, inputs, enable_all_grad=False)
diff --git a/test/torchaudio_unittest/rnnt/numpy_transducer.py b/test/torchaudio_unittest/rnnt/numpy_transducer.py
index a284bc1..b4896b1 100644
--- a/test/torchaudio_unittest/rnnt/numpy_transducer.py
+++ b/test/torchaudio_unittest/rnnt/numpy_transducer.py
@@ -34,6 +34,8 @@ class _NumpyTransducer(torch.autograd.Function):
 
     @staticmethod
     def backward(ctx, output_gradients):
+        output_gradients = output_gradients.view(-1, 1, 1, 1).to(ctx.grads)
+        ctx.grads.mul_(output_gradients).to(ctx.grads)
         return ctx.grads, None, None, None, None, None, None, None, None
 
     @staticmethod

@vincentqb
Copy link
Contributor Author

vincentqb commented May 26, 2021

FIX: The fix for the numpy transducer is to avoid in-place operations, as shown in 8129432:

    def backward(ctx, grad_output):
        grad_output = grad_output.view(-1, 1, 1, 1).to(ctx.grads)
        return ctx.grads.mul(grad_output), None, None, None, None, None, None, None, None

BUG: warp-transducer and warp-rnnt modify the gradient in place, which can make the backward pass non-reentrant. This means calling backward on the same loss multiple times, without re-running the forward pass and instead relying on retain_graph, gives different results (see the check below and the toy sketch after it). (internal)

out = rnnt_loss(…)
grad1, = autograd.grad(out, inputs, retain_graph=True)
grad2, = autograd.grad(out, inputs)
self.assertEqual(grad1, grad2, atol=atol, rtol=rtol)
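
To illustrate the non-reentrancy, here is a toy sketch (InPlaceScale is a hypothetical function made up for this example, not the warp-transducer code): backward scales a buffer stashed on ctx in place, so a second backward call sees the already-scaled buffer and returns a different value.

import torch

class InPlaceScale(torch.autograd.Function):
    # Toy example: the "gradient" is precomputed in forward and stashed on ctx,
    # as the transducer kernels do, then mutated in place during backward.
    @staticmethod
    def forward(ctx, x):
        ctx.grads = torch.ones_like(x)
        return x.sum()

    @staticmethod
    def backward(ctx, grad_output):
        # BUG: in-place multiply corrupts ctx.grads for the next backward call.
        return ctx.grads.mul_(grad_output)

x = torch.randn(3, dtype=torch.double, requires_grad=True)
out = InPlaceScale.apply(x)
g = torch.tensor(2.0, dtype=torch.double)
grad1, = torch.autograd.grad(out, x, grad_outputs=g, retain_graph=True)
grad1 = grad1.clone()  # snapshot, since backward hands back the buffer itself
grad2, = torch.autograd.grad(out, x, grad_outputs=g)
print(grad1[0].item(), grad2[0].item())  # 2.0 vs 4.0 -- backward is not reentrant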

The custom C++ autograd function in torchaudio is ok thanks to #1507, see here.

BUG: the numpy transducer issue is different: the Jacobian-gradient product is never formed, and backward returns the stored gradient without multiplying it by grad_output (toy sketch below). (internal)
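
For reference, a toy sketch of the pattern gradcheck rejects (IgnoresGradOutput is a hypothetical function, not the actual NumpyTransducerLoss): the per-example gradient is stashed during forward and returned verbatim from backward, so it is never multiplied by grad_output.

import torch
from torch.autograd import gradcheck

class IgnoresGradOutput(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.grads = 2 * x.detach()   # d(sum of squares)/dx, stashed at forward time
        return (x ** 2).sum(dim=-1)  # per-example "loss" of shape (B,)

    @staticmethod
    def backward(ctx, grad_output):
        # BUG: ignores grad_output; should return grad_output[:, None] * ctx.grads
        return ctx.grads

x = torch.randn(2, 3, dtype=torch.double, requires_grad=True)
gradcheck(IgnoresGradOutput.apply, (x,))
# raises GradcheckError ("Jacobian mismatch" / "backward not multiplied by grad_output")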

@vincentqb
Copy link
Contributor Author

vincentqb commented May 26, 2021

============================================================================================================= short test summary info ==============================================================================================================
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
===================================================================================================== 4 failed, 22 passed, 8 warnings in 4.90s =====================================================================================================
Details

============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1 -- /private/home/vincentqb/miniconda/envs/torch-nightly/bin/python
cachedir: .pytest_cache
rootdir: /private/home/vincentqb/autograd/audio
plugins: hydra-core-1.0.6
collecting ... collected 26 items

autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 FAILED     [  3%]
autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 FAILED     [  7%]
autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0 PASSED [ 11%]
autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1 PASSED [ 15%]
autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 FAILED    [ 19%]
autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 FAILED    [ 23%]
autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0 PASSED [ 26%]
autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1 PASSED [ 30%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_basic_backward PASSED          [ 34%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp16 PASSED [ 38%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp32 PASSED [ 42%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp16 PASSED [ 46%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp32 PASSED [ 50%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_costs_and_gradients_random_data_with_numpy_fp32 PASSED [ 53%]
rnnt_loss_cpu_test.py::TestRNNTLoss::test_rnnt_nonfused_log_softmax PASSED [ 57%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_basic_backward PASSED         [ 61%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp16 PASSED [ 65%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B1_T2_U3_D5_fp32 PASSED [ 69%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp16 PASSED [ 73%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_B2_T4_U3_D3_fp32 PASSED [ 76%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_costs_and_gradients_random_data_with_numpy_fp32 PASSED [ 80%]
rnnt_loss_cuda_test.py::TestRNNTLoss::test_rnnt_nonfused_log_softmax PASSED [ 84%]
torchscript_consistency_cpu_test.py::TestRNNTLoss::test_RNNTLoss PASSED  [ 88%]
torchscript_consistency_cpu_test.py::TestRNNTLoss::test_rnnt_loss PASSED [ 92%]
torchscript_consistency_cuda_test.py::TestRNNTLoss::test_RNNTLoss PASSED [ 96%]
torchscript_consistency_cuda_test.py::TestRNNTLoss::test_rnnt_loss PASSED [100%]

=================================== FAILURES ===================================
____________________ TestAutograd.test_RNNTLoss_gradcheck_0 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:63: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:46: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([4.2807, 3.9384], grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[-0.3646, -0.1560,  0.5206],
          [-0.1865,  0.1634,  0.0231],
          [-0.0825,  0.0413,  0.0413]],...sor([[1, 2],
        [1, 1]], dtype=torch.int32), tensor([4, 4], dtype=torch.int32), tensor([2, 2], dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], grad_fn=<RNNTLossFunction>>),), eps = 0.001
rtol = 0.01, atol = 0.01, check_grad_dtypes = False, nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.9532e+02,  2.5656e+02],
E                               [ 3.7360e+01,  1.0766e+01],
E                               [ 2.0285e+00,  1.2028e+00],
E                               [ 1.3320e+00,  3.9721e-01],
E                               [ 2.4867e-01,  1.9717e-01],
E                               [-3.6073e-01,  1.1039e-01],
E                               [-1.6189e-01,  6.2943e-02],
E                               [-6.7949e-02,  3.6478e-02],
E                               [-2.3365e-02,  2.1696e-02],
E                               [-1.9288e-01,  1.2875e-02],
E                               [-7.1764e-02,  7.3910e-03],
E                               [ 2.1267e-01,  4.0531e-03],
E                               [-2.2006e-01,  2.6226e-03],
E                               [ 2.4772e-01,  2.1458e-03],
E                               [-5.7220e-03,  9.5367e-04],
E                               [-1.9026e-01,  4.7684e-04],
E                               [ 1.2255e-01,  4.7684e-04],
E                               [ 9.7036e-02,  2.3842e-04],
E                               [-6.9141e-03, -4.7684e-04],
E                               [-8.7261e-02, -2.3842e-04],
E                               [ 1.1921e-01,  4.7684e-04],
E                               [-1.2374e-01, -2.3842e-04],
E                               [ 2.5749e-01,  0.0000e+00],
E                               [-9.5367e-02, -2.3842e-04],
E                               [-3.8958e-01,  0.0000e+00],
E                               [ 2.6107e-01, -2.3842e-04],
E                               [ 2.1553e-01, -2.3842e-04],
E                               [ 5.1022e-02,  2.3842e-04],
E                               [-7.4387e-02,  0.0000e+00],
E                               [ 5.3167e-02, -2.3842e-04],
E                               [ 1.4925e-01,  2.3842e-04],
E                               [ 1.2517e-01, -7.1526e-04],
E                               [-3.3593e-01,  7.1526e-04],
E                               [-7.9751e-01, -2.3842e-04],
E                               [ 4.7374e-01,  7.1526e-04],
E                               [ 4.5204e-01,  2.3842e-04],
E                               [ 3.1233e-02, -3.0470e-01],
E                               [ 9.5367e-04, -8.4639e-02],
E                               [ 4.7684e-04,  5.1975e-01],
E                               [-2.3842e-04, -1.6069e-01],
E                               [ 0.0000e+00,  4.3154e-02],
E                               [ 0.0000e+00,  1.5235e-01],
E                               [ 2.3842e-04, -8.1301e-02],
E                               [ 0.0000e+00,  4.6968e-02],
E                               [ 0.0000e+00,  3.1471e-02],
E                               [ 0.0000e+00, -1.1992e-01],
E                               [ 2.3842e-04, -7.4387e-02],
E                               [-4.7684e-04,  2.3460e-01],
E                               [ 9.5367e-04, -2.1267e-01],
E                               [-4.7684e-04,  8.5831e-03],
E                               [ 0.0000e+00,  2.2697e-01],
E                               [ 0.0000e+00, -2.0170e-01],
E                               [-2.3842e-04,  1.3256e-01],
E                               [-2.3842e-04,  9.8467e-02],
E                               [ 0.0000e+00, -6.4373e-03],
E                               [-4.7684e-04, -8.6546e-02],
E                               [-4.7684e-04,  1.1921e-01],
E                               [ 7.1526e-04, -1.2445e-01],
E                               [ 4.7684e-04, -8.0824e-02],
E                               [-4.7684e-04,  2.2840e-01],
E                               [ 2.3842e-04, -4.0674e-01],
E                               [ 0.0000e+00,  2.7966e-01],
E                               [ 9.5367e-04,  2.1434e-01],
E                               [ 0.0000e+00,  5.2452e-02],
E                               [-4.7684e-04, -7.4387e-02],
E                               [ 0.0000e+00,  5.2929e-02],
E                               [ 0.0000e+00,  1.4973e-01],
E                               [-2.3842e-04, -3.2377e-01],
E                               [-2.3842e-04,  2.1839e-01],
E                               [ 0.0000e+00, -8.7285e-01],
E                               [-4.7684e-04,  4.6802e-01],
E                               [-4.7684e-04,  4.5013e-01]])
E                       analytical:tensor([[-0.3646, -0.0000],
E                               [-0.1560, -0.0000],
E                               [ 0.5206,  0.0000],
E                               [-0.1865, -0.0000],
E                               [ 0.1634,  0.0000],
E                               [ 0.0231,  0.0000],
E                               [-0.0825, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.1482, -0.0000],
E                               [-0.0941, -0.0000],
E                               [ 0.2423,  0.0000],
E                               [-0.2277, -0.0000],
E                               [ 0.2358,  0.0000],
E                               [-0.0080, -0.0000],
E                               [-0.2303, -0.0000],
E                               [ 0.1151,  0.0000],
E                               [ 0.1151,  0.0000],
E                               [-0.0173, -0.0000],
E                               [-0.0991, -0.0000],
E                               [ 0.1164,  0.0000],
E                               [-0.1395, -0.0000],
E                               [ 0.2369,  0.0000],
E                               [-0.0974, -0.0000],
E                               [-0.4631, -0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.0827, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [-0.3220, -0.0000],
E                               [-0.8826, -0.0000],
E                               [ 0.4413,  0.0000],
E                               [ 0.4413,  0.0000],
E                               [-0.0000, -0.3645],
E                               [-0.0000, -0.1560],
E                               [ 0.0000,  0.5206],
E                               [-0.0000, -0.1865],
E                               [ 0.0000,  0.0231],
E                               [ 0.0000,  0.1634],
E                               [-0.0000, -0.0825],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.1482],
E                               [-0.0000, -0.0941],
E                               [ 0.0000,  0.2423],
E                               [-0.0000, -0.2277],
E                               [-0.0000, -0.0080],
E                               [ 0.0000,  0.2358],
E                               [-0.0000, -0.2303],
E                               [ 0.0000,  0.1151],
E                               [ 0.0000,  0.1151],
E                               [-0.0000, -0.0173],
E                               [-0.0000, -0.0991],
E                               [ 0.0000,  0.1164],
E                               [-0.0000, -0.1395],
E                               [-0.0000, -0.0974],
E                               [ 0.0000,  0.2369],
E                               [-0.0000, -0.4631],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.0827],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.3220],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.8826],
E                               [ 0.0000,  0.4412],
E                               [ 0.0000,  0.4411]])

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
----------------------------- Captured stderr call -----------------------------
[W LegacyTypeDispatch.h:74] Warning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (function operator())
____________________ TestAutograd.test_RNNTLoss_gradcheck_1 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cpu_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:63: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:46: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss(), func_out = tensor([5.0957], grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[ 0.2438, -0.5317,  0.2438,  0.2438, -0.1996],
          [ 0.1468,  0.1468, -0.2588,  0.1468, -0.1816],
   ...quires_grad=True), tensor([[1, 2]], dtype=torch.int32), tensor([2], dtype=torch.int32), tensor([2], dtype=torch.int32))
outputs = (tensor([5.0957], grad_fn=<RNNTLossFunction>>),), eps = 0.001
rtol = 0.01, atol = 0.01, check_grad_dtypes = False, nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.5291e+02],
E                               [ 1.2874e+01],
E                               [ 7.4053e-01],
E                               [ 3.3712e-01],
E                               [-7.9679e-01],
E                               [ 7.1311e-01],
E                               [ 3.2616e-01],
E                               [ 2.5654e-01],
E                               [ 1.4234e-01],
E                               [-1.8954e-01],
E                               [ 1.0657e-01],
E                               [ 9.6798e-02],
E                               [ 9.0361e-02],
E                               [ 8.4639e-02],
E                               [-2.0695e-01],
E                               [ 1.1611e-01],
E                               [-3.5858e-01],
E                               [ 1.6069e-01],
E                               [ 7.1049e-02],
E                               [ 7.0333e-02],
E                               [ 1.4257e-01],
E                               [ 1.4997e-01],
E                               [-6.6042e-01],
E                               [ 2.4986e-01],
E                               [ 1.4782e-01],
E                               [ 2.5439e-01],
E                               [ 2.5010e-01],
E                               [ 2.5296e-01],
E                               [ 2.5249e-01],
E                               [-8.9550e-01]])
E                       analytical:tensor([[ 0.2438],
E                               [-0.5317],
E                               [ 0.2438],
E                               [ 0.2438],
E                               [-0.1996],
E                               [ 0.1468],
E                               [ 0.1468],
E                               [-0.2588],
E                               [ 0.1468],
E                               [-0.1816],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [-0.3046],
E                               [ 0.0760],
E                               [-0.3041],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [-0.5733],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [-0.9273]])

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
____________________ TestAutograd.test_RNNTLoss_gradcheck_0 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_0>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:63: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:46: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[-0.3646, -0.1560,  0.5206],
          [-0.1865,  0.1634,  0.0231],
          [-0.0825,  0.0413,  0.0413]],...e=torch.int32), tensor([4, 4], device='cuda:0', dtype=torch.int32), tensor([2, 2], device='cuda:0', dtype=torch.int32))
outputs = (tensor([4.2807, 3.9384], device='cuda:0', grad_fn=<RNNTLossFunction>>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.9532e+02,  2.5656e+02],
E                               [ 3.7359e+01,  1.0767e+01],
E                               [ 2.0287e+00,  1.2023e+00],
E                               [ 1.3320e+00,  3.9697e-01],
E                               [ 2.4867e-01,  1.9670e-01],
E                               [-3.6025e-01,  1.1015e-01],
E                               [-1.6212e-01,  6.2942e-02],
E                               [-6.7711e-02,  3.7193e-02],
E                               [-2.3127e-02,  2.2173e-02],
E                               [-1.9336e-01,  1.1921e-02],
E                               [-7.1049e-02,  7.3910e-03],
E                               [ 2.1219e-01,  4.2915e-03],
E                               [-2.1982e-01,  2.3842e-03],
E                               [ 2.4748e-01,  1.9073e-03],
E                               [-6.1989e-03,  9.5367e-04],
E                               [-1.9050e-01,  0.0000e+00],
E                               [ 1.2302e-01,  4.7684e-04],
E                               [ 9.7275e-02,  7.1526e-04],
E                               [-7.6294e-03, -2.3842e-04],
E                               [-8.6784e-02,  2.3842e-04],
E                               [ 1.1897e-01,  0.0000e+00],
E                               [-1.2517e-01,  0.0000e+00],
E                               [ 2.5821e-01,  0.0000e+00],
E                               [-9.5606e-02, -2.3842e-04],
E                               [-3.8981e-01,  0.0000e+00],
E                               [ 2.6083e-01, -2.3842e-04],
E                               [ 2.1482e-01,  0.0000e+00],
E                               [ 5.1737e-02, -4.7684e-04],
E                               [-7.4387e-02,  2.3842e-04],
E                               [ 5.2452e-02,  0.0000e+00],
E                               [ 1.4949e-01,  2.3842e-04],
E                               [ 1.2517e-01, -2.3842e-04],
E                               [-3.3593e-01,  0.0000e+00],
E                               [-7.9775e-01,  0.0000e+00],
E                               [ 4.7398e-01,  2.3842e-04],
E                               [ 4.5300e-01,  0.0000e+00],
E                               [ 3.0279e-02, -3.0446e-01],
E                               [ 1.1921e-03, -8.5115e-02],
E                               [ 7.1526e-04,  5.1975e-01],
E                               [ 2.3842e-04, -1.6117e-01],
E                               [ 9.5367e-04,  4.2439e-02],
E                               [-4.7684e-04,  1.5283e-01],
E                               [ 0.0000e+00, -8.1539e-02],
E                               [ 0.0000e+00,  4.7445e-02],
E                               [ 2.3842e-04,  3.1710e-02],
E                               [ 0.0000e+00, -1.2016e-01],
E                               [-7.1526e-04, -7.4387e-02],
E                               [ 0.0000e+00,  2.3484e-01],
E                               [-4.7684e-04, -2.1338e-01],
E                               [ 0.0000e+00,  9.0599e-03],
E                               [ 7.1526e-04,  2.2674e-01],
E                               [ 0.0000e+00, -2.0170e-01],
E                               [ 0.0000e+00,  1.3256e-01],
E                               [ 0.0000e+00,  9.8944e-02],
E                               [-2.3842e-04, -6.4373e-03],
E                               [ 2.3842e-04, -8.6546e-02],
E                               [ 0.0000e+00,  1.1945e-01],
E                               [ 2.3842e-04, -1.2493e-01],
E                               [-2.3842e-04, -8.0347e-02],
E                               [-2.3842e-04,  2.2817e-01],
E                               [ 2.3842e-04, -4.0579e-01],
E                               [ 0.0000e+00,  2.7943e-01],
E                               [-2.3842e-04,  2.1434e-01],
E                               [ 0.0000e+00,  5.1975e-02],
E                               [-2.3842e-04, -7.3910e-02],
E                               [ 0.0000e+00,  5.3406e-02],
E                               [-2.3842e-04,  1.4853e-01],
E                               [ 0.0000e+00, -3.2330e-01],
E                               [-4.7684e-04,  2.1815e-01],
E                               [ 2.3842e-04, -8.7214e-01],
E                               [ 0.0000e+00,  4.6754e-01],
E                               [ 0.0000e+00,  4.4966e-01]], device='cuda:0')
E                       analytical:tensor([[-0.3646, -0.0000],
E                               [-0.1560, -0.0000],
E                               [ 0.5206,  0.0000],
E                               [-0.1865, -0.0000],
E                               [ 0.1634,  0.0000],
E                               [ 0.0231,  0.0000],
E                               [-0.0825, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.1482, -0.0000],
E                               [-0.0941, -0.0000],
E                               [ 0.2423,  0.0000],
E                               [-0.2277, -0.0000],
E                               [ 0.2358,  0.0000],
E                               [-0.0080, -0.0000],
E                               [-0.2303, -0.0000],
E                               [ 0.1151,  0.0000],
E                               [ 0.1151,  0.0000],
E                               [-0.0173, -0.0000],
E                               [-0.0991, -0.0000],
E                               [ 0.1164,  0.0000],
E                               [-0.1395, -0.0000],
E                               [ 0.2369,  0.0000],
E                               [-0.0974, -0.0000],
E                               [-0.4631, -0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.2316,  0.0000],
E                               [ 0.0413,  0.0000],
E                               [-0.0827, -0.0000],
E                               [ 0.0413,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [ 0.1610,  0.0000],
E                               [-0.3220, -0.0000],
E                               [-0.8826, -0.0000],
E                               [ 0.4413,  0.0000],
E                               [ 0.4413,  0.0000],
E                               [-0.0000, -0.3645],
E                               [-0.0000, -0.1560],
E                               [ 0.0000,  0.5206],
E                               [-0.0000, -0.1865],
E                               [ 0.0000,  0.0231],
E                               [ 0.0000,  0.1634],
E                               [-0.0000, -0.0825],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.1482],
E                               [-0.0000, -0.0941],
E                               [ 0.0000,  0.2423],
E                               [-0.0000, -0.2277],
E                               [-0.0000, -0.0080],
E                               [ 0.0000,  0.2358],
E                               [-0.0000, -0.2303],
E                               [ 0.0000,  0.1151],
E                               [ 0.0000,  0.1151],
E                               [-0.0000, -0.0173],
E                               [-0.0000, -0.0991],
E                               [ 0.0000,  0.1164],
E                               [-0.0000, -0.1395],
E                               [-0.0000, -0.0974],
E                               [ 0.0000,  0.2369],
E                               [-0.0000, -0.4631],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.2316],
E                               [ 0.0000,  0.0413],
E                               [-0.0000, -0.0827],
E                               [ 0.0000,  0.0413],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.3220],
E                               [ 0.0000,  0.1610],
E                               [-0.0000, -0.8826],
E                               [ 0.0000,  0.4412],
E                               [ 0.0000,  0.4411]], device='cuda:0')

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[-1.86844e-01 -6.25550e-02  2.49399e-01]
   [-2.03377e-01  2.02399e-01  9.77000e-04]
   [-1.41016e-01  7.91230e-02  6.18930e-02]]

  [[-1.15520e-02 -8.12800e-02  9.28320e-02]
   [-1.54257e-01  2.29433e-01 -7.51760e-02]
   [-2.46593e-01  1.46405e-01  1.00188e-01]]

  [[-1.29180e-02 -6.15930e-02  7.45120e-02]
   [-5.59860e-02  2.19831e-01 -1.63845e-01]
   [-4.97627e-01  2.09240e-01  2.88387e-01]]

  [[ 1.36050e-02 -3.02200e-02  1.66150e-02]
   [ 1.13925e-01  6.27810e-02 -1.76706e-01]
   [-6.67078e-01  3.67659e-01  2.99419e-01]]]


 [[[-3.56344e-01 -5.53470e-02  4.11691e-01]
   [-9.69220e-02  2.94590e-02  6.74630e-02]
   [-6.35180e-02  2.76540e-02  3.58630e-02]]

  [[-1.54499e-01 -7.39420e-02  2.28441e-01]
   [-1.66790e-01 -8.80000e-05  1.66878e-01]
   [-1.72370e-01  1.05565e-01  6.68040e-02]]

  [[ 2.38750e-02 -1.18256e-01  9.43810e-02]
   [-1.04707e-01 -1.08934e-01  2.13642e-01]
   [-3.69844e-01  1.80118e-01  1.89726e-01]]

  [[ 2.57140e-02 -7.94620e-02  5.37480e-02]
   [ 1.22328e-01 -2.38789e-01  1.16460e-01]
   [-5.98687e-01  3.02203e-01  2.96484e-01]]]]
____________________ TestAutograd.test_RNNTLoss_gradcheck_1 ____________________

a = (<torchaudio_unittest.rnnt.autograd_cuda_test.TestAutograd testMethod=test_RNNTLoss_gradcheck_1>,)

    @wraps(func)
    def standalone_func(*a):
>       return func(*(a + p.args), **p.kwargs)

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/parameterized/parameterized.py:533: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
autograd_impl.py:63: in test_RNNTLoss_gradcheck
    self.assert_grad(loss, inputs, enable_all_grad=False)
autograd_impl.py:46: in assert_grad
    assert gradcheck(loss, inputs, eps=1e-03, atol=1e-02, rtol=1e-02, nondet_tol=0.)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1245: in gradcheck
    return _gradcheck_helper(**args)
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:1258: in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:930: in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = RNNTLoss()
func_out = tensor([5.0957], device='cuda:0', grad_fn=<RNNTLossFunction>>)
tupled_inputs = (tensor([[[[ 0.2438, -0.5317,  0.2438,  0.2438, -0.1996],
          [ 0.1468,  0.1468, -0.2588,  0.1468, -0.1816],
   ..., dtype=torch.int32), tensor([2], device='cuda:0', dtype=torch.int32), tensor([2], device='cuda:0', dtype=torch.int32))
outputs = (tensor([5.0957], device='cuda:0', grad_fn=<RNNTLossFunction>>),)
eps = 0.001, rtol = 0.01, atol = 0.01, check_grad_dtypes = False
nondet_tol = 0.0

    def _slow_gradcheck(func, func_out, tupled_inputs, outputs, eps, rtol, atol, check_grad_dtypes,
                        nondet_tol, *, use_forward_ad=False, complex_indices=None, test_imag=False):
        if not outputs:
            return _check_no_differentiable_outputs(func, tupled_inputs, _as_tuple(func_out), eps)
    
        numerical = _transpose(_get_numerical_jacobian(func, tupled_inputs, outputs, eps=eps, is_forward_ad=use_forward_ad))
    
        if use_forward_ad:
            analytical_forward = _get_analytical_jacobian_forward_ad(func, tupled_inputs, outputs, check_grad_dtypes=check_grad_dtypes)
    
            for i, n_per_out in enumerate(numerical):
                for j, n in enumerate(n_per_out):
                    a = analytical_forward[j][i]
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
                        raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag,
                                                                  is_forward_ad=True))
        else:
            for i, o in enumerate(outputs):
                analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
    
                for j, (a, n) in enumerate(zip(analytical, numerical[i])):
                    if not _allclose_with_type_promotion(a, n.to(a.device), rtol, atol):
>                       raise GradcheckError(_get_notallclose_msg(a, n, i, j, complex_indices, test_imag))
E                       torch.autograd.gradcheck.GradcheckError: Jacobian mismatch for output 0 with respect to input 0,
E                       numerical:tensor([[ 1.5292e+02],
E                               [ 1.2874e+01],
E                               [ 7.3957e-01],
E                               [ 3.3808e-01],
E                               [-7.9703e-01],
E                               [ 7.1311e-01],
E                               [ 3.2520e-01],
E                               [ 2.5606e-01],
E                               [ 1.4329e-01],
E                               [-1.9002e-01],
E                               [ 1.0657e-01],
E                               [ 9.7036e-02],
E                               [ 9.1076e-02],
E                               [ 8.3685e-02],
E                               [-2.0671e-01],
E                               [ 1.1563e-01],
E                               [-3.5834e-01],
E                               [ 1.6141e-01],
E                               [ 7.1287e-02],
E                               [ 6.9857e-02],
E                               [ 1.4281e-01],
E                               [ 1.4949e-01],
E                               [-6.6090e-01],
E                               [ 2.5082e-01],
E                               [ 1.4830e-01],
E                               [ 2.5296e-01],
E                               [ 2.5010e-01],
E                               [ 2.5296e-01],
E                               [ 2.5177e-01],
E                               [-8.9502e-01]], device='cuda:0')
E                       analytical:tensor([[ 0.2438],
E                               [-0.5317],
E                               [ 0.2438],
E                               [ 0.2438],
E                               [-0.1996],
E                               [ 0.1468],
E                               [ 0.1468],
E                               [-0.2588],
E                               [ 0.1468],
E                               [-0.1816],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [ 0.0762],
E                               [-0.3046],
E                               [ 0.0760],
E                               [-0.3041],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.0760],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [-0.5733],
E                               [ 0.1433],
E                               [ 0.1433],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [ 0.2318],
E                               [-0.9273]], device='cuda:0')

../../../../../miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:978: GradcheckError
----------------------------- Captured stdout call -----------------------------
reference gradient
[[[[ 0.17703132 -0.39992708  0.17703132  0.17703132 -0.13116692]
   [ 0.12247062  0.12247062 -0.181684    0.12247062 -0.1857276 ]
   [ 0.06269141  0.06269141  0.06928471  0.12624498 -0.32091248]]

  [[ 0.05456069 -0.2182428   0.05456069  0.05456069  0.05456069]
   [ 0.12073967  0.12073967 -0.48295838  0.12073967  0.12073967]
   [ 0.30741188  0.16871123  0.18645471  0.16871123 -0.83128875]]]]
=============================== warnings summary ===============================
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1
  /private/home/vincentqb/miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:632: UserWarning: Input #0 requires gradient and is not a double precision floating point or complex. This check will likely fail if all the inputs are not of double precision floating point or complex. 
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
=========================== short test summary info ============================
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch....
FAILED autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch....
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0 - torch...
FAILED autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1 - torch...
=================== 4 failed, 22 passed, 8 warnings in 4.39s ===================

Link to the _slow_gradcheck error in pytorch and to where the tests compute the vjp.

@vincentqb
Copy link
Contributor Author

Trying to reproduce the gradcheck computation:

diff --git a/test/torchaudio_unittest/rnnt/autograd_impl.py b/test/torchaudio_unittest/rnnt/autograd_impl.py
index ed1671b..c7ae5ee 100644
--- a/test/torchaudio_unittest/rnnt/autograd_impl.py
+++ b/test/torchaudio_unittest/rnnt/autograd_impl.py
@@ -21,8 +21,27 @@ class Autograd(TestBaseMixin):
     def get_data(data_func, device):
         data_np = data_func()
         if type(data_np) == tuple:
+            grad_out_base = torch.zeros_like(torch.tensor(data_np[1]), memory_format=torch.legacy_contiguous_format)
+            flat_grad_out = grad_out_base.view(-1)
+            print(grad_out_base.shape)
+            for j in range(flat_grad_out.numel()):
+                flat_grad_out.zero_()
+                flat_grad_out[j] = 1.0
+                t = torch.tensor(data_np[-1]).mul(grad_out_base.view((-1, 1, 1, 1)))
+                print("test", t.shape, t)
+            print("ref grads", data_np[-1].shape, data_np[-1])
+
             data_np = data_np[0]
         data = numpy_to_torch(
             data=data_np, device=device, requires_grad=True
@@ -78,3 +97,4 @@ class Autograd(TestBaseMixin):
         loss = NumpyTransducerLoss(blank=data["blank"])
 
         self.assert_grad(loss, inputs, enable_all_grad=False)

diff --git a/test/torchaudio_unittest/rnnt/numpy_transducer.py b/test/torchaudio_unittest/rnnt/numpy_transducer.py
index 1a90703..e559a1d 100644
--- a/test/torchaudio_unittest/rnnt/numpy_transducer.py
+++ b/test/torchaudio_unittest/rnnt/numpy_transducer.py
@@ -35,6 +35,9 @@ class _NumpyTransducer(torch.autograd.Function):
     @staticmethod
     def backward(ctx, grad_output):
         grad_output = grad_output.view(-1, 1, 1, 1).to(ctx.grads)
+        print("backward", ctx.grads.mul(grad_output).shape, ctx.grads.mul(grad_output))
+        print("grads", ctx.grads.shape, ctx.grads)
         return ctx.grads.mul(grad_output), None, None, None, None, None, None, None, None
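
The prints added above mirror what the slow path of gradcheck does: for every output element it feeds a one-hot grad_output into backward and records the resulting vector-Jacobian product as one column of the analytical Jacobian. A simplified, self-contained sketch of that loop (not the actual gradcheck code, and using a toy function instead of the RNNT loss):

import torch

def analytical_jacobian(fn, x):
    out = fn(x)
    flat_out = out.reshape(-1)
    cols = []
    for j in range(flat_out.numel()):
        grad_out = torch.zeros_like(flat_out)
        grad_out[j] = 1.0  # one-hot grad_output selects a single output element
        (vjp,) = torch.autograd.grad(
            flat_out, x, grad_outputs=grad_out, retain_graph=True
        )
        cols.append(vjp.reshape(-1))
    # Column j holds d(out_j)/d(x); gradcheck compares these columns against
    # finite-difference estimates of the same quantities.
    return torch.stack(cols, dim=1)

x = torch.randn(3, 4, dtype=torch.double, requires_grad=True)
jac = analytical_jacobian(lambda t: (t ** 2).sum(dim=1), x)
print(jac.shape)  # torch.Size([12, 3]): flattened input rows, one column per output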

@vincentqb
Copy link
Contributor Author

vincentqb commented May 27, 2021

gradcheck is passing.

================================================================================================================= warnings summary =================================================================================================================
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cpu_test.py::TestAutograd::test_np_transducer_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_RNNTLoss_gradcheck_1
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_0
test/torchaudio_unittest/rnnt/autograd_cuda_test.py::TestAutograd::test_np_transducer_gradcheck_1
  /private/home/vincentqb/miniconda/envs/torch-nightly/lib/python3.8/site-packages/torch/autograd/gradcheck.py:635: UserWarning: Input #0 requires gradient and is not a double precision floating point or complex. This check will likely fail if all the inputs are not of double precision floating point or complex. 
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/warnings.html
========================================================================================================== 26 passed, 8 warnings in 4.38s ==========================================================================================================

The option reuse_logits_for_grads needs to default to False to avoid surprises for the user.
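
To illustrate the kind of surprise (a hypothetical toy Function, not the actual RNNT kernel): if backward mutates a cached buffer and hands it out as the gradient, the result is only correct the first time it is queried.

import torch

class ReusesBufferForGrad(torch.autograd.Function):
    # Toy example: backward overwrites a cached buffer in place, so a second
    # backward pass sees clobbered values.
    @staticmethod
    def forward(ctx, x):
        ctx.buf = x.detach().clone()
        return (x ** 2).sum()

    @staticmethod
    def backward(ctx, grad_output):
        ctx.buf.mul_(2.0)               # first call: buf == 2*x (the true gradient)
        return ctx.buf * grad_output    # second call: buf == 4*x (wrong)

x = torch.randn(5, requires_grad=True)
out = ReusesBufferForGrad.apply(x)
g1, = torch.autograd.grad(out, x, retain_graph=True)
g2, = torch.autograd.grad(out, x)
print(torch.allclose(g1, g2))  # False: exactly the behavior gradcheck flags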

Copy link
Contributor

@astaff astaff left a comment


Great find! See my comment about fused_log_softmax vs. reuse_logits_for_grads set to False.

@vincentqb vincentqb force-pushed the rnntautograd branch 2 times, most recently from 80534ac to 619fd57 Compare May 27, 2021 19:39
@vincentqb
Copy link
Contributor Author

vincentqb commented May 31, 2021

Failing gradcheck means that

out = rnnt_loss(inputs)
grad1, = autograd.grad(out, inputs, retain_graph=True)
grad2, = autograd.grad(out, inputs)
self.assertEqual(grad1, grad2, atol=atol, rtol=rtol) 

will fail.
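
A self-contained version of that check, with a toy stand-in for the RNNT loss so the snippet runs on its own (substitute the real loss and inputs to reproduce the failure):

import torch
from torch import autograd

def toy_loss(logits):
    # Stand-in for the RNNT loss: any differentiable scalar-valued function works here.
    return logits.logsumexp(dim=-1).sum()

logits = torch.randn(2, 4, 3, 5, requires_grad=True)

out = toy_loss(logits)
grad1, = autograd.grad(out, logits, retain_graph=True)
grad2, = autograd.grad(out, logits)

# If backward reuses or mutates a saved buffer, the two gradients differ.
assert torch.allclose(grad1, grad2)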

@vincentqb vincentqb marked this pull request as ready for review May 31, 2021 16:38
        (get_numpy_data_B2_T4_U3_D3, ),
        (get_numpy_data_B1_T2_U3_D5, ),
    ])
    def test_RNNTLoss_gradcheck(self, data_func):
Copy link
Contributor

@carolineechen carolineechen Jun 1, 2021


great find, and thanks for looking into this! can you add autograd tests for the functional version as well?

Copy link
Contributor Author

@vincentqb vincentqb Jun 3, 2021


good point, added :)

@vincentqb vincentqb changed the title Autograd Test RNN Transducer Loss Autograd Test Jun 3, 2021
@vincentqb vincentqb requested a review from carolineechen June 3, 2021 20:56
Copy link
Contributor

@carolineechen carolineechen left a comment


Left a minor comment but otherwise lgtm, thanks for working on this!

            if enable_all_grad:
                i.requires_grad = True
            inputs_.append(i)
        assert gradcheck(loss, inputs, eps=1e-03, atol=1e-03, rtol=1e-03, nondet_tol=0.)
Copy link
Contributor


can you check if atol can be reduced any further?

Copy link
Contributor


This needs to be documented in the test code. This looks much lower precision than other autograd tests we have in torchaudio.

Copy link
Contributor Author

@vincentqb vincentqb Jun 3, 2021


can you check if atol can be reduced any further?

Unfortunately, that's the lowest we can go for atol and eps, as expected for float32. rtol can be reduced to 0, but that seems too good to be true, so let's keep the default value (0.001).
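
A quick way to see why float32 puts a floor on eps and atol (an illustration of the numerics only, nothing specific to the RNNT kernel): central differences lose most of their accuracy in float32 once eps gets small.

import torch

def central_diff(f, x, eps):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

f = torch.sin
exact = torch.cos(torch.tensor(1.0, dtype=torch.float64))

for eps in (1e-3, 1e-5):
    err32 = abs(central_diff(f, torch.tensor(1.0, dtype=torch.float32), eps).double() - exact)
    err64 = abs(central_diff(f, torch.tensor(1.0, dtype=torch.float64), eps) - exact)
    # The float32 error is several orders of magnitude larger than the float64 one,
    # which is why the test's atol cannot go much lower.
    print(f"eps={eps:g}  float32 err={err32.item():.1e}  float64 err={err64.item():.1e}")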

Copy link
Contributor Author


This needs to be documented in the test code. This looks much lower precision than other autograd tests we have in torchaudio.

As commented here, this is due to the use of float32. I added a comment above the line.

@vincentqb vincentqb merged commit d4d0907 into pytorch:master Jun 4, 2021
@vincentqb vincentqb mentioned this pull request Jun 10, 2021
22 tasks
mthrok pushed a commit to mthrok/audio that referenced this pull request Dec 13, 2022
mthrok added a commit to mthrok/audio that referenced this pull request Dec 13, 2022