RFC: The future of Kaldi compliance module #1269

# Request For Comment: The future of Kaldi-compatible features

## Problems
`torchaudio.compliance.kaldi` implements functions that try to reproduce Kaldi's feature extraction. The module has many issues and causes headaches for maintainers.
1. Inconsistent design
While the rest of the `torchaudio` library is standardized to work with floating-point Tensors with values in `[-1.0, 1.0]`, the ported Kaldi implementations do not necessarily follow this convention ([example](https://github.com/pytorch/audio/issues/371#issuecomment-625613872)).
2. Poor support for the batch dimension. #675 #1245
3. `compliance` is not the right name. https://github.com/pytorch/audio/issues/281
4. Despite being called `compliance`, it does not match the results of Kaldi's CLI.
I think users naturally expect `compliance.kaldi` to produce results that match Kaldi (ideally in an easy manner). #332, #328, #400
5. It's slow. #908 #1057
Code translated verbatim from Kaldi's C++ source is typically much slower than the original implementation (100x is not uncommon). To overcome this, the code has to be rewritten with PyTorch-style (vectorized) operations, because Kaldi performs per-element access very efficiently while the same access pattern incurs a lot of overhead in PyTorch. This causes a huge maintenance overhead, both for implementing our custom code and for keeping up with upstream Kaldi.
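To illustrate problem 5, the sketch below contrasts a verbatim, per-element port of a pre-emphasis step (modeled on Kaldi's pre-emphasis loop, which walks the signal backwards so each sample uses the original previous value) with a vectorized PyTorch rewrite. The function names are hypothetical and this is not the code in `compliance.kaldi`; it only shows why per-element indexing, which is cheap in C++, becomes one tensor operation per sample in PyTorch.

```python
import torch


def preemphasize_loop(waveform: torch.Tensor, coeff: float) -> torch.Tensor:
    """Verbatim-style port of a 1-D pre-emphasis: one indexing op per sample."""
    x = waveform.clone()
    # Walk backwards so x[i - 1] still holds the original (unmodified) value.
    for i in range(x.numel() - 1, 0, -1):
        x[i] -= coeff * x[i - 1]
    # The first sample uses itself as its "previous" value.
    x[0] -= coeff * x[0]
    return x


def preemphasize_vectorized(waveform: torch.Tensor, coeff: float) -> torch.Tensor:
    """PyTorch-style rewrite: a constant number of whole-tensor operations."""
    # Shift right by one sample along the last dim, replicating the first sample.
    previous = torch.cat([waveform[..., :1], waveform[..., :-1]], dim=-1)
    return waveform - coeff * previous
```

Both return the same values (for `torch.tensor([1.0, 2.0, 3.0])` and `coeff=0.5`, both give `[0.5, 1.5, 2.0]`), and the vectorized form also handles leading batch dimensions. The price is that the rewritten code no longer mirrors the upstream C++ line by line, which is the maintenance burden described above.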

## Possible solutions

Before going into the details of possible solutions, I note that removing the Kaldi compatibility features from `torchaudio` has also been mentioned as a possibility. They were not included in the original plan for `torchaudio`. I am not familiar with this matter and I am not advocating removal, but I am keeping it as a possibility.

The following describes some partial solutions and considerations I have come up with so far.

### Module location.

To address problems 1 (inconsistent design) and 3 (naming), we can add a new interface to `torchaudio.functional`.

We have some Kaldi-compatible functions in the `torchaudio.functional` module, such as [sliding_window_cmn](https://github.com/pytorch/audio/blob/d58ac213db1e361342b9b704b125b83214a7dcbb/torchaudio/functional/functional.py#L877-L883) and [compute_kaldi_pitch](https://github.com/pytorch/audio/blob/d58ac213db1e361342b9b704b125b83214a7dcbb/torchaudio/functional/functional.py#L1049-L1069). They are placed under `torchaudio.functional` because they are confirmed to work on floating-point Tensors and their behavior is consistent with other feature implementations.

We can do the same for the rest, so that we provide the same set of Kaldi features but working on floating-point Tensors, like the other feature extractions. We can start by adding interfaces to `torchaudio.functional` that normalize the input values and call the corresponding functions in the `compliance.kaldi` module. Then we deprecate the `compliance.kaldi` module and eventually remove it.
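A minimal sketch of that migration path, assuming (1) the ported functions expect samples in the 16-bit integer range, since Kaldi keeps int16 PCM values unnormalized when it reads WAV files, and (2) a plain `DeprecationWarning` is enough for the transition period. `with_float_input` and `deprecated` are hypothetical helpers, not existing `torchaudio` APIs, and the scale factor would have to be verified for each wrapped function.

```python
import functools
import warnings

# Assumption: the ported Kaldi code expects raw int16-range sample values,
# so a [-1.0, 1.0] float waveform is scaled by 2**15 before the call.
INT16_SCALE = 32768.0


def with_float_input(kaldi_fn, scale=INT16_SCALE):
    """Wrap a compliance.kaldi-style function so it accepts [-1.0, 1.0] input."""
    @functools.wraps(kaldi_fn)
    def wrapper(waveform, *args, **kwargs):
        # Only `*` is required, so this works for torch.Tensor or plain numbers.
        return kaldi_fn(waveform * scale, *args, **kwargs)
    return wrapper


def deprecated(message):
    """Keep an old entry point working while warning callers to migrate."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, DeprecationWarning, stacklevel=2)
            return fn(*args, **kwargs)
        return wrapper
    return decorator
```

For example, `with_float_input(torchaudio.compliance.kaldi.fbank)` would give an `fbank` that accepts the same `[-1.0, 1.0]` waveforms as the rest of `torchaudio`, while `deprecated(...)` could wrap the old `compliance.kaldi` entry points for a release or two before removal.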

### Implementation
For problems 4 (numerical parity) and 5 (speed), there are other approaches to making Kaldi features available and compatible with PyTorch.

1. Keep porting Kaldi to Python
This is the current state.
1. Build and bind `libkaldi-feat`
This would resolve most of the headaches. The ongoing maintenance cost would be drastically reduced, though we would need to figure out a robust way to build Kaldi against the MKL that PyTorch uses. Users would get what they naturally expect: the same results as Kaldi.
1. Reimplement Kaldi's vector/matrix classes with PyTorch's Tensor class
This is a hybrid of approaches 1 and 2. Details can be found at https://mthrok.github.io/tkaldi/. This is how I added the pitch feature in #1243.
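To make the third approach more concrete: the idea is to leave Kaldi's feature code unchanged while the vector/matrix classes it calls are backed by a PyTorch Tensor. The linked project targets Kaldi's C++ classes; the Python class below is only a hypothetical illustration of that adapter pattern, exposing a few `kaldi::VectorBase` method names (`Dim`, `Scale`, `AddVec`, `ApplyLog`, `Sum`, and element access via `operator()`) on top of a 1-D `torch.Tensor`.

```python
import torch


class KaldiVector:
    """Hypothetical Kaldi-style vector facade over a 1-D float32 torch.Tensor."""

    def __init__(self, data):
        self.data = torch.as_tensor(data, dtype=torch.float32)

    def Dim(self) -> int:
        return self.data.numel()

    def __call__(self, i: int) -> float:
        # Mirrors Kaldi's operator() element access; each call is a separate op.
        return self.data[i].item()

    def Scale(self, alpha: float) -> None:
        # this = alpha * this
        self.data.mul_(alpha)

    def AddVec(self, alpha: float, v: "KaldiVector") -> None:
        # this += alpha * v
        self.data.add_(v.data, alpha=alpha)

    def ApplyLog(self) -> None:
        # Element-wise natural log, in place.
        self.data.log_()

    def Sum(self) -> float:
        return self.data.sum().item()
```

Bulk methods such as `AddVec` map onto single tensor calls, but element access through `__call__` pays the per-element overhead described in problem 5, which is one plausible reason this approach is listed as slow in the comparison table.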

For problem 2 (batch support), one simple approach is to use `at::parallel_for` to parallelize computation across the batch. This can be applied if the core implementation is in C++.

The following table summarizes the pros and cons of the three approaches above.

<table border=0 cellpadding=0 cellspacing=0 width=884 style='border-collapse:
 collapse;table-layout:fixed;width:662pt'>
 <col width=87 style='width:65pt'>
 <col width=196 style='mso-width-source:userset;mso-width-alt:6272;width:147pt'>
 <col width=187 style='mso-width-source:userset;mso-width-alt:5973;width:140pt'>
 <col width=191 style='mso-width-source:userset;mso-width-alt:6101;width:143pt'>
 <col width=223 style='mso-width-source:userset;mso-width-alt:7125;width:167pt'>
 <col width=141 style='mso-width-source:userset;mso-width-alt:4522;width:106pt'>
 <tr height=41 style='mso-height-source:userset;height:31.0pt'>
 <td colspan=2 height=41 class=xl65 width=283 style='height:31.0pt;width:212pt'></td>
 <td class=xl65 width=187 style='width:140pt;opacity:1'>Bind libkaldi-feat</td>
 <td class=xl65 width=191 style='width:143pt;opacity:1'>Reimpl libkaldi-matrix</td>
 <td class=xl65 width=223 style='width:167pt;opacity:1'>Reimplement in Python</td>
 </tr>
 <tr height=76 style='mso-height-source:userset;height:57.0pt'>
 <td colspan=2 height=76 class=xl65 width=283 style='height:57.0pt;width:212pt;
 opacity:1'>Numerical Compatibility</td>
 <td class=xl65 width=187 style='width:140pt;opacity:1'>✅ Baseline</td>
 <td class=xl65 width=191 style='width:143pt;opacity:1'>✅ Easy</td>
 <td class=xl65 width=223 style='width:167pt;opacity:1'>&#128683; Extremely Difficult 
 (None of the existing features meet this criterion)</td>
 </tr>
 <tr height=76 style='mso-height-source:userset;height:57.0pt'>
 <td colspan=2 height=76 class=xl65 width=283 style='height:57.0pt;width:212pt;
 opacity:1'>Execution Speed</td>
 <td class=xl65 width=187 style='width:140pt;opacity:1'>✅ Baseline</td>
 <td class=xl65 width=191 style='width:143pt;opacity:1'>&#128683; Slow (60x~)</td>
 <td class=xl65 width=223 style='width:167pt;opacity:1'>&#128683; Slow on CPU 
 Even slower on GPU</td>
 </tr>
 <tr height=76 style='mso-height-source:userset;height:57.0pt'>
 <td colspan=2 height=76 class=xl65 width=283 style='height:57.0pt;width:212pt'>Dev
 Scalability 
 (Easy to add another feature?) 
 </td>
 <td class=xl65 width=187 style='width:140pt;opacity:1'>✅ Easy 
 (Add new wrapper function)</td>
 <td class=xl65 width=191 style='width:143pt;opacity:1'>&#127856; Small Effort 
 (Extend Matrix interface as needed)</td>
 <td class=xl65 width=223 style='width:167pt;opacity:1'>&#128683; Very Time Consuming 
 (Understand, translate the code, verify the result)</td>
 </tr>
 <tr height=76 style='mso-height-source:userset;height:57.0pt'>
 <td colspan=2 height=76 class=xl65 width=283 style='height:57.0pt;width:212pt'>Upstream
 Adaptation 
 (Easy to follow the changes Kaldi makes?)</td>
 <td class=xl65 width=187 style='width:140pt;opacity:1'>✅ Easy 
 (Update the upstream commit and the wrapper function)</td>
 <td class=xl65 width=191 style='width:143pt;opacity:1'>✅ Easy 
 (Pull the upstream code, change the wrapper function)</td>
 <td class=xl65 width=223 style='width:167pt;opacity:1'>&#128683; Practically impossible 
 (I do not know where the existing code comes from)</td>
 </tr>
 <tr height=76 style='mso-height-source:userset;height:57.0pt'>
 <td rowspan=2 height=152 class=xl65 width=87 style='height:114.0pt;
 width:65pt'>
 Maintenance Cost</td>
 <td class=xl65 width=196 style='width:147pt'>Initial build setup cost</td>
 <td class=xl65 width=187 style='width:140pt'>
 &#129320; Moderate Effort 
 (Custom Build + MKL setup)</td>
 <td class=xl65 width=191 style='width:143pt'>
 &#127856; Small Effort 
 (Custom Build)</td>
 <td class=xl65 width=223 style='width:167pt'>✅ None</td>
 </tr>
 <tr height=76 style='mso-height-source:userset;height:57.0pt'>
 <td height=76 class=xl65 width=196 style='height:57.0pt;width:147pt;
 opacity:1'>Long
 term maintenance</td>
 <td class=xl65 width=187 style='width:140pt;opacity:1'>&#127856;
 Small Effort 
 (Mostly about wrapper func)</td>
 <td class=xl65 width=191 style='width:143pt;opacity:1'>&#127856; Small Effort 
 (Mostly about wrapper func)</td>
 <td class=xl65 width=223 style='width:167pt'>
 &#128683; High 
 (All the related code, 1K LoC)</td>
 </tr>
</table>
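Regarding the batch-support idea for problem 2: `at::parallel_for(begin, end, grain_size, f)` splits the index range `[begin, end)` into chunks and calls `f(chunk_begin, chunk_end)` on each chunk from a thread pool. The Python sketch below only mimics that calling convention with `concurrent.futures` to show the structure of a per-utterance batch loop. It is not ATen's implementation (ATen derives the chunk size from the thread count and treats `grain_size` roughly as a lower bound, whereas this sketch uses `grain_size` directly as the chunk size), and because of the GIL it will not speed up pure-Python work; the real benefit would come from doing this in C++ around an unbatched Kaldi routine. `parallel_for` and `batch_energy` here are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def parallel_for(begin: int, end: int, grain_size: int,
                 fn: Callable[[int, int], None],
                 max_workers: Optional[int] = None) -> None:
    """Call fn(chunk_begin, chunk_end) for consecutive chunks covering [begin, end)."""
    if begin >= end:
        return
    grain_size = max(1, grain_size)
    chunks = [(start, min(start + grain_size, end))
              for start in range(begin, end, grain_size)]
    if len(chunks) == 1:
        # Work that fits in one grain runs serially on the calling thread.
        fn(*chunks[0])
        return
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn, start, stop) for start, stop in chunks]
        for future in futures:
            future.result()  # Re-raise any exception raised inside a worker.


def batch_energy(batch: List[List[float]]) -> List[float]:
    """Example per-utterance feature: sum of squared samples for each item."""
    out = [0.0] * len(batch)

    def work(start: int, stop: int) -> None:
        # Each chunk writes to disjoint output slots, so no locking is needed.
        for i in range(start, stop):
            out[i] = sum(x * x for x in batch[i])

    parallel_for(0, len(batch), 1, work)
    return out
```

Because each utterance is processed independently, the unbatched per-utterance code stays unchanged and only the outer loop is parallelized, which is what makes this approach cheap to add.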
