Request For Comment: The future of Kaldi-compatible features
Problems
torchaudio.compliance.kaldi implements functions that try to reproduce Kaldi's feature extraction. This module has many issues and causes headaches for maintainers.
1. Inconsistent design
   While the rest of the torchaudio library is standardized to work with floating-point Tensors with values in the range [-1.0, 1.0], the ported Kaldi implementations do not necessarily follow this (example).
2. Does not support the batch dimension well
   Related issues: support batching for kaldi compliant feature extraction functions #675, torchaudio.compliance.kaldi.fbank #1245
3. compliance is not the right name
   Related issue: Compliance should be named compatibility (or similar) #281
4. While it's called compliance, it does not match the results of Kaldi's CLI
   I think the natural expectation users have of compliance.kaldi is that it produces results that match Kaldi (and possibly in an easy manner).
   Related issues: The spectrogram computed by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" are different #332, Problems with Kaldi MFCCs #328, Fbank features are different from Kaldi Fbank #400
5. It's slow
   Related issues: more efficient resample module #908, Torchaudio resampling could be faster and simpler #1057
   Code translated verbatim from Kaldi's original C++ is typically much slower than the original implementation (100x is not uncommon). To overcome this, the code has to be rewritten to use PyTorch-style operations: Kaldi performs per-element access very efficiently, but the same access pattern incurs a lot of overhead in PyTorch. This creates a huge maintenance burden, first for implementing our own custom code, and second for keeping up with upstream Kaldi.
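To illustrate problem 5, consider Kaldi's pre-emphasis (Preemphasize), which in C++ is a backward loop doing waveform(i) -= coeff * waveform(i - 1), followed by waveform(0) -= coeff * waveform(0) for the first sample. The sketch below contrasts a verbatim-style Python port, which keeps the per-element loop, with a PyTorch-style rewrite of the same arithmetic. Both function names are hypothetical, and no timing figures are claimed here; the point is only that the first version pays Python and Tensor-indexing overhead once per sample while the second issues a few whole-tensor operations.

```python
import torch


def preemphasize_loop(frame: torch.Tensor, coeff: float) -> torch.Tensor:
    """Verbatim-style port of Kaldi's per-element pre-emphasis for one 1-D frame."""
    out = frame.clone()
    # Walk backwards so that out[i - 1] still holds the original sample.
    for i in range(out.numel() - 1, 0, -1):
        out[i] -= coeff * out[i - 1]
    # Kaldi treats the sample before the first one as the first sample itself.
    out[0] -= coeff * out[0]
    return out


def preemphasize_vectorized(frames: torch.Tensor, coeff: float) -> torch.Tensor:
    """PyTorch-style rewrite: the same arithmetic as whole-tensor ops, any leading dims."""
    # previous[..., i] == frames[..., i - 1], with frames[..., 0] repeated at i == 0.
    previous = torch.cat([frames[..., :1], frames[..., :-1]], dim=-1)
    return frames - coeff * previous
```

A side benefit of the vectorized form is that it accepts a batch of frames directly, which also touches on problem 2.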
Possible solutions
Before going into the details of possible solutions, I note that removing the Kaldi compatibility features from torchaudio was also mentioned as a possibility. Kaldi compatibility was not included in the original plan of torchaudio. I am not familiar with this matter and I am not advocating it, but I am keeping it as a possibility.
The following describes some partial solutions and considerations I have put together so far.
Module location
For problems like 1 (inconsistent design) and 3 (naming), we can add a new interface to torchaudio.functional.
We already have some Kaldi-compatible functions in the torchaudio.functional module, such as sliding_window_cmn and compute_kaldi_pitch. They are placed under torchaudio.functional because they are confirmed to work on floating-point Tensors and their behavior is consistent with the other feature implementations.
We can do something similar to provide the same set of Kaldi features, but working on floating-point Tensors like the other feature extractions. We can start by adding interfaces to torchaudio.functional that perform value normalization and call the corresponding functions in the compliance.kaldi module. Then we can deprecate the compliance.kaldi module and eventually remove it.
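A minimal sketch of such an interface, assuming the only normalization needed is rescaling from [-1.0, 1.0] to the int16 sample range that Kaldi works in (Kaldi reads 16-bit PCM into floats without dividing by 32768). The names kaldi_fbank and to_kaldi_scale are hypothetical and are not existing torchaudio APIs; torchaudio.compliance.kaldi.fbank is the existing function being wrapped.

```python
import torch

# Inverse of the common int16 -> float conversion (x / 32768).
_INT16_SCALE = 32768.0


def to_kaldi_scale(waveform: torch.Tensor) -> torch.Tensor:
    """Map a [-1.0, 1.0] float waveform to the int16-valued range Kaldi expects."""
    return waveform * _INT16_SCALE


def kaldi_fbank(waveform: torch.Tensor, **kwargs) -> torch.Tensor:
    """Hypothetical torchaudio.functional-style wrapper around compliance.kaldi.fbank.

    Takes a (channel, time) float waveform in [-1.0, 1.0], rescales it,
    then delegates to the existing implementation with the given options.
    """
    from torchaudio.compliance import kaldi  # imported lazily; only needed here

    return kaldi.fbank(to_kaldi_scale(waveform), **kwargs)
```

Usage would look like kaldi_fbank(waveform, num_mel_bins=80). The scale matters because options such as dither and energy_floor are expressed in absolute sample units, so the same option values mean different things on [-1.0, 1.0] and int16-range input.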
Implementation
For problems like 4 (numerical parity) and 5 (speed), there are several approaches to making Kaldi features available and compatible with PyTorch:
1. Keep porting Kaldi in Python
   This is the current state.
2. Build and bind libkaldi-feat
   This will resolve most of the headaches. The usual maintenance cost will be drastically reduced, though we need to figure out a robust way to build Kaldi with the MKL that PyTorch uses. Users will get what they naturally expect: the same result as Kaldi.
3. Re-build Kaldi's vector/matrix classes with PyTorch's Tensor class
   This is a hybrid of approaches 1 and 2. Details can be found at https://mthrok.github.io/tkaldi/. This is how I added the Pitch feature in Add Kaldi Pitch feature #1243.
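To make the hybrid approach more concrete, here is a Python-only illustration under stated assumptions: the class TensorVector is hypothetical, it mirrors only a handful of member names from Kaldi's VectorBase (Dim, AddVec, Scale, ApplyLog, Sum), and the work linked above is done in C++, not Python. The idea shown is that code ported from Kaldi keeps Kaldi's call structure while the arithmetic is carried out by Tensor operations.

```python
import torch


class TensorVector:
    """Hypothetical Kaldi-style vector whose storage is a 1-D torch.Tensor."""

    def __init__(self, data: torch.Tensor):
        if data.dim() != 1:
            raise ValueError("TensorVector expects a 1-D tensor")
        self.data = data

    def Dim(self) -> int:
        return self.data.numel()

    def __call__(self, i: int) -> float:
        # Kaldi uses operator() for element access. Kept for source compatibility,
        # but hot loops should be rewritten in terms of the vectorized methods.
        return float(self.data[i])

    def AddVec(self, alpha: float, v: "TensorVector") -> None:
        # *this += alpha * v, as a single in-place Tensor op.
        self.data.add_(v.data, alpha=alpha)

    def Scale(self, alpha: float) -> None:
        self.data.mul_(alpha)

    def ApplyLog(self) -> None:
        self.data.log_()

    def Sum(self) -> float:
        return float(self.data.sum())
```

The CamelCase method names are deliberately non-Pythonic so that ported Kaldi code can be transcribed with minimal changes; extending the interface as new features need it is the "small effort" listed for this approach.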
For problem 2 (batch support), one simple approach is to use at::parallel_for to parallelize the batch computation. This applies if the core implementation is in C++.
The following table summarizes the pros and cons of the three implementation approaches.
| | Bind libkaldi-feat | Reimpl libkaldi-matrix | Reimplement in Python |
|---|---|---|---|
| Numerical Compatibility | ✅ Baseline | ✅ Easy | 🚫 Extremely difficult (none of the existing features meet this criterion) |
| Execution Speed | ✅ Baseline | 🚫 Slow (60x~) | 🚫 Slow on CPU, even slower on GPU |
| Dev Scalability (Easy to add another feature?) | ✅ Easy (add a new wrapper function) | 🍰 Small effort (extend the Matrix interface as needed) | 🚫 Very time consuming (understand, translate the code, verify the result) |
| Upstream Adaptation (Easy to follow the changes Kaldi makes?) | ✅ Easy (change the upstream commit and the wrapper function) | ✅ Easy (pull the upstream code, change the wrapper function) | 🚫 Practically impossible (I do not know where the existing code comes from) |
| Maint Cost: Initial build setup cost | 🤨 Moderate effort (custom build + MKL setup) | 🍰 Small effort (custom build) | ✅ None |
| Maint Cost: Long term maintenance | 🍰 Small effort (mostly about wrapper functions) | 🍰 Small effort (mostly about wrapper functions) | 🚫 High (all the related code, 1K LoC) |
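For the at::parallel_for approach to problem 2 mentioned above: in ATen, at::parallel_for(begin, end, grain_size, f) splits the range [begin, end) into chunks and calls f(chunk_begin, chunk_end) on them, possibly on several threads. The following is only a Python analog to show the shape of the idea. parallel_for and batched are hypothetical helpers, and grain_size is used here directly as the chunk size, a simplification of ATen's semantics (where it is a minimum amount of work per chunk).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

import torch


def parallel_for(begin: int, end: int, grain_size: int,
                 fn: Callable[[int, int], None], max_workers: int = 4) -> None:
    """Call fn(start, stop) on consecutive chunks of [begin, end) using a thread pool."""
    if end <= begin:
        return
    grain_size = max(grain_size, 1)
    chunks = [(s, min(s + grain_size, end)) for s in range(begin, end, grain_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() waits for every chunk and re-raises any exception from a worker.
        list(pool.map(lambda c: fn(c[0], c[1]), chunks))


def batched(feature_fn: Callable[[torch.Tensor], torch.Tensor],
            batch: torch.Tensor, grain_size: int = 1) -> torch.Tensor:
    """Apply a single-utterance feature function to each item of a (batch, time) tensor."""
    outputs: List[Optional[torch.Tensor]] = [None] * batch.size(0)

    def work(start: int, stop: int) -> None:
        # Each item is independent, so chunks can run concurrently.
        for i in range(start, stop):
            outputs[i] = feature_fn(batch[i])

    parallel_for(0, batch.size(0), grain_size, work)
    return torch.stack(outputs)
```

Because each batch item is processed independently, the existing single-utterance implementation does not need to change. In Python, threads only help to the extent that the underlying operators release the GIL, which is why the RFC places this approach in C++.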