RFC: The future of Kaldi compliance module #1269

@mthrok

Request For Comment: The future of Kaldi-compatible features

Problems

torchaudio.compliance.kaldi implements functions that try to reproduce Kaldi's feature extraction. This module has many issues and is causing headaches for maintainers.

  1. Inconsistent design
    While the rest of the torchaudio library is standardized to work with floating-point Tensors in the value range [-1.0, 1.0], the ported Kaldi implementations do not necessarily follow this convention (example).
  2. Does not support the batch dimension well
    See "support batching for kaldi compliant feature extraction functions" #675 and "torchaudio.compliance.kaldi.fbank" #1245.
  3. compliance is not the right name
    See "Compliance should be named compatibility (or similar)" #281.
  4. Although it is called compliance, its results do not match those of Kaldi's CLI
    I think the natural expectation users have of compliance.kaldi is that it gives a result that matches Kaldi (and possibly in an easy manner).
    See "The spectrogram computed by "torchaudio.compliance.kaldi.spectrogram" and "compute-spectrogram-feats" are different" #332, "Problems with Kaldi MFCCs" #328, and "Fbank features are different from Kaldi Fbank" #400.
  5. It's slow
    See "more efficient resample module" #908 and "Torchaudio resampling could be faster and simpler" #1057.
    Code translated verbatim from Kaldi's original C++ code is typically slower than the original implementation (100x is not uncommon). To overcome this, the code has to be changed to adopt PyTorch-style operations. (Kaldi does per-element access very efficiently, whereas such access incurs a lot of overhead in the PyTorch framework.) This incurs a huge maintenance overhead: (1) implementing our custom code and (2) catching up with upstream Kaldi.

Possible solutions

Before going into the details of possible solutions, I note that removing the Kaldi compatibility features from torchaudio was also mentioned as a possibility. Kaldi compatibility was not included in the original plan of torchaudio. I am not familiar with this matter and I am not advocating removal, but I am keeping it as a possibility.

The following describes some partial solutions and considerations I have come up with so far.

Module location.

For problems like 1 (inconsistent design) and 3 (naming), we can add a new interface to torchaudio.functional.

We have some Kaldi-compatible functions in torchaudio.functional module, such as sliding_window_cmn and compute_kaldi_pitch. They are placed under torchaudio.functional because they are confirmed to work on floating-point Tensors and their behaviors are consistent with other feature implementations.

We can do something similar so that we provide the same set of Kaldi features, but working on floating-point Tensors like the other feature extractions. We can start by adding an interface to torchaudio.functional that does value normalization and calls the corresponding function in the compliance.kaldi module, then deprecate the compliance.kaldi module and eventually remove it.
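As a rough illustration of that first step, the sketch below wraps a Kaldi-style feature function so that it accepts waveforms in [-1.0, 1.0]. The helper name `with_value_normalization` and the scale factor of 32768 (the 16-bit PCM range) are assumptions made for illustration, not existing torchaudio API. The wrapper only relies on the input supporting multiplication by a float, as a Tensor does.

```python
import functools
from typing import Any, Callable

# Assumption: the ported Kaldi function expects values in the 16-bit PCM range.
INT16_SCALE = 32768.0


def with_value_normalization(
    kaldi_fn: Callable[..., Any], scale: float = INT16_SCALE
) -> Callable[..., Any]:
    """Wrap a Kaldi-style feature function to accept waveforms in [-1.0, 1.0]."""

    @functools.wraps(kaldi_fn)
    def wrapper(waveform: Any, *args: Any, **kwargs: Any) -> Any:
        # Rescale the normalized waveform before delegating to the wrapped function.
        return kaldi_fn(waveform * scale, *args, **kwargs)

    return wrapper
```

With torchaudio installed, this could hypothetically be used as `fbank = with_value_normalization(torchaudio.compliance.kaldi.fbank)`, after which `fbank(waveform, num_mel_bins=40)` would take a normalized waveform.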

Implementation

For problems like 4 (numerical parity) and 5 (speed), we have other approaches to make Kaldi features available and compatible with PyTorch.

  1. Keep porting Kaldi in Python
    This is the current state.
  2. Build and bind libkaldi-feat
    This will resolve most of the headaches. The usual maintenance cost will be drastically reduced, though we need to figure out a robust way to build Kaldi with the MKL that PyTorch is using. Users will get what they would naturally expect: the same result as Kaldi.
  3. Rebuild Kaldi's vector/matrix classes with PyTorch's Tensor class
    This is something of a hybrid of approaches 1 and 2. Details can be found at https://mthrok.github.io/tkaldi/. This is how I added the pitch feature in "Add Kaldi Pitch feature" #1243.

The following table summarizes the pros and cons.

| Criterion | Bind libkaldi-feat | Reimplement libkaldi-matrix | Reimplement in Python |
|---|---|---|---|
| Numerical Compatibility | ✅ Baseline | ✅ Easy | 🚫 Extremely Difficult (None of the existing features meet this criterion) |
| Execution Speed | ✅ Baseline | 🚫 Slow (60x~) | 🚫 Slow on CPU, even slower on GPU |
| Dev Scalability (Easy to add another feature?) | ✅ Easy (Add new wrapper function) | 🍰 Small Effort (Extend Matrix interface as needed) | 🚫 Very Time Consuming (Understand, translate the code, verify the result) |
| Upstream Adaptation (Easy to follow the changes Kaldi makes?) | ✅ Easy (Change the upstream commit and wrapper function) | ✅ Easy (Pull the upstream code, change the wrapper function) | 🚫 Practically impossible (I do not know where the existing code comes from) |
| Maintenance Cost: initial build setup | 🤨 Moderate Effort (Custom Build + MKL setup) | 🍰 Small Effort (Custom Build) | ✅ None |
| Maintenance Cost: long-term maintenance | 🍰 Small Effort (Mostly about wrapper func) | 🍰 Small Effort (Mostly about wrapper func) | 🚫 High (All the related code, 1K LoC) |

For problem 2 (batch support), one simple approach is to use at::parallel_for to parallelize the computation over the batch. This can be applied if the core implementation is in C++.
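For the batch-support idea (problem 2), at::parallel_for is a C++ API, so the following is only a Python analogue of the same pattern: split the batch into items, compute each item's features independently on a worker pool, and collect the results in the original order. The names `map_over_batch` and `num_workers` are hypothetical, and a thread pool only helps if the per-item function releases the GIL, which is assumed here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_over_batch(
    feature_fn: Callable[[T], R], batch: Sequence[T], num_workers: int = 4
) -> List[R]:
    """Apply a single-item feature function to every item of a batch in parallel."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, so output i matches batch[i].
        return list(pool.map(feature_fn, batch))
```

The per-item results would still need to be stacked into a single batched Tensor, which is only possible when every item yields the same number of frames.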
