Update on TorchAudio’s future #3902

Open
@scotts

Dear TorchAudio users,

TorchAudio is the most popular audio library for PyTorch. It has critical transforms, models and datasets that we know the community relies on. That is why we wanted to let the community know that we have started a refactoring effort to transition TorchAudio into a maintenance phase. This process will involve removal of some user-facing features. We have three goals we want to achieve with this effort:

  1. Make TorchAudio easier to maintain to ensure long-term reliability. We plan to eliminate all C++ code so that TorchAudio is a Python-only library. We also plan to reduce external dependencies as much as possible. Both efforts will simplify testing and release.
  2. Reduce redundancies with the rest of the PyTorch ecosystem. Some of the functionality in TorchAudio is also available in TorchVision and TorchCodec. We are working across all three libraries to ensure that a given capability lives in only one library.
  3. Focus on TorchAudio’s strengths. Those strengths are the audio transforms, models and datasets that are integral to users' training and inference pipelines. As a result, we will deprecate and eventually remove some functionality that is outside of these strengths.

The diagram below depicts the various components of TorchAudio. Each component is highlighted according to the user-facing API changes that we are making:

[Diagram: TorchAudio components, highlighted by planned user-facing API change]

Starting with TorchAudio 2.8 (expected around August 2025), APIs slated for removal will trigger a deprecation warning. These APIs will be fully removed in TorchAudio 2.9 (anticipated by the end of 2025).
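
For readers who want to audit their own code during the 2.8 warning window, here is a minimal sketch of collecting deprecation messages around a call using only the standard `warnings` module. The function `old_api` is a hypothetical stand-in, not a TorchAudio API, and the warning category TorchAudio uses is not stated in this announcement, so the sketch matches on the message text rather than on a specific category.

```python
import warnings


def old_api():
    # Hypothetical stand-in for a library call that is slated for removal.
    warnings.warn(
        "old_api is deprecated and will be removed",
        DeprecationWarning,
        stacklevel=2,
    )
    return 42


def collect_deprecations(func, *args, **kwargs):
    """Call func and return (result, list of deprecation-like messages)."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, including ones hidden by default filters.
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    # Assumption: the message mentions deprecation; the category is not checked.
    messages = [
        str(w.message) for w in caught if "deprecat" in str(w.message).lower()
    ]
    return result, messages


result, messages = collect_deprecations(old_api)
for message in messages:
    print("deprecation:", message)
```

In a test suite, the same idea can be applied more bluntly by running Python with `-W error` so that any warning fails the run.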

Most of the APIs in the transforms, functional, compliance.kaldi, models and pipelines modules will remain. We identified these APIs as the most popular and valuable.

  • A few APIs, specifically those relying on C++ implementations like RNNT loss and forced-alignment, may be dropped. Some, like lfilter and overdrive, will switch to pure-Python implementations, which might affect performance. We are exploring options to retain the C++-backed APIs, but retaining them is unlikely.
  • Remaining APIs will be compatible with the latest stable PyTorch version. No new features will be added.
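
To illustrate why a pure-Python filter can be slower than a compiled one, here is a sketch of the standard IIR difference equation that a function like lfilter evaluates. This is not TorchAudio's implementation; the name `iir_filter` and its list-based interface are assumptions made for illustration. Each output sample depends on earlier output samples, so the time loop cannot be trivially vectorized.

```python
def iir_filter(x, b, a):
    """Apply y[n] = (sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]) / a[0]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: weighted sum of current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: weighted sum of past outputs (sequential dependency).
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


# Impulse response of a one-pole filter y[n] = x[n] + 0.5 * y[n-1].
print(iir_filter([1.0, 0.0, 0.0, 0.0], b=[1.0], a=[1.0, -0.5]))
```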

The decoding and encoding capabilities of TorchAudio for both audio and video data will migrate to TorchCodec, where we are consolidating all of PyTorch's media decoding and encoding. TorchAudio’s decoding and encoding APIs will be deprecated starting in TorchAudio 2.8 and removed in TorchAudio 2.9, so we encourage users to migrate to TorchCodec as soon as possible. TorchCodec already supports video and audio decoding, and encoding will be supported soon. While there isn't a direct 1:1 API mapping, the migration process should be smooth. Please report any issues in the TorchCodec repository.
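
As a rough sketch of what a migration might look like for simple file loading, the helper below wraps TorchCodec's `AudioDecoder` to return a waveform and sample rate. The helper name `load_audio` is hypothetical, the sketch assumes a TorchCodec version that ships `torchcodec.decoders.AudioDecoder`, and it does not cover every option of TorchAudio's loading functions.

```python
def load_audio(path):
    """Decode a whole audio file and return (waveform, sample_rate)."""
    # Imported lazily so this module imports even without torchcodec installed.
    from torchcodec.decoders import AudioDecoder

    decoder = AudioDecoder(path)
    samples = decoder.get_all_samples()
    # samples.data is a float tensor shaped (num_channels, num_samples).
    return samples.data, samples.sample_rate
```

Calling `load_audio("speech.wav")` (a placeholder path) would then give a tensor and an integer sample rate that can be passed to the transforms that remain in TorchAudio.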

All other modules and APIs will be removed in TorchAudio 2.9.

We understand that these changes may be disruptive. Unfortunately, we believe they are necessary for us to guarantee TorchAudio’s stability in the future.
