
[Announcement] Improving I/O for correct and consistent experience #903

@mthrok

tl;dr: how to migrate to new backend/interface in 0.7

  • If you are using torchaudio on Linux/macOS, please use torchaudio.set_audio_backend("sox_io") to adapt to the upcoming changes.

  • If you are on Windows, please set torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False and reload the backend to use the new interface.

  • Note that this ships with some bug-fixes for formats other than 16bit signed integer WAV, so you might experience some BC-breaking changes as described in the section below.
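
The two platform cases above can be folded into one helper. This is a minimal sketch, not an official recipe: `enable_new_io` is a hypothetical name, the sketch assumes that calling `set_audio_backend("soundfile")` after changing the flag is what "reload backend" means, and it takes the module as an argument so that it can be tried without torchaudio installed.

```python
import platform


def enable_new_io(ta, system=None):
    """Opt in to the new I/O interface on a torchaudio-like module `ta`.

    `enable_new_io` is a hypothetical helper, not part of torchaudio.
    Returns the name of the backend that was selected.
    """
    system = system or platform.system()
    if system == "Windows":
        # Turn off the legacy "soundfile" interface, then re-select the
        # backend (assumed to be what "reload backend" refers to).
        ta.USE_SOUNDFILE_LEGACY_INTERFACE = False
        backend = "soundfile"
    else:
        # Linux/macOS: move from the "sox" backend to "sox_io".
        backend = "sox_io"
    ta.set_audio_backend(backend)
    return backend
```

With torchaudio 0.7 installed, calling `enable_new_io(torchaudio)` once at program start would apply the setting that matches the current platform.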

News
[UPDATE] 2021/03/06

  • All migration work has been completed on the master branch.

[UPDATE] 2021/02/12

  • Added bits_per_sample and encoding argument (replaced dtype) to save function.

[UPDATE] 2021/01/29

  • Added encoding to AudioMetaData

[UPDATE] 2021/01/22

  • Added format argument to load/info/save function.
  • Added bits_per_sample to AudioMetaData

[UPDATE] 2020/10/21

  • Added Description of "soundfile" backend legacy interface.

[UPDATE] 2020/09/18

  • Added migration guide for "soundfile" backend.
  • Moved the phase when "soundfile" backend signatures change from 0.9.0 to 0.8.0 so that they match with "sox_io" backend, which becomes default in 0.8.0.

[UPDATE] 2020/09/17

  • Added information on deprecation of native libsox structures such as signalinfo_t and encoding_t.

Improving I/O for correct and consistent experience

This is an announcement for users: we are making backward-incompatible changes to the I/O functions of torchaudio backends, starting with the 0.7.0 release and continuing through the 0.9.0 release.

What is affected?

  • Public APIs

    • torchaudio.load
      • [Linux/macOS] By switching the default backend from "sox" backend to "sox_io" backend in 0.8.0, loading audio formats other than 16bit signed integer WAV returns the correct tensor.
      • [Linux/macOS/Windows] The signature of "soundfile" backend will change in 0.8.0 to match that of "sox_io" backend.
    • torchaudio.save
      • [Linux/macOS] By switching to "sox_io" backend, saving audio files will no longer degrade the data. Supported formats will be restricted to the tested formats only. (Please refer to the documentation for the supported formats.)
      • [Linux/macOS/Windows] The signature of "soundfile" backend will change in 0.8.0 to match that of "sox_io" backend.
    • torchaudio.info
      • [Linux/macOS/Windows] The signature of "soundfile" backend will change in 0.8.0 to match that of "sox_io" backend.
    • torchaudio.load_wav
      • will be removed in 0.9.0. (load function with normalize=False will provide the same functionality)
  • Internal APIs
    The following functions/classes of "sox" backend were accidentally exposed and will be removed in 0.9.0. There is no replacement for them. Please use save/load/info functions.

    • torchaudio.save_encinfo
      • will be removed in 0.9.0
    • torchaudio.get_sox_signalinfo_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_encodinginfo_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_option_t
      • will be removed in 0.9.0
    • torchaudio.get_sox_bool
      • will be removed in 0.9.0

The signatures of the other backends are not planned to be changed within this overhaul plan.

  • Classes
    • torchaudio.SignalInfo and torchaudio.EncodingInfo
      • will be replaced with AudioMetaData in 0.8.0 for "soundfile" backend
      • will be removed in 0.9.0
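
For code that still calls torchaudio.load_wav, the replacement noted in the list above (load with normalize=False) can be wrapped in a small shim. This is a sketch under assumptions: `load_wav_compat` is a hypothetical name, and the loader is passed in explicitly (normally `torchaudio.load` with the new interface enabled) so the shim itself does not depend on a particular torchaudio version.

```python
def load_wav_compat(filepath, load, **kwargs):
    """Stand-in for the removed `torchaudio.load_wav` (hypothetical helper).

    `load` is the function to delegate to, normally `torchaudio.load`.
    Extra keyword arguments are forwarded unchanged.
    """
    if "normalize" in kwargs:
        # The whole point of the shim is to pin normalize=False.
        raise TypeError("normalize is fixed to False by load_wav_compat")
    return load(filepath, normalize=False, **kwargs)
```

A call such as `load_wav_compat("speech.wav", torchaudio.load)` would then take the place of `torchaudio.load_wav("speech.wav")`; the file name here is only an illustration.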

Why

There are currently three backends in torchaudio. (Please refer to the documentation for details.)

"sox" backend is the original backend, which binds libsox with pybind11. The functionalities (load / save / info) of this backend are not well-tested and have a number of issues. (See #726).

Fixing these issues in a backward-compatible manner is not straightforward. Therefore, while we were adding TorchScript-compatible I/O functions, we decided to deprecate the original "sox" backend and replace it with a new backend ("sox_io"), which is confirmed not to have those issues.

As we switch the default backend for Linux/macOS from "sox" to "sox_io", we would like the "soundfile" backend to have the same interface. We therefore introduced a new interface to the "soundfile" backend (rather than a new backend, to keep the number of public APIs small).

When / What Changes

The following is the timeline for the planned changes:

Phase  Expected Release     Expected Changes
1      0.7.0 (Oct 2020)
2      0.8.0 (March 2021)
3      0.9.0

Planned signature changes of "soundfile" backend in 0.8.0

The following is the planned signature change of "soundfile" backend functions in 0.8.0 release.

info function

AudioMetaData implementation can be found here. The placement of the AudioMetaData might be changed.

~0.7.0:
def info(
  filepath: str,
) -> Tuple[SignalInfo, EncodingInfo]

0.8.0:
def info(
  filepath: str,
  format: Optional[str],
) -> AudioMetaData

Migration

The values returned from info function will be changed. Please use the corresponding new attributes.

~0.7.0:
si, ei = torchaudio.info(filepath)
sample_rate = si.rate
num_frames = si.length
num_channels = si.channels
precision = si.precision
bits_per_sample = ei.bits_per_sample
encoding = ei.encoding

0.8.0:
metadata = torchaudio.info(filepath)
sample_rate = metadata.sample_rate
num_frames = metadata.num_frames
num_channels = metadata.num_channels
bits_per_sample = metadata.bits_per_sample
encoding = metadata.encoding

Note: If the attribute you are using is missing, please file a Feature Request issue.
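
During the transition, code that must run against both interfaces can normalize the return value of info into one shape. The following is a minimal sketch, assuming only the attribute names listed in the migration above; `describe_info` is a hypothetical helper, it assumes AudioMetaData is not a tuple subclass, and it drops si.precision because the new listing has no counterpart for it.

```python
NEW_NAMES = (
    "sample_rate",
    "num_frames",
    "num_channels",
    "bits_per_sample",
    "encoding",
)


def describe_info(result):
    """Return audio metadata as a dict keyed by the 0.8.0 attribute names.

    Accepts either the legacy (SignalInfo, EncodingInfo) tuple or an
    AudioMetaData-like object. Hypothetical helper, not part of torchaudio.
    """
    if isinstance(result, tuple):
        # Legacy interface: map old attribute names onto the new ones.
        si, ei = result
        return {
            "sample_rate": si.rate,
            "num_frames": si.length,
            "num_channels": si.channels,
            "bits_per_sample": ei.bits_per_sample,
            "encoding": ei.encoding,
        }
    # New interface: the attributes already carry the new names.
    return {name: getattr(result, name) for name in NEW_NAMES}
```

Calling `describe_info(torchaudio.info(filepath))` would then give the same keys regardless of which interface is active.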

load function

~0.7.0:
def load(
  filepath: str,
  # out: Optional[Tensor] = None,
      # To be removed.
      # Currently not used
      # Raise AssertionError if given
  normalization: Optional[bool] = True,
      # To be renamed to normalize.
      # Currently only accept True
      # Raise AssertionError if given
  channels_first: Optional[bool] = True,
  num_frames: int = 0,
  offset: int = 0,
      # To be renamed to frame_offset
  # signalinfo: SignalInfo = None,
      # To be removed
      # Currently not used
      # Raise AssertionError if given
  # encodinginfo: EncodingInfo = None,
      # To be removed
      # Currently not used
      # Raise AssertionError if given
  filetype: Optional[str] = None
      # To be removed
      # Currently not used
) -> Tuple[Tensor, int]

0.8.0:
def load(
  filepath: str,
  frame_offset: int = 0,
  num_frames: int = -1,
  normalize: bool = True,
  channels_first: bool = True,
  format: Optional[str] = None,  # only required for file-like object input
) -> Tuple[Tensor, int]

Migration

Please change the argument names:

  • normalization -> normalize
  • offset -> frame_offset

~0.7.0:
waveform, sample_rate = torchaudio.load(
    filepath,
    normalization=normalization,
    channels_first=channels_first,
    num_frames=num_frames,
    offset=offset,
)

0.8.0:
waveform, sample_rate = torchaudio.load(
    filepath,
    frame_offset=offset,
    num_frames=num_frames,
    normalize=normalization,
    channels_first=channels_first,
)
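
Where load is called with keyword arguments in many places, the renames can be applied mechanically. The sketch below is one way to do that; `migrate_load_kwargs` is a hypothetical helper, and the mapping of num_frames=0 to -1 is an assumption inferred from the change in default value between the two signatures, not something stated in this announcement.

```python
RENAMED = {"normalization": "normalize", "offset": "frame_offset"}
REMOVED = ("out", "signalinfo", "encodinginfo", "filetype")


def migrate_load_kwargs(old):
    """Translate ~0.7.0 `load` keyword arguments to the 0.8.0 names.

    Hypothetical helper. Removed arguments are dropped when they are None
    and rejected otherwise, since the new interface cannot honor them.
    """
    new = {}
    for key, value in old.items():
        if key in REMOVED:
            if value is not None:
                raise ValueError(f"{key!r} is not supported by the new interface")
            continue
        new[RENAMED.get(key, key)] = value
    # Assumption: 0 was the old "read everything" default; the new one is -1.
    if new.get("num_frames") == 0:
        new["num_frames"] = -1
    return new
```

A call site would then read `torchaudio.load(filepath, **migrate_load_kwargs(old_kwargs))`, where `old_kwargs` is whatever dictionary the existing code already builds.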

save function

~0.7.0:
def save(
  filepath: str,
  src: Tensor,
  sample_rate: int,
  precision: int = 16,
    # moved to `bits_per_sample` argument
  channels_first: bool = True
)

0.8.0:
def save(
  filepath: str,
  src: Tensor,
  sample_rate: int,
  channels_first: bool = True,
  compression: Optional[float] = None,
    # Added only for compatibility.
    # soundfile does not support compression option
    # Raises Warning if not None
  format: Optional[str] = None,
  encoding: Optional[str] = None,
  bits_per_sample: Optional[int] = None,
)

Migration

~0.7.0:
torchaudio.save(
    filepath,
    waveform,
    sample_rate,
    channels_first
)

0.8.0:
torchaudio.save(
    filepath,
    waveform,
    sample_rate,
    channels_first,
    bits_per_sample=16,
)
# You can also designate the audio format with `format` and configure the encoding with `compression` and `encoding`. See https://pytorch.org/audio/master/backend.html#save for details.
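
The same mechanical approach works for save. In this sketch `migrate_save_kwargs` is a hypothetical helper; it moves `precision` to `bits_per_sample` and, when `precision` was not given, passes the old default of 16 explicitly because the new default is None.

```python
def migrate_save_kwargs(old):
    """Translate ~0.7.0 `save` keyword arguments to the 0.8.0 names.

    Hypothetical helper. The input dictionary is left unmodified.
    """
    new = dict(old)
    # `precision` moved to `bits_per_sample`; keep the old default of 16.
    new["bits_per_sample"] = new.pop("precision", 16)
    return new
```

For example, `torchaudio.save(filepath, waveform, sample_rate, **migrate_save_kwargs({"precision": 32}))` would request 32 bits per sample under the new interface.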

BC-breaking changes

Read and write operations on formats other than 16-bit signed integer WAV were affected by small bugs.
