tl;dr: how to migrate to new backend/interface in 0.7
- If you are using `torchaudio` in Linux/macOS environments, please use `torchaudio.set_audio_backend("sox_io")` to adapt to the upcoming changes.
- If you are in a Windows environment, please set `torchaudio.USE_SOUNDFILE_LEGACY_INTERFACE = False` and reload the backend to use the new interface.
- Note that this ships with some bug fixes for formats other than 16-bit signed integer WAV, so you might experience some BC-breaking changes as described in the section below.
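The two settings above can be wrapped in one small helper. The following is a minimal sketch, not part of torchaudio: `apply_new_io_interface` is a hypothetical name, it receives the `torchaudio` module as an argument so that it can be tried without the library installed, and it assumes that re-selecting the `"soundfile"` backend with `set_audio_backend` is what "reload backend" means on Windows.

```python
import sys


def apply_new_io_interface(ta, platform=sys.platform):
    """Opt in to the new I/O interface described in the tl;dr.

    `ta` is expected to be the imported `torchaudio` module (0.7.x).
    This helper is illustrative only and is not part of torchaudio.
    """
    if platform.startswith("win"):
        # Windows: turn off the legacy "soundfile" interface first ...
        ta.USE_SOUNDFILE_LEGACY_INTERFACE = False
        # ... then re-select the backend so the new interface is picked up
        # (assumption: re-selecting the backend is how it gets reloaded).
        backend = "soundfile"
    else:
        # Linux/macOS: adopt the backend that becomes the default in 0.8.0.
        backend = "sox_io"
    ta.set_audio_backend(backend)
    return backend
```

With torchaudio 0.7 installed, the intended call would be `import torchaudio` followed by `apply_new_io_interface(torchaudio)` at program start.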
News
[UPDATE] 2021/03/06
- All the migration work has been completed on the master branch.
[UPDATE] 2021/02/12
- Added `bits_per_sample` and `encoding` arguments (replacing `dtype`) to the `save` function.
[UPDATE] 2021/01/29
- Added `encoding` to `AudioMetaData`.
[UPDATE] 2021/01/22
- Added `format` argument to the `load`/`info`/`save` functions, and `bits_per_sample` to `AudioMetaData`.
[UPDATE] 2020/10/21
- Added description of the `"soundfile"` backend legacy interface.
[UPDATE] 2020/09/18
- Added migration guide for the `"soundfile"` backend.
- Moved the phase when the `"soundfile"` backend signatures change from 0.9.0 to 0.8.0 so that they match those of the `"sox_io"` backend, which becomes the default in 0.8.0.
[UPDATE] 2020/09/17
- Added information on the deprecation of native `libsox` structures such as `signalinfo_t` and `encoding_t`.
Improving I/O for correct and consistent experience
This is an announcement for users that we are making backward-incompatible changes to the I/O functions of `torchaudio` backends, starting with the 0.7.0 release and continuing through the 0.9.0 release.
What is affected?
- Public APIs
  - `torchaudio.load`
    - [Linux/macOS] By switching the default backend from the `"sox"` backend to the `"sox_io"` backend in 0.8.0, loading audio formats other than 16-bit signed integer WAV returns the correct tensor.
    - [Linux/macOS/Windows] The signature of the `"soundfile"` backend will be changed in 0.8.0 to match that of the `"sox_io"` backend.
  - `torchaudio.save`
    - [Linux/macOS] By switching to the `"sox_io"` backend, saving audio files will no longer degrade the data. The supported formats will be restricted to the tested formats only. (Please refer to the doc for the supported formats.)
    - [Linux/macOS/Windows] The signature of the `"soundfile"` backend will be changed in 0.8.0 to match that of the `"sox_io"` backend.
  - `torchaudio.info`
    - [Linux/macOS/Windows] The signature of the `"soundfile"` backend will be changed in 0.8.0 to match that of the `"sox_io"` backend.
  - `torchaudio.load_wav`
    - will be removed in 0.9.0. (The `load` function with `normalize=False` will provide the same functionality.)
- Internal APIs
  The following functions/classes of the `"sox"` backend were accidentally exposed and will be removed in 0.9.0. There is no replacement for them. Please use the `save`/`load`/`info` functions.
  - `torchaudio.save_encinfo`
    - will be removed in 0.9.0
  - `torchaudio.get_sox_signalinfo_t`
    - will be removed in 0.9.0
  - `torchaudio.get_sox_encodinginfo_t`
    - will be removed in 0.9.0
  - `torchaudio.get_sox_option_t`
    - will be removed in 0.9.0
  - `torchaudio.get_sox_bool`
    - will be removed in 0.9.0
  The signatures of the other backends are not planned to be changed within this overhaul plan.
- Classes
  - `torchaudio.SignalInfo` and `torchaudio.EncodingInfo`
    - will be replaced with `AudioMetaData` in 0.8.0 for the `"soundfile"` backend
    - will be removed in 0.9.0
Why
There are currently three backends in `torchaudio`. (Please refer to the documentation for the details.)
The `"sox"` backend is the original backend, which binds `libsox` with `pybind11`. The functionalities (`load` / `save` / `info`) of this backend are not well tested and have a number of issues. (See #726.)
Fixing these issues in a backward-compatible manner is not straightforward. Therefore, while we were adding TorchScript-compatible I/O functions, we decided to deprecate the original `"sox"` backend and replace it with a new backend (the `"sox_io"` backend), which is confirmed not to have those issues.
As we switch the default backend for Linux/macOS from `"sox"` to `"sox_io"`, we would like to align the interface of the `"soundfile"` backend with it. Therefore, we introduced a new interface to the `"soundfile"` backend (rather than a new backend, to reduce the number of public APIs).
When / What Changes
The following is the timeline for the planned changes:
Phase | Expected Release | Expected Changes |
---|---|---|
1 | 0.7.0 (Oct 2020) | |
2 | 0.8.0 (March 2021) | |
3 | 0.9.0 | |
Planned signature changes of `"soundfile"` backend in 0.8.0
The following are the planned signature changes of the `"soundfile"` backend functions in the 0.8.0 release.
`info` function
The `AudioMetaData` implementation can be found here. The placement of `AudioMetaData` might be changed.
~0.7.0:

    def info(
        filepath: str,
    ) -> Tuple[SignalInfo, EncodingInfo]

0.8.0:

    def info(
        filepath: str,
        format: Optional[str],
    ) -> AudioMetaData

Migration
The values returned from the `info` function will be changed. Please use the corresponding new attributes.
~0.7.0:

    si, ei = torchaudio.info(filepath)
    sample_rate = si.rate
    num_frames = si.length
    num_channels = si.channels
    precision = si.precision
    bits_per_sample = ei.bits_per_sample
    encoding = ei.encoding

0.8.0:

    metadata = torchaudio.info(filepath)
    sample_rate = metadata.sample_rate
    num_frames = metadata.num_frames
    num_channels = metadata.num_channels
    bits_per_sample = metadata.bits_per_sample
    encoding = metadata.encoding

Note: If the attribute you are using is missing, please file a Feature Request issue.
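For code that has to run against both interfaces during the transition, the attribute mapping above can be expressed as one function. This is only a sketch: `OldSignalInfo`, `OldEncodingInfo` and `NewMetaData` are hypothetical stand-ins for `SignalInfo`, `EncodingInfo` and `AudioMetaData` so that the example runs without torchaudio, and `read_info` is an illustrative name rather than a library function.

```python
from typing import NamedTuple


class OldSignalInfo(NamedTuple):  # stand-in for torchaudio.SignalInfo
    rate: int
    length: int
    channels: int
    precision: int


class OldEncodingInfo(NamedTuple):  # stand-in for torchaudio.EncodingInfo
    bits_per_sample: int
    encoding: str


class NewMetaData(NamedTuple):  # stand-in for AudioMetaData
    sample_rate: int
    num_frames: int
    num_channels: int
    bits_per_sample: int
    encoding: str


def read_info(result):
    """Turn either return style of `info` into one plain dict."""
    if hasattr(result, "sample_rate"):
        # 0.8.0 style: a single metadata object.
        si_like, ei_like = result, result
        rate, frames, channels = result.sample_rate, result.num_frames, result.num_channels
    else:
        # ~0.7.0 style: a (SignalInfo, EncodingInfo) pair, mapped as in the table above.
        si_like, ei_like = result
        rate, frames, channels = si_like.rate, si_like.length, si_like.channels
    return {
        "sample_rate": rate,
        "num_frames": frames,
        "num_channels": channels,
        "bits_per_sample": ei_like.bits_per_sample,
        "encoding": ei_like.encoding,
    }
```

In real use, the argument would be whatever `torchaudio.info(filepath)` returns. Note that `precision` has no counterpart in the new column of the table, so the sketch does not carry it over.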
`load` function
~0.7.0:

    def load(
        filepath: str,
        # out: Optional[Tensor] = None,
        #   To be removed.
        #   Currently not used
        #   Raise AssertionError if given
        normalization: Optional[bool] = True,
        #   To be renamed to normalize.
        #   Currently only accept True
        #   Raise AssertionError if given
        channels_first: Optional[bool] = True,
        num_frames: int = 0,
        offset: int = 0,
        #   To be renamed to frame_offset
        # signalinfo: SignalInfo = None,
        #   To be removed
        #   Currently not used
        #   Raise AssertionError if given
        # encodinginfo: EncodingInfo = None,
        #   To be removed
        #   Currently not used
        #   Raise AssertionError if given
        filetype: Optional[str] = None
        #   To be removed
        #   Currently not used
    ) -> Tuple[Tensor, int]

0.8.0:

    def load(
        filepath: str,
        frame_offset: int = 0,
        num_frames: int = -1,
        normalize: bool = True,
        channels_first: bool = True,
        format: Optional[str] = None,  # only required for file-like object input
    ) -> Tuple[Tensor, int]

Migration
Please change the argument names:
- `normalization` -> `normalize`
- `offset` -> `frame_offset`

~0.7.0:

    waveform, sample_rate = torchaudio.load(
        filepath,
        normalization=normalization,
        channels_first=channels_first,
        num_frames=num_frames,
        offset=offset,
    )

0.8.0:

    waveform, sample_rate = torchaudio.load(
        filepath,
        frame_offset=offset,
        num_frames=num_frames,
        normalize=normalization,
        channels_first=channels_first,
    )

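Where many call sites pass these arguments as keywords, the renaming can be done mechanically. The following is a rough sketch of such a translation; `migrate_load_kwargs` is a hypothetical helper, not a torchaudio function. It also rewrites `num_frames=0` to `-1`, which is an assumption based on the default changing from `0` to `-1` in the signatures above (both presumably meaning "read everything"); please verify this against the documentation of the version you use.

```python
# Old keyword name -> new keyword name, as listed in the migration note.
RENAMED = {"normalization": "normalize", "offset": "frame_offset"}
# Keywords that the old signature marks as "to be removed".
REMOVED = ("out", "signalinfo", "encodinginfo", "filetype")


def migrate_load_kwargs(old_kwargs):
    """Translate ~0.7.0 `load` keyword arguments to the 0.8.0 names."""
    new_kwargs = {}
    for name, value in old_kwargs.items():
        if name in REMOVED:
            if value is not None:
                # A real value cannot be carried over to the new interface.
                raise ValueError(f"argument {name!r} is no longer supported")
            continue  # drop removed arguments that were left at None
        new_kwargs[RENAMED.get(name, name)] = value
    # Assumption: 0 used to mean "all frames"; the new default for that is -1.
    if new_kwargs.get("num_frames") == 0:
        new_kwargs["num_frames"] = -1
    return new_kwargs
```

A call site would then look like `torchaudio.load(filepath, **migrate_load_kwargs(old_kwargs))`.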
`save` function
~0.7.0:

    def save(
        filepath: str,
        src: Tensor,
        sample_rate: int,
        precision: int = 16,
        #   moved to `bits_per_sample` argument
        channels_first: bool = True
    )

0.8.0:

    def save(
        filepath: str,
        src: Tensor,
        sample_rate: int,
        channels_first: bool = True,
        compression: Optional[float] = None,
        #   Added only for compatibility.
        #   soundfile does not support compression option
        #   Raises Warning if not None
        format: Optional[str] = None,
        encoding: Optional[str] = None,
        bits_per_sample: Optional[int] = None,
    )

Migration
~0.7.0:

    torchaudio.save(
        filepath,
        waveform,
        sample_rate,
        channels_first
    )

0.8.0:

    torchaudio.save(
        filepath,
        waveform,
        sample_rate,
        channels_first,
        bits_per_sample=16,
    )
    # You can also designate audio format with `format` and configure the encoding with `compression` and `encoding`. See https://pytorch.org/audio/master/backend.html#save for the detail

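If existing code relies on the old argument layout, a thin shim can forward `precision` to `bits_per_sample`. This is a sketch under the assumption that the two arguments carry the same meaning, as the comment in the old signature suggests; `save_compat` is a hypothetical name, and the actual `save` function is passed in so that the example runs without torchaudio.

```python
def save_compat(save_fn, filepath, src, sample_rate, precision=16, channels_first=True):
    """Call a 0.8.0-style `save` using the ~0.7.0 argument layout.

    `save_fn` is expected to be `torchaudio.save` with the new signature.
    """
    return save_fn(
        filepath,
        src,
        sample_rate,
        channels_first=channels_first,
        # `precision` moved to `bits_per_sample` in the new interface.
        bits_per_sample=precision,
    )
```

With torchaudio installed, this would be called as `save_compat(torchaudio.save, filepath, waveform, sample_rate, precision=16)`.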
BC-breaking changes
Read and write operations on the formats other than WAV 16-bit signed integer were affected by small bugs.
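To get a rough idea of which files in a dataset might behave differently after the switch, one can check whether each file is a 16-bit PCM WAV. The helper below is a sketch that uses only the Python standard library; `is_16bit_wav` is a hypothetical name, and it relies on the assumption that 2-byte PCM samples in a WAV file are 16-bit signed integers. Anything the `wave` module cannot parse (for example float WAV, MP3 or FLAC) is reported as not 16-bit WAV.

```python
import wave


def is_16bit_wav(path):
    """Return True if `path` is a PCM WAV file with 2-byte (16-bit) samples."""
    try:
        with wave.open(path, "rb") as f:
            return f.getsampwidth() == 2
    except (wave.Error, EOFError):
        # Not a PCM WAV file that the standard library can parse.
        return False
```

Files for which this returns `False` are the ones worth re-checking after moving to the new backend or interface.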