@@ -10,29 +10,29 @@ the audio domain. By supporting PyTorch, torchaudio follows the same philosophy
 of providing strong GPU acceleration, having a focus on trainable features through
 the autograd system, and having consistent style (tensor names and dimension names).
 Therefore, it is primarily a machine learning library and not a general signal
-processing library. The benefits of PyTorch is be seen in torchaudio through
+processing library. The benefits of PyTorch can be seen in torchaudio through
 having all the computations be through PyTorch operations which makes it easy
 to use and feel like a natural extension.
 
-- [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/)
+- [Support audio I/O (Load files, Save files)](http://pytorch.org/audio/stable/)
   - Load the following formats into a torch Tensor using SoX
     - mp3, wav, aac, ogg, flac, avr, cdda, cvs/vms,
     - aiff, au, amr, mp2, mp4, ac3, avi, wmv,
     - mpeg, ircam and any other format supported by libsox.
-- [Kaldi (ark/scp)](http://pytorch.org/audio/kaldi_io.html)
-- [Dataloaders for common audio datasets (VCTK, YesNo)](http://pytorch.org/audio/datasets.html)
+- [Kaldi (ark/scp)](http://pytorch.org/audio/stable/kaldi_io.html)
+- [Dataloaders for common audio datasets (VCTK, YesNo)](http://pytorch.org/audio/stable/datasets.html)
 - Common audio transforms
-  - [Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample](http://pytorch.org/audio/transforms.html)
+  - [Spectrogram, AmplitudeToDB, MelScale, MelSpectrogram, MFCC, MuLawEncoding, MuLawDecoding, Resample](http://pytorch.org/audio/stable/transforms.html)
 - Compliance interfaces: Run code using PyTorch that align with other libraries
-  - [Kaldi: spectrogram, fbank, mfcc, resample_waveform](https://pytorch.org/audio/compliance.kaldi.html)
+  - [Kaldi: spectrogram, fbank, mfcc, resample_waveform](https://pytorch.org/audio/stable/compliance.kaldi.html)
 
 Dependencies
 ------------
 * PyTorch (See below for the compatible versions)
 * libsox v14.3.2 or above (only required when building from source)
 * [optional] vesis84/kaldi-io-for-python commit cb46cb1f44318a5d04d4941cf39084c5b021241e or above
 
-The following is the corresponding ``torchaudio`` versions and supported Python versions.
+The following are the corresponding ``torchaudio`` versions and supported Python versions.
 
 | ``torch`` | ``torchaudio`` | ``python`` |
 | ------------------------ | ------------------------ | ------------------------------- |
@@ -46,7 +46,7 @@ The following is the corresponding ``torchaudio`` versions and supported Python
 Installation
 ------------
 
-### Binary Distibutions
+### Binary Distributions
 
 To install the latest version using anaconda, run:
 
@@ -127,7 +127,7 @@ BUILD_SOX=1 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py i
 ```
 
 This is known to work on linux and unix distributions such as Ubuntu and CentOS 7 and macOS.
-If you try this on a new system and found a solution to make it work, feel free to share it by opening and issue.
+If you try this on a new system and find a solution to make it work, feel free to share it by opening an issue.
 
 #### Troubleshooting
 
@@ -195,16 +195,16 @@ Conventions
 
 With torchaudio being a machine learning library and built on top of PyTorch,
 torchaudio is standardized around the following naming conventions. Tensors are
-assumed to have channel as the first dimension and time as the last
+assumed to have channels as the first dimension and time as the last
 dimension (when applicable). This makes it consistent with PyTorch's dimensions.
 For size names, the prefix `n_` is used (e.g. "a tensor of size (`n_freq`, `n_mel`)")
 whereas dimension names do not have this prefix (e.g. "a tensor of
-dimension (channel, time)")
+dimension (channels, time)")
 
-* `waveform`: a tensor of audio samples with dimensions (channel, time)
+* `waveform`: a tensor of audio samples with dimensions (channels, time)
 * `sample_rate`: the rate of audio dimensions (samples per second)
-* `specgram`: a tensor of spectrogram with dimensions (channel, freq, time)
-* `mel_specgram`: a mel spectrogram with dimensions (channel, mel, time)
+* `specgram`: a tensor of spectrogram with dimensions (channels, freq, time)
+* `mel_specgram`: a mel spectrogram with dimensions (channels, mel, time)
 * `hop_length`: the number of samples between the starts of consecutive frames
 * `n_fft`: the number of Fourier bins
 * `n_mel`, `n_mfcc`: the number of mel and MFCC bins
@@ -216,16 +216,16 @@ dimension (channel, time)")
 
 Transforms expect and return the following dimensions.
 
-* `Spectrogram`: (channel, time) -> (channel, freq, time)
-* `AmplitudeToDB`: (channel, freq, time) -> (channel, freq, time)
-* `MelScale`: (channel, freq, time) -> (channel, mel, time)
-* `MelSpectrogram`: (channel, time) -> (channel, mel, time)
-* `MFCC`: (channel, time) -> (channel, mfcc, time)
-* `MuLawEncode`: (channel, time) -> (channel, time)
-* `MuLawDecode`: (channel, time) -> (channel, time)
-* `Resample`: (channel, time) -> (channel, time)
-* `Fade`: (channel, time) -> (channel, time)
-* `Vol`: (channel, time) -> (channel, time)
+* `Spectrogram`: (channels, time) -> (channels, freq, time)
+* `AmplitudeToDB`: (channels, freq, time) -> (channels, freq, time)
+* `MelScale`: (channels, freq, time) -> (channels, mel, time)
+* `MelSpectrogram`: (channels, time) -> (channels, mel, time)
+* `MFCC`: (channels, time) -> (channels, mfcc, time)
+* `MuLawEncoding`: (channels, time) -> (channels, time)
+* `MuLawDecoding`: (channels, time) -> (channels, time)
+* `Resample`: (channels, time) -> (channels, time)
+* `Fade`: (channels, time) -> (channels, time)
+* `Vol`: (channels, time) -> (channels, time)
 
 Complex numbers are supported via tensors of dimension (..., 2), and torchaudio provides `complex_norm` and `angle` to convert such a tensor into its magnitude and phase. Here, and in the documentation, we use an ellipsis "..." as a placeholder for the rest of the dimensions of a tensor, e.g. optional batching and channel dimensions.
 
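The (channels, time) -> (channels, freq, time) convention documented above can be made concrete with a little shape arithmetic. The sketch below is a minimal illustration, not torchaudio code: the helper names `stft_frames`, `spectrogram_shape` and `mel_spectrogram_shape` are hypothetical, and it assumes a onesided STFT (so the frequency dimension has `n_fft // 2 + 1` bins), centered framing with `n_fft // 2` padding on each side, a default hop of half the window, and 128 mel bins by default.

```python
# Hypothetical helpers (not part of torchaudio) that predict output shapes
# under the (channels, time) -> (channels, freq, time) convention.
# Assumptions: onesided STFT, centered frames padded by n_fft // 2 per side.


def stft_frames(n_samples: int, n_fft: int, hop_length: int) -> int:
    """Number of frames when the signal is padded by n_fft // 2 on each side."""
    padded = n_samples + 2 * (n_fft // 2)
    return 1 + (padded - n_fft) // hop_length


def spectrogram_shape(waveform_shape, n_fft=400, hop_length=None):
    """(channels, time) -> (channels, freq, time), with freq = n_fft // 2 + 1."""
    channels, n_samples = waveform_shape
    if hop_length is None:
        hop_length = n_fft // 2  # assumed default: half the window length
    return (channels, n_fft // 2 + 1, stft_frames(n_samples, n_fft, hop_length))


def mel_spectrogram_shape(waveform_shape, n_mels=128, n_fft=400, hop_length=None):
    """(channels, time) -> (channels, mel, time): the freq axis becomes n_mels."""
    channels, _, frames = spectrogram_shape(waveform_shape, n_fft, hop_length)
    return (channels, n_mels, frames)


if __name__ == "__main__":
    mono_one_second = (1, 16000)  # 1 s of mono audio at 16 kHz
    print(spectrogram_shape(mono_one_second))      # (1, 201, 81)
    print(mel_spectrogram_shape(mono_one_second))  # (1, 128, 81)
```

Only the middle dimension changes between `Spectrogram` and `MelSpectrogram`; the channels dimension stays first and the time (frame) dimension stays last, which is the point of the naming convention.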
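For the (..., 2) complex convention, the sketch below shows what "magnitude and phase" mean when the last dimension holds a (real, imaginary) pair and any number of leading dimensions come first. The helpers `magnitude` and `phase` are hypothetical and operate on nested Python lists rather than tensors; on tensors, the same quantities are assumed to correspond to `x.pow(2).sum(-1).pow(0.5 * power)` for `complex_norm` and `torch.atan2(x[..., 1], x[..., 0])` for `angle`.

```python
import math

# Hypothetical list-based sketch of the (..., 2) complex convention:
# the innermost dimension is (real, imag); leading dimensions are arbitrary.


def _is_pair(x) -> bool:
    """True if x is a (real, imag) pair of plain numbers."""
    return len(x) == 2 and all(isinstance(v, (int, float)) for v in x)


def magnitude(x, power=1.0):
    """|z| ** power for every (real, imag) pair, preserving leading dimensions."""
    if _is_pair(x):
        real, imag = x
        return (real * real + imag * imag) ** (0.5 * power)
    return [magnitude(item, power) for item in x]


def phase(x):
    """atan2(imag, real) in radians for every (real, imag) pair."""
    if _is_pair(x):
        real, imag = x
        return math.atan2(imag, real)
    return [phase(item) for item in x]


if __name__ == "__main__":
    z = [[3.0, 4.0], [0.0, 1.0]]  # shape (2, 2): two complex values
    print(magnitude(z))  # [5.0, 1.0]
```

Both helpers drop the trailing dimension of size 2 and keep everything before it, mirroring how the ellipsis stands in for optional batch and channel dimensions.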