
Commit 211270d

Update desciptions of lengths parameters (#1890)
1 parent 89aeb68 commit 211270d

3 files changed: 33 additions, 12 deletions


torchaudio/models/tacotron2.py

Lines changed: 1 addition & 1 deletion
@@ -1080,7 +1080,7 @@ def infer(self, tokens: Tensor, lengths: Optional[Tensor] = None) -> Tuple[Tenso
                 If ``None``, it is assumed that the all the tokens are valid. Default: ``None``
 
         Returns:
-            Tensor, Tensor, and Tensor:
+            (Tensor, Tensor, Tensor):
                 Tensor
                     The predicted mel spectrogram with shape `(n_batch, n_mels, max of mel_specgram_lengths)`.
                 Tensor

torchaudio/models/wav2vec2/model.py

Lines changed: 21 additions & 7 deletions
@@ -50,22 +50,29 @@ def extract_features(
         Args:
             waveforms (Tensor): Audio tensor of shape `(batch, frames)`.
             lengths (Tensor or None, optional):
-                Indicates the valid length of each audio sample in the batch.
+                Indicates the valid length of each audio in the batch.
                 Shape: `(batch, )`.
+                When the ``waveforms`` contains audios with different durations,
+                by providing ``lengths`` argument, the model will compute
+                the corresponding valid output lengths and apply proper mask in
+                transformer attention layer.
+                If ``None``, it is assumed that the entire audio waveform
+                length is valid.
             num_layers (int or None, optional):
                 If given, limit the number of intermediate layers to go through.
                 Providing `1` will stop the computation after going through one
                 intermediate layers. If not given, the outputs from all the
                 intermediate layers are returned.
 
         Returns:
-            List of Tensors and an optional Tensor:
+            (List[Tensor], Optional[Tensor]):
                 List of Tensors
                     Features from requested layers.
-                    Each Tensor is of shape: `(batch, frames, feature dimention)`
+                    Each Tensor is of shape: `(batch, time frame, feature dimension)`
                 Tensor or None
                     If ``lengths`` argument was provided, a Tensor of shape `(batch, )`
-                    is retuned. It indicates the valid length of each feature in the batch.
+                    is returned.
+                    It indicates the valid length in time axis of each feature Tensor.
         """
         x, lengths = self.feature_extractor(waveforms, lengths)
         x = self.encoder.extract_features(x, lengths, num_layers)
@@ -81,17 +88,24 @@ def forward(
         Args:
             waveforms (Tensor): Audio tensor of shape `(batch, frames)`.
             lengths (Tensor or None, optional):
-                Indicates the valid length of each audio sample in the batch.
+                Indicates the valid length of each audio in the batch.
                 Shape: `(batch, )`.
+                When the ``waveforms`` contains audios with different duration,
+                by providing ``lengths`` argument, the model will compute
+                the corresponding valid output lengths and apply proper mask in
+                transformer attention layer.
+                If ``None``, it is assumed that all the audio in ``waveforms``
+                have valid length. Default: ``None``.
 
         Returns:
-            Tensor and an optional Tensor:
+            (Tensor, Optional[Tensor]):
                 Tensor
                     The sequences of probability distribution (in logit) over labels.
                     Shape: `(batch, frames, num labels)`.
                 Tensor or None
                     If ``lengths`` argument was provided, a Tensor of shape `(batch, )`
-                    is retuned. It indicates the valid length of each feature in the batch.
+                    is retuned.
+                    It indicates the valid length in time axis of the output Tensor.
         """
         x, lengths = self.feature_extractor(waveforms, lengths)
         x = self.encoder(x, lengths)
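
The updated wav2vec2 docstrings say that passing ``lengths`` lets the model compute valid output lengths and mask padded frames in the attention layers. A minimal pure-Python sketch of that idea follows. It is not the torchaudio implementation: the kernel sizes and strides are assumed to match the commonly used wav2vec 2.0 base convolutional stack, and `conv_output_length`, `valid_output_lengths`, and `padding_mask` are hypothetical helper names introduced here for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed (kernel_size, stride) pairs for the convolutional feature extractor.
# They mirror the common wav2vec 2.0 base configuration and are NOT taken
# from this commit.
CONV_LAYERS: Sequence[Tuple[int, int]] = (
    (10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2),
)


def conv_output_length(length: int, kernel_size: int, stride: int) -> int:
    """Output length of an unpadded 1-D convolution, clamped at zero."""
    return max((length - kernel_size) // stride + 1, 0)


def valid_output_lengths(lengths: Sequence[int]) -> List[int]:
    """Map valid sample counts to valid frame counts, one per batch item."""
    out = []
    for length in lengths:
        for kernel_size, stride in CONV_LAYERS:
            length = conv_output_length(length, kernel_size, stride)
        out.append(length)
    return out


def padding_mask(
    frame_lengths: Sequence[int], max_frames: Optional[int] = None
) -> List[List[bool]]:
    """Boolean mask of shape (batch, max_frames); True marks a padded frame."""
    if max_frames is None:
        max_frames = max(frame_lengths)
    return [[t >= n for t in range(max_frames)] for n in frame_lengths]


# Two waveforms padded to 16000 samples; the second has only 8000 valid samples.
frame_lengths = valid_output_lengths([16000, 8000])
mask = padding_mask(frame_lengths)
```

With these assumed layers, the shorter waveform yields fewer valid frames than the longer one, and every frame past its valid length is flagged so that attention can ignore it.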

torchaudio/models/wavernn.py

Lines changed: 11 additions & 4 deletions
@@ -341,16 +341,23 @@ def infer(self, specgram: Tensor, lengths: Optional[Tensor] = None) -> Tuple[Ten
             specgram (Tensor):
                 Batch of spectrograms. Shape: `(n_batch, n_freq, n_time)`.
             lengths (Tensor or None, optional):
-                Indicates the valid length in of each spectrogram in time axis.
-                Shape: `(n_batch, )`.
+                Indicates the valid length of each audio in the batch.
+                Shape: `(batch, )`.
+                When the ``specgram`` contains spectrograms with different duration,
+                by providing ``lengths`` argument, the model will compute
+                the corresponding valid output lengths.
+                If ``None``, it is assumed that all the audio in ``waveforms``
+                have valid length. Default: ``None``.
 
         Returns:
-            Tensor and optional Tensor:
+            (Tensor, Optional[Tensor]):
                 Tensor
                     The inferred waveform of size `(n_batch, 1, n_time)`.
                     1 stands for a single channel.
                 Tensor or None
-                    The valid lengths of each waveform in the batch. Size `(n_batch, )`.
+                    If ``lengths`` argument was provided, a Tensor of shape `(batch, )`
+                    is retuned.
+                    It indicates the valid length in time axis of the output Tensor.
         """
 
         device = specgram.device
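
For the WaveRNN docstring, the relation between input ``lengths`` and the returned valid lengths can be sketched under one assumption: each spectrogram frame is upsampled by a fixed factor equal to the product of the vocoder's upsampling scales. The helper name `valid_waveform_lengths` and the scale values below are illustrative only and are not taken from this commit.

```python
from math import prod
from typing import List, Sequence


def valid_waveform_lengths(
    spec_lengths: Sequence[int], upsample_scales: Sequence[int]
) -> List[int]:
    """Scale valid spectrogram frame counts to valid waveform sample counts.

    Assumes every frame expands to prod(upsample_scales) samples.
    """
    total_scale = prod(upsample_scales)
    return [n * total_scale for n in spec_lengths]


# Two spectrograms padded to 100 frames; the second has only 80 valid frames.
# The scales (5, 5, 11) are an assumed example configuration.
wave_lengths = valid_waveform_lengths([100, 80], [5, 5, 11])
```

Samples beyond each returned length correspond to padded spectrogram frames and can be trimmed by the caller.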
