
Commit 136b94f

GNroy and pre-commit-ci[bot] authored and committed
ASR Confidence update and tutorial (#6810)
* small fixes and tests
* various fixes for the tutorial
* tutorial added
* for for a little oops after rebasement
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fix tests
* unused import removed
* fix review comments
* deprecated parameters for greedy configs
* move re-assigning to configs
* fix comments 2
* fix config tests
* fix ece test (my env was bugged apparently)
* renamings for confidence ensemble
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* fox comments 3
* return dropped tutorial
* CI flips back and forth, increasing tolerance

Signed-off-by: Aleksandr Laptev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: jubick1337 <[email protected]>
1 parent 5356689 commit 136b94f

23 files changed (+2836, −451 lines)

docs/source/starthere/tutorials.rst

Lines changed: 6 additions & 0 deletions
@@ -109,6 +109,12 @@ To run a tutorial:
    * - ASR
      - Hybrid ASR-TTS Models Tutorial
      - `Multi-lingual ASR <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_TTS_Tutorial.ipynb>`_
+   * - ASR
+     - ASR Confidence Estimation
+     - `ASR Confidence Estimation <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_Confidence_Estimation.ipynb>`_
+   * - ASR
+     - Confidence-based Ensembles
+     - `Confidence-based Ensembles <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Confidence_Ensembles.ipynb>`_
    * - NLP
      - Using Pretrained Language Models for Downstream Tasks
      - `Pretrained Language Models for Downstream Tasks <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/nlp/01_Pretrained_Language_Models_for_Downstream_Tasks.ipynb>`_

nemo/collections/asr/metrics/rnnt_wer.py

Lines changed: 35 additions & 35 deletions
@@ -100,32 +100,33 @@ class AbstractRNNTDecoding(ConfidenceMixin):
             from the `token_confidence`.
         aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence.
             Valid options are `mean`, `min`, `max`, `prod`.
-        method_cfg: A dict-like object which contains the method name and settings to compute per-frame
+        measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame
             confidence scores.

-            name: The method name (str).
+            name: The measure name (str).
                 Supported values:
                     - 'max_prob' for using the maximum token probability as a confidence.
                     - 'entropy' for using a normalized entropy of a log-likelihood vector.

            entropy_type: Which type of entropy to use (str).
-                Used if confidence_method_cfg.name is set to `entropy`.
+                Used if confidence_measure_cfg.name is set to `entropy`.
                Supported values:
-                    - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided,
+                    - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided,
                        the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)).
-                        Note that for this entropy, the temperature should comply the following inequality:
-                        1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size.
+                        Note that for this entropy, the alpha should comply the following inequality:
+                        (log(V)+2-sqrt(log^2(V)+4))/(2*log(V)) <= α <= (1+log(V-1))/log(V-1)
+                        where V is the model vocabulary size.
                    - 'tsallis' for the Tsallis entropy with the Boltzmann constant one.
                        Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)),
                        where α is a parameter. When α == 1, it works like the Gibbs entropy.
                        More: https://en.wikipedia.org/wiki/Tsallis_entropy
-                    - 'renui' for the Rényi entropy.
+                    - 'renyi' for the Rényi entropy.
                        Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)),
                        where α is a parameter. When α == 1, it works like the Gibbs entropy.
                        More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy

-            temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
-                When the temperature equals one, scaling is not applied to 'max_prob',
+            alpha: Power scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
+                When the alpha equals one, scaling is not applied to 'max_prob',
                and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i))

        entropy_norm: A mapping of the entropy value to the interval [0,1].
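
Since the docstring above spells out the entropy formulas, a small worked example helps make the alpha parameter concrete. The following is a minimal NumPy sketch, not the NeMo implementation: the formulas and the Gibbs alpha bounds are quoted from the docstring, while normalizing by the entropy of a uniform distribution and mapping confidence as 1 - H/H_uniform are assumptions made purely for illustration.

import numpy as np

def entropy_confidence(log_probs: np.ndarray, kind: str = "gibbs", alpha: float = 0.5) -> float:
    """One frame of log-probabilities (shape [V]) -> an illustrative confidence in [0, 1]."""
    v = log_probs.shape[-1]
    p_alpha = np.exp(alpha * log_probs)            # p_i^alpha
    u_alpha = np.full(v, 1.0 / v) ** alpha         # uniform distribution, used only to normalize

    if kind == "gibbs":                            # H_a = -sum_i(p_i^a * log(p_i^a))
        h, h_max = -np.sum(p_alpha * np.log(p_alpha)), -np.sum(u_alpha * np.log(u_alpha))
    elif kind == "tsallis":                        # H_a = 1/(a-1) * (1 - sum_i(p_i^a))
        h, h_max = (1 - np.sum(p_alpha)) / (alpha - 1), (1 - np.sum(u_alpha)) / (alpha - 1)
    elif kind == "renyi":                          # H_a = 1/(1-a) * log2(sum_i(p_i^a))
        h, h_max = np.log2(np.sum(p_alpha)) / (1 - alpha), np.log2(np.sum(u_alpha)) / (1 - alpha)
    else:
        raise ValueError(f"unknown entropy type: {kind}")
    return float(1.0 - h / h_max)                  # high entropy -> low confidence

# The quoted Gibbs alpha bounds, evaluated numerically for a 1024-token vocabulary:
v = 1024
lower = (np.log(v) + 2 - np.sqrt(np.log(v) ** 2 + 4)) / (2 * np.log(v))
upper = (1 + np.log(v - 1)) / np.log(v - 1)
print(f"Gibbs alpha bounds for V={v}: {lower:.3f} <= alpha <= {upper:.3f}")

frame = np.log(np.array([0.85, 0.10, 0.03, 0.02]))  # one confidently-decoded frame
print({k: round(entropy_confidence(frame, k), 3) for k in ("gibbs", "tsallis", "renyi")})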
@@ -139,7 +140,7 @@ class AbstractRNNTDecoding(ConfidenceMixin):
                timestep during greedy decoding. Setting to larger values allows longer sentences
                to be decoded, at the cost of increased execution time.
            preserve_frame_confidence: Same as above, overrides above value.
-            confidence_method: Same as above, overrides confidence_cfg.method.
+            confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg.

        "beam":
            beam_size: int, defining the beam size for beam search. Must be >= 1.
@@ -255,15 +256,13 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
        # initialize confidence-related fields
        self._init_confidence(self.cfg.get('confidence_cfg', None))

-        # Update preserve frame confidence
-        if self.preserve_frame_confidence is False:
-            if self.cfg.strategy in ['greedy', 'greedy_batch']:
-                self.preserve_frame_confidence = self.cfg.greedy.get('preserve_frame_confidence', False)
-                self.confidence_method_cfg = self.cfg.greedy.get('confidence_method_cfg', None)
-
-            elif self.cfg.strategy in ['beam', 'tsd', 'alsd', 'maes']:
-                # Not implemented
-                pass
+        # Confidence estimation is not implemented for these strategies
+        if (
+            not self.preserve_frame_confidence
+            and self.cfg.strategy in ['beam', 'tsd', 'alsd', 'maes']
+            and self.cfg.beam.get('preserve_frame_confidence', False)
+        ):
+            raise NotImplementedError(f"Confidence calculation is not supported for strategy `{self.cfg.strategy}`")

        if self.cfg.strategy == 'greedy':
            if self.big_blank_durations is None:
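
The behavioral change in this hunk is that requesting frame confidence under a beam-family strategy now fails loudly instead of being silently skipped. Below is a short standalone paraphrase of that guard, included only for illustration; the config keys mirror the ones the real check reads (cfg.strategy, cfg.beam.preserve_frame_confidence), but the free function and the example configs are assumptions, not NeMo code.

from omegaconf import OmegaConf

def check_confidence_support(cfg, preserve_frame_confidence: bool = False) -> None:
    # Mirrors the guard above: beam-family strategies cannot preserve per-frame confidence.
    if (
        not preserve_frame_confidence
        and cfg.strategy in ['beam', 'tsd', 'alsd', 'maes']
        and cfg.beam.get('preserve_frame_confidence', False)
    ):
        raise NotImplementedError(f"Confidence calculation is not supported for strategy `{cfg.strategy}`")

greedy_cfg = OmegaConf.create({'strategy': 'greedy_batch', 'greedy': {'preserve_frame_confidence': True}, 'beam': {}})
check_confidence_support(greedy_cfg)       # fine: greedy decoding keeps per-frame confidence

beam_cfg = OmegaConf.create({'strategy': 'maes', 'beam': {'preserve_frame_confidence': True}})
try:
    check_confidence_support(beam_cfg)
except NotImplementedError as err:
    print(err)                             # "Confidence calculation is not supported for strategy `maes`"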
@@ -278,7 +277,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
                ),
                preserve_alignments=self.preserve_alignments,
                preserve_frame_confidence=self.preserve_frame_confidence,
-                confidence_method_cfg=self.confidence_method_cfg,
+                confidence_measure_cfg=self.confidence_measure_cfg,
            )
        else:
            self.decoding = greedy_decode.GreedyTDTInfer(
@@ -292,7 +291,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
                ),
                preserve_alignments=self.preserve_alignments,
                preserve_frame_confidence=self.preserve_frame_confidence,
-                confidence_method_cfg=self.confidence_method_cfg,
+                confidence_measure_cfg=self.confidence_measure_cfg,
            )
        else:
            self.decoding = greedy_decode.GreedyMultiblankRNNTInfer(
@@ -305,7 +304,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
                ),
                preserve_alignments=self.preserve_alignments,
                preserve_frame_confidence=self.preserve_frame_confidence,
-                confidence_method_cfg=self.confidence_method_cfg,
+                confidence_measure_cfg=self.confidence_measure_cfg,
            )

        elif self.cfg.strategy == 'greedy_batch':
@@ -321,7 +320,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
                ),
                preserve_alignments=self.preserve_alignments,
                preserve_frame_confidence=self.preserve_frame_confidence,
-                confidence_method_cfg=self.confidence_method_cfg,
+                confidence_measure_cfg=self.confidence_measure_cfg,
            )
        else:
            self.decoding = greedy_decode.GreedyBatchedTDTInfer(
@@ -335,7 +334,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
                ),
                preserve_alignments=self.preserve_alignments,
                preserve_frame_confidence=self.preserve_frame_confidence,
-                confidence_method_cfg=self.confidence_method_cfg,
+                confidence_measure_cfg=self.confidence_measure_cfg,
            )

        else:
@@ -349,7 +348,7 @@ def __init__(self, decoding_cfg, decoder, joint, blank_id: int):
                ),
                preserve_alignments=self.preserve_alignments,
                preserve_frame_confidence=self.preserve_frame_confidence,
-                confidence_method_cfg=self.confidence_method_cfg,
+                confidence_measure_cfg=self.confidence_measure_cfg,
            )

        elif self.cfg.strategy == 'beam':
@@ -1006,32 +1005,33 @@ class RNNTDecoding(AbstractRNNTDecoding):
             from the `token_confidence`.
         aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence.
             Valid options are `mean`, `min`, `max`, `prod`.
-        method_cfg: A dict-like object which contains the method name and settings to compute per-frame
+        measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame
            confidence scores.

-            name: The method name (str).
+            name: The measure name (str).
                Supported values:
                    - 'max_prob' for using the maximum token probability as a confidence.
                    - 'entropy' for using a normalized entropy of a log-likelihood vector.

            entropy_type: Which type of entropy to use (str).
-                Used if confidence_method_cfg.name is set to `entropy`.
+                Used if confidence_measure_cfg.name is set to `entropy`.
                Supported values:
-                    - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided,
+                    - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided,
                        the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)).
-                        Note that for this entropy, the temperature should comply the following inequality:
-                        1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size.
+                        Note that for this entropy, the alpha should comply the following inequality:
+                        (log(V)+2-sqrt(log^2(V)+4))/(2*log(V)) <= α <= (1+log(V-1))/log(V-1)
+                        where V is the model vocabulary size.
                    - 'tsallis' for the Tsallis entropy with the Boltzmann constant one.
                        Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)),
                        where α is a parameter. When α == 1, it works like the Gibbs entropy.
                        More: https://en.wikipedia.org/wiki/Tsallis_entropy
-                    - 'renui' for the Rényi entropy.
+                    - 'renyi' for the Rényi entropy.
                        Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)),
                        where α is a parameter. When α == 1, it works like the Gibbs entropy.
                        More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy

-            temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
-                When the temperature equals one, scaling is not applied to 'max_prob',
+            alpha: Power scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
+                When the alpha equals one, scaling is not applied to 'max_prob',
                and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i))

        entropy_norm: A mapping of the entropy value to the interval [0,1].
@@ -1047,7 +1047,7 @@ class RNNTDecoding(AbstractRNNTDecoding):

            preserve_frame_confidence: Same as above, overrides above value.

-            confidence_method: Same as above, overrides confidence_cfg.method.
+            confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg.

        "beam":
            beam_size: int, defining the beam size for beam search. Must be >= 1.

nemo/collections/asr/metrics/rnnt_wer_bpe.py

Lines changed: 11 additions & 10 deletions
@@ -100,32 +100,33 @@ class RNNTBPEDecoding(AbstractRNNTDecoding):
             from the `token_confidence`.
         aggregation: Which aggregation type to use for collapsing per-token confidence into per-word confidence.
             Valid options are `mean`, `min`, `max`, `prod`.
-        method_cfg: A dict-like object which contains the method name and settings to compute per-frame
+        measure_cfg: A dict-like object which contains the measure name and settings to compute per-frame
            confidence scores.

-            name: The method name (str).
+            name: The measure name (str).
                Supported values:
                    - 'max_prob' for using the maximum token probability as a confidence.
                    - 'entropy' for using a normalized entropy of a log-likelihood vector.

            entropy_type: Which type of entropy to use (str).
-                Used if confidence_method_cfg.name is set to `entropy`.
+                Used if confidence_measure_cfg.name is set to `entropy`.
                Supported values:
-                    - 'gibbs' for the (standard) Gibbs entropy. If the temperature α is provided,
+                    - 'gibbs' for the (standard) Gibbs entropy. If the alpha (α) is provided,
                        the formula is the following: H_α = -sum_i((p^α_i)*log(p^α_i)).
-                        Note that for this entropy, the temperature should comply the following inequality:
-                        1/log(V) <= α <= -1/log(1-1/V) where V is the model vocabulary size.
+                        Note that for this entropy, the alpha should comply the following inequality:
+                        (log(V)+2-sqrt(log^2(V)+4))/(2*log(V)) <= α <= (1+log(V-1))/log(V-1)
+                        where V is the model vocabulary size.
                    - 'tsallis' for the Tsallis entropy with the Boltzmann constant one.
                        Tsallis entropy formula is the following: H_α = 1/(α-1)*(1-sum_i(p^α_i)),
                        where α is a parameter. When α == 1, it works like the Gibbs entropy.
                        More: https://en.wikipedia.org/wiki/Tsallis_entropy
-                    - 'renui' for the Rényi entropy.
+                    - 'renyi' for the Rényi entropy.
                        Rényi entropy formula is the following: H_α = 1/(1-α)*log_2(sum_i(p^α_i)),
                        where α is a parameter. When α == 1, it works like the Gibbs entropy.
                        More: https://en.wikipedia.org/wiki/R%C3%A9nyi_entropy

-            temperature: Temperature scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
-                When the temperature equals one, scaling is not applied to 'max_prob',
+            alpha: Power scale for logsoftmax (α for entropies). Here we restrict it to be > 0.
+                When the alpha equals one, scaling is not applied to 'max_prob',
                and any entropy type behaves like the Shannon entropy: H = -sum_i(p_i*log(p_i))

        entropy_norm: A mapping of the entropy value to the interval [0,1].
@@ -141,7 +142,7 @@ class RNNTBPEDecoding(AbstractRNNTDecoding):

            preserve_frame_confidence: Same as above, overrides above value.

-            confidence_method: Same as above, overrides confidence_cfg.method.
+            confidence_measure_cfg: Same as above, overrides confidence_cfg.measure_cfg.

        "beam":
            beam_size: int, defining the beam size for beam search. Must be >= 1.
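
To show where the renamed fields surface for a user, here is a hedged sketch of changing a transducer model's decoding config after this commit. Only the renamed keys (measure_cfg in place of method_cfg, alpha in place of temperature, 'renyi' in place of 'renui', and confidence_measure_cfg as the greedy override) come from the diffs above; the pretrained checkpoint name and the exact set of fields under confidence_cfg are assumptions for illustration.

import nemo.collections.asr as nemo_asr
from omegaconf import open_dict

# Any RNNT/transducer checkpoint works for this sketch; the name below is an assumption.
asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained("stt_en_conformer_transducer_large")

decoding_cfg = asr_model.cfg.decoding
with open_dict(decoding_cfg):
    decoding_cfg.strategy = "greedy_batch"          # confidence is a greedy/greedy_batch feature
    decoding_cfg.confidence_cfg = {
        "preserve_frame_confidence": True,
        "preserve_token_confidence": True,
        "preserve_word_confidence": True,
        "aggregation": "min",                       # token -> word collapse, see `aggregation` above
        "measure_cfg": {                            # formerly `method_cfg`
            "name": "entropy",
            "entropy_type": "renyi",                # formerly misspelled as 'renui'
            "alpha": 0.5,                           # formerly `temperature`
            "entropy_norm": "lin",                  # assumed normalization name
        },
    }
asr_model.change_decoding_strategy(decoding_cfg)

# Hypotheses then carry the requested confidence fields alongside the transcripts.
hypotheses = asr_model.transcribe(["sample.wav"], return_hypotheses=True)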
