Commit 1fb26f9
Merge branch 'rnnt-hybrid-export' of https://github.com/trias702/NeMo into rnnt-hybrid-export
2 parents: 478945d + d75784f
File tree: 7 files changed, +29 -12 lines

docs/source/asr/models.rst (2 additions, 2 deletions)

@@ -218,7 +218,7 @@ You may find FastConformer variants of cache-aware streaming models under ``<NeM
 Note cache-aware streaming models are being exported without caching support by default.
 To include caching support, `model.set_export_config({'cache_support' : 'True'})` should be called before export.
 Or, if ``<NeMo_git_root>/scripts/export.py`` is being used:
-`python export.py cache_aware_conformer.nemo cache_aware_conformer.onnx --config cache_support=True`
+`python export.py cache_aware_conformer.nemo cache_aware_conformer.onnx --export-config cache_support=True`

 .. _LSTM-Transducer_model:

@@ -299,7 +299,7 @@ Similar example configs for FastConformer variants of Hybrid models can be found
 Note Hybrid models are being exported as RNNT (encoder and decoder+joint parts) by default.
 To export as CTC (single encoder+decoder graph), `model.set_export_config({'decoder_type' : 'ctc'})` should be called before export.
 Or, if ``<NeMo_git_root>/scripts/export.py`` is being used:
-`python export.py hybrid_transducer.nemo hybrid_transducer.onnx --config decoder_type=ctc`
+`python export.py hybrid_transducer.nemo hybrid_transducer.onnx --export-config decoder_type=ctc`

 .. _Conformer-HAT_model:

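The `set_export_config` calls in the docs hunks can be pictured with a minimal stand-in. The `_ExportableStub` class and its merge-only behavior below are assumptions for illustration only, not NeMo's actual implementation (the real method validates keys per model class):

```python
# Minimal stand-in for a NeMo exportable model; this stub simply merges
# the supplied dict into export_config, which is enough to show the flow.
class _ExportableStub:
    def __init__(self):
        self.export_config = {}

    def set_export_config(self, args):
        # The real NeMo method validates and coerces keys; this stub just merges.
        self.export_config.update(args)

# As in the docs above, option values are passed as strings before export:
asr_model = _ExportableStub()
asr_model.set_export_config({'cache_support': 'True'})

hybrid_model = _ExportableStub()
hybrid_model.set_export_config({'decoder_type': 'ctc'})
```

Note that both documented examples pass string values (`'True'`, `'ctc'`); any interpretation of those strings happens inside each model's own `set_export_config`.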
docs/source/core/export.rst (1 addition, 1 deletion)

@@ -207,7 +207,7 @@ An example can be found in ``<NeMo_git_root>/nemo/collections/asr/models/rnnt_mo
 Here is example on now `set_export_config()` call is being tied to command line arguments in ``<NeMo_git_root>/scripts/export.py`` :

 .. code-block:: Python
-    python scripts/export.py hybrid_conformer.nemo hybrid_conformer.onnx --config decoder_type=ctc
+    python scripts/export.py hybrid_conformer.nemo hybrid_conformer.onnx --export-config decoder_type=ctc

 Exportable Model Code
 ~~~~~~~~~~~~~~~~~~~~~

nemo/collections/tts/models/base.py (12 additions, 0 deletions)

@@ -68,6 +68,18 @@ def list_available_models(cls) -> 'List[PretrainedModelInfo]':
         list_of_models.extend(subclass_models)
         return list_of_models

+    def set_export_config(self, args):
+        for k in ['enable_volume', 'enable_ragged_batches']:
+            if k in args:
+                self.export_config[k] = bool(args[k])
+                args.pop(k)
+        if 'num_speakers' in args:
+            self.export_config['num_speakers'] = int(args['num_speakers'])
+            args.pop('num_speakers')
+        if 'emb_range' in args:
+            raise Exception('embedding range is not user-settable')
+        super().set_export_config(args)
+

 class Vocoder(ModelPT, ABC):
     """
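The override added to the TTS base model can be exercised in isolation. The class name and the omission of the `super()` delegation below are simplifications for illustration; only the coercion logic mirrors the diff:

```python
# Standalone sketch of the set_export_config logic added above.
# SpectrogramGeneratorSketch is a hypothetical stand-in, not the real class.
class SpectrogramGeneratorSketch:
    def __init__(self):
        self.export_config = {}

    def set_export_config(self, args):
        # Boolean-style toggles are coerced with bool()
        for k in ['enable_volume', 'enable_ragged_batches']:
            if k in args:
                self.export_config[k] = bool(args[k])
                args.pop(k)
        # num_speakers is coerced to int
        if 'num_speakers' in args:
            self.export_config['num_speakers'] = int(args['num_speakers'])
            args.pop('num_speakers')
        # emb_range is rejected outright
        if 'emb_range' in args:
            raise Exception('embedding range is not user-settable')
        # (the real method then delegates remaining keys to super())

model = SpectrogramGeneratorSketch()
model.set_export_config({'enable_volume': 'True', 'num_speakers': '2'})
```

One behavior worth noting: `bool()` on any non-empty string is `True`, so passing the string `'False'` would still enable a toggle here; these keys effectively act as presence flags.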

scripts/export.py (9 additions, 3 deletions)

@@ -63,7 +63,7 @@ def get_args(argv):
     parser.add_argument("--device", default="cuda", help="Device to export for")
     parser.add_argument("--check-tolerance", type=float, default=0.01, help="tolerance for verification")
     parser.add_argument(
-        "--config",
+        "--export-config",
         metavar="KEY=VALUE",
         nargs='+',
         help="Set a number of key-value pairs to model.export_config dictionary "

@@ -142,8 +142,14 @@ def nemo_export(argv):
     if args.cache_support:
         model.set_export_config({"cache_support": "True"})

-    if args.config:
-        kv = dict(map(lambda s: s.split('='), args.config))
+    if args.export_config:
+        kv = {}
+        for key_value in args.export_config:
+            lst = key_value.split("=")
+            if len(lst) != 2:
+                raise Exception("Use correct format for --export_config: k=v")
+            k, v = lst
+            kv[k] = v
         model.set_export_config(kv)

     autocast = nullcontext
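The new KEY=VALUE handling can be sketched as a standalone helper; `parse_export_config` is an illustrative name, not a function in the script, but its body mirrors the loop in the diff:

```python
def parse_export_config(pairs):
    """Parse --export-config KEY=VALUE arguments into a dict,
    mirroring the validation added in scripts/export.py."""
    kv = {}
    for key_value in pairs:
        lst = key_value.split("=")
        if len(lst) != 2:
            # Malformed pairs (no '=', or more than one) are rejected explicitly
            raise Exception("Use correct format for --export_config: k=v")
        k, v = lst
        kv[k] = v
    return kv

config = parse_export_config(["decoder_type=ctc", "cache_support=True"])
```

Compared with the old one-liner `dict(map(lambda s: s.split('='), args.config))`, the explicit loop reports a clear error for malformed pairs instead of failing with an opaque unpacking or dict-construction error.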

tests/collections/tts/test_tts_exportables.py (1 addition, 2 deletions)

@@ -54,8 +54,7 @@ def radtts_model():
     model = RadTTSModel(cfg=cfg.model)
     app_state.is_model_being_restored = False
     model.eval()
-    model.export_config['enable_ragged_batches'] = True
-    model.export_config['enable_volume'] = True
+    model.set_export_config({'enable_ragged_batches': 'True', 'enable_volume': 'True'})
     return model

tutorials/asr/Offline_ASR_with_VAD_for_CTC_models.ipynb (1 addition, 1 deletion)

@@ -389,7 +389,7 @@
   "source": [
    "# Further Reading\n",
    "\n",
-   "There are two ways to incorporate VAD into ASR pipeline. The first strategy is to drop the frames that are predicted as `non-speech` by VAD, as already discussed in this tutorial. The second strategy is to keep all the frames and mask the `non-speech` frames with zero-signal values. Also, instead of using segment-VAD as shown in this tutorial, we can use frame-VAD model for faster inference and better accuracy. For more information, please refer to the script [speech_to_text_with_vad.py](https://github.com/NVIDIA/NeMo/blob/stable/examples/asr_vad/speech_to_text_with_vad.py)."
+   "There are two ways to incorporate VAD into ASR pipeline. The first strategy is to drop the frames that are predicted as `non-speech` by VAD, as already discussed in this tutorial. The second strategy is to keep all the frames and mask the `non-speech` frames with zero-signal values. Also, instead of using segment-VAD as shown in this tutorial, we can use frame-VAD model for faster inference and better accuracy. For more information, please refer to the script [speech_to_text_with_vad.py](https://github.com/NVIDIA/NeMo/blob/stable/examples/asr/asr_vad/speech_to_text_with_vad.py)."
   ]
  }
 ],

tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb (3 additions, 3 deletions)

@@ -85,7 +85,7 @@
    "# Install NeMo library. If you are running locally (rather than on Google Colab), comment out the below lines\n",
    "# and instead follow the instructions at https://github.com/NVIDIA/NeMo#Installation\n",
    "GITHUB_ACCOUNT = \"NVIDIA\"\n",
-   "BRANCH = \"main\"\n",
+   "BRANCH = 'main'\n",
    "!python -m pip install git+https://github.com/{GITHUB_ACCOUNT}/NeMo.git@{BRANCH}#egg=nemo_toolkit[all]\n",
    "\n",
    "# Download local version of NeMo scripts. If you are running locally and want to use your own local NeMo code,\n",

@@ -536,7 +536,7 @@
   "id": "b1K6paeee2Iu"
  },
  "source": [
-  "As we mentioned earlier, this model pipeline is intended to work with custom vocabularies up to several thousand entries. Since the whole medical vocabulary contains 110k entries, we restrict our custom vocabulary to 5000+ terms that occured in given corpus of abstracts.\n",
+  "As we mentioned earlier, this model pipeline is intended to work with custom vocabularies up to several thousand entries. Since the whole medical vocabulary contains 110k entries, we restrict our custom vocabulary to 5000+ terms that occurred in given corpus of abstracts.\n",
   "\n",
   "The goal of indexing our custom vocabulary is to build an index where key is a letter n-gram and value is the whole phrase. The keys are n-grams in the given user phrase and their misspelled variants taken from our collection of n-\n",
   "gram mappings (see Index of custom vocabulary in Fig. 1)\n",

@@ -1273,7 +1273,7 @@
   "### Filtering by Dynamic Programming(DP) score\n",
   "\n",
   "What else can be done?\n",
-  "Given a fragment and its potential replacement, we can apply **dynamic programming** to find the most probable \"translation\" path between them. We will use the same n-gram mapping vocabulary, because its frequencies give us \"translation probability\" of each n-gram pair. The final path score can be calculated as maximum sum of log probalities of matching n-grams along this path.\n",
+  "Given a fragment and its potential replacement, we can apply **dynamic programming** to find the most probable \"translation\" path between them. We will use the same n-gram mapping vocabulary, because its frequencies give us \"translation probability\" of each n-gram pair. The final path score can be calculated as maximum sum of log probabilities of matching n-grams along this path.\n",
   "Let's look at an example. "
