Commit caff0c9

ericharper, artbataev, karpnv, bene-ges, ekmb authored and committed
Merge release r1.20.0 to main (#7167)
* update package info
  Signed-off-by: ericharper <[email protected]>

* Add ASR with TTS Tutorial. Fix enhancer usage. (#6955)
  * Add ASR with TTS Tutorial
  * Fix enhancer usage
  Signed-off-by: Vladimir Bataev <[email protected]>

* install_bs (#7019)
  Signed-off-by: Nikolay Karpov <[email protected]>

* Fix typo and branch in tutorial (#7048)
  Signed-off-by: Vladimir Bataev <[email protected]>

* fix syntax error introduced in PR-7079 (#7102)
  * fix syntax error introduced in PR-7079
    Signed-off-by: Alexandra Antonova <[email protected]>
  * fixes for pr review
    Signed-off-by: Alexandra Antonova <[email protected]>
  ---------
  Signed-off-by: Alexandra Antonova <[email protected]>

* fix links for TN (#7117)
  Signed-off-by: Evelina <[email protected]>

* update branch (#7135)
  Signed-off-by: ericharper <[email protected]>

* Fixed main and merging this to r1.20 (#7127)
  * Fixed main and merging this to r1.20
    Signed-off-by: Taejin Park <[email protected]>
  * Update vad_utils.py
    Signed-off-by: He Huang (Steve) <[email protected]>
  ---------
  Signed-off-by: Taejin Park <[email protected]>
  Signed-off-by: He Huang (Steve) <[email protected]>
  Co-authored-by: He Huang (Steve) <[email protected]>

* update branch
  Signed-off-by: ericharper <[email protected]>

* fix version
  Signed-off-by: ericharper <[email protected]>

* resolve conflict the other way
  Signed-off-by: ericharper <[email protected]>

* keep both
  Signed-off-by: ericharper <[email protected]>

* revert keep both
  Signed-off-by: ericharper <[email protected]>

---------

Signed-off-by: ericharper <[email protected]>
Signed-off-by: Vladimir Bataev <[email protected]>
Signed-off-by: Nikolay Karpov <[email protected]>
Signed-off-by: Alexandra Antonova <[email protected]>
Signed-off-by: Evelina <[email protected]>
Signed-off-by: Taejin Park <[email protected]>
Signed-off-by: He Huang (Steve) <[email protected]>
Co-authored-by: Vladimir Bataev <[email protected]>
Co-authored-by: Nikolay Karpov <[email protected]>
Co-authored-by: bene-ges <[email protected]>
Co-authored-by: Evelina <[email protected]>
Co-authored-by: Taejin Park <[email protected]>
Co-authored-by: He Huang (Steve) <[email protected]>
Signed-off-by: jubick1337 <[email protected]>
1 parent 52ba772 commit caff0c9

6 files changed, +8 -6 lines changed
Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ COPY . .

 # start building the final container
 FROM nemo-deps as nemo
-ARG NEMO_VERSION=1.20.0
+ARG NEMO_VERSION=1.21.0

 # Check that NEMO_VERSION is set. Build will fail without this. Expose NEMO and base container
 # version information as runtime environment variable for introspection purposes

nemo/collections/asr/parts/utils/vad_utils.py

Lines changed: 1 addition & 1 deletion
@@ -732,7 +732,7 @@ def generate_vad_segment_table(
     vad_pred_filepath_list = [os.path.join(vad_pred_dir, x) for x in os.listdir(vad_pred_dir) if x.endswith(suffixes)]

     if not out_dir:
-        out_dir_name = "seg_output_"
+        out_dir_name = "seg_output"
         for key in postprocessing_params:
             out_dir_name = out_dir_name + "-" + str(key) + str(postprocessing_params[key])
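
Reviewer note: the loop in this hunk already prepends `-` to every key, so the old trailing underscore in the prefix produced names beginning with `seg_output_-`. A minimal sketch of the naming logic follows; the helper name `build_out_dir_name` and the parameter values are hypothetical and only illustrate the string concatenation shown in the diff.

```python
def build_out_dir_name(postprocessing_params, prefix="seg_output"):
    """Append '-<key><value>' to the prefix for every post-processing parameter."""
    name = prefix
    for key in postprocessing_params:
        name = name + "-" + str(key) + str(postprocessing_params[key])
    return name


# Hypothetical parameter values, chosen only for illustration.
params = {"onset": 0.5, "offset": 0.3}

new_name = build_out_dir_name(params)                         # prefix after this commit
old_name = build_out_dir_name(params, prefix="seg_output_")   # prefix before this commit

print(new_name)  # seg_output-onset0.5-offset0.3
print(old_name)  # seg_output_-onset0.5-offset0.3
```

Directories written by the old code keep the `seg_output_-...` name, so downstream scripts that look up the default output directory may need to account for both spellings.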

nemo/package_info.py

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@


 MAJOR = 1
-MINOR = 20
+MINOR = 21
 PATCH = 0
 PRE_RELEASE = 'rc0'
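
Reviewer note: the hunk shows only the four constants. Assuming they are joined in the usual way (numeric parts separated by dots, pre-release tag appended), the bump yields the strings below. The names `VERSION`, `short_version`, and `full_version` are illustrative and not taken from the diff.

```python
MAJOR = 1
MINOR = 21
PATCH = 0
PRE_RELEASE = 'rc0'

# Assumption: the version string is built by joining the numeric parts
# with dots and appending the pre-release tag without a separator.
VERSION = (MAJOR, MINOR, PATCH, PRE_RELEASE)
short_version = '.'.join(map(str, VERSION[:3]))
full_version = short_version + ''.join(VERSION[3:])

print(short_version)  # 1.21.0
print(full_version)   # 1.21.0rc0
```

Under that assumption the short form matches the `NEMO_VERSION=1.21.0` value set in the Dockerfile hunk of this commit.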

tutorials/asr/Offline_ASR.ipynb

Lines changed: 1 addition & 1 deletion
@@ -655,4 +655,4 @@
 "outputs": []
 }
 ]
-}
+}

tutorials/nlp/SpellMapper_English_ASR_Customization.ipynb

Lines changed: 3 additions & 1 deletion
@@ -934,7 +934,9 @@
 "id": "9T3CZcCAmxCz"
 },
 "source": [
-"Now we have a folder with generated audios `audio/*.wav` and a nemo manifest with json records like `{\"audio_filepath\": \"audio/0.wav\", \"text\": \"no renal auditory or vestibular toxicity was observed\", \"orig_text\": \"No renal, auditory, or vestibular toxicity was observed.\"}`."
+"Now we have a folder with generated audios `audio/*.wav` and a nemo manifest with json records like `{\"audio_filepath\": \"audio/0.wav\", \"text\": \"no renal auditory or vestibular toxicity was observed\", \"orig_text\": \"No renal, auditory, or vestibular toxicity was observed.\"}`.",
+"\n",
+"Note that TTS model may mispronounce some unknown words, for example, abbreviations like `tRNAs`."
 ]
 },
 {
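
Reviewer note: the manifest record quoted in this cell can be read back with the standard `json` module. The sketch below assumes the manifest stores one JSON object per line; the helper name `read_manifest` is hypothetical, and the sample record is the one from the notebook text.

```python
import json


def read_manifest(lines):
    """Parse an iterable of manifest lines, one JSON record per non-empty line."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


# The record quoted in the notebook cell above.
sample_lines = [
    '{"audio_filepath": "audio/0.wav", '
    '"text": "no renal auditory or vestibular toxicity was observed", '
    '"orig_text": "No renal, auditory, or vestibular toxicity was observed."}',
]

records = read_manifest(sample_lines)
for record in records:
    print(record["audio_filepath"], "|", record["text"])
```

Because the helper accepts any iterable of strings, an open file handle can be passed in place of `sample_lines`. Comparing `text` with `orig_text` is one way to spot entries, such as abbreviations, where the synthesized audio may deserve a manual check.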

tutorials/tools/CTC_Segmentation_Tutorial.ipynb

Lines changed: 1 addition & 1 deletion
@@ -280,7 +280,7 @@
 "* `max_length` argument - max number of words in a segment for alignment (used only if there are no punctuation marks present in the original text. Long non-speech segments are better for segments split and are more likely to co-occur with punctuation marks. Random text split could deteriorate the quality of the alignment.\n",
 "* out-of-vocabulary words will be removed based on pre-trained ASR model vocabulary, and the text will be changed to lowercase \n",
 "* sentences for alignment with the original punctuation and capitalization will be stored under `$OUTPUT_DIR/processed/*_with_punct.txt`\n",
-"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/r1.19.0/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/r1.19.0/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
+"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
 "\n",
 "### Audio preprocessing:\n",
 "* non '.wav' audio files will be converted to `.wav` format\n",
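
Reviewer note: the tutorial text in this hunk says audio-based normalization picks, among several normalization options, the one that best matches the transcript by character error rate. The sketch below illustrates that selection idea only. It is not the code in `normalize_with_audio.py`; the function names and the sample transcript are hypothetical, and treating the transcript as the CER reference is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(hypothesis, reference):
    """Character error rate: edit distance divided by the reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(hypothesis, reference) / len(reference)


def pick_normalization(options, transcript):
    """Return the option with the lowest CER against the transcript."""
    return min(options, key=lambda option: cer(option, transcript))


# The two options for `901` named in the tutorial text.
options = ["nine zero one", "nine hundred and one"]
transcript = "nine hundred and one"  # hypothetical ASR transcript of the audio

best = pick_normalization(options, transcript)
print(best)  # nine hundred and one
```

With a transcript of `nine zero one` the same call returns the other option, which is the point of consulting the audio rather than the written text alone.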

0 commit comments
