
Commit d1e9c84

[TTS][ZH] add Chinese TTS recipes based on IPA symbol sets. (#6893)
* [TTS] add Chinese TTS recipe based on IPA.
* add new pinyin and ipa dictionaries with 36 finals.
* add yaml configs for 24-final pinyin and ipa.
* add copyright header
* add a directory level 24finals to discriminate from 36 finals.

Signed-off-by: Xuesong Yang <[email protected]>

* unify configs into a single one and add detailed comments providing supported candidates.

Signed-off-by: Xuesong Yang <[email protected]>

* choose 36-final IPA as default phoneme dict

Signed-off-by: Xuesong Yang <[email protected]>

---------

Signed-off-by: Xuesong Yang <[email protected]>
1 parent f4b9650 commit d1e9c84

8 files changed: +1329 -6 lines changed


examples/tts/conf/zh/fastpitch_align_22050.yaml

Lines changed: 14 additions & 3 deletions
@@ -1,5 +1,5 @@
-# This config contains the default values for training FastPitch model with aligner using 22KHz sampling
-# rate. If you want to train model on other dataset, you can change config values according to your dataset.
+# This config contains the default values for training FastPitch model with aligner using 22KHz sampling
+# rate. If you want to train model on other dataset, you can change config values according to your dataset.
 # Most dataset-specific arguments are in the head of the config file, see below.
 
 name: FastPitch
@@ -28,7 +28,13 @@ lowfreq: 0
 highfreq: null
 window: hann
 
-phoneme_dict_path: "scripts/tts_dataset_files/zh/pinyin_dict_nv_22.10.txt"
+# There are four candidates of `phoneme_dict_path` provided for Chinese as shown below,
+# 1) 24-final Pinyin: "scripts/tts_dataset_files/zh/24finals/pinyin_dict_nv_22.10.txt",
+# 2) IPA converted from 24-final Pinyin: "scripts/tts_dataset_files/zh/24finals/ipa_dict_nv23.05.txt",
+# 3) 36-final Pinyin: "scripts/tts_dataset_files/zh/36finals/pinyin_dict_nv23.05.txt",
+# 4) (default) IPA converted from 36-final Pinyin: "scripts/tts_dataset_files/zh/36finals/ipa_dict_nv23.05.txt"
+# Suggest to choose IPA symbol set converted from 36-final Pinyin because better audio quality were observed.
+phoneme_dict_path: "scripts/tts_dataset_files/zh/36finals/ipa_dict_nv23.05.txt"
 
 model:
   learn_alignment: true
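
Review note: the comment block added above lists four dictionary candidates, with the 36-final IPA dictionary as the default. A minimal sketch of how one might select among them from a launcher script follows. The short keys (`pinyin-24`, `ipa-36`, and so on) and the helper `phoneme_dict_override` are hypothetical names introduced here for illustration; only the four paths come from the commit. The sketch assumes the training script accepts Hydra-style `key=value` overrides, which is not shown in this diff.

```python
# Candidate dictionaries exactly as listed in the config comment of this commit.
# The short keys are hypothetical labels, not names used by NeMo.
PHONEME_DICT_CANDIDATES = {
    "pinyin-24": "scripts/tts_dataset_files/zh/24finals/pinyin_dict_nv_22.10.txt",
    "ipa-24": "scripts/tts_dataset_files/zh/24finals/ipa_dict_nv23.05.txt",
    "pinyin-36": "scripts/tts_dataset_files/zh/36finals/pinyin_dict_nv23.05.txt",
    "ipa-36": "scripts/tts_dataset_files/zh/36finals/ipa_dict_nv23.05.txt",
}

# The commit makes the 36-final IPA dictionary the default.
DEFAULT_CANDIDATE = "ipa-36"


def phoneme_dict_override(candidate: str = DEFAULT_CANDIDATE) -> str:
    """Return a `key=value` override string for the chosen dictionary (hypothetical helper)."""
    if candidate not in PHONEME_DICT_CANDIDATES:
        choices = ", ".join(sorted(PHONEME_DICT_CANDIDATES))
        raise KeyError(f"unknown candidate {candidate!r}; choose one of: {choices}")
    return f"phoneme_dict_path={PHONEME_DICT_CANDIDATES[candidate]}"


if __name__ == "__main__":
    # Print the override for the default and for the legacy 24-final Pinyin dictionary.
    print(phoneme_dict_override())
    print(phoneme_dict_override("pinyin-24"))
```

Keeping the paths in one mapping makes it harder to pass a stale pre-move path such as the old `zh/pinyin_dict_nv_22.10.txt`, which this commit relocates under `24finals/`.
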
@@ -73,6 +79,11 @@ model:
       _target_: nemo.collections.tts.g2p.models.zh_cn_pinyin.ChineseG2p
       phoneme_dict: ${phoneme_dict_path}
       word_segmenter: jieba # Only jieba is supported now.
+      phoneme_prefix: ""
+      phoneme_case: lower
+      tone_prefix: "#"
+      ascii_letter_prefix: ""
+      ascii_letter_case: upper
 
   train_ds:
     dataset:
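
Review note: the five options added to the `ChineseG2p` block (`phoneme_prefix`, `phoneme_case`, `tone_prefix`, `ascii_letter_prefix`, `ascii_letter_case`) are not explained in this diff. The sketch below shows one plausible reading of what such options could do to the emitted symbols. It is an assumption made for illustration, not NeMo's implementation: the functions `format_syllable` and `format_ascii_letters` are hypothetical, and only the option names and default values are taken from the config.

```python
def _case_fn(case: str):
    """Map a case option ("lower" or "upper") to the matching str method."""
    if case == "lower":
        return str.lower
    if case == "upper":
        return str.upper
    raise ValueError(f"case must be 'lower' or 'upper', got {case!r}")


def format_syllable(phonemes, tone, *, phoneme_prefix="", phoneme_case="lower", tone_prefix="#"):
    """Format one syllable's phonemes plus its tone digit as a list of symbols.

    Hypothetical helper: assumes each phoneme is case-normalized and prefixed,
    and that the tone becomes a separate symbol marked by `tone_prefix`.
    """
    change_case = _case_fn(phoneme_case)
    symbols = [phoneme_prefix + change_case(p) for p in phonemes]
    symbols.append(f"{tone_prefix}{tone}")
    return symbols


def format_ascii_letters(word, *, ascii_letter_prefix="", ascii_letter_case="upper"):
    """Spell an ASCII word letter by letter, case-normalized and prefixed."""
    change_case = _case_fn(ascii_letter_case)
    return [ascii_letter_prefix + change_case(ch) for ch in word]


if __name__ == "__main__":
    # With the defaults from the config: lowercase phonemes, "#"-marked tone.
    print(format_syllable(["N", "I"], 3))
    # With the defaults from the config: uppercase ASCII letters, no prefix.
    print(format_ascii_letters("gpu"))
```

Under this reading, lowercase phonemes and uppercase ASCII letters would keep the two symbol sets from colliding even with empty prefixes, and the `#` marker would keep tone digits distinct from any digit-like phoneme symbols.
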

nemo/collections/common/tokenizers/text_to_speech/tts_tokenizers.py

Lines changed: 4 additions & 1 deletion
@@ -694,7 +694,10 @@ def __init__(
         pad_with_space=False,
         text_preprocessing_func=chinese_text_preprocessing,
     ):
-        """Chinese phoneme-based tokenizer.
+        """
+        Chinese phoneme-based tokenizer.
+        Note: This tokenizer for now covers Chinese phonemes/tones and English letters because our dataset contains
+        both Chinese and English graphemes.
         Args:
             g2p: Grapheme to phoneme module.
             punct: Whether to reserve grapheme for basic punctuation or not.
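
Review note: the new docstring says the tokenizer covers Chinese phonemes, tones, and English letters. As a rough illustration of what a combined symbol inventory of that kind might look like, here is a minimal sketch. It is not the tokenizer's actual code: `build_vocab`, `encode`, the `<pad>` token, and the sample phonemes are all hypothetical, and the `#` tone marker is borrowed from the `tone_prefix` default in the YAML config of this commit.

```python
import string


def build_vocab(phonemes, tones, *, tone_prefix="#", pad="<pad>"):
    """Build a symbol-to-id mapping over phonemes, tone symbols, and A-Z (hypothetical)."""
    tokens = [pad]
    tokens += sorted(set(phonemes))                  # Chinese phonemes, deduplicated
    tokens += [f"{tone_prefix}{t}" for t in tones]   # tone symbols such as "#1"
    tokens += list(string.ascii_uppercase)           # English letters as uppercase symbols
    if len(tokens) != len(set(tokens)):
        raise ValueError("phoneme, tone, and letter symbol sets overlap")
    return {token: index for index, token in enumerate(tokens)}


def encode(symbols, vocab):
    """Convert a sequence of symbols to ids; raises KeyError on an unknown symbol."""
    return [vocab[s] for s in symbols]


if __name__ == "__main__":
    vocab = build_vocab(["n", "i", "h", "ao"], range(1, 6))
    # One Chinese syllable followed by a spelled-out English letter.
    print(encode(["n", "i", "#3", "A"], vocab))
```

The overlap check is the point of the sketch: once Chinese and English symbols share one inventory, a collision between, say, a lowercase phoneme and a lowercase letter would silently merge two distinct sounds.
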

scripts/dataset_processing/tts/sfbilingual/ds_conf/ds_for_fastpitch_align.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ name: "ds_for_fastpitch_align"
 manifest_filepath: "train_manifest.json"
 sup_data_path: "sup_data"
 sup_data_types: [ "align_prior_matrix", "pitch" ]
-phoneme_dict_path: "scripts/tts_dataset_files/zh/pinyin_dict_nv_22.10.txt"
+phoneme_dict_path: "scripts/tts_dataset_files/zh/24finals/pinyin_dict_nv_22.10.txt"
 
 dataset:
   _target_: nemo.collections.tts.data.dataset.TTSDataset
