When running diarize.py with a Hebrew Whisper model (fine-tuned Whisper CTranslate2 architecture ivrit-ai/whisper-large-v3-turbo-ct2), the transcription phase finishes, but the pipeline crashes during the forced alignment phase inside ctc_forced_aligner.
It throws an AssertionError: a != <star> in alignment_utils.py while trying to map the generated tokens/characters to the timestamp segments. This seems to be caused by a mismatch between Hebrew character tokenization and the expected special tokens (like <star>) in the aligner.
The error occurs during the get_spans execution phase.

Error Logs/Stderr
/work/whisper-diarization/venv/lib/python3.12/site-packages/torchaudio/__init__.py:178: UserWarning: The 'encoding' parameter is not fully supported by TorchCodec AudioEncoder.
return save_with_torchcodec(
/work/whisper-diarization/venv/lib/python3.12/site-packages/torchaudio/__init__.py:178: UserWarning: The 'bits_per_sample' parameter is not directly supported by TorchCodec AudioEncoder.
return save_with_torchcodec(
Traceback (most recent call last):
File "/work/whisper-diarization/diarize.py", line 183, in <module>
spans = get_spans(tokens_starred, segments, blank_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work/whisper-diarization/venv/lib/python3.12/site-packages/ctc_forced_aligner/alignment_utils.py", line 62, in get_spans
assert seg.label == ltr, f"{seg.label} != {ltr}"
^^^^^^^^^^^^^^^^
AssertionError: a != <star>
Environment:
- OS: Ubuntu 24.04 / Linux
- Python Version: 3.12
- faster-whisper Version: 1.2.1
- ctranslate2 Version: 4.7.2
- Model Used: ivrit-ai/whisper-large-v3-turbo-ct2 (Hebrew)
Expected behavior
The alignment engine should gracefully handle non-Latin characters or ignore unknown structural tokens instead of throwing a hard assertion error, allowing the speaker diarization phase to complete.
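To illustrate the kind of non-fatal check I have in mind, here is a minimal, self-contained sketch. It is not the actual ctc_forced_aligner code: the Segment class, the find_label_mismatches helper, and the "<blank>" default are all hypothetical stand-ins, and the logic is a simplified version of the label comparison that the traceback shows failing in get_spans. Instead of asserting, it collects every position where a segment label disagrees with the expected token, so the mismatch can be logged or skipped.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for an aligner segment: a label plus frame bounds.
    label: str
    start: int
    end: int


def find_label_mismatches(tokens, segments, blank="<blank>"):
    """Compare segment labels with the expected token sequence without raising.

    `tokens` is a list of strings whose space-separated parts are the expected
    labels (for example ["<star>", "a b"]). Returns a list of
    (segment_index, segment_label, expected_label) tuples, one per disagreement.
    """
    # Flatten the tokens into the sequence of labels the aligner should emit.
    expected = [ltr for tok in tokens for ltr in tok.split(" ") if ltr]
    mismatches = []
    pos = 0
    for idx, seg in enumerate(segments):
        if seg.label == blank:
            continue  # blank frames carry no label to compare
        if pos >= len(expected):
            # More labelled segments than expected labels.
            mismatches.append((idx, seg.label, None))
            continue
        if seg.label != expected[pos]:
            mismatches.append((idx, seg.label, expected[pos]))
        pos += 1
    return mismatches


# Toy data that reproduces the shape of the reported failure.
tokens = ["<star>", "a b"]
segments = [
    Segment("<blank>", 0, 1),
    Segment("a", 1, 2),
    Segment("a", 2, 3),
    Segment("b", 3, 4),
]
print(find_label_mismatches(tokens, segments))
```

With this toy input the first labelled segment is "a" while the first expected label is "<star>", which is the same disagreement the assertion reports. Whether the right fix is to skip such segments, log them, or correct the token preparation upstream is something the maintainers would know better than I do.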