Audio Codes "at utterance level" #83

## ❓ Questions

I'm interested in using the encoder to encode an audio fragment of a few seconds into a single codebook vector. However, the model returns a sequence of several `audio_codes` per fragment (which makes sense, since the full sequence is needed to successfully decode the audio afterwards).

How would you recommend using the encoder, and/or pre- or post-processing the audio input or the `audio_codes`, to obtain just one audio code "at utterance level"?
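To make the question concrete, one naive form of post-processing would be to average the per-frame latent vectors over time and then look up the nearest codebook entry. The snippet below is only a minimal sketch of that idea under loud assumptions: `frames` (per-frame continuous latents) and `codebook` are toy Python lists, and `mean_pool`, `nearest_code`, and `utterance_code` are hypothetical helper names, not part of the model's API. Whether such a pooled code is meaningful, or decodable at all, is exactly what is unclear.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average T per-frame vectors of size D into a single D-dim vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / len(frames) for d in range(dim)]


def nearest_code(vector: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vector` (squared Euclidean)."""
    best_index, best_dist = 0, math.inf
    for index, entry in enumerate(codebook):
        dist = sum((v - e) ** 2 for v, e in zip(vector, entry))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def utterance_code(frames: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> int:
    """Hypothetical utterance-level code: pool over time, then quantize once."""
    return nearest_code(mean_pool(frames), codebook)


# Toy data (assumed, not real model output): 3 frames, 2-dim latents, 3 codebook entries.
frames = [[0.9, 0.1], [1.1, -0.1], [1.0, 0.3]]
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
code = utterance_code(frames, codebook)
print(code)
```

This collapses the whole fragment into one index, but it discards all temporal structure, so it may only be useful as a coarse descriptor of the utterance rather than something the decoder can reconstruct audio from.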

Thanks in advance.

