Auto-Talkloid in OpenUTAU (text-to-speech) #1783
skrunkly7654 started this conversation in Ideas
Replies: 1 comment
Wow, this sounds great. It's like AI tuning, but for talking.
I've been thinking that it would be really cool to have a function or plugin that can automatically generate a talkloid, essentially turning OpenUTAU into a text-to-speech tool.
There are apparently plugins for standard UTAU that do this (HANASU), and a plugin for VOICEVOX that can import an UTAU voicebank and essentially turn it into a text-to-speech voice, but these are Japanese-only. UTAU is software that supports any language you can imagine, so I feel it's only fair to implement a system like this that all voicebanks can use.
You can already make UTAUs talk by tuning them to sound like human speech, but from experience, talkloids can be tough to tune, so adding this as a feature would make the process a whole lot faster.
Since this use case is outside of OpenUTAU's intended function, I would bet that it wouldn't be considered for implementation in the program itself. However, it could be an external plugin that users could optionally install.
My idea is: you open the plugin, type a sentence into a box separate from the UTAU window, and then click a button to generate the sentence, which is imported into your OpenUTAU piano roll with notes and auto-generated pitch. After that, the user can tweak the output to their liking. It uses OpenUTAU's phonemizer system, so there's no worry about it only being available in Japanese.
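To make the "type a sentence, get notes at the playhead" step more concrete, here is a minimal sketch of how a plugin might lay words out as notes. It assumes a space-separated language, one note per word (a real plugin might split words into syllables or leave that to the phonemizer), and 480 ticks per beat; `Note`, `layout_words`, `PAUSES`, and the `speed` scale factor are all hypothetical names, not part of OpenUTAU's API. Punctuation becomes a rest rather than a note.

```python
import re
from dataclasses import dataclass

TICKS_PER_BEAT = 480  # assumed resolution, not read from a real project

# Hypothetical pause lengths after punctuation, in multiples of one word's length.
PAUSES = {",": 1.0, ".": 2.0, "?": 2.0, "!": 2.0}


@dataclass
class Note:
    lyric: str     # whole word; the voicebank's phonemizer would handle phonemes
    position: int  # start tick
    duration: int  # length in ticks


def layout_words(text, playhead=0, speed=1.0, base_ticks=TICKS_PER_BEAT // 4):
    """Place one note per word starting at the playhead (hypothetical helper).

    speed is a duration scale factor: values above 1.0 make every note and
    pause longer, i.e. slower speech.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    step = round(base_ticks * speed)
    notes = []
    pos = playhead
    # Words (letters, digits, apostrophes) or single punctuation marks.
    for token in re.findall(r"[\w']+|[.,?!]", text):
        if token in PAUSES:
            pos += round(step * PAUSES[token])  # leave a rest, no note
        else:
            notes.append(Note(token, pos, step))
            pos += step
    return notes


for n in layout_words("Hello, world!", playhead=960):
    print(n)
```

With the defaults, "Hello" starts at tick 960 and "world" starts after a one-word rest for the comma. Non-space-separated languages such as Chinese or Japanese would need their own segmentation step before this.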
There is also the possibility of building a text-to-speech engine from other open-source software, but I would wager that this is a pretty bad idea. Most voicebank creators don't want their data ported to a different engine without consent, and another huge issue is that most text-to-speech systems, at least for English, don't support phoneme control (like IPA or ARPABET) to fix words that are pronounced incorrectly.
[EDIT]: I made a rough mockup of my concept of how this should work and look:


alt text: A plugin opens a pop-up for creating TTS. The plugin is as simple and barebones as possible and is intended to work with the tools OpenUTAU already provides, offering only a language choice, a text box to type a sentence, and two expression parameters.
Speed controls how slowly the sentence is spoken, and expressiveness controls how much the pitch varies.
The user can either generate the output at the playhead on the piano roll, or, if they are not satisfied with the first output, select notes that have already been generated and overwrite them using the re-generate option.
The language selection is separate from the phonemizer system, and the user must select the correct phonemizer for the voicebank beforehand. The text box only exists to map words onto the notes and to let the program understand the intonation for the pitch.
The text input should take punctuation (., ?, !, etc.) and tones (for Chinese) into account to generate realistic output.
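As an illustration of how punctuation and the expressiveness slider could shape the generated pitch, here is a minimal sketch under stated assumptions. The rules (a gentle downward drift across the sentence, a final fall for ".", a final rise for "?", a raised start for "!") are a simple hypothetical heuristic rather than an existing OpenUTAU feature, and `sentence_pitches` and its parameters are made up for this example. It only assigns one MIDI note number per word; Chinese tones and detailed pitch curves are not handled.

```python
import random


def sentence_pitches(n_words, end_mark=".", expressiveness=0.5, base=60, seed=0):
    """Return one MIDI note number per word of a single sentence (hypothetical).

    - declination: pitch drifts down about 2 semitones across the sentence
    - expressiveness (0..1) scales random variation of up to +/-2 semitones
    - end_mark shapes the contour: "." falls, "?" rises, "!" lifts the start
    """
    if n_words <= 0:
        return []
    # Seeded so the same seed reproduces the same output; a "re-generate"
    # button could simply pass a new seed.
    rng = random.Random(seed)
    pitches = []
    for i in range(n_words):
        drift = -2.0 * i / max(n_words - 1, 1)  # gradual fall over the sentence
        jitter = rng.uniform(-2.0, 2.0) * expressiveness
        pitches.append(base + drift + jitter)
    if end_mark == ".":
        pitches[-1] -= 2          # statement: extra fall on the last word
    elif end_mark == "?":
        pitches[-1] = base + 4    # yes/no question: rise on the last word
    elif end_mark == "!":
        pitches[0] += 3           # exclamation: emphatic first word
    return [round(p) for p in pitches]


print(sentence_pitches(3, ".", expressiveness=0.0))  # [60, 59, 56]
print(sentence_pitches(3, "?", expressiveness=0.0))  # [60, 59, 64]
```

With expressiveness at 0 the contour is fully predictable; raising it adds variation on top of the same punctuation-driven shape, which the user could then refine on the piano roll.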
After the user has generated the sentence, they can tweak the output on the piano roll just as they would when tuning singing, such as editing the pitch, the phonemes, and the phoneme timings.
My concept is specifically to make TTS work within OpenUTAU, not to act as a separate engine or system.