Wow. Thanks for posting the direct link to examples. Those sound incredibly good and would be impressive for a frontier lab. For two people over a few months, it's spectacular.
A little overacted; it reminds me of the voice acting in those Flash cartoons you'd see in the early days of YouTube. That's not to say it isn't good work. It still sounds remarkably human. Just silly humans :)
Sounds great. One of the female examples has convincing uptalk. There must be a way to manipulate the latent space to control uptalk, vocal fry, smoker’s voice, lispiness, etc.
Is there some sort of system prompt or hint at how it should be voiced, or does it interpret it from the text?
Because it would be hilarious if it just derived it from the text and it did this sort of voice acting when you didn't want it to, like reading a matter-of-fact warning label.
This is awesome. Are extensions like PostGIS supported? I wish React Native natively supported WASM. It would be cool to run this client-side instead of SQLite.
> [S1] Oh fire! Oh my goodness! What's the procedure? What to we do people? The smoke could be coming through an air duct!
Seriously impressive. Wish I could direct link the audio.
Kudos to the Dia team.