Lady Butterfly she/her@lemmy.world · 5 months ago

Oh hell yeah

SinAdjetivos@lemmy.world · 5 months ago

A single voice actor couldn’t produce enough lines to fully train an AI model…

The model is trained on a massive corpus of existing data and then fine-tuned to match the target voice actor. With less than ~30s of reference audio you can get a pretty decent fine-tune. The main issue is that it currently isn’t on par with the quality and consistency of an in-studio voice actor, especially over long stretches of audio.

XM34@feddit.org · 5 months ago

Hence my usage of the words “fully train”. The other commenter wants to license every piece of audio used in training the model, which obviously includes the base model…

SinAdjetivos@lemmy.world · 5 months ago

You can feed an infinite amount of data into existing models and it won’t fix those issues. The problem is with the models themselves.

And the audio used to train the base model is licensed, usually under an MIT, Creative Commons, or similar license.