[GUIDE] AI voice generation and training using RVC models

eatsl · Post by **eatsl** » Sun Dec 10, 2023 12:39 pm

Dears,

for about a year or so I have been working on AI models and their use for our specific purposes.

I wanted to share some of my findings with the community, and so here is a guide on how to create voice models, and how to make use of them:

1) installs:

https://github.com/IAHispano/Applio-RVC-Fork/releases we will use this software to train our models as well as to use them via TTS or UVR5

https://www.youtube.com/watch?v=jh83ViDUvJs a good guide on how to install and use this software

https://www.weights.gg/ find your voice models here, or somewhere else on the interwebs

https://docs.applio.org/rvc/train/train ... reparation

Installation itself is surprisingly easy compared to Tortoise TTS or some of the other solutions out there as of now. You do need to let the installer run, though, and it will take some time to download and install all dependencies (the terminal does this for you; just check it every few minutes and give the appropriate input when asked).

2) Run

Once installed, open up the Applio-RVC-Fork folder created, and run go-applio.bat --> the WebUI should open after you choose your GPU type.

Drag and drop the .pth and .index files of the models you want to use into their respective fields in the Resources tab.

Switch to the TTS tab and click the refresh button. After that, you should be able to choose the .pth file of your model on the right side, and the TTS Method and Model on the left (Edge TTS outputs the best results as of now).

After all this, just type what your AI voice model should say, and click the Convert button.

3) Train:

First, click the Training tab and choose your dataset:

When creating a dataset, try to cut all silent or low-quality parts out of your sound file (I did this using Audacity https://www.audacity.de/ as it's free and easy to use):

Before you cut, though, select a stretch of silence, open the Effect menu, and choose Noise Reduction -> Get Noise Profile. After that, select the audio you want to clean, open Noise Reduction again, adjust the settings appropriately, and either run the preview or apply the effect.

Then, select the parts you want to cut, and then [CTRL]+[X] in order to cut that part out.
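
If you would rather have a list of candidate cut points before opening Audacity, the idea can be sketched in a few lines of Python. This is only a rough illustration, not something Applio or Audacity provides: the function name `find_silent_spans` and the threshold, window, and minimum-length values are my own assumptions, and it expects samples already decoded to floats between -1.0 and 1.0.

```python
import math
from typing import List, Tuple


def find_silent_spans(
    samples: List[float],
    sample_rate: int,
    threshold: float = 0.01,     # assumed RMS level below which a window counts as quiet
    window_s: float = 0.05,      # assumed analysis window length in seconds
    min_silence_s: float = 0.3,  # assumed minimum length worth cutting
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose windowed RMS stays below threshold."""
    window = max(1, round(sample_rate * window_s))
    spans = []
    start = None  # sample index where the current quiet run began
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms < threshold:
            if start is None:
                start = i
        elif start is not None:
            # A loud window ends the quiet run; keep it only if it was long enough.
            if (i - start) / sample_rate >= min_silence_s:
                spans.append((start / sample_rate, i / sample_rate))
            start = None
    # Close a quiet run that reaches the end of the recording.
    if start is not None and (len(samples) - start) / sample_rate >= min_silence_s:
        spans.append((start / sample_rate, len(samples) / sample_rate))
    return spans
```

The returned times are in seconds, so they can be compared directly with the Audacity timeline. The threshold will need tuning per recording.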

When done, export your audio file and put all your files into a dedicated folder. Copy the folder's path and paste it into the WebUI -> choose a 40k sampling rate and V2, then hit the big green "process data" button.
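
Before pasting the path, a quick look at what is actually in the folder can save a wasted run. The following is a minimal sketch using only Python's standard `wave` module, so it reads uncompressed PCM .wav files only. The function name `summarize_dataset` is made up for this post, and I am not claiming that a differing sample rate is a problem for the WebUI; the script just reports it.

```python
import wave
from pathlib import Path


def summarize_dataset(folder: str, expected_rate: int = 40000) -> float:
    """Print rate and length of every .wav in folder; return total seconds."""
    total_s = 0.0
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            seconds = wav.getnframes() / rate
        # Flag files whose rate differs from the one chosen in the WebUI.
        note = "" if rate == expected_rate else f"  (differs from {expected_rate} Hz)"
        print(f"{path.name}: {rate} Hz, {seconds:.1f} s{note}")
        total_s += seconds
    print(f"total: {total_s / 60:.1f} min")
    return total_s
```

Call it with the same folder path you paste into the WebUI, e.g. `summarize_dataset(r"C:\path\to\dataset")` (the path here is a placeholder).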

When done, extract the features (should be done pretty fast). Then you are ready to train your model.

I would aim for at least 200 epochs, but preferably 300. This will take some time (a few hours, and a bit of electricity...)

When done, choose "save voice" as the mode, then first hit the "train feature index" button and, when that has finished, the "save model" button.

It will create a .pth and an .index file for your model -> load these into the WebUI and test whether your model needs retraining.

proof of concept (short videofile demonstrating capabilities): https://mega.nz/file/g1x0FDbI#4vT-v9IBq ... 3TNvsoLzA8

[used for postproduction to compensate the slightly overdriven highs of the mint-output: cakewalk https://www.bandlab.com/products/cakewalk + TDR Nova https://www.tokyodawn.net/tdr-nova/ + https://www.soundonsound.com/techniques ... us-effects]

Find attached one of the first models I trained as further proof of concept (Princess Miki, 300 epochs - it's doing okay):
https://mega.nz/file/owQSwIpZ#28HYo16x9 ... 0NONwkVmS4

sleepyguy · Post by **sleepyguy** » Mon Dec 11, 2023 10:23 am

Nice job, will look into this. Seems to be decent quality!

I recently started looking into https://github.com/coqui-ai/TTS over the weekend.

The demo of coqui-ai was something I saw on reddit : https://www.reddit.com/r/LocalLLaMA/com ... s_so_cool/

forbiddendesire · Post by **forbiddendesire** » Mon Dec 11, 2023 2:53 pm

Thanks for sharing! I downloaded and tested with your audio model. I spent some time trying to tweak the settings but couldn't make it sound like the real Princess Miki. However, this seems like a great tool! I've been looking for a good AI TTS software and this seems perfect. I'm going to try to see if I can generate any voices myself as well

markus · Post by **markus** » Sun Jan 07, 2024 10:02 pm

eatsl wrote: ↑Sun Dec 10, 2023 12:39 pm I wanted to share some of my findings with the community, and so here is a guide on how to create voice models, and how to make use of them:

Hi!

I've finally got myself a new GPU, and the very first thing I did was not to test what graphics it could produce but to work through those tutorials.
Trained my first (test) model from just a few short audio clips of bad quality (echoes, background music (which was separated really well)), ... still, the result is just amazing.

All those possibilities, ... all those ideas, ... make my head spin!

Just wanted to drop my thousand THX for sharing your findings!

@All:
Give it a try, working with these things is by far not as complicated as it might look.

Best greetings,
Markus

eatsl · Post by **eatsl** » Mon Jan 08, 2024 2:43 pm

It's great to read that, because my hope is to make better productions possible.

About training AI models in general: it's like distilling brandy, you don't want to contaminate the spirit with spoilt fruit if you want to produce a high-quality product.

I would use TensorBoard (go-tensorboard.bat) to monitor the training process: watch the loss/g/total scalar with smoothing set to maximum, and stop training as soon as that curve starts to go up again.
You could also try to interpret the norm/ scalars, but that is less reliable.

Also, you may use Audacity to cut out silence and reduce the noise of the dataset before you train.
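
To make the "smooth it and stop when it rises" rule concrete, here is a small sketch of the idea. It uses a plain exponential moving average, which is roughly what a smoothing slider does but is not guaranteed to match TensorBoard's exact formula. The function names, the weight, and the `patience` parameter are assumptions of mine, and the loss values have to be fed in by hand.

```python
from typing import List, Optional


def smooth(values: List[float], weight: float = 0.9) -> List[float]:
    """Exponential moving average; a higher weight means heavier smoothing."""
    smoothed = []
    last = values[0] if values else 0.0
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def first_sustained_rise(
    values: List[float], weight: float = 0.9, patience: int = 3
) -> Optional[int]:
    """Index where the smoothed curve has risen `patience` steps in a row, else None."""
    s = smooth(values, weight)
    rises = 0
    for i in range(1, len(s)):
        # Count consecutive increases; any non-increase resets the counter.
        rises = rises + 1 if s[i] > s[i - 1] else 0
        if rises >= patience:
            return i
    return None
```

A larger `patience` ignores short bumps in the curve, at the cost of reacting a few steps later.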

You may also give the model inference function a try as its able to produce way better outputs.

br

MiloSmurf · Post by **MiloSmurf** » Mon Jan 08, 2024 8:02 pm

I've been looking for AI voice generation as well. I've found Genny (genny.lovo.ai) to work pretty well. You will need to register, but a free plan is available and you can use throwaway email addresses.

You will need a one minute sample of the voice you would like to use. Upload the sample and then you can generate your voice in just a few minutes. After that, you can generate audio based on any text you type in the site. You can adjust the speed if the generated voice is speaking too slow or too fast. As long as you don't change the text, you can regenerate audio without consuming your allotted time.

You can generate up to 5 minutes of audio with a free account. If you need more than 5 minutes, just create a new account and start over with the same audio sample. This used to be 20 minutes, but unfortunately they brought it down to 5...

Downloading mp3 files is reserved for paying subscribers, but you can use Audacity to record the audio as well. You just play the audio you generated and then capture it. There are plenty of tutorials available on how to do this. Just google "record desktop audio audacity" and you'll find it.

Hope this helps!
