I'll be making a video about it soon but i am eepy
Potluck Vocoder Final Update - #DiffSinger
It's out!
github.com/mrtigermeat/...
I've included guides on how to use the lite model and fine-tune your own model. Have fun, and let me know if there are any problems or questions by submitting an issue on GitHub, or emailing me at contact@tigermeat.xyz.
New Cover: ARPK ft. TRITON for #DiffSinger
Baker released a new version of ARPK and I had to cover it!!!
Full ver: youtu.be/QT-kyQFr5Iw
The key differences I noticed:
- Potluck is a lot clearer and captures articulations a bit better.
- TGM is a lot... buzzier? Fuzzier? It has a texture that I now find undesirable lol.
The only downside to the current vocoder that I can see is that it needs fine-tuning to get good results.
Here's the same USTx rendered with the same version of TIGER (unreleased v111) using the tgm_hifigan vocoder it was previously distributed with.
Potluck Vocoder sample - #DiffSinger
BSKY is letting me post videos today, yay! Here's a sample of TIGER singing "Clocks" with the new Potluck Vocoder --- sample with TGM_HIFIGAN will be in the replies.
training because of how warm the computer gets, so luckily I was able to offset that. I still plan on making some donations to offset the training further, but considering I do not have any income right now, that will be put on hold until I do :)
I still need to total up the emissions statistics for this project, but my guess is I ended up using about 50-70 kWh of energy total across all the experiments, which is the equivalent of running a microwave for 50-70 hours straight. That isn't pretty, BUT! I did get to turn my heat down/off while
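A quick sanity check of that microwave comparison, assuming a typical ~1 kW microwave (the wattage is my assumption, not something stated in the thread):

```python
# Back-of-envelope check: kWh of training energy vs. hours of microwave runtime.
# MICROWAVE_KW is an assumed typical power draw for a household microwave.
MICROWAVE_KW = 1.0

def microwave_hours(total_kwh: float, appliance_kw: float = MICROWAVE_KW) -> float:
    """Hours an appliance drawing `appliance_kw` kilowatts runs on `total_kwh`."""
    return total_kwh / appliance_kw

print(f"{microwave_hours(50):.0f}-{microwave_hours(70):.0f} hours")  # → 50-70 hours
```

At 1 kW the numbers match one-to-one, which is why the 50-70 kWh estimate maps directly onto 50-70 microwave-hours.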
--so you can continue to use my DiffSingers commercially with no issues. Otherwise, I'll be working on getting this released publicly in the next few weeks, along with updates for my DiffSingers!!!
I'm really excited and glad I got this to work :)
--whose data was not included in the training, and it does not sound as good as other public vocoders. But if you have the means to fine-tune, you will get great results.
If you have purchased a commercial license from me for my DS, contact me at the email in my bio and I'll send you a beta copy--
--- I also think it's a silly name, which is why I like it.
There will be a public "plug-n-play" version of the model, but you won't get a better sound than base NSF unless you fine-tune, for which I will be releasing configurations & guides. I tested the vocoder on a few singers--
Commercial Friendly Vocoder Update - #DiffSinger
I've finally got a result I'm super happy with, which I'll try to show off soon, and a few things:
The vocoder will be called the "Potluck" Vocoder, cuz I wouldn't have been able to make it without the help of the DS Community's contributions! ---
well I still can't get the video to upload and I don't want to put it on youtube so i'll just post an update without it
I wanna post an update about the vocoder but Bluesky won't let me upload videos rn
update on this: I'm still in really rough shape. I'm working hard on myself but for now the vocoder project is on hold until I have my life a little more together. Thanks for understanding
btw, taking a few days off from synth stuff since I had a rough few days and need to focus on myself for a bit. I'll be back soon with updates
Gizmo, my white cat, laying on top of my PC
gizmo is keeping the vocoder safe during training, don't you worry
(I'm keeping an eye on my PC temps & making sure gizmo isn't getting too hot)
The vocoder is just about halfway through the fine-tuning step! The results are looking fantastic so far, I'm so excited to show off the final model :DDD
it's like 92k segments
TrainingDatasetR2 (for the vocoder) has 140.32 hours of data. woag
woag
The fine-tuning will include the pc augmentation to allow for pitch shifting. I did not train the base model with it because that apparently doesn't work, and I wasted about 20k steps of training with it just for the output to sound like a horde of murderous bees.
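For anyone curious what pitch augmentation means here: the core of it is resampling audio by a rate derived from a semitone offset. This is just a toy sketch of that idea in plain Python, not the actual pc-nsf-hifigan training code (which preserves duration with a proper pitch shifter):

```python
import math

def semitones_to_rate(n_steps: float) -> float:
    """Resampling rate for a pitch shift of n_steps semitones (12-TET)."""
    return 2.0 ** (n_steps / 12.0)

def naive_pitch_shift(samples: list[float], n_steps: float) -> list[float]:
    """Toy pitch shift by linear-interpolation resampling.
    Reading the signal faster raises the pitch, but also shortens it;
    real augmentation pipelines use a shifter that preserves timing."""
    rate = semitones_to_rate(n_steps)
    out_len = int(len(samples) / rate)
    out = []
    for i in range(out_len):
        pos = i * rate
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# An octave up (+12 st) doubles the read rate, so the output is half as long.
tone = [math.sin(2 * math.pi * 440 * n / 22050) for n in range(22050)]
shifted = naive_pitch_shift(tone, 12.0)
```

During augmentation you'd draw `n_steps` at random per segment so the vocoder sees the same voices across a range of pitches.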
I chose to use LYSE for this example because her data is not included in the vocoder, so it's kind of like a stress test; the model has a better chance of sounding good with a singer already in the data.
It's still a bit muffled, but fine-tuning should fix that. This was trained for 100k steps from scratch, and I plan on fine-tuning for about another 100k with a slightly larger dataset. I think by then, it'll sound at least on par with the public pc-nsf-hifigan model that OpenVPI made.
#DiffSinger Commercial friendly Vocoder Update
The first round of training has been completed, and I'm super happy with the results! Please see the below example using LYSE's 1.0 VB version. The only change is that I am using my custom vocoder. More info following this post.
It's at 82k steps right now and sounds? Fantastic? Like I'm so excited. I'll show off a sample when it hits 100k steps. This is so exciting!!!
15k steps in and it's already sounding promising! I'm sooo excited for this :)
a visualization of a mel-spectrogram from training a vocoder.
made a silly little goofy little kooky little mistake cuz i'm silly and little and goofy so i just had to restart it, but we're so back cuz it's actually working
a screenshot of a terminal window showing a pc-nsf-hifigan vocoder training, it also includes energy consumption information.
and we're OFF!!!!!
(i obfuscated the name of my drive because it's weird)
The first part of the final vocoder training is starting today; I anticipate it taking at least a few days. Hopefully completion by the end of the month will still happen! I'm still taking data submissions, as I can add more data to the second round of training (fine-tuning).
Think I might take advantage of the free time I have (while not applying for jobs, of course) and do some livestreaming. Maybe a labelling stream, or a coding stream? That seems kinda fun and chill.