Conditional Variational Autoencoder with Adversarial Learning for
End-to-End Text-to-Speech
Several recent end-to-end text-to-speech (TTS) models that enable single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems. In this work, we present a parallel end-to-end TTS method that generates more natural-sounding audio than current two-stage models. Our method adopts …