Ask a Question

Prefer a chat interface with context about you and your work?

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

Several recent end-to-end text-to-speech (TTS) models enabling single-stage training and parallel sampling have been proposed, but their sample quality does not match that of two-stage TTS systems. In this work, we present a parallel end-to-end TTS method that generates more natural sounding audio than current two-stage models. Our method adopts …