Expressive paragraph text-to-speech synthesis with multi-step variational autoencoder
Neural networks can now generate high-quality single-sentence speech. However, audio-book speech synthesis remains challenging because semantic and acoustic features are correlated within a paragraph and speaking styles vary. In this paper, we propose a highly expressive paragraph speech synthesis system with a multi-step variational …