Improving Multi-Speaker TTS Prosody Variance with a Residual Encoder and Normalizing Flows

Text-to-speech systems have recently achieved quality almost indistinguishable from human speech. However, the prosody of those systems is generally flatter than that of natural speech, producing samples with low expressiveness. Disentangling speaker identity from prosody is crucial in text-to-speech systems to improve naturalness and produce more varied syntheses. This paper proposes a new neural …