Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-To-Speech
We present a neural text-to-speech system for fine-grained prosody transfer from one speaker to another. Conventional approaches for end-to-end prosody transfer typically use either fixed-dimensional or variable-length prosody embedding via a secondary attention to encode the reference signal. However, when trained on a single-speaker dataset, the conventional prosody transfer systems are not …