Semi-supervised Training for Improving Data Efficiency in End-to-end Speech Synthesis
Although end-to-end text-to-speech (TTS) models such as Tacotron have shown excellent results, they typically require a sizable set of high-quality <text, audio> pairs for training, which are expensive to collect. In this paper, we propose a semi-supervised training framework to improve the data efficiency of Tacotron. The idea is to …